Checking out the Validation Package

To check out the package, first create a folder called testarea in your home directory. Inside testarea, create a folder called TauValidation and change into it. Then set up Athena using one of the following commands:

asetup 17.2.0.2,here          (this is for slc5)
asetup 17.2.0.2,slc5,here     (this is for slc6)

Run the following commands to check out the tau validation package and do an initial setup and compilation.

cmt co PhysicsAnalysis/TauID/TauValidation
cd PhysicsAnalysis/TauID/TauValidation/cmt
cmt config
source setup.sh
gmake

The package should compile and then be ready for user configuration and use.

Samples used by Tau Validation

The samples used by the Tau Validation package depend on the evgen tag of the validation. The samples corresponding to each group of evgen tags are listed in the following table.

| Evgen Tags | Sample | Dataset |
| e850, e1127 | ZTAUTAU | 106052.PythiaZtautau.recon.AOD |
| e850, e1127 | TTBAR | 105200.T1_McAtNlo_Jimmy.recon.AOD |
| e850, e1127 | ZEE | 106046.PythiaZee_no_filter.recon.AOD |
| e850, e1127 | QCD | 105015.J6_pythia_jetjet.recon.AOD |
| e1574, e1900, e1934 | ZTAUTAU | 147808.PowhegPythia8_AU2CT10_Ztautau.recon.AOD |
| e1574, e1900, e1934 | TTBAR | 105200.McAtNloJimmy_CT10_ttbar_LeptonFilter.recon.AOD |
| e1574, e1900, e1934 | ZEE | 147806.PowhegPythia8_AU2CT10_Zee.recon.AOD |

These relationships are maintained by the dictionaries tagMappings, signalSamples and backgroundSamples in the jobSettings.py script in the share folder of the TauValidation package.
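The exact layout of these dictionaries is not reproduced on this page. The sketch below shows how such a lookup might work. It assumes that tagMappings maps each evgen tag to a sample-group key, and that signalSamples and backgroundSamples map that key to the datasets in the table above.

The group keys ("groupA", "groupB") and the helper functions are hypothetical. The signal/background split (only Ztautau as signal) is likewise an illustrative assumption. Check jobSettings.py for the real contents.

```python
# Hypothetical sketch of the jobSettings.py lookup tables; the real layout may differ.
tagMappings = {
    "e850": "groupA", "e1127": "groupA",
    "e1574": "groupB", "e1900": "groupB", "e1934": "groupB",
}

# The signal/background split below is assumed for illustration only.
signalSamples = {
    "groupA": {"ZTAUTAU": "106052.PythiaZtautau.recon.AOD"},
    "groupB": {"ZTAUTAU": "147808.PowhegPythia8_AU2CT10_Ztautau.recon.AOD"},
}

backgroundSamples = {
    "groupA": {
        "TTBAR": "105200.T1_McAtNlo_Jimmy.recon.AOD",
        "ZEE": "106046.PythiaZee_no_filter.recon.AOD",
        "QCD": "105015.J6_pythia_jetjet.recon.AOD",
    },
    "groupB": {
        "TTBAR": "105200.McAtNloJimmy_CT10_ttbar_LeptonFilter.recon.AOD",
        "ZEE": "147806.PowhegPythia8_AU2CT10_Zee.recon.AOD",
    },
}


def samples_for_evgen(evgen_tag):
    """Return (signal, background) sample dicts for an evgen tag such as 'e1574'."""
    group = tagMappings.get(evgen_tag)
    if group is None:
        raise KeyError(f"no sample list known for evgen tag {evgen_tag!r}")
    return signalSamples[group], backgroundSamples[group]


def samples_for_tag(full_tag):
    """Accept a full tag like 'e1574_s1852_s1694_r4494' and look up its evgen part."""
    return samples_for_evgen(full_tag.split("_")[0])
```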

Choosing a Job Option File

Anyone who wishes to perform tau validation tasks needs to understand how to choose a job option file. This section describes the preconfigured job options and when to use each of them. First, there are separate job option files for signal samples and for background samples. The only difference between a background job option file and its signal counterpart is that truth information is turned off for background samples. Signal job option files are prefixed by TauValTop_AutoConf, whilst background job option files are prefixed by TauValTopBkg_AutoConf. There are five types of job option file, identified by the following suffixes in the file names:

  • 0 - This is the default job option file which tests the new tau ID
  • 1 - This does the same as 0, but the file is cleaned up more and occasionally works when 0 does not.
  • OldTauID - This is the same as 0, but it does not set up the external packages that run and update the tau ID, so it tests the old tau ID.
  • 64bit - This is the same as 0, but uses a different muon collection, because the muon collection changed in 64-bit Athena releases.
  • 64bitOldTauID - This uses the 64-bit muon collection and does not run the new tau ID algorithms.
By default, the automatic validation uses 0 if no job option is specified, and 0 should also be the standard job option file in manual validation. Do not use 0 in either of these cases:

  • the automatic validation reports that it will not work;
  • in manual validation, any of the Athena versions for the test or references of a task fall outside the external package selection rules described in the Setup External Packages section.

In these cases, switch to the OldTauID job options file.

Sometimes the selection rules do cover all Athena versions in a task (or the automatic validation does not tell you to switch job option files), but your jobs still fail. In that case you can either switch to the OldTauID job options or, as a last-ditch attempt to test the new tau ID, try the 1 job option files, although this rarely changes anything. The same rules apply to the 64bit job options, except for those concerning the 1 files, and the 64bit job option files must be used for any 64-bit Athena release. At the time of writing, the only 64-bit Athena releases are those starting with 17.7. or 17.8.
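The rules above can be summarised as a small decision function. This is a minimal sketch with two assumptions. First, it assumes a file name is simply the prefix, then the suffix, then ".py" (for example TauValTop_AutoConf0.py); check the share folder for the exact names. Second, the flag new_tau_id_ok is a hypothetical stand-in for "the automatic validation or the external package rules allow the new tau ID for every release in the task".

```python
def is_64bit_release(release):
    # At the time of writing, only releases starting with 17.7. or 17.8. are 64-bit.
    return release.startswith(("17.7.", "17.8."))


def choose_joboption(release, background, new_tau_id_ok=True, retry_with_1=False):
    """Return a job option file name for one job (naming scheme assumed)."""
    prefix = "TauValTopBkg_AutoConf" if background else "TauValTop_AutoConf"
    if is_64bit_release(release):
        # 64-bit releases must use the 64bit files; the "1" retry does not apply.
        suffix = "64bit" if new_tau_id_ok else "64bitOldTauID"
    elif not new_tau_id_ok:
        suffix = "OldTauID"
    elif retry_with_1:
        # Last-ditch attempt to test the new tau ID; rarely changes anything.
        suffix = "1"
    else:
        suffix = "0"
    return f"{prefix}{suffix}.py"
```

For example, a background job on 17.7.1.2 in a task where the new tau ID cannot be used would get TauValTopBkg_AutoConf64bitOldTauID.py.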

Manual Tau Validation

The manual tau validation process involves running all job submission commands by hand for a given task. For each sample set this means:

  • creating all the pathena submission commands;
  • setting up the correct Athena version;
  • checking out all the necessary extra packages and recompiling;
  • submitting the jobs.

This process, which is fully documented here, is repeated for each sample set and each task. In the example task, this means we need to repeat the process for each of the following:

Test: s1852_s1694_r4494 (17.7.1.2, 17.3.10.1) 
Reference 1: s1846_s1694_r4494 (17.7.1.2, 17.3.10.1) 
Reference 2: s1830_r4956 (17.6.0.7, 17.2.1.4)

Part 1

The first part of the Tau Validation involves submitting all the samples as jobs to be processed on the grid by the Tau Validation package. The following three sections detail what goes into performing these job submissions using the manual approach.

Athena Setup

Athena needs to be set up before doing any submissions or setting up the external packages. The Athena version to set up is the second one in the brackets of the validation information. In our example, for sample e1574_s1852_s1694_r4494, Athena version 17.3.10.1 needs to be set up in the directory above the PhysicsAnalysis directory.

NOTE: When repeating this section for other tags, new Athena versions can be set up in the same shell. We have never found this to cause problems, even though warnings may be thrown.
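Picking the release out of the validation information can be scripted when many tags are involved. The sketch below is a minimal example that assumes each line has exactly the form used in the example list above ("Role: tag (first, second)"). The function names are hypothetical. The asetup form is the one used in the checkout section.

```python
import re

# Matches lines such as "Test: s1852_s1694_r4494 (17.7.1.2, 17.3.10.1)".
VALIDATION_LINE = re.compile(
    r"^\s*(?P<role>[^:]+):\s*(?P<tag>\S+)\s*\(\s*(?P<first>[^,]+),\s*(?P<second>[^)]+)\)"
)


def release_to_set_up(line):
    """Return (role, tag, release); the release to set up is the second bracketed version."""
    m = VALIDATION_LINE.match(line)
    if m is None:
        raise ValueError(f"unrecognised validation line: {line!r}")
    return m.group("role").strip(), m.group("tag"), m.group("second").strip()


def asetup_command(release):
    # Same form as the checkout instructions: asetup <release>,here
    return f"asetup {release},here"
```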

Setup External Packages

Some external packages need to be checked out so that the datasets we use are processed with the new TauID instead of the old TauID. These are the TauDiscriminant and tauRec packages. A special procedure must be followed to set these packages up each time a new version of Athena is set up, which generally means once for each tag in a validation task. The following lines need to be executed in the directory above the PhysicsAnalysis directory in your testarea.

cmt co -r TauDiscriminant{VersionTD} PhysicsAnalysis/TauID/TauDiscriminant
cmt co -r tauRec{VersionTR} Reconstruction/tauRec
get_files -scripts setupWorkArea.py
python setupWorkArea.py
cd WorkArea/cmt
cmt config
source setup.sh
cmt bro gmake

This checks out the packages and recompiles the work area with the version of Athena that is set up. In the commands above, {VersionTD} and {VersionTR} depend on the Athena version that is set up. They are selected using the following rules:

1) tags used for NTUP_TAU D3PD production p1344 and p1443:
- TauDiscriminant-01-07-43
- tauRec-04-02-16

these tags are in any 5-digit releases with
17.2.7.5.Z with Z=3 or higher
17.2.11.Y.Z with Y=1 or higher
17.3.11.Y.Z with Y=1 or higher (so not in 17.3.10.Y.Z-IBLProd)

2) newest development tags:
- TauDiscriminant-01-07-43, -44, -45
- tauRec-04-03-05 (contains Pi0 cell-finder)

dev releases:
17.7 base release
17.8 base release (devval, dev)
in general any release 17.W with W=7 or higher contains the Pi0 development tags

3) T0 frozen tags
- TauDiscriminant-01-07-24 and TauDiscriminant-01-07-24-01
- tauRec-04-02-14, tauRec-04-02-15, tauRec-04-02-15-01

these tags are in any 4-digit release with
17.2.7.Y, where Y=0 or higher
17.2.X.Y where X=7 or higher
17.3.X.Y where X=7 or higher (including the 5-digit IBLProd 17.3.10.Y.Z release)

4) HLT tags
- TauDiscriminant-01-07-15
- tauRec-04-02-01-01 and -02,-03,-04

these tags are in any 17.1.5.Y.Z releases
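As a cross-check, the four rules can be read literally as the function below. It returns the name of the matching rule, or None when no rule matches; in that case, use the OldTauID job options and no external packages. This is a sketch of one reading of the rules, not code from the package.

The function deliberately returns only the rule label rather than expanded tag names, because shorthand such as "-02,-03,-04" is left as written in the list above. Five-digit releases not mentioned in the rules (for example 17.2.7.5.2) are treated as unmatched.

```python
def external_package_rule(release):
    """Return "NTUP_TAU", "dev", "T0", "HLT" or None for an Athena release string."""
    base = release.split("-", 1)[0]            # drop suffixes such as "-IBLProd"
    try:
        parts = [int(p) for p in base.split(".")]
    except ValueError:
        return None
    if len(parts) < 2 or parts[0] != 17:
        return None
    if parts[1] >= 7:                          # rule 2: any 17.W with W >= 7
        return "dev"
    if len(parts) == 5:                        # 5-digit releases 17.A.B.C.D
        a, b, c, d = parts[1:]
        if (a, b, c) == (2, 7, 5) and d >= 3:  # rule 1: 17.2.7.5.Z, Z >= 3
            return "NTUP_TAU"
        if (a, b) in ((2, 11), (3, 11)) and c >= 1:  # rule 1: 17.2/17.3.11.Y.Z, Y >= 1
            return "NTUP_TAU"
        if (a, b) == (3, 10):                  # rule 3: 17.3.10.Y.Z(-IBLProd)
            return "T0"
        if (a, b) == (1, 5):                   # rule 4: 17.1.5.Y.Z
            return "HLT"
        return None
    if len(parts) == 4:                        # 4-digit releases 17.A.B.C
        a, b = parts[1], parts[2]
        if a in (2, 3) and b >= 7:             # rule 3: 17.2.X.Y / 17.3.X.Y, X >= 7
            return "T0"
    return None
```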

Do not check out these external packages if you are using an OldTauID job option, or if no external package version corresponds to the Athena release you need. For example, in the example task the second reference requires Athena 17.2.1.4, which does not match any of the rules listed above. We therefore have no version of these external packages that works for this Athena version, so it should be run with the OldTauID job options and no external packages. To compare like with like, it is suggested that the test and first reference also be run with the OldTauID job options.

Note that once a job has been submitted, you must remove these external packages before submitting any job that needs different versions of them, or none at all.

Pathena Commands

Pathena is used to submit all the jobs for the tau validation package. This means for each task sample set we need to configure a pathena command for the individual samples used in the sample set. For the example task we are using e1574_s1852_s1694_r4494, e1574_s1846_s1694_r4494 and e1574_s1830_r4956. The e1574 tells us we need to use the following list of samples:

  • valid1.147808.PowhegPythia8_AU2CT10_Ztautau.recon.AOD...
  • valid1.105200.McAtNloJimmy_CT10_ttbar_LeptonFilter.recon.AOD...
  • valid1.147806.PowhegPythia8_AU2CT10_Zee.recon.AOD...

A pathena command needs to be created for each of these samples. The commands can either be typed one by one in the terminal or put in a bash script to be batch processed. The pathena command is generally of the form

pathena {jobOptionsFile} --inDS={inputDataSetName} --outDS={outputDataSetName} --nFilesPerJob=10 --dbRelease=LATEST --supStream=GLOBAL --extFile=.root [--cmtConfig={cmtConfig} --excludeSite={excludeSites}]

Everything here is compulsory except the options in "[]", which need to be set only occasionally. The fields are explained below:

  • jobOptionsFile - For background processes the job options file will be one of the TauValTopBkg_AutoConf files, and for signal processes one of the TauValTop_AutoConf files, as described in the Choosing a Job Option File section above.
  • inputDataSetName - Using dq2-ls, find the full dataset name for the specific sample. Choose the sample with a tid attached to the name. If several names have a tid attached, the one to choose will generally be specified. For the Ztautau e1574_s1852_s1694_r4494 sample in the example, we use the dq2 command below and choose the input dataset to be "valid1.147808.PowhegPythia8_AU2CT10_Ztautau.recon.AOD.e1574_s1852_s1694_r4494_tid01376426_00". NOTE: dq2 should not be set up in the same shell as Athena, since the Python bindings of Athena and dq2 occasionally clash.
dq2-ls valid1.147808.PowhegPythia8_AU2CT10_Ztautau.recon.AOD.e1574_s1852_s1694_r4494*
  • outputDataSetName - Choose a name for the output dataset. It should uniquely identify the results for this specific task, tag and sample. The standard form is: user.{username}.valid1.{datasetID}.{datasetName}.recon.AOD.{tag}.{taskName} . For a user bsmith, the output dataset for the above sample would be: user.bsmith.valid1.147808.PowhegPythia8_AU2CT10_Ztautau.recon.AOD.e1574_s1852_s1694_r4494.T2-2013-11-29-V1
    Note that even if you are using a valid2 sample, the output dataset name should still use valid1.
  • cmtConfig - Occasionally a specific cmt config needs to be specified (generally when running on slc6), which is what this option is for. The task description should tell you which cmt config to use.
  • excludeSites - If certain sites fail to work for a job sites can be excluded using this option. More details can be found by researching the pathena command or going to the pathena twiki page.
This process needs to be repeated for each sample corresponding to the evgen tag. If you make a bash script to batch process these pathena commands, don't forget to run it before moving on to the next tag.
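When batch processing, the output dataset name and the pathena command line can be generated rather than typed. The sketch below follows the naming form and option list given above. It assumes the input dataset name has the six dot-separated fields of the example (project, ID, name, recon, AOD, tag with a _tid suffix). The joboption file name in the usage line is a hypothetical example. shlex.join requires Python 3.8 or later.

```python
import shlex


def output_dataset_name(username, input_ds, task_name):
    """Build user.{username}.valid1.{datasetID}.{datasetName}.recon.AOD.{tag}.{taskName}."""
    fields = input_ds.rstrip("/").split(".")
    # fields: [project, datasetID, datasetName, "recon", "AOD", tag + "_tid..." suffix]
    dataset_id, dataset_name, full_tag = fields[1], fields[2], fields[5]
    tag = full_tag.split("_tid")[0]            # drop the _tidNNNNNNNN_NN task-id suffix
    # The output name always uses valid1, even for valid2 inputs.
    return f"user.{username}.valid1.{dataset_id}.{dataset_name}.recon.AOD.{tag}.{task_name}"


def pathena_command(joboptions, input_ds, output_ds, cmt_config=None, exclude_sites=None):
    """Return the pathena command line with the compulsory and optional options."""
    args = ["pathena", joboptions, f"--inDS={input_ds}", f"--outDS={output_ds}",
            "--nFilesPerJob=10", "--dbRelease=LATEST", "--supStream=GLOBAL",
            "--extFile=.root"]
    if cmt_config:
        args.append(f"--cmtConfig={cmt_config}")
    if exclude_sites:
        args.append(f"--excludeSite={exclude_sites}")
    return shlex.join(args)
```

For example, `pathena_command("TauValTop_AutoConf0.py", ds, output_dataset_name("bsmith", ds, "T2-2013-11-29-V1"))` gives one line of a batch script. Writing one such line per sample under a `#!/bin/bash` header produces the script to run before moving on to the next tag.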

Part 2

Part 2 of the validation involves producing the comparison plots between each test and its references and creating the HTML results page. Note that if you followed the naming schemes of the previous section exactly, you can use the automatic validation for Part 2 even if you did not use it to submit the jobs. You can either write the Part 2 setup files for it by hand or simply use the automatically generated ones.

Download the Data

This section describes how to download the data produced by the grid jobs and merge the samples correctly with ROOT's hadd utility, using one sample as an example. Earlier, that sample was submitted to the GRID with the output dataset set to user.bsmith.valid1.147808.PowhegPythia8_AU2CT10_Ztautau.recon.AOD.e1574_s1852_s1694_r4494.T2-2013-11-29-V1. With DQ2 set up, move to the storage directory specified in the Part 2 setup file. Make a directory there named after the date of the task (i.e. in the example, run "mkdir 2013-11-29") and change into it. Run the following command

dq2-get --to-here=user.bsmith.valid1.147808.PowhegPythia8_AU2CT10_Ztautau.recon.AOD.e1574_s1852_s1694_r4494.T2-2013-11-29-V1

This downloads all the root files to the folder user.bsmith.valid1.147808.PowhegPythia8_AU2CT10_Ztautau.recon.AOD.e1574_s1852_s1694_r4494.T2-2013-11-29-V1/. Move into that directory and use hadd to merge all the root files, so that the merged file name is of the form

valid1.147808.AOD.e1574_s1852_s1694_r4494_TauValidation.root
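The merged file name can be derived mechanically from the output dataset name. The sketch below also builds the hadd argument list: hadd takes the target file first, followed by the input files. It assumes the downloaded files all end in ".root" and that the dataset name follows the standard form above; the function names are hypothetical.

```python
import glob
import os


def merged_file_name(output_ds):
    """user.<user>.valid1.<ID>.<name>.recon.AOD.<tag>.<task> -> valid1.<ID>.AOD.<tag>_TauValidation.root"""
    fields = output_ds.split(".")
    # fields: ["user", username, "valid1", datasetID, datasetName, "recon", "AOD", tag, task]
    project, dataset_id, tag = fields[2], fields[3], fields[7]
    return f"{project}.{dataset_id}.AOD.{tag}_TauValidation.root"


def hadd_command(download_dir, output_ds):
    """Return the argument list for ROOT's hadd: target file first, then the inputs."""
    inputs = sorted(glob.glob(os.path.join(download_dir, "*.root")))
    return ["hadd", merged_file_name(output_ds)] + inputs
```

The list can be passed to `subprocess.run(..., check=True)` from a shell where ROOT is available, so that a failed merge raises an error instead of passing silently.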

Move all these merged root files into a folder that describes the task. For example, for references "a" (0) and "b" (1) of the example task, the folders would be:

T2-2013-11-29-0
T2-2013-11-29-1

Note that the test sample data must be in both of these folders, whereas each reference is only needed in its corresponding folder (-0 for reference a, -1 for reference b, -2 for reference c, etc.). Keep to these forms exactly, or the Part 2 validation scripts will not be able to process the datasets correctly. Repeat this for all samples of each test-reference pair, for each task.
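The folder layout can be expressed as a plan that maps each folder to the merged files it must contain: the test files go into every folder, and each reference's files go only into its own. This is a minimal, hypothetical helper. It assumes references are given in order a, b, c, and so on. It relies on the test and reference merged files having distinct names, which holds because their tags differ.

```python
def task_folder(task_number, date, ref_index):
    """For example, task_folder(2, "2013-11-29", 0) -> "T2-2013-11-29-0"."""
    return f"T{task_number}-{date}-{ref_index}"


def folder_plan(task_number, date, test_files, reference_files):
    """
    Map each task folder to the merged files it must contain.
    test_files      -- merged test-sample files (needed in every folder)
    reference_files -- one list per reference, in order a, b, c, ...
    """
    plan = {}
    for index, ref_files in enumerate(reference_files):
        plan[task_folder(task_number, date, index)] = list(test_files) + list(ref_files)
    return plan
```

Each entry can then be realised with `os.makedirs` and `shutil.copy` inside the dated storage directory.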

Run the Ami Script

First, set up Athena in the testarea, in the same location where you set up Athena in Part 1, using the following command and version:

asetup 17.2.0.2,here

Within the python folder of the TauValidation package there should be a Python script called amiListConfigurationTag.py. If it does not already exist, create a directory called Tags at your_tau_validation_directory/python/Tags. Copy the script into it and cd into the Tags folder. The following command needs to be run for every part of every tag.

python amiListConfigurationTag.py -configTag=${i} -output=str AMIUser=${ami_user} AMIPass=${ami_pass} > ${i}.txt

Here ${i} is the specific part of the tag for the sample being processed, ${ami_user} is the user's AMI username and ${ami_pass} is the user's AMI password. Make sure you have an AMI account before attempting the validation, or you will see a number of Python errors when you run this script.

The script downloads all the information from AMI about one part of the tag and writes it to a text file. For the example tag e1574_s1852_s1694_r4494, it needs to be run with ${i} set to e1574, then s1852, then s1694 and finally r4494. This is then repeated for all other samples. If there is already a file in Tags named, for instance, e1574.txt, you most probably do not need to run the script for that tag part. The exception is a file that contains wrong information (i.e. if the command crashed when it was called, the file will contain incorrect information).

In the example tag, merging has occurred, which is why there are two sim tags. The contents of s1852.txt and s1694.txt therefore need to be concatenated into a file called s1852_s1694.txt before proceeding to the next part. The same procedure must be followed when merging happens with rec tags.
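Splitting a tag into parts, skipping parts that already have a text file, and building the merged file can be sketched as follows. Two assumptions apply. The helper names are hypothetical, and merging is detected by consecutive parts sharing the same leading letter (as with s1852_s1694), which is an assumption based on the example rather than a rule stated by the package.

```python
from pathlib import Path


def tag_parts(full_tag):
    """'e1574_s1852_s1694_r4494' -> ['e1574', 's1852', 's1694', 'r4494']"""
    return full_tag.split("_")


def parts_to_fetch(tags_dir, full_tag):
    """Parts that have no <part>.txt yet and so still need the AMI script."""
    return [p for p in tag_parts(full_tag) if not (Path(tags_dir) / f"{p}.txt").exists()]


def merged_groups(full_tag):
    """
    Group consecutive parts sharing a leading letter; groups of two or more are
    merged steps, e.g. {'s1852_s1694': ['s1852', 's1694']}.
    """
    groups = []
    for part in tag_parts(full_tag):
        if groups and groups[-1][0][0] == part[0]:
            groups[-1].append(part)
        else:
            groups.append([part])
    return {"_".join(g): g for g in groups if len(g) > 1}


def concatenate_parts(tags_dir, merged_name, parts):
    """Write <merged_name>.txt as the concatenation of each <part>.txt, in order."""
    tags_dir = Path(tags_dir)
    text = "".join((tags_dir / f"{p}.txt").read_text() for p in parts)
    (tags_dir / f"{merged_name}.txt").write_text(text)
```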

Running the validation

To run the actual validation for part 2 of Tau Validation there is a command that needs to be run for each test reference pair. The command is structured as follows:

python Start_Validation.py --ZtautauRef=${ZTAUTAUREF} --ZtautauTest=${ZTAUTAUTEST} \
        --QCDRef=${QCDREF} --QCDTest=${QCDTEST} \
        --ZeeRef=${ZEEREF} --ZeeTest=${ZEETEST} \
        --ttbarRef=${TTBARREF} --ttbarTest=${TTBARTEST} \
        --evgenRef=${EVGENREF} --evgenTest=${EVGENTEST} \
        --simRef=${SIMREF} --recRef=${RECREF} \
        --simTest=${SIMTEST} --recTest=${RECTEST} \
        --datadir=${storage_dir} \
        --TaskName=$TASKNAME 

Here ${ZTAUTAUREF} and the similar variables are the dataset IDs of the samples used. For instance, for the test sample valid1.147808.PowhegPythia8_AU2CT10_Ztautau.recon.AOD.e1574_s1852_s1694_r4494, ${ZTAUTAUTEST} would be 147808. Replace these variables with the dataset IDs of all the samples; if a sample is not used, its options can be left out. The placeholders ${EVGENREF}, ${SIMREF} and ${RECREF}, and their Test counterparts, need to be replaced with the correct evgen, sim and rec tags of the reference and the test. ${storage_dir} is where the downloaded data is stored, i.e. the folder you created for the test-reference pair in the Download the Data section. For the example job, when testing against reference "a", the location would be storage_directory/2013-11-29/T2-2013-11-29-0/.
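The command can be assembled from the dataset IDs, leaving out options for unused samples. The option names below are those in the command above; the helper itself is hypothetical.

In the test, sim tags such as "s1852_s1694" are used for merged simulation steps. This is an assumption based on the merged AMI file naming, not something stated for Start_Validation.py. shlex.join requires Python 3.8 or later.

```python
import shlex

# Sample keys mapped to the option stems used by Start_Validation.py above.
SAMPLE_OPTIONS = {"ZTAUTAU": "Ztautau", "QCD": "QCD", "ZEE": "Zee", "TTBAR": "ttbar"}


def start_validation_command(ref_ids, test_ids, ref_tags, test_tags, datadir, task_name):
    """
    ref_ids / test_ids   -- dataset IDs per sample, e.g. {"ZTAUTAU": "147808"};
                            samples missing from a dict are simply left out
    ref_tags / test_tags -- {"evgen": ..., "sim": ..., "rec": ...}
    """
    args = ["python", "Start_Validation.py"]
    for key, stem in SAMPLE_OPTIONS.items():
        if key in ref_ids:
            args.append(f"--{stem}Ref={ref_ids[key]}")
        if key in test_ids:
            args.append(f"--{stem}Test={test_ids[key]}")
    for step in ("evgen", "sim", "rec"):
        args.append(f"--{step}Ref={ref_tags[step]}")
        args.append(f"--{step}Test={test_tags[step]}")
    args += [f"--datadir={datadir}", f"--TaskName={task_name}"]
    return shlex.join(args)
```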

$TASKNAME needs to be of the form

Validation.T{TNUM}-YYYY-MM-DD-{REFNUM}.{TESTTAG}.vs.{REFTAG}

Here {TNUM} is the task number, YYYY is the year, MM the month and DD the day. {REFNUM} is the reference number: 0 for reference "a", 1 for reference "b", and so on. {TESTTAG} is the full test tag and {REFTAG} is the full reference tag. For example, a TaskName could be:

Validation.T2-2013-11-29-0.e1574_s1852_s1694_r4494.vs.e1574_s1846_s1694_r4494
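Building the TaskName from its parts avoids typos in the long tags. The function names are hypothetical, and the mapping from reference letter to number follows the rule above ("a" is 0, "b" is 1).

```python
import datetime


def ref_index(letter):
    """Reference "a" -> 0, "b" -> 1, and so on."""
    return ord(letter.lower()) - ord("a")


def task_name(task_number, date, ref_num, test_tag, ref_tag):
    """Validation.T{TNUM}-YYYY-MM-DD-{REFNUM}.{TESTTAG}.vs.{REFTAG}"""
    return f"Validation.T{task_number}-{date:%Y-%m-%d}-{ref_num}.{test_tag}.vs.{ref_tag}"
```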

Running this script produces a number of bash scripts in the python directory. Run all the appropriate ones for the necessary samples: generally the Ztautau, Zee and ttbar scripts, plus the QCD script when a QCD sample is used. These scripts perform the actual validation and produce the HTML reports.

The results are stored in the temporary directory specified in the setup files. Once the validation has run, move all the results somewhere permanent, such as a correctly named folder in the physics validation area; see the results already there, uploaded by the tau validation automation system, for examples. Then create index.html files that point to the index.html file of each set of results you have created. Base the HTML layout on the structure of the reports already in the tau validation area. The temporary storage directory must be emptied before running the next validation task or test-reference pair.
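Writing the top-level index.html by hand is error-prone when there are many reports. The sketch below generates a plain list of links to each report's index.html. It stops descending into a report once that report's index.html is found, so nested pages inside a report are not linked.

The directory layout and the bare HTML are assumptions. Adapt the markup to match the existing reports in the tau validation area.

```python
import html
import os


def write_index(results_root):
    """
    Write results_root/index.html linking to the top-most index.html of each report
    below results_root, and return the relative links that were written.
    """
    links = []
    for dirpath, dirnames, filenames in os.walk(results_root):
        if dirpath != results_root and "index.html" in filenames:
            rel = os.path.relpath(os.path.join(dirpath, "index.html"), results_root)
            links.append(rel.replace(os.sep, "/"))
            dirnames[:] = []   # do not descend into this report's own sub-pages
    links.sort()
    items = "\n".join(
        f'<li><a href="{html.escape(link)}">{html.escape(os.path.dirname(link))}</a></li>'
        for link in links
    )
    with open(os.path.join(results_root, "index.html"), "w") as fh:
        fh.write(f"<html><body><ul>\n{items}\n</ul></body></html>\n")
    return links
```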

Automation Scripts Description

Part 1

Part 1 of the automation is written as Python scripts. The main entry point is RunSubmission.py, whose use is described in the Automatic Validation description. RunSubmission creates a TaskManager object and calls TaskManager.runSubmission() to start the Part 1 validation in whichever mode is chosen. TaskManager organises all the logic behind submitting tasks and producing the Part 2 setup files.

First, TaskManager reads the valSetup.config file using a Reader object and its Reader.readSetupFile() function. Reader builds a TaskInfo object, which is simply a data object used by TaskManager. The TaskInfo object contains the general information shared by all tasks, such as the common task name and version, the sites to exclude and the extra code blocks. It also holds each task as its own object.

Each task specified in valSetup.config is built into its own Task object. The first listed tag is stored as the test, and the others are added to an array of references. Each tag in a task is stored as a SampleSet object, which holds all the information about that tag: the Athena version it requires, the extra packages it needs, all dataset information for the background and signal samples, etc. The following picture shows the general layout of the validation information.

A visual representation of the object oriented design for describing a validation configuration.
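The object layout described above can be sketched with Python dataclasses. The class names TaskInfo, Task and SampleSet come from the text. All field names, and the from_sample_sets helper, are hypothetical, since the actual attributes of the automation classes are not shown here.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class SampleSet:
    """One tag of a task (field names hypothetical)."""
    tag: str
    athena_release: str
    extra_packages: List[str] = field(default_factory=list)
    signal_datasets: List[str] = field(default_factory=list)
    background_datasets: List[str] = field(default_factory=list)


@dataclass
class Task:
    """The first listed tag is the test; the remaining tags are references."""
    test: SampleSet
    references: List[SampleSet] = field(default_factory=list)

    @classmethod
    def from_sample_sets(cls, sample_sets):
        if not sample_sets:
            raise ValueError("a task needs at least one tag")
        return cls(test=sample_sets[0], references=list(sample_sets[1:]))


@dataclass
class TaskInfo:
    """General settings shared by all tasks, plus the tasks themselves."""
    task_name: str
    version: str
    exclude_sites: List[str] = field(default_factory=list)
    extra_code: List[str] = field(default_factory=list)
    tasks: List[Task] = field(default_factory=list)
```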

-- KieranBristow - 28 Nov 2013

Topic attachments
AutomationStructure.png (r1, 27.3 K, 2013-12-09 18:30, KieranBristow): Structure of the automation code object oriented design
Topic revision: r6 - 2013-12-09 - KieranBristow
 
Copyright © 2008-2023 by the contributing authors. All material on this collaboration platform is the property of the contributing authors.