Workshop
https://indico.cern.ch/conferenceDisplay.py?confId=294656
I think we're going to need a new tag of TRC for the tutorial with fixes for any problems we find now.
lxplus account, grid certificate?
Introduction
TopRootCore installation
- Introduction - what is it (performance packages and top group packages)? → Sören, Hovhannes
- Where to find the latest version (or in intro?)
Make instructions for lxplus only
Check disk space?
Make a file called
install.sh
and copy-paste the following into it. You can use whichever editor you like, but clearly vi is the best and emacs the worst.
export TAG=TopRootCoreRelease-14-00-22
source /afs/cern.ch/atlas/software/dist/AtlasSetup/scripts/asetup.sh 19.0.0
mkdir $TAG
cd $TAG
svn co svn+ssh://svn.cern.ch/reps/atlasoff/PhysicsAnalysis/TopPhys/TopRootCoreRelease/tags/$TAG $TAG
cd $TAG/share
./build-all.sh
cd ../..
Then run
source install.sh
and wait. How long does that take? Can we do it in the workshop or does it have to be done ahead of time?
Make a setup script
In the
TopRootCoreRelease-14-00-22
directory make a file called
setup.sh
and copy-paste the following text into that file.
#Grid guff, do this first. Order seems to be important
export RUCIO_ACCOUNT=<your grid username>
export ATLAS_LOCAL_ROOT_BASE=/cvmfs/atlas.cern.ch/repo/ATLASLocalRootBase
source $ATLAS_LOCAL_ROOT_BASE/user/atlasLocalSetup.sh
localSetupDQ2Client --skipConfirm
localSetupPandaClient --noAthenaCheck
voms-proxy-init -voms atlas
#Athena - for root
source /afs/cern.ch/atlas/software/dist/AtlasSetup/scripts/asetup.sh 19.0.0
#RootCore
source RootCore/scripts/setup.sh
export PYTHONPATH=$ROOTCOREDIR/../TopD3PDScripts/python:$PYTHONPATH
Note that a lot of this is lxplus-specific; if you're running at a home institute you'll (probably) need to set up Athena, dq2 and panda differently.
Object definitions and event selection
Some slides with the 'standard' objects on.
How to find things in the code? svn browser etc?
Running MiniSL / MiniML
Hands-on running MiniSL and / or MiniML
Brief introduction to MiniSL, ML and ZL (including a discussion of the selection). It would be good if we could mix slides and hands-on, which would allow people to catch up / debug.
Brief introduction to the code, then information about the settings file, the input file list, which parameters to specify.
What's all this bitmask business? Explain.
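A toy sketch of how such a cut bitmask is typically decoded might be useful here. The bit names and positions below are invented purely for illustration; the real assignments are defined in the TopD3PDAnalysis code and should be looked up there.

```python
# Invented cut bits for illustration -- the real assignments live in
# the TopD3PDAnalysis code, not here.
CUT_GRL     = 1 << 0  # passed the good-runs-list check
CUT_TRIGGER = 1 << 1  # passed the trigger requirement
CUT_ONE_LEP = 1 << 2  # exactly one good lepton

def passes(bitmask, cut):
    """True if the given cut bit is set in the event's bitmask."""
    return (bitmask & cut) != 0

# An event that passed GRL and trigger but failed the lepton cut:
event_bits = CUT_GRL | CUT_TRIGGER
```

The point to get across is just that each cut sets one bit, so a single integer per event records which cuts it passed.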
We will put some data and MC files on eos.
First run data. I think the first run is only a few files so that should be easy.
Stop,
ls
and look at what we have created. Isn't it wonderful?
Why do we have four files? What are they, and what do we do with them? TBrowse them.
Then run MC without systematics.
Stop, open the file and TBrowse it.
Then run with all the systematics enabled for MC.
Stop, browse the file.
What about MiniZL?
Event Challenges
- Importance of showing things and suggesting changes in top reconstruction meetings
Grid submission
https://twiki.cern.ch/twiki/bin/viewauth/AtlasProtected/TopD3PDScripts
Introduction - running on data
Now move to the
examples_data
directory, i.e.
cd TopD3PDScripts/examples_data
In the following examples the general idea is that you modify a very short file to fit your needs and then run it.
1) Submit a grid data job
Let's start by modifying some settings to disable systematic uncertainties, to let the job run faster on the grid and produce less output. Make a copy of the
settings.txt
file and enable the
noSyst
option like this:
cp $ROOTCOREDIR/../TopD3PDAnalysis/control/settings.txt .
Now edit this file and un-comment the line
noSyst 1
by removing all the hash symbols. This turns off all the systematic variations.
Then, open
01ExampleSubmit.py
in your favourite text editor. Edit the value of
SubmissionHelper.Configuration.GridUsername
to be your grid username (hint, the correct answer is not
simon
). If you exit your editor and execute the script with the following command,
python 01ExampleSubmit.py
it will run the un-commented code in the file. This submits a MiniSL job to the grid for run 200842 (period A) of the Egamma stream. That is to say that only this line
SubmissionHelper.SubmitMiniForPeriod('data12_8TeV.periodA.physics_Egamma.PhysCont.NTUP_TOPEL.grp15_v01_p1400_p1401/', [200842])
is being run. You can check the progress of your job using the very colourful panda monitor (
http://panda.cern.ch
). The output of this job will be saved to:
user.simon.200842.Egamma.r4065_p1278_p1400_p1401_tid01205533_00.11_10_0/
Note that the run number, stream, and
Suffix
appear in the output dataset name.
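As a small aside (this is not part of TopD3PDScripts), the naming convention means those fields can be recovered from an output dataset name by splitting on dots:

```python
def parse_output_dataset(name):
    """Pull the grid username, run number and stream out of an output
    dataset name of the form user.<name>.<run>.<stream>.<tags>.<suffix>/"""
    fields = name.rstrip('/').split('.')
    return fields[1], int(fields[2]), fields[3]
```

For the dataset name shown above this gives ('simon', 200842, 'Egamma').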
If some of the jobs failed due to 'grid' reasons, then you can just rerun the script. Prun is smart enough to know if it has already run all or part of a dataset and only resubmits the failed jobs (if you don't change the suffix).
2) Download the output
Once run 200842 has completed - note that it will run merging after the initial set of jobs has finished - we should try to download the output root files. One way to do this is the wrapper for dq2-get given in
02ExampleDownload.py
. Let's give it a try, open the file in a text editor.
You will need to set the
datasetPattern
to match your username and also the
Suffix
, if you changed it in the submission script. Every dataset that matches this pattern will be downloaded. You also need to change
directory
to the place you want the files to be put locally. I find dq2 can be a bit flaky, so I set the number of attempts to be 2, which means it loops through everything twice. After making these changes run the command:
python 02ExampleDownload.py
After this, you should see that your
directory
contains a sub-directory for each dataset, and inside that you'll find all the root files from the grid.
3) Merge the output
I like to have one file per run. Edit
03ExampleMerge.py
and change the
directory
to be the same as what you set in the previous script. Running it with
python 03ExampleMerge.py
will execute hadd for each run.
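The grouping step behind that merge can be sketched as follows. This is illustration only - the real logic lives in 03ExampleMerge.py - and the file names are invented to follow the dataset layout from the download step.

```python
import collections

# Invented file names, one sub-directory per dataset as created by the
# download step; the third dot-separated field is the run number.
files = [
    'user.simon.200842.Egamma.11_10_0/out._00001.root',
    'user.simon.200842.Egamma.11_10_0/out._00002.root',
    'user.simon.200863.Egamma.11_10_0/out._00001.root',
]

by_run = collections.defaultdict(list)
for f in files:
    run = f.split('.')[2]
    by_run[run].append(f)

# Each group would then become one call like:
#   hadd run200842.root <all files for that run>
```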
4) Check that you ran on all the events
AMI contains records of the number of events in each run, and we should check that the number of events in our file matches AMI (i.e. that the grid behaved!). Let's look inside
04ExampleCheckYields.py
. If you're following this example, you've got a single ntup_top run from Egamma period A. Set the
directory
field to the same as before. Now run the script:
python 04ExampleCheckYields.py
It will list all the run numbers from the GRL, the yield as reported by AMI, and the yield inside your file. If a run is printed in black then everything is fine; if it is printed in red then the yields don't match and you should investigate what went wrong.
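The comparison itself amounts to something like the sketch below - a guess at the logic with invented numbers, so see the script for what it really does.

```python
# ANSI colour codes: mismatched runs get printed in red.
RED, RESET = '\033[31m', '\033[0m'

def report(run, ami_yield, file_yield):
    """Return (ok, line): ok is True when the yields match, and the
    printed line is wrapped in red ANSI codes when they do not."""
    ok = (ami_yield == file_yield)
    colour, end = ('', '') if ok else (RED, RESET)
    return ok, f'{colour}{run}: AMI={ami_yield} file={file_yield}{end}'
```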
5) Event challenge
For the event challenge people like to compare yields by period. Edit
directory
in the
05ExamplePrintCutflows.py
file and check the dataset and stream are okay. Then run the command
python 05ExamplePrintCutflows.py
For now your 'Period A' yield will just be the single run you have, but you can go back and submit all the missing period A files using the information above.
6) Submit for all of period A, egamma stream
Let's say you want to submit jobs for every run in Period A. You could just follow the instructions in section 1b of
01ExampleSubmit.py
. It's the same command as step (1), but with the array of run numbers removed from the end of the line - this means a job will be sent for every run in the container that is also on the GRL. e.g.
SubmissionHelper.SubmitMiniForPeriod('data12_8TeV.periodA.physics_Egamma.PhysCont.NTUP_TOPEL.grp15_v01_p1400_p1401/')
7) Submit a grid job for a MC simulation sample
Now move to the
examples_mc
directory, i.e.
cd TopD3PDScripts/examples_mc
Like before, make a copy of
settings.txt
and enable the
noSyst
option like this:
cp $ROOTCOREDIR/../TopD3PDAnalysis/control/settings.txt .
Now edit this file and un-comment the line
noSyst 1
by removing all the hash symbols. This turns off all the systematic variations.
Then, open
01ExampleSubmit.py
in your favourite text editor. Edit the value of
SubmissionHelper.Configuration.GridUsername
to be your grid username (hint, the correct answer is not
simon
). By default you'll notice that the MCSamples array has just one entry, a 117050 Powheg+Pythia ttbar sample. For the purposes of this tutorial that will do nicely, but in reality you'll probably want to add many more samples to the list. Anyway, if you exit your editor and execute the script with the following command,
python 01ExampleSubmit.py
it will fire off a grid job for that sample.
8) Some (hopefully useful) scripts
Checking yield in ami
It is very useful to check that you ran on all the available events. AMI keeps a list of the number of events in each data and MC sample. In
TopD3PDScripts/examples_misc
you'll find ExampleAmiYield.py. The command below will list the dataset names and yields for every p1400 MC sample.
python ExampleAmiYield.py mc12_8TeV.*.NTUP_TOP.*p1400
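If you want to apply the same kind of wildcard pattern to a list of dataset names yourself (for instance after a dq2-ls), Python's fnmatch module does the matching; the dataset names below are invented.

```python
import fnmatch

# Invented dataset names; only the p1400 one should survive the filter.
datasets = [
    'mc12_8TeV.117050.ttbar.NTUP_TOP.e1234_p1400',
    'mc12_8TeV.117050.ttbar.NTUP_TOP.e1234_p1344',
]
matches = fnmatch.filter(datasets, 'mc12_8TeV.*.NTUP_TOP.*p1400')
```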
Getting cross-section for a sample
The
TopDataPreparation
package provides a nice interface and text files containing the cross-sections and k-factors of MC files. In
TopD3PDScripts/examples_misc
you can find
ExampleCrossSection.py
. Give it a try with:
python ExampleCrossSection.py 105200
If you want to do something more complicated, have a look inside the
ExampleCrossSection.py
file.
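For context, the cross-section and k-factor feed the standard per-sample MC normalisation weight. The formula below is the generic one, not something specific to TopDataPreparation, and the numbers used to exercise it are placeholders.

```python
def mc_weight(xsec_pb, k_factor, lumi_ifb, n_events):
    """Per-event weight scaling an MC sample to a target luminosity.

    xsec_pb  : cross-section in pb
    k_factor : higher-order correction factor
    lumi_ifb : integrated luminosity in fb^-1 (1 fb^-1 = 1000 pb^-1)
    n_events : total number of generated events in the sample
    """
    return xsec_pb * k_factor * lumi_ifb * 1000.0 / n_events
```

So a 100 pb sample with k = 1.2, a million generated events, normalised to 20 fb^-1, gets a per-event weight of 2.4.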
Stack plots
-
AnaExample → Simon, Anna (need to have samples ready)
- Hands on - making stack plots (maybe we could put all the needed files on eos?)
Systematic variations
- recommendations (briefly), mostly focus on detector stuff
- how to get them from
MiniXL
- get volunteers to fix this
Other selections (boosted, single top)
Avi, Soeren or Barbara
--
SimonHead - 06 Feb 2014