Workshop

https://indico.cern.ch/conferenceDisplay.py?confId=294656

I think we're going to need a new tag of TRC for the tutorial with fixes for any problems we find now.

lxplus account, grid certificate?

Introduction

TopRootCore installation

- Introduction - what is it (performance packages and top group packages)? → Sören, Hovhannes
- Where to find the latest version (or in intro?)

Make instructions for lxplus only

Check disk space?
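No hard numbers to hand, but a full TRC checkout and build is sizeable, so it's worth checking free space before starting. A rough sketch (the fs command exists only on AFS clients such as lxplus):

```shell
# generic check: free space on the filesystem holding the current directory
space=$(df -h . | tail -n 1)
echo "$space"

# AFS quota check - only works on AFS clients (e.g. lxplus home directories)
fs listquota . 2>/dev/null || echo "(fs not available - not an AFS client)"
```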

Make a file called install.sh and copy-paste the following into it. You can use whichever editor you like, but clearly vi is the best and emacs the worst.

export TAG=TopRootCoreRelease-14-00-22

source /afs/cern.ch/atlas/software/dist/AtlasSetup/scripts/asetup.sh 19.0.0
mkdir $TAG
cd $TAG
svn co svn+ssh://svn.cern.ch/reps/atlasoff/PhysicsAnalysis/TopPhys/TopRootCoreRelease/tags/$TAG $TAG
cd $TAG/share
./build-all.sh
cd ../..

Then run source install.sh. And wait. How long does that take? Can we do it in the workshop or does it have to be done ahead of time?

Make a setup script

In the TopRootCoreRelease-14-00-22 directory make a file called setup.sh and copy-paste the following text into that file.

#Grid guff, do this first. Order seems to be important
export RUCIO_ACCOUNT=<your grid username>
export ATLAS_LOCAL_ROOT_BASE=/cvmfs/atlas.cern.ch/repo/ATLASLocalRootBase
source $ATLAS_LOCAL_ROOT_BASE/user/atlasLocalSetup.sh
localSetupDQ2Client --skipConfirm
localSetupPandaClient --noAthenaCheck

voms-proxy-init -voms atlas

#Athena - for root
source /afs/cern.ch/atlas/software/dist/AtlasSetup/scripts/asetup.sh 19.0.0

#RootCore
source RootCore/scripts/setup.sh
export PYTHONPATH=$ROOTCOREDIR/../TopD3PDScripts/python:$PYTHONPATH

Note that a lot of this is lxplus-specific; if you're running at a home institute you'll (probably) need to set up athena, dq2 and panda differently.
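A few quick sanity checks after sourcing setup.sh can save debugging time later. This is only a sketch - which of these commands exist depends on how much of the setup applies at your site:

```shell
# ROOTCOREDIR should be set once RootCore's setup.sh has been sourced
rc_msg="ROOTCOREDIR = ${ROOTCOREDIR:-unset}"
echo "$rc_msg"

# grid proxy lifetime in seconds, if the voms tools are available
command -v voms-proxy-info >/dev/null 2>&1 && voms-proxy-info -timeleft || echo "voms-proxy-info not available"

# check that ROOT is set up
command -v root-config >/dev/null 2>&1 && root-config --version || echo "ROOT not set up"
```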

Object definitions and event selection

Some slides with the 'standard' objects on them. How do we find things in the code? svn browser etc?

Running MiniSL / MiniML

Hands-on running MiniSL and / or MiniML

Brief introduction to MiniSL, MiniML and MiniZL (including a discussion of the selection). It would be good if we could mix slides and hands-on, which would allow people to catch up / debug. Brief introduction to the code, then information about the settings file, the input file list, and which parameters to specify. Explain what all this bitmask business is. We will put some data and MC files on eos. First run data; I think the first run is only a few files, so that should be easy.

Stop, ls, and look at what we have created. Isn't it wonderful.

Why do we have 4 files? What are they? What do we do with them? TBrowse them.

Then run MC without systematics. Stop, open the file and TBrowse it.

Then run with all the systematics enabled for MC. Stop, browse the file.

What about MiniZL?

Event Challenges

- Importance of showing things and suggesting changes in top reconstruction meetings

Grid submission

https://twiki.cern.ch/twiki/bin/viewauth/AtlasProtected/TopD3PDScripts

Introduction - running on data

Now move to the examples_data directory, i.e.

cd TopD3PDScripts/examples_data

In the following examples the general idea is that you modify a very short file to fit your needs and then run it.

1) Submit a grid data job

Let's start by modifying some settings to disable systematic uncertainties, so that the job runs faster on the grid and produces less output. Make a copy of the settings.txt file and enable the noSyst option like this:

cp $ROOTCOREDIR/../TopD3PDAnalysis/control/settings.txt .

Now edit this file and un-comment the line noSyst 1 by removing all the hash symbols. This turns off all the systematic variations.
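If you'd rather do the un-commenting from the command line, something like the sed call below should work. The exact comment style in settings.txt (number of hashes, spacing) is an assumption, so it's shown here on a throwaway demo file rather than the real one:

```shell
# demo input: a commented-out noSyst line (the number of hashes is an assumption)
printf '###noSyst 1\n' > settings_demo.txt

# strip the leading hashes (and any spaces) from the noSyst line
cleaned=$(sed 's/^#\{1,\}[[:space:]]*\(noSyst 1\)/\1/' settings_demo.txt)
echo "$cleaned"

rm settings_demo.txt
```

Run the same sed on your copied settings.txt once you're happy it does what you want.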

Then, open 01ExampleSubmit.py in your favourite text editor. Edit the value of SubmissionHelper.Configuration.GridUsername to be your grid username (hint, the correct answer is not simon). If you save, exit, and execute the script with the following command,

python 01ExampleSubmit.py

it will run the un-commented code in the file. This submits a MiniSL job to the grid for run 200842 (period A) of the Egamma stream. That is to say that only this line

SubmissionHelper.SubmitMiniForPeriod('data12_8TeV.periodA.physics_Egamma.PhysCont.NTUP_TOPEL.grp15_v01_p1400_p1401/',  [200842])

is being run. You can check the progress of your job using the very colourful panda monitor (http://panda.cern.ch). The output of this job will be saved to:

user.simon.200842.Egamma.r4065_p1278_p1400_p1401_tid01205533_00.11_10_0/

Note that the run number, stream, and Suffix appear in the output dataset name.
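Just to make the naming convention concrete, here is how the run number, stream and Suffix can be picked out of the example dataset name above with plain shell:

```shell
# the example output dataset name from above
ds="user.simon.200842.Egamma.r4065_p1278_p1400_p1401_tid01205533_00.11_10_0/"

run=$(echo "$ds" | cut -d. -f3)                 # run number
stream=$(echo "$ds" | cut -d. -f4)              # stream
suffix=$(echo "$ds" | cut -d. -f6 | tr -d '/')  # the Suffix from the submission script

echo "run=$run stream=$stream suffix=$suffix"
```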

If some of the jobs failed due to 'grid' reasons, then you can just rerun the script. Prun is smart enough to know if it has already run all or part of a dataset and only resubmits the failed jobs (if you don't change the suffix).

2) Download the output

Once run 200842 has completed - note that it will run merging after the initial set of jobs has finished - we should try to download the output root files. One way to do this is the wrapper for dq2-get given in 02ExampleDownload.py. Let's give it a try: open the file in a text editor.

You will need to set the datasetPattern to match your username, and also the Suffix if you changed it in the submission script. Every dataset that matches this pattern will be downloaded. You also need to change the directory to the place you want the files to be put locally. I find dq2 can be a bit flaky, so I set the number of attempts to 2, which means it loops through everything twice. After making these changes, run the command:

python 02ExampleDownload.py

After this, you should see that your directory contains a sub-directory for each dataset, and inside that you'll find all the root files from the grid.

3) Merge the output

I like to have one file per run. Edit 03ExampleMerge.py and change the directory to the same one you set in the previous script. Running it with

python 03ExampleMerge.py

will execute hadd for each run.
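Under the hood this is just one hadd call per run (hadd ships with ROOT). The sketch below builds, but does not execute, the command for the example dataset from earlier - the directory layout is an assumption:

```shell
# one merged file per run: extract the run number from the dataset directory name
dir="user.simon.200842.Egamma.r4065_p1278_p1400_p1401_tid01205533_00.11_10_0"
run=$(echo "$dir" | cut -d. -f3)

# the merge the script effectively performs (printed here, not executed)
merge_cmd="hadd -f ${run}.root ${dir}/*.root"
echo "$merge_cmd"
```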

4) Check that you ran on all the events

AMI contains records of the number of events in each run, and we should check that the number of events in our file matches AMI (i.e. that the grid behaved!). Let's look inside 04ExampleCheckYields.py. If you're following this example, you've got a single NTUP_TOP run from Egamma period A. Set the directory field to the same value as before. Now run the script:

python 04ExampleCheckYields.py

It will list all the run numbers from the GRL, the yield as reported by AMI, and the yield inside your file. If the run is printed in black then everything is fine; if it is printed in red then the yields don't match and you should investigate what went wrong.
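The logic of the check is simple enough to sketch: compare the two counts and flag any mismatch. The yields below are made up for illustration:

```shell
ami_yield=1355708    # what AMI reports for the run (made-up number)
file_yield=1355708   # events counted in your merged file (made-up number)

if [ "$ami_yield" -eq "$file_yield" ]; then
  status="OK"
else
  status="MISMATCH - investigate"
fi

echo "run 200842: $status"
```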

5) Event challenge

For the event challenge people like to compare yields by period. Edit the directory in 05ExamplePrintCutflows.py and check that the dataset and stream are okay. Then run the command

python 05ExamplePrintCutflows.py

For now your 'Period A' yield will just be the single run you have, but you can go back and submit all the missing period A files using the information above.

6) Submit for all of period A, egamma stream

Let's say you want to submit jobs for every run in period A. You can just follow the instructions in section 1b of 01ExampleSubmit.py. It's the same command as step (1), but with the array of run numbers removed from the end of the line - this means a job will be submitted for every run in the container that is also on the GRL, e.g.

SubmissionHelper.SubmitMiniForPeriod('data12_8TeV.periodA.physics_Egamma.PhysCont.NTUP_TOPEL.grp15_v01_p1400_p1401/')

7) Submit a grid job for a MC simulation sample

Now move to the examples_mc directory, i.e.

cd TopD3PDScripts/examples_mc

Like before, make a copy of settings.txt and enable the noSyst option like this:

cp $ROOTCOREDIR/../TopD3PDAnalysis/control/settings.txt .

Now edit this file and un-comment the line noSyst 1 by removing all the hash symbols. This turns off all the systematic variations.

Then, open 01ExampleSubmit.py in your favourite text editor. Edit the value of SubmissionHelper.Configuration.GridUsername to be your grid username (hint, the correct answer is not simon). By default you'll notice that the MCSamples array has just one entry, a 117050 Powheg+Pythia ttbar sample. For the purposes of this tutorial that will do nicely, but in reality you probably want to add many more samples to the list. Anyway, if you save, exit, and execute the script with the following command,

python 01ExampleSubmit.py

it will fire off a grid job for that sample.

8) Some (hopefully useful) scripts

Checking the yield in AMI

It is very useful to check that you ran on all the available events. AMI keeps a list of the number of events in each data and MC sample. In TopD3PDScripts/examples_misc you'll find ExampleAmiYield.py. The command below will list the dataset names and yields for every p1400 MC sample.

python ExampleAmiYield.py mc12_8TeV.*.NTUP_TOP.*p1400

Getting cross-section for a sample

The TopDataPreparation package provides a nice interface and text files containing the cross-sections and k-factors of MC samples. In TopD3PDScripts/examples_misc you can find ExampleCrossSection.py. Give it a try with:

python ExampleCrossSection.py 105200

If you want to do something more complicated, have a look inside the ExampleCrossSection.py file.

Stack plots

- AnaExample → Simon, Anna (need to have samples ready)
- Hands-on - making stack plots (maybe we could put all the needed files on eos?)

Systematic variations

- Recommendations (briefly), mostly focus on detector stuff
- How to get them from MiniXL
- Get volunteers to fix this

Other selections (boosted, single top)

Avi, Soeren or Barbara

-- SimonHead - 06 Feb 2014
