Instructions on Preparing PIDCalib Samples

This page provides information on how to create PIDCalib samples. It is for experts only.

Latest PIDCalib setup instructions

Please follow the instructions on the PIDCalib Package webpage. In addition, you also need to run:
getpack PIDCalib/CalibDataSel head

CalibDataSel Package: Produce tuples from DST

The files you need boil down to: Src/TupleToolPIDCalib.cpp, TupleToolPIDCalib.h, EvtTupleToolPIDCalib.cpp and EvtTupleToolPIDCalib.h.

These are the places to add tuple variables.

In dev/, there is one file for Run 1, where the input is various stripping lines, and one for Run 2, where the input is various trigger lines and this matching needs to be done too. A further script simply runs the jobs (you will probably need to update the DV version listed).

In dev/, add

S5TeVdn = PIDCalibJob(
                    year           = "2015"
                 ,  stripVersion   = "5TeV"
                 ,  magPol         = "MagDown"
                 ,  maxFiles       = -1
                 ,  filesPerJob    = 1
                 ,  simulation     = False
                 ,  EvtMax         = -1
                 ,  bkkQuery       = "LHCb/Collision15/Beam2510GeV-VeloClosed-MagDown/Real Data/Reco15a/Turbo01aEM/95100000/FULLTURBO.DST"
                 ,  bkkFlag        = "OK"
                 ,  stream         = "Turbo"
                 ,  backend        = Dirac()
                 )
Then execute the file inside ganga:
In [5]:execfile('')
Preconfigured jobs you can just submit: 
. PIDCalib.up11.submit()
. PIDCalib.validation2015.submit()
. PIDCalib.up12.submit()
. PIDCalib.down11.submit()
. PIDCalib.S23r1Up.submit()
. PIDCalib.down12.submit()
. PIDCalib.S5TeVdn.submit()
. PIDCalib.test.submit()
. PIDCalib.S23r1Dn.submit()

In [6]:PIDCalib.S5TeVdn.submit()

After all jobs have finished, you may need to download their output to a local directory (it is not clear whether this step is required):

In [6]:for js in j.subjobs:
   ...:     if js.status == 'completed':
   ...:         js.backend.getOutputData()

CalibDataScripts: Produce tuples for each particle

The RICH performance changes as a function of time (it depends on conditions and alignment changes). A RooDataSet can only hold so many events and variables before it becomes too large and won't save correctly. Together, these two facts lead us to have a) more than one file per decay channel and b) a numerical index for each file that ascends with run number. This is useful because someone who wants to run over a specific run period can just select the few relevant files.
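As an illustration of why the ascending index is convenient, the sketch below picks the files that overlap a given run period. It is not taken from the PIDCalib packages: the function name `files_for_runs`, the list of (first_run, last_run) pairs and the run numbers are all hypothetical, and it assumes that position i in the list corresponds to file index i.

```python
def files_for_runs(run_ranges, run_min, run_max):
    """Return indices of files whose run range overlaps [run_min, run_max].

    `run_ranges` is a list of (first_run, last_run) pairs, ordered by run
    number so that list position i corresponds to file index i (assumed).
    """
    return [
        index
        for index, (first, last) in enumerate(run_ranges)
        if first <= run_max and last >= run_min
    ]


# Made-up run ranges, for illustration only.
example_ranges = [(1000, 1999), (2000, 2999), (3000, 3999), (4000, 4999)]
selected = files_for_runs(example_ranges, 2500, 3200)
print(selected)  # [1, 2]
```

Because the ranges are ordered, the selected indices are always contiguous, so only a small block of files needs to be opened.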

Hence the workflow goes like this:

Ntuples finish making --> run ranges are defined --> data split into those specific run ranges --> any additional selection applied --> mass fit performed in each run range for each charge --> data is sWeighted --> spectator variables are added to the data set --> both charge datasets are merged.

The same set of steps is repeated for each decay channel. For the protons, since there is more than one momentum range, the fit is done separately in each range and the results are then merged. For D*->D(Kpi)pi, the definition of the run ranges and the data splitting are a common step for both K and pi; however, the mass fits are done twice (even though it is the same fit).

getpack PIDCalib/CalibDataScripts head

Inside, there are three source directories: Src for S20/S20r1 data, Src_S21 for S21, and Src_Run2 for S22/23.

The different directories exist because of changes in the ntuple format and naming conventions, and because of changes in the stripping cuts, which in turn changed the selection cuts applied afterwards. The variables stored in the calibration datasets have also changed over time; for example, for Run 2 we save both online and offline variables.

The first step is to choose the correct source directory to compile. This is done in cmt/requirements by changing the src directory to the one that corresponds to your data. Then run the usual cmt br cmt make.

Then go to jobs/Stripping23 and modify

Before submitting jobs to PBS, you need to make it recognize you by adding the following lines to ~/.gangarc:

preexecute =
  import os
  env = os.environ
  jobid = env["PBS_JOBID"]
  tmpdir = None
  if "TMPDIR" in env: tmpdir = env["TMPDIR"].rstrip("/")
  else: tmpdir = "/scratch/{0}".format(jobid)
postexecute =
  import os
  env = os.environ
  jobid = env["PBS_JOBID"]
  tmpdir = None
  if "TMPDIR" in env: tmpdir = env["TMPDIR"].rstrip("/")
  else: tmpdir = "/scratch/{0}".format(jobid)
Make sure these lines are present every time you run jobs.
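For reference, the logic of the two .gangarc snippets can be written as a small standalone function that is easy to check outside a batch job. This is only a sketch: the function name `resolve_tmpdir` is hypothetical, and it takes the environment as an argument, which the .gangarc lines do not.

```python
import os


def resolve_tmpdir(env=None):
    """Mimic the scratch-directory choice made in the .gangarc snippets."""
    if env is None:
        env = os.environ
    jobid = env["PBS_JOBID"]  # raises KeyError outside a PBS job
    if "TMPDIR" in env:
        # Prefer the directory provided by the batch system, without a trailing slash.
        return env["TMPDIR"].rstrip("/")
    # Otherwise fall back to a per-job directory under /scratch.
    return "/scratch/{0}".format(jobid)
```

Passing a plain dictionary such as {"PBS_JOBID": "42.host"} makes it possible to see which directory would be chosen without submitting anything.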

Then go to GetRunRanges/

Here you will see a set of scripts, one for each polarity and each particle species. You shouldn't need to change anything: they look inside your .gangarc file to find your gangadir location and so on. The output of these jobs is sent to the jobs/Stripping23/ChopTrees directory as a .pkl file, which contains the run number ranges defined by the script that the job calls. The file actually run by the ganga job is $CALIBDATASCRIPTSROOT/scripts/sh/ which in turn calls $CALIBDATASCRIPTSROOT/scripts/python/ for Dst and Jpsi. All this script does is look at your tuples, see how the candidates are distributed by run number, and then split them into a number of ranges such that each range contains about a million candidates while avoiding the last dataset having too few.
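The splitting described above can be sketched as follows. This is a minimal illustration and not the code in CalibDataScripts: the function name `split_run_ranges`, the `target` and `min_last` parameters, and the policy of merging a too-small final range into the previous one are all assumptions.

```python
from collections import Counter


def split_run_ranges(run_numbers, target=1000000, min_last=None):
    """Group runs into consecutive (first_run, last_run, n_candidates) ranges.

    Hypothetical helper: a range is closed once it holds at least `target`
    candidates; a final range with fewer than `min_last` candidates is
    merged into the previous one.
    """
    if min_last is None:
        min_last = target // 2  # assumed threshold for "too few"
    counts = Counter(run_numbers)  # number of candidates per run
    ranges = []
    first, total = None, 0
    for run in sorted(counts):
        if first is None:
            first = run
        total += counts[run]
        last = run
        if total >= target:
            ranges.append((first, last, total))
            first, total = None, 0
    if first is not None:  # candidates left over after the last full range
        if ranges and total < min_last:
            prev_first, _, prev_total = ranges.pop()
            ranges.append((prev_first, last, prev_total + total))
        else:
            ranges.append((first, last, total))
    return ranges


# Toy input: one entry per candidate, giving its run number.
toy_runs = [100] * 3 + [101] * 2 + [102] * 4 + [103] * 1
print(split_run_ranges(toy_runs, target=4))  # [(100, 101, 5), (102, 103, 5)]
```

In the toy example the single candidate in run 103 would form a range of its own, so it is folded into the preceding range instead. A run is never split across two ranges, which is why the ranges hold "about" the target number rather than exactly that many.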

-- WenbinQian - 2016-03-17

Topic revision: r3 - 2016-03-21 - WenbinQian