Physics Performance and Datasets exercises


In this set of exercises you will learn how to look up datasets and find the key properties relevant for an analysis: how to navigate their parent-child relationships and how to determine the software release, production configuration and alignment-calibration conditions they were produced with (e.g. cmsDriver commands, global tags, production configuration files, gridpacks, cross sections and other production details of samples). You will use the principal services that provide the status and details of datasets: *DAS*, brilcalc, McM, pMp and cmsDBbrowser. You will also find out how to compute the integrated luminosity for your analysis.


Please contact Gurpreet Singh Chahal and Tongguang Cheng for questions, comments or suggestions about this exercise.

Introduction to datasets

Summarise here the content of the slides, and add a pointer to them:

Getting ready for these exercises

A bit of preparation will make this tutorial easier to follow. Below are a few concrete actions you can take before starting the hands-on session.


In order to carry out the exercises of this session you need:

  • a CMS account at CERN to access web services (if you can access this twiki, you have it) and to log into lxplus
  • a grid certificate installed in your favourite web browser
  • reading the XXX is beneficial

Set up a CMSSW area

Most of the exercises will be carried out using your web browser. A CMSSW work area will also be needed during the exercise.

Now you are ready to proceed to the exercises.

Exercise 1: Exploring dataset searching using DAS

Reminder about the general structure of a dataset name:

 dataset = /PrimaryDataset/ProcessingVersion/DataTier


dataset = /BTagCSV/Run2015B-05Aug2015-v1/AOD
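The three-field structure above can be checked mechanically. The following is a minimal sketch, assuming only the `/PrimaryDataset/ProcessingVersion/DataTier` layout described here; the names `DatasetName` and `parse_dataset_name` are hypothetical helpers introduced for illustration and are not part of any CMS tool.

```python
from typing import NamedTuple


class DatasetName(NamedTuple):
    """The three fields of a dataset name (hypothetical helper type)."""
    primary_dataset: str
    processing_version: str
    data_tier: str


def parse_dataset_name(name: str) -> DatasetName:
    """Split '/PrimaryDataset/ProcessingVersion/DataTier' into its three fields."""
    parts = name.split("/")
    # A well-formed name starts with '/' and has exactly three non-empty fields.
    if len(parts) != 4 or parts[0] != "" or not all(parts[1:]):
        raise ValueError(f"not a valid dataset name: {name!r}")
    return DatasetName(*parts[1:])


example = parse_dataset_name("/BTagCSV/Run2015B-05Aug2015-v1/AOD")
print(example)
```

A name with a missing field, such as `/BTagCSV/AOD`, raises a `ValueError` in this sketch instead of being silently accepted.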

To start with, you need to establish the full name of the dataset "with single electron". Either you know it from a reference, or you need to work out its key elements and put them together in a search. With the key elements at hand, you can use the *DAS* web interface and wildcard queries to establish or confirm the complete dataset name.

  • The PrimaryDataset string for real data (i.e. collected at CMS) is, strictly speaking, specified in the HLT configuration accessible via the HLT browser; however, in most cases you can make a reasonable guess to find what you need:
  • SingleElectron.

  • the latest reprocessing of the 2017 data can be found in the PdmVDataReprocessing twiki (data reprocessing campaigns documentation); the version of the reprocessing is indicated by a date
  • 17Nov2017
  • the acquisition era is part of ProcessingVersion and indicates the portion of the 2017 run when the data were collected
  • 2017D


You can now query DAS to find the dataset. Using wildcards increases the chances of finding what you're looking for at the first try, with no need to remember the details of the naming conventions. Of course, wildcards should be used sparingly, so that you are not flooded with too many results matching your query.

dataset dataset=/SingleElectron*/*Run2017D*17Nov2017*/MINIAOD 
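To get a feeling for what such a wildcard pattern accepts, it can be tried offline on a few candidate names. This is only a sketch using Python's standard `fnmatch` module: its `*` also matches `/`, so it approximates the DAS wildcard behaviour rather than reproducing it, and the second and third candidate names are chosen purely for illustration.

```python
from fnmatch import fnmatchcase

# The wildcard pattern used in the DAS query above.
pattern = "/SingleElectron*/*Run2017D*17Nov2017*/MINIAOD"

candidates = [
    "/SingleElectron/Run2017D-17Nov2017-v1/MINIAOD",  # the name used in this exercise
    "/SingleElectron/Run2017D-17Nov2017-v1/AOD",      # illustrative: different data tier
    "/SingleMuon/Run2017D-17Nov2017-v1/MINIAOD",      # illustrative: different primary dataset
]

# Keep only the names accepted by the pattern (case-sensitive match).
matches = [name for name in candidates if fnmatchcase(name, pattern)]
for name in matches:
    print(name)
```

The tighter the fixed parts of the pattern, the fewer unrelated names survive; replacing the data tier by `*` would also accept the AOD candidate.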

  • note that some sites are not accessible to users (e.g. tape storage)

Place your final query to DAS, looking for the file you want at a site where the dataset is present. (Note: the sites where the dataset is present might change with time, so the site chosen in the detailed query that follows might need to be updated.)

file dataset=/SingleElectron/Run2017D-17Nov2017-v1/MINIAOD site=T1_US_FNAL_Buffer

Does the query return a file? If not, why not, and which site(s) can we use to get a file?

You can now run on one of the files and find out its basic properties, exploiting the fact that xrootd will serve the file you've chosen from the CMS site where it's available on disk to your cmsRun process:

 edmFileUtil   --eventsInLumis  -P   root:// (*update with the actual file you've found*) 

Exercise 2: Explore information for a Monte Carlo miniAODSIM sample from DAS, McM and pMp

The sample we start from is the TTJets sample overlaid with the pile up which matches the profile of instantaneous luminosity of the 2016 data taking:

dataset dataset=/TT_TuneCUETP8M2T4_13TeV-powheg-pythia8/RunIISummer17MiniAOD-92X_upgrade2017_realistic_v7-v1/MINIAODSIM 

  • The global tag is fully specified in the ProcessingVersion, following the campaign name (RunIISummer17MiniAOD) and preceding the processing string (here absent) and the dataset version (v1)
  • 92X_upgrade2017_realistic_v7
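The way the global tag sits inside the ProcessingVersion can be illustrated with a short sketch. It assumes the layout described above (campaign, global tag, optional processing string, dataset version, separated by hyphens) and that none of these fields contains a hyphen itself; `split_mc_processing_version` is a hypothetical helper, not a CMS tool.

```python
from typing import NamedTuple, Optional


class McProcessingVersion(NamedTuple):
    """Fields of a Monte Carlo ProcessingVersion (hypothetical helper type)."""
    campaign: str
    global_tag: str
    processing_string: Optional[str]
    version: str


def split_mc_processing_version(processing_version: str) -> McProcessingVersion:
    """Split 'campaign-globaltag[-processingstring]-vN' on hyphens."""
    fields = processing_version.split("-")
    if len(fields) < 3:
        raise ValueError(f"unexpected ProcessingVersion: {processing_version!r}")
    # First field: campaign; second: global tag; last: dataset version.
    # Anything in between is taken to be the (optional) processing string.
    processing_string = "-".join(fields[2:-1]) or None
    return McProcessingVersion(fields[0], fields[1], processing_string, fields[-1])


pv = split_mc_processing_version("RunIISummer17MiniAOD-92X_upgrade2017_realistic_v7-v1")
print(pv.global_tag)
```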


  • Click on the Children link in the DAS presentation of the dataset
  • Multiple children are found, spanning 6 pile-up scenarios

Any Monte Carlo sample is associated with a prepID, a unique identifier of the production request that produced it. prepIDs are strings like HCA-RunIIFall15DR76-00002, formed by the physics group that placed the production request, the production campaign and an integer number.
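The three components of a prepID can be separated with a few lines of Python. This is a sketch under the assumption that the group and the trailing number contain no hyphen; `parse_prepid` is a hypothetical helper introduced for illustration and is not part of McM.

```python
from typing import NamedTuple


class PrepId(NamedTuple):
    """Components of a prepID (hypothetical helper type)."""
    group: str
    campaign: str
    number: int


def parse_prepid(prepid: str) -> PrepId:
    """Split 'GROUP-Campaign-NNNNN' into physics group, campaign and integer."""
    try:
        group, rest = prepid.split("-", 1)      # group is everything before the first hyphen
        campaign, number = rest.rsplit("-", 1)  # number is everything after the last hyphen
    except ValueError:
        raise ValueError(f"not a valid prepID: {prepid!r}") from None
    if not (group and campaign and number.isdigit()):
        raise ValueError(f"not a valid prepID: {prepid!r}")
    return PrepId(group, campaign, int(number))


print(parse_prepid("HCA-RunIIFall15DR76-00002"))
```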

prepIDs are used by the Monte Carlo Management Meeting, where production requests are notified and prioritized, and by the computing operations teams. They are also the identifiers used in the two key web-based platforms: Monte Carlo Management (McM) and the production Monitoring platform (pMp).

  • TOP-RunIIFall17MiniAODv2-00002


  • Click on the Navigation tab, enter TOP-RunIIFall17MiniAODv2-00002 in the prepId field, then click the Search tab. Wildcards are supported in McM.

  • Each column shows different elements of the request. You can view more using Select View.


  • Click the tick icon in the column Actions


  • The sequence of commands can be executed to run all the steps of the digi-reco processing over a few events

  • The production Monitoring platform (pMp) is a service available to CMS members to monitor the progress of single Monte Carlo production requests, full campaigns, and groups of requests (defined by physics working group, processing configuration, priority, etc.). It can be accessed directly or through links from Monte Carlo Management (McM).

Can you find the icon linking to the growing history in the production Monitoring platform (pMp)?

Can you find the icon to view the chained requests? How many chains do you find? If more than one, what is the difference between them?

  • Click to see the full chain (GEN-SIM → DIGI-RECO → miniAOD) in which this sample is produced

You should now feel more comfortable exploring McM. Let's focus next on its LHE and GEN-SIM requests.

hint: we just did that in step 4. Alternatively, one can click on the eye icon to see the cmsDriver details in the Sequence

hint: Which request is the GEN-SIM request? The Select View tab can expand/hide the information shown about the request

hint: How to find chains containing the LHE or GEN-SIM request?

Exercise 3: Generating MC miniAOD events from scratch

  • Follow slides 24, 25, 26 from PPD short exercises slides.
    • Get test command for GEN-SIM or LHE request
    • Go to terminal, enter your work area
    • $> wget <URL>
    • Initialize your grid proxy certificate
    • $> voms-proxy-init -voms cms
    • Update the script according to your needs
    • Launch test command
    • $> source <filename>
    • Request.xml, Request.root files will be created
    • Read logs and explore these files
    • If everything seems reasonable, you can also produce DIGI-RECO and miniAOD in the same file by adding the appropriate cmsDriver commands, etc.

Exercise 4: Compute the integrated luminosity collected by CMS in 2017

The data collected by CMS are certified on a luminosity-section basis to determine which data are of good enough quality to be included in physics analyses. The certification takes into account both the operational health of the sub-detectors and the scrutiny of the reconstructed physics objects by DPG and POG experts. The outcome of the certification is regularly updated by DQM-DataCertification as more data are collected and for each new version of the data processing; it is reported at the PPD General Meeting and published as json files, also available in this certification repository:

ls -ltrFh /afs/

The json files from the certification are used to restrict the events to be included in analysis, typically setting the lumiMask in the crab configuration.

You can see the run and luminosity section structure by opening one of the files:

cat  /afs/
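The structure you see in the file is a mapping from run number (as a string) to a list of inclusive `[first, last]` luminosity-section ranges. As a minimal sketch, the snippet below counts the luminosity sections per run; the run numbers and ranges are invented for illustration and are not taken from a real certification file, and `count_lumi_sections` is a hypothetical helper.

```python
import json

# Invented content with the same layout as a certification json file.
example_text = '{"100001": [[1, 50], [60, 100]], "100002": [[5, 5]]}'


def count_lumi_sections(mask: dict) -> dict:
    """Return the number of luminosity sections per run.

    Each range [first, last] is inclusive, so it holds last - first + 1 sections.
    """
    return {
        run: sum(last - first + 1 for first, last in ranges)
        for run, ranges in mask.items()
    }


mask = json.loads(example_text)
counts = count_lumi_sections(mask)
for run, n in sorted(counts.items()):
    print(run, n)
```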

Only successfully processed luminosity sections should be used to compute the integrated luminosity of your analysis: that's typically achieved by asking for the crab report, which is also in json format and provides a summary file of the runs and luminosity sections processed by completed jobs. Here, for simplicity, we'll use the certification json file directly for the luminosity calculation, assuming all processing jobs for run 305842 have been successful.
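The idea of keeping only luminosity sections that are both certified and successfully processed can be sketched as an intersection of two masks in the json layout (run number mapped to inclusive `[first, last]` ranges). All run numbers and ranges below are invented, and `intersect_masks` is a hypothetical helper written for this illustration, not a CMS or crab utility.

```python
def expand(ranges):
    """Turn inclusive [first, last] ranges into a set of luminosity sections."""
    return {ls for first, last in ranges for ls in range(first, last + 1)}


def compress(lumis):
    """Turn a set of luminosity sections back into sorted inclusive ranges."""
    ranges = []
    for ls in sorted(lumis):
        if ranges and ls == ranges[-1][1] + 1:
            ranges[-1][1] = ls       # extend the current range
        else:
            ranges.append([ls, ls])  # start a new range
    return ranges


def intersect_masks(certified, processed):
    """Keep only the luminosity sections present in both masks."""
    result = {}
    for run in sorted(certified.keys() & processed.keys()):
        common = expand(certified[run]) & expand(processed[run])
        if common:
            result[run] = compress(common)
    return result


# Invented masks, for illustration only.
certified = {"100001": [[1, 50], [60, 100]], "100002": [[5, 20]]}
processed = {"100001": [[40, 70]]}
usable = intersect_masks(certified, processed)
print(usable)
```

Expanding the ranges into sets keeps the sketch short; for very large masks a range-based intersection would use less memory.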

The luminosity information can be accessed via the BRIL Work Suite, which needs a simple installation procedure:

pip install --install-option="--prefix=$HOME/.local" brilws
*bash* : export PATH=$HOME/.local/bin:/afs/$PATH
*tcsh* : setenv PATH $HOME/.local/bin:/afs/$PATH

The integrated luminosity as measured during the data taking (Norm tag: *onlineresult*), delivered and recorded, is provided for the luminosity sections specified in the json, limited to run 305842:

brilcalc lumi --help
brilcalc lumi -b 'STABLE BEAMS'  -r 305842  -i  /afs/ 
To remove the check-JSON output, one can add the following option to the brilcalc lumi options:
 --without-checkjson
To change the unit of the luminosity, add:
 -u /fb or /pb

#Data tag : online , Norm tag: onlineresult
| run:fill    | time              | nls | ncms | delivered(/ub) | recorded(/ub) |
| 305842:6346 | 10/29/17 23:07:27 | 862 | 862  | 118945539.409  | 115950148.878 |
| nfill | nrun | nls | ncms | totdelivered(/ub) | totrecorded(/ub) |
| 1     | 1    | 862 | 862  | 118945539.409     | 115950148.878    |
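The numbers above are in /ub; the same values expressed in /pb or /fb follow from 1 /pb = 10^6 /ub and 1 /fb = 10^9 /ub. The short sketch below converts the delivered and recorded values for run 305842 quoted in the table and computes the recorded fraction; `convert_from_ub` is a hypothetical helper, not part of brilcalc.

```python
# Number of /ub contained in one unit of each supported luminosity unit.
UB_PER_UNIT = {"/ub": 1.0, "/nb": 1e3, "/pb": 1e6, "/fb": 1e9}


def convert_from_ub(value_ub: float, unit: str) -> float:
    """Convert an integrated luminosity given in /ub to the requested unit."""
    return value_ub / UB_PER_UNIT[unit]


# Values for run 305842 taken from the brilcalc output above (online result).
delivered_ub = 118945539.409
recorded_ub = 115950148.878

recorded_pb = convert_from_ub(recorded_ub, "/pb")
recorded_fb = convert_from_ub(recorded_ub, "/fb")
recorded_fraction = recorded_ub / delivered_ub

print(f"recorded: {recorded_pb:.3f} /pb = {recorded_fb:.6f} /fb")
print(f"recorded / delivered = {recorded_fraction:.4f}")
```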

You can verify that you get the same output if you construct a json file limited to run 305842 yourself and process it without run restrictions:

cd CMSSW_9_4_6_patch1/src 
cmsenv --min=305842 --max=305842 /afs/   | tee 305842.txt
cat 305842.txt
brilcalc lumi -b 'STABLE BEAMS'  -i  305842.txt
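What the restriction to a run range does to the json content can be sketched in plain Python. This is not the tool used in the commands above, only an illustration of the operation; the run numbers and ranges are invented and `filter_runs` is a hypothetical helper.

```python
import json


def filter_runs(mask: dict, min_run: int, max_run: int) -> dict:
    """Keep only the runs with min_run <= run <= max_run (bounds inclusive)."""
    return {
        run: ranges
        for run, ranges in mask.items()
        if min_run <= int(run) <= max_run
    }


# Invented certification-like content, for illustration only.
full_mask = {
    "100001": [[1, 50]],
    "100002": [[3, 40], [45, 90]],
    "100003": [[10, 20]],
}

single_run = filter_runs(full_mask, 100002, 100002)
print(json.dumps(single_run))
```

Setting the minimum and maximum to the same run number, as in the commands above, leaves a mask containing that run only.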

The luminosity measurement is updated and improved, after the first release of the online measurement, by taking into account all the luminometers available at CMS and the outcome of the Van der Meer scan analyses. A few weeks after the data taking, the Bril DPG and Luminosity POG release the best combination of luminometers in a Norm tag (e.g.: composite) json file available:

You should look here /afs/ for the most up-to-date file.

Therefore, the recommended way of computing the luminosity for your analysis is:

brilcalc lumi --help
brilcalc lumi -r 305842 --normtag /cvmfs/ -u /pb -i Cert_294927-306462_13TeV_EOY2017ReReco_Collisions17_JSON_v1.txt --without-checkjson [--byls]

#Data tag : online , Norm tag: composite
| run:fill    | time              | nls | ncms | delivered(/ub) | recorded(/ub) |
| 305842:6346 | 10/29/17 23:07:26 | 862 | 862  | 117828014.670  | 114870230.725  |
| nfill | nrun | nls | ncms | totdelivered(/ub) | totrecorded(/ub) |
| 1     | 1    | 862 | 862  | 117828014.670     | 114870230.725     |
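To see how much the refined measurement differs from the online one, you can compare the two results for run 305842. The sketch below uses only the delivered and recorded values quoted in the two brilcalc outputs of this exercise; `relative_change` is a hypothetical helper introduced for illustration.

```python
def relative_change(new: float, old: float) -> float:
    """Fractional change of 'new' with respect to 'old'."""
    return (new - old) / old


# Values in /ub for run 305842, as quoted in the two brilcalc outputs.
online = {"delivered": 118945539.409, "recorded": 115950148.878}
composite = {"delivered": 117828014.670, "recorded": 114870230.725}

changes = {
    key: relative_change(composite[key], online[key])
    for key in ("delivered", "recorded")
}
for key, value in changes.items():
    print(f"{key}: {100 * value:+.2f} %")
```

For this run both quantities move by a little under one percent when going from the online result to the composite Norm tag.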

Exercise 5: Explore GlobalTag through cmsDBbrowser

What is GlobalTag

The alignment and calibration conditions needed by all stages of the data production (SIM, DIGI: for simulated events) and processing (RECO, miniAOD: for simulation and reconstruction alike) in CMSSW are defined in the Offline Conditions Database, which is read in CMSSW applications via Frontier caching servers.

The set of database tags which together define the offline conditions data are collected together in a Global Tag, which is itself stored in the database. This removes the need for the list of database tags to be defined in separate CMSSW configuration fragments and therefore decouples the conditions database from the CMSSW release; different Global Tags can be used with a given CMSSW release, with the tag itself specified in the cfg file.

The following line should be included in the cfg of any CMSSW application that needs to read conditions data:

from Configuration.AlCa.GlobalTag import GlobalTag

Conditions of different types (e.g. measured beamspot, HCAL intercalibrations, etc.) are identified by a specific string called a tag; a Global Tag is constructed by aggregating a large set of tags (typically three to four hundred).

More information can be found in SWGuideFrontierConditions.

Explore GlobalTag through cmsDbBrowser service

The cmsDbBrowser service is the web portal for the administration and navigation of the existing global tags. You can find the description of your favourite global tag and all the tags it is made of, and you can explore both the meta-data and the content of the tags.

  • The search for a global tag returns both its own specific entry in the cmsDbBrowser and the other global tags whose description quotes it
  • ex3-browser-search-single.png

  • By clicking on a specific global tag you can see its description, select a specific tag and see its IOVs (intervals of validity)
  • ex3-browser-search-documentation.png

  • ... and tags which are part of one GT and not the other, and vice versa
  • ex4-browser-differences-details.png

How conditions are consumed in CMSSW

An EventSetup object holds the conditions (tags) in Global Tags as Records and Records in turn hold data. The data is uniquely identified by

  • C++ type of the data
  • a string label (which defaults to the empty string)

A Record can be returned from the EventSetup by calling the get<> templated method

   const FooRecord& fooRecord = iEventSetup.get<FooRecord>();

If the Record is not available, an exception will be thrown.

A data item can be retrieved from an EventSetup Record by passing an ESHandle<> to the Record's get method

  edm::ESHandle<Foo> fooH;
  fooRecord.get(fooH);

If a label other than the default empty label is assigned to the data, you must pass that label as the first argument to get

  edm::ESHandle<Foo> fooH;
  fooRecord.get("bar", fooH);

If the requested data item cannot be found, an exception will be thrown.

For more details, refer to EventSetup.

Consumption of conditions in GlobalTag for MC production

With the following cmsDriver option, we can check which conditions are consumed by the current MC production step.

--customise_commands='process.GlobalTag.DumpStat = cms.untracked.bool(True)'

From the previous exercise, you should have a complete set of configuration files or cmsDriver commands to generate MC miniAOD events from scratch.

Now try rerunning the cmsDriver command interactively with customise_commands, or add 'process.GlobalTag.DumpStat = cms.untracked.bool(True)' to the configuration file of each step and rerun the configuration files. You should see the conditions consumed at each step printed to your screen.

Compare your results with the table. Does the result meet your expectation?
