Utilities for Accessing Pileup Information for Data

Introduction

In many cases, it is useful to know the amount of "pileup"; that is, the total number of pp interactions per bunch crossing. The most straightforward way to obtain this is from the instantaneous luminosity: if a single bunch has an instantaneous luminosity L_inst, then the pileup is given by the formula μ = L_inst * σ_inel / f_rev, where σ_inel is the total pp inelastic cross section and f_rev is the LHC revolution frequency of 11246 Hz (necessary to convert the instantaneous luminosity, which is a per-time quantity, into a per-collision quantity). This quantity can be computed on a per-lumi-section basis (where a lumi section, the fundamental unit of CMS luminosity bookkeeping, is about 23.3 seconds long).
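As a minimal numeric illustration of this formula (the per-bunch luminosity below is a hypothetical value, not a real measurement):

  # Python sketch: convert a per-bunch instantaneous luminosity to pileup.
  SIGMA_INEL_UB = 69200.0   # Run 2 recommended inelastic cross section, in ub
  F_REV_HZ = 11246.0        # LHC revolution frequency
  L_inst = 8.0              # hypothetical per-bunch luminosity, in ub^-1 s^-1

  mu = L_inst * SIGMA_INEL_UB / F_REV_HZ
  print("average pileup mu = %.1f" % mu)   # about 49 for these inputs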

The pileup estimate obtained from the luminosity can be compared with the number of reconstructed primary vertices per event. As the latter is affected by the vertex reconstruction efficiency (~70%), the two quantities will not agree exactly; however, the comparison can be used to measure the vertex reconstruction efficiency or to validate the results.

The μ value obtained from the instantaneous luminosity is, by its nature, a measure of the average pileup; the pileup in individual events follows a Poisson distribution around this average.
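A small toy sketch makes the relation between the average pileup, the event-by-event pileup, and the reconstructed vertex count concrete (all numbers here are illustrative; the 70% efficiency is the approximate figure quoted above):

  import numpy as np

  rng = np.random.default_rng(42)
  MU = 25.0        # average pileup for one lumi section, from the luminosity
  VTX_EFF = 0.70   # approximate vertex reconstruction efficiency

  n_int = rng.poisson(MU, size=100000)   # true interactions, event by event
  n_vtx = rng.binomial(n_int, VTX_EFF)   # vertices actually reconstructed

  print("mean interactions: %.2f" % n_int.mean())   # close to MU
  print("mean vertices:     %.2f" % n_vtx.mean())   # close to VTX_EFF * MU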

A note on nomenclature: In luminosity operations, we use "pileup" to refer to the total number of pp interactions. However, in some contexts (e.g., MC generation), "pileup" refers to the number of additional interactions added to the primary hard-scatter process. Please be aware of this distinction in order to avoid off-by-one errors in your study!

CMS provides the pileupCalc.py utility, maintained by the lumi POG, as a central resource for performing these pileup calculations.

Using pileupCalc

For standard physics analyses, the lumi POG maintains a set of central files containing the pileup information. All you need to do is run pileupCalc.py over the particular periods used in your analysis to obtain the corresponding pileup distribution.

To run pileupCalc, simply set up the CMSSW environment (note: pileupCalc.py is broken in CMSSW_10_2_X, so please use another version) and then run as follows:

pileupCalc.py -i MyAnalysisJSON.txt --inputLumiJSON pileup_latest.txt --calcMode true --minBiasXsec 69200 --maxPileupBin 100 --numPileupBins 100 MyDataPileupHistogram.root

where:

  • MyAnalysisJSON.txt is the JSON file defining the lumi sections that your analysis uses. This is generally the appropriate certification JSON file from PdmV or processedLumis.json from your CRAB job.
  • pileup_latest.txt is the appropriate pileup file for your analysis. See below for the locations of the central files.
  • minBiasXsec defines the minimum bias cross section to use (in μb). The current Run 2 recommended value is 69200; please see below for discussion of this value.
  • maxPileupBin and numPileupBins define the upper edge and the number of bins of the output histogram (see the discussion of binning below).
  • MyDataPileupHistogram.root is the name of the output file.

The script will run for a few minutes and produce the output file with the name you specified. This file contains a single histogram, named pileup, holding the resulting average pileup distribution, which you can then use for reweighting, making plots, etc.
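The output can be inspected like any other ROOT file; for example, with PyROOT (a minimal sketch, assuming a working ROOT installation such as the one provided by the CMSSW environment):

  import ROOT

  # Open the pileupCalc.py output and retrieve the "pileup" histogram.
  f = ROOT.TFile.Open("MyDataPileupHistogram.root")
  h = f.Get("pileup")

  print("mean pileup: %.2f" % h.GetMean())
  print("RMS:         %.2f" % h.GetRMS())
  print("integral:    %.4g" % h.Integral())
  f.Close()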

You may occasionally get a warning of the form:

Run 325172, LumiSection 460 not found in Lumi/Pileup input file. Check your files!

If this warning only pops up once or twice, you can safely ignore it. If you get a large number of occurrences, then something strange is going on: the central pileup files include all lumi sections in the DCSOnly JSON, and a physics analysis should normally be using a subset of this data. Please check that your input JSON file is sensible and then contact the lumi POG.
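To track down exactly which lumi sections trigger the warning, the two input files can be cross-checked directly, since both are plain JSON. A minimal sketch (file names as in the example above; the certification JSON stores LS ranges, the pileup JSON one entry per LS):

  import json

  # Analysis JSON: {"run": [[firstLS, lastLS], ...]}
  with open("MyAnalysisJSON.txt") as f:
      analysis = json.load(f)

  # Pileup JSON: {"run": [[ls, intLumi, rmsLumi, avgLumi], ...]}
  with open("pileup_latest.txt") as f:
      pileup = json.load(f)

  for run, ranges in sorted(analysis.items()):
      known = set(entry[0] for entry in pileup.get(run, []))
      for first, last in ranges:
          missing = [ls for ls in range(first, last + 1) if ls not in known]
          if missing:
              print("run %s: LS missing from pileup file: %s" % (run, missing))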

You may also occasionally get a warning of the form:

Significant probability density outside of your histogram
Consider using a higher value of --maxPileupBin
Mean 48.639296, RMS 24.106512, Integrated probability 0.961607
Again, if this happens only once or twice, it can be safely ignored (there are occasionally a few badly behaved lumi sections). If you get a large number of these warnings, it is probably a sign that the upper edge of your histogram is too low (as the message suggests); try increasing maxPileupBin and numPileupBins.

Location of central pileup files

For Run 2, the latest pileup files can be found as follows. In all cases you should use pileup_latest.txt, which is the most recent version (or a link to it). The JSON is generally updated whenever a new version of the normtag is released, so it should always be in sync with the latest luminosity calibration.

  • 2015: /afs/cern.ch/cms/CAF/CMSCOMM/COMM_DQM/certification/Collisions15/13TeV/PileUp/
  • 2016: /afs/cern.ch/cms/CAF/CMSCOMM/COMM_DQM/certification/Collisions16/13TeV/PileUp/
  • 2017: /afs/cern.ch/cms/CAF/CMSCOMM/COMM_DQM/certification/Collisions17/13TeV/PileUp/
  • 2018: /afs/cern.ch/cms/CAF/CMSCOMM/COMM_DQM/certification/Collisions18/13TeV/PileUp/

The Run 1 pileup files are as follows:
  • 2011: /afs/cern.ch/cms/CAF/CMSCOMM/COMM_DQM/certification/Collisions11/7TeV/PileUp/pileup_2011_JSON_pixelLumi.txt
  • 2012: /afs/cern.ch/cms/CAF/CMSCOMM/COMM_DQM/certification/Collisions12/8TeV/PileUp/pileup_latest.txt

Recommended cross section

The recommended cross section for Run 2 is 69.2 mb, which confirms the initial 2015 estimate of 69 mb; please see this HN thread for more details. The uncertainty on this measurement is 4.6%; see PileupSystematicErrors for more information on the systematic uncertainty. Note that this value is somewhat inconsistent with the "Pythia default" cross section of 80 mb at 13 TeV.

For Run 1, the recommended cross sections are 68 mb for 7 TeV (2010-2011) and 69.4 mb for 8 TeV (2012). Note that these are also somewhat inconsistent with the Run 2 recommendation.

The following short sections explain how the pileup information is calculated centrally. YOU DON'T HAVE TO DO THIS YOURSELF! If you are interested only in how to use the information, skip ahead to Calculating Your Pileup Distribution.

Estimating the Luminosity

(NOTE: As of February 28, 2012, the new pixel luminosity measurement has been approved, but the pixel detector cannot provide bunch-by-bunch luminosity information. A hybrid scheme has therefore been adopted: first, the procedure below is used to calculate the bunch-by-bunch luminosities and their rms, and then the pixel recorded luminosities per LumiSection are used to rescale to the "approved" values. This may be modified when more sophisticated techniques become available. As of September 1, 2013, the pixel luminosity values are available for 2012 data, and the default pileup JSON file has been corrected to the pixel values.)

As input to the pileup calculation, the instantaneous bunch-by-bunch luminosities and the delivered and recorded luminosity for each LumiSection are needed. These are obtained with a luminosity database query using the script lumiCalc2.py provided in RecoLuminosity/LumiDB. This only needs to be done once, as long as the luminosity corrections are stable. As a starting point, the DCSONLY JSON files are used. The query looks like this:

lumiCalc2.py lumibylsXing --xingMinLum 0.1 -b stable -i json_DCSONLY.txt -o lumi_DCSONLY.csv

(NOTE: --xingMinLum 0.3 is required for "normal" 2012 running to remove all of the afterglow effects on non-filled bunches.)

This produces a (very large) .csv file containing all of the needed luminosity information for each bunch in each LumiSection. (For 2011, many LumiSections in the DCSONLY files have no luminosity information because the beams were either not stable or had been lost.) Note that, in the "production" workflow, this step can be skipped if a suitable .csv file is produced during the luminosity validation procedure. The pileup-specific code takes a .csv file as input.

Calculating the Relevant Pileup Parameters

Next, we need to make a JSON-format file containing the appropriate pileup information for each LumiSection. This step is done centrally and only needs to be repeated if either the luminosity corrections or the list of LumiSections in the DCSONLY sample changes. Three pieces of information must be stored in the JSON file for each LumiSection: to calculate the number of expected pileup interactions, we need the single-collision instantaneous luminosity averaged over all of the bunches colliding during the LumiSection; to normalise between LumiSections, we need the total integrated luminosity of the LumiSection; and, to reproduce the tails of the distribution, we need some measure of the spread of the individual bunch luminosities, so we also store their rms.

The production of the pileup JSON file is done by a script called estimatePileup_makeJSON.py, also found in RecoLuminosity/LumiDB. The command looks like:

estimatePileup_makeJSON.py --csvInput lumi_DCSONLY.csv pileup_JSON.txt

Note that this calculation has been considerably streamlined compared to the old estimatePileup scripts. It runs much faster, and uses approximately 1/20th of the memory of the older versions.
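Schematically, each entry in the pileup JSON is built from the per-bunch luminosities of one LumiSection roughly as follows (a sketch of the logic only, not the actual estimatePileup_makeJSON.py code; the numbers are made up):

  import numpy as np

  def make_pileup_entry(ls_number, bunch_lumis, integrated_lumi):
      # bunch_lumis: per-crossing luminosities of the colliding bunches (/ub/Xing)
      # integrated_lumi: total integrated luminosity of the LumiSection (/ub)
      bunches = np.asarray(bunch_lumis)
      avg = bunches.mean()   # average bunch instantaneous luminosity
      rms = bunches.std()    # spread of the individual bunch luminosities
      return [ls_number, integrated_lumi, rms, avg]

  # Hypothetical LumiSection with three colliding bunches:
  print(make_pileup_entry(14, [6.6e-5, 6.8e-5, 6.7e-5], 12.18))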

The pileup JSON file contains one entry per LumiSection with four values (LS number, integrated luminosity (/ub), rms of the bunch luminosities (/ub/Xing), average bunch instantaneous luminosity (/ub/Xing)), which makes it a relatively large file. Note that NO assumption is made here about the min-bias cross section, allowing for flexibility later. A sample entry for a random run looks like this:

"160577": [[14,1.2182e+01,2.2707e-06,6.7377e-05],[15,3.3884e+01,2.4204e-06,6.7367e-05],...]

(NOTE: If the pileup JSON file is created in the manner just described, it will NOT contain the pixel luminosity measurements. See discussion below on the pileup JSON based on pixel luminosity information.)
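Given these per-LS values, the expected pileup for a lumi section is simply the average bunch luminosity multiplied by the chosen min-bias cross section, which is essentially what pileupCalc.py does internally. A minimal sketch using the sample run above (the cross section is chosen to match the 2011 recommendation):

  import json

  MINBIAS_XSEC_UB = 68000.0   # 2011 (7 TeV) recommended value, in ub

  with open("pileup_JSON.txt") as f:
      pileup = json.load(f)

  for ls, int_lumi, rms_lumi, avg_lumi in pileup["160577"]:
      mu = avg_lumi * MINBIAS_XSEC_UB   # mean interactions per crossing
      print("LS %d: integrated lumi %.3g /ub, mean pileup %.2f" % (ls, int_lumi, mu))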

Since every analysis JSON should include only a subset of the runs and LumiSections in the DCS-only JSON, you in principle do not have to calculate the pileup JSON yourself. Instead, use your specific JSON to mask the centrally provided pileup JSON, as shown in the example above.

If you nevertheless want to calculate the pileup per LS yourself (which is currently not recommended), you can use the script at /afs/cern.ch/user/l/lumipro/public/estimatePileup_makeJSON_2015.py.

2012 Pileup JSON Files

Files for 2012 can be found in the 2012 DQM area: /afs/cern.ch/cms/CAF/CMSCOMM/COMM_DQM/certification/Collisions12/8TeV/PileUp/. One is produced each week in conjunction with the certified data JSON files. The file names give the run range each file covers (e.g., pileup_JSON_DCSONLY_190389-190688.txt). The default file has been corrected to the pixel luminosity values as of September 4, 2013.

  • Previous update (6 August 2012): New pileup JSON files have been made incorporating the pixel luminosity corrections for the ICHEP dataset; pixel corrections will be included for the rest of the data when they become available. The JSON files with the pixel corrections have "pixelcorr" in the file name; for comparison, a parallel file without the pixel corrections has been produced (without "pixel" in the name). Both sets of files, and all that follow, contain a revised version of the pileup constants, recalculated to remove the effect of non-filled bunches containing substantial luminosity signals. That effect caused artificially large rms values of the bunch-by-bunch luminosity, which produced sizeable tails in the calculated data pileup spectrum; removing it changed the mean number of interactions by a small amount (~2%). Details (more explanation, plots) can be found here.

2011 Pileup JSON File(s)

Both of these files are found in the "standard" pileup /afs area: /afs/cern.ch/cms/CAF/CMSCOMM/COMM_DQM/certification/Collisions11/7TeV/PileUp/.

DCSONLY File, no Pixel Luminosity Corrections

The 2011 DCSONLY-based pileup JSON file is called pileup_JSON_2011_4_2_validation.txt. The tag "4_2_validation" denotes that this file contains an entry for each LumiSection in the Golden JSON files validated on the CMSSW_4_2 release cycle. However, since it contains all LumiSections specified in the DCSONLY files with valid luminosity information for the entire 2011 dataset, it is a superset of both the Golden and Muon JSON files for both the 4_2 and 4_4 reprocessings.

DCSONLY File, WITH Pixel Luminosity Corrections

The 2011 DCSONLY-based pileup JSON file including the luminosity calculated with the pixel corrections is called pileup_2011_JSON_pixelLumi.txt. It contains all LumiSections specified in the DCSONLY files with valid luminosity information for the entire 2011 dataset and, as such, can be used with data processed with either 4_2 or 4_4.

A comparison plot of the pileup distributions before/after the pixel luminosity corrections ("pileup_distributions_compare.pdf") is attached to this page.

Modifying Pileup JSON for Use with Different HLT Paths

It is now possible (as of 27 February 2012) to modify the pileup JSON file in a simple manner to take into account the effects of prescales on your chosen HLT paths. This is a two-step process.

  1. First, calculate the delivered and recorded luminosity for each lumi section in your JSON file for the trigger paths you are using. For Run 1, this should be done using pixelLumiCalc:
     pixelLumiCalc.py lumibyls -i InputJSONFile --hltpath "YourHLTPath" -o YourOutput.csv 
    and for Run 2:
     brilcalc lumi --normtag /afs/cern.ch/user/c/cmsbril/public/2016normtags/normtag_BRILv3.json -i InputJSONFile --hltpath YourHLTPath --byls -o YourOutput.csv
    This command will create a .csv file with one entry per LumiSection for each trigger. In order to make the JSON file, the output of this step must be a single .csv file with one entry per LumiSection. If you have multiple triggers, or are using an OR of triggers, you have to work out a single integrated luminosity for each LumiSection yourself, since that value must be unique (a simple combination sketch is shown after this list). Once you have made the .csv file with that information, you can make the JSON file.
  2. Second, use the script from the latest version of RecoLuminosity/LumiDB called pileupReCalc_HLTpaths.py. This script takes the correct recorded luminosity from the file calculated above and places this value in the pileup JSON. This will change the weight of each LumiSection in the pileup distribution where the recorded luminosity is smaller than nominal due to trigger prescales. The syntax is as follows:
     pileupReCalc_HLTpaths.py -i YourOutput.csv --inputLumiJSON pileup_2011_JSON_pixelLumi.txt -o My_HLT_corrected_PileupJSON.txt
    for Run1 and for Run2 it is:
     pileupReCalc_HLTpaths.py -i YourOutput.csv --inputLumiJSON pileup_latest.txt -o My_HLT_corrected_PileupJSON.txt --runperiod Run2
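As noted in step 1, the per-trigger output must first be reduced to one recorded-luminosity value per LumiSection. How to combine an OR of prescaled triggers correctly is analysis-specific; the sketch below assumes a simplified CSV layout of run, LS, trigger, recorded luminosity (the real pixelLumiCalc/brilcalc output has a different layout, so the parsing must be adapted) and simply keeps the largest per-trigger value as a rough stand-in:

  import csv
  from collections import defaultdict

  recorded = defaultdict(dict)
  with open("YourOutput.csv") as f:
      for row in csv.reader(f):
          if not row or row[0].startswith("#"):
              continue   # skip header/comment lines
          run, ls, trigger, rec = row[0], row[1], row[2], float(row[3])
          recorded[(run, ls)][trigger] = rec

  # One line per LumiSection, keeping the largest recorded value as a
  # simple (approximate) choice for an OR of prescaled triggers.
  with open("YourCombinedOutput.csv", "w") as out:
      writer = csv.writer(out)
      for (run, ls), per_trigger in sorted(recorded.items()):
          writer.writerow([run, ls, max(per_trigger.values())])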

Notes:

1.) Use the HEAD version of RecoLuminosity/LumiDB.

2.) For Run 2, the script may print messages indicating that the rescaling factor is larger than one. This happens for data where HF is used as the luminometer: empty bunch slots have non-zero luminosity counts due to afterglow. This afterglow is subtracted in the pileup_latest.txt file by applying a threshold, but not in the per-LS values. The effect is usually below 0.5% and is not a problem.

Then, you can use the newly-created My_HLT_corrected_PileupJSON.txt as your new input JSON file for the pileup calculation described in the next section.

Calculating Your Pileup Distribution

For the user, the important issue is how to access this information. All that is required is a file in standard JSON format listing the LumiSections included in a user's analysis. Then, a script called pileupCalc.py in RecoLuminosity/LumiDB can be used to create the histogram of the pileup distribution corresponding exactly to the LumiSections used in the analysis. This is the input needed for the pileup reweighting tools. These scripts are available in tags V03-03-16 and later of RecoLuminosity/LumiDB.

A sample calculation looks like this:

pileupCalc.py -i MyAnalysisJSON.txt --inputLumiJSON pileup_JSON_2011_4_2_validation.txt --calcMode true --minBiasXsec 69400 --maxPileupBin 50 --numPileupBins 50  MyDataPileupHistogram.root

There are a number of important arguments and options on display here. (All of the arguments can be seen by typing pileupCalc.py --help.)

  • The only required option is "--calcMode", which tells the script which distribution you are making. The two choices are "true" and "observed". Some have found this nomenclature confusing, so here is another attempt at an explanation. Given a total inelastic cross section, the average bunch instantaneous luminosity can be directly converted into the expected number of interactions per crossing for a given LumiSection. Selecting the "true" mode puts this value, and only this value, into the pileup histogram. In this case the pileup histogram contains the distribution of the mean number of interactions per crossing, which corresponds exactly to the value returned by PileupSummaryInfo::getTrueNumInteractions() in the Monte Carlo. Since this is the mean of the Poisson distribution from which the in-time and out-of-time interactions are generated, no additional information should be needed for reweighting if these values are matched in data and Monte Carlo. Selecting the "observed" mode instead causes the script to enter a properly normalized Poisson distribution with a mean corresponding to the expected number of interactions per crossing for each LumiSection. Given an expected mean number of interactions, the pileup histogram then contains the distribution of the number of interactions one would actually observe for a Poisson of that mean. This distribution is what one would see by counting the number of interactions in a given beam crossing (by looking at the number of vertices in data, for example), or by using PileupSummaryInfo::getPU_NumInteractions() in the Monte Carlo, and is appropriate for pileup reweighting based on in-time-only distributions. Plots of these distributions are shown later on this page, and a schematic sketch of both filling modes is given at the end of this section.

  • The total inelastic cross section is an input argument; the default is set to 73500 μb, but it can and should be changed with the "--minBiasXsec" option to the approved value of 68000 (for 2011) or 69400 (for 2012). (This default will be set to 69400 in an upcoming version.) Note that this option makes shifting the target distribution for reweighting as simple as regenerating the histogram with a different cross section; all issues with shifting Poisson distributions and the like are handled automatically, which should make the computation of systematic uncertainties much simpler.

  • The user also has complete control over the binning of the output histogram. For the "true" mode, some users have found that a finer binning improves the 3D pileup reweighting technique, so this can be varied. For the "observed" mode, since the number of interactions is an integer, it makes sense to have one bin per possible count, hence the 50 and 50 in the example above. Smaller bins are allowed in this mode, however, and are properly calculated.

On a more technical note, the rms of the bunch-to-bunch luminosities is used to generate the tails of the distributions. The description above is almost correct, except that each LumiSection is actually represented by a Gaussian centered on the mean number of interactions, with a width corresponding to the rms of the bunch-by-bunch luminosities, even in "true" calculation mode. This is necessary because the distribution in data sees all of the colliding bunches, not just the average. For the "observed" calculation mode, this Gaussian is properly convolved with the Poisson distribution.

Two sets of comparison plots are included here. In the first set, the expected mean number of interactions is calculated for every bunch-by-bunch collision in each LumiSection and weighted by the integrated luminosity of each bunch pair (black histogram); this is compared (in red) with the distribution of the expected number of interactions from the pileup JSON calculation in "true" mode. The seven run ranges were chosen more or less randomly based on their size in the DCSONLY JSON file; the eighth panel shows the number of interactions on a logarithmic scale, including the high-pileup runs at the end of 2011. The agreement is quite good. The second set of plots compares the calculated "observed" distributions, where, in black, the Poisson distribution is calculated for each bunch crossing individually; this corresponds to the old estimatePileup behaviour, and is one of the reasons it took so long to run. The red histogram is the same calculation done once per LumiSection using the pileup JSON file. The small differences seen in the "true" case are smeared out here by the additional Poisson distributions, resulting in even better agreement.

Mean number of interactions per bunch collision in data compared with pileup JSON calculation in "true" mode: pileup_comparison_true_distr.png

Estimated number of interactions per crossing in data compared with pileup JSON calculation in "observed" mode: pileup_comparison_obs_distribution.png
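Finally, to make the filling procedure described above concrete, here is a schematic re-implementation of the per-LS logic (this is NOT the actual pileupCalc.py code; the lumi-section values are made up, and scipy is assumed to be available for the Poisson term):

  import numpy as np
  from scipy.stats import poisson

  MINBIAS_XSEC_UB = 69400.0
  NBINS, XMAX = 50, 50.0
  edges = np.linspace(0.0, XMAX, NBINS + 1)
  centers = 0.5 * (edges[:-1] + edges[1:])
  hist = np.zeros(NBINS)

  # One tuple per LumiSection, as stored in the pileup JSON (values made up):
  # (integrated lumi /ub, rms bunch lumi /ub/Xing, average bunch lumi /ub/Xing)
  lumi_sections = [(33.9, 2.4e-6, 6.7e-5), (31.2, 2.1e-6, 6.5e-5)]

  calc_mode = "observed"   # or "true"

  for int_lumi, rms_lumi, avg_lumi in lumi_sections:
      mean = avg_lumi * MINBIAS_XSEC_UB    # mean interactions per crossing
      width = rms_lumi * MINBIAS_XSEC_UB   # bunch-to-bunch spread
      # Each LS is represented by a Gaussian over the mean number of
      # interactions, even in "true" mode.
      gauss = np.exp(-0.5 * ((centers - mean) / width) ** 2)
      gauss /= gauss.sum()
      if calc_mode == "true":
          weights = gauss
      else:
          # "observed": convolve the Gaussian over means with a Poisson in
          # the observed count n (unit-width bins, so bin k holds n = k).
          weights = np.array([np.sum(gauss * poisson.pmf(k, centers))
                              for k in range(NBINS)])
      hist += int_lumi * weights   # weight each LS by its integrated lumi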

-- MichaelHildreth - 06-Jan-2012
