Utilities for Accessing Pileup Information for Data


In many cases, it is useful to know the amount of "pileup"; that is, the total number of pp interactions per bunch crossing. The most straightforward way of estimating this is to use the instantaneous luminosity: if a single bunch has an instantaneous luminosity L_inst, then the pileup is given by the formula μ = L_inst σ_inel / f_rev, where σ_inel is the total pp inelastic cross section and f_rev is the LHC orbit frequency of 11246 Hz (necessary to convert the instantaneous luminosity, which is a per-time quantity, into a per-collision quantity). This quantity can be computed on a per-lumi-section basis (where a lumi section, the fundamental unit of CMS luminosity calculation, is about 23.3 seconds long).
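
As a quick illustration of this formula (a minimal sketch: the single-bunch luminosity value is invented for the example; only the cross section and orbit frequency come from this page):

```python
# Convert a single-bunch instantaneous luminosity to average pileup, using
# mu = L_inst * sigma_inel / f_rev as described above.
# The luminosity value passed in below is a made-up illustrative number.

F_REV_HZ = 11246.0          # LHC orbit frequency
SIGMA_INEL_UB = 69200.0     # recommended Run 2 pp inelastic cross section, in microbarn

def pileup_from_sbil(sbil_hz_per_ub):
    """Average pileup for a single-bunch instantaneous luminosity given in Hz/ub."""
    return sbil_hz_per_ub * SIGMA_INEL_UB / F_REV_HZ

mu = pileup_from_sbil(7.0)  # a single-bunch luminosity of 7 Hz/ub
print(mu)                   # roughly 43 interactions per crossing
```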

The pileup estimate obtained from the luminosity can be compared with the number of reconstructed primary vertices per event. Since the latter is affected by the vertex reconstruction efficiency (~70%), the two quantities will not agree exactly; however, the comparison can be used to measure the vertex reconstruction efficiency or to validate the results.

The μ value obtained from the instantaneous luminosity is, by its nature, a measure of the average pileup. The pileup in individual events will therefore follow a Poisson distribution around this average.

A note on nomenclature: In luminosity operations, we use "pileup" to refer to the total number of pp interactions. However, in some contexts (e.g., MC generation), "pileup" refers to the number of additional interactions added to the primary hard-scatter process. Please be aware of this distinction in order to avoid off-by-one errors in your study!

The lumi POG provides the pileupCalc.py utility, along with other central resources, for performing these pileup calculations.

Using pileupCalc

For standard physics analyses, the lumi POG maintains a set of central files containing the pileup information. All you need to do is run pileupCalc.py over the particular periods used in your analysis to extract the applicable pileup distribution.

To run pileupCalc, simply set up the CMSSW environment (note: pileupCalc.py is broken in CMSSW_10_2_X, so please use another version) and then run as follows:

pileupCalc.py -i MyAnalysisJSON.txt --inputLumiJSON pileup_latest.txt --calcMode true --minBiasXsec 69200 --maxPileupBin 100 --numPileupBins 100 MyDataPileupHistogram.root


  • MyAnalysisJSON.txt is the JSON file defining the lumi sections that your analysis uses. This is generally the appropriate certification JSON file from PdmV or processedLumis.json from your CRAB job.
  • pileup_latest.txt is the appropriate pileup file for your analysis. See below for the locations of the central files.
  • minBiasXsec defines the minimum bias cross section to use (in μb). The current Run 2 recommended value is 69200. Please see below for a discussion of this value.
  • MyDataPileupHistogram.root is the name of the output file.
All of the arguments can be seen by typing pileupCalc.py --help. For more information on what the --calcMode flag does, see "True and observed" below.

The script will run for a few minutes and produce an output file with the name you specified. This file contains a single histogram, named pileup, holding the resulting average pileup distribution. You can then use this histogram for reweighting, making plots, etc.
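
The reweighting step can be sketched as follows (a toy illustration: the histogram contents are invented numbers, and a real analysis would read the data distribution from the pileupCalc.py output file and the MC distribution from the relevant mixing configuration):

```python
# Toy pileup reweighting: the per-event weight for an MC event whose true
# pileup falls in bin i is data[i]/mc[i], after normalizing both histograms
# to unit area. The bin contents below are invented for illustration.

def make_pileup_weights(data_bins, mc_bins):
    """Per-bin weights (normalized data)/(normalized MC); 0 where MC is empty."""
    data_total = float(sum(data_bins))
    mc_total = float(sum(mc_bins))
    weights = []
    for d, m in zip(data_bins, mc_bins):
        if m > 0:
            weights.append((d / data_total) / (m / mc_total))
        else:
            weights.append(0.0)  # no MC in this bin: weight undefined, set to 0
    return weights

data_hist = [1.0, 4.0, 6.0, 4.0, 1.0]   # toy "data" pileup distribution
mc_hist   = [2.0, 4.0, 4.0, 4.0, 2.0]   # toy "MC" pileup distribution
weights = make_pileup_weights(data_hist, mc_hist)
```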

You may occasionally get a warning of the form:

Run 325172, LumiSection 460 not found in Lumi/Pileup input file. Check your files!
If this warning only pops up once or twice, you can safely ignore it. If you get a large number of occurrences, then something strange is going on. The central pileup files include all lumi sections in the DCSOnly JSON, and a physics analysis should normally use a subset of this data. Please check that your input JSON file is sensible and then contact the lumi POG.

You may also occasionally get a warning of the form:

Significant probability density outside of your histogram
Consider using a higher value of --maxPileupBin
Mean 48.639296, RMS 24.106512, Integrated probability 0.961607
Again, if this happens only once or twice, it can be safely ignored (there are sometimes a couple of badly behaved lumi sections). If you get a large number of these warnings, it is probably a sign that the upper edge of your histogram is too low (as the warning suggests) -- try increasing maxPileupBin and numPileupBins.

Location of central pileup files

For Run 2, the latest pileup files can be found as follows. In all cases (except 2011) you should use pileup_latest.txt, which is the most recent version (or a link to it). The JSON is generally updated whenever a new version of the normtag is released, so it should always be in sync with the latest luminosity calibration.

  • 2018: /afs/cern.ch/cms/CAF/CMSCOMM/COMM_DQM/certification/Collisions18/13TeV/PileUp/pileup_latest.txt
  • 2017: /afs/cern.ch/cms/CAF/CMSCOMM/COMM_DQM/certification/Collisions17/13TeV/PileUp/pileup_latest.txt
  • 2016: /afs/cern.ch/cms/CAF/CMSCOMM/COMM_DQM/certification/Collisions16/13TeV/PileUp/pileup_latest.txt
  • 2015: /afs/cern.ch/cms/CAF/CMSCOMM/COMM_DQM/certification/Collisions15/13TeV/PileUp/pileup_latest.txt
The Run 1 pileup files are as follows. More details on the 2012 pileup distribution can be found in PileupRevision2012.
  • 2012: /afs/cern.ch/cms/CAF/CMSCOMM/COMM_DQM/certification/Collisions12/8TeV/PileUp/pileup_latest.txt
  • 2011: /afs/cern.ch/cms/CAF/CMSCOMM/COMM_DQM/certification/Collisions11/7TeV/PileUp/pileup_2011_JSON_pixelLumi.txt

Recommended cross section

The recommended cross section for Run 2 is 69.2 mb, which confirms the initial 2015 estimate of 69 mb. Please see this HN thread for more details. The uncertainty on this measurement is 4.6%; see PileupSystematicErrors for more information on the systematic uncertainty. Note that this is somewhat inconsistent with the "Pythia default" cross section of 80 mb at 13 TeV.

For Run 1, the recommended cross sections are 68 mb for 7 TeV (2010-2011) and 69.4 mb for 8 TeV (2012). Note that these are also somewhat inconsistent with the Run 2 recommendation.

Pileup for specific HLT paths

If you are considering an HLT path which is prescaled, you can modify the pileup JSON file to produce the proper pileup profile given the trigger prescaling. This is a two-step process:

  • First, calculate the delivered and recorded luminosity per lumi section for your given trigger paths. This will require a command of the form:
    brilcalc lumi --byls --normtag /cvmfs/cms-bril.cern.ch/cms-lumi-pog/Normtags/normtag_PHYSICS.json -i [your json] --hltpath [your HLT path] -o output.csv
    See BrilcalcQuickStart for more details on using brilcalc. In the more complicated case where you have multiple overlapping triggers or an OR of triggers, you will have to compute the appropriate luminosity yourself and build a CSV file like the one produced by brilcalc.
  • Second, use the script pileupReCalc_HLTpaths.py to generate a new version of the pileup file. This script takes the recorded luminosity from the file calculated above and places this value in the pileup JSON. This will change the weight of each LumiSection in the pileup distribution where the recorded luminosity is smaller than nominal due to trigger prescales. The syntax is as follows:
    pileupReCalc_HLTpaths.py -i output.csv --inputLumiJSON pileup_latest.txt -o My_HLT_corrected_PileupJSON.txt --runperiod Run2

Note: you may get occasional printouts from the script indicating that the rescaling factor is larger than one. This can happen when empty bunch crossings still have nonzero measured luminosity due to afterglow. These bunches are excluded in the pileup_latest.txt file by applying a threshold, but no such threshold is applied to the per-LS values. The effect is usually below 0.5% and not a problem.

Then, you can use the newly-created My_HLT_corrected_PileupJSON.txt as your new input JSON file for pileupCalc.py as described above.

True and observed pileup in data and MC

In order to do reasonable comparisons between data and MC, it is necessary to understand exactly what the histogram produced by pileupCalc.py means so that it can be compared with the correct quantity in MC. The way that pileup events are generated in MC, given an input pileup distribution, is as follows: first, a value is picked from this distribution, which we call the "true" pileup. This represents the average pileup conditions under which the event is generated. This value is stored in PileupSummaryInfo::getTrueNumInteractions(). Then, the number of pileup events, both for the in-time bunch crossing and for out-of-time bunch crossings, is selected from a Poisson distribution with a mean equal to the "true" pileup. These values are stored in the vector given by PileupSummaryInfo::getPU_NumInteractions().
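
A minimal sketch of this two-step generation procedure (standard library only; the candidate "true" pileup values are invented, and the simple Poisson sampler is a stand-in for whatever the MC machinery actually uses):

```python
import math
import random

random.seed(12345)

def poisson_sample(mean):
    """Draw from a Poisson distribution (Knuth's algorithm; fine for modest means)."""
    limit = math.exp(-mean)
    k, p = 0, 1.0
    while True:
        p *= random.random()
        if p <= limit:
            return k
        k += 1

def generate_event_pileup(true_pileup_values):
    """Step 1: pick the 'true' pileup (cf. getTrueNumInteractions).
    Step 2: draw the in-time count around it (cf. getPU_NumInteractions)."""
    true_pu = random.choice(true_pileup_values)
    return true_pu, poisson_sample(true_pu)

true_pu, n_interactions = generate_event_pileup([20.0, 30.0, 40.0])
```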

True calculation mode

With --calcMode true, which is the standard method of running pileupCalc.py, the output histogram contains one entry per lumisection, where the value is equal to the average pileup for that lumisection, as calculated from the bunch instantaneous luminosity and the total inelastic cross section. This should correspond exactly to the value returned by PileupSummaryInfo::getTrueNumInteractions() in the Monte Carlo; no additional information should be required for reweighting if these values are matched in data and Monte Carlo.

Observed calculation mode

With --calcMode observed, instead of simply entering the average value for each lumi section, the script adds a properly normalized Poisson distribution with a mean equal to that average value. (Warning: this is much slower!) The output histogram thus contains the distribution of the number of interactions one would actually observe in individual events. This corresponds to PileupSummaryInfo::getPU_NumInteractions() in the Monte Carlo and would be appropriate for pileup reweighting based on in-time-only distributions. Plots of these distributions are shown later in this page. This distribution could also be compared to the reconstructed number of vertices, although you would have to convolve it with the vertex reconstruction efficiency in order to get correct agreement.
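
The difference between the two modes can be sketched as follows (a toy illustration: the (mean pileup, luminosity weight) pairs are invented, and the real script additionally folds in the bunch-by-bunch RMS):

```python
import math

def poisson_pmf(k, mean):
    """Probability of observing k interactions given an average of `mean`."""
    return math.exp(-mean) * mean ** k / math.factorial(k)

def fill_histograms(lumisections, nbins=100):
    """'true' mode: one luminosity-weighted entry at each LS mean.
    'observed' mode: a normalized Poisson distribution around each mean."""
    true_hist = [0.0] * nbins
    obs_hist = [0.0] * nbins
    for mu, weight in lumisections:
        true_hist[int(mu)] += weight            # single entry at the LS average
        for k in range(nbins):                  # full Poisson in observed mode
            obs_hist[k] += weight * poisson_pmf(k, mu)
    return true_hist, obs_hist

# Toy (mean pileup, integrated-luminosity weight) pairs for three lumi sections:
toy_ls = [(20.5, 1.0), (30.2, 2.0), (25.7, 1.5)]
true_hist, obs_hist = fill_histograms(toy_ls)
```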

Bunch-by-bunch distributions

On a more technical note, the RMS of the bunch-to-bunch luminosities is used to generate tails on the distributions. The above description is almost correct, except that each lumi section is actually represented by a Gaussian centered on the mean number of interactions, with a width corresponding to the RMS of the bunch-by-bunch luminosities, even in true calculation mode. This is necessary because the distribution in data sees all of the colliding bunches, not just the average. The Gaussian is properly convolved with the Poisson distribution for the observed calculation mode.

Two sets of comparison plots using 2011 data are included here. In the first set, the expected mean number of interactions is calculated for every bunch-by-bunch collision in each lumi section and is weighted by the integrated luminosity for each bunch pair in each lumi section (black histogram). This is compared to the distribution (in red) of the expected number of interactions in the pileup JSON calculation using true mode. The seven run ranges were chosen more or less randomly based on their size in the DCSONLY JSON file. The eighth panel shows the number of interactions on a logarithmic scale, including the high-pileup runs at the end of 2011. The agreement is quite good.

The second set of plots shows a comparison of the calculated observed distributions in each case, where, in black, the Poisson distribution is calculated for each bunch crossing individually. This corresponds to the old estimatePileup function, and is one of the reasons it takes forever to run. The red histogram is the same calculation, but done once per lumi section using the pileup JSON file. The small differences seen in the true case are smeared out here by the additional Poisson distributions, resulting in even better agreement.

Mean number of interactions per bunch collision in data compared with pileup JSON calculation in "true" mode: pileup_comparison_true_distr.png

Estimated number of interactions per crossing in data compared with pileup JSON calculation in "observed" mode: pileup_comparison_obs_distribution.png

Creating the pileup files

You don't need to do this yourself -- use the central files provided above! The information here is provided for reference.

First use brilcalc to obtain the luminosity information for the year. Warning: this creates a very large output file!

brilcalc lumi --xing -i json_DCSONLY.txt -b "STABLE BEAMS" --normtag /cvmfs/cms-bril.cern.ch/cms-lumi-pog/Normtags/normtag_PHYSICS.json --xingTr 0.1 -o lumi_DCSONLY.csv

where json_DCSONLY.txt is the DCSOnly JSON for the year in question and the normtag defines the latest luminosity calibrations. For more details on using brilcalc, see BrilcalcQuickStart.

The next step is to use estimatePileup_makeJSON_2015.py, which processes this data into a much more manageable format: for each lumi section, the average instantaneous luminosity over all bunches and the RMS of the individual bunch luminosities are stored, so that an approximation of the bunch-by-bunch distribution can be obtained without the need to store the data for every individual bunch. The integrated luminosity per lumi section is also stored so that the results can be correctly normalized. The command to use is simply:

estimatePileup_makeJSON_2015.py --csvInput lumi_DCSONLY.csv pileup_JSON.txt

The resulting JSON file contains one record per lumi section, each with four values as follows:

"322068": [[51,4.0848e+05,5.5906e-05,6.3672e-04],[52,4.1480e+05,5.5152e-05,6.4634e-04],...]
where the four entries in each array are:
  • LS number
  • recorded luminosity (/μb)
  • RMS bunch instantaneous luminosity/orbit frequency (/μb/bunch)
  • average bunch instantaneous luminosity/orbit frequency (/μb/bunch)
Because the pileup μ is related to the instantaneous luminosity by μ = L_inst σ_inel / f_rev, you just need to multiply the latter two numbers by the cross section (in μb) in order to get the actual pileup and its RMS; storing the numbers without the cross section included makes it easy to vary the cross section in pileupCalc.py.
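
Turning one of these records into actual pileup numbers then looks like this (a sketch using the example values quoted above; the cross section is the Run 2 recommendation):

```python
import json

# One record in the format described above: run -> list of
# [LS, recorded lumi (/ub), RMS lumi/f_rev (/ub/bunch), average lumi/f_rev (/ub/bunch)]
record = json.loads('{"322068": [[51, 4.0848e+05, 5.5906e-05, 6.3672e-04]]}')

MINBIAS_XSEC_UB = 69200.0   # Run 2 recommended inelastic cross section, in ub

for run, ls_entries in record.items():
    for ls, recorded_lumi, rms_over_frev, avg_over_frev in ls_entries:
        mean_pileup = avg_over_frev * MINBIAS_XSEC_UB   # average pileup in this LS
        rms_pileup = rms_over_frev * MINBIAS_XSEC_UB    # its bunch-by-bunch RMS
        print(run, ls, round(mean_pileup, 2), round(rms_pileup, 2))
```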

-- MichaelHildreth - 06-Jan-2012

Topic revision: r12 - 2019-06-04 - PaulLujan