2013 HATS Pileup Exercises


The goal of this exercise is to introduce the tools that are used for studying pileup and understanding the information used in calculating the pileup distributions and passing them along to the Collaboration. Advanced tools, like calculating pileup for prescaled triggers, will also be explored.

The Exercise is divided into a number of Steps. The color scheme of the Exercise is as follows:

  • Commands will be embedded in a grey box, e.g.
    ls -ltr
  • Configuration files (full or snippets) will be embedded in a blue box, e.g.
    datasetpath            = /Jet/Run2010B-Nov4ReReco_v1/AOD
  • ROOT macros and codes will be embedded in a pink box, e.g.
      const TString var 	= "ST";
  • Output and screen printouts will be embedded in a green box, e.g.
     Number of events with ST > 2000 and N>= 3 is 10099. Acceptance = 0.842075
  • Important messages will be highlighted in red

Other Useful Information

The basics of what the pileup distributions look like in Data and MC, and how the distributions in MC are generated, are described on the PileupTwiki pages linked from the 2011 PVT and 2012 PPD web pages. A tutorial on pileup reweighting can be found in the CMSDAS pages, where it has been presented as a Short Exercise since 2012.

Getting Started

First, have a look at the set of introductory slides linked to this TWiki page.

Next, we are going to examine what goes into the pileup calculation. What you need:

  • You should set up a CMSSW_5_3_11_patch4 working area. Follow these steps once you have properly set up your CMS environment variables:
cmsrel CMSSW_5_3_11_patch4
cd CMSSW_5_3_11_patch4/src
(need to use Git now:) git cms-addpkg RecoLuminosity/LumiDB
(compile!) scram b 
This package contains all of the luminosity query and pileup calculation scripts. If you haven't set up Git yet, you should take the time to do that. Check out the links here. Particularly important is setting up your ssh keys so that you can check out packages from the repository.
  • A JSON file with good luminosity sections from the 2012 running: Cert_190456-191202_8TeV_22Jan2013ReReco_Collisions12_JSON_MuonPhys.txt (attached to this page).

Calculating Luminosity for Pileup Work

You may already be familiar with the process of calculating the luminosity covered by your analysis. Here, we are going deeper into the luminosity information because we need more details in order to calculate the pileup. This tutorial leads you through the steps outlined on the TWiki page describing how the luminosity information is used for the pileup calculation, which can be found here. Along the way, we'll take a look at some interesting features.

As an input to the pileup calculation, the instantaneous bunch-by-bunch luminosities and the delivered and recorded luminosity for each LumiSection are necessary. These are obtained with a Luminosity DataBase query using the script lumiCalc2.py, provided in RecoLuminosity/LumiDB, which you just downloaded. The bunch-by-bunch luminosities are required here because it is the luminosity of each collision that determines how many additional interactions can occur each time the beams cross. Note that, at present, the luminosity can only be provided bunch-by-bunch by the HF detector. While it is fast, it has known non-linearities and aging propensities that need to be monitored and corrected. We will address this later in this exercise.

The luminosity database query can look like:

lumiCalc2.py lumibylsXing --xingMinLum 0.1 -b stable -i Cert_190456-191202_8TeV_22Jan2013ReReco_Collisions12_JSON_MuonPhys.txt -o lumi_2012a.csv

Giving the script the argument lumibylsXing causes it to produce a (very large) .csv file containing all of the needed luminosity info for each bunch for each LumiSection, including the bunch-by-bunch luminosities. You should never do this for calculating the luminosity for a given analysis. You don't need the luminosity information at this level of detail, and it places a huge burden on the luminosity database.

Note the parameter --xingMinLum: this determines which bunch crossings will be ignored because their instantaneous luminosity is too low. The units here are 10^30/cm^2/s. Here, we have set it to a low value just to see what sort of junk is lurking in the luminosity measurements.
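The effect of --xingMinLum can be sketched in a few lines of Python. This is an illustrative toy, not the actual lumiCalc2.py logic, and the (bunch, luminosity) pairs are invented:

```python
# Toy version of the --xingMinLum cut: keep only bunch crossings whose
# instantaneous luminosity (units of 10^30 cm^-2 s^-1) is above threshold.
def apply_xing_min_lum(bunch_lumis, min_lum=0.1):
    """bunch_lumis: list of (bunch_number, inst_lumi) pairs."""
    return [(bx, lum) for bx, lum in bunch_lumis if lum > min_lum]

# Two real colliding bunches plus two low-luminosity "junk" readings.
bunches = [(1, 55.2), (2, 54.8), (3, 0.02), (101, 0.15)]
kept = apply_xing_min_lum(bunches, min_lum=0.1)
print(kept)  # the 0.02 reading is cut away; 0.15 survives a 0.1 threshold
```

With the low 0.1 threshold used above, marginal readings like the 0.15 entry still survive, which is exactly the "junk" we want to look at.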

Another note: during data collection, the default is to calculate the pileup using the DCSONLY set of JSON files. This is, by definition, a super-set of the luminosity sections available for certification. That way, pileup information is available for any possible luminosity section with real integrated luminosity.

So, you could run the script using the command given above, but don't. (This takes a while, even for this small JSON file. Don't even think of doing this for the whole dataset.) Just copy the output file lumi_2012a.csv from /uscms_data/d2/mikeh/work/HATS/CMSSW_5_3_11_patch4/src or from ~mikeh/public on lxplus to your working area. (Or just use the full file path in the commands below.)

Calculating Luminosity Distributions for Pileup

Next, we need to make a JSON format file containing the appropriate pileup information for each LumiSection. This step is usually done centrally for all of CMS and need only be done once, unless either the luminosity corrections change (the pixel luminosity is added, for example) or the list of LumiSections in the DCSONLY sample changes (which has never happened). A few words on the necessary information that must be stored in the JSON file. We must have a way of calculating the number of expected pileup interactions, so we need the single-collision instantaneous luminosity averaged over all of the bunches colliding during each LumiSection. In order to normalise between LumiSections, we also need the total integrated luminosity for a given LumiSection. Finally, we need some idea of the spread of the individual bunch luminosities so that we can reproduce the tails of the distribution. So, for each LumiSection, we also store the rms of the individual bunch-to-bunch luminosities.
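The per-LumiSection summary just described can be sketched as follows. Here the rms is taken as the spread about the mean; the exact convention inside estimatePileup_makeJSON.py may differ, so treat this as an assumption:

```python
import math

def ls_summary(bunch_lumis):
    """Given the instantaneous luminosities of the colliding bunches in one
    LumiSection, return (mean bunch lumi, rms of the bunch lumis)."""
    n = len(bunch_lumis)
    mean = sum(bunch_lumis) / n
    # rms here = standard deviation about the mean (assumed convention)
    rms = math.sqrt(sum((l - mean) ** 2 for l in bunch_lumis) / n)
    return mean, rms

# Invented bunch luminosities for one LumiSection:
mean, rms = ls_summary([2.0, 4.0, 3.0, 3.0])
```

These two numbers, together with the LumiSection's integrated luminosity, are what end up in the pileup JSON file.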

The production of the pileup JSON file is done by a script called estimatePileup_makeJSON.py, also found in RecoLuminosity/LumiDB. The command looks like:

estimatePileup_makeJSON.py --csvInput lumi_2012a.csv  pileup_JSON.txt

The pileup JSON file contains one entry for each LumiSection with four values (LS, integrated luminosity (/ub), rms bunch luminosity (/ub/Xing), average bunch instantaneous luminosity (/ub/Xing)), which makes it a relatively large file. Note that NO assumption is made as to the minbias cross section here, allowing for flexibility later. A sample entry for a random run looks like this:

"160577": [[14,1.2182e+01,2.2707e-06,6.7377e-05],[15,3.3884e+01,2.4204e-06,6.7367e-05],...]

Run the script as listed above. Have a look at the pileup_JSON.txt file, just to see what is in it. You have just been through the process of making the pileup JSON file that all of the analysts use for pileup reweighting.
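Since the file is plain JSON, it is easy to inspect programmatically. A minimal sketch, using the sample entry above truncated to two LumiSections (normally you would json.load() the pileup_JSON.txt file itself):

```python
import json

# Each entry is [LS, integrated lumi (/ub), rms bunch lumi (/ub/Xing),
# mean bunch instantaneous lumi (/ub/Xing)].
pileup = json.loads(
    '{"160577": [[14, 1.2182e+01, 2.2707e-06, 6.7377e-05],'
    ' [15, 3.3884e+01, 2.4204e-06, 6.7367e-05]]}'
)

# Sum the integrated luminosity over all runs and LumiSections.
total = sum(int_lumi for entries in pileup.values()
            for ls, int_lumi, rms, mean in entries)
print("Integrated luminosity: %.2f /ub" % total)
```

The same loop structure works on the full file; only the number of runs and LumiSections grows.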

Calculating Pileup Distributions

For the user, the important issue is how to access the pileup information you just calculated. All that is required is a file in standard JSON format listing the LumiSections included in a user's analysis (so, we will re-use Cert_190456-191202_8TeV_22Jan2013ReReco_Collisions12_JSON_MuonPhys.txt). Then, a script called pileupCalc.py in RecoLuminosity/LumiDB can be used to create the histogram of the pileup distribution corresponding exactly to the LumiSections used in the analysis. This is the input needed for the pileup reweighting tools.

First, run this calculation:

pileupCalc.py -i Cert_190456-191202_8TeV_22Jan2013ReReco_Collisions12_JSON_MuonPhys.txt --inputLumiJSON pileup_JSON.txt --calcMode true --minBiasXsec 69400 --maxPileupBin 50 --numPileupBins 500  MyDataPileupHistogram.root

The following explanations of the use of this script have been excerpted from the Pileup TWiki:

Begin Excerpt
There are a number of important arguments and options on display here. (All of the arguments can be seen by typing pileupCalc.py --help.)

  • The only required option is "--calcMode", which tells the script which distribution you are making. The two choices are "true" and "observed". Some have found this nomenclature confusing, so here is another attempt at an explanation. Given a total inelastic cross section, the average bunch instantaneous luminosity can be directly converted into the expected number of interactions per crossing for this LumiSection. Selecting the "true" mode puts this value, and only this value, into the pileup histogram. Hence, in this case the pileup histogram contains the distribution of the mean number of interactions per crossing, which corresponds exactly to the value returned by PileupSummaryInfo::getTrueNumInteractions() in the Monte Carlo. Since this is the mean of the Poisson distribution from which the in-time and out-of-time interactions are generated, no additional information should be required for reweighting if these values are matched in data and Monte Carlo. On the other hand, selecting the "observed" mode causes the script to enter into the histogram a properly-normalized Poisson distribution with a mean corresponding to the expected number of interactions per crossing for each LumiSection. Given an expected mean number of interactions, the pileup histogram then contains the distribution of the number of interactions one would actually observe given a Poisson of that mean. This distribution is what one would see if one counted the number of interactions in a given beam crossing (by looking at the number of vertices in data, for example), or used something like PileupSummaryInfo::getPU_NumInteractions() in the Monte Carlo. It is appropriate for pileup reweighting based on in-time-only distributions. Plots of these distributions are shown later in this page.

  • The total inelastic cross section is an input argument; the default is set to 73500 ub, but it can and should be modified via the "--minBiasXsec" option to set it to the approved value of 68000 (for 2011) or 69300 (for 2012). Note that this option makes shifting the target distribution for reweighting as simple as regenerating another histogram with a different cross section; all issues with shifting Poisson distributions and the like are automatically handled correctly. This should make computation of systematic errors much simpler.
  • The user also has complete control over the binning of the output histogram. For the "true" mode, some users have found that having many bins improves the 3D pileup reweighting technique, so this can be varied (hence the 500 bins in the example above). For the "observed" mode, since the number of interactions is an integer, it makes sense to have one bin per interaction count (50 bins over a range of 50, say). Smaller bins are allowed in this mode and are properly calculated, however.

On a more technical note, the rms of the bunch-to-bunch luminosities is used to generate tails on the distributions. Each LumiSection is actually represented by a Gaussian centered on the mean number of interactions and having a width corresponding to the rms of the bunch-by-bunch luminosities, even in "true" calculation mode. This is necessary because the distribution in data sees all of the colliding bunches, not just the average. The Gaussian is properly convolved with the Poisson distribution for the "observed" calculation mode.

End Excerpt
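The "true" vs "observed" distinction described in the excerpt can be sketched numerically. This toy omits the Gaussian rms smearing discussed above, and the LumiSection weights and means are invented:

```python
import math

def poisson_pmf(k, mu):
    """Poisson probability of observing k interactions given mean mu."""
    return math.exp(-mu) * mu ** k / math.factorial(k)

def fill_pileup(lumisections, mode, nbins=50):
    """lumisections: (integrated_lumi_weight, mean_interactions) pairs.
    Returns a luminosity-weighted histogram, one bin per interaction count."""
    hist = [0.0] * nbins
    for weight, mu in lumisections:
        if mode == "true":
            hist[int(mu)] += weight          # all weight at the mean
        else:                                 # "observed": spread the weight
            for k in range(nbins):            # over a Poisson about the mean
                hist[k] += weight * poisson_pmf(k, mu)
    return hist

lumis = [(10.0, 20.3), (25.0, 22.1)]          # invented LumiSections
h_true = fill_pileup(lumis, "true")
h_obs = fill_pileup(lumis, "observed")
```

Both histograms carry the same total luminosity; "observed" just redistributes each LumiSection's weight over the Poisson fluctuations about its mean, which is why it is the right target when reweighting against a counted quantity like the number of vertices.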

Now, have a look at the pileup plot in MyDataPileupHistogram.root:

root -l MyDataPileupHistogram.root

What does it look like? Compare it to the reference histogram on this page, which represents a "correct" calculation.

You should see that the distribution you made is substantially shifted to lower luminosity values. This is due to empty bunches which report non-zero luminosity values due to what is called the "afterglow" effect in HF. Basically, light levels in HF do not go instantaneously to zero once it has been hit by all of the interactions in a bunch crossing. So, unless you cut this afterglow away, you end up with a completely strange estimate of pileup, since the "afterglow" bunches have only a tiny fraction of the luminosity of the colliding bunches. If we had some means of applying a bunch mask based on the collision pattern, we could avoid this altogether, but we don't. It's an interesting effect, actually.

(NOTE: --xingMinLum 0.3 is required for "normal" 2012 running to remove all of the afterglow effects on non-filled bunches.)

We can get back to a "normal" distribution by hacking the estimatePileup_makeJSON.py script:

cd RecoLuminosity/LumiDB/scripts
(use your favorite editor) estimatePileup_makeJSON.py
You will find on line 55 and line 67 things that look like
xingInstLumi > 0.1
Change these to
xingInstLumi > 0.3
then save and recompile (scram b). Now try running estimatePileup_makeJSON.py and pileupCalc.py again with this modified setting, and look at the resulting pileup distribution. This is a "correct" distribution. (The value of 0.3 was chosen by looking at the luminosity values of each of the bunches and putting a cut that got rid of all of the crazy low values. You can do this study yourself by hacking estimatePileup_makeJSON.py to histogram the luminosity values for each bunch.)
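One simple way to do that study: sort the per-bunch luminosities and put the cut in the biggest gap between the afterglow population and the real colliding bunches. This is a toy heuristic, not what the script does, with invented values in units of 10^30 cm^-2 s^-1:

```python
def suggest_cut(bunch_lumis):
    """Return the midpoint of the widest gap in the sorted luminosity values,
    a crude separator between afterglow junk and real collisions."""
    vals = sorted(bunch_lumis)
    width, i = max((vals[j + 1] - vals[j], j) for j in range(len(vals) - 1))
    return 0.5 * (vals[i] + vals[i + 1])

# Three afterglow-level readings plus four real colliding bunches:
lumis = [0.02, 0.05, 0.04, 55.1, 54.2, 53.0, 56.9]
cut = suggest_cut(lumis)
```

In real data the two populations are separated by orders of magnitude, so any cut in the gap (like the 0.3 chosen for 2012) works; the exact value is not critical.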

Applying External Corrections for Pileup Calculation

As you all know, the HF is sufficiently non-linear and unstable that alternate methods for measuring the luminosity have been developed. One of these is the Pixel Luminosity determination, derived from counting the occupancy in the pixel tracker for each luminosity section. Clearly, this cannot be used for a bunch-to-bunch measurement of luminosity, or pileup. But, it can give a relative correction by comparing the integrated luminosity for HF vs. the Pixel calculation in each lumi section. Since that is the granularity of our information anyway, we can't really hope to do much better than that.
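A sketch of such a relative correction, with invented numbers: the idea is to keep the bunch-to-bunch shape from HF while rescaling its integral to the pixel measurement for that LumiSection.

```python
def correct_bunch_lumis(hf_bunch_lumis, hf_integrated, pixel_integrated):
    """Scale each HF bunch luminosity by the pixel/HF ratio for this LS,
    preserving the bunch-to-bunch shape measured by HF."""
    scale = pixel_integrated / hf_integrated
    return [lum * scale for lum in hf_bunch_lumis]

# If the pixel measurement says this LS integral is 5% lower than HF's,
# every bunch luminosity is scaled down by the same 5%.
corrected = correct_bunch_lumis([55.2, 54.8], hf_integrated=110.0,
                                pixel_integrated=104.5)
```

Since the correction is per LumiSection, this is the finest granularity at which the pixel information can improve the pileup estimate, as noted above.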

This mode of operation is exactly the same as accounting for the trigger prescales when calculating the pileup information.

MC Studies

Reweighting: Conclusion

This has been a streamlined version of the reweighting, focused on 2012 samples and the techniques that apply there. This should be the most useful thing moving forward, since all future samples will contain the same information as the 2012 ones. Many, many more details are available in the official PileupTwiki pages.
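The core of the reweighting reduces to a ratio of normalized histograms. A minimal sketch, assuming the data and MC "true" pileup histograms share the same binning:

```python
def pileup_weights(data_hist, mc_hist):
    """Per-bin weight = (normalized data) / (normalized MC). An event whose
    true number of interactions falls in bin i gets weight[i]."""
    d_tot, m_tot = float(sum(data_hist)), float(sum(mc_hist))
    return [(d / d_tot) / (m / m_tot) if m > 0 else 0.0
            for d, m in zip(data_hist, mc_hist)]

# Invented two-bin example: data has relatively more high-pileup events
# than the MC, so high-pileup MC events get weight > 1.
weights = pileup_weights([1.0, 3.0], [2.0, 2.0])
print(weights)   # [0.5, 1.5]
```

In practice the data histogram is the one produced by pileupCalc.py and the MC histogram comes from PileupSummaryInfo::getTrueNumInteractions(), as described in the excerpt above.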

For an extra project, you could switch the reweighting target to the Bjet dataset (BjetFile_Pileup.root) and compare the reweighted MC reconstructed vertex distribution with what is observed in the data. This isn't exactly fair, since you are reweighting a pure Drell-Yan dataset to something selected with a Bjet selection, but it might give you some idea of what sort of comparison you can make.

-- MichaelHildreth - 06-Aug-2013

Topic attachments
Cert_190456-191202_8TeV_22Jan2013ReReco_Collisions12_JSON_MuonPhys.txt (txt, 0.8 K, 2013-08-07, MichaelHildreth): Golden Muon JSON file for beginning of 2012
Topic revision: r4 - 2013-08-07 - MichaelHildreth