Generate EDF Curves

Introduction

Introduction from Bob Cousins' email:

I have typically used tests based on the empirical distribution function
(EDF), which in in this case is the integrated number of accumulated
events as a function of accumulated lumi, divided by total number of
events. The "model" to be tested is that this is a straight line from 0
to 1, as integrated lumi goes from 0 to total. The most popular test in
HEP using the EDF is the Kolmogorov-Smirnov test, but others
(Anderson-Darling, etc.) are reputed to be better as omnibus tests 
(http://www.jstor.org/stable/2286009 ).

The general idea is to see if the yield of any variable grows as a constant factor of integrated luminosity as one expects.

Installation

=generateEDF.py= is in FWCore/PythonUtilities/scripts/generateEDF.py). The script is self-contained (only depends on Python, Root, and PyRoot), so one could either check out the tag V01-06-07 of FWCore/PythonUtilities or, alternatively, simply download the script:

wget "http://cmssw.cvs.cern.ch/cgi-bin/cmssw.cgi/CMSSW/FWCore/PythonUtilities/scripts/generateEDF.py" -O generateEDF.py

Simple Time-Ordered Example

Time-ordered simply means data taken earlier is listed first, later data is taken last (i.e. sorted by run number).

Without worrying about any options, one can run the script with three filenames. This script does come with an extensive --help menu.

cplager@cmslpc17> generateEDF.py 142928-143179.csv ZeeCands_142666_143179.txt Zee.png
loading luminosity information from '142928-143179.csv'.
loading events from 'ZeeCands_142666_143179.txt'

  • 142928-143179.csv is luminosity information from lumiCalc.py (more below)

  • ZeeCands_142666_143179.txt is a textfile where each line is run number, luminosity section ID, and event number separated by any combination of commas, spaces, tabs, colons, and semi-colons.

  • Zee.png is the name of the output. If you want to be able to zoom in, etc., use a =.root. extention (e.g., Zee.root).

Zee.png

Dstat is the maximum separation of the two curves. PKS is the resulting probability that the observed data is consistent with the expected curve.

An example with higher statistics looks like this:

wcand_time.png

Both of these examples do not show any problem.

Time-Ordered Example with Weights

the event text file now contains

run, lumi, event, weight

generateEDF.py complete_xing.csv run-luminosityBlock-event-n_taggable_fixed_orig.txt taggable_time.png \
  "--title=EDF of Number of Taggable Tracks" --weights --ignoreNoLumiEvents

  • --weights tells the script to use fourth column

  • --ignoreNoLumiEvents says to ignore events where there is no luminosity information. In general, this should not happen, but if one doesn't use the quite-right good luminosity JSON file, this can happen.

taggable_time.png

You'll notice several kinks in the curve, suggesting problems. See below to see how one can track down exactly when (i.e., which runs and lumi sections) this has happened.

Instantaneous Luminosity-Ordered Example

Most analyses are concerned with verifying if they are affected by "pileup" or not. A very simple way to check is to make an EDF plot, but to order the data from smallest to greatest _Average Crossing Instantaneous Luminosity_ (or AXIL for short) instead of ordering the data from smallest run number to biggest run number.

Why AXIL? AXIL is the quantity that is directly correlated with pileup:

  • Assuming a minimum bias cross section 70.3 mb, we expect 1 pileup event for an AXIL of 0.16 μb-1/s.

To do this, we need a lumi CSV file that has been generated using the lumibylsXing option instead of lumibyls option. Note: Lumi CSV files generated with lumibylsXing option will work to generate both time-ordered and AXIL-ordered EDF plots, so there is no need to generate both types of CSV files.

generateEDF.py lumiInfo.csv Wenu_Cands_Plager.txt wcand_inst.png \
  --ignoreNoLumiEvents --edfMode=instLum --noWarnings

  • --edfMode=instLum tells script to use AXIL ordering

  • --ignoreNoLumiEvents and --noWarnings tells script to skip events where there isn't luminosity information (a small percentage of lumi sections does not have AXIL information).

wcand_inst_S.png

In this case, we can see that the red line lies over the black points well (and the KS test has a reasonable value). But for many who are used to looking at the time-ordered EDF plots, these S shaped plots can look a little weird.

generateEDF.py lumiInfo.csv Wenu_Cands_Plager.txt wcand_inst.png \
   --ignoreNoLumiEvents --edfMode=instIntLum --noWarnings

  • --edfMode=instIntLum tells script to use AXIL ordering, but to use integrated luminosity as the X axis

wcand_instInt.png

Both of these plots (which have exactly the same KS distance and probability by construction) show that W → e ν does not suffer from pileup effects.

To show an extreme example, we use the number of reconstructed verticies as a weight. This is probably as extreme as an example as we are likely to find since number of reconstructed primary verticies is what many people use to measure pileup.

generateEDF.py lumiInfo.csv Wenu_Cands_Plager.txt wcand_inst_S_weights.png \
  --ignore --edfMode=instLum --noWarnings --weights

wcand_instInt_Weights.png

generateEDF.py lumiInfo.csv Wenu_Cands_Plager.txt wcand_inst_weights.png \
  --ignore --edfMode=instIntLum --noWarnings --weights

wcand_inst_Weights.png

Both these plots show a KS probability of 0. One might have expected a larger discrepency, but at the maximum AXIL we have so far, we only expect, on average, 1.4 pileup events.

More options

Printing Run, Lumi Information Corresponding to Integrated Luminosity

In the example above, we can see an elbow around 0.3/pb. Rerunning the above command, except saving it as taggable_time.root instead of taggable_time.png, one can open the canvas in Root and zoom in:

zoomed.png

O.k. So it looks like it happens around .257/pb or so. What run does that correspond to?

cplager@cmslpc17> generateEDF.py complete_xing.csv --runsWithLumis=0.25,0.2525,0.255,0.2575,0.26
(140385, 164) contains total recorded lumi 0.250000
(140387, 65) contains total recorded lumi 0.252500
(140401, 47) contains total recorded lumi 0.255000
(140401, 197) contains total recorded lumi 0.257500
(141956, 225) contains total recorded lumi 0.260000

At this point, it looks like something changed around 140401 or 141956. Knowing that several changes were made to the trigger table at 141956 (e.g., changing the primary dataset of the zero bias trigger) suggests that a trigger change could be the culprit.

Overlay Additional Prediction

If you want to overlay another prediction (e.g., say that the theory cross section is 5% higher than the observed number of events suggests),

generateEDF.py 142928-143179.csv ZeeCands_142666_143179.txt Zee_other.png \
--predicted=1.05 --predLabel="Theory Expectation"

Zee_other.png

Note that for the KS test, it is always the observed value that is used (i.e., the probabilities are the same on the two above plots).

Now let's use the script to work through an example Ken Bloom and I just went through.

generateEDF.py complete_xing.csv run-event-luminosityBlock.txt tagged.png \
  --runEventLumi "--title=EDF of Muon-Tagged Jets"

(--runEventLumi means read the events as run, event, lumi instead of run, lumi, event).


In this case, what we see is not consistent with the expected curve.

In trying to see is happening, we looked at one of the inputs to tagged jets, the number of taggable tracks. In this case, we don't want to only count number of events, but give each event a weight (in this case, the number of taggable tracks per event). Instead of listing:

run, lumi, event

Generate Lumi CSV Files

For full documentation, please visit Lumi Calculation Twiki.

To generate a CSV file without AXIL information:

lumiCalc.py -i Cert_142928-143179.txt -o lumiInfo.csv lumibyls

To generate a CSV file with AXIL information:

lumiCalc.py -i Cert_142928-143179.txt -o lumiInfo_xing.csv lumibylsXing

Note that the latter file lumiInfo_xing.csv can be used for both time-ordered and AXIL ordered EDF plots.

-- CharlesPlager - 14-Sep-2010

Topic attachments
I Attachment History Action Size DateSorted ascending Who Comment
Unknown file formatcsv 142928-143179.csv r1 manage 153.6 K 2010-09-14 - 20:18 CharlesPlager Lumi CSV File
PNGpng Zee.png r1 manage 14.7 K 2010-09-14 - 20:20 CharlesPlager EDF Curve
Texttxt ZeeCands_142666_143179.txt r1 manage 24.6 K 2010-09-14 - 20:18 CharlesPlager List of Z to ee Candidates
PNGpng Zee_other.png r1 manage 15.1 K 2010-09-14 - 20:20 CharlesPlager EDF Curve
PNGpng taggable_time.png r1 manage 15.5 K 2010-09-20 - 03:23 CharlesPlager EDF
PNGpng wcand_instInt.png r1 manage 18.0 K 2010-09-20 - 02:44 CharlesPlager EDF
PNGpng wcand_instInt_Weights.png r1 manage 18.2 K 2010-09-20 - 02:45 CharlesPlager EDF
PNGpng wcand_inst_S.png r1 manage 14.6 K 2010-09-20 - 02:44 CharlesPlager EDF
PNGpng wcand_inst_Weights.png r1 manage 18.6 K 2010-09-20 - 02:46 CharlesPlager EDF
PNGpng wcand_time.png r1 manage 17.1 K 2010-09-20 - 03:32 CharlesPlager EDF
PNGpng zoomed.png r1 manage 17.7 K 2010-09-20 - 03:23 CharlesPlager EDF

This topic: CMSPublic > SWGuideGenerateEDF
Topic revision: r4 - 2010-09-20 - CharlesPlager
 
This site is powered by the TWiki collaboration platform Powered by PerlCopyright & 2008-2019 by the contributing authors. All material on this collaboration platform is the property of the contributing authors.
Ideas, requests, problems regarding TWiki? Send feedback