Generate EDF Curves

Introduction

Introduction from Bob Cousins' email:

```I have typically used tests based on the empirical distribution function
(EDF), which in in this case is the integrated number of accumulated
events as a function of accumulated lumi, divided by total number of
events. The "model" to be tested is that this is a straight line from 0
to 1, as integrated lumi goes from 0 to total. The most popular test in
HEP using the EDF is the Kolmogorov-Smirnov test, but others
(Anderson-Darling, etc.) are reputed to be better as omnibus tests
(http://www.jstor.org/stable/2286009 ).
```

The general idea is to see if the yield of any variable grows as a constant factor of integrated luminosity as one expects.

Installation

=generateEDF.py= is in `FWCore/PythonUtilities/scripts/generateEDF.py`). The script is self-contained (only depends on Python, Root, and PyRoot), so one could either check out the tag `V01-06-07` of `FWCore/PythonUtilities` or, alternatively, simply download the script:

```wget "http://cmssw.cvs.cern.ch/cgi-bin/cmssw.cgi/CMSSW/FWCore/PythonUtilities/scripts/generateEDF.py" -O generateEDF.py
```

Simple Time-Ordered Example

Time-ordered simply means data taken earlier is listed first, later data is taken last (i.e. sorted by run number).

Without worrying about any options, one can run the script with three filenames. This script does come with an extensive `--help` menu.

```cplager@cmslpc17> generateEDF.py 142928-143179.csv ZeeCands_142666_143179.txt Zee.png
```

• `142928-143179.csv` is luminosity information from `lumiCalc.py` (more below)

• `ZeeCands_142666_143179.txt` is a textfile where each line is run number, luminosity section ID, and event number separated by any combination of commas, spaces, tabs, colons, and semi-colons.

• `Zee.png` is the name of the output. If you want to be able to zoom in, etc., use a =.root. extention (e.g., Zee.root).

Dstat is the maximum separation of the two curves. PKS is the resulting probability that the observed data is consistent with the expected curve.

An example with higher statistics looks like this:

Both of these examples do not show any problem.

Time-Ordered Example with Weights

the event text file now contains

run, lumi, event, weight

```generateEDF.py complete_xing.csv run-luminosityBlock-event-n_taggable_fixed_orig.txt taggable_time.png \
"--title=EDF of Number of Taggable Tracks" --weights --ignoreNoLumiEvents
```

• `--weights` tells the script to use fourth column

• `--ignoreNoLumiEvents` says to ignore events where there is no luminosity information. In general, this should not happen, but if one doesn't use the quite-right good luminosity JSON file, this can happen.

You'll notice several kinks in the curve, suggesting problems. See below to see how one can track down exactly when (i.e., which runs and lumi sections) this has happened.

Instantaneous Luminosity-Ordered Example

Most analyses are concerned with verifying if they are affected by "pileup" or not. A very simple way to check is to make an EDF plot, but to order the data from smallest to greatest _Average Crossing Instantaneous Luminosity_ (or AXIL for short) instead of ordering the data from smallest run number to biggest run number.

Why AXIL? AXIL is the quantity that is directly correlated with pileup:

• Assuming a minimum bias cross section 70.3 mb, we expect 1 pileup event for an AXIL of 0.16 μb-1/s.

To do this, we need a lumi CSV file that has been generated using the `lumibylsXing` option instead of `lumibyls` option. Note: Lumi CSV files generated with `lumibylsXing` option will work to generate both time-ordered and AXIL-ordered EDF plots, so there is no need to generate both types of CSV files.

```generateEDF.py lumiInfo.csv Wenu_Cands_Plager.txt wcand_inst.png \
--ignoreNoLumiEvents --edfMode=instLum --noWarnings
```

• `--edfMode=instLum` tells script to use AXIL ordering

• `--ignoreNoLumiEvents` and `--noWarnings` tells script to skip events where there isn't luminosity information (a small percentage of lumi sections does not have AXIL information).

In this case, we can see that the red line lies over the black points well (and the KS test has a reasonable value). But for many who are used to looking at the time-ordered EDF plots, these S shaped plots can look a little weird.

```generateEDF.py lumiInfo.csv Wenu_Cands_Plager.txt wcand_inst.png \
--ignoreNoLumiEvents --edfMode=instIntLum --noWarnings
```

• `--edfMode=instIntLum` tells script to use AXIL ordering, but to use integrated luminosity as the X axis

Both of these plots (which have exactly the same KS distance and probability by construction) show that W → e ν does not suffer from pileup effects.

To show an extreme example, we use the number of reconstructed verticies as a weight. This is probably as extreme as an example as we are likely to find since number of reconstructed primary verticies is what many people use to measure pileup.

```generateEDF.py lumiInfo.csv Wenu_Cands_Plager.txt wcand_inst_S_weights.png \
--ignore --edfMode=instLum --noWarnings --weights
```

```generateEDF.py lumiInfo.csv Wenu_Cands_Plager.txt wcand_inst_weights.png \
--ignore --edfMode=instIntLum --noWarnings --weights
```

Both these plots show a KS probability of 0. One might have expected a larger discrepency, but at the maximum AXIL we have so far, we only expect, on average, 1.4 pileup events.

More options

Printing Run, Lumi Information Corresponding to Integrated Luminosity

In the example above, we can see an elbow around 0.3/pb. Rerunning the above command, except saving it as `taggable_time.root` instead of `taggable_time.png`, one can open the canvas in Root and zoom in:

O.k. So it looks like it happens around .257/pb or so. What run does that correspond to?

```cplager@cmslpc17> generateEDF.py complete_xing.csv --runsWithLumis=0.25,0.2525,0.255,0.2575,0.26
(140385, 164) contains total recorded lumi 0.250000
(140387, 65) contains total recorded lumi 0.252500
(140401, 47) contains total recorded lumi 0.255000
(140401, 197) contains total recorded lumi 0.257500
(141956, 225) contains total recorded lumi 0.260000

```

At this point, it looks like something changed around 140401 or 141956. Knowing that several changes were made to the trigger table at 141956 (e.g., changing the primary dataset of the zero bias trigger) suggests that a trigger change could be the culprit.

If you want to overlay another prediction (e.g., say that the theory cross section is 5% higher than the observed number of events suggests),

```generateEDF.py 142928-143179.csv ZeeCands_142666_143179.txt Zee_other.png \
--predicted=1.05 --predLabel="Theory Expectation"
```

Note that for the KS test, it is always the observed value that is used (i.e., the probabilities are the same on the two above plots).

Now let's use the script to work through an example Ken Bloom and I just went through.

```generateEDF.py complete_xing.csv run-event-luminosityBlock.txt tagged.png \
--runEventLumi "--title=EDF of Muon-Tagged Jets"
```

(`--runEventLumi` means read the events as run, event, lumi instead of run, lumi, event).

In this case, what we see is not consistent with the expected curve.

In trying to see is happening, we looked at one of the inputs to tagged jets, the number of taggable tracks. In this case, we don't want to only count number of events, but give each event a weight (in this case, the number of taggable tracks per event). Instead of listing:

run, lumi, event

Generate Lumi CSV Files

For full documentation, please visit Lumi Calculation Twiki.

To generate a CSV file without AXIL information:

```lumiCalc.py -i Cert_142928-143179.txt -o lumiInfo.csv lumibyls
```

To generate a CSV file with AXIL information:

```lumiCalc.py -i Cert_142928-143179.txt -o lumiInfo_xing.csv lumibylsXing
```

Note that the latter file `lumiInfo_xing.csv` can be used for both time-ordered and AXIL ordered EDF plots.

-- CharlesPlager - 14-Sep-2010

Topic attachments
I Attachment History Action Size Date Who Comment
csv 142928-143179.csv r1 manage 153.6 K 2010-09-14 - 20:18 CharlesPlager Lumi CSV File
png Zee.png r1 manage 14.7 K 2010-09-14 - 20:20 CharlesPlager EDF Curve
txt ZeeCands_142666_143179.txt r1 manage 24.6 K 2010-09-14 - 20:18 CharlesPlager List of Z to ee Candidates
png Zee_other.png r1 manage 15.1 K 2010-09-14 - 20:20 CharlesPlager EDF Curve
png taggable_time.png r1 manage 15.5 K 2010-09-20 - 03:23 CharlesPlager EDF
png wcand_instInt.png r1 manage 18.0 K 2010-09-20 - 02:44 CharlesPlager EDF
png wcand_instInt_Weights.png r1 manage 18.2 K 2010-09-20 - 02:45 CharlesPlager EDF
png wcand_inst_S.png r1 manage 14.6 K 2010-09-20 - 02:44 CharlesPlager EDF
png wcand_inst_Weights.png r1 manage 18.6 K 2010-09-20 - 02:46 CharlesPlager EDF
png wcand_time.png r1 manage 17.1 K 2010-09-20 - 03:32 CharlesPlager EDF
png zoomed.png r1 manage 17.7 K 2010-09-20 - 03:23 CharlesPlager EDF
Edit | Watch | Print version | History: r4 < r3 < r2 < r1 | Backlinks | Raw View | More topic actions
Topic revision: r4 - 2010-09-20 - CharlesPlager

Create a LeftBar

 Cern Search TWiki Search Google Search CMSPublic All webs
Copyright &© 2008-2023 by the contributing authors. All material on this collaboration platform is the property of the contributing authors.
or Ideas, requests, problems regarding TWiki? use Discourse or Send feedback