
# Generate EDF Curves

## Introduction

Introduction from Bob Cousins' email:

```I have typically used tests based on the empirical distribution function
(EDF), which in this case is the integrated number of accumulated
events as a function of accumulated lumi, divided by total number of
events. The "model" to be tested is that this is a straight line from 0
to 1, as integrated lumi goes from 0 to total. The most popular test in
HEP using the EDF is the Kolmogorov-Smirnov test, but others
(Anderson-Darling, etc.) are reputed to be better as omnibus tests
(http://www.jstor.org/stable/2286009 ).
```

The general idea is to check whether the yield of any quantity grows in proportion to the integrated luminosity, as one expects.
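As a minimal sketch of the idea (not the code in `generateEDF.py`), the EDF can be built from the accumulated luminosity at which each event was recorded. The function name `build_edf` and its inputs are hypothetical and the numbers are made up for illustration.

```python
def build_edf(event_lumis, total_lumi):
    """Return the EDF as a list of (lumi fraction, event fraction) points.

    event_lumis: accumulated integrated luminosity at which each event
                 was recorded (hypothetical input, any consistent unit)
    total_lumi:  total integrated luminosity of the sample
    """
    xs = sorted(event_lumis)
    n = len(xs)
    # After the i-th event (1-based) the EDF has risen to i / n.
    return [(x / total_lumi, (i + 1) / n) for i, x in enumerate(xs)]


# Toy input: four events spread evenly over 4.0 units of luminosity.
points = build_edf([0.5, 1.5, 2.5, 3.5], total_lumi=4.0)
for lumi_fraction, event_fraction in points:
    print(lumi_fraction, event_fraction)
```

If the yield really is proportional to luminosity, these points scatter around the straight line from (0, 0) to (1, 1).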

## Installation

`generateEDF.py` is in `FWCore/PythonUtilities/scripts/generateEDF.py`. The script is self-contained (it depends only on Python, ROOT, and PyROOT), so one can either check out the tag `V01-06-07` of `FWCore/PythonUtilities` or, alternatively, simply download the script:

```wget "http://cmssw.cvs.cern.ch/cgi-bin/cmssw.cgi/CMSSW/FWCore/PythonUtilities/scripts/generateEDF.py" -O generateEDF.py
```

## Simple Example

Without worrying about any options, one can run the script with three filenames. This script does come with an extensive `--help` menu.

```cplager@cmslpc17> generateEDF.py 142928-143179.csv ZeeCands_142666_143179.txt Zee.png

```

• `142928-143179.csv` is luminosity information from `lumiCalc.py` (more below)

• `ZeeCands_142666_143179.txt` is a text file where each line contains the run number, luminosity section ID, and event number, separated by any combination of commas, spaces, tabs, colons, and semicolons.

• `Zee.png` is the name of the output plot.
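The event-list format described above is easy to read with a single regular expression. The sketch below is an assumption about how one might parse such a file, not the parser used inside `generateEDF.py`. The helper name `parse_event_line` and the sample lines are hypothetical.

```python
import re

# One or more of: comma, whitespace (spaces or tabs), colon, semicolon.
_SEPARATORS = re.compile(r"[,\s:;]+")


def parse_event_line(line):
    """Split one line into (run, lumi_section, event) integers."""
    fields = [f for f in _SEPARATORS.split(line.strip()) if f]
    # Raises ValueError if the line has fewer than three numeric fields.
    run, lumi_section, event = (int(f) for f in fields[:3])
    return run, lumi_section, event


# Made-up lines using different separator combinations.
examples = ["142928, 12, 345678", "142928:12:345678", "142928\t12 ;345678"]
parsed = [parse_event_line(s) for s in examples]
print(parsed)
```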

Dstat is the maximum separation of the two curves. PKS is the resulting probability that the observed data is consistent with the expected curve.
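To make the two numbers concrete, here is a minimal sketch of a one-sample Kolmogorov-Smirnov test against the straight-line model. It uses the standard asymptotic Kolmogorov series. Whether `generateEDF.py` computes PKS in exactly this way is an assumption, and the function names are hypothetical.

```python
import math


def ks_distance(fractions):
    """Maximum distance between the EDF of `fractions` and the line y = x.

    fractions: for each event, accumulated lumi / total lumi, in [0, 1].
    """
    xs = sorted(fractions)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        # The EDF jumps from i/n to (i+1)/n at x; check both sides of the step.
        d = max(d, abs((i + 1) / n - x), abs(x - i / n))
    return d


def kolmogorov_prob(d, n, terms=100):
    """Asymptotic probability Q(z) = 2 * sum_j (-1)^(j-1) exp(-2 j^2 z^2),
    with z = sqrt(n) * d."""
    z = math.sqrt(n) * d
    if z < 0.2:
        return 1.0  # Q is indistinguishable from 1 this close to zero
    total = sum((-1) ** (j - 1) * math.exp(-2.0 * j * j * z * z)
                for j in range(1, terms + 1))
    return min(1.0, max(0.0, 2.0 * total))


# Toy input: five events spread evenly in luminosity.
toy = [0.1, 0.3, 0.5, 0.7, 0.9]
d_stat = ks_distance(toy)
p_ks = kolmogorov_prob(d_stat, len(toy))
print(d_stat, p_ks)
```

A small probability means the observed curve is unlikely under the hypothesis of a constant yield per unit luminosity.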

## More options

If you want to overlay another prediction (e.g., a theory cross section that is 5% higher than the observed number of events suggests), pass `--predicted` and `--predLabel`:

```generateEDF.py 142928-143179.csv ZeeCands_142666_143179.txt Zee_other.png \
--predicted=1.05 --predLabel="Theory Expectation"
```

Note that the KS test always uses the observed value (i.e., the probabilities are the same on the two plots above).

Now let's use the script to work through an example Ken Bloom and I just went through.

```generateEDF.py complete_xing.csv run-event-luminosityBlock.txt tagged.png \
--runEventLumi "--title=EDF of Muon-Tagged Jets"
```

(`--runEventLumi` means read the events as run, event, lumi instead of run, lumi, event).

In this case, what we see is not consistent with the expected curve.

In trying to see what is happening, we looked at one of the inputs to tagged jets, the number of taggable tracks. In this case, we don't want to count only the number of events, but to give each event a weight (here, the number of taggable tracks per event). Instead of listing:

run, lumi, event

the event text file now contains

run, lumi, event, weight

```generateEDF.py complete_xing.csv run-luminosityBlock-event-n_taggable_fixed_orig.txt taggable_time.png \
"--title=EDF of Number of Taggable Tracks" --weights --ignoreNoLumiEvents
```

• `--weights` tells the script to use the fourth column as the event weight.

• `--ignoreNoLumiEvents` says to ignore events for which there is no luminosity information. In general, this should not happen, but it can if one uses a good-luminosity JSON file that is not quite right.
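The effect of these two options can be sketched as follows. This is an illustration under stated assumptions, not the script's implementation: the function `weighted_edf`, the `lumi_at` lookup, and the toy numbers are all hypothetical.

```python
def weighted_edf(events, lumi_at, ignore_no_lumi=False):
    """Build a weighted EDF.

    events:  iterable of (run, lumi_section, event, weight)
    lumi_at: dict mapping (run, lumi_section) -> accumulated luminosity
             (hypothetical lookup built from the luminosity CSV file)
    Returns a list of (accumulated_lumi, cumulative weight fraction).
    """
    rows = []
    for run, lumi_section, _event, weight in events:
        key = (run, lumi_section)
        if key not in lumi_at:
            if ignore_no_lumi:
                continue  # drop events with no luminosity information
            raise KeyError("no luminosity information for %s" % (key,))
        rows.append((lumi_at[key], weight))
    if not rows:
        return []
    rows.sort()
    total = sum(w for _x, w in rows)
    edf, running = [], 0.0
    for x, w in rows:
        running += w
        edf.append((x, running / total))
    return edf


# Toy input: the third event sits in a section with no luminosity entry.
lumi_at = {(1, 1): 0.1, (1, 2): 0.2}
events = [(1, 1, 10, 3.0), (1, 2, 11, 1.0), (2, 5, 12, 4.0)]
edf = weighted_edf(events, lumi_at, ignore_no_lumi=True)
print(edf)
```

Without `ignore_no_lumi=True`, this sketch raises an error on the third event, which mirrors why one would want a switch like `--ignoreNoLumiEvents`.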

In this case, we can see an elbow around 0.3/pb. Rerunning the above command, but saving the output as `taggable_time.root` instead of `taggable_time.png`, one can open the canvas in ROOT and zoom in.

OK, so it looks like the change happens around 0.257/pb. Which run does that correspond to?

```cplager@cmslpc17> generateEDF.py complete_xing.csv --runsWithLumis=0.25,0.2525,0.255,0.2575,0.26
(140385, 164) contains total recorded lumi 0.250000
(140387, 65) contains total recorded lumi 0.252500
(140401, 47) contains total recorded lumi 0.255000
(140401, 197) contains total recorded lumi 0.257500
(141956, 225) contains total recorded lumi 0.260000

```

At this point, it looks like something changed around run 140401 or run 141956. Knowing that several changes were made to the trigger table at run 141956 (e.g., changing the primary dataset of the zero bias trigger) suggests that a trigger change could be the culprit.
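The lookup from accumulated luminosity back to a (run, luminosity section) pair can be sketched with a running sum and a binary search. This is only a guess at the logic behind `--runsWithLumis`. The function name `sections_containing` and the toy numbers are hypothetical.

```python
import bisect
import itertools


def sections_containing(lumi_sections, targets):
    """Find the section in which the running total reaches each target.

    lumi_sections: time-ordered list of ((run, lumi_section), recorded_lumi)
    targets:       accumulated luminosity values to look up
    Returns one (run, lumi_section) per target, or None if the target
    exceeds the total recorded luminosity.
    """
    keys = [key for key, _lumi in lumi_sections]
    running = list(itertools.accumulate(lumi for _key, lumi in lumi_sections))
    found = []
    for target in targets:
        # First section whose running total is >= target.
        idx = bisect.bisect_left(running, target)
        found.append(keys[idx] if idx < len(keys) else None)
    return found


# Toy input: three luminosity sections across two runs.
sections = [((100, 1), 0.1), ((100, 2), 0.2), ((101, 1), 0.3)]
result = sections_containing(sections, [0.05, 0.25, 0.5, 1.0])
print(result)
```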

-- CharlesPlager - 14-Sep-2010

Topic attachments

| Attachment | Version | Size | Date | Who | Comment |
|---|---|---|---|---|---|
| 142928-143179.csv | r1 | 153.6 K | 2010-09-14 20:18 | CharlesPlager | Lumi CSV File |
| Zee.png | r1 | 14.7 K | 2010-09-14 20:20 | CharlesPlager | EDF Curve |
| Zee_other.png | r1 | 15.1 K | 2010-09-14 20:20 | CharlesPlager | EDF Curve |
| ZeeCands_142666_143179.txt | r1 | 24.6 K | 2010-09-14 20:18 | CharlesPlager | List of Z to ee Candidates |
Topic revision: r3 - 2010-09-15 - CharlesPlager
