
This page is under construction.

Generate EDF Curves

Introduction

Introduction from Bob Cousins' email:

I have typically used tests based on the empirical distribution function
(EDF), which in this case is the integrated number of accumulated
events as a function of accumulated lumi, divided by total number of
events. The "model" to be tested is that this is a straight line from 0
to 1, as integrated lumi goes from 0 to total. The most popular test in
HEP using the EDF is the Kolmogorov-Smirnov test, but others
(Anderson-Darling, etc.) are reputed to be better as omnibus tests
(http://www.jstor.org/stable/2286009).

The general idea is to check whether the yield of a given quantity grows in proportion to the integrated luminosity, as one expects.
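The test Bob describes can be sketched in a few lines of Python. This is a minimal illustration, not the code in generateEDF.py: it assumes the lumi sections are supplied in time order with their recorded luminosity, and it places each event at the luminosity fraction reached by the end of its lumi section. The function names are hypothetical.

```python
from typing import Dict, Sequence, Tuple

LumiKey = Tuple[int, int]  # (run, lumi section)


def cumulative_lumi_fractions(
    sections: Sequence[Tuple[LumiKey, float]],
) -> Dict[LumiKey, float]:
    """Map each (run, lumi section) to the fraction of total luminosity
    accumulated by the end of that section (sections must be in time order)."""
    total = sum(lumi for _, lumi in sections)
    fractions = {}
    running = 0.0
    for key, lumi in sections:
        running += lumi
        fractions[key] = running / total
    return fractions


def ks_distance(xs: Sequence[float]) -> float:
    """Largest vertical gap between the observed EDF of the points xs
    (each a luminosity fraction in [0, 1]) and the straight line y = x."""
    ordered = sorted(xs)
    n = len(ordered)
    d = 0.0
    for i, x in enumerate(ordered, start=1):
        # The EDF steps from (i - 1)/n up to i/n at x: check both sides of the step.
        d = max(d, i / n - x, x - (i - 1) / n)
    return d
```

With thousands of short lumi sections, the choice of placing events at the end (rather than the middle) of their section has a negligible effect on the distance.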

Installation

The script is FWCore/PythonUtilities/scripts/generateEDF.py. It is self-contained (it depends only on Python, ROOT, and PyROOT), so one can either check out tag V01-06-07 of FWCore/PythonUtilities or simply download the script:

wget "http://cmssw.cvs.cern.ch/cgi-bin/cmssw.cgi/CMSSW/FWCore/PythonUtilities/scripts/generateEDF.py" -O generateEDF.py

Simple Example

Without worrying about any options, one can run the script with just three filenames: the luminosity file, the event list, and the output file. The script also comes with an extensive --help menu.

cplager@cmslpc17> generateEDF.py 142928-143179.csv ZeeCands_142666_143179.txt Zee.png
loading luminosity information from '142928-143179.csv'.
loading events from 'ZeeCands_142666_143179.txt'

  • 142928-143179.csv is the luminosity information from lumiCalc.py (more below).

  • ZeeCands_142666_143179.txt is a text file in which each line contains the run number, luminosity section, and event number, separated by any combination of commas, spaces, tabs, colons, and semicolons.

  • Zee.png is the name of the output file:

[Figure: Zee.png, EDF curve]
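A reader for the event-file format described above might look like the following minimal sketch. It is a hypothetical helper, not generateEDF.py's own parser, and it assumes that blank lines are skipped and that any columns after the event number can be ignored.

```python
import re
from typing import List, Tuple

# One or more commas, whitespace characters (spaces, tabs), colons, or semicolons.
_SEPARATORS = re.compile(r"[,\s:;]+")


def parse_event_line(line: str) -> Tuple[int, int, int]:
    """Split one 'run, lumi, event' line into three integers."""
    fields = [f for f in _SEPARATORS.split(line.strip()) if f]
    if len(fields) < 3:
        raise ValueError(f"expected run, lumi, event in {line!r}")
    run, lumi, event = (int(f) for f in fields[:3])
    return run, lumi, event


def read_events(path: str) -> List[Tuple[int, int, int]]:
    """Read every non-blank line of an event file."""
    events = []
    with open(path) as handle:
        for line in handle:
            if line.strip():
                events.append(parse_event_line(line))
    return events
```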

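In the plot, Dstat is the maximum vertical separation between the observed and expected curves. PKS is the resulting Kolmogorov-Smirnov probability: the probability of seeing a separation at least this large if the data really do follow the expected curve.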
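For an unweighted sample of N events, a KS probability is conventionally obtained from the asymptotic Kolmogorov distribution evaluated at z = sqrt(N) * Dstat. The sketch below implements that standard formula in plain Python; that generateEDF.py computes PKS in exactly this way is an assumption, and the helper names are hypothetical. Two equivalent series are used because each converges quickly in a different range of z.

```python
import math


def kolmogorov_prob(z: float) -> float:
    """Asymptotic probability that sqrt(N) * D exceeds z when the data
    follow the expected curve (survival function of the Kolmogorov distribution)."""
    if z < 0.1:
        # 1 - Q(z) is below 1e-50 here, so Q(z) is 1 to double precision.
        return 1.0
    if z < 1.18:
        # Small-z series: CDF(z) = sqrt(2 pi)/z * sum_k exp(-(2k-1)^2 pi^2 / (8 z^2)).
        y = math.pi ** 2 / (8.0 * z * z)
        cdf = math.sqrt(2.0 * math.pi) / z * sum(
            math.exp(-((2 * k - 1) ** 2) * y) for k in range(1, 6)
        )
        return 1.0 - cdf
    # Large-z series: Q(z) = 2 * sum_j (-1)^(j-1) exp(-2 j^2 z^2).
    return 2.0 * sum(
        (-1) ** (j - 1) * math.exp(-2.0 * j * j * z * z) for j in range(1, 6)
    )


def ks_probability(d_stat: float, n_events: int) -> float:
    """PKS-style probability for an unweighted sample of n_events."""
    return kolmogorov_prob(math.sqrt(n_events) * d_stat)
```

For example, a separation of Dstat = 0.1 with 100 events gives z = 1.0 and a probability of about 0.27.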

More options

If you want to overlay another prediction (e.g., if the theory cross section is 5% higher than the observed number of events suggests), use --predicted and --predLabel:

generateEDF.py 142928-143179.csv ZeeCands_142666_143179.txt Zee_other.png \
--predicted=1.05 --predLabel="Theory Expectation"

[Figure: Zee_other.png, EDF curve with the theory expectation overlaid]

Note that the KS test always uses the observed value, not the overlaid prediction, so the probabilities on the two plots above are the same.

Now let's use the script to work through an example Ken Bloom and I just went through.

generateEDF.py complete_xing.csv run-event-luminosityBlock.txt tagged.png \
  --runEventLumi "--title=EDF of Muon-Tagged Jets"

(--runEventLumi tells the script to read each event line as run, event, lumi instead of the default run, lumi, event.)
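The effect of --runEventLumi on a single line can be pictured with this hypothetical helper (a sketch, not the script's code), which takes the first three already-split fields of a line and returns them in the default run, lumi, event order:

```python
from typing import Sequence, Tuple


def to_run_lumi_event(
    fields: Sequence[str], run_event_lumi: bool = False
) -> Tuple[int, int, int]:
    """Return (run, lumi, event) from the first three fields of a line.

    With run_event_lumi=True the fields are read as run, event, lumi
    (the order --runEventLumi selects); otherwise as run, lumi, event.
    """
    run, second, third = (int(f) for f in fields[:3])
    if run_event_lumi:
        return run, third, second
    return run, second, third
```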


In this case, what we see is not consistent with the expected curve.

To see what is happening, we looked at one of the inputs to tagged jets, the number of taggable tracks. Here we don't want to simply count events but to give each event a weight (in this case, its number of taggable tracks). Instead of listing:

run, lumi, event

the event text file now contains

run, lumi, event, weight

generateEDF.py complete_xing.csv run-luminosityBlock-event-n_taggable_fixed_orig.txt taggable_time.png \
  "--title=EDF of Number of Taggable Tracks" --weights --ignoreNoLumiEvents

  • --weights tells the script to use the fourth column as the event weight.

  • --ignoreNoLumiEvents tells the script to ignore events for which there is no luminosity information. In general this should not happen, but it can if one does not use exactly the right good-luminosity JSON file.
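A weighted EDF with --ignoreNoLumiEvents-style handling could be sketched as follows. This is an illustration under assumptions, not generateEDF.py's implementation: events are (run, lumi, event, weight) tuples, lumi_fraction is a hypothetical precomputed map from (run, lumi) to the fraction of total luminosity, and an event without luminosity information either raises an error or is skipped.

```python
from typing import Dict, List, Sequence, Tuple

LumiKey = Tuple[int, int]  # (run, lumi section)


def weighted_edf(
    events: Sequence[Tuple[int, int, int, float]],
    lumi_fraction: Dict[LumiKey, float],
    ignore_no_lumi: bool = False,
) -> List[Tuple[float, float]]:
    """Return (luminosity fraction, cumulative weight fraction) points."""
    points = []
    for run, lumi, _event, weight in events:
        key = (run, lumi)
        if key not in lumi_fraction:
            if ignore_no_lumi:
                continue  # drop events that have no luminosity information
            raise KeyError(f"no luminosity information for run {run}, lumi {lumi}")
        points.append((lumi_fraction[key], weight))
    points.sort()  # order events by accumulated luminosity
    total = sum(w for _, w in points)
    edf = []
    running = 0.0
    for x, w in points:
        running += w  # each event contributes its weight instead of 1
        edf.append((x, running / total))
    return edf
```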


Here we can see an elbow around 0.3/pb. If we rerun the above command with taggable_time.root as the output instead of taggable_time.png, we can open the canvas in ROOT and zoom in:


OK, so the change appears to happen around 0.257/pb. Which run does that correspond to?

cplager@cmslpc17> generateEDF.py complete_xing.csv --runsWithLumis=0.25,0.2525,0.255,0.2575,0.26
(140385, 164) contains total recorded lumi 0.250000
(140387, 65) contains total recorded lumi 0.252500
(140401, 47) contains total recorded lumi 0.255000
(140401, 197) contains total recorded lumi 0.257500
(141956, 225) contains total recorded lumi 0.260000
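The --runsWithLumis lookup amounts to finding, for each target, the first lumi section at which the running sum of recorded luminosity reaches that target. Below is a minimal sketch using a binary search over the cumulative sums; the function names and input format are assumptions, and the luminosities must be in the same units as the targets (here /pb).

```python
import bisect
from itertools import accumulate
from typing import List, Sequence, Tuple

LumiKey = Tuple[int, int]  # (run, lumi section)


def sections_reaching(
    sections: Sequence[Tuple[LumiKey, float]], targets: Sequence[float]
) -> List[Tuple[LumiKey, float]]:
    """For each target, return ((run, lumi), target) for the first lumi section
    at which the running total of recorded luminosity reaches the target.
    `sections` must be in time order with non-negative luminosities."""
    keys = [key for key, _ in sections]
    running = list(accumulate(lumi for _, lumi in sections))
    result = []
    for target in targets:
        # First index whose running total is >= target (totals never decrease).
        index = bisect.bisect_left(running, target)
        if index == len(running):
            raise ValueError(f"total recorded luminosity is below {target}")
        result.append((keys[index], target))
    return result


def report_lines(matches: List[Tuple[LumiKey, float]]) -> List[str]:
    """Format matches in the same style as the output shown above."""
    return [
        f"({run}, {lumi}) contains total recorded lumi {target:f}"
        for (run, lumi), target in matches
    ]
```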

At this point, it looks like something changed around run 140401 or 141956. Since several changes were made to the trigger table at run 141956 (e.g., the primary dataset of the zero-bias trigger was changed), a trigger change could be the culprit.

-- CharlesPlager - 14-Sep-2010

Topic attachments

Attachment | Size | Date | Who | Comment
142928-143179.csv | 153.6 K | 2010-09-14 20:18 | CharlesPlager | Lumi CSV File
Zee.png | 14.7 K | 2010-09-14 20:20 | CharlesPlager | EDF Curve
ZeeCands_142666_143179.txt | 24.6 K | 2010-09-14 20:18 | CharlesPlager | List of Z to ee Candidates
Zee_other.png | 15.1 K | 2010-09-14 20:20 | CharlesPlager | EDF Curve
Topic revision: r3 - 2010-09-15 - CharlesPlager
 