2012 CMSDAS Photon HLT Short Exercise

The purpose of this exercise is to familiarize you a bit with the HLT photon triggers, how they shape the data we analyze and how those effects manifest in the reconstructed photon objects we use offline.

To do this we will employ some simple tools -- a small analyzer that pulls photon and trigger information out of the CMS data and drops it into a simple ROOT tree, and a few macros to make histograms and manipulate them a bit. The analysis philosophy here is a "grab & go" approach: get the data we need and detach ourselves from CMSSW as soon as possible, allowing easier work on a laptop or desktop, or wherever we may have ROOT installed. For this exercise you don't particularly need to know what CMSSW is, but some familiarity with ROOT will be helpful -- we'll try not to do anything you won't find in the ROOT User's Guide.

We're going to attempt two tasks in this exercise:

  • Try to find at which offline transverse momentum a particular single photon trigger becomes fully efficient.
  • Look a bit at the offline effects of introducing isolation requirements on a trigger.

Let's get at it, then:

The "Grab & Go" Part

This part is somewhat optional, since it takes a bit of time and we've done this part for you. You should familiarize yourself with the process though since in real life we will not be doing this for you.

kserver_init (respond to prompt for CERN username and passwd)
scramv1 project CMSSW CMSSW_4_2_8
cd CMSSW_4_2_8/src
cvs co -d CMSDASPhoton/CMSDASTreeMaker UserCode/DMason/CMSDASTreeMaker
scramv1 b

This pulls down the analyzer that produced the ROOT trees we will use here, an example CRAB config, and some macro code. The analyzer is fairly minimalist -- there are certainly better and more sophisticated ways to do this -- and you will find many examples of different variations on this theme throughout the code for the different exercises in CMSDAS. For a nice example of an analyzer used to make ntuples for photon SUSY analyses, take a look at Dongwook Jang's SUSYNtuplizer -- there's a bit of a learning curve to use this guy though, so we're starting simple here.

Brief Walkthrough of Analyzer Code

Wander into the CMSDASPhoton/CMSDASTreeMaker directory that you pulled down from CVS. You'll see several directories:

<cmslpc03.fnal.gov> ls -1

interface houses the .h files for this guy -- there are two: a general one for the analyzer (CMSDASTreeMaker.h), and another (CMSDASTreeMakerBranchVars.h) which defines the branches in the tree that's created. It's good to remember where that one is as a reference for what things are called when you're working with the ntuples. Photons, Vertices, and PfCandidates are collections of vectors of the various associated quantities for those objects.

src houses the actual analyzer code that makes the tree.

By the way -- you're already seeing the words "tree" and "ntuple" used interchangeably. Get used to that.

Let's walk through the analyzer .C code in src -- there are some features there worth understanding a bit. There is the constructor, CMSDASTreeMaker::CMSDASTreeMaker(const edm::ParameterSet& iConfig), where all the input parameters from the config are defined; after that the destructor, which isn't really used here; then beginJob() and beginRun() methods, which are useful for initialization. You'll see in the beginRun() method (which, as you might guess, is executed whenever a new run in the data is encountered) there is some HLT-related code. Since the HLT configuration can change whenever a new run is taken, you need to get the HLT menu here, including finding which triggers are present.

Here we play some games with parsing the names to simplify our tree a bit. CMS trigger paths carry a version-number suffix in the name, which may be incremented to signify "not really fundamental" changes in the trigger like changes in prescale values or level-1 seed. Over the course of the 2011 run some triggers reached as high as "v13", meaning that to query whether one of these fired you must either know ahead of time which "v" was active for which run, or do what we do here: loop over a whole range of 13 trigger names. To simplify things for the exercise we empirically search for the right "v" trigger in the menu during beginRun(), then strip it off, making it a branch variable for the trigger branches in the tree. I.e., you'll find a trigger like HLT_Photon30_CaloIdVL_v8 in the HLT_Photon30_CaloIdVL branch in the tree, with the branch variable version set to 8. There are wildcard parsers out there, but they must be used with care.
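The version-stripping described above boils down to something like this (a standalone C++ sketch, not the actual analyzer code -- the function name is ours):

```cpp
#include <cassert>
#include <cctype>
#include <cstdlib>
#include <string>

// Split a versioned HLT path name like "HLT_Photon30_CaloIdVL_v8" into
// its base name ("HLT_Photon30_CaloIdVL") and version number (8).
// Returns version 0 if no trailing "_v<N>" suffix is found.
int stripHLTVersion(const std::string& fullName, std::string& baseName) {
  std::string::size_type pos = fullName.rfind("_v");
  if (pos != std::string::npos && pos + 2 < fullName.size()) {
    std::string tail = fullName.substr(pos + 2);
    // make sure everything after "_v" really is a number
    bool allDigits = true;
    for (std::string::size_type i = 0; i < tail.size(); ++i)
      if (!isdigit(static_cast<unsigned char>(tail[i]))) { allDigits = false; break; }
    if (allDigits) {
      baseName = fullName.substr(0, pos);
      return atoi(tail.c_str());
    }
  }
  baseName = fullName;  // no version suffix present
  return 0;
}
```

With this in hand the tree needs only one branch per trigger, plus an integer holding the "v" that was actually live for that run.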

This brings us to CMSDASTreeMaker::analyze(const edm::Event& iEvent, const edm::EventSetup& iSetup), which is the meat of the code. It is called for each event of the file you're reading/processing. There is really not a whole lot going on here except a series of sections grabbing the Handle of each EDM collection we want to fetch from the data file, looping over its contents, and doing something with them -- in this case reading the values we want from the various class variables of each object and copying them into our tree. Most analyzers have sections very much like this. You'll find similar code inside what makes PAT tuples.


To make the ntuples used here you run this thing. Included is a sample config, probably configured as last used, with parameters which set thresholds to keep photon objects and PfCandidates, as well as the list of triggers preserved in the tree and required to save an event. You can run this interactively over a small number of files (though there are limits on this in the lpc farm), or to run over large datasets like the ones that make up our 2011 data you'd use CRAB. Within the crabconfigs subdirectory you'll find an example crab config that was used to produce the ntuples. In this case they are brought back to the CRAB working directory -- you might also want to ship them to mass storage like EOS instead. There are some commented out lines in the crab config that would drop the results into /pnfs resilient space.

As with all good cooking shows we've already done this for you. But now we hope you know how we did it.

Using the Ntuples

The ntuples/trees produced from the data live on the LPC:


Together they include all the 2011 data, and consume about 17 GB of space. You can pull them over to your laptop (though note this can take half our allotted two hours), or run on them on the LPC. Wherever you use them, you will now want to get into an environment where you can run ROOT. On the LPC this is most easily done by wandering into your favorite recent CMSSW release area and doing an eval `scramv1 runtime -sh` (or -csh depending on your shell). You don't need to do anything with CMSSW -- this is just an expedient way to get your hands on ROOT on the LPC machines.

We're going to let ROOT make a skeleton analyzer from which to start analyzing this data. It is not the prettiest way to do this -- for a more complete approach take a look at the SUSYNtuplizer referenced above -- but remember, we're aiming here for the easiest thing to get your hands on right away.

So -- once you have your ntuples somewhere you want them to be, and you have access to root, fire it up & load in the ntuples -- you can do this by just typing in commands or a small macro:

TChain *cmsdasTree = new TChain("tuple/Test");
cmsdasTree->Add("/path/to/the/ntuples/*.root");   // placeholder -- point this at wherever you put the ntuple files

std::cout << "loaded " << cmsdasTree->GetEntries() << " events into cmsdasTree" << std::endl;


Execute the above commands or the macro you've put them into in root -- it should tell you you have a considerable number of events loaded into cmsdasTree.

Dutifully following your ROOT User's Guide, do a cmsdasTree->MakeClass("WhateverYouWantToCallYourNtupleAnalyzerCode");

You'll have a skeleton code you can then work from to do these exercises. You'll have made two files, WhateverYouWantToCallYourNtupleAnalyzerCode.h and .C. Within the .C file there is a Loop() method that is a good place to stick the meat of any code you write for the next steps.

To actually execute this you might want to construct another little macro:

.L WhateverYouWantToCallYourNtupleAnalyzerCode.C++O
WhateverYouWantToCallYourNtupleAnalyzerCode k;
k.Loop();

The ++O above is excruciatingly important. You can do just a .L for this guy and something may run, but doing that invokes ROOT's C++ interpreter, which does strange, insanely permissive things. A common rookie mistake is to forget the ++O for a while, mess with your code, eventually see incredibly weird behavior (1+1=3 kinds of things), remember the ++O again, compile, see the flood of errors the interpreter missed, and gape amazed at how the typos you introduced ever possibly worked in the first place. Don't forget the ++O. Better still is to write a real class à la SUSYNtuplizer, but that's beyond the scope of this exercise...

Relative Trigger Efficiency & Where to Set an Offline Cut

The aim of this first sub-exercise is to find a good choice for a photon pT cut in your analysis for a particular trigger. The HLT bases its trigger selections on the raw supercluster energy, while the offline photon reconstruction applies additional corrections. This, plus the fact that the calibrations applied at data-taking time differ from the hopefully better ones available for reconstructed or re-reconstructed data, results in some level of smearing between the pT threshold applied by the trigger and a pT cut you apply offline in your analysis. You can get a good estimate of where to set your offline cut by selecting events triggered by a lower-threshold trigger, and then looking at how often your candidate trigger also fired.

Modify the MakeClass code you've created to book and fill two histograms of leading photon pT: one where you require HLT_Photon50_CaloIdVL to have passed, and one where you in addition require HLT_Photon75_CaloIdVL to have passed. Define a TFile where you write these histograms out and book the histograms ahead of the for loop which runs through all the events, then be sure to do a .Write() after that. Having produced the histograms, in a separate macro load them in and divide the Photon75 guy by the Photon50 guy. You should see something like this:

(insert plot here)

This is called a "trigger turn-on curve". That it is not a sharp step function but rounded is an artifact of the smearing between the quantities used by the HLT to make its cut and the reconstructed photon quantities in your analysis. You can try to correct for this, though that can be complicated and a source of error. What is typically done instead is to find where your trigger is nearly fully efficient -- often above 99% -- so that you don't need to worry tremendously about inefficiencies between the online HLT quantities and the offline analysis ones. An Erf() function is usually used to fit this, and one is provided for you in the macro subdirectory. Fit the Erf() function to this ratio and find where the 75 GeV trigger is 99% efficient. Typically you would set an offline cut at the next nice happy round number above this.
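As a concrete sketch of what the fit gives you (standalone C++; the parameterization is a common choice and the numbers are purely illustrative, not a fit to real data):

```cpp
#include <cassert>
#include <cmath>

// A common turn-on parameterization: efficiency rises from 0 to 1 as an
// error function centered at threshold `mu` with smearing width `sigma`.
double turnOn(double pt, double mu, double sigma) {
  return 0.5 * (1.0 + std::erf((pt - mu) / (std::sqrt(2.0) * sigma)));
}

// Find the pT at which the turn-on first reaches `target` efficiency,
// by simple bisection between lo and hi (turnOn is monotonic in pT).
double ptAtEfficiency(double target, double mu, double sigma,
                      double lo = 0.0, double hi = 500.0) {
  for (int i = 0; i < 100; ++i) {
    double mid = 0.5 * (lo + hi);
    if (turnOn(mid, mu, sigma) < target) lo = mid; else hi = mid;
  }
  return 0.5 * (lo + hi);
}
```

For example, with an assumed threshold mu = 80 GeV and sigma = 3 GeV, the 99% point lands several GeV above the nominal threshold -- which is exactly why you round up to the next comfortable value for your offline cut.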

A couple of notes here. First, this is actually not just a photon exercise but a general trigger task -- this kind of thing is done to find the turn-on curve of pretty much any kind of trigger. Also, this is not a precise measurement of the trigger efficiency -- that is best done with a technique like the CMS official tag & probe, where within an independent sample (or as independent as your statistics allow) you select a "tag" and then measure your trigger efficiency on an unbiased "probe" sample. Last year's exercise covered this technique, and you're encouraged to take a look at it!

Look at Effects of Isolations in Photon Triggers

Here we take two triggers, one with isolation applied and one without, and look at the differences that manifest in the offline quantities governing photon ID -- there is some overlap between this and the next photon short exercise. We'll hold off talking about the subtleties of isolation until then -- here we're interested in getting a feel for what your triggers do to the data.

As you hopefully know at some level by now, and will know more clearly after the next two exercises, photons usually produce narrow showers in the EM calorimeter, and the activity surrounding the crystals in which a prospective photon deposits energy is used to determine whether it was actually a photon or not. Typically, to measure the background from misreconstructed QCD jets in a photon analysis one defines a "fake photon" sample, often by inverting requirements that the energy deposition surrounding the photon be below some threshold. Without justifying the particular requirements here, we're going to define a "photon candidate" sample as one having:

Photon_TrackIsoPtHolDR03[i]+Photon_EcalIsoDR03[i]+Photon_HcalIsoDR03[i] < 6 GeV

And a "fake" photon sample with this quantity inverted -- i.e. >6 GeV.

Book and fill 12 histograms -- one for each of the 3 individual components in the sum above, for each combination of trigger (HLT_Photon90_CaloIdVL_IsoL vs HLT_Photon90_CaloIdVL) and sample (photon or fake photon). You should try weighting these histograms by 1/prescale, then compare the number of events in the photon plots vs the number of events in the fake plots for either the isolated or non-isolated trigger.
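The selection and the 1/prescale weighting inside the event loop might be sketched like this (plain C++; the function and struct names are ours -- in your MakeClass code the isolation sums come from the Photon_* branches and the prescale from the corresponding trigger branch):

```cpp
#include <cassert>
#include <vector>

// Classify a photon as candidate ("photon") or fake by the combined iso sum.
bool isCandidate(double trackIso, double ecalIso, double hcalIso) {
  return (trackIso + ecalIso + hcalIso) < 6.0;  // < 6 GeV: candidate; else fake
}

// Accumulate prescale-weighted yields: each event counts 1/prescale, so
// prescaled triggers are corrected back toward the full-luminosity rate.
struct WeightedCounts {
  double photon;
  double fake;
  WeightedCounts() : photon(0.0), fake(0.0) {}
  void fill(double trackIso, double ecalIso, double hcalIso, int prescale) {
    double w = 1.0 / prescale;
    if (isCandidate(trackIso, ecalIso, hcalIso)) photon += w;
    else fake += w;
  }
};
```

The same weight would go into each of the 12 histogram fills; comparing the photon vs fake yields between the isolated and non-isolated trigger then shows directly how much fake background the online isolation removes.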

-- DavidMason - 04-Jan-2012

Topic revision: r6 - 2012-01-05 - DavidMorse