7.5 Photon Analysis

Complete: 4
Detailed Review status



This page shows some basic instruction in order to put you in the conditions of starting an analysis on Photons.

Getting Started

Access to computing environment

First make sure that you have access to either cmslpc, or lxplus. You will also need a github account to checkout the code. To check if you’re using bash, tcsh, sh, csh, ksh, or zsh. You can request for a change in shell (cmslpc,lxplus account self-service tool).

You might as well get a proxy now as you will need it later.

voms-proxy-init -voms cms --valid 192:00
If you do not have grid access see LcgAccess

Analysis in CMSSW 94x

Set-up CMSSW Release

Below is a recipe for setting up the analysis for the CMSSW94 release:

cmsrel CMSSW_9_4_4
cd CMSSW_9_4_4/src

Create Ntuples using the ggNtuplizer

First we build the ggNtuplizer in the latest CMSSW version it’s available in: https://github.com/cmkuo/ggAnalysis/tree/94X

This step may take some time.

git cms-init 
git cms-merge-topic lsoffi:CMSSW_9_4_0_pre3_TnP 
git cms-merge-topic guitargeek:ElectronID_MVA2017_940pre3 
scram b -j8 
cd $CMSSW_BASE/external/slc6_amd64_gcc630 
git clone https://github.com/lsoffi/RecoEgamma-PhotonIdentification.git data/RecoEgamma/PhotonIdentification/data 
cd data/RecoEgamma/PhotonIdentification/data 
git checkout CMSSW_9_4_0_pre3_TnP 
cd $CMSSW_BASE/external/slc6_amd64_gcc630/ 
git clone https://github.com/lsoffi/RecoEgamma-ElectronIdentification.git data/RecoEgamma/ElectronIdentification/data 
cd data/RecoEgamma/ElectronIdentification/data 
git checkout CMSSW_9_4_0_pre3_TnP 
cd $CMSSW_BASE/src 
git cms-merge-topic cms-egamma:EGM_94X_v1 
cd EgammaAnalysis/ElectronTools/data 
git clone https://github.com/ECALELFS/ScalesSmearings.git 
cd ScalesSmearings/ 
git checkout Run2017_17Nov2017_v1
cd $CMSSW_BASE/src 
git clone https://github.com/cmkuo/HiggsAnalysis.git 
git clone -b 94X https://github.com/cmkuo/ggAnalysis.git 
scram b -j8 

After you've succesfully built the ggNtuplizer, you can see how the analyzer accesses the photon collection in ggNtuplizer_photons.cc. Use your favourite text editor to open the file.

vi ggAnalysis/ggNtuplizer/plugins/ggNtuplizer_photons.cc


emacs -nw ggAnalysis/ggNtuplizer/plugins/ggNtuplizer_photons.cc

  edm::Handle > photonHandle; //Define the photon handle and access information using getByToken
  e.getByToken(photonCollection_, photonHandle);
for (edm::View::const_iterator iPho = photonHandle->begin(); iPho != photonHandle->end(); ++iPho) {
    phoEt_            .push_back(iPho->et());
    phoEta_           .push_back(iPho->eta());
    phoPhi_           .push_back(iPho->phi());

Next, open the configuration file

vi ggAnalysis/ggNtuplizer/test/run_data_94X.py.

This file configures your cmsRun job and creates ntuples. In line 70, you will find the lines that will import the VID framework.You should see:

#####VID framework####################
from PhysicsTools.SelectorUtils.tools.vid_id_tools import *
dataFormat = DataFormat.MiniAOD
switchOnVIDElectronIdProducer(process, dataFormat)
switchOnVIDPhotonIdProducer(process, dataFormat)

These lines import VID and enable the electron and photon IDs. VID (Versioned Identification) is a standardized way to apply particle identification selections. Both electron and photon IDs are generally updated twice a year which usually happens for summer conferences and for Moriond.

 my_phoid_modules = ['RecoEgamma.PhotonIdentification.Identification.cutBasedPhotonID_Fall17_94X_V1_TrueVtx_cff',
#add them to the VID producer
for idmod in my_phoid_modules:

Ntuples from miniAOD

To create Ntuples from miniAOD files, either MC or data, we can run on a sample dataset specified in this the process.source section of the run_mc_94x.py or run_mc_94x.py file. In this example we will run on an MC dataset DYJetsToLL_M-50_TuneCP5_13TeV-amcatnloFXFX-pythia8/MINIAODSIM/94X_mc2017_realistic_v10-v1/00000/005DC030-D3F4-E711-889A-02163E01A62D.root.

process.source = cms.Source("PoolSource",
                            fileNames = cms.untracked.vstring(

Edit the file and set process.maxEvents to run over 10000 events. Execute the process with

cd $CMSSW_BASE/src/ggAnalysis/ggNtuplizer/test
cmsRun run_mc_94X.py

The output should look like this:

	--- egmGsfElectronIDs:cutBasedElectronID-Fall17-94X-V1-loose added to patElectrons
	--- egmGsfElectronIDs:cutBasedElectronID-Fall17-94X-V1-medium added to patElectrons
	--- egmGsfElectronIDs:cutBasedElectronID-Fall17-94X-V1-tight added to patElectrons
	--- egmGsfElectronIDs:cutBasedElectronID-Fall17-94X-V1-veto added to patElectrons
	--- egmGsfElectronIDs:heepElectronID-HEEPV70 added to patElectrons
	--- egmGsfElectronIDs:mvaEleID-Fall17-noIso-V1-wp80 added to patElectrons
	--- egmGsfElectronIDs:mvaEleID-Fall17-noIso-V1-wp90 added to patElectrons
	--- egmGsfElectronIDs:mvaEleID-Fall17-noIso-V1-wpLoose added to patElectrons
	--- egmGsfElectronIDs:mvaEleID-Fall17-iso-V1-wp80 added to patElectrons
	--- egmGsfElectronIDs:mvaEleID-Fall17-iso-V1-wp90 added to patElectrons
	--- egmGsfElectronIDs:mvaEleID-Fall17-iso-V1-wpLoose added to patElectrons
	--- egmPhotonIDs:cutBasedPhotonID-Fall17-94X-V1-loose added to patPhotons
	--- egmPhotonIDs:cutBasedPhotonID-Fall17-94X-V1-medium added to patPhotons
	--- egmPhotonIDs:cutBasedPhotonID-Fall17-94X-V1-tight added to patPhotons
	--- egmPhotonIDs:mvaPhoID-RunIIFall17-v1-wp80 added to patPhotons
	--- egmPhotonIDs:mvaPhoID-RunIIFall17-v1-wp90 added to patPhotons
27-Apr-2018 23:59:25 CDT  Initiating request to open file root://cmsxrootd-site.fnal.gov//store/mc/RunIIFall17MiniAOD/DYJetsToLL_M-50_TuneCP5_13TeV-amcatnloFXFX-pythia8/MINIAODSIM/94X_mc2017_realistic_v10-v1/00000/005DC030-D3F4-E711-889A-02163E01A62D.root
%MSG-w XrdAdaptor:  file_open 27-Apr-2018 23:59:29 CDT pre-events
Data is served from T2_US_Purdue instead of original site T1_US_FNAL
27-Apr-2018 23:59:30 CDT  Successfully opened file root://cmsxrootd-site.fnal.gov//store/mc/RunIIFall17MiniAOD/DYJetsToLL_M-50_TuneCP5_13TeV-amcatnloFXFX-pythia8/MINIAODSIM/94X_mc2017_realistic_v10-v1/00000/005DC030-D3F4-E711-889A-02163E01A62D.root
Begin processing the 1st record. Run 1, Event 74714917, LumiSection 42879 at 28-Apr-2018 00:01:19.401 CDT
28-Apr-2018 00:02:17 CDT  Closed file root://cmsxrootd-site.fnal.gov//store/mc/RunIIFall17MiniAOD/DYJetsToLL_M-50_TuneCP5_13TeV-amcatnloFXFX-pythia8/MINIAODSIM/94X_mc2017_realistic_v10-v1/00000/005DC030-D3F4-E711-889A-02163E01A62D.root


MessageLogger Summary

 type     category        sev    module        subroutine        count    total
 ---- -------------------- -- ---------------- ----------------  -----    -----
    1 XrdAdaptor           -w file_open                              1        1
    2 fileAction           -s file_close                             1        1
    3 fileAction           -s file_open                              2        2

 type    category    Examples: run/evt        run/evt          run/evt
 ---- -------------------- ---------------- ---------------- ----------------
    1 XrdAdaptor           pre-events                        
    2 fileAction           PostGlobalEndRun                  
    3 fileAction           pre-events       pre-events       

Severity    # Occurrences   Total Occurrences
--------    -------------   -----------------
Warning                 1                   1
System                  3                   3

This outputs a file name ggtree_mc.root which you can now analyze. If you want to change the output file name, edit this line in run_mc_94x.py

process.TFileService = cms.Service("TFileService", fileName = cms.string('ggtree_mc.root')).

If you want another dataset, you can query a dataset in the CMS Data Aggregation system. For example, using wildcards enter: dataset dataset=/GGJets_*_Pt-50_13TeV-sherpa/*Fall17*/MINIAODSIM, or use the dasgoclient:

dasgoclient -query="dataset=/GGJets*/*Fall17*/MINI*"

For testing, click on one of the datasets, e.g. /GGJets_M-1000To2000_Pt-50_13TeV-sherpa/RunIIFall17MiniAOD-PU2017_94X_mc2017_realistic_v11-v1/MINIAODSIM → click on Files and you will find the logical file names (LFN) of (note: The LFN uniquely identifies any file that is somewhere with in the /store directory tree within all of CMS storage.) of samples you can run on. Example: /store/mc/RunIIFall17MiniAOD/GGJets_M-1000To2000_Pt-50_13TeV-sherpa/MINIAODSIM/PU2017_94X_mc2017_realistic_v11-v1/30000/047D1FF7-A532-E811-B817-02163E01A10B.root.

Open run_mc_94X.py and edit the section, comment out or delete the DYJetsToLL*.root and replace it with the files you want to examine.

If you are analyzing a local file, you need to add the file: prefix like


When you've configured the run_mc_94X.py just execute with cmsRun run_mc_94X.py to get the desired ntuple.

Analyzing the Ntuples

One way to access the information contained in the ntuple directly is through the command line.

$ root -l ggtree_mc.root
root [1] .ls
root [2] ggNtuplizer->cd()
root [3] EventTree->Print()
root [4] EventTree->Draw("phoEt")

The first command opens ROOT and reads in the ntuple contained in the file ggtree_mc.root. The ".ls" command shows that the file contains a directory named ggNtuplizer which contains the TTree EventTree. The Print() command displays the contents of tree, and the Draw() command plots a particular variable. As an exercise

look at different kinematical variables (Et, eta, phi) for photons.

You can also plot a variable with some kinematic cuts. Try:

root [5] EventTree->Draw("phoEt”, "phoEt > 20.0”)
root [6] EventTree->Draw("phoEt", "phoEt > 20.0 && fabs(phoEta) < 1.4")  // Plot electron Et with minimum pT of 20 GeV in the barrel (-1.4 < eta < 1.4).

Another way to access information stored in the ntuple is to open a TBrowser.

root[7] TBrowser b

In the above example, phoEt is the transverse energy of the photon, phoEta indicates the pseudorapidity (a spatial coordinate describing the angle of the photon relative to the beam axis where eta =0 is perpendicular to the beam. Higher-eta objects are described to be more “forward” and are likely to be detected at the endcaps compared to the “central”, low-eta objects), and phoPhi is measured along the azimuthal direction. As a simple exercise, one can plot the phoPhi variable and confirm azimuthal symmetry.

HoverE (H/E), is another important ID which indicates the ratio of energy deposited in the HCAL to the energy in the ECAL.

Shower-shape variables like SigmaIEtaEta, indicate the “shape” or spread of the energy deposited in the calorimeter by the particle. Smaller values of this variable correspond to real photons. The Isolation variables phoPFChIso (Charged Hadron Isolation), phoPFPhoIso (Photon Isolation), phoPFNeuIso(Neutral Hadron Isolation) represent the relative contribution of energy from nearby charged hadrons, photons and neutral hadrons, respectively. A large contribution by these other objects may indicate a fake photon object.

These last two sets of variables, shower-shape and Isolation variables, together with HoverE, are given major consideration when checking for jet-faking photons.

Double click on the file, and the ggNtuplizer directory and look for the variable you want to examine. For example, you can scroll down and double click on phoEt and draw the Et of all the photons.

You could learn more technical details about the definitions of the different photon variables stored in the ntuples To learn more about the definitions of the differet photon variables stored in the Ntuples here. This page also gives the cut-based and MVA-based IDs for both photons and electrons for Run 2.

To practice with more Ntuples, you can look at the examples from the 2018 CMSDAS egamma hands-on exercises. Note that the pre-ntuplized examples here has been processed with an earlier version of CMSSW with different detector conditions that might be outdated for our purposes.

Writing macros

For a quick look at the physics from the Ntuples, the ROOT command line provides an interactive way to plot different variables that you are interested in. However, for more complex analyses, such as distinguishing “true” from “fake” photons from the sample it is necessary to write a longer piece of code to consider more complicated variables that may be functions of variables stored in the Ntuples generated. You can write macros in C++ (see below) or python. Python scripts are usually simpler to create but if you prefer to use C++, there’s a helpful method of TTree called MakeClass() which generates a skeleton macro for you. It also imports the variables in the tree.

Let’s begin by looking at a simple plotting script.

wget https://raw.githubusercontent.com/uzzielperez/CMSSW-workbook-practice/master/PhotonAnalysis/basicPlotter_photons.py
vi basicPlotter_photons.py

The first part of the script collects the input file/s containing the same default TTree ggNtuplizer/EventTree (you can change this if you might be using different Ntuplizer/analyzer) into a chain called tchain. We create histograms for variables that we are interested in. In this case we are interested in the Energy, momentum, Eta, and Phi variables. We select events with a minimum photon energy of 20.0 GeV and apply some other ID cuts. In the main block of the code, we loop over all the events in the input ntuple. Within this loop is a second loop over all the photons in the event. Here we can apply our selection criteria, (e.g. minimum photon energy is 20.0 GeV). The histograms are filled with events that pass the cuts and are stored in an output root file. (In a later exercise, we can apply additional ID cuts on the photons. See H->gg exercise.)

# Output file
file_out = ROOT.TFile(args.outputfile, 'recreate')
# Histograms to fill 
h_pho_pt = ROOT.TH1D('h_pho_pt', 'Photon p_{T}', 98, 20.0, 1000.0)
h_pho_Eta = ROOT.TH1D('h_pho_Eta', 'Photon Eta', 80, -3, 3)
h_pho_Phi = ROOT.TH1D('h_pho_Phi', 'Photon Phi', 80, -3.5, 3.5)
#h_pho_sigmaIEtaIEta = ROOT.TH1D('h_pho_sigmaIEtaIEta', 'Photon #sigma_{i#eta i#eta}', 100, 0.0, 0.1)
#Loop over all the events in the input ntuple
for ievent,event in enumerate(tchain):
    if ievent > args.maxevents and args.maxevents != -1: break
    if ievent % 10000 == 0: print 'Processing entry ' + str(ievent)
    # Loop over all the photons in an event
    for i in range(event.nPho):
        if (event.phoEt[i] > 20.0 and (event.phoIDbit[i]&2==2)):
            pho_vec = ROOT.TLorentzVector()
            pho_vec.SetPtEtaPhiE(event.phoEt[i], event.phoEta[i], event.phoPhi[i], event.phoE[i])

As an optional first step, check the usage of this python script.

python basicPlotter_photons.py -h 

You should see the command line options available to you. You may also want to edit the script for your purposes, e.g. change the default in the argument parser section at the top of the script.

usage: basicPlotter_photons.py [-h] [-i [INPUTFILES [INPUTFILES ...]]]
                               [-o OUTPUTFILE] [-m MAXEVENTS] [-t TTREE]
A simple ttree plotter
optional arguments:
  -h, --help            show this help message and exit
                        List of input ggNtuplizer files
  -o OUTPUTFILE, --outputfile OUTPUTFILE
                        Input ggNtuplizer file
  -m MAXEVENTS, --maxevents MAXEVENTS
                        Maximum number events to loop over. Default set to -1
                        to loop over all events in the input file/s.
  -t TTREE, --ttree TTREE
                        TTree Name

We’ll first take a look at the “ggtree.root” Ntuples we created in the previous section. To specify the input file and output file name other than the default,

python basicPlotter_photons.py -i ggtree_mc.root -o ggtree_mc_plots.root

Examine the plots you created in ggtree_mc_plots. One way to do this to open a TBrowser. Note that the events that populated these histograms might not be “real” photons. We may need to apply more IDs to get a quality collection of photons.

* Try to look at the transverse momentum plot in the log scale. 
* Modify the script by adding other variables. Try making histograms for good discriminating variables that separates real photons from fakes, e.g. |dEtaInSeed|, mini isolation, and the number of expected missing inner hits.

Real and Fake Photons

In this part, we will add another layer of complexity to our basic plotting code to separate "real" from "fake" photons. We edit our basic plotting script and import a function from selection.py. First download the code,

wget https://raw.githubusercontent.com/uzzielperez/CMSSW-workbook-practice/master/PhotonAnalysis/selection.py

At the top of the basic plotting script, basicPlotter_photons.py add #from selection import.

 #! /usr/bin/env python
import argparse
from selection import *


The selection.py file has a function called has_mcPho_match which takes a TLorentzVector for the reconstructed photon as input and determines whether the closest MC object is a photon or not.

 def has_mcPho_match(event, pho_vec):
    min_delta_r = float('Inf')
    pid = 0
    for mc in range(event.nMC):
        if event.mcPt[mc] > 1.0:
            mc_vec = ROOT.TLorentzVector()
            mc_vec.SetPtEtaPhiE(event.mcPt[mc], event.mcEta[mc], event.mcPhi[mc], event.mcE[mc])
            delta_r = pho_vec.DeltaR(mc_vec)
            if delta_r < min_delta_r:
                min_delta_r = delta_r
                if delta_r < 0.3:
                    pid = abs(event.mcPID[mc])
    if pid == 22: return True
    return False

In the histograms section add,

h_pho_sigmaIEtaIEta = ROOT.TH1D('h_pho_sigmaIEtaIEta', 'Photon #sigma_{i#eta i#eta}', 100, 0.0, 0.1)

Edit the 2nd loop (over the photons in an event) in the so that you have,

    # Loop over all the photons in an event
    for i in range(event.nPho):
        if (event.phoEt[i] > 20.0 and (event.phoIDbit[i]&2==2)):
            pho_vec = ROOT.TLorentzVector()
            pho_vec.SetPtEtaPhiE(event.phoEt[i], event.phoEta[i], event.phoPhi[i], event.phoE[i])
            if not event.isData and has_mcPho_match(event, pho_vec):

You can check out the full code here.

In the selection above, the variable phoIDbit, event.phoIDbit[i]&2==2 selects with a loose Photon ID. Now, run the plotting script with the ggtree_mc.root ntuples we produced above.

python basicPlotter_photons.py -i ggtree_mc.root -o ggtree_mc_plots.root. 

* Create corresponding histograms for fake objects (i.e. those that did not pass the selection criteria, did not match generator photons). This part is needed for the next section.
* Plot photon and fake histograms together for a given variable and compare. How do the fake histograms vary with pT? 
* What cuts should you impose when performing an analysis with low/large backgrounds?

Note: The ggtree_mc.root Ntuples we plotted earlier were obtained from running on a DYJetsToLL_M-50_TuneCP5_13TeV-amcatnloFXFX-pythia8 dataset. This sample includes only events with at least two very loose electrons. To prepare for our later exercise on the Higgs decay into two photons, let us look at a sample with at least two very loose photons instead. We will make use of a pre-Ntuplized dataset from the 2018 CMS Data Analysis school. In CMSLPC you can get the file by

xrdcp root://cmseos.fnal.gov//store/user/cmsdas/2018/short_exercises/ElectronsPhotons/GluGluHToGG_M-125_13TeV_powheg_pythia8.root .

* For practice, try querying DAS for a dataset you can Ntuplize yourself with the ggNtuplizer by following the steps in the previous sections (e.g. /GluGluHToGG_M-120_13TeV_powheg_pythia8/RunIIFall17MiniAOD-94X_mc2017_realistic_v10-v2/MINIAODSIM) . 

To make plots, we again indicate the input and output filename as follows:

python basicPlotter_photons.py -i GluGluHToGG_M-125_13TeV_powheg_pythia8.root -o H2gg.root.

ROC curves: Background Rejection and Signal Efficiency

When making cuts, we would like to reject as much background as possible while minimizing signal loss. One possible way to measure the how good a set of cuts is through ROC (Receiver Operating Characteristic) curves. ROC curves can show how the signal efficiency vs bacground rejection changes as the cut on a given variable is changed.

We can edit the basic plotting script above to make histograms for fake objects and run on the ggtree_mc.root/GluGluHToGG-M-125_13Tev ntuples. To get the same results, you can do the following for ggtree_mc.root:

wget https://raw.githubusercontent.com/uzzielperez/CMSSW-workbook-practice/Backup/PhotonAnalysis/basicPlotter_photons3.py
python basicPlotter_photons3.py -i ggtree_mc.root -o ggtree_comp_mc_plots.root
wget https://raw.githubusercontent.com/uzzielperez/CMSSW-workbook-practice/Backup/PhotonAnalysis/makeRoc_pho.py
python makeRoc_pho.py -i ggtree_comp_mc.root -v sigmaIEtaIEta

The main part of the code (makeRoc_pho.py) is in the for loop:

for i in range(h_sig.GetNbinsX()):
    eff_sig = h_sig.Integral(1, i + 1) / h_sig.Integral()
    eff_bkg = h_bkg.Integral(1, i + 1) / h_bkg.Integral()
    g_roc.SetPoint(i, 1.0 - eff_bkg, eff_sig)

The “best” cut ususally depends on the analysis one needs to perform. One usual goal is to achieve a certain signal efficiency or background rejection. Another is to minimize the distance from a hypothetical “perfect” cut with 100% signal efficiency and 100% background rejection, sqrt((1 - eff_sig)**2 + eff_bkg**2)).

* Try to code and minimise the distance from a hypothetical "perfect" cut. 

H->gg Bonus Exercise

Although the Higgs primarily decays into bbbar pairs, there is a huge background in this channel that gives enough motivation to search in the diphoton channel which will be done here. In fact, the Higgs was discovered in this channel and the ZZ channel. The hunt for the Higgs was a difficult task since it has a small cross-section and the easily reconstructed final states also have a small branching ratio. In this exercise, one can expand the basic plotting script in the earlier examples to include only events with at least two photons that pass a certain selection criteria. You will see that the reconstructed Higgs peaks at 125 GeV.

To run the script:

wget https://raw.githubusercontent.com/uzzielperez/CMSSW-workbook-practice/master/PhotonAnalysis/selection.py
wget https://raw.githubusercontent.com/uzzielperez/CMSSW-workbook-practice/master/PhotonAnalysis/makePlots_H2gg.py
python makePlots_H2gg.py -m 10000 -i root://cmsxrootd.fnal.gov///store/user/drberry/HATS/ggntuples/GluGluHToGG_M-125_13TeV_powheg_pythia8.root -o H2gg_plots.root

The output: * HiggsMassPeak.pdf: HiggsMassPeak

* Try applying R9 and regression corrects to the photon and see the improvement to the peak width.
* Search for the Higgs in the data skims from above.

Generating events and Writing your own Analyzer to access photon information

See related:

For your own study, you might want to simulate and generate events with photon final states. You can check out a number of run cards from GitHub (link above) and follow the recipes in the Generator section of the CMSSW Workbook. For our purposes, will take a quick run-through of how to use the Pythia8 interface with CMSSW and make some basic plots from it.

First get the fragment,


Once we have that ready, from the src directory of $CMSSW_BASE we will use cmsDriver.py to generate 1000 GEN level events only (no showering or detector simulation).

scram b -j 16
cmsDriver.py Configuration/GenProduction/python/ThirTeenTeV/SMGG_TuneCUEP8M1_13TeV_pythia8_cfi.py -s GEN --mc --no_exec --conditions auto:mc -n 1000

A configuration file would then be generated. To run, do

cmsRun SMGG_TuneCUEP8M1_13TeV_pythia8_cfi_py_GEN.py

You can also do

cmsRun SMGG_TuneCUEP8M1_13TeV_pythia8_cfi_py_GEN.py > logSMGG_TuneCUEP8M1_13TeV_pythia8_cfi_py_GEN.txt
to save the output to text file. This takes a while to run. After this, a root file will be generated. You have to do
edmDumpEventContent SMGG_TuneCUEP8M1_13TeV_pythia8_cfi_py_GEN.root
If done correctly, the output would look like this,
Type                                  Module               Label      Process   
GenEventInfoProduct                   "generator"          ""         "GEN"     
ROOT::Math::PositionVector3D,ROOT::Math::DefaultCoordinateSystemTag>    "genParticles"       "xyz0"     "GEN"     
double                                "ak4GenJets"         "rho"      "GEN"     
double                                "ak4GenJetsNoNu"     "rho"      "GEN"     
double                                "ak8GenJets"         "rho"      "GEN"     
double                                "ak8GenJetsNoNu"     "rho"      "GEN"     
double                                "ak4GenJets"         "sigma"    "GEN"     
double                                "ak4GenJetsNoNu"     "sigma"    "GEN"     
double                                "ak8GenJets"         "sigma"    "GEN"     
double                                "ak8GenJetsNoNu"     "sigma"    "GEN"     
edm::HepMCProduct                     "generatorSmeared"   ""         "GEN"     
edm::TriggerResults                   "TriggerResults"     ""         "GEN"     
float                                 "genParticles"       "t0"       "GEN"     
vector                        "ak4GenJets"         "rhos"     "GEN"     
vector                        "ak4GenJetsNoNu"     "rhos"     "GEN"     
vector                        "ak8GenJets"         "rhos"     "GEN"     
vector                        "ak8GenJetsNoNu"     "rhos"     "GEN"     
vector                        "ak4GenJets"         "sigmas"   "GEN"     
vector                        "ak4GenJetsNoNu"     "sigmas"   "GEN"     
vector                        "ak8GenJets"         "sigmas"   "GEN"     
vector                        "ak8GenJetsNoNu"     "sigmas"   "GEN"     
vector                           "genParticles"       ""         "GEN"     
vector                  "ak4GenJets"         ""         "GEN"     
vector                  "ak4GenJetsNoNu"     ""         "GEN"     
vector                  "ak8GenJets"         ""         "GEN"     
vector                  "ak8GenJetsNoNu"     ""         "GEN"     
vector                  "genMetCalo"         ""         "GEN"     
vector                  "genMetTrue"         ""         "GEN"     
vector             "genParticles"       ""         "GEN"   

To access the information in the root file we generated, we need to write an analyzer. You can follow the link on how to write an analyzer, but for our purposes, do the following:

mkdir PhotonDemoAnalyzer
Cd PhotonDemoAnalyzer

mkedanlzr PhotonAnalyzer
cd PhotonAnalyzer
scram b -j 8

These commands will set up the environment and the framework module for the analyzer. The mkedanlzr script generates a configuration file located in the python directory. Open ConfFile_cfg.py and replace the “myfile.root” with the desired source. In our case, it would be SMGG_TuneCUEP8M1_13TeV_pythia8_cfi_py_GEN.root. With the path to the root file, the configuration file will look like this:

import FWCore.ParameterSet.Config as cms

process = cms.Process("Demo")


process.maxEvents = cms.untracked.PSet( input = cms.untracked.int32(-1) )

process.source = cms.Source("PoolSource",
    # replace 'myfile.root' with the source file you want to use
    fileNames = cms.untracked.vstring(


process.demo = cms.EDAnalyzer('PhotonAnalyzer'

process.p = cms.Path(process.demo)

To do a quick check if it works, do

cmsRun python/ConfFile_cfg.py

The output should look as follows:

06-Aug-2018 02:10:08 CDT  Initiating request to open file file:/uscms/home/cuperez/nobackup/PhotonAnalysis/CMSSW_9_4_4/src/SMGG_TuneCUEP8M1_13TeV_pythia8_cfi_py_GEN.root
06-Aug-2018 02:10:10 CDT  Successfully opened file file:/uscms/home/cuperez/nobackup/PhotonAnalysis/CMSSW_9_4_4/src/SMGG_TuneCUEP8M1_13TeV_pythia8_cfi_py_GEN.root
Begin processing the 1st record. Run 1, Event 1, LumiSection 1 at 06-Aug-2018 02:10:12.439 CDT
Begin processing the 2nd record. Run 1, Event 2, LumiSection 1 at 06-Aug-2018 02:10:12.440 CDT
Begin processing the 3rd record. Run 1, Event 3, LumiSection 1 at 06-Aug-2018 02:10:12.440 CDT
Begin processing the 996th record. Run 1, Event 996, LumiSection 1 at 06-Aug-2018 02:10:12.597 CDT
Begin processing the 997th record. Run 1, Event 997, LumiSection 1 at 06-Aug-2018 02:10:12.597 CDT
Begin processing the 998th record. Run 1, Event 998, LumiSection 1 at 06-Aug-2018 02:10:12.597 CDT
Begin processing the 999th record. Run 1, Event 999, LumiSection 1 at 06-Aug-2018 02:10:12.598 CDT
Begin processing the 1000th record. Run 1, Event 1000, LumiSection 1 at 06-Aug-2018 02:10:12.598 CDT
06-Aug-2018 02:10:12 CDT  Closed file file:/uscms/home/cuperez/nobackup/PhotonAnalysis/CMSSW_9_4_4/src/SMGG_TuneCUEP8M1_13TeV_pythia8_cfi_py_GEN.root


MessageLogger Summary

 type     category        sev    module        subroutine        count    total
 ---- -------------------- -- ---------------- ----------------  -----    -----
    1 fileAction           -s file_close                             1        1
    2 fileAction           -s file_open                              2        2

 type    category    Examples: run/evt        run/evt          run/evt
 ---- -------------------- ---------------- ---------------- ----------------
    1 fileAction           PostGlobalEndRun                  
    2 fileAction           pre-events       pre-events       

Severity    # Occurrences   Total Occurrences
--------    -------------   -----------------
System                  3                   3

dropped waiting message count 0

Now we want to access event information like each of the photon's pt, eta, and phi for instance. We have prepared a basic analyzer that you can study and modify to your liking. Go back to CMSSW_9_4_4/src and start fresh. You can compare the newly checked out analyzer with the prepared one.

cd CMSSW_9_4_4/src/src 
svn checkout https://github.com/uzzielperez/CMSSW-workbook-practice/trunk/EDAnalyzer/DiPhotonAnalyzer/
cd EDAnalyzer/DiPhotonAnalyzer
rm -rf .svn/             #this is not needed
rm -rf plugins/.svn
rm -rf plugins/.svn 

There are a number of added lines, includes, in the plugins/BuildFile.xml, DiPhotonAnalyzer.cc and the configuration file. We will focus on the important additions. Below, we have defined structs to store information on severable variables of interests for particular object like the leading and subleading photons of an event.

 //DiPhotonAnalyzer.cc - 
 edm::Service fs;
      edm::EDGetTokenT > genParticlesToken_;
      edm::InputTag genParticles_;
      edm::InputTag particles_;

      TTree *fgenTree;
      struct eventInfo_t {
      Long64_t run;
      Long64_t LS;
      Long64_t evnum;
      eventInfo_t fEventInfo;

      struct genPhotonInfo_t{
        double pt;
        double eta;
        double phi;
        double E;

      genPhotonInfo_t fPhoton1;
      genPhotonInfo_t fPhoton2;

     struct genDiPhotonInfo_t{
        double Minv;
        double qt;

      genDiPhotonInfo_t fDiPhoton;
      int numPhotons = 0;

In the next section, we can define the tree and the branches where we want to write our selected information into. We also define a token that consumes the type of object from the input root file.

   //DiPhotonAnalyzer.cc - Tree
   fgenTree = fs->make("fgenTree","GENDiphotonTree");
   fgenTree->Branch("genPhoton1", &fPhoton1, "pt/D:eta:phi:E");
   fgenTree->Branch("genPhoton2", &fPhoton2, "pt/D:eta:phi:E");
   fgenTree->Branch("genDiPhoton", &fDiPhoton, "Minv/D:qt");

   genParticlesToken_ = consumes>(iConfig.getParameter("particles"));

After initializing the variables, we then loop over the genParticle collection and choose particles with pdgID==22 (photons) with status code 1 (end state). We also sort the photons by pt and then update the structs.

   //DiPhotonAnalyzer.cc - Loop over genParticle collection
   //There are more succinct ways to do this section but we want to begin with a more transparent code.
  for(vector::const_iterator ip = genParticles->begin(); ip != genParticles->end(); ++ip){
      if(ip->status()==1 && ip->pdgId()==22){
         //cout << "Photon end state found" << endl;
        photoncount = photoncount + 1;
        double pt = ip->pt();
        double eta = ip->eta();
        double phi = ip->phi();
        double E = ip->energy();

        //Ordering photons
        if (pt > fPhoton1.pt){
            fPhoton2.pt = fPhoton1.pt;
            fPhoton2.eta = fPhoton1.eta;
            fPhoton2.phi = fPhoton1.phi;
            fPhoton2.E = fPhoton1.E;

            fPhoton1.pt = pt;
            fPhoton1.eta = eta;
            fPhoton1.phi = phi;
            fPhoton1.E = E;
        if ((pt < fPhoton1.pt) && (pt > fPhoton2.pt)){
            fPhoton2.pt = pt;
            fPhoton2.eta = eta;
            fPhoton2.phi = phi;
            fPhoton2.E = E;
      }//end photon end state condition
  }//end loop over gen particles 

Once you have the basic kinematic variables for each photon, you can computer for more complex variables such as invariant mass and add them to your Ntuples.

   //DiPhotonAnalyzer.cc - DiPhoton object 
 //Fill DiPhoton Information
  TLorentzVector leadingPhoton, subleadingPhoton;
  leadingPhoton.SetPtEtaPhiE(fPhoton1.pt, fPhoton1.eta, fPhoton1.phi, fPhoton1.E);
  subleadingPhoton.SetPtEtaPhiE(fPhoton2.pt, fPhoton2.eta, fPhoton2.phi, fPhoton2.E);

   TLorentzVector total = leadingPhoton + subleadingPhoton;
   double ggmass = total.M();

   fDiPhoton.Minv = ggmass;
   fDiPhoton.qt = total.Pt();

Finally the information can be written to the tree.

 //Fill the tree Branches 

Take note that you need to add these lines to your configuration file python/ConfFile_cfg.py in order for it to compile properly.

process.TFileService = cms.Service("TFileService",
                fileName = cms.string("DemoDiPhotonInfo.root") #output filename

process.demo = cms.EDAnalyzer('DiPhotonAnalyzer',
    particles = cms.InputTag("genParticles")

To run and make Ntuples.

cd CMSSW_9_4_4/src/EDAnalyzer/ 
scram b -j 4
cmsRun DiPhotonAnalyzer/python/ConfFile_cfg.py

This will create DemoDiPhotonInfo.root from which can make your plots.


It's good to familiarize oneself with the recently developed NanoAOD data tier. You can learn more about it in the WorkBookNanoAOD. The goal for this new data tier is to keep the event size below 2kb/event so that ~50% of analyses can have their ntuples centrally produced. We can run through a quick example and make a quick analysis with them.

cd $CMSSW_BASE/src 
git cms-init # not really needed except if you want other cmssw stuff
git clone https://github.com/cms-nanoAOD/nanoAOD-tools.git PhysicsTools/NanoAODTools
scram b
voms-proxy-init -voms cms 

cd PhysicsTools/NanoAODTools/python/postprocessing/examples/
python exampleAnalysis.py

Feel free to look at the code and how you could make selections (e.g. choose events with at least two photons) as you loop over the events. You could try finding another interesting dataset in DAS and look at what's stored in the root files.

Miscellaneous Links

Workshops and Tutorials:


Review status

Reviewer/Editor and Date (copy from screen) Comments
CiliciaUzzielPerez - 28-April-2018 Update for Run 2, CMSSW 94x
MarcoPieri - 24-Jan-2010  
NancyMarinelli- 26 Nov 2009 Update for CMSSW 33X
MatteoSani - 02 Mar 2009  
ChrisSeez - 25 Feb 2008  
Nancy Marinelli - 22 Feb 2008 Update
Responsible: Nancy Marinelli

Edit | Attach | Watch | Print version | History: r32 < r31 < r30 < r29 < r28 | Backlinks | Raw View | WYSIWYG | More topic actions
Topic revision: r32 - 2018-12-04 - CiliciaUzzielPerez



    • Cern Search Icon Cern Search
    • TWiki Search Icon TWiki Search
    • Google Search Icon Google Search

    CMSPublic All webs login

This site is powered by the TWiki collaboration platform Powered by PerlCopyright &© 2008-2019 by the contributing authors. All material on this collaboration platform is the property of the contributing authors.
Ideas, requests, problems regarding TWiki? Send feedback