Chapter 7: Reconstruction Tutorials



7.1 Introduction to Higher-Level Object Analysis Tutorials

Complete: 3
Detailed Review status

This chapter describes the details of the objects associated with physics analyses, such as electrons, photons, muons, missing transverse energy (MET), jets, heavy quarks and taus. These objects are available in the standard AOD data samples. They are produced using a wide variety of sophisticated reconstruction algorithms, which are described in detail in the Offline Guide.

  • Track Reconstruction and Analysis - shows the available methods for accessing and analyzing the reconstructed tracks, and discusses more advanced operations involving tracks (refitting, accessing the track trajectory at different points, etc.).

  • Vertex Reconstruction - describes
    • how to access and analyze the primary vertices,
    • how to find and fit vertices in user analysis code.

  • Electron Analysis - shows some simple code for basic analysis of electrons, and contains links for further electron analysis and reco instructions.

  • Photon Analysis - shows some simple code for basic analysis of photons, and contains links for further photon analysis and reco instructions.

  • Jet Reconstruction - jet reconstruction is a more complex process, and documentation is provided for jet object reconstruction and analysis.

  • MET Analysis - describes MET objects in RECO/AOD/PAT, MET Corrections Type-0/1/2, MET Filters, MET Uncertainties, and other information relevant for MET analyses.

  • Global Muon Reconstruction is a process by which muon tracks are reconstructed using information coming from both the muon spectrometer and the tracker detector.

Review status

Reviewer/Editor and Date (copy from screen) Comments
CMSUserSupport - 08 Jan 2008 restructuring the chapter
JennyWilliams - 16 Oct 2007 added tau pf tagging page info
JennyWilliams - 08 Oct 2007 provided most of page content

Responsible: SudhirMalik
Last reviewed by: KatiLassilaPerini - 19 Feb 2008



7.2 Track Analysis

Complete: 4
Detailed Review status

Goals of this page

This tutorial shows the available methods for accessing and analyzing the reconstructed Tracks.

NOTA BENE: we will access track information in miniAOD files. Please check the track information stored in the miniAOD.

NOTE: this twiki is currently under review.

Contents

Finding data samples

Typically the user wishes to analyze an already generated and reconstructed sample: for example, you may want to analyze tracks in CMS.ReleaseValidation or in officially produced physics samples. To locate these samples, please refer to the WorkBook Locating Data Samples section.

Access to Tracks

CMSSW provides three modes for the user to access event data in a CMSSW ROOT event file. For accessing tracks, you can use any of the following modes (described in Analysis Methods).

  • bare ROOT mode
  • FWLite mode (loading the CMS.DataFormat libraries)
  • Framework mode (using an EDAnalyzer module in a configuration file)

We will present an introduction to using tracks for analyses using miniAOD samples. Our exercises will all use real data. First of all you have to create the development area for the analyzer:

source /cvmfs/cms.cern.ch/cmsset_default.sh 
export SCRAM_ARCH=slc6_amd64_gcc530
cmsrel CMSSW_9_2_10
cd CMSSW_9_2_10/src/
cmsenv
voms-proxy-init -voms cms -valid 192:00

Bare ROOT

After opening a CMSSW ROOT file produced by the track reconstruction with bare ROOT (no setup is needed; ignore all warnings about missing dictionaries)

root recoTracks.root

and opening a TBrowser

TBrowser b;

you can see the different collections of objects in the ROOT file. A double click on the ROOT Files folder in the right window of the TBrowser and a following double click on the filename recoTracks.root shows the top branch level of the ROOT file. The collections are contained in the Events branch and are visible after double clicking it in the right window. The final track collections start with recoTracks.

TBrowser with Collections

With a double click on a collection (for example the track collection recoTracks_generalTracks__RECO) and a subsequent double click on the Object branch in the collection (recoTracks_generalTracks__RECO.obj), you can select single data members of the object in the collection and produce simple plots using the information of all events in the file by double clicking on a data member (for example recoTracks_generalTracks__RECO.obj.chi2_ to plot the track chi2 distribution).

Bare ROOT browsing

Alternatively, type the Draw command for the main Events branch with the full name of the data member in the root command line:

Events->Draw("recoTracks_generalTracks__RECO.obj.chi2_")

This will produce the same plot of the track chi2 distribution as produced before with the TBrowser.
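
The same plot can also be made from a Python script with PyROOT; here is a minimal sketch (the file and branch names are those used above, and the plot is saved to a file instead of being shown in a window):

import ROOT

# open the reconstruction output and get the Events tree
f = ROOT.TFile.Open("recoTracks.root")
events = f.Get("Events")

# draw the track chi2 distribution and save it to a file
c = ROOT.TCanvas("c", "c")
events.Draw("recoTracks_generalTracks__RECO.obj.chi2_")
c.SaveAs("chi2.png")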

Transverse momentum distribution

FWLite mode

More sophisticated analysis possibilities are offered by the FWLite mode, which has access to all class dictionaries contained in a CMSSW ROOT file. The reprocessing of data, however, is not possible in FWLite mode; this requires an EDAnalyzer module block and the cmsRun executable.

As an example of the FWLite mode you can use the following macro:

from DataFormats.FWLite import Handle, Events
import ROOT

#events = Events('root://cms-xrd-global.cern.ch//store/data/Run2017B/Charmonium/MINIAOD/PromptReco-v1/000/297/046/00000/88DF6C6A-4556-E711-A5C0-02163E01A630.root')
events = Events('myfile.root')
tracks = Handle("std::vector<pat::PackedCandidate>")
histogram = ROOT.TH1F("histogram", "histogram", 100, 0, 100)
i = 0
for event in events:
    print "Event", i
    event.getByLabel("packedPFCandidates", tracks)
    j = 0
    for track in tracks.product():
        print "    Track", j, track.charge() / track.pt(), track.phi(), track.eta(), track.dxy(), track.dz()
        histogram.Fill(track.pt())
        j += 1
    i += 1
    if i >= 5:
        break
c = ROOT.TCanvas("c", "c", 800, 800)
c.cd()
histogram.Draw()
c.SaveAs("PtFW.png")

The macro is called FWlite_analize_MiniAOD.py; it simply uses myfile.root as input (this is an example; you can use any file containing reconstructed events). You can run it as:

python FWlite_analize_MiniAOD.py

It prints some of the properties of the tracks (charge, pt, phi, eta, dxy and dz) and, in this case, also displays the pT distribution of the tracks:

PtFW.png

EDAnalyzer mode

The most powerful way to process and analyze CMSSW data is to create an EDAnalyzer. This lets the user pull in analysis code modules by way of parameter sets (in the form of configuration files) and run the cmsRun executable on the given parameter set, thereby creating or modifying event data. This process exploits the full framework.

To create a very simple example of an EDAnalyzer, follow the instructions given in this section (for more details go to the dedicated part of the workbook: WorkBookWriteFrameworkModule).

First of all you have to create the development area for the analyzer:

mkdir Demo
cd Demo
mkedanlzr DemoTrackAnalyzer
cd DemoTrackAnalyzer/src

In this directory you find DemoTrackAnalyzer.cc, an empty track analyzer which needs to be edited. Open it with your favorite editor.
Here it is shown how to produce the plot of transverse momentum distribution.

  1. Under the comment //user include files add the following lines to include ROOT classes and the TrackCollection:
    • #include "DataFormats/PatCandidates/interface/PackedCandidate.h"
    • #include "FWCore/ServiceRegistry/interface/Service.h"
    • #include "CommonTools/UtilAlgos/interface/TFileService.h"
  2. Add as private members a root file object and a pointer to a histogram:
    • edm::EDGetTokenT<edm::View<pat::PackedCandidate>> trackTags_;
    • TH1D * histo;
  3. In the constructor, add the output root file (TFileService) and book the histogram (note the colon ":" at the end of .....iConfig):
    • trackTags_(consumes<edm::View<pat::PackedCandidate>>(iConfig.getParameter<edm::InputTag>("tracks")))
    • edm::Service<TFileService> fs;
    • histo = fs->make<TH1D>("pt" , "Pt" , 50 , 0 , 50 );
  4. In the analyze method, inside the loop over the tracks in the TrackCollection, fill the histogram with the pt value of the track:
    • histo->Fill(itTrack->pt());

Your file should now look like this: DemoTrackAnalyzer_MiniAOD.cc
Save DemoTrackAnalyzer.cc. Before compiling it you need to edit the CMS.BuildFile. Add the following lines at the top of the CMS.BuildFile:

<use name="root"/>
<use name="DataFormats/TrackReco"/>
<use name="CommonTools/UtilAlgos"/>
<use name="PhysicsTools/UtilAlgos"/>

Now compile:

cd ..
scramv1 b

In order to run the analyzer you need a configuration file. Using your editor, create a file called DemoTrackAnalyzer_cfg.py in the test directory and edit it as follows (for more details about configuration files see SWGuideAboutPythonConfigFile):

import FWCore.ParameterSet.Config as cms

process = cms.Process("Demo")
process.load('FWCore.MessageService.MessageLogger_cfi')
process.MessageLogger.cerr.FwkReport.reportEvery = 10
process.options = cms.untracked.PSet( wantSummary = cms.untracked.bool(True) )
process.maxEvents = cms.untracked.PSet( input = cms.untracked.int32(-1) )
process.source = cms.Source("PoolSource",
    # replace 'myfile.root' with the source file you want to use
    fileNames = cms.untracked.vstring(
        'file:myfile.root'
    )
)

process.demo = cms.EDAnalyzer('DemoTrackAnalyzer',
    tracks = cms.InputTag("packedPFCandidates"),
)

process.TFileService = cms.Service("TFileService",
    fileName = cms.string('outfile.root')
)

process.p = cms.Path(process.demo)

This configuration file simply tells the Framework to use myfile.root as input (you can use any file containing reconstructed events) and to run the DemoTrackAnalyzer on the packedPFCandidates collection. Now we are ready to run the analyzer:

cd test
cmsRun DemoTrackAnalyzer_cfg.py

This command will produce in the test directory a file called outfile.root. You can open it with ROOT:

root outfile.root

In ROOT open a TBrowser

TBrowser b;

A double click on the ROOT Files folder in the right window of the TBrowser and a following double click on the filename outfile.root shows the list of plots contained in the file. You should find only a plot called pt. Double click on it to display the pt distribution. This distribution should look like the one shown for the bare ROOT analysis.

Physics Example

The example uses this macro:

 Demo/DemoAnalyzer/plugins/miniAODmuons.cc

A physics process, Jpsi->2mu, can be analyzed in EDAnalyzer mode starting from a fully set-up local CMSSW project area.

source /cvmfs/cms.cern.ch/cmsset_default.sh 
export SCRAM_ARCH=slc6_amd64_gcc530
cmsrel CMSSW_9_2_10
cd CMSSW_9_2_10/src/
cp -r /afs/cern.ch/user/j/jmejiagu/public/miniAODmuonexample/CMSSW_9_2_10/src/Demo/ .
cmsenv
voms-proxy-init -voms cms -valid 192:00
scram b -j8
cd DemoAnalyzer/test
cmsRun miniAODmuonsRootupler.py

We assume the reconstructed data sample to be stored in a file called Rootuple_JpsiToMuMu_2017-MiniAOD.root; otherwise, you should change the name of the output file in miniAODmuonsRootupler.py. Open ROOT:

root -l Rootuple_JpsiToMuMu_2017-MiniAOD.root

Then, to draw the Jpsi mass peak, open a TBrowser:

TBrowser b;

you can see the collections of objects in the ROOT file. A double click on the ROOT Files folder in the right window of the TBrowser and a following double click on the filename Rootuple_JpsiToMuMu_2017-MiniAOD.root shows the branch ntuple. After double clicking on the ntuple and then on B_J_mass:

You can see the Jpsi peak in the di-muon invariant mass distribution.


Jpsimass.png

Selecting Good Quality Tracks

Note: we leave this section as it is for didactic purposes, because there are still AOD samples. However, the only quality flag stored in MiniAOD (PackedCandidate) is "highPurity". PackedCandidate has other options as well; please check here.

The "generalTrack" collection is produced using rather loose track finding cuts. As a result, it has a very high efficiency to find tracks, but it contains quite a large fraction of fake tracks. Therefore, most people doing physics analysis should use only a subset of the tracks that satisfy quality criteria. The simplest way of doing this, is to use the "quality" flag stored in the Track class. For example:

// Accept only good quality tracks. (This is suitable for most physics analyses)
if (track->quality(Track::highPurity)) { ... track ok ...}

// Accept either medium or good quality tracks. (if want very high efficiency and willing to accept some fakes)
// Note that "highPurity" tracks are a subset of "tight" tracks
if (track->quality(Track::tight)) { ... track ok ...}

// Accept poor, medium or good quality tracks. (if want extremely high efficiency and willing to accept many fakes)
// Note that "highPurity" and "tight" tracks are subsets of "loose" tracks
if (track->quality(Track::loose)) { ... track ok ...}

The quality flag is based upon quantities like the number of hits on the track, its chi2, etc., as described here. You will probably also want to apply your own cuts on the track impact parameter and pT. A convenient way of selecting tracks is to use the recoTrack selection tools.

(N.B. Historically, flags "goodIterative" and "confirmed" were also defined. However, these are now redundant and should no longer be used).

Advanced Analysis

Accessing track impact parameters

Track impact parameters with respect to the primary vertex (or any other vertex) can be obtained with reasonable precision via Track::dxy(vertex) (the transverse impact parameter) and Track::dz(vertex) (the longitudinal impact parameter). The corresponding errors can be obtained from Track::d0Error() and Track::dzError().

If the "reference point" about which the track parameters was evaluated is a long way from this vertex, then a more precise method is needed. Information on how to use this is given here here. This is not usually necessary for Tracker tracks, except for precision vertexing or b tagging.

Nota bene: if you want to use dxy or dz you need to make sure the pT of the track is greater than 0.5 GeV; otherwise you will get an error related to the covariance matrix. The following selection lines are strongly recommended:

if(itTrack->pt()<=0.5) continue;
if(itTrack->charge()==0) continue; // no neutral objects
if(fabs(itTrack->pdgId())!=211) continue; // due to the lack of particle ID, all charged tracks are stored with the pion hypothesis (|pdgId| == 211)
if(!(itTrack->trackHighPurity())) continue;
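
In the FWLite macro shown earlier, the same protections translate directly to Python; a sketch:

for track in tracks.product():
    if track.pt() <= 0.5:
        continue  # avoid the covariance-matrix error for low-pT candidates
    if track.charge() == 0:
        continue  # no neutral objects
    if abs(track.pdgId()) != 211:
        continue  # charged hadrons are stored with the pion hypothesis
    if not track.trackHighPurity():
        continue
    print track.dxy(), track.dz()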

Accessing the hit pattern of a track

Note: please be careful. From the MiniAOD link you will see the following comment:
"approximate hitPattern() and trackerExpectedHitsInner() that yield the correct number of hits, pixel hits and the information returned by lostInnerHits()
(but nothing more, so e.g. you can't rely on methods counting layers, or expected outer hits, or lost hits within the track)"

Given a track (itTrack in our example), here is an example of using the hit pattern to access the total number of hits and the number of pixel hits:

// count the number of valid tracker *** hits ***
std::cout << "number of valid tracker hits is "
          << itTrack->numberOfHits() << std::endl;

// count the number of valid pixel *** hits ***
std::cout << "number of valid pixel hits is "
          << itTrack->numberOfPixelHits() << std::endl;

For more methods, please see DataFormats/TrackReco/interface/HitPattern.h.

How to access the track trajectory at each Tracker layer

This can be done exactly on RECO and approximately on AOD. The recipes differ, as explained below.

If you are using RECO, then you can access the track trajectory at its innermost and outermost hit very easily, using, for example, Track::innerPosition() or Track::innerMomentum(). (These return the position and 3-momentum of the track at its point of closest approach to the innermost hit.)

You may want to access the track's parameters (i.e. its complete TrajectoryMeasurement) on the surface of every layer crossed by the track. The Trajectory is the object devoted to collecting all the TrajectoryMeasurements (TMs) of a track. Nevertheless, due to its size on disk, the Trajectory cannot be saved in the RECO data file, and after the final fit of the tracks is done, the information about the TMs is discarded. The final Track contains only the TSOS at the innermost and outermost hits. However, you can refit the tracks in RECO, and this will recover their trajectory information, solving the problem.

Information on how to refit tracks can be found here.

For MiniAOD the available information is much reduced.
Inner hit information: lostInnerHits() will return

  • -1 (validHitInFirstPixelBarrelLayer) if the track has a valid hit in the first pixel barrel layer,
  • 0 (noLostInnerHits) if it doesn't have such a hit because of geometrical or detector inefficiencies (i.e. the hit wasn't expected to be there),
  • 1 (oneLostHit) if the track extrapolation towards the beamline crosses an active detector but no hit is found there,
  • 2 (moreLostHits) if there are at least two missing expected inner hits.
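
In FWLite, for instance, candidates with a valid hit in the first pixel barrel layer can be kept with a one-line filter (a sketch, using the pat::PackedCandidate accessor described above):

# -1 corresponds to validHitInFirstPixelBarrelLayer
good = [t for t in tracks.product() if t.lostInnerHits() == -1]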

Accessing the track parameters at any point in space using TransientTracks.

TransientTracks can be created from normal reco::Tracks. They allow one to extrapolate the track to any point in space. This is useful for vertex reconstruction and for b/tau tagging. For more details about this topic please visit SWGuideTransientTracks.

Add more histograms with impact parameters and number of hits

I will not go into detail about how to add more histograms. This is the file "DemoTrackAnalyzer_MiniAODnew.cc" that you have to change to plot more histograms. Compare it to what you already have and make the changes by seeing what is missing. In this new file there are some comments which you may find very helpful.

You can find all the examples at the following GitHub link.

Review status

Reviewer/Editor and Date (copy from screen) Comments
GiuseppeCerati - 29 Apr 2009 Documentation Review
GiuseppeCerati - 23 Jan 2008 Moved from old WorkBookTrackReco page
jhovanny.andres.mejia.guisao 05-November-2017 updated to miniAOD files
Responsible: GiuseppeCerati
Last reviewed by: VincenzoChiochia - 22 Feb 2008

7.3 Vertex Reconstruction

Complete: 5
Detailed Review status


Goals of this page:

This tutorial describes:

  • the pixel standalone reconstruction of tracks and primary vertices (for HLT),
  • the fitting of tracks to a common vertex,
  • the offline primary vertex reconstruction that uses tracks reconstructed from both pixel and silicon strip hits.

Contents

Introduction

Vertex reconstruction usually involves 2 steps, vertex finding and vertex fitting. Vertex finding involves grouping tracks into vertex candidates. The vertex-finding algorithms can be very different depending on the physics case (primary or secondary vertex finding, reconstruction of exclusive decays, etc.). Vertex fitting involves determining the best estimate of the vertex parameters (position, covariance matrix, and track parameters constrained by the vertex position and their covariances) for a given set of tracks, as well as indicators of the fit quality (total chi2, number of degrees of freedom, or track weights).

Pixel standalone track and vertex reconstruction is fast enough to be used at HLT. Fitting a single vertex from reconstructed tracks is a common step in the reconstruction of high-level physics objects such as tau's and B's. Offline primary vertex reconstruction yields ultimate vertex position resolution as well as the best possible estimation of the primary vertex position error matrix.

Set up your Environment

Set your runtime environment (shown for release %WBRELEASENEW%):

cd CMSSW_%WBRELEASENEW%/src
cmsenv

Data samples

For this tutorial, any data sample reconstructed with the standard sequence can be used, as long as it contains tracks and primary vertices. These are available in both the RECO and AOD samples. Typically the user wishes to analyze an already generated and reconstructed sample, such as PhysicsValidation or CMS.ReleaseValidation samples. To locate these samples, please refer to the WorkBook Locating Data Samples section.

In some cases the user may want to analyze reconstructed Tracks in a private production sample: some examples of configuration files to generate data samples can be found in SWGuideTrackRecoSequences (particle gun muons and H->ZZ->4mu).

Individual Tutorial Sections

Offline primary vertex finding

Vertex Fitting

Several other topics are described in the Vertex Reconstruction section of the SWGuide:

Beam-spot reconstruction

Pixel track finding and fitting (for HLT)

Pixel primary vertex finding and tagging (for HLT)

Vertex Analysis Tools

Information Sources

CMS Note 2006/026, "Track reconstruction, primary vertex finding and seed generation with the Pixel Detector", S. Cucciarelli et al.
CMS Note 2006/032, "Vertex Fitting in the CMS Tracker", T. Speer et al.
CMS Note 2006/029, "Impact of CMS Silicon Tracker Misalignment on Track and Vertex Reconstruction", P. Vanlaer et al.

Review status

Reviewer/Editor and Date (copy from screen) Comments
AnneHeavey - 28 Jul 2006 Jenny moved Pascal's and Thomas Speer's tutorial initial page to WB; I moved indiv tutorial pages in, fixed TOC, made many edits, added lots of red comments for authors to address! Needs review by experts.
ThomasSpeer - 19 February 2008 Update samples
ThomasSpeer - 27 Feb 2009 review and update

Responsible: Thomas Speer



7.3.1 Offline Primary Vertex Finding

Complete: 5
Detailed Review status

Contents

Primary Vertex reconstruction

The reconstruction of Primary vertices using the full tracks is part of the default reconstruction sequence. It is done using the generalTracks track collection. The vertices are stored in the event as reco::Vertex collections. Two collections are stored in the event:

  • offlinePrimaryVertices : The standard vertex collection, where the offlineBeamSpot is used to filter the tracks. This is the collection which we advise to use.
  • offlinePrimaryVerticesWithBS : For this collection, the offlineBeamSpot is used not only to filter the tracks, but also as an additional constraint in the fit. It improves the resolution of the vertices, but we do not yet advise to use it until further robustness studies are done, especially for the initial conditions.

Opening a data file in root, the data of the primary vertices (such as the coordinates, chi-squared and degrees of freedom) can be viewed directly. More information on how to redo the primary vertex reconstruction is given in the offline guide.
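
For example, in FWLite the standard collection can be read with a few lines of Python; this is a sketch (the input file name reco.root is a placeholder for any RECO/AOD file containing primary vertices):

from DataFormats.FWLite import Handle, Events

events = Events('reco.root')  # placeholder: any RECO/AOD file with primary vertices
vertices = Handle("std::vector<reco::Vertex>")
for event in events:
    event.getByLabel("offlinePrimaryVertices", vertices)
    pv = vertices.product()[0]  # the first vertex in the collection
    print pv.x(), pv.y(), pv.z(), pv.chi2(), pv.ndof()
    break  # just the first event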

From CMSSW 1.8.0 onwards, if no reconstructed vertex is found in an event using the tracks, a vertex based on the beam spot (see SWGuideFindingBeamSpot) is put into the event. In this case, the vertex contains no tracks (as none have been used), the chi^2 and the ndof are 0, and the flag isFake() is set to true.

Vertex reconstruction analysis

In cmsRun mode, the primary vertices can be analyzed with the CMS.PrimaryVertexAnalyzer module by running demoAnalyzePrimaryVertex_cfg.py:

cmsRun demoAnalyzePrimaryVertex_cfg.py

in the same directory. A root histogram file, simpleVertexAnalyzer.root, is produced that, in addition to histograms of the fitted position, covariance, chi2 and degrees of freedom, also contains the results of sanity checks on the number of valid track links and invalid fit parameters.

root simpleVertexAnalyzer.root
pullx->Draw()
eff->Draw()
nans->Draw()

A simple test of the input track parameters is done by

cmsRun demoAnalyzeTracks_cfg.py

which produces histograms of track parameter pulls. For instance, to look at curvature and z0:

root simpleTrackAnalyzer.root
pull0->Draw()
pull4->Draw()

Review status

Reviewer/Editor and Date (copy from screen) Comments
Main.werdmann - 04 Dec 2006 update
ThomasSpeer - 11 Feb 2008 update
ThomasSpeer - 16 Sep 2008 update for 2.1.X

Responsible: ThomasSpeer
Last reviewed by: ThomasSpeer - 16 Sep 2008



7.3.2 Vertex Fitting

Complete: 5

Contents

Introduction

The task of Vertex Fitting, given a set of tracks, is to compute the best estimate of the vertex parameters (position, covariance matrix, constrained track parameters and their covariances) as well as indicators of the success of the fit (total chi^2, number of degrees of freedom, track weights).

All the fitting algorithms are implemented in VertexFitter classes, which control all the steps of the vertex fit, from the input of the initial information to the output of the estimated quantities. These algorithms are used in other algorithms (vertex finders, b-taggers, etc.), and can be used in an analysis by themselves. They have to be invoked by the user, and take as a minimum a selected set of tracks as input. As such, a VertexFitter does not produce data which it puts back into the event, since it is not a stand-alone EDAnalyzer or EDProducer. All fitters work in the same way, with the exception of the constructor, since each concrete algorithm can take different parameters as input.

As an example, we will use the KalmanVertexFitter (KVF). It is a fitter that uses the Kalman filter formalism.

Setup for tutorial

In the src directory of your area do:

addpkg RecoVertex/KalmanVertexFit
cd RecoVertex/KalmanVertexFit
scramv1 b

An example EDAnalyzer is given in the test directory: KVFTest.cc (and KVFTest.h ), with the configuration file analyzeKVF_cfg.py. To see how to use a fitter, look into KVFTest.cc. A few explanations are given below.

Usage of a VertexFitter

To use a fitter, simply instantiate one and request the vertex with a set of TransientTracks:

KalmanVertexFitter fitter;
TransientVertex myVertex = fitter.vertex(vectorOfTransientTracks);

If you want to change the parameters of the fitter, set them with the appropriate set method (different fitters will obviously have different parameters):
  • Convergence criterion (maximum transverse distance between vertex computed in the previous and the current iterations): setMaximumDistance (float maxShift)
  • Maximum number of iterations: setMaximumNumberOfIterations(int maxIterations)

The vertex returned may not be valid in some cases. The conditions might be different for each fitter. For the KalmanVertexFitter, as for most other fitters, an invalid vertex is returned when the maximum number of iterations is exceeded or the fitted position is out of the tracker bounds. An error message is printed in the log, which for the two above-mentioned cases is:

       The maximum number of steps has been exceeded. Returned vertex is invalid.
       Fitted position is out of tracker bounds. Returned vertex is invalid.
and the user has to check the validity of the vertex with the method isValid().

Usage of a TransientTracks object

(More detailed information about TransientTracks can be found here.)

The tracks which are used for vertex reconstruction and for b/tau tagging are the TransientTracks. Being constructed from a reco::Track, they have access to all its data and provide methods which cannot be provided by the reco::Track.

While the reco::Track provides only the perigee parameters at the point of closest approach to the nominal vertex (0.,0.,0.), the TransientTrack can provide states at any point along its trajectory. Having access to the magnetic field, it allows you to propagate the track with the various propagators provided in TrackingTools/GeomPropagators. Several other tools, such as IP extrapolators, are meant to be used with the TransientTracks.

If you have a collection of reco::Track, you must convert them first to a collection of TransientTracks:

#include "TrackingTools/TransientTrack/interface/TransientTrackBuilder.h"
#include "TrackingTools/Records/interface/TransientTrackRecord.h"

    // get RECO tracks from the event
    edm::Handle<reco::TrackCollection> tks;
    iEvent.getByLabel(trackLabel(), tks);

    //get the builder:
    edm::ESHandle<TransientTrackBuilder> theB;
    iSetup.get<TransientTrackRecord>().get("TransientTrackBuilder",theB);
    //do the conversion:
    vector<TransientTrack> t_tks = (*theB).build(tks);

The SimpleVertexTree

The class SimpleVertexTree is provided to produce a simple TTree to analyse the results of a fitter. It compares the reconstructed vertex with a simulated vertex.

Example

Running now analyzeKVF_cfg.py, you will see for each event the number of tracks found and the position of the fitted vertex. At the end of the job, some simple statistics are printed. The tree produced by SimpleVertexTree is stored in the root file simpleVertexAnalyzer.root. In a root session, look at the output with the following:

.L SimpleVertexAnalysis.C++
t=new SimpleVertexAnalysis("simpleVertexAnalyzer.root")
t->vertexLoop()
t->plotVertexResult()

You will see two canvases with the following histograms:

  • Page with chi^2, track weight and timing distributions:
    VertexFitterStat.jpg

  • Page with residual (left) and pull (right) distributions:
    VertexFitterRes.jpg

Review status

Reviewer/Editor and Date (copy from screen) Comments
JennyWilliams - 05 Dec 2006 tidied, added completeness bar, responsible/review fields.
JennyWilliams - 27 Mar 2007 Edited to move vertexing links into swguide

Responsible: ThomasSpeer
Last reviewed by: ThomasSpeer - 19 Feb 2008



7.4 Electron Analysis

Complete: 5
Detailed Review status

Contents

Introduction

The present page deals with the analysis of electron data produced with CMSSW 94X. It shows examples of some simple analysis code reading the standard collection of reconstructed electron objects whose reconstruction is performed by default by CMSSW. For further details about the reconstruction process, have a look at SWGuideElectronRecoSoftware.

Getting Started

Access to computing environment

First make sure that you have access to either cmslpc or lxplus. You will also need a github account to check out the code. Check which shell you are using (bash, tcsh, sh, csh, ksh, or zsh); you can request a change of shell via the account self-service tools (cmslpc, lxplus).

You might as well get a proxy now as you will need it later.

voms-proxy-init -voms cms --valid 192:00

If you do not have grid access see LcgAccess

Analysis in CMSSW 94x

Set-up CMSSW Release

Below is a recipe for setting up the analysis for the CMSSW94 release:

cmsrel CMSSW_9_4_4
cd CMSSW_9_4_4/src
cmsenv

Create Ntuples using the ggNtuplizer

Bare ROOT access

At this point you should have set up your computing environment and perhaps have a centrally produced miniAOD ROOT file. The simplest thing to do with an EDM ROOT file is to open it with the ROOT interpreter (do not worry about the numerous warnings about missing dictionaries). You can look within the file, for instance at the tree called "Events". But what we really want to do is to ntuplize the datasets. An Ntuple is simply a collection of events, where different variables (momentum, energy, etc.) corresponding to each event are kept. With these Ntuples, one can plot meaningful distributions and perform an analysis.

To make Ntuples, one can write an EDAnalyzer. However, making a complete one from scratch takes time so there is a standard Ntuplizer for doing photon and electron analysis called the ggNtuplizer.


First we build the ggNtuplizer in the latest CMSSW version it’s available in: https://github.com/cmkuo/ggAnalysis/tree/94X

This step may take some time.

git cms-init 
git cms-merge-topic lsoffi:CMSSW_9_4_0_pre3_TnP 
git cms-merge-topic guitargeek:ElectronID_MVA2017_940pre3 
scram b -j8 
cd $CMSSW_BASE/external/slc6_amd64_gcc630 
git clone https://github.com/lsoffi/RecoEgamma-PhotonIdentification.git data/RecoEgamma/PhotonIdentification/data 
cd data/RecoEgamma/PhotonIdentification/data 
git checkout CMSSW_9_4_0_pre3_TnP 
cd $CMSSW_BASE/external/slc6_amd64_gcc630/ 
git clone https://github.com/lsoffi/RecoEgamma-ElectronIdentification.git data/RecoEgamma/ElectronIdentification/data 
cd data/RecoEgamma/ElectronIdentification/data 
git checkout CMSSW_9_4_0_pre3_TnP 
cd $CMSSW_BASE/src 
git cms-merge-topic cms-egamma:EGM_94X_v1 
cd EgammaAnalysis/ElectronTools/data 
git clone https://github.com/ECALELFS/ScalesSmearings.git 
cd ScalesSmearings/ 
git checkout Run2017_17Nov2017_v1
cd $CMSSW_BASE/src 
git clone https://github.com/cmkuo/HiggsAnalysis.git 
git clone -b 94X https://github.com/cmkuo/ggAnalysis.git 
scram b -j8 

You can now make Ntuples using a configuration file. You can open it with a text editor like vim,

vi ggAnalysis/ggNtuplizer/test/run_data_94X.py #or 
vi ggAnalysis/ggNtuplizer/test/run_mc_94X.py

If you want to learn what the Ntuplizer does you can check out

vi ggAnalysis/ggNtuplizer/plugins/ggNtuplizer_photons.cc

One dataset we could run on is DYJetsToLL_M-50_TuneCP5_13TeV-amcatnloFXFX-pythia8/MINIAODSIM/94X_mc2017_realistic_v10-v1/00000/005DC030-D3F4-E711-889A-02163E01A62D.root (see CMS Data Aggregation system).

In the run_mc_94X.py configuration file, comment out the original input file and add the file we want to process.

process.source = cms.Source("PoolSource",
                            fileNames = cms.untracked.vstring(
        #'file:/data4/cmkuo/testfiles/DYJetsToLL_M-50_RunIIFall17.root'        
        '/store/mc/RunIIFall17MiniAOD/DYJetsToLL_M-50_TuneCP5_13TeV-amcatnloFXFX-pythia8/MINIAODSIM/94X_mc2017_realistic_v10-v1/00000/005DC030-D3F4-E711-889A-02163E01A62D.root'
        ))

Let's set process.maxEvents to run over just 10000 events.
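In the configuration file this is a one-line change (standard CMSSW configuration syntax):

process.maxEvents = cms.untracked.PSet(input = cms.untracked.int32(10000))

To execute the process, go to the source directory and do cmsRun: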

 cd $CMSSW_BASE/src/
cmsRun ggAnalysis/ggNtuplizer/test/run_mc_94X.py

The output should look like this,

	--- egmGsfElectronIDs:cutBasedElectronID-Fall17-94X-V1-loose added to patElectrons
	--- egmGsfElectronIDs:cutBasedElectronID-Fall17-94X-V1-medium added to patElectrons
	--- egmGsfElectronIDs:cutBasedElectronID-Fall17-94X-V1-tight added to patElectrons
	--- egmGsfElectronIDs:cutBasedElectronID-Fall17-94X-V1-veto added to patElectrons
	--- egmGsfElectronIDs:heepElectronID-HEEPV70 added to patElectrons
	--- egmGsfElectronIDs:mvaEleID-Fall17-noIso-V1-wp80 added to patElectrons
	--- egmGsfElectronIDs:mvaEleID-Fall17-noIso-V1-wp90 added to patElectrons
	--- egmGsfElectronIDs:mvaEleID-Fall17-noIso-V1-wpLoose added to patElectrons
	--- egmGsfElectronIDs:mvaEleID-Fall17-iso-V1-wp80 added to patElectrons
	--- egmGsfElectronIDs:mvaEleID-Fall17-iso-V1-wp90 added to patElectrons
	--- egmGsfElectronIDs:mvaEleID-Fall17-iso-V1-wpLoose added to patElectrons
	--- egmPhotonIDs:cutBasedPhotonID-Fall17-94X-V1-loose added to patPhotons
	--- egmPhotonIDs:cutBasedPhotonID-Fall17-94X-V1-medium added to patPhotons
	--- egmPhotonIDs:cutBasedPhotonID-Fall17-94X-V1-tight added to patPhotons
	--- egmPhotonIDs:mvaPhoID-RunIIFall17-v1-wp80 added to patPhotons
	--- egmPhotonIDs:mvaPhoID-RunIIFall17-v1-wp90 added to patPhotons
10-Oct-2018 00:59:58 CDT  Initiating request to open file root://cmsxrootd-site.fnal.gov//store/mc/RunIIFall17MiniAOD/DYJetsToLL_M-50_TuneCP5_13TeV-amcatnloFXFX-pythia8/MINIAODSIM/94X_mc2017_realistic_v10-v1/00000/005DC030-D3F4-E711-889A-02163E01A62D.root
%MSG-w XrdAdaptor:  file_open 10-Oct-2018 00:59:59 CDT pre-events
Data is served from fnal.gov instead of original site T1_US_FNAL
%MSG
10-Oct-2018 01:00:00 CDT  Successfully opened file root://cmsxrootd-site.fnal.gov//store/mc/RunIIFall17MiniAOD/DYJetsToLL_M-50_TuneCP5_13TeV-amcatnloFXFX-pythia8/MINIAODSIM/94X_mc2017_realistic_v10-v1/00000/005DC030-D3F4-E711-889A-02163E01A62D.root
Begin processing the 1st record. Run 1, Event 74714917, LumiSection 42879 at 10-Oct-2018 01:00:23.365 CDT
10-Oct-2018 01:01:20 CDT  Closed file root://cmsxrootd-site.fnal.gov//store/mc/RunIIFall17MiniAOD/DYJetsToLL_M-50_TuneCP5_13TeV-amcatnloFXFX-pythia8/MINIAODSIM/94X_mc2017_realistic_v10-v1/00000/005DC030-D3F4-E711-889A-02163E01A62D.root

=============================================

MessageLogger Summary

 type     category        sev    module        subroutine        count    total
 ---- -------------------- -- ---------------- ----------------  -----    -----
    1 XrdAdaptor           -w file_open                              1        1
    2 fileAction           -s file_close                             1        1
    3 fileAction           -s file_open                              2        2

 type    category    Examples: run/evt        run/evt          run/evt
 ---- -------------------- ---------------- ---------------- ----------------
    1 XrdAdaptor           pre-events                        
    2 fileAction           PostGlobalEndRun                  
    3 fileAction           pre-events       pre-events       

Severity    # Occurrences   Total Occurrences
--------    -------------   -----------------
Warning                 1                   1
System                  3                   3

dropped waiting message count 0

The output, ggtree_mc.root is the Ntuple that you can analyze.

Analyzing the Ntuples

One way to access the information contained in the ntuple directly is through the command line, using ROOT interactively. Try these commands to draw the electron pT and see the effect of adding some cuts.

$ root -l ggtree_mc.root
root [1] .ls
root [2] ggNtuplizer->cd()
root [3] EventTree->Print()
root [4] EventTree->Draw("elePt")
root [5] EventTree->Draw("elePt","elePt>20") // add a cut at 20 GeV

You can apply a mask cut and select events with a tighter electron ID by doing this:

root [6] EventTree->Draw("elePt","(eleIDbit & 8)== 8")
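
The eleIDbit word packs the cut-based ID working points into single bits; judging from the cuts used in these exercises (2 = loose, 4 = medium, 8 = tight, with the lowest bit presumably the veto ID), a small Python helper to decode it might look like this sketch:

def passes_id(ele_id_bit, wp):
    """Decode a packed electron ID word (assumed bits: 0=veto, 1=loose, 2=medium, 3=tight)."""
    wps = {"veto": 0, "loose": 1, "medium": 2, "tight": 3}
    mask = 1 << wps[wp]
    return (ele_id_bit & mask) == mask

print passes_id(8, "tight")   # True: (8 & 8) == 8
print passes_id(6, "tight")   # False: passes loose and medium only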

You can learn a little more about eleIDbit and do more exercises at this link (FNALHATS2017EGM). To understand more about the effects of the different levels of cuts (Loose, Medium, Tight), let's superimpose the plots of the variable sigmaIetaIetaFull5x5:

EventTree->SetLineColor(kBlack)
EventTree->Draw("eleSigmaIEtaIEtaFull5x5","(eleIDbit & 2)== 2")
EventTree->SetLineColor(kRed)
EventTree->Draw("eleSigmaIEtaIEtaFull5x5","(eleIDbit & 4)== 4","same")
EventTree->SetLineColor(kBlue)
EventTree->Draw("eleSigmaIEtaIEtaFull5x5","(eleIDbit & 8)== 8","same")

We can also use a simple plotting script to make the plots we did above.

wget https://raw.githubusercontent.com/uzzielperez/CMSSW-workbook-practice/master/ElectronAnalysis/simple_plotter.py

Using the simple plotter, we can reproduce and see the effect of putting a cut at 20 GeV. By making a few modifications to the script, we can compare the effects of different levels of cuts on different variables. In the script below, you can edit the variable name, "var":

wget https://raw.githubusercontent.com/uzzielperez/CMSSW-workbook-practice/master/ElectronAnalysis/EleIDSame.py

From the plots, what can you say about "fake" electrons in the high pT region? How does the "sigmaIetaIeta" distribution compare for electrons and jets?

Z->ee Exercise

The Z->ee channel is an important final state that is often used for calibration and systematic uncertainty measurements. In this exercise, we will reconstruct the Z boson. We expect a resonance peak at around 91.2 GeV.

To do this, make sure you are in the CMSSW environment created for this page and copy a simple plotting script for this purpose.

cmsenv
voms-proxy-init -voms cms --valid 192:00
cd $CMSSW_BASE/src/ggAnalysis/ggNtuplizer/test
xrdcp root://cmseos.fnal.gov//store/user/drberry/HATS/makePlots.py . 
# or download a copy from Github 
wget 

The input file is the Ntuple ggtree_mc.root which we made in the previous section. (Note: if you want to change the input file you can do python makePlots.py -i <filename.root>; to check the command line options, do python makePlots.py -h.) Running this script will output plots.root, which contains the 3 histograms for the pT, sigmaIetaIeta and the mass. Have a look at the mass distribution in particular and confirm that it peaks at around the expected mass of the Z.

Now, let's try to loosen and tighten the selection and correspondingly change the output names to plots_loose.root and plots_tight.root. This time, we'll be running on background MC.

python makePlots.py -o plots_loose.root -i root://cmsxrootd.fnal.gov//store/user/drberry/HATS/ggntuples/DYJetsToLL_M-50_TuneCUETP8M1_13TeV-amcatnloFXFX-pythia8.root root://cmsxrootd.fnal.gov//store/user/drberry/HATS/ggntuples/QCD_Pt-40toInf_DoubleEMEnriched_MGG-80toInf_TuneCUETP8M1_13TeV_Pythia8.root root://cmsxrootd.fnal.gov//store/user/drberry/HATS/ggntuples/QCD_Pt-40toInf_DoubleEMEnriched_MGG-80toInf_TuneCUETP8M1_13TeV_Pythia8.root

Change to the tight selection:

(event.eleIDbit[i]&8)!=8:

python makePlots.py -o plots_tight.root -i root://cmsxrootd.fnal.gov//store/user/drberry/HATS/ggntuples/DYJetsToLL_M-50_TuneCUETP8M1_13TeV-amcatnloFXFX-pythia8.root root://cmsxrootd.fnal.gov//store/user/drberry/HATS/ggntuples/QCD_Pt-40toInf_DoubleEMEnriched_MGG-80toInf_TuneCUETP8M1_13TeV_Pythia8.root root://cmsxrootd.fnal.gov//store/user/drberry/HATS/ggntuples/QCD_Pt-40toInf_DoubleEMEnriched_MGG-80toInf_TuneCUETP8M1_13TeV_Pythia8.root


Note: If you have trouble accessing the files you can contact the most recent editor or try getting some practice on retrieving similar datasets from DAS and Ntuplize them to make a similar analysis above.

To make a quick comparison of the effects of the loosening and the tightening of the selection, you can use a simple plotter that overlays the plots for several root files.

wget https://raw.githubusercontent.com/uzzielperez/CMSSW-workbook-practice/master/ElectronAnalysis/PltHistsSame.py

If we look at the output plot, much of the output background has been cut off. Let's start experimenting with different cuts and see how it affects the Z peak. Here are some questions you can ask yourself:

  • How does the mass peak look for "barrel-only" and "endcap-only" electrons? For the barrel (endcap), the electron should fall in the abs(eleEta) < 1.4 (1.566 < abs(eleEta) < 2.5) region.
  • What is the effect of raising the R9 requirement? (R9 is the energy sum of the 3x3 crystals centered on the most energetic crystal in a supercluster, divided by the raw energy of that cluster.)
  • Try other cuts on other variables like eleSigmaIEtaIEtaFull5x5, eleHoverE, elePFChIso, eleIDMVA.

You can also try comparing data with Monte Carlo. You can list all the samples for this category:

eosls /store/user/lpcsusystealth/DATA/ggSKIMS/ 

Let's run over the DoubleEG Run2016H sample. It has around 200M entries so we can choose to run over only 100000 of them.

python makePlots.py -o plots_data.root -m 100000 -i root://cmseos.fnal.gov//store/user/lpcsusystealth/DATA/ggSKIMS/DoubleEG_Run2016H_SepRereco_HLTDiPho3018M90_SKIM.root

We can vary the fit parameters in the fitter code below in order to fit the data. This code is configured to run on MC, so make sure you change the argument of the fitter function first. Try to see the difference between the regular and regressed energy.

xrdcp root://cmseos.fnal.gov//store/user/drberry/HATS/fitter.C .
root fitter.C

Tag and Probe Efficiency Measurement

The tag and probe method is a data-driven approach to measuring selection ID efficiencies. It exploits resonances that decay into two identical final states, like the JPsi or Z (to dilepton), to select certain (tag) particles and probe the efficiency of a particular selection criterion applied to the others. The "tags" are golden objects that pass a tight selection and have a very low fake rate, while the "probes" pass a selection that can be very loose. Resonances are reconstructed in pairs, with one final state being the tag (passing the tight ID) while the other, passing a loose ID, is the probe. Lineshapes of (tag + "passing probes") and (tag + "failing probes") are then created and are separately fit with a signal + background model. The efficiency is then computed from the ratio of the number of "passing probes" to the total (passing + failing probes). This can be done in bins of different probe variables (e.g. pT, η, ...).
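
Schematically, once the lineshapes have been fitted, the efficiency in each bin is just the ratio of the fitted passing yield to the total; here is a minimal sketch (the real fitter propagates the full fit uncertainties rather than this naive binomial error):

import math

def tnp_efficiency(n_pass, n_fail):
    """Tag-and-probe efficiency with a naive binomial uncertainty."""
    n_tot = float(n_pass + n_fail)
    eff = n_pass / n_tot
    err = math.sqrt(eff * (1.0 - eff) / n_tot)
    return eff, err

print tnp_efficiency(9000, 1000)  # (0.9, ~0.003)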

Setup and PileUp

It is recommended that this part of the workbook be done at lxplus since many files that are needed here are stored at the CERN EOS. However, there's no reason that these cannot be run at FNAL. You can choose to use your cmslpc account instead if you do not have enough space in your lxplus account.

cmsrel CMSSW_9_4_0
cd CMSSW_9_4_0/src
cmsenv

Next we need to clone some GitHub code and update our TnPTreeProducer code. The parts below take a while, but we can save some time with git cms-init using the following commands:

export CMSSW_GIT_REFERENCE=/cvmfs/cms.cern.ch/cmssw.git.daily #setenv CMSSW_GIT_REFERENCE /cvmfs/cms.cern.ch/cmssw.git.daily
git cms-init
git ls-remote --tags my-cmssw | cut -f2 | sed 's~refs/tags/~~' | xargs -n 1000 -P 1 git push my-cmssw --delete
git cms-merge-topic lsoffi:CMSSW_9_4_0_pre3_TnP
git cms-merge-topic guitargeek:ElectronID_MVA2017_940pre3
scram b -j8

# Add the area containing the MVA weights (from cms-data, to appear in “external”).
# Note: the “external” area appears after “scram build” is run at least once, as above
#
cd $CMSSW_BASE/external
# below, you may have a different architecture, this is just one example from lxplus
cd slc6_amd64_gcc630/
git clone https://github.com/lsoffi/RecoEgamma-PhotonIdentification.git data/RecoEgamma/PhotonIdentification/data
cd data/RecoEgamma/PhotonIdentification/data
git checkout CMSSW_9_4_0_pre3_TnP
cd $CMSSW_BASE/external
cd slc6_amd64_gcc630/
git clone https://github.com/lsoffi/RecoEgamma-ElectronIdentification.git data/RecoEgamma/ElectronIdentification/data
cd data/RecoEgamma/ElectronIdentification/data
git checkout CMSSW_9_4_0_pre3_TnP
# Go back to the src/
cd $CMSSW_BASE/src

cd $CMSSW_BASE/src
git clone -b CMSSW_9_4_X https://github.com/cms-analysis/EgammaAnalysis-TnPTreeProducer.git EgammaAnalysis/TnPTreeProducer

scram b -j4

We then need to retrieve the 2017 golden JSON and pileup JSON files from the CMS DQM area.

wget https://cms-service-dqm.web.cern.ch/cms-service-dqm/CAF/certification/Collisions17/13TeV/Final/Cert_294927-306462_13TeV_PromptReco_Collisions17_JSON.txt
wget https://cms-service-dqm.web.cern.ch/cms-service-dqm/CAF/certification/Collisions17/13TeV/PileUp/pileup_latest.txt

Using those two files we can now generate the 2017 data pileup distribution using a pileup calculator:

pileupCalc.py -i Cert_294927-306462_13TeV_PromptReco_Collisions17_JSON.txt --inputLumiJSON pileup_latest.txt --calcMode true --minBiasXsec 69200 --maxPileupBin 100 --numPileupBins 100 pileup_2017_41fb.root

Let's dump the values of the data pileup distribution to standard output.

cp ~charaf/nobackup/EgammaHATS2018/dumpPileup.py PhysicsTools/TagAndProbe/test/utilities/
python PhysicsTools/TagAndProbe/test/utilities/dumpPileup.py  pileup_2017_41fb.root
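
dumpPileup.py is not reproduced here; assuming pileupCalc.py stores the distribution in a TH1 named "pileup" (the usual convention, though this is an assumption), a dump along these lines would produce the comma-separated values used below:

import ROOT

f = ROOT.TFile.Open("pileup_2017_41fb.root")
h = f.Get("pileup")  # assumed histogram name
values = [h.GetBinContent(i) for i in range(1, h.GetNbinsX() + 1)]
print ",".join("%.3g" % v for v in values)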

Replace the values for the key "2017_DATA_xSec69.2mb_94X_17Jan" in the "data_pu_distribs" dictionary.

data_pu_distribs = {.....
"2017_DATA_xSec69.2mb_94X_17Jan":[2.6e+05,1.08e+06,2.09e+06,3.69e+06,....


   "2017_DATA_xSec69.2mb_94X_17Jan":[2.59e+05,1.08e+06,2.01e+06,3.78e+06,...,1.34,0.765,0.431]

Now we can produce the T&P trees!

Preparing the T&P dataset

To produce the T&P trees, you can do the following.

cd EgammaAnalysis/TnPTreeProducer/
cmsenv
cmsRun python/TnPTreeProducer_cfg.py doEleID=True isMC=False maxEvents=5000

Open the root file it produces, called "TnPTree_data.root", and inspect a few distributions like el_5x5_sieie, el_abseta, el_sc_eta, el_hoe. The number of events in this file might not be enough for a "good" set of distributions, so you can rerun with maxEvents set to a higher number (~10000). Or you can retrieve a prepared T&P tree by doing:

cp ~cuperez/nobackup/CMSSWWorkbook/TnPTree_data_workbook.root .
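
A quick way to inspect those distributions interactively is from PyROOT; a sketch (the directory and tree names, tnpEleIDs/fitter_tree, are assumptions based on the TnPTreeProducer examples):

import ROOT

f = ROOT.TFile.Open("TnPTree_data.root")
tree = f.Get("tnpEleIDs/fitter_tree")  # assumed tree path
c = ROOT.TCanvas("c", "c")
tree.Draw("el_sc_eta")
c.SaveAs("el_sc_eta.png")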

Now we can run the fitter code to get the information we need from the tree.

Measuring Efficiencies by fitting to the dataset

We still need to download the tag and probe fitter code from GitHub.

cd $CMSSW_BASE/src
git clone -b egm_tnp_Moriond18_v3.0 git@github.com:michelif/egm_tnp_analysis.git
cd egm_tnp_analysis/
make

We'll be modifying some of the python files so it's good to keep a backup copy:

cp etc/config/settings_ele.py etc/config/settings_ele_workbook.py 
cp etc/config/tnpSampleDef.py etc/config/tnpSampleDef_workbook.py 
cp etc/config/tnpEGM_fitter.py etc/config/tnpEGM_fitter_workbook.py

If you are at FNAL LPC, you need to replace eos/cms by root://eoscms.cern.ch/. If at CERN, put / before eos/cms. Using xrootd at LPC is a lot slower, so it is recommended that you work at lxplus.

To run the fitter code, try the following commands and open egammaEffi.txt to view the results. Note that the "--mcSig --altSig" and "--altSig" steps may take at least an hour.

python tnpEGM_fitter_workbook.py etc/config/settings_ele_workbook.py --flag passingTight94X --checkBins
python tnpEGM_fitter_workbook.py etc/config/settings_ele_workbook.py --flag passingTight94X --createBins
python tnpEGM_fitter_workbook.py etc/config/settings_ele_workbook.py --flag passingTight94X --createHists
python tnpEGM_fitter_workbook.py etc/config/settings_ele_workbook.py --flag passingTight94X --doFit
python tnpEGM_fitter_workbook.py etc/config/settings_ele_workbook.py --flag passingTight94X --doFit --mcSig --altSig
python tnpEGM_fitter_workbook.py etc/config/settings_ele_workbook.py --flag passingTight94X --doFit --altSig
python tnpEGM_fitter_workbook.py etc/config/settings_ele_workbook.py --flag passingTight94X --doFit --altBkg
python tnpEGM_fitter_workbook.py etc/config/settings_ele_workbook.py --flag passingTight94X  --sumUp

NanoAOD

It's good to familiarize oneself with the recently developed NanoAOD data tier. You can learn more about it in the WorkBookNanoAOD. The goal for this new data tier is to keep the event size below 2 kB/event so that ~50% of analyses can have their ntuples centrally produced. We can run through a quick example and make a quick analysis with it.

cd $CMSSW_BASE/src 
cmsenv 
git cms-init # not really needed except if you want other cmssw stuff
git clone https://github.com/cms-nanoAOD/nanoAOD-tools.git PhysicsTools/NanoAODTools
scram b
voms-proxy-init -voms cms 

cd PhysicsTools/NanoAODTools/python/postprocessing/examples/
python exampleAnalysis.py

Feel free to look at the code and at how you could make selections (e.g. choosing events with at least two electrons) as you loop over the events; a sketch of such a module is given below. You could also try finding another interesting dataset in DAS and look at what's stored in the root files.
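
As a sketch of such a selection in a nanoAOD-tools module (the Module/Collection skeleton follows the examples shipped with the package; the pT threshold is an arbitrary choice):

from PhysicsTools.NanoAODTools.postprocessing.framework.eventloop import Module
from PhysicsTools.NanoAODTools.postprocessing.framework.datamodel import Collection

class TwoElectronSkim(Module):
    """Keep only events with at least two electrons above a pT threshold."""
    def analyze(self, event):
        electrons = Collection(event, "Electron")
        n_good = sum(1 for ele in electrons if ele.pt > 10.)
        return n_good >= 2  # returning False drops the event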

...


Review status

Reviewer/Editor and Date (copy from screen) Comments
UzzielPerez - 15-Aug-2018 Update for Run 2, CMSSW 94x
MarcoPieri - 24-Jan-2010  
DavidChamont - End November 2009 Upgrade for 3XX.
MatteoSani - 02 Mar 2009  
ChrisSeez - 22 Feb 2008  
Main.Aresh - 27 Feb 2008 Changes in verbatim elements (in the current page and also in https://twiki.cern.ch/twiki/bin/view/CMS/WorkBookElectronAnalysis) because of some lines too long for printable version
JennyWilliams - 08 Oct 2007 recreate contents table
Main.palmale - 05 Dec 2006 Review and edit

Responsible:Main.DavidChamont



7.5 Photon Analysis

Complete: 4
Detailed Review status

Contents

Introduction

This page shows some basic instructions to get you started on an analysis of Photons.

Getting Started

Access to computing environment

First make sure that you have access to either cmslpc or lxplus. You will also need a github account to check out the code. Check which shell you are using (bash, tcsh, sh, csh, ksh, or zsh); you can request a change of shell via the account self-service tools (cmslpc, lxplus).

You might as well get a proxy now as you will need it later.

voms-proxy-init -voms cms --valid 192:00

If you do not have grid access see LcgAccess

Analysis in CMSSW 94x

Set-up CMSSW Release

Below is a recipe for setting up the analysis for the CMSSW94 release:

cmsrel CMSSW_9_4_4
cd CMSSW_9_4_4/src
cmsenv

Create Ntuples using the ggNtuplizer

First we build the ggNtuplizer in the latest CMSSW version it’s available in: https://github.com/cmkuo/ggAnalysis/tree/94X

This step may take some time.

git cms-init 
git cms-merge-topic lsoffi:CMSSW_9_4_0_pre3_TnP 
git cms-merge-topic guitargeek:ElectronID_MVA2017_940pre3 
scram b -j8 
cd $CMSSW_BASE/external/slc6_amd64_gcc630 
git clone https://github.com/lsoffi/RecoEgamma-PhotonIdentification.git data/RecoEgamma/PhotonIdentification/data 
cd data/RecoEgamma/PhotonIdentification/data 
git checkout CMSSW_9_4_0_pre3_TnP 
cd $CMSSW_BASE/external/slc6_amd64_gcc630/ 
git clone https://github.com/lsoffi/RecoEgamma-ElectronIdentification.git data/RecoEgamma/ElectronIdentification/data 
cd data/RecoEgamma/ElectronIdentification/data 
git checkout CMSSW_9_4_0_pre3_TnP 
cd $CMSSW_BASE/src 
git cms-merge-topic cms-egamma:EGM_94X_v1 
cd EgammaAnalysis/ElectronTools/data 
git clone https://github.com/ECALELFS/ScalesSmearings.git 
cd ScalesSmearings/ 
git checkout Run2017_17Nov2017_v1
cd $CMSSW_BASE/src 
git clone https://github.com/cmkuo/HiggsAnalysis.git 
git clone -b 94X https://github.com/cmkuo/ggAnalysis.git 
scram b -j8 

After you've successfully built the ggNtuplizer, you can see how the analyzer accesses the photon collection in ggNtuplizer_photons.cc. Use your favourite text editor to open the file:

vi ggAnalysis/ggNtuplizer/plugins/ggNtuplizer_photons.cc

or

emacs -nw ggAnalysis/ggNtuplizer/plugins/ggNtuplizer_photons.cc

  edm::Handle<edm::View<pat::Photon> > photonHandle; // define the photon handle and access the information using getByToken
  e.getByToken(photonCollection_, photonHandle);
  ...
  for (edm::View<pat::Photon>::const_iterator iPho = photonHandle->begin(); iPho != photonHandle->end(); ++iPho) {
    ...
    phoEt_            .push_back(iPho->et());
    phoEta_           .push_back(iPho->eta());
    phoPhi_           .push_back(iPho->phi());
    ...
  }

Next, open the configuration file

vi ggAnalysis/ggNtuplizer/test/run_data_94X.py

This file configures your cmsRun job and creates the ntuples. At line 70, you will find the lines that import the VID framework. You should see:

#####VID framework####################
from PhysicsTools.SelectorUtils.tools.vid_id_tools import *
dataFormat = DataFormat.MiniAOD
switchOnVIDElectronIdProducer(process, dataFormat)
switchOnVIDPhotonIdProducer(process, dataFormat)

These lines import VID and enable the electron and photon IDs. VID (Versioned Identification) is a standardized way to apply particle identification selections. Both electron and photon IDs are generally updated twice a year which usually happens for summer conferences and for Moriond.

 my_phoid_modules = ['RecoEgamma.PhotonIdentification.Identification.cutBasedPhotonID_Fall17_94X_V1_TrueVtx_cff',
                  'RecoEgamma.PhotonIdentification.Identification.mvaPhotonID_Fall17_94X_V1_cff']
...
#add them to the VID producer
for idmod in my_phoid_modules:
    setupAllVIDIdsInModule(process,idmod,setupVIDPhotonSelection)

Ntuples from miniAOD

To create Ntuples from miniAOD files, either MC or data, we can run on a sample dataset specified in the process.source section of the run_data_94X.py or run_mc_94X.py file. In this example we will run on the MC dataset DYJetsToLL_M-50_TuneCP5_13TeV-amcatnloFXFX-pythia8/MINIAODSIM/94X_mc2017_realistic_v10-v1/00000/005DC030-D3F4-E711-889A-02163E01A62D.root.

process.source = cms.Source("PoolSource",
                            fileNames = cms.untracked.vstring(
        #'file:/data4/cmkuo/testfiles/DYJetsToLL_M-50_RunIIFall17.root'        
        '/store/mc/RunIIFall17MiniAOD/DYJetsToLL_M-50_TuneCP5_13TeV-amcatnloFXFX-pythia8/MINIAODSIM/94X_mc2017_realistic_v10-v1/00000/005DC030-D3F4-E711-889A-02163E01A62D.root'
        ))

Edit the file and set process.maxEvents to run over 10000 events. Execute the process with

cd $CMSSW_BASE/src/ggAnalysis/ggNtuplizer/test
cmsRun run_mc_94X.py

The output should look like this:

	--- egmGsfElectronIDs:cutBasedElectronID-Fall17-94X-V1-loose added to patElectrons
	--- egmGsfElectronIDs:cutBasedElectronID-Fall17-94X-V1-medium added to patElectrons
	--- egmGsfElectronIDs:cutBasedElectronID-Fall17-94X-V1-tight added to patElectrons
	--- egmGsfElectronIDs:cutBasedElectronID-Fall17-94X-V1-veto added to patElectrons
	--- egmGsfElectronIDs:heepElectronID-HEEPV70 added to patElectrons
	--- egmGsfElectronIDs:mvaEleID-Fall17-noIso-V1-wp80 added to patElectrons
	--- egmGsfElectronIDs:mvaEleID-Fall17-noIso-V1-wp90 added to patElectrons
	--- egmGsfElectronIDs:mvaEleID-Fall17-noIso-V1-wpLoose added to patElectrons
	--- egmGsfElectronIDs:mvaEleID-Fall17-iso-V1-wp80 added to patElectrons
	--- egmGsfElectronIDs:mvaEleID-Fall17-iso-V1-wp90 added to patElectrons
	--- egmGsfElectronIDs:mvaEleID-Fall17-iso-V1-wpLoose added to patElectrons
	--- egmPhotonIDs:cutBasedPhotonID-Fall17-94X-V1-loose added to patPhotons
	--- egmPhotonIDs:cutBasedPhotonID-Fall17-94X-V1-medium added to patPhotons
	--- egmPhotonIDs:cutBasedPhotonID-Fall17-94X-V1-tight added to patPhotons
	--- egmPhotonIDs:mvaPhoID-RunIIFall17-v1-wp80 added to patPhotons
	--- egmPhotonIDs:mvaPhoID-RunIIFall17-v1-wp90 added to patPhotons
27-Apr-2018 23:59:25 CDT  Initiating request to open file root://cmsxrootd-site.fnal.gov//store/mc/RunIIFall17MiniAOD/DYJetsToLL_M-50_TuneCP5_13TeV-amcatnloFXFX-pythia8/MINIAODSIM/94X_mc2017_realistic_v10-v1/00000/005DC030-D3F4-E711-889A-02163E01A62D.root
%MSG-w XrdAdaptor:  file_open 27-Apr-2018 23:59:29 CDT pre-events
Data is served from T2_US_Purdue instead of original site T1_US_FNAL
%MSG
27-Apr-2018 23:59:30 CDT  Successfully opened file root://cmsxrootd-site.fnal.gov//store/mc/RunIIFall17MiniAOD/DYJetsToLL_M-50_TuneCP5_13TeV-amcatnloFXFX-pythia8/MINIAODSIM/94X_mc2017_realistic_v10-v1/00000/005DC030-D3F4-E711-889A-02163E01A62D.root
Begin processing the 1st record. Run 1, Event 74714917, LumiSection 42879 at 28-Apr-2018 00:01:19.401 CDT
28-Apr-2018 00:02:17 CDT  Closed file root://cmsxrootd-site.fnal.gov//store/mc/RunIIFall17MiniAOD/DYJetsToLL_M-50_TuneCP5_13TeV-amcatnloFXFX-pythia8/MINIAODSIM/94X_mc2017_realistic_v10-v1/00000/005DC030-D3F4-E711-889A-02163E01A62D.root

=============================================

MessageLogger Summary

 type     category        sev    module        subroutine        count    total
 ---- -------------------- -- ---------------- ----------------  -----    -----
    1 XrdAdaptor           -w file_open                              1        1
    2 fileAction           -s file_close                             1        1
    3 fileAction           -s file_open                              2        2

 type    category    Examples: run/evt        run/evt          run/evt
 ---- -------------------- ---------------- ---------------- ----------------
    1 XrdAdaptor           pre-events                        
    2 fileAction           PostGlobalEndRun                  
    3 fileAction           pre-events       pre-events       

Severity    # Occurrences   Total Occurrences
--------    -------------   -----------------
Warning                 1                   1
System                  3                   3

This outputs a file named ggtree_mc.root which you can now analyze. If you want to change the output file name, edit this line in run_mc_94X.py:

process.TFileService = cms.Service("TFileService", fileName = cms.string('ggtree_mc.root'))

If you want another dataset, you can query the CMS Data Aggregation System (DAS). For example, using wildcards, enter dataset dataset=/GGJets_*_Pt-50_13TeV-sherpa/*Fall17*/MINIAODSIM, or use the dasgoclient:

dasgoclient -query="dataset=/GGJets*/*Fall17*/MINI*"

For testing, click on one of the datasets, e.g. /GGJets_M-1000To2000_Pt-50_13TeV-sherpa/RunIIFall17MiniAOD-PU2017_94X_mc2017_realistic_v11-v1/MINIAODSIM → click on Files and you will find the logical file names (LFNs) of samples you can run on. (Note: the LFN uniquely identifies any file within the /store directory tree across all of CMS storage.) Example: /store/mc/RunIIFall17MiniAOD/GGJets_M-1000To2000_Pt-50_13TeV-sherpa/MINIAODSIM/PU2017_94X_mc2017_realistic_v11-v1/30000/047D1FF7-A532-E811-B817-02163E01A10B.root.

Open run_mc_94X.py and edit the process.source section: comment out or delete the DYJetsToLL*.root entry and replace it with the files you want to examine.

If you are analyzing a local file, you need to add the file: prefix like

#'file:/data4/cmkuo/testfiles/DYJetsToLL_M-50_RunIIFall17.root'

Once you have configured run_mc_94X.py, simply execute cmsRun run_mc_94X.py to get the desired ntuple.

Analyzing the Ntuples

One way to access the information contained in the ntuple directly is through the command line.

$ root -l ggtree_mc.root
root [1] .ls
root [2] ggNtuplizer->cd()
root [3] EventTree->Print()
root [4] EventTree->Draw("phoEt")

The first command opens ROOT and reads in the ntuple contained in the file ggtree_mc.root. The .ls command shows that the file contains a directory named ggNtuplizer, which contains the TTree EventTree. The Print() command displays the contents of the tree, and the Draw() command plots a particular variable. As an exercise,

look at different kinematical variables (Et, eta, phi) for photons.

You can also plot a variable with some kinematic cuts. Try:

root [5] EventTree->Draw("phoEt", "phoEt > 20.0")
root [6] EventTree->Draw("phoEt", "phoEt > 20.0 && fabs(phoEta) < 1.4")  // Plot photon Et with a minimum Et of 20 GeV in the barrel (-1.4 < eta < 1.4).

Another way to access information stored in the ntuple is to open a TBrowser.

root [7] TBrowser b

In the above example, phoEt is the transverse energy of the photon, phoEta is its pseudorapidity (a spatial coordinate describing the angle of the photon relative to the beam axis, where eta = 0 is perpendicular to the beam; higher-eta objects are said to be more "forward" and are likely to be detected in the endcaps, in contrast to "central", low-eta objects), and phoPhi is the azimuthal angle. As a simple exercise, one can plot the phoPhi variable and confirm azimuthal symmetry.
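
As a quick check, the following PyROOT sketch (assuming the ggtree_mc.root file produced above) draws phoPhi and fits a constant; a flat line is consistent with azimuthal symmetry:

import ROOT

# Open the ntuple produced by the ggNtuplizer and retrieve the event tree
f = ROOT.TFile.Open("ggtree_mc.root")
tree = f.Get("ggNtuplizer/EventTree")

# Draw phoPhi into a named histogram and fit it with a constant (pol0)
tree.Draw("phoPhi>>h_phi(64,-3.2,3.2)")
h_phi = ROOT.gDirectory.Get("h_phi")
h_phi.Fit("pol0")  # a flat line should describe an azimuthally symmetric distribution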

HoverE (H/E) is another important ID variable: the ratio of the energy deposited in the HCAL to the energy in the ECAL.

Shower-shape variables like SigmaIEtaIEta indicate the "shape", or spread, of the energy deposited in the calorimeter by the particle; smaller values of this variable correspond to real photons. The isolation variables phoPFChIso (Charged Hadron Isolation), phoPFPhoIso (Photon Isolation) and phoPFNeuIso (Neutral Hadron Isolation) represent the relative contributions of energy from nearby charged hadrons, photons and neutral hadrons, respectively. A large contribution from these other objects may indicate a fake photon object.

These last two sets of variables, the shower-shape and isolation variables, together with HoverE, are given major consideration when checking for jet-faking photons.
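
For instance, such ID cuts can be applied directly in a draw command. A minimal sketch follows, assuming the ggNtuplizer branch names phoHoverE, phoSigmaIEtaIEtaFull5x5 and phoPFChIso (check your own tree with EventTree->Print() first); the cut values are purely illustrative, not official working points:

import ROOT

f = ROOT.TFile.Open("ggtree_mc.root")
tree = f.Get("ggNtuplizer/EventTree")

# Illustrative photon-like selection: low H/E, narrow shower shape, small charged isolation
cuts = "phoHoverE < 0.05 && phoSigmaIEtaIEtaFull5x5 < 0.011 && phoPFChIso < 2.0"
tree.Draw("phoEt", cuts)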

Double-click on the file, then on the ggNtuplizer directory, and look for the variable you want to examine. For example, you can scroll down and double-click on phoEt to draw the Et of all the photons.

You can learn more about the technical definitions of the different photon variables stored in the Ntuples here. This page also gives the cut-based and MVA-based IDs for both photons and electrons for Run 2.

To practice with more Ntuples, you can look at the examples from the 2018 CMSDAS egamma hands-on exercises. Note that the pre-ntuplized examples there have been processed with an earlier version of CMSSW and with different detector conditions, which might be outdated for our purposes.

Writing macros

For a quick look at the physics in the Ntuples, the ROOT command line provides an interactive way to plot the variables you are interested in. However, for more complex analyses, such as distinguishing "true" from "fake" photons in the sample, it is necessary to write a longer piece of code that can handle more complicated variables computed from those stored in the Ntuples. You can write macros in C++ (see below) or Python. Python scripts are usually simpler to create, but if you prefer C++, there is a helpful TTree method called MakeClass() which generates a skeleton macro for you and imports the variables in the tree.
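
Generating such a skeleton is a one-liner; a quick sketch, assuming the ggtree_mc.root ntuple from above (the class name MyPhotonAnalysis is arbitrary):

import ROOT

f = ROOT.TFile.Open("ggtree_mc.root")
tree = f.Get("ggNtuplizer/EventTree")

# Writes MyPhotonAnalysis.h and MyPhotonAnalysis.C: all tree branches are
# declared as class members and a Loop() method is ready to be filled in
tree.MakeClass("MyPhotonAnalysis")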

Let’s begin by looking at a simple plotting script.

wget https://raw.githubusercontent.com/uzzielperez/CMSSW-workbook-practice/master/PhotonAnalysis/basicPlotter_photons.py
vi basicPlotter_photons.py

The first part of the script collects the input file(s) containing the default TTree ggNtuplizer/EventTree (you can change this if you are using a different Ntuplizer/analyzer) into a chain called tchain. We then create histograms for the variables we are interested in, here the transverse energy, eta and phi. In the main block of the code, we loop over all the events in the input ntuple. Within this loop is a second loop over all the photons in the event, where we apply our selection criteria (e.g. a minimum photon energy of 20.0 GeV plus some ID cuts). The histograms are filled with photons that pass the cuts and are stored in an output root file. (In a later exercise, we apply additional ID cuts on the photons; see the H->gg exercise.)

# Output file
file_out = ROOT.TFile(args.outputfile, 'recreate')
# Histograms to fill 
h_pho_pt = ROOT.TH1D('h_pho_pt', 'Photon p_{T}', 98, 20.0, 1000.0)
h_pho_Eta = ROOT.TH1D('h_pho_Eta', 'Photon Eta', 80, -3, 3)
h_pho_Phi = ROOT.TH1D('h_pho_Phi', 'Photon Phi', 80, -3.5, 3.5)
 
#h_pho_sigmaIEtaIEta = ROOT.TH1D('h_pho_sigmaIEtaIEta', 'Photon #sigma_{i#eta i#eta}', 100, 0.0, 0.1)
 
#Loop over all the events in the input ntuple
for ievent,event in enumerate(tchain):
    if ievent > args.maxevents and args.maxevents != -1: break
    if ievent % 10000 == 0: print 'Processing entry ' + str(ievent)
 
    # Loop over all the photons in an event
    for i in range(event.nPho):
        if (event.phoEt[i] > 20.0 and (event.phoIDbit[i]&2==2)):
            pho_vec = ROOT.TLorentzVector()
            pho_vec.SetPtEtaPhiE(event.phoEt[i], event.phoEta[i], event.phoPhi[i], event.phoE[i])
            h_pho_pt.Fill(event.phoEt[i])
            h_pho_Eta.Fill(event.phoEta[i])
            h_pho_Phi.Fill(event.phoPhi[i])

As an optional first step, check the usage of this python script.

python basicPlotter_photons.py -h 

You should see the command line options available to you. You may also want to edit the script for your purposes, e.g. change the default in the argument parser section at the top of the script.

usage: basicPlotter_photons.py [-h] [-i [INPUTFILES [INPUTFILES ...]]]
                               [-o OUTPUTFILE] [-m MAXEVENTS] [-t TTREE]
 
A simple ttree plotter
 
optional arguments:
  -h, --help            show this help message and exit
  -i [INPUTFILES [INPUTFILES ...]], --inputfiles [INPUTFILES [INPUTFILES ...]]
                        List of input ggNtuplizer files
  -o OUTPUTFILE, --outputfile OUTPUTFILE
                        Input ggNtuplizer file
  -m MAXEVENTS, --maxevents MAXEVENTS
                        Maximum number events to loop over. Default set to -1
                        to loop over all events in the input file/s.
  -t TTREE, --ttree TTREE
                        TTree Name

We'll first take a look at the ggtree_mc.root Ntuples we created in the previous section. To specify an input file and an output file name other than the defaults,

python basicPlotter_photons.py -i ggtree_mc.root -o ggtree_mc_plots.root

Examine the plots you created in ggtree_mc_plots.root. One way to do this is to open a TBrowser. Note that the objects that populated these histograms might not be "real" photons; we may need to apply more ID requirements to get a high-quality collection of photons.

* Try looking at the transverse momentum plot on a log scale.
* Modify the script by adding other variables. Try making histograms for good discriminating variables that separate real photons from fakes, e.g. |dEtaInSeed|, mini-isolation, and the number of expected missing inner hits.

Real and Fake Photons

In this part, we will add another layer of complexity to our basic plotting code to separate "real" from "fake" photons. We edit our basic plotting script and import a function from selection.py. First download the code,

wget https://raw.githubusercontent.com/uzzielperez/CMSSW-workbook-practice/master/PhotonAnalysis/selection.py

At the top of the basic plotting script basicPlotter_photons.py, add the line from selection import *.

 #! /usr/bin/env python
 
import argparse
from selection import *

...

The selection.py file has a function called has_mcPho_match which takes a TLorentzVector for the reconstructed photon as input and determines whether the closest MC object is a photon or not.

def has_mcPho_match(event, pho_vec):
    min_delta_r = float('Inf')
    pid = 0
    for mc in range(event.nMC):
        if event.mcPt[mc] > 1.0:
            mc_vec = ROOT.TLorentzVector()
            mc_vec.SetPtEtaPhiE(event.mcPt[mc], event.mcEta[mc], event.mcPhi[mc], event.mcE[mc])
            delta_r = pho_vec.DeltaR(mc_vec)
            if delta_r < min_delta_r:
                min_delta_r = delta_r
                if delta_r < 0.3:
                    pid = abs(event.mcPID[mc])
    if pid == 22: return True
    return False

In the histograms section add,

h_pho_sigmaIEtaIEta = ROOT.TH1D('h_pho_sigmaIEtaIEta', 'Photon #sigma_{i#eta i#eta}', 100, 0.0, 0.1)

Edit the second loop (over the photons in an event) so that you have:

    # Loop over all the photons in an event
    for i in range(event.nPho):
        if (event.phoEt[i] > 20.0 and (event.phoIDbit[i]&2==2)):
            pho_vec = ROOT.TLorentzVector()
            pho_vec.SetPtEtaPhiE(event.phoEt[i], event.phoEta[i], event.phoPhi[i], event.phoE[i])
            if not event.isData and has_mcPho_match(event, pho_vec):
                # Reconstructed photon matched to a generator-level photon ("real")
                h_pho_pt.Fill(event.phoEt[i])
                h_pho_Eta.Fill(event.phoEta[i])
                h_pho_Phi.Fill(event.phoPhi[i])
                h_pho_sigmaIEtaIEta.Fill(event.phoSigmaIEtaIEtaFull5x5[i])
            else:
                # No generator match ("fake"); in the exercise below you will
                # fill a separate set of fake-object histograms here instead
                h_pho_pt.Fill(event.phoEt[i])
                h_pho_Eta.Fill(event.phoEta[i])
                h_pho_Phi.Fill(event.phoPhi[i])
                h_pho_sigmaIEtaIEta.Fill(event.phoSigmaIEtaIEtaFull5x5[i])

You can check out the full code here.

In the selection above, the check on the variable phoIDbit, event.phoIDbit[i]&2==2, selects photons passing a loose photon ID. Now run the plotting script on the ggtree_mc.root ntuples we produced above.

python basicPlotter_photons.py -i ggtree_mc.root -o ggtree_mc_plots.root

* Create corresponding histograms for fake objects (i.e. those that did not match a generator-level photon). This part is needed for the next section.
* Plot the photon and fake histograms together for a given variable and compare. How do the fake histograms vary with pT?
* What cuts should you impose when performing an analysis with low/large backgrounds?

Note: The ggtree_mc.root Ntuples we plotted earlier were obtained by running on a DYJetsToLL_M-50_TuneCP5_13TeV-amcatnloFXFX-pythia8 dataset. This sample includes only events with at least two very loose electrons. To prepare for our later exercise on the Higgs decay into two photons, let us look at a sample with at least two very loose photons instead. We will make use of a pre-Ntuplized dataset from the 2018 CMS Data Analysis School. On cmslpc you can get the file with

xrdcp root://cmseos.fnal.gov//store/user/cmsdas/2018/short_exercises/ElectronsPhotons/GluGluHToGG_M-125_13TeV_powheg_pythia8.root .

* For practice, try querying DAS for a dataset you can Ntuplize yourself with the ggNtuplizer by following the steps in the previous sections (e.g. /GluGluHToGG_M-120_13TeV_powheg_pythia8/RunIIFall17MiniAOD-94X_mc2017_realistic_v10-v2/MINIAODSIM) . 

To make plots, we again indicate the input and output filename as follows:

python basicPlotter_photons.py -i GluGluHToGG_M-125_13TeV_powheg_pythia8.root -o H2gg.root

ROC curves: Background Rejection and Signal Efficiency

When making cuts, we would like to reject as much background as possible while minimizing signal loss. One way to quantify the quality of a set of cuts is through ROC (Receiver Operating Characteristic) curves. A ROC curve shows how the signal efficiency and the background rejection change as the cut on a given variable is varied.

We can edit the basic plotting script above to make histograms for fake objects and run on the ggtree_mc.root / GluGluHToGG_M-125_13TeV ntuples. To reproduce the results for ggtree_mc.root, you can do the following:

wget https://raw.githubusercontent.com/uzzielperez/CMSSW-workbook-practice/Backup/PhotonAnalysis/basicPlotter_photons3.py
python basicPlotter_photons3.py -i ggtree_mc.root -o ggtree_comp_mc_plots.root
wget https://raw.githubusercontent.com/uzzielperez/CMSSW-workbook-practice/Backup/PhotonAnalysis/makeRoc_pho.py
python makeRoc_pho.py -i ggtree_comp_mc_plots.root -v sigmaIEtaIEta

The main part of the code (makeRoc_pho.py) is in the for loop:

for i in range(h_sig.GetNbinsX()):
    eff_sig = h_sig.Integral(1, i + 1) / h_sig.Integral()
    eff_bkg = h_bkg.Integral(1, i + 1) / h_bkg.Integral()
    g_roc.SetPoint(i, 1.0 - eff_bkg, eff_sig)

The "best" cut usually depends on the analysis one needs to perform. One common goal is to achieve a certain signal efficiency or background rejection. Another is to minimize the distance from a hypothetical "perfect" cut with 100% signal efficiency and 100% background rejection, i.e. to minimize sqrt((1 - eff_sig)**2 + eff_bkg**2).

* Try coding this up: find the cut that minimizes the distance from the hypothetical "perfect" cut, as in the sketch below.
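
A minimal sketch of this minimization, assuming the signal histogram h_pho_sigmaIEtaIEta from the plotting script and a hypothetical fake-photon histogram h_fake_sigmaIEtaIEta (the one you are asked to create above), both stored in ggtree_comp_mc_plots.root:

import math
import ROOT

f = ROOT.TFile.Open("ggtree_comp_mc_plots.root")
h_sig = f.Get("h_pho_sigmaIEtaIEta")   # MC-matched ("real") photons
h_bkg = f.Get("h_fake_sigmaIEtaIEta")  # hypothetical fake-photon histogram

best_bin, best_dist = -1, float("inf")
for i in range(h_sig.GetNbinsX()):
    eff_sig = h_sig.Integral(1, i + 1) / h_sig.Integral()
    eff_bkg = h_bkg.Integral(1, i + 1) / h_bkg.Integral()
    # Distance from the hypothetical perfect point (eff_sig = 1, eff_bkg = 0)
    dist = math.sqrt((1.0 - eff_sig)**2 + eff_bkg**2)
    if dist < best_dist:
        best_bin, best_dist = i, dist

# The cut value is the upper edge of the last bin kept
print "Best cut: sigmaIEtaIEta < %.4f (distance %.3f)" % (h_sig.GetBinLowEdge(best_bin + 2), best_dist)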

H->gg Bonus Exercise

Although the Higgs boson primarily decays into b-bbar pairs, the huge background in that channel gives enough motivation to search in the diphoton channel, which we will do here. In fact, the Higgs boson was discovered in this channel together with the ZZ channel. The hunt for the Higgs was a difficult task since it has a small cross-section and the easily reconstructed final states have small branching ratios. In this exercise, one can expand the basic plotting script from the earlier examples to keep only events with at least two photons passing a certain set of selection criteria. You will see that the reconstructed Higgs peaks at 125 GeV.
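
The core of such a selection might look like the following sketch (assuming, as in the ggNtuplizer output, that photons are stored in decreasing Et order; makePlots_H2gg.py below implements the full version with proper ID cuts):

import ROOT

# Chain the input ntuple(s), as in basicPlotter_photons.py
tchain = ROOT.TChain("ggNtuplizer/EventTree")
tchain.Add("GluGluHToGG_M-125_13TeV_powheg_pythia8.root")

h_mgg = ROOT.TH1D("h_mgg", "Diphoton invariant mass", 80, 100.0, 180.0)

for event in tchain:
    if event.nPho < 2:
        continue
    # Four-vectors of the two leading photons
    pho1, pho2 = ROOT.TLorentzVector(), ROOT.TLorentzVector()
    pho1.SetPtEtaPhiE(event.phoEt[0], event.phoEta[0], event.phoPhi[0], event.phoE[0])
    pho2.SetPtEtaPhiE(event.phoEt[1], event.phoEta[1], event.phoPhi[1], event.phoE[1])
    h_mgg.Fill((pho1 + pho2).M())

h_mgg.Draw()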

To run the script:

wget https://raw.githubusercontent.com/uzzielperez/CMSSW-workbook-practice/master/PhotonAnalysis/selection.py
wget https://raw.githubusercontent.com/uzzielperez/CMSSW-workbook-practice/master/PhotonAnalysis/makePlots_H2gg.py
python makePlots_H2gg.py -m 10000 -i root://cmsxrootd.fnal.gov///store/user/drberry/HATS/ggntuples/GluGluHToGG_M-125_13TeV_powheg_pythia8.root -o H2gg_plots.root

The output is HiggsMassPeak.pdf, which shows the reconstructed Higgs mass peak.

* Try applying R9 and regression corrections to the photons and see the improvement in the peak width.
* Search for the Higgs in the data skims from above.

Generating events and Writing your own Analyzer to access photon information

For your own study, you might want to simulate and generate events with photon final states. You can check out a number of run cards from GitHub and follow the recipes in the Generator section of the CMSSW Workbook. For our purposes, we will take a quick run-through of how to use the Pythia8 interface with CMSSW and make some basic plots from it.

First get the fragment and place it where the cmsDriver.py command below expects it:

wget https://raw.githubusercontent.com/uzzielperez/CMSSW-workbook-practice/master/Generators/SMGG_TuneCUEP8M1_13TeV_pythia8_cfi.py
mkdir -p $CMSSW_BASE/src/Configuration/GenProduction/python/ThirTeenTeV/
mv SMGG_TuneCUEP8M1_13TeV_pythia8_cfi.py $CMSSW_BASE/src/Configuration/GenProduction/python/ThirTeenTeV/

Once we have that in place, from the src directory of $CMSSW_BASE we will use cmsDriver.py to generate 1000 GEN-level events only (no detector simulation).

scram b -j 16
cmsDriver.py Configuration/GenProduction/python/ThirTeenTeV/SMGG_TuneCUEP8M1_13TeV_pythia8_cfi.py -s GEN --mc --no_exec --conditions auto:mc -n 1000

A configuration file will then be generated. To run it, do

cmsRun SMGG_TuneCUEP8M1_13TeV_pythia8_cfi_py_GEN.py

You can also save the output to a text file with

cmsRun SMGG_TuneCUEP8M1_13TeV_pythia8_cfi_py_GEN.py > logSMGG_TuneCUEP8M1_13TeV_pythia8_cfi_py_GEN.txt

This takes a while to run, and a root file will be generated. To inspect its contents, do

edmDumpEventContent SMGG_TuneCUEP8M1_13TeV_pythia8_cfi_py_GEN.root

If done correctly, the output should look like this:
Type                                  Module               Label      Process   
--------------------------------------------------------------------------------
GenEventInfoProduct                   "generator"          ""         "GEN"     
ROOT::Math::PositionVector3D<ROOT::Math::Cartesian3D<float>,ROOT::Math::DefaultCoordinateSystemTag>    "genParticles"       "xyz0"     "GEN"     
double                                "ak4GenJets"         "rho"      "GEN"     
double                                "ak4GenJetsNoNu"     "rho"      "GEN"     
double                                "ak8GenJets"         "rho"      "GEN"     
double                                "ak8GenJetsNoNu"     "rho"      "GEN"     
double                                "ak4GenJets"         "sigma"    "GEN"     
double                                "ak4GenJetsNoNu"     "sigma"    "GEN"     
double                                "ak8GenJets"         "sigma"    "GEN"     
double                                "ak8GenJetsNoNu"     "sigma"    "GEN"     
edm::HepMCProduct                     "generatorSmeared"   ""         "GEN"     
edm::TriggerResults                   "TriggerResults"     ""         "GEN"     
float                                 "genParticles"       "t0"       "GEN"     
vector<double>                        "ak4GenJets"         "rhos"     "GEN"     
vector<double>                        "ak4GenJetsNoNu"     "rhos"     "GEN"     
vector<double>                        "ak8GenJets"         "rhos"     "GEN"     
vector<double>                        "ak8GenJetsNoNu"     "rhos"     "GEN"     
vector<double>                        "ak4GenJets"         "sigmas"   "GEN"     
vector<double>                        "ak4GenJetsNoNu"     "sigmas"   "GEN"     
vector<double>                        "ak8GenJets"         "sigmas"   "GEN"     
vector<double>                        "ak8GenJetsNoNu"     "sigmas"   "GEN"     
vector<int>                           "genParticles"       ""         "GEN"     
vector<reco::GenJet>                  "ak4GenJets"         ""         "GEN"     
vector<reco::GenJet>                  "ak4GenJetsNoNu"     ""         "GEN"     
vector<reco::GenJet>                  "ak8GenJets"         ""         "GEN"     
vector<reco::GenJet>                  "ak8GenJetsNoNu"     ""         "GEN"     
vector<reco::GenMET>                  "genMetCalo"         ""         "GEN"     
vector<reco::GenMET>                  "genMetTrue"         ""         "GEN"     
vector<reco::GenParticle>             "genParticles"       ""         "GEN"   

To access the information in the root file we generated, we need to write an analyzer. You can follow the WorkBook instructions on how to write an analyzer, but for our purposes, do the following:

mkdir PhotonDemoAnalyzer
cd PhotonDemoAnalyzer

mkedanlzr PhotonAnalyzer
cd PhotonAnalyzer
scram b -j 8

These commands will set up the environment and the framework module for the analyzer. The mkedanlzr script generates a configuration file located in the python directory. Open ConfFile_cfg.py and replace 'myfile.root' with the desired source, in our case SMGG_TuneCUEP8M1_13TeV_pythia8_cfi_py_GEN.root. With the path to the root file, the configuration file will look like this:

import FWCore.ParameterSet.Config as cms

process = cms.Process("Demo")

process.load("FWCore.MessageService.MessageLogger_cfi")

process.maxEvents = cms.untracked.PSet( input = cms.untracked.int32(-1) )

process.source = cms.Source("PoolSource",
    # replace 'myfile.root' with the source file you want to use
    fileNames = cms.untracked.vstring(
        'file:/uscms/home/cuperez/nobackup/PhotonAnalysis/CMSSW_9_4_4/src/SMGG_TuneCUEP8M1_13TeV_pythia8_cfi_py_GEN.root'

    )
)

process.demo = cms.EDAnalyzer('PhotonAnalyzer'
)


process.p = cms.Path(process.demo)

To quickly check that it works, do

cmsRun python/ConfFile_cfg.py

The output should look as follows:

06-Aug-2018 02:10:08 CDT  Initiating request to open file file:/uscms/home/cuperez/nobackup/PhotonAnalysis/CMSSW_9_4_4/src/SMGG_TuneCUEP8M1_13TeV_pythia8_cfi_py_GEN.root
06-Aug-2018 02:10:10 CDT  Successfully opened file file:/uscms/home/cuperez/nobackup/PhotonAnalysis/CMSSW_9_4_4/src/SMGG_TuneCUEP8M1_13TeV_pythia8_cfi_py_GEN.root
Begin processing the 1st record. Run 1, Event 1, LumiSection 1 at 06-Aug-2018 02:10:12.439 CDT
Begin processing the 2nd record. Run 1, Event 2, LumiSection 1 at 06-Aug-2018 02:10:12.440 CDT
Begin processing the 3rd record. Run 1, Event 3, LumiSection 1 at 06-Aug-2018 02:10:12.440 CDT
……….
Begin processing the 996th record. Run 1, Event 996, LumiSection 1 at 06-Aug-2018 02:10:12.597 CDT
Begin processing the 997th record. Run 1, Event 997, LumiSection 1 at 06-Aug-2018 02:10:12.597 CDT
Begin processing the 998th record. Run 1, Event 998, LumiSection 1 at 06-Aug-2018 02:10:12.597 CDT
Begin processing the 999th record. Run 1, Event 999, LumiSection 1 at 06-Aug-2018 02:10:12.598 CDT
Begin processing the 1000th record. Run 1, Event 1000, LumiSection 1 at 06-Aug-2018 02:10:12.598 CDT
06-Aug-2018 02:10:12 CDT  Closed file file:/uscms/home/cuperez/nobackup/PhotonAnalysis/CMSSW_9_4_4/src/SMGG_TuneCUEP8M1_13TeV_pythia8_cfi_py_GEN.root

=============================================

MessageLogger Summary

 type     category        sev    module        subroutine        count    total
 ---- -------------------- -- ---------------- ----------------  -----    -----
    1 fileAction           -s file_close                             1        1
    2 fileAction           -s file_open                              2        2

 type    category    Examples: run/evt        run/evt          run/evt
 ---- -------------------- ---------------- ---------------- ----------------
    1 fileAction           PostGlobalEndRun                  
    2 fileAction           pre-events       pre-events       

Severity    # Occurrences   Total Occurrences
--------    -------------   -----------------
System                  3                   3

dropped waiting message count 0

Now we want to access event information such as each photon's pt, eta, and phi. We have prepared a basic analyzer that you can study and modify to your liking. Go back to CMSSW_9_4_4/src and start fresh; you can then compare the newly checked-out analyzer with the prepared one.

cd $CMSSW_BASE/src 
svn checkout https://github.com/uzzielperez/CMSSW-workbook-practice/trunk/EDAnalyzer/DiPhotonAnalyzer/ EDAnalyzer/DiPhotonAnalyzer
cd EDAnalyzer/DiPhotonAnalyzer
rm -rf .svn/ plugins/.svn    # the svn metadata is not needed

There are a number of added lines and includes in plugins/BuildFile.xml, DiPhotonAnalyzer.cc and the configuration file. We will focus on the important additions. Below, we define structs to store several variables of interest for particular objects, such as the leading and subleading photons of an event.

 //DiPhotonAnalyzer.cc - 
 edm::Service<TFileService> fs;
      edm::EDGetTokenT<std::vector<reco::GenParticle> > genParticlesToken_;
      edm::InputTag genParticles_;
      edm::InputTag particles_;

      TTree *fgenTree;
      //Structs
      struct eventInfo_t {
      Long64_t run;
      Long64_t LS;
      Long64_t evnum;
      };
      eventInfo_t fEventInfo;

      struct genPhotonInfo_t{
        double pt;
        double eta;
        double phi;
        double E;
       };

      genPhotonInfo_t fPhoton1;
      genPhotonInfo_t fPhoton2;

     struct genDiPhotonInfo_t{
        //kinematics 
        double Minv;
        double qt;
      };

      genDiPhotonInfo_t fDiPhoton;
      int numPhotons = 0;

In the next section, we define the tree and the branches into which we want to write our selected information. We also define a token that consumes the genParticle collection from the input root file.

   //DiPhotonAnalyzer.cc - Tree
   fgenTree = fs->make<TTree>("fgenTree","GENDiphotonTree");
   fgenTree->Branch("genPhoton1", &fPhoton1, "pt/D:eta:phi:E");
   fgenTree->Branch("genPhoton2", &fPhoton2, "pt/D:eta:phi:E");
   fgenTree->Branch("genDiPhoton", &fDiPhoton, "Minv/D:qt");

   genParticlesToken_ = consumes<std::vector<reco::GenParticle> >(iConfig.getParameter<edm::InputTag>("particles"));

After initializing the variables, we then loop over the genParticle collection and choose particles with pdgID==22 (photons) with status code 1 (end state). We also sort the photons by pt and then update the structs.

   //DiPhotonAnalyzer.cc - Loop over genParticle collection
   //There are more succinct ways to do this section but we want to begin with a more transparent code.
   
  for(std::vector<reco::GenParticle>::const_iterator ip = genParticles->begin(); ip != genParticles->end(); ++ip){
      if(ip->status()==1 && ip->pdgId()==22){
         //cout << "Photon end state found" << endl;
        photoncount = photoncount + 1;
        double pt = ip->pt();
        double eta = ip->eta();
        double phi = ip->phi();
        double E = ip->energy();

        //Ordering photons
        if (pt > fPhoton1.pt){
            fPhoton2.pt = fPhoton1.pt;
            fPhoton2.eta = fPhoton1.eta;
            fPhoton2.phi = fPhoton1.phi;
            fPhoton2.E = fPhoton1.E;

            fPhoton1.pt = pt;
            fPhoton1.eta = eta;
            fPhoton1.phi = phi;
            fPhoton1.E = E;
        }
        if ((pt < fPhoton1.pt) && (pt > fPhoton2.pt)){
            fPhoton2.pt = pt;
            fPhoton2.eta = eta;
            fPhoton2.phi = phi;
            fPhoton2.E = E;
        }
      }//end photon end state condition
  }//end loop over gen particles 

Once you have the basic kinematic variables for each photon, you can compute more complex variables, such as the invariant mass, and add them to your Ntuples.

   //DiPhotonAnalyzer.cc - DiPhoton object 
 //Fill DiPhoton Information
  TLorentzVector leadingPhoton, subleadingPhoton;
  leadingPhoton.SetPtEtaPhiE(fPhoton1.pt, fPhoton1.eta, fPhoton1.phi, fPhoton1.E);
  subleadingPhoton.SetPtEtaPhiE(fPhoton2.pt, fPhoton2.eta, fPhoton2.phi, fPhoton2.E);

   TLorentzVector total = leadingPhoton + subleadingPhoton;
   double ggmass = total.M();

   fDiPhoton.Minv = ggmass;
   fDiPhoton.qt = total.Pt();

Finally the information can be written to the tree.

 //Fill the tree Branches 
   fgenTree->Fill();

Take note that you need to add these lines to your configuration file python/ConfFile_cfg.py in order for the job to run properly.

process.TFileService = cms.Service("TFileService",
                fileName = cms.string("DemoDiPhotonInfo.root") #output filename
                            )

process.demo = cms.EDAnalyzer('DiPhotonAnalyzer',
    particles = cms.InputTag("genParticles")
  )

To run and make the Ntuples:

cd CMSSW_9_4_4/src/EDAnalyzer/ 
scram b -j 4
cmsRun DiPhotonAnalyzer/python/ConfFile_cfg.py

This will create DemoDiPhotonInfo.root, from which you can make your plots.

NanoAOD

It's good to familiarize oneself with the recently developed NanoAOD data tier; you can learn more about it in the WorkBookNanoAOD. The goal of this new data tier is to keep the event size below ~2 kB/event so that about 50% of analyses can have their ntuples centrally produced. We can run through a quick example and a quick analysis with it.

cd $CMSSW_BASE/src 
cmsenv 
git cms-init # not really needed except if you want other cmssw stuff
git clone https://github.com/cms-nanoAOD/nanoAOD-tools.git PhysicsTools/NanoAODTools
scram b
voms-proxy-init -voms cms 

cd PhysicsTools/NanoAODTools/python/postprocessing/examples/
python exampleAnalysis.py

Feel free to look at the code and at how you could make selections (e.g. choose events with at least two photons) as you loop over the events; a sketch of such a selection module is given below. You could also try finding another interesting dataset in DAS and look at what is stored in its root files.
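
A minimal sketch of such a nanoAOD-tools post-processing module, keeping only events with at least two photons above 20 GeV (the input file name nano.root is a placeholder for whatever NanoAOD file you have):

from PhysicsTools.NanoAODTools.postprocessing.framework.postprocessor import PostProcessor
from PhysicsTools.NanoAODTools.postprocessing.framework.eventloop import Module
from PhysicsTools.NanoAODTools.postprocessing.framework.datamodel import Collection

class DiPhotonSkim(Module):
    # Keep events with at least two photons with pt > 20 GeV
    def analyze(self, event):
        photons = Collection(event, "Photon")
        good = [p for p in photons if p.pt > 20.0]
        return len(good) >= 2  # returning False drops the event

# Run the post-processor over a NanoAOD file (placeholder name)
p = PostProcessor(outputDir=".", inputFiles=["nano.root"], modules=[DiPhotonSkim()])
p.run()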


Review status

Reviewer/Editor and Date (copy from screen) Comments
CiliciaUzzielPerez - 28-April-2018 Update for Run 2, CMSSW 94x
MarcoPieri - 24-Jan-2010  
NancyMarinelli- 26 Nov 2009 Update for CMSSW 33X
MatteoSani - 02 Mar 2009  
ChrisSeez - 25 Feb 2008  
Nancy Marinelli - 22 Feb 2008 Update
Responsible: Nancy Marinelli



7.6   Jet Analysis

Introduction to Jets in CMS

Page status as of Oct 28, 2018. Complete: 3


Contents

7.6.1   Introduction

The JetMET Physics Object Group (POG) is responsible for jets in CMS.

7.6.2   Types of jets

One must consider multiple attributes when classifying a jet, such as:

  • Type of input particle
    • PFJets - clustered from particle flow candidates
      • These can be further classified based on different pileup reduction techniques:
        • PFCHS jets - Charged Hadron Subtraction (CHS) - charged particles from non-primary vertices (pileup) are removed before clustering.
        • PUPPI jets - jets using inputs from the PUPPI algorithm
    • JPTJets - Jet Plus Tracks jets (calorimeter jets corrected using associated tracks)
    • CaloJets - clustered from calorimeter towers
  • Jet algorithm used to cluster
    • Anti-kt (AK)
    • Cambridge Aachen (CA)
    • kt
  • Jet size
    • Example: AK4 implies a jet clustered with the anti-kT algorithm with distance parameter R=0.4

7.6.3   Jets in CMSSW

The latest instructions for accessing Jets in CMSSW for AOD, MiniAOD and the latest NanoAOD formats are listed below.

7.6.4   Accessing jets from AOD

The jet collections available in an AOD file can be obtained using edmDumpEventContent. For example in 94X this returns the following PFJet collections:

vector<reco::PFJet>                   "ak4PFJets"              ""                "RECO"   
vector<reco::PFJet>                   "ak4PFJetsCHS"              ""                "RECO"   
vector<reco::PFJet>                   "ak8PFJetsCHS"              ""                "RECO"   
vector<reco::PFJet>                   "pfJetsEI"                  ""                "RECO"   
vector<reco::PFJet>                   "ak8PFJetsCHSSoftDrop"        "SubJets"         "RECO"   
vector<reco::PFJet>                   "cmsTopTagPFJetsCHS"        "caTopSubJets"    "RECO"   

You can loop over these collections as follows:

#include "DataFormats/JetReco/interface/PFJet.h"
#include "DataFormats/JetReco/interface/PFJetCollection.h"

edm::Handle<reco::PFJetCollection> pfjetH;
iEvent.getByLabel("ak4PFJetsCHS", pfjetH);
for ( reco::PFJetCollection::const_iterator jet = pfjetH->begin(); jet != pfjetH->end(); ++jet ) {
    double pt = jet->pt();
}

You'll need to include the following in your BuildFile

<use name="DataFormats/JetReco"/>

See the PFJet Class Reference for more details on the accessible PFJet members.

7.6.5   Accessing jets from MiniAOD

In Spring 2014, a new high-level data tier called MiniAOD was introduced to serve the needs of the mainstream physics analyses while keeping a small event size (30-50 kb/event).

The MiniAOD workbook contains information on accessing the jets and all other high-level physics objects stored in miniAOD (a minimal access sketch follows the list below):

  • Latest Information (2017) on the jet collections stored: link
  • Example for looping over the jet collections: link
  • Examples of reclustering new jet collections from the PF constituents stored in miniAOD: link
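
As an illustration, here is a minimal FWLite sketch for looping over the slimmedJets collection of a miniAOD file (the input file name miniaod.root is a placeholder):

import ROOT
from DataFormats.FWLite import Events, Handle

# Enable FWLite auto-loading of the CMSSW dictionaries
ROOT.gSystem.Load("libFWCoreFWLite")
ROOT.FWLiteEnabler.enable()

events = Events("miniaod.root")  # placeholder input file
jets = Handle("std::vector<pat::Jet>")

for event in events:
    event.getByLabel("slimmedJets", jets)
    for jet in jets.product():
        print "jet pt %.1f, eta %.2f" % (jet.pt(), jet.eta())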

7.6.6   Accessing jets from NanoAOD

The NanoAOD workbook contains information on accessing the jets and all other high-level physics objects stored in NanoAOD.

7.6.7   Jet Corrections

The Jet Energy Resolution and Corrections (JERC) Subgroup is responsible for jet corrections at CMS; the jet corrections workbook can be found in the corresponding subsection. Three particle-flow algorithms are currently fully supported and recommended by the JERC group, with two cone sizes each (these are the PF, PF+CHS and PF+PUPPI flavours described above, e.g. in the AK4 and AK8 variants).

In addition, two algorithms based on Calorimetric inputs (plain Calorimetric jets, and Calorimetric jets with corrections derived based on tracking information) are available:

  • Calo4
  • Calo8
  • JPT4
  • JPT8

7.6.8   Jet ID

The latest instructions on Jet-Identification can be found at JetID twiki.

7.6.9   Jet Substructure

A tutorial on Jet Substructure techniques can be found at this twiki.

7.6.10   Pileup Jet ID

The latest instructions on Pileup Jet ID can be found at this twiki.

7.6.11   Quark/Gluon Discrimination

The latest instructions on Quark/Gluon Jet Discrimination can be found at this twiki.

7.6.12   Jet Toolbox

The jet toolbox is a consolidated set of jet tools with uniform output, developed by the JetMET POG.

7.6.13   Jet Tutorials

The CMS Data Analysis School (CMSDAS) jet long exercise:

https://twiki.cern.ch/twiki/bin/view/CMS/SWGuideCMSDataAnalysisSchoolJetAnalysis

7.6.14   Previous versions of the JetAnalysis workbook

CMSSW is in continuous development, so the syntax of particular commands may change. To address this, separate tutorial pages are provided for different CMSSW releases.

Jet/MET pages for the following releases are available.

Review status

Reviewer/Editor and Date (copy from screen) Comments
NitishDhingra - 2018-09-21 Updated with latest instructions. Added missing twiki links of the necessary tools for doing jet analysis.
Main.Aresh - 27 Feb 2008 Changes in verbatim elements (in https://twiki.cern.ch/twiki/bin/edit/CMS/WorkBook160JetAnalysis) because of some lines too long for printable version
FedorRatnikov - 01 Dec 2006 separate pages for different releases, clean up main page

Responsible: HartmutStadie
Last reviewed by: Chris Tully -20 Dec 06



7.8 Muon Analysis

Complete: 4

Detailed Review status

Contents:

Goals of this page

This document explains how to access muon tracks and the associated information available in the event record, which can be used for muon identification purposes as well as for analyses. A brief summary of the muon reconstruction algorithms and of the muon selectors is provided. (See WorkBookSetComputerNode for instructions on how to set up a proper CMSSW computing environment.)

Nota bene: We will access muon information in miniAOD files. Please check the muon information stored in the miniAOD; the link here may also be useful for getting familiar with muons in the miniAOD.

Introduction to muon reconstruction

Muon reconstruction overview:

muonreco.png

The muon reconstruction chain starts with the "local reconstruction". First, hits in DTs, CSCs and RPCs are reconstructed from digitized electronics signals. Hits within each DT and CSC chamber are then matched to form "segments" (track stubs). The production of hits and segments in muon sub-systems is discussed in detail in the tutorial on local muon reconstruction. The reconstruction of the tracks inside the silicon tracker is described in the tutorial on track reconstruction.

Standalone muons

In the offline reconstruction, the segments reconstructed in the muon chambers are used to generate "seeds" consisting of position and direction vectors and an estimate of the muon transverse momentum. These initial estimates are used as seeds for the track fits in the muon system, which are performed using segments and hits from DTs, CSCs and RPCs and are based on the Kalman filter technique. The result is a collection of reco::Track objects reconstructed in the muon spectrometer, which are referred to as "standalone muons". To improve the momentum resolution, a beam-spot constraint can be applied in the fit. Two collections of "standalone muons", with and without the beam-spot constraint, are available in the event record (in both RECO and AOD). A more detailed description of the reconstruction of tracks in the muon system alone can be found in Section 4 of CMS AN 2008/097. A tutorial explaining how to run standalone muon reconstruction and to analyze its results is available at this link.

Global muons

For each standalone muon track, a search for tracks matching it among those reconstructed in the inner tracking system (referred to as "tracker tracks", "inner tracks" or "silicon tracks") is performed, and the best-matching tracker track is selected. For each "tracker track" - "standalone muon" pair, the track fit using all hits in both tracks is performed, again based on the Kalman filter technique. The result is a collection of reco::Track objects referred to as "global muons". More details on the reconstruction of global muons can be found in Section 5 of CMS AN 2008/097.

References to the matching tracker tracks, standalone muon tracks, and global muon tracks are assembled into one single collection of reco::Muon objects described below.

Tracker muons

An approach complementary to the global-muon reconstruction consists in considering all tracker tracks to be potential muon candidates and in checking this hypothesis by looking for compatible signatures in the calorimeters and in the muon system. Tracker tracks identified as muons by this method are referred to as "tracker muons". A detailed description of the reconstruction of tracker muons can be found in Section 6 of CMS AN 2008/097.

The tracker-muon algorithm is particularly useful for the identification of low-pT muons (with pT of the order of several GeV), which may not leave enough hits in the muon stations for a standalone muon to be reconstructed. The default criteria for tagging a tracker track as "tracker muon" are very loose (in CMSSW_3_X_Y series, every track with p > 2.5 GeV and pT > 0.5 GeV matched with at least one segment in the muon stations is labeled as "tracker muon"), so tracker muons should in general not be used without further requirements. Several sets of such requirements recommended by the muon POG are described in Sections 6-8 of CMS AN 2008/098; the corresponding flags can be retrieved from the reco::Muon object (see WorkBook section on muon ID).

RPC muons

An approach similar to Tracker Muons is followed to define RPC Muons: in this case a match is sought between the extrapolated inner track and hits in the RPC muon detectors. A description of the algorithm and the performance measurements is contained in CMS AN-2012/482. The effects of including the RPC hits in the global muon reconstruction have also been studied and are described in M.S. Kim, JINST 8 (2013) T03001. The main twiki page documenting the RPC Muon algorithm is linked HERE.

Calorimeter-based muons

Calorimeter-based muons, or "calo muons" for short, represent a subset of all tracker tracks reconstructed in the event, which includes tracks with energy depositions in the calorimeters compatible with those of a minimum-ionizing particle. The fake rate of these muon candidates is high and they should not be used when muon purity is essential. A typical use case for "calo muons" is the reconstruction of the J/ψ decaying to low-momentum muons that have little or no information in the muon system, thus improving signal to background ratio compared with the inner tracks. In the event record, "calo muons" are stored in a collection of dedicated objects, reco::CaloMuon's.

Available information

The information on various types of muons discussed above is assembled in one single collection of reco::Muon objects, "muons". This collection serves as the main entry point for accessing muon-related information in CMSSW: various quantities are either directly included in the reco::Muon objects, or can be accessed from there via provided references. One exception to this rule is "calo muons", which are stored in a separate collection of reco::CaloMuon objects. The full list of muon collections available in the event record is available in SWGuideDataFormatRecoMuon.

reco::Muon

Full description of the reco::Muon class can be found here: DataFormats/MuonReco/interface/Muon.h.

Brief summary of available information:

  • Methods to check to which category of muon candidates a given reco::Muon object belongs: isStandAloneMuon(), isGlobalMuon(), isTrackerMuon() (isCaloMuon() is currently not used). Note that a single object can belong to more than one category.
  • References to the corresponding reco::Track objects for tracks reconstructed in the inner tracker (tracker track), muon system alone (standalone muon track with a beam-spot constraint), and to a combined tracker-muon fit (global muon track). High-pT muons can be accessed through isHighPtMuon.
  • For tracker muons:
    • Energy depositions in ECAL, HCAL and HO in crossed elements as well as 3x3 shapes around them.
    • Compatibility with muon hypothesis based on energy depositions in calorimeters.
    • Vector of muon segments matched with the extrapolated tracker track and some other track-segments matching information.
  • Summary of muon isolation information: sums of pT/ET and numbers of tracks and jets for two cones of dR = √(dφ^2 + dη^2) = 0.3 and 0.5 around the muon trajectory (see below).
  • Identification flags (see below).
  • Timing information (see below).

reco::Muon timing information

Timing information is calculated from information in the DT, CSC and ECAL (RPC timing information is not available offline at all; efforts to implement HCAL timing are currently under way). The information is stored in the reco::MuonTime format. It contains a summary of the combined fit of all subdetector data, with two estimates of the time at the interaction point and its error, assuming an outside-in or an inside-out β=1 particle, plus an estimate of the muon's direction.
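
For orientation, a minimal FWLite sketch of reading this combined timing summary (the input file name reco.root is a placeholder for a RECO/AOD file):

import ROOT
from DataFormats.FWLite import Events, Handle

ROOT.gSystem.Load("libFWCoreFWLite")
ROOT.FWLiteEnabler.enable()

events = Events("reco.root")  # placeholder RECO/AOD file
muons = Handle("std::vector<reco::Muon>")

for event in events:
    event.getByLabel("muons", muons)
    for mu in muons.product():
        t = mu.time()  # reco::MuonTime summary of the combined fit
        if t.nDof > 0:
            print "time at IP (in-out): %.1f +/- %.1f ns" % (t.timeAtIpInOut, t.timeAtIpInOutErr)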

More detailed information about muon timing (time, velocity with and without the assumption of the muon being produced in time, etc.), separated into subsystems, is available in a set of external products of type reco::MuonTimeExtra linked to the muons via association maps. If necessary, these objects can be regenerated independently of the muon reconstruction chain using the MuonTimingProducer.

For a simple example of accessing all the information, please take a look at MuonTimingValidator.cc here. For more details on the interpretation of the different quantities and their performance in early data, see CMS IN-2010/003.

Muon identification

Detailed information about muon identification and the various selection algorithms can be found in the CMS AN-2008/098 analysis note. There is a detailed accounting of changes made to the tracker-muon-based muon identification algorithms since AN-2008/098 (over 1.5 years ago as of this writing) at SWGuideTrackerMuons. Below is a brief summary.

There are several categories of muon identification algorithms that have been developed:

  • Cut-based ID for global muons, which consists of a set of track-quality requirements
  • Likelihood-based ID for tracker muons, which uses compatibility of the calorimeter response with the muon hypothesis and the presence of matched segments in the muon system
  • Cut-based ID for tracker muons, which selects muons on the basis of the track-penetration depth in the detector

The muon id selection function and selection types exist outside of the reco::Muon object. This means that one should use an external function

bool muon::isGoodMuon( const reco::Muon& muon, muon::SelectionType type )

which returns true if the muon passes the given selection type. The following selection types are available:

namespace muon {
enum SelectionType {
All = 0,                      // dummy options - always true
AllGlobalMuons = 1,           // checks isGlobalMuon flag
AllStandAloneMuons = 2,       // checks isStandAloneMuon flag
AllTrackerMuons = 3,          // checks isTrackerMuon flag
TrackerMuonArbitrated = 4,    // resolve ambiguity of sharing segments
AllArbitrated = 5,            // all muons with the tracker muon arbitrated
GlobalMuonPromptTight = 6,    // global muons with tighter fit requirements
TMLastStationLoose = 7,       // penetration depth loose selector
TMLastStationTight = 8,       // penetration depth tight selector
TM2DCompatibilityLoose = 9,   // likelihood based loose selector
TM2DCompatibilityTight = 10,  // likelihood based tight selector
TMOneStationLoose = 11,       // require one well matched segment
TMOneStationTight = 12,       // require one well matched segment
TMLastStationOptimizedLowPtLoose = 13, // combination of TMLastStation and TMOneStation
TMLastStationOptimizedLowPtTight = 14, // combination of TMLastStation and TMOneStation
GMTkChiCompatibility = 15,    // require tk stub have good chi2 relative to glb track
GMStaChiCompatibility = 16,   // require sta stub have good chi2 compatibility relative to glb track
GMTkKinkTight = 17,           // require a small kink value in the tracker stub
TMLastStationAngLoose = 18,   // TMLastStationLoose with additional angular cuts
TMLastStationAngTight = 19,   // TMLastStationTight with additional angular cuts
TMOneStationAngLoose = 20,    // TMOneStationLoose with additional angular cuts
TMOneStationAngTight = 21,    // TMOneStationTight with additional angular cuts
TMLastStationOptimizedBarrelLowPtLoose = 22, // combination of TMLastStation and TMOneStation with low pT optimization in barrel only
TMLastStationOptimizedBarrelLowPtTight = 23  // combination of TMLastStation and TMOneStation with low pT optimization in barrel only
};
}

The isGoodMuon method and the available selection types are defined in DataFormats/MuonReco/interface/MuonSelectors.h, so one must include this header file to utilize the selectors.
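
In an EDAnalyzer one simply calls muon::isGoodMuon in C++ after including that header. From FWLite Python the same function can be reached through the loaded dictionaries; a minimal sketch, assuming the muon namespace is exposed as ROOT.muon and using a placeholder input file:

import ROOT
from DataFormats.FWLite import Events, Handle

ROOT.gSystem.Load("libFWCoreFWLite")
ROOT.FWLiteEnabler.enable()

events = Events("reco.root")  # placeholder RECO/AOD file
muons = Handle("std::vector<reco::Muon>")

for event in events:
    event.getByLabel("muons", muons)
    for mu in muons.product():
        # muon::isGoodMuon and muon::SelectionType come from MuonSelectors.h
        if ROOT.muon.isGoodMuon(mu, ROOT.muon.GlobalMuonPromptTight):
            print "prompt-tight global muon, pt = %.1f GeV" % mu.pt()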

The muon-identification algorithms recommended by the muon POG can be found in SWGuideMuonId. Tracker-muon selectors (TMLastStationLoose, etc.) are described in SWGuideTrackerMuons. GlobalMuonPromptTight consists of the following requirements, designed to suppress hadronic punch-throughs and muons from decays in flight:

muon.isGlobalMuon() && muon.globalTrack()->normalizedChi2() < 10. && muon.globalTrack()->hitPattern().numberOfValidMuonHits() > 0

that is:

  • the track is identified as a global muon
  • χ2/ndof of the global muon fit < 10
  • number of valid muon-detector hits used in the global fit > 0

The following additional track-quality cuts using the tracker-track information are recommended to further suppress non-prompt muons, although they are not explicitly included in the selection types above:

  • Transverse impact parameter of the tracker track (or global muon) relative to the beam spot position, |d0|, < 2 mm. This loose cut preserves efficiency for muons from decays of b and c hadrons.
  • Number of hits in the tracker track, Nhits, > 10.

One can also impose cuts on the last point in the global fit to reject punch-throughs that terminate in the first station of the muon detector:

// barrel region
if ( abs(z) < 660 && r > 400 && r < 480)  keepMuon = false;

// endcap region
if ( abs(z) > 600 && abs(z) < 650 && r < 300)  keepMuon = false; 
if ( abs(z) > 680 && abs(z) < 730 && r < 480)  keepMuon = false; 

This can reduce the fake rate by ~20% with minimal loss in efficiency.

More information can be found in CMS AN 2008/098, in SWGuideTrackerMuons, and in several dedicated talks: "Muons in CMSSW_2_1_X", "Global Muon Selections" and "Summary of the Muon ID Note".

The muon-identification criteria used by the Vector-Boson Task Force of the Electroweak PAG follow closely muon-POG recommendations and can be found here.

Muon isolation

For each muon candidate, a number of quantities are computed that can help distinguish isolated muons from muons embedded in jets. The following data can be obtained directly from the reco::Muon class (a simple relative-isolation sketch follows below):

  • Scalar sum of pT of all tracks reconstructed in cones of R = 0.3 and 0.5 around the direction of a muon, and the number of such tracks.
  • Scalar sum of transverse energy depositions ET in ECAL, HCAL, and HO in cones of R = 0.3 and 0.5, and the total number of jets in these cones.
Additional information allowing one to calculate other isolation-related quantities is stored in the event record as associations (IsoDepositMap); a detailed description can be found in SWGuideMuonIsolation.
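
A minimal sketch of computing a combined relative isolation in the dR = 0.3 cone from these stored sums (placeholder input file; the 0.15 threshold is purely illustrative, not an official working point):

import ROOT
from DataFormats.FWLite import Events, Handle

ROOT.gSystem.Load("libFWCoreFWLite")
ROOT.FWLiteEnabler.enable()

events = Events("reco.root")  # placeholder RECO/AOD file
muons = Handle("std::vector<reco::Muon>")

for event in events:
    event.getByLabel("muons", muons)
    for mu in muons.product():
        iso03 = mu.isolationR03()  # stored sums for the dR = 0.3 cone
        rel_iso = (iso03.sumPt + iso03.emEt + iso03.hadEt) / mu.pt()
        if rel_iso < 0.15:  # illustrative cut only
            print "isolated muon, pt = %.1f GeV, relIso = %.3f" % (mu.pt(), rel_iso)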

Muons in missing ET calculations

Muons in Physics Analysis Tools

Examples

The example provided is for CMSSW_9_2_3_patch2; it allows for a quick demonstration of some basic concepts of how to access the information.
We will create an EDAnalyzer within our CMSSW work area. The analyzer iterates over the reconstructed muons stored in the miniAOD and fills histograms with the momentum components of the reconstructed muons, using the TFileService object to store the histograms. We also fit a vertex to reconstruct the J/psi particle (see figure).

source /cvmfs/cms.cern.ch/cmsset_default.sh 
export SCRAM_ARCH=slc6_amd64_gcc530
cmsrel CMSSW_9_2_3_patch2
cd CMSSW_9_2_3_patch2/src/
cp -r /afs/cern.ch/user/j/jmejiagu/public/miniAODmuonexample/CMSSW_9_2_3_patch2/src/myAnalyzers .
cmsenv
voms-proxy-init -voms cms -valid 192:00
scram b -j8
cd myAnalyzers/JPsiKsPAT/test/
cmsRun miniAODmuonsRootupler.py

Review status

Reviewer/Editor and Date (copy from screen) Comments
RiccardoBellan - 30 Nov 2006 Added initial info
JennyWilliams - 12 Jan 2006 Added info from very similar page by ChangLiu and remove that page
GianlucaCerminara - 09 Feb 2007 Update to CMSSW_1_2_2
JennyWilliams - 23 Mar 2007 reinstated manual contents for pdf generation
DmytroKovalskyi - 17 Feb 2008 Modified the workbook to include full spectrum of muon information available
MartijnMulders - Feb 2008 Updated in course of documentation review
Main.Aresh - 27 Feb 2008 Changes in verbatim elements because of some lines too long for printable version
PiotrTraczyk - 06 Oct 2008 Updated TeV muon part
MartijnMulders - 07 Oct 2008 Updated muon identification part
SlavaValuev - Feb 2009 Updated in course of documentation review
PiotrTraczyk - 31 Aug 2010 Muon timing part made up-to-date
JhovannyMejia 30 Aug 2017 Some information made up-to-date. Update all the links. Create a example on how to get access to Muon information in the miniAOD.
Responsible: DmytroKovalskyi
Last reviewed by: SlavaValuev - 04 Mar 2009

7.9 B-Tagging

Complete: 5
Detailed Review status

Goals of this page

This page introduces the principal algorithms used to tag b-jets and will help you decide which one will be most suitable for your physics analysis. It then explains how to access the most commonly used b tag information.

Please also note that from version 1.7.0 onwards the b-tagging data formats have been frozen: whatever you read about 1.7.X can safely be applied to any 2.X.Y and 3.X.Y version (at least as far as data formats are concerned).

Expert users may also wish to consult the b tag chapter of the CMSSW Offline Guide for more detailed information, such as how to modify and rerun the b tagging algorithms.

Contents

b Tag Algorithms - which one to use?

Several b tag algorithms have been implemented in CMSSW. Some exploit the long B hadron lifetime, others its semi-leptonic decay mode and others use kinematic variables related to the high B hadron mass and hard b fragmentation function.

All begin by associating good-quality tracks with jets. By default, b tag algorithms are only run on iterativeCone5CaloJets (i.e. uncalibrated, iterative-cone jets produced with Delta(R) = 0.5) up to CMSSW_2_X_Y, and on anti-kT 0.5 jets starting from CMSSW_3_X_Y. Instructions on how to tag alternative jet collections are given below.

In CMSSW, all the algorithms produce a b tag "discriminator" for each jet, on which one can cut more or less tightly, in order to distinguish b-jets from non-b jets. Although the definition of the discriminator varies between algorithms, it is always true that b-jets have more positive values of the discriminator. This page will help you choose a sensible cut value for the discriminator.

  • "Track Counting" algorithm: More... Close This is a very simple tag, exploiting the long lifetime of B hadrons. It calculates the signed impact parameter significance of all good tracks, and orders them by decreasing significance. Its b tag discriminator is defined as the significance of the N'th track. It comes in two variations for N = 2 (high efficiency) or N = 3 (high purity).
  • "Jet Probability" algorithm: More... Close This is a more sophisticated algorithm, also exploiting the long lifetime of B hadrons. Its b tag discriminator is equal to the negative logarithm of the confidence level that all the tracks in the jet are consistent with originating from the primary vertex. This confidence level is calculated from the signed impact parameter significances of all good tracks. It reads the resolution function on these from a database (DB). Indeed, we have two versions of this tagger: JetProbabilityBJetTags and JetBProbabilityBJetTags - the latter uses only the four most displaced tracks, matching the typical reconstructed multiplicity of a B decay vertex.
  • "Soft Muon" and "Soft Electron" algorithms: More... Close These two algorithms tag b jets by searching for the lepton from a semi-leptonic B decay, which typically has a large Pt_rel with respect to the jet axis. Their b tag discriminators are the output of neural nets based on the leptons Pt_rel, impact parameter significance and a few other variables. For each of these taggers, we want to have a simple one (cutting basically only on the presence and pT of the lepton), and a complex one using also jet quantities to compute a MVA analysis. This is already the case for Muons; Electron taggers need refinement and are not really usable up to 2.1.X.
  • "Simple Secondary Vertex" algorithms: More... Close These class of algorithms reconstructs the B decay vertex using an adaptive vertex finder, and then uses variables related to it, such as decay length significance to calculate its b tag discriminator. It has been found to be more robust to Tracker misalignment than the other lifetime-based tags. CMSSW releases <= 35X contain one version of the algorithm (simpleSecondaryVertexBJetTags). Starting from 36X two versions will be provided: simpleSecondaryVertexHighEffBJetTags (equivalent to the previous version) and simpleSecondaryVertexHighPurBJetTags (with increased purity due to a cut on the track multiplicity at the secondary vertex)
  • "Combined Secondary Vertex" algorithm: More... Close This sophisticated and complex tag exploits all known variables, which can distinguish b from non-b jets. Its goal is to provide optimal b tag performance, by combining information about impact parameter significance, the secondary vertex and jet kinematics. (Currently lepton information is not included). . The variables are combined using a likelihood ratio technique to compute the b tag discriminator. A variant of this tagger combines the variables using the Multivariant Analysis (MVA) tool.

In CMSSW 1.5 - 1.6, the "Jet Probability" algorithm gives best performance and is consequently recommended for most physics analyses. In older versions of CMSSW, this is not available, and you should instead use the simpler, but fairly effective "Track Counting" algorithm. The "Soft Lepton" algorithms have several times lower efficiency due to the low B semi-leptonic branching ratio, but can sometimes be useful, particularly for early data taking or for measuring b tag performance with data. The "Combined Secondary Vertex" algorithm should ultimately give better performance than the Jet Probability algorithm, but only becomes competitive with it in CMSSW 1.7, and becomes the most powerful algorithm since CMSSW_2_X_Y (but beware it will be the most difficult to calibrate on data).

So the advice is:

  • for an initial analysis, stick to Track Counting;
  • once experience is gained on data, you can move to the more complicated and better-performing algorithms.

b Tag AOD/RECO Data

The full data format is described here in the CMSSW Offline Guide. However, this section gives you an overview.

Each b tag algorithm produces a collection of JetTag and various kinds of TagInfo (each collection has one entry per jet, ordered as in the original jet collection). For most physics analyses, you will only need to access the JetTag object:

170_JetTag.png

This gives the b tag discriminator value and a reference (link) to the jet. The JetTag collections are labelled by the b tag algorithm that produced them. In CMSSW 1.7, the most important of these labels are:

  • Track counting tag with N = 3: trackCountingHighPurBJetTags
  • Track counting tag with N = 2: trackCountingHighEffBJetTags
  • Jet probability tag: jetProbabilityBJetTags
  • Soft electron tag: softElectronBJetTags
  • Soft muon tag: softMuonBJetTags
  • Soft muon tag, not using muon impact parameter significance: softMuonNoIPBJetTags
  • Simple secondary vertex b tag: simpleSecondaryVertexBJetTags
  • Combined SV b tag using likelihood ratios: combinedSVBJetTags
  • Combined SV b tag using MVA: combinedSVMVABJetTags

For a full list of these labels in any CMSSW version, see the file RecoBTag/Configuration/python/RecoBTag_EventContent_cff.py.

The TagInfo contains the variables, such as the track impact parameter significance, from which the b tag discriminator was calculated. It also gives you the jet and the tracks associated with it. There are several different types of TagInfo object, for the various b tag algorithms. The TrackIPTagInfo is shared by the Track Counting, Jet Probability and CombinedSV tags. Sophisticated b tags, such as the CombinedSV tag, can use more than one TagInfo (it also uses SecondaryVertexTagInfo).

170_TagInfo.png

The b tag algorithms also rely on the association of tracks with jets. This information is stored in JetTracksAssociation. However, it can be accessed by the JetTag or TagInfo classes, so you should not need to worry about this.
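If you do need the underlying variables, the TagInfo can be read directly. A minimal sketch for the TrackIPTagInfo follows; the accessor names come from the DataFormats/BTauReco headers and the "impactParameterTagInfos" label is the usual module name, but both should be checked against your release (e.g. with TBrowser):

#include "DataFormats/BTauReco/interface/TrackIPTagInfo.h"

// inspect the per-track impact parameter data used by the lifetime-based tags
edm::Handle<std::vector<reco::TrackIPTagInfo> > ipTagInfos;
iEvent.getByLabel("impactParameterTagInfos", ipTagInfos);
for (unsigned int j = 0; j != ipTagInfos->size(); ++j) {
  const std::vector<reco::TrackIPTagInfo::TrackIPData> & ipData =
      (*ipTagInfos)[j].impactParameterData();
  for (unsigned int t = 0; t != ipData.size(); ++t) {
    std::cout << " jet " << j << " track " << t
              << " 3D IP significance = " << ipData[t].ip3d.significance()
              << std::endl;
  }
}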

The above CMSSW 1.7 documentation is kept here and is valid for every later release (including CMSSW_2_X_Y and CMSSW_3_X_Y).

b Tag Analysis with bare ROOT, FWLite or EDAnalyzers

BEWARE - THIS DOES NOT WORK ANYMORE WITH CMSSW_3_X_Y; USE FWLITE INSTEAD

This section describes how to analyze b tag information with bare ROOT, FWLite and EDAnalyzers. The first two can be useful for a quick look at the data, whereas the latter is recommended for serious analysis. To follow the examples here, please find some ttbar events with DBS. (For CMSSW 1.6, /store/mc/2007/10/20/RelVal-RelValTTbar-1192895175/0000/00C41641-2A81-DC11-B6EA-0019DB29C620.root is available at CERN.)

b Tag Analysis with bare ROOT

To quickly look at the b tag information inside a CMSSW dataset, it can be convenient to simply open the CMSSW dataset with bare ROOT. However, you will only be able to look at simple data members (e.g. float, int) inside classes, as illustrated below:

root rfio:///castor/cern.ch/cms//store/relval/CMSSW_2_1_4/RelValQCD_Pt_80_120/
GEN-SIM-DIGI-RAW-HLTDEBUG-RECO/IDEAL_V6_v1/0004/080739DB-816C-DD11-ADFC-001617E30F50.root

(Warning: the command above is wrapped over two lines for readability; it should be entered as a single line.)

(Note the need to prepend "rfio:///castor/cern.ch/cms/" to the dataset name provided by DBS, in order to open it interactively.)

TBrowser b;

and then navigate in the TBrowser to the Event directory, where you can find the b tags: recoJetdmRefProdTofloatAssociationVector_jetProbabilityBJetTags__RECO, for example. The name is of the form <C++ namespace><class name>_<module label>__RECO, as shown in the following screen capture:

Picture_9.png

Double-clicking on recoJetdmRefProdTofloatAssociationVector_jetProbabilityBJetTags__RECO and then on recoJetdmRefProdTofloatAssociationVector_jetProbabilityBJetTags__RECO.obj lets one see the C++ data members of the JetTag object, as shown in this screen capture:

Picture_12.png

Double-clicking on recoJetdmRefProdTofloatAssociationVector_jetProbabilityBJetTags__RECO.obj.data shows the histogram of the b `probability' tag discriminator value:

Picture_11.png
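Equivalently, the same histogram can be drawn from the ROOT prompt without clicking (a sketch; copy the exact branch name from what the TBrowser shows):

Events->Draw("recoJetdmRefProdTofloatAssociationVector_jetProbabilityBJetTags__RECO.obj.data")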

The above CMSSW 1.7 documentation is kept here.

b Tag Analysis with CMSSW Framework Lite (FWLite)

This is convenient for quickly looking at the data, with the bonus of being able to use the public member functions of any class stored in the event. You should first set up your CMSSW environment from inside your project area and start ROOT (we work in this example with CMSSW_3_3_5 and a RelVal TTbar file):

eval `scramv1 runtime -csh`
root

then make the CMSSW FWlite libraries available and open the dataset:

gSystem->Load("libFWCoreFWLite.so");
AutoLibraryLoader::enable();

TFile * f= TFile::Open("/castor/cern.ch/cms/store/relval/CMSSW_3_3_5/RelValTTbar/GEN-SIM-RECO/
MC_31X_V9-v1/0008/ACFD3497-68DC-DE11-A974-00261894388A.root")

(As above, the wrapped file name should be entered as a single line.)

We can now set some aliases for B tag objects. (If you don't know the exact name of the B tag objects, check using TBrowser). While defining aliases is not necessary, it does make the subsequent analysis much simpler:

Events->SetAlias("bmu", 
"recoJetedmRefToBaseProdTofloatsAssociationVector_softMuonBJetTags__RECO.obj");

Functionality is somewhat limited by the fact that an association map is involved; you can, for example, get the number of b-tagged jets as

Events->Draw("bmu.size()")

Screen_shot_2009-12-11_at_11.15.30.png

The above CMSSW 1.7 and beyond documentation is kept here.

b Tag Analysis with a CMSSW EDAnalyzer

Most physics analyses will use this method. You should start by creating your own generic EDAnalyzer and .cfg files, as described earlier in the WorkBook.

To look at b tag information in each event, you can add code like this to your EDAnalyzer::analyze() method:

#include "DataFormats/BTauReco/interface/JetTag.h"

# Get b tag information
edm::Handle<reco::JetTagCollection> bTagHandle;
iEvent.getByLabel("trackCountingHighEffBJetTags", bTagHandle);
const reco::JetTagCollection & bTags = *(bTagHandle.product());

# Loop over jets and study b tag info.
for (int i = 0; i != bTags.size(); ++i) {
   cout<<" Jet "<< i 
         <<" has b tag discriminator = "<<bTags[i].second
         << " and jet Pt = "<<bTags[i].first->pt()<<endl;
}

If you have a Jet, rather than a JetTag, and wish to know if it is b-tagged, there are several ways of doing so. One which always works is to perform angular matching between the Jet and the JetTag::jet(). (The match should be perfect if your JetCollection was used to produce the JetTagCollection).
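A minimal sketch of such a matching, using the standard reco::deltaR helper from DataFormats/Math (the 0.01 matching threshold here is an arbitrary choice, not an official recommendation):

#include "DataFormats/Math/interface/deltaR.h"

// return the discriminator of the JetTag matching the given jet in Delta R,
// or a default value if no match is found
double discriminatorFor(const reco::Jet & jet, const reco::JetTagCollection & bTags) {
  for (unsigned int i = 0; i != bTags.size(); ++i) {
    if (reco::deltaR(jet, *(bTags[i].first)) < 0.01)
      return bTags[i].second;
  }
  return -999.;  // no matching tagged jet
}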

Also add

 <use name=DataFormats/BTauReco> 
to the BuildFile.

The above CMSSW 1.7 and beyond documentation is kept here.

How to determine the True Flavour of a Jet

Since CMSSW_2_X_Y the old way (see the 1.6 chapter) of finding the flavour of a jet is deprecated; instead we use GenParticles, which are available in any data tier up to and including AODSIM.

To access the information, you first have to add to the cmsRun path the sequence process.myPartons * process.AK5Flavour, which can be defined via

process.load("CMS.PhysicsTools.JetMCAlgos.CaloJetsMCFlavour_cfi")  

Here is how to operate:

  • take a CMSSW version (let's say CMSSW_3_3_5)
  • prepare a cmsRun sequence which includes
process.load("CMS.PhysicsTools.JetMCAlgos.CaloJetsMCFlavour_cfi")  
and runs before your analyzer
process.myPartons * process.AK5Flavour

In the .cc code of the analyzer, include lines like

  edm::Handle<reco::JetFlavourMatchingCollection> jetMC;
  iEvent.getByLabel(jetMCSrc, jetMC);
to load the product. jetMCSrc is the label of the collection you want to analyze, depending on the jet type. The default should be 'AK5byValAlgo'.

This product is a mapping between reco::Jets and an int which is the flavour (note that it can be positive or negative, so to select b quarks ask for std::abs(flavour)==5). Then, inside the loop in which you analyze jets, you can ask for the flavour. The easiest way is to produce, event by event, a searchable map, with lines like

// JetRefCompare is a strict ordering for jet references; a common definition
// compares the product id and key of the edm::RefToBase.
typedef std::map<edm::RefToBase<reco::Jet>, unsigned int, JetRefCompare> FlavourMap;
FlavourMap flavours;
for (reco::JetFlavourMatchingCollection::const_iterator iter = jetMC->begin();
     iter != jetMC->end(); ++iter) {
  unsigned int fl = std::abs(iter->second.getFlavour());
  flavours.insert(FlavourMap::value_type(iter->first, fl));
}

Then, when you have a RefToBase<reco::Jet> (for example from a view), you can ask for the flavour:

unsigned int myFlavour = 0;
RefToBase<reco::Jet> aJet; // fill it from the collection you want to probe!
if (flavours.find(aJet) == flavours.end()) {
   std::cout << " Cannot access flavour for this jet - not in the map" << std::endl;
} else {
   myFlavour = flavours[aJet];
}

Please note that the value of myFlavour can be zero if there was no positive identification of the jet flavour (for example, no partons inside the matching cone).

In summary, the following things need to be changed in order to get the jet flavour:

  • Make changes in your MyEDAnalyzer.cc and it should look like this MyEDAnalyzer.cc

  • Make changes in the BuildFile and it should look like this BuildFile

Then do scram b and run it by doing cmsRun myedanalyzer_cfg.py

You should see something like this:

Begin processing the 1st record. Run 1, Event 8753440, LumiSection 5674 at 18-May-2010 10:31:54 CEST
 Jet 0 has b tag discriminator = -100 and jet Pt = 1.51255
 Jet 1 has b tag discriminator = -100 and jet Pt = 1.12788
 Jet 2 has b tag discriminator = -100 and jet Pt = 1.00847
 Jet 0 has flavour = 21
 Jet 1 has flavour = 0
 Jet 2 has flavour = 0
Begin processing the 2nd record. Run 1, Event 8753441, LumiSection 5674 at 18-May-2010 10:31:54 CEST
 Jet 0 has b tag discriminator = -100 and jet Pt = 3.76868
 Jet 1 has b tag discriminator = -100 and jet Pt = 3.47249
 Jet 2 has b tag discriminator = -100 and jet Pt = 3.33955
 Jet 3 has b tag discriminator = -100 and jet Pt = 2.79301
 Jet 4 has b tag discriminator = -100 and jet Pt = 1.48905
 Jet 5 has b tag discriminator = -100 and jet Pt = 1.48498
 Jet 6 has b tag discriminator = -100 and jet Pt = 1.21182
 Jet 7 has b tag discriminator = -100 and jet Pt = 1.19971
 Jet 8 has b tag discriminator = -100 and jet Pt = 1.16671
 Jet 9 has b tag discriminator = -100 and jet Pt = 1.12592
 Jet 10 has b tag discriminator = -100 and jet Pt = 1.03013
 Jet 0 has flavour = 2
 Jet 1 has flavour = 21
 Jet 2 has flavour = 21
 Jet 3 has flavour = 0
 Jet 4 has flavour = 0
 Jet 5 has flavour = 0
 Jet 6 has flavour = 0
 Jet 7 has flavour = 0
 Jet 8 has flavour = 0
 Jet 9 has flavour = 0
 Jet 10 has flavour = 0
Begin processing the 3rd record. Run 1, Event 8753442, LumiSection 5674 at 18-May-2010 10:31:54 CEST
 Jet 0 has b tag discriminator = -2.35632 and jet Pt = 2.80738
 Jet 1 has b tag discriminator = -100 and jet Pt = 2.4376
 Jet 2 has b tag discriminator = -100 and jet Pt = 2.3545
 Jet 3 has b tag discriminator = -100 and jet Pt = 2.02223
 Jet 4 has b tag discriminator = -100 and jet Pt = 1.65683
 Jet 5 has b tag discriminator = -100 and jet Pt = 1.6328
 Jet 6 has b tag discriminator = -100 and jet Pt = 1.61783
 Jet 7 has b tag discriminator = -100 and jet Pt = 1.57096
 Jet 8 has b tag discriminator = -100 and jet Pt = 1.56623
 Jet 9 has b tag discriminator = -100 and jet Pt = 1.38087
 Jet 10 has b tag discriminator = -100 and jet Pt = 1.33699
 Jet 11 has b tag discriminator = -100 and jet Pt = 1.27284
 Jet 12 has b tag discriminator = -100 and jet Pt = 1.09305
 Jet 0 has flavour = 0
 Jet 1 has flavour = 0
 Jet 2 has flavour = 0
 Jet 3 has flavour = 21
 Jet 4 has flavour = 21
 Jet 5 has flavour = 0
 Jet 6 has flavour = 21
 Jet 7 has flavour = 0
 Jet 8 has flavour = 21
 Jet 9 has flavour = 21
 Jet 10 has flavour = 0
 Jet 11 has flavour = 0
 Jet 12 has flavour = 0

The above CMSSW 2_X_Y and beyond documentation is kept here

How to make Plots of b Tag Performance


The above CMSSW 2.X and beyond documentation is kept here.


Advanced topics

This section will cover additional topics related to b-tagging in CMSSW, in particular how to run a b-tag algorithm that was not included in the original data sample, or how to re-run some or all taggers with a different set of jets and/or tracks.

How to run a new tagger

Obtaining additional b-tag discriminators from the data stored in AOD is rather simple, as it only requires running the corresponding module. As an example, it will be shown how to run the softMuonByIP3d soft lepton tagger.

Add to your python configuration file the following snippet to take care of the general configuration (change the GlobalTag to match the dataset you are using):

# default configuration with frontier conditions
process.load("Configuration.StandardSequences.FrontierConditions_CMS.GlobalTag_cff")
process.load("Configuration.StandardSequences.MagneticField_cff")
process.load("Configuration.StandardSequences.Geometry_cff")
process.GlobalTag.globaltag = 'IDEAL_V9::All'

# b-tagging general configuration
process.load("RecoBTag.Configuration.RecoBTag_cff")

Then add the configuration for the extra tagger you want to run - in this example, softMuonByIP3d. Usually both an ESProducer and an EDProducer are needed: the first takes care of the algorithm configuration, the second of the b-tag raw data to use:

# configure the softMuonByIP3d ESProducer and EDProducer
process.load("RecoBTag.SoftLepton.softLeptonByIP3dES_cfi")
process.load("RecoBTag.SoftLepton.softMuonByIP3dBJetTags_cfi")

Finally, add the producer to a path, and run it:

# run the b-tag sequence
process.btag  = cms.Path( process.softMuonByIP3dBJetTags )

Of course, remember to add the new module(s) before any analysis code that will use them, as sketched below.
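For instance, if your analyzer module were called process.myAnalyzer (a hypothetical name), the path could be ordered as:

# sketch: run the new tagger before the (hypothetical) analyzer that reads it
process.btag = cms.Path( process.softMuonByIP3dBJetTags * process.myAnalyzer )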

How to run on a different jet collection

Running the b-tag algorithms on a different jet collection requires access to the full RECO data.

Note: an alternative is to use the b-tag information precomputed with the default jet collection, matching the jets from the new collection to those from the original one.

In order to rerun all the algorithms on a different jet (or track) collection, one needs to duplicate the original b-tag sequence. Only the EDProducers need to be cloned.

import FWCore.ParameterSet.Config as cms

# default configuration with frontier conditions
process.load("Configuration.StandardSequences.MagneticField_cff")
process.load("Configuration.StandardSequences.Geometry_cff")
process.load("Configuration.StandardSequences.Reconstruction_cff")
process.load("Configuration.StandardSequences.FrontierConditions_CMS.GlobalTag_cff")
process.GlobalTag.globaltag = 'IDEAL_V9::All'

# b-tagging general configuration
process.load("RecoJets.JetAssociationProducers.ic5JetTracksAssociatorAtVertex_cfi")
process.load("RecoBTag.Configuration.RecoBTag_cff")

The first step is to clone the definition of the association between jets and tracks. This is where most of the customization will take place. As an example, generalTracks and sisCone5CaloJets are used here - to try different ones, change them to what you need throughout the configuration (check the soft lepton part later, too):

# create a new jets and tracks association
process.newJetTracksAssociatorAtVertex = process.ic5JetTracksAssociatorAtVertex.clone()
process.newJetTracksAssociatorAtVertex.jets = "sisCone5CaloJets"
process.newJetTracksAssociatorAtVertex.tracks = "generalTracks"

Then one needs to clone the b-tag producers and instruct them to use this new collection. First the impact parameter-based b-tag:

# impact parameter b-tag
process.newImpactParameterTagInfos = process.impactParameterTagInfos.clone()
process.newImpactParameterTagInfos.jetTracks = "newJetTracksAssociatorAtVertex"
process.newTrackCountingHighEffBJetTags = process.trackCountingHighEffBJetTags.clone()
process.newTrackCountingHighEffBJetTags.tagInfos = cms.VInputTag( cms.InputTag("newImpactParameterTagInfos") )
process.newTrackCountingHighPurBJetTags = process.trackCountingHighPurBJetTags.clone()
process.newTrackCountingHighPurBJetTags.tagInfos = cms.VInputTag( cms.InputTag("newImpactParameterTagInfos") )
process.newJetProbabilityBJetTags = process.jetProbabilityBJetTags.clone()
process.newJetProbabilityBJetTags.tagInfos = cms.VInputTag( cms.InputTag("newImpactParameterTagInfos") )
process.newJetBProbabilityBJetTags = process.jetBProbabilityBJetTags.clone()
process.newJetBProbabilityBJetTags.tagInfos = cms.VInputTag( cms.InputTag("newImpactParameterTagInfos") )

Then the secondary vertex-based b-tag. Note that these producers inherit the jets and tracks they use from the impact parameter modules:

# secondary vertex b-tag
process.newSecondaryVertexTagInfos = process.secondaryVertexTagInfos.clone()
process.newSecondaryVertexTagInfos.trackIPTagInfos = "newImpactParameterTagInfos"
process.newSimpleSecondaryVertexBJetTags = process.simpleSecondaryVertexBJetTags.clone()
process.newSimpleSecondaryVertexBJetTags.tagInfos = cms.VInputTag( cms.InputTag("newSecondaryVertexTagInfos") )
process.newCombinedSecondaryVertexBJetTags = process.combinedSecondaryVertexBJetTags.clone()
process.newCombinedSecondaryVertexBJetTags.tagInfos = cms.VInputTag( cms.InputTag("newImpactParameterTagInfos"), cms.InputTag("newSecondaryVertexTagInfos") )
process.newCombinedSecondaryVertexMVABJetTags = process.combinedSecondaryVertexMVABJetTags.clone()
process.newCombinedSecondaryVertexMVABJetTags.tagInfos = cms.VInputTag( cms.InputTag("newImpactParameterTagInfos"), cms.InputTag("newSecondaryVertexTagInfos") )

And the soft lepton b-tag. These producers will accept as input either the raw jets, or the association collection:

# soft electron b-tag
process.newSoftElectronTagInfos = process.softElectronTagInfos.clone()
process.newSoftElectronTagInfos.jets = "sisCone5CaloJets"
process.newSoftElectronBJetTags = process.softElectronBJetTags.clone()
process.newSoftElectronBJetTags.tagInfos = cms.VInputTag( cms.InputTag("newSoftElectronTagInfos") )

# soft muon b-tag
process.newSoftMuonTagInfos = process.softMuonTagInfos.clone()
process.newSoftMuonTagInfos.jets = "sisCone5CaloJets"
process.newSoftMuonBJetTags = process.softMuonBJetTags.clone()
process.newSoftMuonBJetTags.tagInfos = cms.VInputTag( cms.InputTag("newSoftMuonTagInfos") )
process.newSoftMuonByIP3dBJetTags = process.softMuonByIP3dBJetTags.clone()
process.newSoftMuonByIP3dBJetTags.tagInfos = cms.VInputTag( cms.InputTag("newSoftMuonTagInfos") )
process.newSoftMuonByPtBJetTags = process.softMuonByPtBJetTags.clone()
process.newSoftMuonByPtBJetTags.tagInfos = cms.VInputTag( cms.InputTag("newSoftMuonTagInfos") )

Finally, there needs to be a new path running all these modules:

# prepare a path running the new modules
process.newJetTracksAssociator = cms.Sequence(
    process.newJetTracksAssociatorAtVertex
)

process.newJetBtaggingIP = cms.Sequence(
    process.newImpactParameterTagInfos * (
        process.newTrackCountingHighEffBJetTags +
        process.newTrackCountingHighPurBJetTags +
        process.newJetProbabilityBJetTags +
        process.newJetBProbabilityBJetTags
    )
)

process.newJetBtaggingSV = cms.Sequence(
    process.newImpactParameterTagInfos *
    process.newSecondaryVertexTagInfos * (
        process.newSimpleSecondaryVertexBJetTags +
        process.newCombinedSecondaryVertexBJetTags +
        process.newCombinedSecondaryVertexMVABJetTags
    )
)

process.newJetBtaggingEle = cms.Sequence(
    process.btagSoftElectrons *
    process.newSoftElectronTagInfos *
    process.newSoftElectronBJetTags
)

process.newJetBtaggingMu = cms.Sequence(
    process.newSoftMuonTagInfos * (
        process.newSoftMuonBJetTags +
        process.newSoftMuonByIP3dBJetTags +
        process.newSoftMuonByPtBJetTags
    )
)

process.newJetBtagging = cms.Sequence(
    process.newJetBtaggingIP +
    process.newJetBtaggingSV +
    process.newJetBtaggingEle +
    process.newJetBtaggingMu
)

process.newBtaggingPath = cms.Path(
    process.newJetTracksAssociator *
    process.newJetBtagging
)
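The output of the cloned taggers is then accessed exactly like the default ones, just with the new module labels. A minimal sketch inside an EDAnalyzer:

// read the b tags computed on the new jet collection
edm::Handle<reco::JetTagCollection> newBTagHandle;
iEvent.getByLabel("newTrackCountingHighEffBJetTags", newBTagHandle);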

Further Information

Review status

Reviewer/Editor and Date (copy from screen) Comments
JyothsnaK - 18-May-2010 Fixed the truth matching section to get the jet flavour
JyothsnaK - 17-Feb-2010 Fixed a typo in the EDAnalyzer section and also added in this section changes to be made to the BuildFile.
AndreaBocci - 29 Jan 2009 Added advanced topics
Main.Aresh - 27 Feb 2008 Changes in verbatim elements (in https://twiki.cern.ch/twiki/bin/edit/CMS/WorkBookBTagEdAnalyzer160, https://twiki.cern.ch/twiki/bin/edit/CMS/WorkBookBTagFWlite160) because of some lines too long for printable version
IanTomalin - 18 Oct 2007 Created from original WorkBookTauTagging page, with updates to reflect CMSSW 1.6, and simplifications to correspond more to what a new user would wish to know. More sophisticated documentation will be moved to Offline Guide.

Responsible: AlexanderSchmidt & LucaScodellaro

Last reviewed by: JyothsnaK - 18-May-2010



7.10 Tau Tagging on a PFJet

Complete: 5
This page is intended to document the information on the hadronic tau-jet reconstruction and identification starting from a PFJet.

Contents:

Introduction

The tau lepton, with a mass of 1.777 GeV, is the only lepton heavy enough to decay into hadrons. As depicted in the pie chart, in about one third of the cases τ's decay leptonically to a muon (τμ) or an electron (τe) with two neutrinos, and are reconstructed and identified with the usual techniques for muons and electrons. In the remaining cases, τ leptons decay hadronically, to a combination of charged and neutral mesons with a τν. Hadronically decaying τ's, denoted by τh, are reconstructed and identified with the hadrons-plus-strips (HPS) algorithm, which was developed for use in LHC Run 1. The key challenge this algorithm has to face is the distinction between genuine τh and quark or gluon jets, which are copiously produced in QCD multijet processes and can be misidentified as τh. The main handle for reducing these jet→τh misidentification backgrounds is the fact that the particles produced in τh decays have lower multiplicity, deposit energy in a narrow region compared to a quark or gluon jet, and are typically isolated with respect to other particles in the event. In some physics analyses, the misidentification of electrons or muons as τh candidates may constitute a sizeable background as well. Therefore, the HPS algorithm provides various discriminators (isolation, against electrons, against muons, etc.) to identify genuine hadronically decaying taus.

TauDecayPiChart.png

Tau Reconstruction and Identification Software

The hadronic tau-jet candidates are created from ParticleFlow jets using the hadrons-plus-strips (HPS) algorithm. The tau reconstruction code can be found in:

The details on the HPS algorithm can be found at:

Creation

To rerun the tau sequences, load the tau steering sequence in your cfg file: process.load("RecoTauTag.Configuration.RecoPFTauTag_cff")

At RECO level, reco::PFTaus are used to store all possible tau jets. The result of the various discriminators is stored in reco::PFTauDiscriminator.

The initial PFTau collection will first contain one tau candidate for each reconstructed PFJet (using the anti-kt algorithm with a cone size of 0.5). The jet direction is set to the direction of the leading charged hadron in the jet (highest pT), which should be within ∆R = 0.1 of the jet axis. At this stage the signal and isolation cones as well as the decay mode are defined. Subsequently the HPS algorithm creates PFTauDiscriminators which can be used to select a collection of PFTaus.

hadrTauJetIsol.png

Usage

in your event loop (e.g. the analyze(const edm::Event& iEvent) method of your analyzer) do

// read the PFTau collection (pfTauToken_ must have been initialized from an
// edm::InputTag beforehand; see the constructor sketch below)

     edm::Handle<reco::PFTauCollection> pfTaus;
     iEvent.getByToken(pfTauToken_,pfTaus);

// read one PFTauDiscriminator (discriminatorToken_ is initialized likewise)

    edm::Handle<reco::PFTauDiscriminator> discriminator;
    iEvent.getByToken(discriminatorToken_,discriminator);

// loop over taus
    for ( unsigned iTau = 0; iTau < pfTaus->size(); ++iTau ) {
        reco::PFTauRef tauCandidate(pfTaus, iTau);
// check if tau candidate has passed discriminator
        if( (*discriminator)[tauCandidate] > 0.5 ){
        // do something with your candidate
        }
    }
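In a real analyzer the two tokens are data members of the EDAnalyzer class, initialized once in the constructor via consumes(). A minimal sketch, where MyTauAnalyzer and the input labels are examples to be adapted to your event content:

// sketch of the constructor (class name and input labels are examples)
MyTauAnalyzer::MyTauAnalyzer(const edm::ParameterSet& cfg)
  : pfTauToken_(consumes<reco::PFTauCollection>(
        edm::InputTag("hpsPFTauProducer"))),
    discriminatorToken_(consumes<reco::PFTauDiscriminator>(
        edm::InputTag("hpsPFTauDiscriminationByDecayModeFinding")))
{
}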

Discriminators

The PFTauDiscriminators are used to store the results of the various tau tagging algorithms and to select jets that likely originate from a hadronic tau decay. By definition they are real numbers between 0 and 1, where higher numbers indicate a more positive outcome of the discriminator. Note that most discriminators are in fact binary, taking just the values 0 (= did not pass) and 1 (= passed).

The names of the PFTauDiscriminator collections follow the <algorithm-prefix> + <discriminator name> convention (e.g. the AgainstElectron collection can be accessed via hpsPFTauDiscriminationAgainstElectron).

Legacy Tau ID (Run I)

The <algorithm-prefix> is hpsPFTauDiscrimination

Name binary? Description
ByLooseElectronRejection yes electron pion MVA discriminator < 0.6
ByMediumElectronRejection yes electron pion MVA discriminator < -0.1 and not 1.4442 < η < 1.566 (both positive & negative eta values)
ByTightElectronRejection yes electron pion MVA discriminator < -0.1 and not 1.4442 < η < 1.566 (both positive & negative eta values) and Brem pattern cuts (see AN-10-387)
ByMVA3Loose/Medium/Tight/VTightElectronRejection yes anti-electron MVA discriminator with improved training (see talk)
ByMVA3VTightElectronRejection yes anti-electron MVA discriminator with improved training, with the same efficiency as the "HCP 2012 working point" (see talk)
ByLooseMuonRejection yes Tau Lead Track not matched to chamber hits
ByMediumMuonRejection yes Tau Lead Track not matched to global/tracker muon
ByTightMuonRejection yes Tau Lead Track not matched to global/tracker muon and large enough energy deposit in ECAL + HCAL
ByLooseMuonRejection2 yes Same as AgainstMuonLoose
ByMediumMuonRejection2 yes Loose2 && no DT, CSC or RPC Hits in last 2 Stations
ByTightMuonRejection2 yes Medium2 && large enough energy deposit in ECAL + HCAL in 1 prong + 0 strip decay mode (Σ(ECAL+HCAL) > 0.2 * pT)
ByDecayModeFinding yes You will always want to use this (see AN-10-82 )
ByVLooseIsolation yes isolation cone of 0.3 , no PF Charged Candidates with pT > 1.5 GeV/c and no PF Gamma candidates with ET > 2.0 GeV
ByVLooseCombinedIsolationDBSumPtCorr yes isolation cone of 0.3 , Delta Beta corrected sum pT of PF charged and PF gamma isolation candidates (pT > 0.5 GeV) less than 3 GeV
ByLooseCombinedIsolationDBSumPtCorr yes isolation cone of 0.5 , Delta Beta corrected sum pT of PF charged and PF gamma isolation candidates (pT > 0.5 GeV) less than 2 GeV
ByMediumCombinedIsolationDBSumPtCorr yes isolation cone of 0.5 , Delta Beta corrected sum pT of PF charged and PF gamma isolation candidates (pT > 0.5 GeV) less than 1 GeV
ByTightCombinedIsolationDBSumPtCorr yes isolation cone of 0.5 , Delta Beta corrected sum pT of PF charged and PF gamma isolation candidates (pT > 0.5 GeV) less than 0.8 GeV
ByLooseCombinedIsolationDBSumPtCorr3Hits yes same as ByLooseCombinedIsolationDBSumPtCorr but requiring 3 hits (instead of 8) on track of isolation candidates
ByMediumCombinedIsolationDBSumPtCorr3Hits yes same as ByMediumCombinedIsolationDBSumPtCorr but requiring 3 hits (instead of 8) on track of isolation candidates
ByTightCombinedIsolationDBSumPtCorr3Hits yes same as ByTightCombinedIsolationDBSumPtCorr but requiring 3 hits (instead of 8) on track of isolation candidates
ByLooseIsolationMVA yes BDT based selection using isolation in rings around tau direction and shower shape variables
ByMediumIsolationMVA yes BDT based selection using isolation in rings around tau direction and shower shape variables
ByTightIsolationMVA yes BDT based selection using isolation in rings around tau direction and shower shape variables
ByIsolationMVAraw no output of BDT based selection using isolation in rings around tau direction and shower shape variables
ByLooseIsolationMVA2 yes same as ByLooseIsolationMVA with new training and improved performance
ByMediumIsolationMVA2 yes same as ByMediumIsolationMVA with new training and improved performance
ByTightIsolationMVA2 yes same as ByTightIsolationMVA with new training and improved performance
ByIsolationMVA2raw no output of "MVA2" BDT discriminator

Isolation discriminators usage

The loosening of quality cuts on the tracks of isolation candidates means that more of them can enter the isolation pT sum. This results in a slightly decreased tau identification efficiency but also a considerably decreased jet fake rate. More details in this talk.

The combined isolation is recommended for all taus, while MVA based isolation can provide better performance for low pt (< 100 GeV) taus. Since RecoTauTag/RecoTau V01-04-23 (V01-04-23-4XX-00 for 4XX analysis) a new training for MVA isolation is available. The discriminators have "IsolationMVA2" in their name.

"MVA3" anti-electron discriminators

The new MVA training provides working points with a decreased electron fake rate while keeping the same efficiency as the previous training. All MVA3 discriminators veto tau candidates in the crack region between barrel and endcap, so they are NOT RECOMMENDED if you are using tau ID for a TAU VETO in your analysis.

"AgainstMuon2" discriminators

A drop in efficiency of the anti-muon discriminator has been observed at high pT. The fix is available in the "Muon2" discriminators. More details in (talk 1 and talk 2).

The reconstructed decay mode can be deduced from the multiplicities of charged hadron and photon candidates in the signal cone:

int A = tau.signalPFChargedHadrCands().size();
int B = tau.signalPFGammaCands().size();

Mode A B
one prong 1 0
one prong + pi0 1 >0
three prong 3 0
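A small helper function illustrating the table above (a sketch, not part of any official interface):

// classify the HPS decay mode from the signal-cone multiplicities
std::string decayModeName(const reco::PFTau & tau) {
  int a = tau.signalPFChargedHadrCands().size();  // charged hadrons
  int b = tau.signalPFGammaCands().size();        // photon candidates
  if (a == 1 && b == 0) return "one prong";
  if (a == 1 && b >  0) return "one prong + pi0";
  if (a == 3 && b == 0) return "three prong";
  return "other";
}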

Tau ID 2014 (preparation for Run II)

The <algorithm-prefix> is hpsPFTauDiscrimination

Name binary? Description
ByLooseElectronRejection yes electron pion MVA discriminator < 0.6
ByMediumElectronRejection yes electron pion MVA discriminator < -0.1 and not 1.4442 < η < 1.566 (both positive & negative eta values)
ByTightElectronRejection yes electron pion MVA discriminator < -0.1 and not 1.4442 < η < 1.566 (both positive & negative eta values) and Brem pattern cuts (see AN-10-387)
ByMVA5(Loose/Medium/Tight/VTight)ElectronRejection yes anti-electron MVA discriminator with new training
ByLooseMuonRejection yes Tau Lead Track not matched to chamber hits
ByMediumMuonRejection yes Tau Lead Track not matched to global/tracker muon
ByTightMuonRejection yes Tau Lead Track not matched to global/tracker muon and energy deposit in ECAL + HCAL exceeding 0.2 times Lead Track momentum
ByLooseMuonRejection3 yes Tau Lead Track not matched to more than one segment in muon system, energy deposit in ECAL + HCAL at least 0.2 times Lead Track momentum
ByTightMuonRejection3 yes Tau Lead Track not matched to more than one segment or hits in the outermost two stations of the muon system, energy deposit in ECAL + HCAL at least 0.2 times Lead Track momentum
ByMVA(Loose/Medium/Tight)MuonRejection yes BDT based anti-muon discriminator
ByMVArawMuonRejection no raw MVA output of BDT based anti-muon discriminator
ByDecayModeFinding yes You will always want to use this (see AN-10-82 )
ByVLooseIsolation yes isolation cone of 0.3 , no PF Charged Candidates with pT > 1.5 GeV/c and no PF Gamma candidates with ET > 2.0 GeV
ByVLooseCombinedIsolationDBSumPtCorr yes isolation cone of 0.3 , Delta Beta corrected sum pT of PF charged and PF gamma isolation candidates (pT > 0.5 GeV) less than 3 GeV
ByLooseCombinedIsolationDBSumPtCorr yes isolation cone of 0.5 , Delta Beta corrected sum pT of PF charged and PF gamma isolation candidates (pT > 0.5 GeV) less than 2 GeV
ByMediumCombinedIsolationDBSumPtCorr yes isolation cone of 0.5 , Delta Beta corrected sum pT of PF charged and PF gamma isolation candidates (pT > 0.5 GeV) less than 1 GeV
ByTightCombinedIsolationDBSumPtCorr yes isolation cone of 0.5 , Delta Beta corrected sum pT of PF charged and PF gamma isolation candidates (pT > 0.5 GeV) less than 0.8 GeV
ByLooseCombinedIsolationDBSumPtCorr3Hits yes same as ByLooseCombinedIsolationDBSumPtCorr but requiring 3 hits (instead of 8) on track of isolation candidates
ByMediumCombinedIsolationDBSumPtCorr3Hits yes same as ByMediumCombinedIsolationDBSumPtCorr but requiring 3 hits (instead of 8) on track of isolation candidates
ByTightCombinedIsolationDBSumPtCorr3Hits yes same as ByTightCombinedIsolationDBSumPtCorr but requiring 3 hits (instead of 8) on track of isolation candidates
By(VLoose/Loose/Medium/Tight/VTight/VVTight)IsolationMVA3oldDMwoLT yes BDT based tau ID discriminator based on isolation Pt sums, trained on 1-prong and 3-prong tau candidates
ByIsolationMVA3oldDMwoLTraw no raw MVA output of BDT based tau ID discriminator based on isolation Pt sums, trained on 1-prong and 3-prong tau candidates
By(VLoose/Loose/Medium/Tight/VTight/VVTight)IsolationMVA3oldDMwLT yes BDT based tau ID discriminator based on isolation Pt sums plus tau lifetime information, trained on 1-prong and 3-prong tau candidates
ByIsolationMVA3oldDMwLTraw no raw MVA output of BDT based tau ID discriminator based on isolation Pt sums plus tau lifetime information, trained on 1-prong and 3-prong tau candidates
By(VLoose/Loose/Medium/Tight/VTight/VVTight)IsolationMVA3newDMwoLT yes BDT based tau ID discriminator based on isolation Pt sums, trained on 1-prong, "2-prong" and 3-prong tau candidates
ByIsolationMVA3newDMwoLTraw no raw MVA output of BDT based tau ID discriminator based on isolation Pt sums, trained on 1-prong, "2-prong" and 3-prong tau candidates
By(VLoose/Loose/Medium/Tight/VTight/VVTight)IsolationMVA3newDMwLT yes BDT based tau ID discriminator based on isolation Pt sums plus tau lifetime information, trained on 1-prong, "2-prong" and 3-prong tau candidates
ByIsolationMVA3newDMwLTraw no raw MVA output of BDT based tau ID discriminator based on isolation Pt sums plus tau lifetime information, trained on 1-prong, "2-prong" and 3-prong tau candidates

How to get latest TauID on MiniAOD event content

While it is not possible to fully rebuild taus from jets given the MiniAOD event content, it is possible to recompute the BDT output of both the isolation and anti-electron discriminators for new trainings made available from CMSSW 8_0_X onwards. While for releases from 8_1_X onwards the necessary infrastructure is included in official CMSSW releases, for 8_0_X one has to merge developments from the following cms-tau-pog branch: cms-tau-pog/CMSSW_8_0_X_tau-pog_miniAOD-backport-tauID. In your CMSSW_8_0_X/src/ area do:

git cms-merge-topic -u cms-tau-pog:CMSSW_8_0_X_tau-pog_miniAOD-backport-tauID

The -u is necessary to prevent git from checking out all packages that depend on any of the ones touched in this branch, since this would lead to a very long compilation. Then compile everything.

In order to be able to access the latest and greatest BDT output of the isolation discriminators and save it in your ntuples, you need to add a sequence to your python config file and some code to your analyzer itself. You can find an example analyzer and the corresponding config file in RecoTauTag/RecoTau/test. Below, a code example for including the new training with old decay modes is shown. The procedure is the same for the training with new decay modes. Please refer to the TauIDRecommendation13TeV TWiki for the lines that need to be changed.

from RecoTauTag.RecoTau.TauDiscriminatorTools import noPrediscriminants
process.load('RecoTauTag.Configuration.loadRecoTauTagMVAsFromPrepDB_cfi')
from RecoTauTag.RecoTau.PATTauDiscriminationByMVAIsolationRun2_cff import *

process.rerunDiscriminationByIsolationMVArun2v1raw = patDiscriminationByIsolationMVArun2v1raw.clone(
   PATTauProducer = cms.InputTag('slimmedTaus'),
   Prediscriminants = noPrediscriminants,
   loadMVAfromDB = cms.bool(True),
   mvaName = cms.string("RecoTauTag_tauIdMVAIsoDBoldDMwLT2016v1"), # name of the training you want to use
   mvaOpt = cms.string("DBoldDMwLT"), # option you want to use for your training (i.e., which variables are used to compute the BDT score)
   requireDecayMode = cms.bool(True),
   verbosity = cms.int32(0)
)

process.rerunDiscriminationByIsolationMVArun2v1VLoose = patDiscriminationByIsolationMVArun2v1VLoose.clone(
   PATTauProducer = cms.InputTag('slimmedTaus'),    
   Prediscriminants = noPrediscriminants,
   toMultiplex = cms.InputTag('rerunDiscriminationByIsolationMVArun2v1raw'),
   key = cms.InputTag('rerunDiscriminationByIsolationMVArun2v1raw:category'),
   loadMVAfromDB = cms.bool(True),
   mvaOutput_normalization = cms.string("RecoTauTag_tauIdMVAIsoDBoldDMwLT2016v1_mvaOutput_normalization"), # normalization of the training you want to use
   mapping = cms.VPSet(
      cms.PSet(
         category = cms.uint32(0),
         cut = cms.string("RecoTauTag_tauIdMVAIsoDBoldDMwLT2016v1_WPEff90"), # this is the name of the working point you want to use
         variable = cms.string("pt"),
      )
   )
)

# here we produce all the other working points for the training
process.rerunDiscriminationByIsolationMVArun2v1Loose = process.rerunDiscriminationByIsolationMVArun2v1VLoose.clone()
process.rerunDiscriminationByIsolationMVArun2v1Loose.mapping[0].cut = cms.string("RecoTauTag_tauIdMVAIsoDBoldDMwLT2016v1_WPEff80")
process.rerunDiscriminationByIsolationMVArun2v1Medium = process.rerunDiscriminationByIsolationMVArun2v1VLoose.clone()
process.rerunDiscriminationByIsolationMVArun2v1Medium.mapping[0].cut = cms.string("RecoTauTag_tauIdMVAIsoDBoldDMwLT2016v1_WPEff70")
process.rerunDiscriminationByIsolationMVArun2v1Tight = process.rerunDiscriminationByIsolationMVArun2v1VLoose.clone()
process.rerunDiscriminationByIsolationMVArun2v1Tight.mapping[0].cut = cms.string("RecoTauTag_tauIdMVAIsoDBoldDMwLT2016v1_WPEff60")
process.rerunDiscriminationByIsolationMVArun2v1VTight = process.rerunDiscriminationByIsolationMVArun2v1VLoose.clone()
process.rerunDiscriminationByIsolationMVArun2v1VTight.mapping[0].cut = cms.string("RecoTauTag_tauIdMVAIsoDBoldDMwLT2016v1_WPEff50")
process.rerunDiscriminationByIsolationMVArun2v1VVTight = process.rerunDiscriminationByIsolationMVArun2v1VLoose.clone()
process.rerunDiscriminationByIsolationMVArun2v1VVTight.mapping[0].cut = cms.string("RecoTauTag_tauIdMVAIsoDBoldDMwLT2016v1_WPEff40")

# this sequence has to be included in your cms.Path() before your analyzer which accesses the new variables is called.
process.rerunMvaIsolation2SeqRun2 = cms.Sequence(
   process.rerunDiscriminationByIsolationMVArun2v1raw
   *process.rerunDiscriminationByIsolationMVArun2v1VLoose
   *process.rerunDiscriminationByIsolationMVArun2v1Loose
   *process.rerunDiscriminationByIsolationMVArun2v1Medium
   *process.rerunDiscriminationByIsolationMVArun2v1Tight
   *process.rerunDiscriminationByIsolationMVArun2v1VTight
   *process.rerunDiscriminationByIsolationMVArun2v1VVTight
)

# embed new id's into new tau collection
embedID = cms.EDProducer("PATTauIDEmbedder",
   src = cms.InputTag('slimmedTaus'),
   tauIDSources = cms.PSet(
      byIsolationMVArun2v1DBoldDMwLTrawNew = cms.InputTag('rerunDiscriminationByIsolationMVArun2v1raw'),
      byVLooseIsolationMVArun2v1DBoldDMwLTNew = cms.InputTag('rerunDiscriminationByIsolationMVArun2v1VLoose'),
      byLooseIsolationMVArun2v1DBoldDMwLTNew = cms.InputTag('rerunDiscriminationByIsolationMVArun2v1Loose'),
      byMediumIsolationMVArun2v1DBoldDMwLTNew = cms.InputTag('rerunDiscriminationByIsolationMVArun2v1Medium'),
      byTightIsolationMVArun2v1DBoldDMwLTNew = cms.InputTag('rerunDiscriminationByIsolationMVArun2v1Tight'),
      byVTightIsolationMVArun2v1DBoldDMwLTNew = cms.InputTag('rerunDiscriminationByIsolationMVArun2v1VTight'),
      byVVTightIsolationMVArun2v1DBoldDMwLTNew = cms.InputTag('rerunDiscriminationByIsolationMVArun2v1VVTight'),
      . . . (other discriminators like anti-electron),
      ),
   )
setattr(process, "NewTauIDsEmbedded", embedID)

process.p = cms.Path(
   . . . (other processes)
   *process.rerunMvaIsolation2SeqRun2
   *getattr(process, "NewTauIDsEmbedded")
   . . . (for example your ntuple creation process)
)

The python configuration to be added to include new trainings of the anti-electron discriminator is shown below.

process.load('RecoTauTag.Configuration.loadRecoTauTagMVAsFromPrepDB_cfi')
from RecoTauTag.RecoTau.PATTauDiscriminationAgainstElectronMVA6_cfi import *

process.rerunDiscriminationAgainstElectronMVA6 = patTauDiscriminationAgainstElectronMVA6.clone(
   PATTauProducer = cms.InputTag('slimmedTaus'),
   Prediscriminants = noPrediscriminants,
   #Prediscriminants = requireLeadTrack,
   loadMVAfromDB = cms.bool(True),
   returnMVA = cms.bool(True),
   method = cms.string("BDTG"),
   mvaName_NoEleMatch_woGwoGSF_BL = cms.string("RecoTauTag_antiElectronMVA6v1_gbr_NoEleMatch_woGwoGSF_BL"),
   mvaName_NoEleMatch_wGwoGSF_BL = cms.string("RecoTauTag_antiElectronMVA6v1_gbr_NoEleMatch_wGwoGSF_BL"),
   mvaName_woGwGSF_BL = cms.string("RecoTauTag_antiElectronMVA6v1_gbr_woGwGSF_BL"),
   mvaName_wGwGSF_BL = cms.string("RecoTauTag_antiElectronMVA6v1_gbr_wGwGSF_BL"),
   mvaName_NoEleMatch_woGwoGSF_EC = cms.string("RecoTauTag_antiElectronMVA6v1_gbr_NoEleMatch_woGwoGSF_EC"),
   mvaName_NoEleMatch_wGwoGSF_EC = cms.string("RecoTauTag_antiElectronMVA6v1_gbr_NoEleMatch_wGwoGSF_EC"),
   mvaName_woGwGSF_EC = cms.string("RecoTauTag_antiElectronMVA6v1_gbr_woGwGSF_EC"),
   mvaName_wGwGSF_EC = cms.string("RecoTauTag_antiElectronMVA6v1_gbr_wGwGSF_EC"),
   minMVANoEleMatchWOgWOgsfBL = cms.double(0.0),
   minMVANoEleMatchWgWOgsfBL  = cms.double(0.0),
   minMVAWOgWgsfBL            = cms.double(0.0),
   minMVAWgWgsfBL             = cms.double(0.0),
   minMVANoEleMatchWOgWOgsfEC = cms.double(0.0),
   minMVANoEleMatchWgWOgsfEC  = cms.double(0.0),
   minMVAWOgWgsfEC            = cms.double(0.0),
   minMVAWgWgsfEC             = cms.double(0.0),
   srcElectrons = cms.InputTag('slimmedElectrons')
)
# embed new id's into new tau collection
embedID = cms.EDProducer("PATTauIDEmbedder",
   src = cms.InputTag('slimmedTaus'),
   tauIDSources = cms.PSet(
      . . . (other new discriminators like isolation),
      againstElectronMVA6RawNew = cms.InputTag('rerunDiscriminationAgainstElectronMVA6')
      ),
   )
setattr(process, "NewTauIDsEmbedded", embedID)

process.p = cms.Path(
   . . . (other processes)
   *process.rerunDiscriminationAgainstElectronMVA6
   *getattr(process, "NewTauIDsEmbedded")
   . . . (for example your ntuple creation process)
)

Please be mindful that if you want to include new isolation and anti-electron discriminators at the same time, modules like the PATTauIDEmbedder need only be run once. Just reorder/change the example python configuration snippets accordingly, as sketched below.
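For example, a single embedder carrying both sets of new IDs could look like this (a sketch combining the two snippets above):

# one PATTauIDEmbedder for both the new isolation and anti-electron IDs
embedID = cms.EDProducer("PATTauIDEmbedder",
   src = cms.InputTag('slimmedTaus'),
   tauIDSources = cms.PSet(
      byIsolationMVArun2v1DBoldDMwLTrawNew = cms.InputTag('rerunDiscriminationByIsolationMVArun2v1raw'),
      againstElectronMVA6RawNew = cms.InputTag('rerunDiscriminationAgainstElectronMVA6'),
   ),
)
setattr(process, "NewTauIDsEmbedded", embedID)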

Once the new discriminators are embedded into the new tau collection (in this example called "NewTauIDsEmbedded") one can simply retrieve them via the pat::Tau::tauID function in a loop over the collection like so:

float newIsolationMVArawValue = tau->tauID("byIsolationMVArun2v1DBoldDMwLTrawNew");
float newAntiElectronMVArawValue = tau->tauID("againstElectronMVA6RawNew");
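In context these calls would sit in a loop over the embedded collection (a sketch; tauToken_ is assumed to have been built from the "NewTauIDsEmbedded" label in the constructor):

// loop over the embedded tau collection and read the new IDs
edm::Handle<pat::TauCollection> taus;
iEvent.getByToken(tauToken_, taus);
for (pat::TauCollection::const_iterator tau = taus->begin(); tau != taus->end(); ++tau) {
  float newIsolationMVArawValue = tau->tauID("byIsolationMVArun2v1DBoldDMwLTrawNew");
  float newAntiElectronMVArawValue = tau->tauID("againstElectronMVA6RawNew");
  // ... fill your ntuple here
}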

Rerunning of the TauID on AOD event content

The tau collections contained in the simulated samples are often outdated. The most recent tau ID is in general contained in the releases, so to benefit from all new features and bugfixes, you should re-run the PFTau sequence on RECO/AOD. Because the most up-to-date software is almost always contained in the production releases, there is no need to merge in any code from other repositories.

To re-run the tau sequence, you need to add the following few lines to your config file:

process.load("RecoTauTag.Configuration.RecoPFTauTag_cff") #loading the configuration

from PhysicsTools.PatAlgos.tools.tauTools import * # to get the most up-to-date discriminators when using patTaus
switchToPFTauHPS(process) # this line and the one above can be ignored when running on reco::PFTauCollection directly
...
process.p = cms.Path(   #adding to the path
....
        process.PFTau*
....
)

Warning: when running in un-scheduled mode it is enough to add process.load("RecoTauTag.Configuration.RecoPFTauTag_cff") to the config file.

Important links for detailed information

More detailed information can be found at: twiki 1 and twiki 2.

Review status

Reviewer/Editor and Date Comments
Simone Gennai - 12 Sep 2007 Create the page with minimal informations
Evan Friis - 21 Sep 2008 Update the page with guide to using in CMSSW 2_1_X
Evan Friis - 20 Nov 2008 Update tags
Evan Friis - 27 Jan 2009 Remove outdated information
Nitish Dhingra - 7 Sep 2017 completely rewritten for Run2

Responsible: NitishDhingra
-- NitishDhingra - 2017-09-07



7.11 Particle Flow Tutorial

Complete: 5



Detailed Review status

Contents:

Overview

This page will teach you how to:

  • Access the particle flow products in the event
  • Run the particle flow together with the fast sim

For the impatient user

Particle flow in the fast simulation

Follow sections:

Particle flow in the full simulation

Follow sections:

Set up particle flow in CMSSW

The default version of the particle flow is used in this workbook. Get it in the following way, after having set up your local area in the usual way:

addpkg RecoParticleFlow/Configuration
addpkg RecoParticleFlow/PFRootEvent
scram b -j 4

How to run particle flow

With the FULL sim

Reprocessing particle flow would involve reprocessing the tracking, which is a time-consuming operation. Please use the products available in your data sample, or use the fast sim if you need a more recent version of the particle flow.

Warning, important Do not use Summer08 event samples or older!

With the FAST sim

The fast simulation is the ideal tool to test particle flow. Since it is much faster than the full simulation, it is possible to simulate and reconstruct events on the fly. In case the reconstruction needs to be redone, it is not a big deal to simulate the same sample of events again. Moreover, this saves a lot of disk space (or staging time...) because there is no need to store large FEVT files.

Running particle flow together with the fast simulation is straightforward. Follow the instructions below to:

  • (fast) simulate a few thousand single taus,
  • perform the full reconstruction, including particle flow,
  • run additional producers, necessary for the PFRootEvent display.

cd RecoParticleFlow/Configuration/test
cmsRun fastSimWithParticleFlow_cfg.py

Output files:

  • display.root : Contains all particle flow objects. This file can be used as an input to PFRootEvent, for display or further processing using the ROOT interface.
  • aod.root : Contains AOD events

Generating and simulating your own events

Just do:

cp fastSimWithParticleFlow_cfg.py myFastSim_cfg.py

Then edit the file, replace the current source with the file RecoParticleFlow/Configuration/python/source_particleGun_cfi.py, and modify the pdgId of the particles shot by the gun:

#process.load("RecoParticleFlow.Configuration.source_singleTau_cfi")
process.load("RecoParticleFlow.Configuration.source_particleGun_cfi")
# to get photons:
process.generator.PGunParameters.ParticleID = cms.vint32(22)

Standard generator configuration files can be found in Configuration/Generator/python.

How to Rerun the particle flow

  • If the file was saved in the DISPLAY format, just use the ROOT interface.

  • If the FULLSIM or DATA files were saved in RECO format, you need to transform them to the DISPLAY format using RecoToDisplay_cfg.py.
    • The process in RecoToDisplay_cfg.py is called REPROD:
      process = cms.Process("REPROD")
      Thus, if you want to have a look at them via the ROOT interface you need to replace all occurrences of PROD by REPROD in the particleFlow.opt option file.
    • Don't forget to change the global tag in RecoToDisplay_cfg.py
      process.GlobalTag.globaltag = 'MC_31X_V3::All'
      according to the CMSSW version used to reconstruct the DATA or MC. All information is available in PFlowDevelopers or, more generally, SWGuideFrontierConditions.
    • If you want to use the standard benchmark tools to compare 2 versions of reconstruction, PROD and REPROD, you can add AOD files to the output of RecoToDisplay_cfg.py:
       
              process.load("FastSimulation.Configuration.EventContent_cff")
              process.aod = cms.OutputModule("PoolOutputModule",
              process.AODSIMEventContent,
              fileName = cms.untracked.string('aod.root')
              )
      
              process.outpath = cms.EndPath(process.aod )
              
      The benchmark tools are also available directly in the ROOT interface from DISPLAY files.

  • If the FASTSIM data was saved in RECO format the RecoToDisplay_cfg.py tool doesn't work. The reason is that it uses a full reprocess of the tracking and clustering and the associated information is not available for the FASTSIM. The simplest solution is to regenerate again the FASTSIM files.

Access to the particle flow output

There are 4 alternatives to access the reconstructed particles (PFCandidate):

  • PAT (Physics Analysis Toolkit): run PF2PAT+PAT to get pattuples.
    • This is the recommended way to use particle flow in the analysis
  • access the PFCandidates directly from an EDAnalyzer (or another CMSSW module) in the full framework
    • You should not need to do this, but it can be educational.
  • access the PFCandidates from ROOT+FWLite
    • Use the CMSSW output TTree to plot the PFCandidate variables
  • access the PFCandidates from ROOT+FWLite using PFRootEvent
    • Display and study particle flow.

PAT

Full framework

Basic PFCandidate analysis

cd RecoParticleFlow/Configuration/test
cmsRun analyzePFCandidates_cfg.py

The analyzer code is in PFCandidateAnalyzer.h and PFCandidateAnalyzer.cc.

Creating a collection of reco::Muons from the muon PFCandidates
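A minimal sketch of how such a collection could be built inside an EDProducer ("particleFlow" is the standard PFCandidate label; the producer context itself is an assumption here):

// collect the reco::Muons referenced by the muon PFCandidates
// (needs the DataFormats/ParticleFlowCandidate and DataFormats/MuonReco headers)
edm::Handle<reco::PFCandidateCollection> pfCandidates;
iEvent.getByLabel("particleFlow", pfCandidates);

std::auto_ptr<reco::MuonCollection> muons(new reco::MuonCollection);
for (unsigned int i = 0; i != pfCandidates->size(); ++i) {
  const reco::PFCandidate & cand = (*pfCandidates)[i];
  if (cand.particleId() == reco::PFCandidate::mu && cand.muonRef().isNonnull())
    muons->push_back(*cand.muonRef());
}
iEvent.put(muons);  // register the new collection (EDProducer::produce context)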

FWLite

{
  gSystem->Load("libFWCoreFWLite.so");
  AutoLibraryLoader::enable();
  gSystem->Load("libCintex.so");
  ROOT::Cintex::Cintex::Enable();
}

  • Open the CMSSW root file containing the collection of PFCandidates, and have a look at them:

TFile f("aod.root")
Events.Draw("recoPFCandidates_particleFlow__PROD.obj.particleId_")
Events.Draw("recoPFCandidates_particleFlow__PROD.obj.pt()")

Fireworks

Fireworks can be configured to display particle-flow reconstructed events:

  • WorkBookFireworks: Install fireworks
  • run:
    ./cmsShow -c /afs/cern.ch/user/c/cbern/public/pflow2_mc.fwc mc.root
    

This configuration allows one to display:

  • PFCandidates
  • PFJets and PFMET
  • calorimeter jets and MET
  • gen jets and MET

FireWorks for PF studies

The ROOT interface

Review status

Responsible: Colin Bernet

Reviewer/Editor and Date (copy from screen) Comments
Last reviewed by: Colin - 27 Jul 2009 Doc simplified
Last reviewed by: Colin - 7 Mar 2008 Splitted and reorganized the documentation
Last reviewed by: Colin - 9 Nov 2007 Added a legend for PFRootEvent display
Last reviewed by: Colin - 3 Nov 2007 Modified after particle flow workshop
Colin - 20 Apr 2007 New tutorial for particle flow version 2


7.4.1 Electron Analysis: PAT to MiniAOD

Complete: 5
Detailed Review status

Contents

Content

Introduction

Note: This twiki is under construction.

PAT (Physics Analysis Toolkit) is a common tool and format that facilitates access to event information. Nowadays, centrally produced MiniAODs are produced using PAT on AOD files. In this example twiki, we'll be focusing on working with miniAOD samples.

This twiki page formerly covered the PAT version of the electron analysis; the archived version can be found at r7 of this twiki.

Review status

Reviewer/Editor and Date (copy from screen) Comments
UzzielPerez - 2-Dec-2018 Update for Run 2, CMSSW 94x
RogerWolf - 11 june 2009 Moved from SWGuidePATExamples

Responsible: RogerWolf
Last reviewed by: most recent reviewer and date

-- RogerWolf - 11 Jun 2009



7.8.1 PAT Examples: Muon Example

Contents

Introduction

The muon object is rather sophisticated, as information from all CMS detectors is included. The data format was designed to be complete and user-friendly. For that reason the reco::Muon does not differ too much from the pat::Muon.

datafmu.jpg

It is worthwhile to look at the two data formats themselves:

Note that in 22X the MuonSelectors are directly in the reco::Muon object and hence in the pat::Muon as well. Starting from 31X they are part of the pat::Muon only.

In this tutorial a couple of simple analyses which show how to access the muon properties are discussed. One analysis is based on pat::Muon and shows how to handle a simulated proton-proton sample; the other illustrates how to access the reco::Muon properties of real cosmic muon data.

How to get the code

This tag works for the 2_2_X cycle.

Prepare your release area, then go therein:

cd $CMSSW_BASE/src
cmsenv
cvs co -r V00-00-03 MuonAnalysis/Examples
scramv1 b

The package contains two analyses (which sit in the plugins/ directory):

A detailed description of the code is not given here. This is meant to get you used to C++ and to enable you to understand what the code does by looking at it directly. The code is well commented (unfortunately, you will not find many other examples of code commented this well).

Hands-On

First example: PAT access

Get a PAT sample or produce it

For the first example (the MuonAnalysis module), before running the code you have to find a sample which contains pat::Muons. Please make sure that one of the following conditions was satisfied when the sample was produced:

  • The pat::Muons have the generator info and reco tracks embedded (i.e., that information has been copied inside the PAT object). This means that the configuration used to produce the sample had:
       process.allLayer1Muons.embedTrack = cms.bool(True)
       process.allLayer1Muons.embedGenMatch = cms.bool(True)
       

  • The event content includes the explicit collections (i.e., the information is not copied into the PAT object and access to it occurs through links). This means that the sample was produced with the following request:
   RecoMuonPAT = cms.PSet(
         outputCommands = cms.untracked.vstring(
                   'keep recoTracks_standAloneMuons_*_*',   # optional   
                   'keep recoTracks_globalMuons_*_*',           # optional  
                   'keep recoTracks_tevMuons_*_*',                # optional 
                   'keep recoTracks_generalTracks_*_*',   
                   'keep recoMuons_muons_*_*',                     # optional  
                   'keep recoCaloMuons_calomuons_*_*',       # optional   
                   # GEN   
                   'keep *_genParticles_*_*'   
                                             )   
   )   
   process.out.outputCommands.extend(RecoMuonPAT.outputCommands)
   

So, if you use CMS.PhysicsTools/PatAlgos/test/patLayer1_fromAOD_full.cfg.py to produce your home-made sample, that is fine, but you should add one of the two blocks above to the configuration. In MuonAnalysis/Examples/test/ there is a copy of patLayer1_fromAOD_full.cfg.py, but with the instructions to embed the gen and track info included. The name of this file is producePATMuons_cfg.py. You can run it to produce a sample.

It runs (on purpose, see the exercise below) on samples produced with 22X, but not with 21X.

The result of the corresponding job can be found here:

/afs/cern.ch/cms/Physics/muon/MPOG/Tutorial/PATLayer1_Output.fromAOD_full_HIGGS200GeV.root
and it is accessible from NAF, of course.

Run the code

Now that we have a sample, running the code is very simple:

cd $CMSSW_BASE/src/MuonAnalysis/Examples/test
cmsRun MuonAnalyzer_cfg.py

The result is a root file with the following content:

  • NMuons: Number of muons per event
  • pT: transverse momentum of muons
  • pT_Reso: (pT(rec)-pT(sim))/pT(sim)
  • EHCal: Energy deposit in HCAL
  • MuonType: Type of Muons (STA/GLB/TM)
  • DiffPt_STA_TK: pT(TK)-pT(STA)
  • CaloCompatibility: Muon hypothesis probability using calorimeters only
  • SegmentCompatibility: Muon hypothesis using segments only
  • NumMatchedChamber: Number of matched chambers in the muon system
  • MuonIDSelectors: muon id selectors incidence
  • pT_TPFMS: pT(rec) of Tracker Plus First Muon Station Track
  • MuIso03SumPt: isolation using SumPt of the tracker tracks within a cone of Delta(R)=0.3
  • MuIso03CaloComb: isolation using 1.2*ECAL + 0.8*HCAL within a cone of Delta(R)=0.3 (see the sketch after this list)
  • InvMass4MuSystem: Invariant mass of the 4 muons system
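As an illustration of how a couple of these quantities could be computed from a pat::Muon, here is a minimal sketch (the helper function and its name are illustrative, not part of the tutorial code; it assumes the gen match is embedded as described above):

#include "DataFormats/PatCandidates/interface/Muon.h"

// illustrative helper, not part of MuonAnalysis/Examples
void fillExampleQuantities(const pat::Muon &mu)
{
    // relative pT resolution; requires the embedded generator match
    if (mu.genLepton()) {
        double ptReso = (mu.pt() - mu.genLepton()->pt()) / mu.genLepton()->pt();
        // ... fill the pT_Reso histogram with ptReso
    }

    // combined calorimetric isolation within a cone of Delta(R)=0.3
    double caloComb = 1.2 * mu.isolationR03().emEt + 0.8 * mu.isolationR03().hadEt;
    // ... fill the MuIso03CaloComb histogram with caloComb
}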

The (very same) output of the job can be found here:

/afs/cern.ch/cms/Physics/muon/MPOG/Tutorial/MyMuonPlots_HIGGS200GeV.root

We can have a look at the file using FWLite. You can set up a .C file with the following lines in:

gSystem->Load("libFWCoreFWLite.so");
AutoLibraryLoader::enable();
gSystem->Load("libDataFormatsFWLite.so");
gSystem->Load("libDataFormatsPatCandidates.so");
Then open a .root file:
TFile fMuons("/afs/cern.ch/cms/Physics/muon/MPOG/Tutorial/
PATLayer1_Output.fromAOD_full_HIGGS200GeV.root")
TTree *events = (TTree*)fMuons.Get("Events")
Then browse some of the muon quantities:
 
events->Draw("patMuons_cleanLayer1Muons__PATMuon.obj.pt()")

Second example: cosmic rays analysis

To run the muon analysis on cosmic data we do not use the same input; instead, a sample from the cosmic data taking is used. Please have a look at the code and the comments therein (especially if you are willing to upgrade your code to 31X).

Here is the simple command to run the job:

cmsRun CosmicMuonAnalyzer_cfg.py

The result is a root file with the same content as the first example. You can find an example of it here:

 /afs/cern.ch/cms/Physics/muon/MPOG/Tutorial/MyCosmicMuonPlots.root
(this file has been produced with a slightly different data set with respect to the one chosen for the tutorial at NAF; the sample used is in any case noted in the cfg file and is accessible in DBS). Okay, okay ... here you can also find the exact output of the NAF job:
 /afs/cern.ch/cms/Physics/muon/MPOG/Tutorial/MyCosmicMuonPlots_NAF.root

Third example: run the visualization

CMS has many types of visualization programs. Here an example of how to run iguana is given (be sure you logged in with the -Y option).

We run a visualization job by simply typing iguana file.root or iguana file_cfg.py. In the former approach a default cfg.py is also created, and you can use it later as input for iguana itself. Let's see an example and have a direct look at the cosmic rays we analysed earlier:

iguana /store/data/Commissioning08/Cosmics/RECO/CRAFT_V3P_TrackerPointing_v1/
0010/E8D35F0C-03A5-DD11-85BD-003048D15E52.root
The cfg file created by iguana can also be found in the package you downloaded, MuonAnalysis/Examples/test/visMuon_cfg.py, and you can use it instead of the command above, if you wish.

Once iguana is ready, click on Event->Next Event ...wait... then click on the eye. Use ctrl+n to move to another event and... enjoy yourself!

How to get more information

A good description of muon software in CMS is available here.

Exercise

To practice the different steps described in this tutorial, an exercise (with solution) is proposed.
  caveat: this is based on samples stored at NAF, and the solution may not work out-of-the-box at other sites.

Here is what is proposed:

  • produce a PAT sample out of a Z into mumu sample, embedding the gen info and the tracker track in the pat::Muons
    • Tips:
      • search for it in DBS, or using dcls if your site uses dCache (e.g. at NAF)
      • modify MuonAnalysis/Examples/test/producePATMuons_cfg.py accordingly

There are known issues with running PAT in 2_2_X on samples produced with 2_1_X. So, if you want to run smoothly on a 2_1_X sample (most likely you would) you need to add a couple of lines to the cfg:

         from CMS.PhysicsTools.PatAlgos.tools.cmsswVersionTools import run22XonSummer08AODSIM
         run22XonSummer08AODSIM(process)
         from CMS.PhysicsTools.PatAlgos.tools.trigTools import switchOffTriggerMatchingOld
         switchOffTriggerMatchingOld( process )
         process.patDefaultSequence.remove(process.patPFCandidateIsoDepositSelection)
         process.patDefaultSequence.remove(process.patPFTauIsolation)
         
The proposed solution uses the above lines (as it runs over a 2_1_X sample... not because we are nasty, but so that you get a broader set of examples). If you are in trouble you can look at the solution.
  • Modify the MuonAnalyzer in order to handle 2 muons instead of 4
    • calculate the invariant mass of the Z (see the sketch after this list). Pay attention to the charge of the muons!
    • what is the percentage of events in which a good invariant mass of the Z (let's say, between 60 and 110 GeV) is given by two muons reconstructed with the same charge sign?
    • Do not forget to change the input of your cfg file; now you have your own produced pat-tuple to look at!
  • Use iguana to look at some Z events.
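As a starting point for the invariant mass part of the exercise, here is a minimal sketch (the function is illustrative; the actual solution in CVS is organized differently):

#include "DataFormats/PatCandidates/interface/Muon.h"

// illustrative helper: invariant mass of an opposite-charge dimuon pair
double dimuonMass(const pat::Muon &mu1, const pat::Muon &mu2)
{
    // a Z candidate requires two muons of opposite charge
    if (mu1.charge() * mu2.charge() >= 0)
        return -1.;   // same-sign pair, not a Z candidate

    // candidates are four-vectors, so the pair mass is simply the
    // invariant mass of the summed four-momenta
    return (mu1.p4() + mu2.p4()).M();
}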

Searching for the solution?

cvs co -r SV00-00-06 MuonAnalysis/Examples

Muon efficiencies

An example of filling and retrieving "tag-&-probe" efficiencies with the pat::Muon object is available here.

Review status

Reviewer/Editor and Date (copy from screen) Comments
RogerWolf - 13 May 2009 Created the template page
RiccardoBellan - 23 Jun 2009 Modified for Tutorial at DESY

Responsible: RiccardoBellan
Last reviewed by: -- RiccardoBellan - 23 Jun 2009



7.2.1 PAT Examples: Tracking, Vertexing and b-Tagging

Contents

Introduction

This tutorial is meant to deliver basic insight into CMSSW and the PAT framework. It is focused on the three following topics:

  • Tracking
  • Vertexing
  • b-Tagging
This tutorial was presented for the first time at a workshop at DESY, Hamburg, Germany. Therefore, all DESY/NAF site-specific settings required during this tutorial can be found here. They are marked by info in the sections below.

Preparation

This tutorial was set up using CMSSW_2_2_13. Basically it should work as well with other software versions. So in order to begin with the different exercises presented below, set up your CMSSW environment and enter the following directory: CMSSW_2_2_13/src.

info For instructions on how to set up the CMSSW environment at DESY/NAF follow this link.

Update: The tutorial was tested and verified to work with CMSSW_3_3_X releases (e.g. CMSSW_3_3_6) with the appropriate RelVal samples.

How to get the code

The tutorial code will be available in future CMSSW releases in the package CMS.PhysicsTools/PatExamples. For currently available CMSSW versions, the CVS HEAD version of this package has to be used (additionally, the CVS tag: TUT_DESY_ws_tracking_btagging has been set up to only get the files related to this one tutorial).

To get a copy of the PAT tutorials/examples, use the following command (replace <tag> with HEAD or info TUT_DESY_ws_tracking_btagging):

addpkg PhysicsTools/PatExamples <tag>

Alternatively you can use:

cvs co -r <tag> PhysicsTools/PatExamples

info If your DESY/NAF username differs from your CERN lxplus username, then you should follow these instructions.

All files important for this tutorial can be found in the directories CMS.PhysicsTools/PatExamples/plugins/ and CMS.PhysicsTools/PatExamples/test/. More details about how to run the different tutorial examples are given in the appropriate subsections below.

Additional Information

Each time something in the CMS.PhysicsTools/PatExamples/plugins/ directory is changed, one needs to compile the code to make the changes available to CMSSW. In case the tutorial code was just checked out, please invoke the compilation now by executing:

scram b

info For the tutorial at DESY/NAF, a skimmed dataset has been prepared. Find more information about it here. Keep in mind that you should replace the input dataset specified in the different tutorial configuration files, but this will be described again in the subsections below.

Exercise I - Tracking

Due to their sheer complexity, topics like running or understanding track reconstruction in detail are not covered here. Instead, we restrict ourselves to understanding the results of the track reconstruction by looking at observables of the reconstructed tracks.

[Figure: tracker.jpg, layout of the CMS tracker]

Let's have a look at the example:

In the plugins directory (inside the src/CMS.PhysicsTools/PatExamples directory, as for the rest of the tutorial) you can find the CMSSW EDAnalyzer called PatTrackAnalyzer in the PatTrackAnalyzer.cc source code file. We will discuss the contents of this file further below.

The corresponding CMSSW configuration file to execute this analyzer can be found in the test directory and is called analyzePatTracks_cfg.py. Note that the naming scheme of the analyzers and configuration files is consistent for all the examples in this tutorial.

The file test/analyzePatTracks_cfg.py should look like the following:

import FWCore.ParameterSet.Config as cms

process = cms.Process("Test")

Here we are defining a regular CMSSW config file with a process named Test (arbitrary, really, as long as it does not clash with any other process name).

    4# initialize MessageLogger and output report
    5process.load("FWCore.MessageLogger.MessageLogger_cfi")
    6process.MessageLogger.cerr.threshold = 'INFO'
    7process.MessageLogger.categories.append('PATSummaryTables')
    8process.MessageLogger.cerr.INFO = cms.untracked.PSet(
    9        default          = cms.untracked.PSet( limit = cms.untracked.int32(0)  ),
   10        PATSummaryTables = cms.untracked.PSet( limit = cms.untracked.int32(-1) )
   11)
   12process.options   = cms.untracked.PSet( wantSummary =
   13cms.untracked.bool(True) )
   14
   15process.load("Configuration.StandardSequences.Geometry_cff")
   16process.load("Configuration.StandardSequences.FrontierConditions_CMS.GlobalTag_cff")
   17process.load("Configuration.StandardSequences.MagneticField_cff")
   18
   19process.GlobalTag.globaltag = cms.string('IDEAL_V12::All')

Here, a few generic things are set up and standard sequences are included (like geometry, calibrations and the magnetic field map). They are mostly needed in order to be able to run PAT on RECO or AOD files. If you are already using PAT-tuples as input, they can in principle be removed (but they don't hurt if they are defined without being needed). The tag IDEAL_V12::All has to be compatible with the simulated misalignment and the software version used (in our case CMSSW_2_2_13 and without misalignment).

There are two ways one can get the information about the globaltag.

The first option is to use DBS. This method is convenient as one need not have the RECO/AOD file on disk.

* Visit the DBS site (https://cmsweb.cern.ch/dbs_discovery/)

* Do a search like (for example): find dataset where file like /store/relval/CMSSW_3_3_6/RelValTTbar/GEN-SIM-RECO/STARTUP3X_V8H-v1/ and dataset.status like VALID*

This gives a result similar to the one shown here.

* There you can click on "Conf.files" below the found results. This gives you the config which has been used to produce the dataset.

* Here you find: process.GlobalTag.globaltag = 'STARTUP3X_V8H::All'

Alternatively, one can get the above information by running edmProvDump on the RECO/AOD file, assuming the file is available on a local disk.

In addition, the message logger is configured. For CMSSW_3_3_X, you have to comment out the following four lines (leave it untouched in case of CMSSW_2_2_X):

    8#process.MessageLogger.cerr.INFO = cms.untracked.PSet(
    9#        default          = cms.untracked.PSet( limit = cms.untracked.int32(0)  ),
   10#        PATSummaryTables = cms.untracked.PSet( limit = cms.untracked.int32(-1) )
   11#)
   12

   20# produce PAT Layer 1
   21process.load("CMS.PhysicsTools.PatAlgos.patSequences_cff")
   22# switch old trigger matching off
   23from CMS.PhysicsTools.PatAlgos.tools.trigTools import switchOffTriggerMatchingOld
   24switchOffTriggerMatchingOld( process )

This is needed to create PAT collections on the fly (again redundant when using PAT-tuples). In case of CMSSW_3_3_X, comment out the two lines about old trigger matching; this is the default in current releases.

   26process.maxEvents = cms.untracked.PSet( input = cms.untracked.int32(-1) )

We'd like to run over all the events in the input files (hence -1, otherwise we specify the number of events).

   27process.source = cms.Source("PoolSource",
   28        fileNames = cms.untracked.vstring(
   29                '/store/mc/Fall08/TTJets-madgraph/GEN-SIM-RECO/...',
   30        )
   31)

We define the EDM data files to run on. This example chooses a RECO file from the Fall08 TTbar production. This is not a PAT-tuple, so we need to run PAT on the fly. You can find other samples in DBS; make sure they are compatible with the software release (i.e. CMSSW_2_2_X).

info Due to time constraints at the DESY/NAF tutorial, please modify the config file as described here in order to use skimmed data samples locally available.

   32process.TFileService = cms.Service("TFileService",
   33        fileName = cms.string("analyzePatTracks.root")
   34)

The analyzer is using the TFileService to put histograms into a ROOT file. Here we define the service and give it a filename to write to.

   35process.analyzeTracks = cms.EDAnalyzer("PatTrackAnalyzer",
   36        src = cms.InputTag("generalTracks"),
   37        beamSpot = cms.InputTag("offlineBeamSpot"),
   38        qualities = cms.vstring("loose", "tight", "highPurity")
   39)
   40
   41process.analyzeMuons = cms.EDAnalyzer("PatTrackAnalyzer",
   42        src = cms.InputTag("selectedLayer1Muons"),
   43        beamSpot = cms.InputTag("offlineBeamSpot"),
   44        qualities = cms.vstring("undefQuality")
   45)

Here we define two instances of our PatTrackAnalyzer. The first instance runs on the so-called generalTracks collection, which is the most basic track collection in CMSSW. It contains all kinds of tracks, reconstructed with the default "Combinatorial Kalman Track Finder". It includes any kind of charged tracks (pions, kaons, muons and so on) and tries to be as inclusive as possible. In CMSSW_2_2_12 it uses the three-step tracking at this point, which means that it tries to include a wide pT spectrum and also attempts to reconstruct detached tracks. Simple quality cuts have already been applied (the "loose" selection), but it is possible to go to tighter cuts by specifically selecting only tracks with the "tight" or even "highPurity" flags.

The second instance runs on reconstructed muons. Inside the code we will restrict ourselves to muons that have a part reconstructed in the strip tracker. In addition, they shall also have a part reconstructed in the muon system and a global track fit on both parts simultaneously.

Explanation of the different parameters src, beamSpot and qualities:

  • The src parameter is the parameter that is read by the analyzer and chooses the EDM collection that contains the tracks you want to look at. The analyzer is somewhat flexible and can deal with either plain RECO tracks or PAT muons. The collections for the two analyzers are named "generalTracks" and "selectedLayer1Muons". You can in principle also look at any other track collection that is derived from reco tracks. If you click through the content of an EDM file (e.g. using the TBrowser within ROOT) you will find a few such collections.
  • The beamSpot is the collection which contains the beam spot information (i.e. where the interaction is supposed to take place). It is used to determine the reference point for the track parameters for the determination of impact parameters (e.g. the "d0" variable in the code). The reason it is needed is that the beam spot in our simulation does not exactly sit at (0, 0), but is slightly shifted (in the real world the beam will likely not sit at (0, 0) either, so our simulation moves the spot to find potential bugs in code that erroneously assumes this). So let's go with it as well; we don't want to fall into that trap either, do we?
  • Finally, the qualities parameter represents the list of track qualities that we want to look at. For the generalTracks we are looking at the three quality flags mentioned earlier, for the muons we don't care and use the default "undefQuality" which means we select all tracks (muons). The analyzer will write out all histograms for all quality selections.

   46process.p = cms.Path(
   47        process.patDefaultSequence *
   48        process.analyzeTracks *
   49        process.analyzeMuons
   50)

Here we define the sequence to run. If you are running on PAT-tuples, you should comment out the first line (using a # in front), otherwise PAT will be run on the fly. The other two entries here represent the two analyzers we just configured above.

info At the DESY/NAF workshop, we use a prepared PAT-tuple, therefore you should comment the first line: # process.patDefaultSequence *.

Now, given that your input file is valid, the CMSSW environment is set up correctly and that the analyzers compiled successfully (scram b), you can run the example:

cmsRun analyzePatTracks_cfg.py

When the job is done, you will find a file named analyzePatTracks.root in the current directory, containing all the plots from the analyzers. You can open the file using the ROOT command line:

root -l analyzePatTracks.root

and browse a bit by invoking new TBrowser within the ROOT command line interpreter. You will notice that the plots for each analyzer can be found in a directory named after the analyzer.

There is also a ROOT macro prepared for automatic plotting of the file contents including basic histogram formatting. You can call it by executing:

root -l patTracks_showPlots.C

Have a look at the file and open it in an editor. You can change the lines with the track qualities and the directory to also look at the plots for the muons. Note how the numbers of hits in the different components are shown in a stacked plot; it uses THStack for that purpose (see the sketch below). We will discuss the plots further below, after discussing the source code of the analyzer.
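For illustration, the stacking could look roughly like the following sketch in a ROOT macro (the histogram names are assumptions based on the booking scheme, not necessarily the ones used by patTracks_showPlots.C):

// sketch: stack the per-subdetector hit profiles with THStack
TFile *file = TFile::Open("analyzePatTracks.root");
TProfile *pxb = (TProfile*)file->Get("analyzeTracks/pxbHitsEta_loose");  // name assumed
TProfile *tib = (TProfile*)file->Get("analyzeTracks/tibHitsEta_loose");  // name assumed

THStack *stack = new THStack("hits", "mean number of hits vs. |#eta|");
TH1 *hPxb = pxb->ProjectionX();   // turn the profiles into plain histograms
TH1 *hTib = tib->ProjectionX();
hPxb->SetFillColor(kRed);
hTib->SetFillColor(kBlue);
stack->Add(hPxb);
stack->Add(hTib);
stack->Draw("hist");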

Here is the source code for the analyzer, PatTrackAnalyzer.cc (it will be discussed in full detail only this first time):

    1#include <iostream>
    2#include <cmath>
    3#include <vector>
    4#include <string>
    5
    6#include <TH1.h>
    7#include <TProfile.h>
    8
    9#include "FWCore/Framework/interface/Frameworkfwd.h"
   10#include "FWCore/Framework/interface/Event.h"
   11#include "FWCore/Framework/interface/EventSetup.h"
   12#include "FWCore/Framework/interface/EDAnalyzer.h"
   13#include "FWCore/Utilities/interface/InputTag.h"
   14#include "FWCore/ParameterSet/interface/ParameterSet.h"
   15#include "FWCore/ServiceRegistry/interface/Service.h"
   16
   17#include "CMS.PhysicsTools/UtilAlgos/interface/TFileService.h"
   18
   19#include "DataFormats/Common/interface/View.h"
   20#include "DataFormats/BeamSpot/interface/BeamSpot.h"
   21#include "DataFormats/TrackReco/interface/Track.h"
   22#include "DataFormats/TrackReco/interface/HitPattern.h"
   23#include "DataFormats/PatCandidates/interface/Muon.h"

Header files we will need.

   24class PatTrackAnalyzer : public edm::EDAnalyzer  {
   25    public: 
   26        // constructor and destructor
   27        PatTrackAnalyzer(const edm::ParameterSet &params);
   28        ~PatTrackAnalyzer();
   29
   30        // virtual methods called from base class EDAnalyzer
   31        virtual void beginJob(const edm::EventSetup&);
   32        virtual void analyze(const edm::Event &event, const edm::EventSetup &es);

Here we define our analyzer by deriving from the CMSSW framework base class EDAnalyzer. We need to define a constructor that is passed a ParameterSet that will contain the configuration from the python config file. We also define the methods beginJob and analyze which are called by the base class, hence they are defined virtual.

   33private:
   34        // configuration parameters
   35        edm::InputTag src_;
   36        edm::InputTag beamSpot_;
   37
   38        // the list of track quality cuts to demand from the tracking
   39        std::vector<std::string> qualities_;

Here we store the parameters from the CMSSW config file in class members. The InputTag class contains references to EDM products (i.e. ROOT file branches) and is preferred over simple string labels.

   40struct Plots {
   41                TH1 *eta, *phi;
   42                TH1 *pt, *ptErr;
   43                TH1 *invPt, *invPtErr;
   44                TH1 *d0, *d0Err;
   45                TH1 *nHits;
   46
   47                TProfile *pxbHitsEta, *pxeHitsEta;
   48                TProfile *tibHitsEta, *tobHitsEta;
   49                TProfile *tidHitsEta, *tecHitsEta;
   50        };
   51
   52        std::vector<Plots> plots_;
   53};

Here we store all our histograms for the variables we want to plot. The profile histograms store the number of hits the track has in the different subdetector components. The profile is a neat way to simply get the average number of hits per pseudorapidity slice (eta) on the x-axis.

We use a vector of plots, so that we can store plots for each track quality selection in parallel.

   54PatTrackAnalyzer::PatTrackAnalyzer(const edm::ParameterSet &params) :
   55        src_(params.getParameter<edm::InputTag>("src")),
   56        beamSpot_(params.getParameter<edm::InputTag>("beamSpot")),
   57        qualities_(params.getParameter< std::vector<std::string> >("qualities"))
   58{
   59}
   60
   61PatTrackAnalyzer::~PatTrackAnalyzer()
   62{
   63}

Constructor and destructor. Not much to say here. Notice how elegantly we can read the parameters from the CMSSW config file using the constructor syntax. The part enclosed in quotes is the name of the variable to be read and has to match the name specified in the CMSSW config file.

   64void PatTrackAnalyzer::beginJob(const edm::EventSetup &es)
   65{
   66        // retrieve handle to auxiliary service
   67        // used for storing histograms into ROOT file
   68        edm::Service<TFileService> fs;
   69
   70        // now book the histograms, for each category
   71        unsigned int nQualities = qualities_.size();
   72
   73        plots_.resize(nQualities);
   74
   75        for(unsigned int i = 0; i < nQualities; ++i) {
   76                // the name of the quality flag
   77                const char *quality = qualities_[i].c_str();
   78
   79                // the set of plots
   80                Plots &plots = plots_[i];
   81
   82                plots.eta = fs->make<TH1F>(Form("eta_%s", quality),
   83                                           Form("track \\eta (%s)", quality),
   84                                           100, -3, 3);
   85
   86[...]
   87}

The beginJob method is called once at the beginning of the CMSSW job. The main difference with respect to the constructor from our point of view is that here we have access to framework services and event setup, whereas in the constructor these are not set up yet.

Here we first initialize the TFileService, i.e. get a reference to it using the framework service concept with edm::Service. We resize the vector of plots to the number of track qualities and then start booking the histograms in a loop.

fs->make is the TFileService way to call new TH1F. It makes sure that the histograms are booked such that they will automatically end up in the correct file.

We are building the histogram name and title using ROOT's Form() call. It essentially follows the C language sprintf syntax, where %s denotes a placeholder for a string, %d for an integer, %f for a float, and so on. The values to fill into the placeholders are then passed as arguments. We need this here to append the track quality to the name of each plot.

   88void PatTrackAnalyzer::analyze(const edm::Event &event, const edm::EventSetup &es)
   89{  
   90        // handles to kinds of data we might want to read
   91        edm::Handle<reco::BeamSpot> beamSpot;
   92        edm::Handle< edm::View<reco::Track> > tracksHandle;
   93        edm::Handle< pat::MuonCollection > muonsHandle;
   94
   95        // read the beam spot
   96        event.getByLabel(beamSpot_, beamSpot);
   97[...]

Here we define the analyze method implementation, which is called for each event. First we define three handles, one handle for each EDM collection that we want to (potentially) read.

We do the actual reading by calling the getByLabel method on the event argument. This method is passed the input tag (the label) and it will fill the handle just defined with the contents of the collection.

The next part is a bit tricky: it will fill the tracks or muons. To do this it uses some "magic" to detect whether it is running on tracks or muons (i.e. what kind of collection you actually passed with the src label). You can learn a few framework tricks by looking at this, but it is not essential for this tutorial.

The important point is that we will fill this collection:

   98std::vector<const reco::Track*> tracks;

which will contain the tracks for both the generalTracks and the muon case (a simplified version of the fill is sketched below).
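A simplified version of that fill, assuming both handles above have been retrieved with getByLabel, could look like this (a sketch of the idea, not the analyzer's exact code):

// sketch: fill "tracks" either from a plain track collection
// or from the inner tracks of the PAT muons
if (tracksHandle.isValid()) {
    for(edm::View<reco::Track>::const_iterator iter = tracksHandle->begin();
        iter != tracksHandle->end(); ++iter)
        tracks.push_back(&*iter);
} else {
    for(pat::MuonCollection::const_iterator iter = muonsHandle->begin();
        iter != muonsHandle->end(); ++iter)
        if (iter->innerTrack().isNonnull())
            tracks.push_back(iter->innerTrack().get());
}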

   99// we are done filling the tracks into our "tracks" vector.
  100// now analyze them, once for each track quality category
  101
  102unsigned int nQualities = qualities_.size();
  103for(unsigned int i = 0; i < nQualities; ++i) {

Here we loop over the qualities we want to look at.

  104// we convert the quality flag from its name as a string
  105// to the enumeration value used by the tracking code
  106// (which is essentially an integer number)
  107reco::Track::TrackQuality quality = reco::Track::qualityByName(qualities_[i]);

qualities_[i] is a std::string that was passed from the configuration file. We need to convert this into a different representation (an enumeration value, like the predefined colors kBlack, kRed, which are essentially integers).

Have a look at the header file TrackBase.h. We will use these methods a lot in the next few lines. Note that the reco::Track class simply derives from the reco::TrackBase class, so you can also have a look at the Track.h class itself. The qualityByName method converts the string into the TrackQuality enumeration for us.

  108// now loop over the tracks
  109for(std::vector<const reco::Track*>::const_iterator iter = tracks.begin();
  110        iter != tracks.end(); ++iter) {
  111             // this is our track
  112             const reco::Track &track = **iter;

How this works should be pretty obvious if you are familiar with C++-style iterators. (If not: you should really get familiar with them!)

We are dereferencing twice here (the two asterisks): first the iterator, and second the pointer inside the collection, to store a reference to the track in our track variable. It can sometimes be a pain to figure out how to do this correctly, but most of the time trial and error, or a little bit of thinking and a look at the header files, works. Please ask an expert before resorting to any kind of ugly hack; there is usually a nice and clean way to do it.

  113// ignore tracks that fail the quality cut
  114if (!track.quality(quality))
  115        continue;

Here we use the quality method of a track to check if the track fulfills our requirement.

  116// and fill all the plots
  117plots.eta->Fill(track.eta());
  118plots.phi->Fill(track.phi());
  119[...]

This should be pretty obvious.

  120// the hit pattern contains information about
  121// which modules of the detector have been hit
  122const reco::HitPattern &hits = track.hitPattern();

Here we get information about the hit pattern of the track. This class can tell us exactly which modules have been hit by the track (or more precisely: which hits have been used to reconstruct the track). It also tells us in which modules hits were expected, but not found. These are called "lost" hits. Sometimes lost hits are expected, because the module is known to be faulty, sometimes they're not. Have a look at the header files, everything is explained in the comments: HitPattern.h

  123double absEta = std::abs(track.eta());
  124// now fill the number of hits in a layer depending
  125// on eta
  126plots.pxbHitsEta->Fill(absEta, hits.numberOfValidPixelBarrelHits());
  127plots.pxeHitsEta->Fill(absEta, hits.numberOfValidPixelEndcapHits());
  128[...]

We fill the profile histograms with the number of hits in tracking subdetectors with respect to the pseudo-rapidity.

Here is a nice exercise: add in the three muon subsystems (DT, CSC, RPC), rerun the analyzer on the muons, add them to the plotting macro, and have a look at the stacked plot. A hint is sketched below.
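As a hint, the corresponding HitPattern accessors exist for the muon subsystems as well; the profile members dtHitsEta, cscHitsEta and rpcHitsEta in this sketch are hypothetical names you would add to the Plots struct yourself:

// sketch: fill profiles for the muon subsystems (hypothetical members)
plots.dtHitsEta->Fill(absEta, hits.numberOfValidMuonDTHits());
plots.cscHitsEta->Fill(absEta, hits.numberOfValidMuonCSCHits());
plots.rpcHitsEta->Fill(absEta, hits.numberOfValidMuonRPCHits());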

Alternatively, you can have a look at the position of each individual hit, but in order to do this you need to have a look at the methods in Track.h. Furthermore, you need to run on RECO for this (neither AOD nor PAT-tuples will work, as the information is not available here). This is outside of the scope of this tutorial.

  129#include "FWCore/Framework/interface/MakerMacros.h"
  130
  131DEFINE_FWK_MODULE(PatTrackAnalyzer);

And finally these lines will turn our C++ class into a framework module.

Now have a look at the plots from the ROOT macro. Can you explain the observed quantities?

Does the subdetector hit distribution make sense? You can compare it to: http://www.ba.infn.it/~zito/cms/tvis.html

Exercise II - Vertexing

In this sub-tutorial we will look at properties of the reconstructed primary vertex. What is done here software-wise is that to all tracks from the global track collection (the generalTracks that we just looked at), some quality cuts are applied (like required hits in the pixel detector, not too far from the beam spot and so on). Then the tracks are clustered into groups along the z-axis. Remember: with pile-up we can get multiple interactions, so we can get multiple primary vertices along the beam spot. Each group of tracks is then fitted using the AdaptiveVertexFitter and, if the resulting vertex passes some basic quality cuts (like a minimum number of compatible tracks and compatibility with the beam spot), it is added to the offlinePrimaryVertices collection. The vertices are ordered by descending sum of the pT^2 of their tracks (see the sketch below). The reason is that the signal vertex usually has the most energetic tracks coming out of it, and this selection is close to 100% efficient.
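For illustration, the ordering criterion for a given reco::Vertex pv can be computed like this (a sketch, not code from the tutorial):

// sketch: the sum of pT^2 by which primary vertices are ordered
double sumPt2 = 0.;
for(reco::Vertex::trackRef_iterator iter = pv.tracks_begin();
    iter != pv.tracks_end(); ++iter)
    sumPt2 += (*iter)->pt() * (*iter)->pt();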

Let's start with the analyzer code PatVertexAnalyzer.cc. After inclusion of a bunch of header files, we will again define a C++ class, named PatVertexAnalyzer, which contains a constructor, a beginJob and an analyze method. Two input tags are taken from the config file: src (the primary vertex collection) and mc. The mc collection is used to get the Monte Carlo truth, as we will be comparing the position of the reconstructed vertex to the actual position where it was simulated.

[...]
    private:
        // configuration parameters
        edm::InputTag src_;
        edm::InputTag genParticles_;

        TH1 *nVertices_, *nTracks_;
        TH1 *x_, *y_, *z_;
        TH1 *xErr_, *yErr_, *zErr_;
        TH1 *xDelta_, *yDelta_, *zDelta_;
        TH1 *xPull_, *yPull_, *zPull_;
};

We will plot the number of reconstructed primary vertices per event. The remaining histograms will only contain numbers for the first primary vertex (supposedly the signal vertex). Here we plot the number of tracks at the vertex, its position in three dimensions and the errors on these three coordinates. The Delta variables are the differences between the reconstructed position of the vertex and the simulated one, for each coordinate respectively. Finally, the Pull histograms are what is used by the validation group to test whether everything is fine with the reconstruction. They show the distribution of (rec - sim) / error for the three coordinates. If everything is fine with the reconstruction and the errors are estimated correctly, these pull distributions should give a gaussian distribution around zero with a width of 1.

Question:

  • Why should it show such a behaviour?

Now let's skip the constructor and beginJob (the stuff in here is pretty obvious) and go to the analyze method:

// handle to the primary vertex collection
edm::Handle<reco::VertexCollection> pvHandle;
event.getByLabel(src_, pvHandle);

// handle to the generator particles (i.e. the MC truth)
edm::Handle<reco::GenParticleCollection> genParticlesHandle;
event.getByLabel(genParticles_, genParticlesHandle);

Here we are defining the handle to hold the collection. Note that typically type names like reco::VertexCollection are just typedef's to std::vector<reco::Vertex>.

The reco::Vertex class is defined in Vertex.h, have a look at the header file.

You can see methods for accessing position information, error information (which is in fact a full matrix, we are just using the diagonal information to get the errors for x, y and z separately) and methods for accessing the tracks and a few more things. The tracks_begin and tracks_end methods can be used as in the previous example to loop over the tracks. We will do this later to obtain secondary vertices relevant for e.g. b-tagging.

// extract the position of the simulated vertex
math::XYZPoint simPV = (*genParticlesHandle)[2].vertex();

We are extracting the simulated primary vertex from the list of generator particles. If you are wondering why we are using the third particle (index 2, starting from 0): this is because the first two particles are the incoming protons, and those obviously do not have a production vertex inside the detector. The third particle is a daughter of one of the protons and is produced at the production vertex.

// the number of reconstructed primary vertices
nVertices_->Fill(pvHandle->size());

// if we have at least one, use the first (highest pt^2 sum)
if (!pvHandle->empty()) {
      const reco::Vertex &pv = (*pvHandle)[0];

If there is at least one reconstructed primary vertex, we will access the first element. Notice how edm::Handle<...> behaves like a C++ pointer. We use -> or * to dereference it to get to the std::vector<...> that is embedded inside it.

The rest below just fills the histograms as discussed above.
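For completeness, that filling amounts to something like the following (a sketch using pv and simPV as defined above, and the histogram members listed earlier):

// sketch of the filling for the x coordinate
nTracks_->Fill(pv.tracksSize());
x_->Fill(pv.x());
xErr_->Fill(pv.xError());
xDelta_->Fill(pv.x() - simPV.X());
xPull_->Fill((pv.x() - simPV.X()) / pv.xError());
// ... and analogously for y and z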

Now run the example by executing cmsRun analyzePatVertex_cfg.py. Make sure that the input files are valid and that you run the PAT sequence if the input is not a PAT-tuple file. In case of CMSSW_3_3_X, make sure to adapt the config file in a similar way as in the first exercise.

info In case of the DESY/NAF tutorial, please change the input dataset as described here. Additionally comment # process.patDefaultSequence * at the end of the configuration file: analyzePatVertex_cfg.py

The two collections passed for the input tags of the PatVertexAnalyzer are offlinePrimaryVertices and genParticles (the latter containing the MC truth).

Now have a look at the results by using the TBrowser on analyzePatVertex.root directly.

Questions:

  • Can you explain the distribution of the number of primary vertices per event? These events are simulated without pile-up; why could there be more than one vertex in some events?
  • Can you derive the position of the beam spot and its width from the plots?
  • What is the precision (resolution) of the vertex position? Why is it different for the z coordinate?
  • Do you agree that the pull distribution looks how it should? Why is that so?
  • What is the efficiency of finding the primary vertex in this type of event?

Exercise III - b-Tagging

In this exercise we will make a straight jump to b-tagging, and later on combine it with the things learned in the two previous exercises. A b-tag is essentially a flag that can be given to a jet of any kind to indicate whether it likely contains a b-hadron decay (and hence originates from a b quark before hadronisation) or not. How such an algorithm works is discussed in the following exercises. Here we will simply have a look at what the official CMS algorithms give us.

[Figure: btag.jpg, illustration of a b-tagged jet]

For this exercise we will be looking at PAT jets. PAT jets have the advantage that all information available at AOD level is directly embedded into the C++ class, so the information does not need to be picked up from a gazillion individual collections and matched to the jet by hand. Also, the flavour of a jet (or more precisely, the flavour of the parton, quark or gluon, that the jet originates from) is embedded as MC truth information in the PAT jet object.

So, the analyzer here will be rather trivial. We will be looping over all PAT jets, retrieve the MC flavour truth and the result of a few b-tagging algorithms (see official CMS b-tagging algorithms for the list of official algorithms) and just plot them.

Let's have a look at the analyzer, PatBJetTagAnalyzer.cc. After the usual header files we get to the class definition, where we look at the private members:

    1// configuration parameters
    2edm::InputTag jets_;

The label of our PAT jet collection.

    1double jetPtCut_;               // minimum (uncorrected) jet energy
    2double jetEtaCut_;              // maximum |eta| for jet

Since we are doing b-tagging, it makes no sense to tag jets outside of the tracker acceptance, so we cut on eta (we will require |eta| < 2.4). Also, in principle jets go down to arbitrarily small energies, until we reach individual particles and noise. In order to stay in the perturbative QCD regime and to account for the bending of tracks in the magnetic field, we should demand some minimum jet energy as well, so we also allow a cut on a minimum jet transverse momentum.

    1enum Flavour {
    2                ALL_JETS = 0,
    3                UDSG_JETS,   
    4                C_JETS,   
    5                B_JETS,
    6                NONID_JETS,
    7                N_JET_TYPES
    8};

We define our own enumeration for the kinds of jets that we want to distinguish. We will use this enumeration as an array index later on, so it starts with zero. The first index, ALL_JETS, labels histograms that will contain the cumulative entries of all kinds of jets, regardless of their flavour.

The second entry, UDSG_JETS will contain only light flavour jets, i.e. jets from u, d, s quarks or gluons. These will not contain any long-lived c- or b-hadron decay.

The third and fourth entries are c- and b-jets; the fifth category is for jets that have no MC truth assigned (basically energy clusters in the calorimeter that managed to survive the pT cut but cannot be assigned to a hard parton). The last entry, N_JET_TYPES, is not an index, but simply the required size of the arrays holding the histograms.

    1TH1 * flavours_;
    2
    3// one group of plots per jet flavour;
    4struct Plots {
    5        TH1 *discrTC, *discrSSV, *discrCSV;
    6} plots_[N_JET_TYPES];

flavours_ will contain a simple histogram with the flavour distribution, i.e. bin 0 will contain the number of all jets, bin 1 the number of light flavour jets, bin 3 the number of b-jets, ...

plots_ contains histograms with the discriminators for the three algorithms we are selecting: "Track Counting", "Simple Secondary Vertex" and "Combined Secondary Vertex".

We will then skip the constructor and histogram booking and go to the analyze method:

    1void PatBJetTagAnalyzer::analyze(const edm::Event &event, const edm::EventSetup &es)
    2{  
    3        // handle to the jets collection
    4        edm::Handle<pat::JetCollection> jetsHandle;
    5        event.getByLabel(jets_, jetsHandle);
    6
    7        // now go through all jets
    8        for(pat::JetCollection::const_iterator jet = jetsHandle->begin();
    9            jet != jetsHandle->end(); ++jet) {

This is the classic retrieval of a collection and looping over its entries, which are PAT jets. The header file is here.

    1// only look at jets that pass the pt and eta cut
    2if (jet->pt() < jetPtCut_ ||
    3       std::abs(jet->eta()) > jetEtaCut_)
    4               continue;

Jets are "candidates" in terms of CMSSW software, and candidates are particles are fourvectors. So these have a momentum and angles. We use the pt() and eta() members to implement our cuts.

    1Flavour flavour;
    2// find out the jet flavour (differs between quark and
    3// anti-quark)
    4switch(std::abs(jet->partonFlavour())) {
    5                    case 1:
    6                    case 2:
    7                    case 3:
    8                    case 21:
    9                        flavour = UDSG_JETS;
   10                        break;
   11                    case 4:   
   12                        flavour = C_JETS;
   13                        break;
   14                    case 5:   
   15                        flavour = B_JETS;
   16                        break;
   17                    default:  
   18                        flavour = NONID_JETS;
   19}

We convert the value returned by the partonFlavour() method and assign it to one of our Flavour indices. The value returned is the PDG id, which is between 1 and 6 for d, u, s, c, b and t quarks, negative for the respective anti-quarks, and 21 for gluons.

    1// simply count the number of accepted jets
    2flavours_->Fill(ALL_JETS);
    3flavours_->Fill(flavour); 

Obvious.

    1double discrTC =jet->bDiscriminator("trackCountingHighEffBJetTags");
    2double discrSSV = jet->bDiscriminator("simpleSecondaryVertexBJetTags");
    3double discrCSV = jet->bDiscriminator("combinedSecondaryVertexBJetTags");

Here we retrieve the discriminator values for the three algorithms. As mentioned earlier, these are stored directly inside the PAT jets. You will notice that the discriminators are floating point values and not boolean values. This means that as a user you have to cut on the value, which allows you to choose a working point (see the sketch below). The higher the value, the more likely it is that the jet is actually a real b-jet. The range and shape of the discriminator are highly dependent on the algorithm (see already the different ranges in the histogram booking).

The lines below just fill the histograms with the discriminator distribution.
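On the user side, applying a working point is then just a cut on the discriminator value (the threshold below is purely illustrative, not an official working point):

// sketch: tag the jet if the discriminator exceeds a chosen threshold
double discr = jet->bDiscriminator("trackCountingHighEffBJetTags");
bool isTagged = (discr > 1.7);   // illustrative cut, tune for your analysis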

Run the example by executing cmsRun analyzePatBJetTags_cfg.py. In case of CMSSW_3_3_X, make sure to adapt the config file in a similar way as in the first exercise.

info In case of the DESY/NAF tutorial, please change the input dataset as described here. Additionally comment # process.patDefaultSequence * at the end of the configuration file: analyzePatBJetTags_cfg.py

You will see that we are cutting on a minimum pT of 30 GeV for the jets here (these are uncorrected values and correspond roughly to 50 GeV corrected jets in the central region). Otherwise nothing special in the config file.

You can either look at the output by using a plain TBrowser to get the discriminator distribution, or use the provided macro patBJetTags_efficiencies.C which is much more sophisticated.

Executing it (root -l patBJetTags_efficiencies.C) will give you one canvas per algorithm. The leftmost plot shows the discriminator distribution for the three kinds of jets: light flavour jets, which should be clearly distinguishable from the b-jets, and c-jets, which are harder since they contain real lifetime, and are therefore usually handled separately.

You can see that all light flavour jet discriminators peak to the left of the histogram, while b-jets have larger tails or peak to the right. The "Track Counting" algorithm is based on the impact parameter significance and therefore has an exponential tail, which represents the exponentially decaying real lifetime distribution of the b-hadron. The "Simple Secondary Vertex" distribution is essentially the same, only that it is an actual transverse distance significance between the reconstructed secondary vertex and the primary vertex, which is also an exponentially decaying function, except that it is taken as log(1 + sig). The "Combined Secondary Vertex" finally is the output of a multivariate analysis, in this case a likelihood ratio of the type S / (S + B), and is therefore limited to the range between 0 and 1.

The plot in the middle is derived from these discriminator distributions. It shows the relative probability to tag a jet at a given working point, i.e. discriminator cut, which is also called an efficiency. The jet is tagged if the discriminator is above the cut value, so the efficiency is defined as the number of jets which have a discriminator that is above that cut, divided by all jets (of the same flavour). In other words, the integral of the histogram from a certain discriminator cut up to infinity, divided by the total number of jets.

Open the patBJetTags_efficiencies.C source code and see how this is done in the computeEffVsCut method. It takes a discriminator distribution histogram as input, creates a new histogram with the same x-axis and then scans through all the bins, storing the normalized integral as bin content, as just discussed (a simplified version is sketched below). The errors are binomial errors. The total number of jets in a certain category is taken from the "flavour" histogram. What you see in the histogram is that the efficiency to tag a b-jet is always higher than that for a non-b-jet, obviously. The higher the separating power, the better. This curve can be used to choose the optimal working point for your analysis.
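In simplified form, the logic of computeEffVsCut can be sketched as follows (the macro's own implementation differs in its details):

#include <cmath>
#include <TH1.h>

// sketch: efficiency versus discriminator cut from a discriminator histogram
TH1 *effVsCut(TH1 *discr, double total)
{
    TH1 *eff = (TH1*)discr->Clone(Form("%s_eff", discr->GetName()));
    eff->Reset();
    int nBins = discr->GetNbinsX();
    for(int bin = 1; bin <= nBins; ++bin) {
        // jets with a discriminator above this cut (overflow included)
        double passed = discr->Integral(bin, nBins + 1);
        double e = passed / total;
        eff->SetBinContent(bin, e);
        // binomial error
        eff->SetBinError(bin, std::sqrt(e * (1. - e) / total));
    }
    return eff;
}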

The third plot on the right is essentially just a different representation of the efficiencies. Instead of showing the non-b-jet efficiencies as a function of the discriminator cut, they are shown with respect to the b-tagging efficiency at the same working point. This is done in the computeEffVsBEff method. The histograms are scanned bin-wise and, for each pair of b-efficiency and non-b-efficiency in a given bin, an entry is filled into the histogram with that pair. The non-b-jet efficiency is also called the mistag rate, since tagging a non-b-jet is what you want to avoid when doing b-tagging. So, the lower the mistag rate and the higher the b-tagging efficiency, the better your algorithm performs.

Questions:

  • The "Simple Secondary Vertex" can only tag jets, that have a reconstructed secondary vertex. With this information, can you explain why the mistag versus b-eff. curve stops before reaching the upper right corner?
  • What significance does the endpoint have and what information does this contain?

Exercise IV - b-Tagging with tracks

Now we will try to construct a simplified version of the "Track Counting" algorithm ourselves. Have a look at PatBJetTrackAnalyzer.cc. After the usual header inclusions, class and method definition, we will have the following members:

    1private:
    2        // configuration parameters
    3        edm::InputTag jets_;
    4        edm::InputTag tracks_;
    5        edm::InputTag beamSpot_;
    6        edm::InputTag primaryVertices_;

We will need jets (which we want to tag), tracks (we will need impact parameters) and primary vertices. The latter are used to compute the impact parameters. We will only use an approximation for that, as the exact computation involves using a few more complicated steps with creation of transient tracks, which we will not go into. The beam spot is needed as well to get this approximation to work (as we will use the track's parameterisation reference point, which is closest to the beam spot).

    1double jetPtCut_;               // minimum (uncorrected) jet energy
    2double jetEtaCut_;              // maximum |eta| for jet
    3double maxDeltaR_;              // angle between jet and tracks

The first two are the jet kinematics cuts, as already seen in the previous exercise. The maxDeltaR_ parameter is new; it is used to test whether a track is considered to be "inside" a jet or not. We will use a Delta R = sqrt(Delta phi^2 + Delta eta^2) criterion with respect to the center of the jet to associate tracks in a cone around the jet axis.

    1double minPt_;                  // track pt quality cut
    2unsigned int minPixelHits_;     // minimum number of pixel hits
    3unsigned int minTotalHits_;     // minimum number of total hits

These are very important. b-tagging is extremely sensitive to badly reconstructed tracks, so we are demanding a minimum transverse momentum (so that there is not too much curvature, not too much energy loss, extrapolation uncertainty and fake rate). In addition, a minimum number of total hits and especially enough pixel hits for the extrapolation to the impact point is required as well.

    1unsigned int nThTrack_;         // n-th highest track to choose

This is the key to the "Track Counting" algorithm. This algorithm uses exactly one impact parameter significance of a certain track as the discriminator. This is how we choose that track: we simply compute the IP significance for all tracks in the jet, order them by IP significance and take the n-th highest track. For nThTrack_ a trade-off has to be found. If the number is too small, we risk selecting too many badly reconstructed tracks or tracks from unavoidable long-lived decays, from K_short or Lambda for instance. On the other hand, the average charged decay multiplicity of a b-hadron is around five, and with our quality cuts it is effectively smaller than that. So we can't choose this number too large either, or we will miss the b-decay tracks. Good numbers are 2 or 3. We will go with 2, as it is the number chosen by the official "Track Counting High Efficiency" algorithm. You are welcome to change this number to see what happens.

The next part in the source file is used for our flavour definition and is identical to the previous exercise, as well as some histogram booking, which we will skip, going directly to the analyze method:

It starts off with loading of collections into edm::Handle variables, nothing new here.

    1// rare case of no reconstructed primary vertex
    2if (pvHandle->empty())
    3        return;
    4
    5// extract the position of the (most probable) reconstructed vertex
    6math::XYZPoint pv = (*pvHandle)[0].position();

We are lazy and will just skip events that don't have a reconstructed primary vertex. If there is one, we save the position of the first (i.e. the signal primary vertex) for later.

Then we will loop over all jets, check for the pT and |eta| cut, and check the jet flavour, as before.

    1[...]
    2// this vector will contain IP value / error pairs
    3std::vector<Measurement1D> ipValErr;

This variable will be used to store the impact parameters for all tracks. Measurement1D is a useful little helper class for computing significances. The header can be found here.

The impact parameter significance is just defined as the impact parameter itself, divided by the error on its measurement. This gives us a significance that the track is in fact displaced from the primary vertex. Tracks originating from the primary vertex will have a gaussian IP distribution, smeared around zero. If divided by the error, this gaussian will be scaled to have a width of 1 (as discussed in the pull distribution for the primary vertex fit in a previous exercise). Hence, the average expected significance of a track that genuinely comes from the PV is expected to be of the order of 1, whereas tracks from the b-decay should exhibit large tails in the IP significance distribution.

The Measurement1D class stores a pair of value and error and has three methods to access value, error or significance (the latter is just a simple division).

    1// now loop through all tracks
    2for(reco::TrackCollection::const_iterator track = tracksHandle->begin();
    3          track != tracksHandle->end(); ++track) {

For each jet we will now loop over the whole generalTracks track collection.

    1// check the quality criteria
    2if (track->pt() < minPt_ ||
    3           track->hitPattern().numberOfValidHits() < (int)minTotalHits_ ||
    4           track->hitPattern().numberOfValidPixelHits() < (int)minPixelHits_)
    5                   continue;

And apply our quality cuts. You should be familiar with these calls from the first exercise.

    1// check the Delta R between jet axis and track
    2// (Delta_R^2 = Delta_Eta^2 + Delta_Phi^2)
    3double deltaR = ROOT::Math::VectorUtil::DeltaR(
    4          jet->momentum(), track->momentum());

Here we use a helper function to compute the Delta R between jet and track. Doing this by hand is also easy, but you have to take into account the rotational symmetry in phi (a by-hand version is sketched below). The momentum() calls return the three-vectors of the track and jet momenta. We then plot the Delta R distribution in a histogram as a consistency check.
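The by-hand computation could look like this (a sketch; note the phi wrap-around):

// sketch: Delta R by hand, folding Delta(phi) into [0, pi]
double dPhi = std::abs(jet->phi() - track->phi());
if (dPhi > M_PI)
    dPhi = 2. * M_PI - dPhi;
double dEta = jet->eta() - track->eta();
double deltaRbyHand = std::sqrt(dPhi * dPhi + dEta * dEta);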

    1// only look at tracks in jet cone
    2if (deltaR > maxDeltaR_)
    3            continue;

And then ignore any tracks outside of the cone.

The next part is a bit tricky. As mentioned earlier, we will use a simple approximation to compute the impact parameter, namely the dxy method we have been using before. This method can be passed a vertex, so we will pass our primary vertex. If you have a look at this method in the TrackBase.h header file, it will tell you that a linear extrapolation of the track from the reference point is used. Thankfully, the primary vertex is close to the beam spot that is used for the reference point, so this will work (it will not work for secondary vertices, where you will have to use a more complex and more precise machinery instead, which will not be discussed here).

    1double ipError = track->dxyError();
    2double ipValue = std::abs(track->dxy(pv));

The next trick we are going to use is to compute a sign for the impact parameter. Here we exploit the fact that real b-decays will always be in the jet direction and never behind it, whereas measurement errors will cause the impact parameters of genuine primary vertex tracks to appear half in front of and half behind the primary vertex (remember the gaussian distribution around zero).

So we will define the sign as follows: if the closest point of approach on the track to the primary vertex is in the jet direction, we give it a positive sign, otherwise a negative one. We can use the dot product of the jet axis vector and the vector pointing to the point of closest approach for that. Again, we will use an approximation and just compare the track reference point with the beam spot (in our approximation the problem is simply symmetric with respect to the translation between beam spot and primary vertex). Also, we are only computing the transverse impact parameter and not the fully three-dimensional one, so we have to set the z component of the vector to zero.

    1math::XYZVector closestPoint = track->referencePoint() - beamSpot->position();
    2// only interested in transverse component, z -> 0
    3closestPoint.SetZ(0.);
    4double sign = closestPoint.Dot(jet->momentum());
    5
    6if (sign < 0)
    7        ipValue = -ipValue;
    8
    9ipValErr.push_back(Measurement1D(ipValue, ipError));

So, this is what it looks like.

    1// now order all tracks by significance (highest first)
    2std::sort(ipValErr.begin(), ipValErr.end(), significanceHigher);

This call will now sort the vector of impact parameters by their significance. This is a neat C++ tool; significanceHigher is a simple helper function that is defined further up in the source code and compares the significances of two Measurement1D objects (sketched below). After that call the vector will be sorted by descending IP significance.
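Such a comparison function looks like this (a sketch consistent with how it is used above):

// sketch: comparison predicate for std::sort, highest significance first
static bool significanceHigher(const Measurement1D &m1,
                               const Measurement1D &m2)
{
    return m1.significance() > m2.significance();
}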

Now, what we have to do is to take the n-th track and plot it. Actually, for cross-checks we will also plot the impact parameter distributions (the values, the errors and the significances) for all tracks first, and then also for the n-th track.

// check if we have enough tracks to fulfill the
// n-th track requirement
if (ipValErr.size() < nThTrack_)
    continue;

// pick the n-th highest track
const Measurement1D *trk = &ipValErr[nThTrack_ - 1];
[...]

In addition, we will introduce a negative tagger here, where we are taking the n-th track from the end of the vector instead of from the beginning, effectively mirroring the whole problem. What this can be used for will be discussed later.

trk = &ipValErr[ipValErr.size() - nThTrack_];
[...]

Now execute the example by calling cmsRun analyzePatBJetTracks_cfg.py. In case of CMSSW_3_3_X, make sure to adapt the config file in a similar way as in the first exercise.

You can look at the plots using the macro patBJetTracks_showPlots.C or a TBrowser.

info In case of the DESY/NAF tutorial, please change the input dataset as described here. Additionally, comment out the line process.patDefaultSequence * (prefix it with #) at the end of the configuration file analyzePatBJetTracks_cfg.py.

The macro will produce a canvas for each set of variables. Each set of impact parameter plots will show the value on the left, the error in the middle and the significance on the right. The top row shows the "flavour blind" distribution, i.e. for all kinds of jets, whereas the bottom row shows the curves separately for light flavour, c and b jets.

The first canvas (with the labels "signed IP for all tracks") shows the distributions for all tracks in a jet. The green curve, the light flavour tracks, shows an approximately gaussian peak at zero, as expected. For b-jets you will see the large tail towards positive values, which means that b-jets contain tracks with lifetime in the direction of the jet. On the negative side you see that these tails are largely suppressed, thanks to our choice of the impact parameter sign. A few tracks apparently get the wrong sign, but this is unavoidable due to measurement errors.

The errors in the middle column are identical for all kinds of tracks. They should be, since reconstruction works the same way for all kinds of tracks.

On the right plot you see the significance distribution. The gaussian core should have a width of 1 (exercise: fit a gaussian to prove it). The tails are even more visible than in the left plots, because we are correcting the value distribution for the errors. We will now use the tails to do the b-jet tagging.
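
A minimal ROOT sketch for the suggested fit exercise; the output file and histogram names used here are assumptions, so look up the actual names in a TBrowser first:

TFile *file = TFile::Open("analyzePatBJetTracks.root");       // hypothetical file name
TH1 *hist = (TH1 *)file->Get("analyzeBJetTracks/allIPSig");   // hypothetical histogram path
// fit a gaussian to the core region around zero only
hist->Fit("gaus", "", "", -2., 2.);
// the sigma parameter printed by the fit should come out close to 1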

We will now look at the second canvas, showing the distributions for one track per jet, namely the n-th highest (second-highest) track, according to the definition of the "Track Counting" algorithm. The plots are labelled "signed IP for selected positive track". We are essentially getting the same distributions as before, just biased to the right due to our selection. In particular, for b-jets we tend to select a track from the b-decay itself most of the time. We can clearly see that the gaussian core around zero is largely suppressed in favour of the tail.

This corresponds now exactly to the discriminator distribution for the "Track Counting" algorithm as we've seen in the last exercise.

The third canvas, with the label "signed IP for selected negative tracks", shows the distributions for the negative tagger (n-th last track). You may ask what this is for. Obviously not for b-tagging, as the tails in b-jets are largely suppressed (due to our sign definition). This is exactly the point, as it can be used to obtain a reasonably "clean" sample of impact parameter distributions as expected from light flavour jets. If you look at the distributions for all jets (top row), they look very much like the respective distributions for the light flavour jets for the positive-side n-th track (you have to mirror the distribution around zero in your head). We can use this to measure the b-tagging mistag rate on data without knowing the jet flavour!

Finally, the last canvas shows a few control plots, like the Delta R distribution between jet and tracks. You will see a peak at zero, obviously, then a dip around 0.5. This is also obvious, since the jet clustering algorithm (Iterative Cone 0.5) will produce jets that are exactly that size. The increase in tracks above this threshold means that these are already tracks from neighbouring jets. You will also see that b-jets contain on average more tracks than c or light flavour jets, which is a fact that is used in more complex b-tagging algorithms like the "Combined Secondary Vertex".

Now, execute the macro patBJetTracks_efficiencies.C. This will produce the efficiency plots as in the previous exercise, but now for our own algorithm.

In the first canvas you will see that it behaves similarly to the official "Track Counting" algorithm. The performance is just a bit worse. This is mostly due to the fact that the official algorithm has a fancier track selection with many more quality cuts, so it picks up fewer fake tracks or tracks from other long-lived decays.

The second canvas shows a comparison between the light flavour discriminator distribution and the inverted negative tagger distribution (on all jets). As expected, these distributions do in fact look very similar and can be used to measure the mistag rate on data (i.e. efficiency versus discriminator cut for non-b jets). They do not coincide perfectly. It can be made a bit better using a few more tricks, but a small uncertainty always remains. The world is after all not perfect.

Exercise V - b-Tagging with vertices

This last exercise allows us to learn a bit more about secondary vertices. As in the previous example we will create two simple b-tagging algorithms. One of them corresponds to the "Simple Secondary Vertex" algorithm that we have already seen.

Let's go again straight to the analyzer code and open PatBJetVertexAnalyzer.cc. The whole part before the analyze method will be skipped here, as there is nothing new to see. We will loop over the PAT jet collection only, as everything we need is embedded there. We will also reuse the flavour definition from the two previous exercises and the jet kinematic cuts.

In the analyze method we loop over the jets again, cut on pT and |eta| and assign the flavour. Here we go:

// retrieve the "secondary vertex tag infos"
// this is the output of the b-tagging reconstruction code
// and contains secondary vertices in the jets
const reco::SecondaryVertexTagInfo &svTagInfo = *jet->tagInfoSecondaryVertex();

We retrieve the "secondary vertex tag infos" from the jet. The class is defined here. It contains the result of the inclusive secondary vertex reconstruction inside the jets. This means that it can contain zero, one or more secondary vertices. If there is more than one vertex, they are sorted by quality, i.e. the one with the smallest error is listed first.

// count the number of secondary vertices
plots_[ALL_JETS].nVertices->Fill(svTagInfo.nVertices());
plots_[flavour].nVertices->Fill(svTagInfo.nVertices());

// ignore jets without SV from now on
if (svTagInfo.nVertices() < 1)
    continue;

// pick the first secondary vertex (the "best" one)
const reco::Vertex &sv = svTagInfo.secondaryVertex(0);

Now we plot the number of reconstructed secondary vertices per jet. If the jet does not have a secondary vertex, we skip the rest of the method; the remaining code concentrates on the first (i.e. best) secondary vertex.

The class definition for a vertex is the same as for primary vertices earlier, it is useful to open up the class header in a browser now (Vertex.h).

// and plot number of tracks and chi^2
plots_[ALL_JETS].nTracks->Fill(sv.tracksSize());
plots_[flavour].nTracks->Fill(sv.tracksSize());

plots_[ALL_JETS].chi2->Fill(sv.chi2());
plots_[flavour].chi2->Fill(sv.chi2());

Here we fill the number of tracks and chi^2 of the vertex fit into histograms.

// the precomputed transverse distance to the primary vertex
Measurement1D distance = svTagInfo.flightDistance(0, true);

We can retrieve the distance between the secondary vertex and the primary vertex from the tag infos (it is precomputed there). The "0" in the first argument means we want the first (best) secondary vertex; the second argument "true" tells it that we want the result in two dimensions (transverse, in the x-y plane). The result is a Measurement1D object with value, error and significance, as for the impact parameters. We then fill these three values into histograms.
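
Filling the three numbers might look like the following sketch (the plots_ histogram member names used here are illustrative, not taken from the actual analyzer):

// sketch: fill value, error and significance of the flight distance
plots_[ALL_JETS].flightDist->Fill(distance.value());
plots_[flavour].flightDist->Fill(distance.value());

plots_[ALL_JETS].flightDistErr->Fill(distance.error());
plots_[flavour].flightDistErr->Fill(distance.error());

plots_[ALL_JETS].flightDistSig->Fill(distance.significance());
plots_[flavour].flightDistSig->Fill(distance.significance());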

We will try to produce a few additional observables. The next part computes, for instance, the angle between the secondary vertex "flight direction" and the jet axis using the DeltaR helper method we've seen in the last exercise. This is not particularly exciting, so the explanation will be skipped here.

However, the last part is exciting: we are computing the invariant mass of the secondary vertex! In order to do this, we need the four-vector sum of all tracks at the vertex. To build a four-vector, we assume that each track is a pion and assign it the charged pion mass. ROOT provides several four-vector classes that can be used for this purpose. The vector classes are quite involved, so we simply use them here (please refer to the documentation for more details).

// compute the invariant mass from a four-vector sum
math::XYZTLorentzVector trackFourVectorSum;

// loop over all tracks in the vertex
for(reco::Vertex::trackRef_iterator track = sv.tracks_begin();
    track != sv.tracks_end(); ++track) {
        ROOT::Math::LorentzVector<ROOT::Math::PxPyPzM4D<double> > vec;
        vec.SetPx((*track)->px());
        vec.SetPy((*track)->py());
        vec.SetPz((*track)->pz());
        vec.SetM(0.13957);      // pion mass
        trackFourVectorSum += vec;
}

You can see that we are using the tracks_begin() and tracks_end() methods to loop over the tracks of the vertex, fill the four-vector with the momentum components, give it the pion mass and then sum everything up.

// get the invariant mass: sqrt(E² - px² - py² - pz²)
double vertexMass = trackFourVectorSum.M();

This part is now trivial. You can also compute the mass by hand if it makes you feel more comfortable.
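
As a minimal sketch, the by-hand cross-check of the M() call would look like this:

// cross-check: invariant mass from the summed four-vector components,
// sqrt(E^2 - px^2 - py^2 - pz^2)
double e  = trackFourVectorSum.E();
double px = trackFourVectorSum.Px();
double py = trackFourVectorSum.Py();
double pz = trackFourVectorSum.Pz();
double vertexMassByHand = std::sqrt(e*e - px*px - py*py - pz*pz);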

Ok, we can now run the example by calling cmsRun with analyzePatBJetVertex_cfg.py and look at the results using patBJetVertex_showPlots.C with ROOT. In case of CMSSW_3_3_X, make sure to adapt the config file in a similar way as in the first exercise.

info In case of the DESY/NAF tutorial, please change the input dataset as described here. Additionally, comment out the line process.patDefaultSequence * (prefix it with #) at the end of the configuration file analyzePatBJetVertex_cfg.py.

The flight distance significance distribution is the same as for the "Simple Secondary Vertex" algorithm seen earlier. Note how the average b-vertex also has a smaller error, because on average it contains more real and genuinely displaced tracks. Compare this with the number of vertices found in the different types of jets and the average number of tracks at the vertices.

If you look at the vertex mass, you will see that b-jets are clearly distinguishable. This means that we can also use the vertex mass to define a discriminator for b-tagging. It can be used for quite a number of other things as well, e.g. in fits for heavy flavour fraction measurements and the like (in the "Combined Secondary Vertex" algorithm it is also used as an input variable for b-tagging).

Now call root -l patBJetVertex_efficiencies.C and you will see the performance curves obtained when the two variables just discussed, the flight distance significance and the vertex mass, are used as discriminators. As you can see, both work quite well.

How to get more information

Review status

Reviewer/Editor and Date (copy from screen) Comments
ArminScheurer - 17 Jun 2009 Created the page
ArminScheurer - 22 Dec 2009 Reviewed the page
JyothsnaK - 25-Feb-2010 Reviewed the page and added information on globaltag

Responsible: ArminScheurer, ChristopheSaout
Last reviewed by: most recent reviewer and date



7.10.1 PAT Examples: Tau Example

Contents

IMPORTANT This is out of date and replaced by WorkBookPFTauTagging

Introduction

The aim of this tutorial is to show you how to access the values of kinematic quantities, tau id. variables and discriminator values from pat::Tau objects. The pat::Tau object represents a tau-jet, the reconstructed particles produced in a hadronic tau decay.

Having a lifetime of about tau = 290 fs (0.29 ps), corresponding to a mean distance of c*tau = 87 micrometers between their production and decay, tau leptons are different from electrons and muons in that only the decay products of the tau can ever be identified in the detector. The following discussion will be restricted to decays of tau leptons to hadrons, in the majority of cases either 1 or 3 charged pions plus 0, 1 or 2 neutral pions (the latter each decay almost immediately to a pair of photons).

As an example for how to do this, the tutorial will guide you through a simple EDAnalyzer. You can find the source code of the EDAnalyzer and its configuration file in the PatExamples package.

The EDAnalyzer will fill distributions of Pt, eta and phi of the tau-jets before and after cuts representing tau id. requirements are applied. From the ratio of the after/before distributions, a simple ROOT macro will compute tau id. efficiencies.

In summary, this tutorial aims to provide you with the following information:

  • how to access information from the pat::Tau.
  • how to determine an identification efficiency for tau leptons decaying into hadrons.

Before starting with this tutorial, it is helpful if you have a clear picture of:

  • how to write an EDAnalyzer (in principle).
  • how to read in parameters from the configuration file.
  • how to access a pat::Candidate collection from the event content with an EDAnalyzer.

ALERT! Note: For more fundamental examples have a look at the WorkBookPATAccessExercise. This example is based on CMSSW_3_3_x.

How to get the code

To run the example, check out the latest tags for PAT as given in the PAT Recipes for CMSSW_3_3_x Guide and check out the following tags in addition:

cd CMSSW_3_3_6_patch2/src
cmsenv
cvs co -A PhysicsTools/PatExamples/BuildFile  
cvs co -A PhysicsTools/PatExamples/plugins/BuildFile  
cvs co -A PhysicsTools/PatExamples/plugins/PatTauAnalyzer.h
cvs co -A PhysicsTools/PatExamples/plugins/PatTauAnalyzer.cc
cvs co -A PhysicsTools/PatExamples/test/analyzePatTau_fromAOD_cfg.py
cvs co -A PhysicsTools/PatExamples/test/patTau_idEfficiency.C

These files have the following functionality:

  • PatTauAnalyzer.h and PatTauAnalyzer.cc: declaration and implementation of the EDAnalyzer module that books and fills the histograms
  • the two BuildFile entries: needed to compile the analyzer within the PatExamples package
  • analyzePatTau_fromAOD_cfg.py: the cmsRun configuration file that produces the PAT objects and runs the analyzer
  • patTau_idEfficiency.C: a ROOT macro that computes the tau id. efficiency from the histograms

How to run the code

Compile and run the example as given below:

scram b 
cmsRun PhysicsTools/PatExamples/test/analyzePatTau_fromAOD_cfg.py

You can inspect the histograms produced by the PatTauAnalyzer module by executing:

root PhysicsTools/PatExamples/test/patTau_Histograms.root
new TBrowser

and then click on the folder icon labelled "ROOT Files", followed by clicks on "PhysicsTools/PatExamples/test/patTau_Histograms.root", "analyzePatTau" and finally one of the histogram icons.

Screenshot of histograms in TBrowser

In order to produce plots of the tau id. efficiency as function of Pt of the tau-jet, execute:

cd PhysicsTools/PatExamples/test
root patTau_idEfficiency.C
gv tauIdEff_Pt.eps

Find out more about the details

Let's have a look at the main configuration file analyzePatTau_fromAOD_cfg.py first:

analyzePatTau_fromAOD_cfg.py

process = cms.Process('analyzePatTau')

This statement defines the name of the cmsRun instance to be analyzePatTau. The histograms will be saved into a subdirectory of that name.

process.load('PhysicsTools.PatAlgos.patSequences_cff')

This statement imports definitions needed to produce the PAT objects.

process.analyzePatTau = cms.EDAnalyzer("PatTauAnalyzer",
    src = cms.InputTag('cleanLayer1Taus'),
    requireGenTauMatch = cms.bool(True),
    discrByLeadTrack = cms.string("leadingTrackPtCut"),
    discrByIso = cms.string("byIsolation"),
    discrByTaNC = cms.string("byTaNCfrHalfPercent")
)

The section above instructs cmsRun to add a module of type PatTauAnalyzer to the process and defines a set of configuration parameters needed by the PatTauAnalyzer module:

  • src: collection of pat::Taus to be analyzed by the module
  • requireGenTauMatch: fill histograms only with those reconstructed pat::Taus which match a "true" tau lepton decay on the generator level
  • discrByLeadTrack, discrByIso and discrByTaNC: names of the "discriminators" used to identify "good" reconstructed pat::Taus, separating "true" hadronic tau decays from the large background of QCD-jets.

ALERT! Note: The discriminators quantify the likelihood for a reconstructed pat::Tau object to be due to a true hadronic tau decay. They have values between 0 and 1, corresponding to the extreme (hypothetical) cases that a reconstructed pat::Tau is known not to be due, respectively known to be due, to a true hadronic tau decay. For this tutorial, we require that a "good" reconstructed pat::Tau contains a high-Pt ("leading") track (discrByLeadTrack) and is either isolated from other (charged) particles in the event (discrByIso) or passes a tau id. discriminator computed by a neural network (discrByTaNC). See this link for more details about the tau identification procedure.
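
For illustration, inside the analyze() method the configured discriminators can be retrieved from each pat::Tau through its tauId() method; the following is a minimal sketch using the data-members shown later in PatTauAnalyzer.h (treating a discriminator as passed when its value exceeds 0.5 is an assumption of this sketch):

// sketch: retrieve the configured discriminator values from a pat::Tau
float discrLeadTrack = patTau->tauId(discrByLeadTrack_); // "leadingTrackPtCut"
float discrIso       = patTau->tauId(discrByIso_);       // "byIsolation"
float discrTaNC      = patTau->tauId(discrByTaNC_);      // "byTaNCfrHalfPercent"

// require a leading track plus either track isolation or the neural-net id
// (the > 0.5 convention for "passed" is an assumption of this sketch)
bool isGoodTau = discrLeadTrack > 0.5 && (discrIso > 0.5 || discrTaNC > 0.5);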

This statement defines the name of the ROOT file in which the histograms get saved:

process.TFileService = cms.Service("TFileService", 
    fileName = cms.string('patTau_Histograms.root')
)

And this statement, located at the end of the main configuration file analyzePatTau_fromAOD_cfg.py, instructs cmsRun what to do with each event, namely to first rerun the tau reconstruction on the AOD level, then produce the PAT objects and finally fill the histograms containing the values of kinematic quantities, tau id. variables and discriminator values of the reconstructed pat::Tau objects:

process.p = cms.Path( process.patDefaultSequence + process.analyzePatTau )

Now, let's have a look at the source code of the analyzer module:

PatTauAnalyzer.h

This file contains the declaration of the PatTauAnalyzer class:

class PatTauAnalyzer : public edm::EDAnalyzer 
{
 public: 
  explicit PatTauAnalyzer(const edm::ParameterSet&);
  ~PatTauAnalyzer();
  
//--- methods inherited from EDAnalyzer base-class
  void beginJob(const edm::EventSetup&);
  void analyze(const edm::Event&, const edm::EventSetup&);
  void endJob();

 private:
//--- configuration parameters
  edm::InputTag src_;

  bool requireGenTauMatch_;

  std::string discrByLeadTrack_;
  std::string discrByIso_;
  std::string discrByTaNC_;

//--- generator level histograms
  TH1* hGenTauEnergy_;
  TH1* hGenTauPt_;
  TH1* hGenTauEta_;
  TH1* hGenTauPhi_;

//--- reconstruction level histograms
  TH1* hTauJetEnergy_;
  TH1* hTauJetPt_;
  TH1* hTauJetEta_;
  TH1* hTauJetPhi_;

  TH1* hNumTauJets_;
  
  TH1* hTauLeadTrackPt_;
  
  TH1* hTauNumSigConeTracks_;
  TH1* hTauNumIsoConeTracks_;

  TH1* hTauDiscrByIso_;
  TH1* hTauDiscrByTaNC_;
  TH1* hTauDiscrAgainstElectrons_;
  TH1* hTauDiscrAgainstMuons_;
  
  TH1* hTauJetEnergyIsoPassed_;
  TH1* hTauJetPtIsoPassed_;
  TH1* hTauJetEtaIsoPassed_;
  TH1* hTauJetPhiIsoPassed_;
  
  TH1* hTauJetEnergyTaNCpassed_;
  TH1* hTauJetPtTaNCpassed_;
  TH1* hTauJetEtaTaNCpassed_;
  TH1* hTauJetPhiTaNCpassed_;
};

The declaration is typical for a physics analysis module that books and fills histograms:

  • the class inherits from EDAnalyzer, the base-class for modules which process events without filtering or reconstructing part of the event content
  • the class contains data-members of type edm::InputTag which make it easy to switch the collections of objects analyzed by the module, by changing the values of the corresponding parameters in the python configuration file
  • and, of course, the class contains data-members for the histograms that are to be filled

The histograms filled in this tutorial contain the values of the following kinematic quantities, tau id. variables and discriminator values:

  • hTauJetEnergy_, hTauJetPt_, hTauJetEta_, hTauJetPhi_: the energy, Pt, eta and phi of those reconstructed pat::Tau objects which match a "true" tau lepton on the generator level
  • hNumTauJets_: the number of all tau-jet candidates in the event (without the generator level matching applied)
  • hTauLeadTrackPt_: the transverse momentum of the highest Pt ("leading") track within the tau-jet
  • hTauNumSigConeTracks_, hTauNumIsoConeTracks_: the number of tracks reconstructed within the signal and isolation cones of the tau-jet.
  • hTauDiscrByIso_, hTauDiscrByTaNC_, hTauDiscrAgainstElectrons_, hTauDiscrAgainstMuons_: the values of tau id. discriminators against QCD-jets, based on track isolation (discrByIso) and neural network tau id. (discrByTaNC), and against misidentified electrons (discrAgainstElectrons) and muons (discrAgainstMuons)
  • hTauJetEnergyIsoPassed_, hTauJetPtIsoPassed_, hTauJetEtaIsoPassed_, hTauJetPhiIsoPassed_: the energy, Pt, eta and phi for the subset of reconstructed pat::Taus which match a "true" tau lepton on the generator level and pass the combination of the discrByLeadTrack && discrByIso discriminators
  • hTauJetEnergyTaNCpassed_, hTauJetPtTaNCpassed_, hTauJetEtaTaNCpassed_, hTauJetPhiTaNCpassed_: the energy, Pt, eta and phi for the subset of reconstructed pat::Taus which match a "true" tau lepton on the generator level and pass the combination of tau id. discriminators discrByLeadTrack && discrByTaNC

The latter two sets of histograms are used to compute a tau id. efficiency.

PatTauAnalyzer.cc

This file contains the actual implementation of the PatTauAnalyzer module:

Like the declaration, the implementation of the PatTauAnalyzer module is typical for that of an analysis module:

  • in the constructor of the module, the parameter values are read in from the python configuration file
  • in the beginJob() method of the module, which gets called by cmsRun exactly once, all the histograms are booked
  • the histograms are filled in the analyze() method of the module, which gets called by cmsRun once per event

The only part of the implementation which is slightly non-standard is the matching of the reconstructed pat::Taus with "true" tau leptons on the generator level:

 for ( pat::TauCollection::const_iterator patTau = patTaus->begin(); 
   patTau != patTaus->end(); ++patTau ) {

    const reco::GenParticle* genTau = getGenTau(*patTau);
    if ( requireGenTauMatch_ && !genTau ) continue;

In case the requireGenTauMatch configuration parameter is set to true (the default), the if statement causes all reconstructed pat::Taus that don't match a generated tau lepton to be ignored.

The evaluation of whether or not a reconstructed pat::Tau matches a generated tau lepton is implemented in the getGenTau function:

const reco::GenParticle* getGenTau(const pat::Tau& patTau)
{
  std::vector<reco::GenParticleRef> associatedGenParticles = patTau.genParticleRefs();
  for ( std::vector<reco::GenParticleRef>::const_iterator it = associatedGenParticles.begin(); 
   it != associatedGenParticles.end(); ++it ) {
    if ( it->isAvailable() ) {
      const reco::GenParticleRef& genParticle = (*it);
      if ( genParticle->pdgId() == -15 || genParticle->pdgId() == +15 ) return genParticle.get();
    }
  }
  return 0;
}

For the pat::Tau passed as function argument, the getGenTau function returns the pointer to the generator level tau lepton if the match succeeds, and a NULL pointer otherwise. The implementation of the getGenTau function uses the genParticleRefs() method, inherited from the pat::PATObject base-class of the pat::Tau. This method returns the list of generator level particles which have been matched to the pat::Tau during the PAT production sequence (by default, each generated tau lepton is matched to the reconstructed pat::Tau object closest in dR = sqrt(dEta^2 + dPhi^2)).

Please click here to see the complete source code of the module.

Exercises

The following exercises will help you become more familiar with the way histograms are booked and filled, and with how the values of kinematic quantities, tau id. variables and discriminator values of a pat::Tau are accessed.

Please open the file PatTauAnalyzer.cc in your preferred editor.

In the source code of the PatTauAnalyzer::beginJob() method you will find a string "TO-DO". Add code to book the histograms

  • hTauJetEta_
  • hTauJetPhi_
at this position in the source code. (These histograms are already declared in PatTauAnalyzer.h.)

ALERT! Note: It is important that you assign the names "TauJetEta" and "TauJetPhi" to these histograms, because those are the names expected by the ROOT macro which computes the tau id. efficiency. The name of the histogram is defined by the first parameter passed when calling the fs->make() function.

In the source code of the PatTauAnalyzer::analyze() method, add code to fill the histograms

  • hTauJetEta_ and hTauJetPhi_: with the eta and phi of the tau-jet
  • hTauLeadTrackPt_: with the transverse momentum of the highest Pt ("leading") track within the tau-jet
  • hTauDiscrAgainstElectrons_: with the values of the discriminator against misidentified electrons

You can find the names of the methods for accessing the pseudo-rapidity and azimuthal angle of the pat::Tau in the header file of its reco::Particle base-class.

In the header file of the pat::Tau class you can find the methods for accessing the leading track and for the discriminator value.

ALERT! Note: Not all tau-jet candidates actually have an associated leading track. You need to check for the existence of the leading track by checking that pat::Tau::leadTrack().isAvailable() == true before filling the histogram with its Pt value.

ALERT! Note: In order to find the value of the string that needs to be passed to the pat::Tau::tauId() method in order to access the discrAgainstElectrons value, please follow this link.
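
If you get stuck, the filling part of the exercise could look roughly like the following sketch (the discriminator label in the last line is illustrative; the actual string has to be taken from the link above):

// sketch: fill the exercise histograms inside PatTauAnalyzer::analyze()
hTauJetEta_->Fill(patTau->eta());
hTauJetPhi_->Fill(patTau->phi());

// not every tau-jet candidate has an associated leading track
if ( patTau->leadTrack().isAvailable() )
    hTauLeadTrackPt_->Fill(patTau->leadTrack()->pt());

// "againstElectron" is an illustrative label; look up the actual string
hTauDiscrAgainstElectrons_->Fill(patTau->tauId("againstElectron"));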

Once you have finished implementing booking and filling of these histograms, save your changes, recompile and rerun the PatTauAnalyzer module:

scramv1 b
cmsRun PhysicsTools/PatExamples/test/analyzePatTau_fromAOD_cfg.py

Have a look at the hTauJetEta_, hTauJetPhi_, hTauLeadTrackPt_ and hTauDiscrAgainstElectrons_ histograms in the TBrowser.

Open the ROOT macro patTau_idEfficiency.C in your editor and remove all occurrences of the strings "/*" and "*/" in the macro. When you execute

cd PhysicsTools/PatExamples/test
root patTau_idEfficiency.C
gv tauIdEff_Eta.eps
gv tauIdEff_Phi.eps

now, you will see the plots of the tau id. efficiency as a function of eta and phi of the tau-jet.

At this point, the tutorial on pat::Taus is finished.

For further information on taus in CMS, please have a look at the links below.

How to get more information

You can get handouts of the PAT tau tutorial from here.

The following links may help you to find out more about taus in CMS:

and about how taus are used in different physics analyses:

Review status

Reviewer/Editor and Date (copy from screen) Comments
RogerWolf - 13 May 2009 Created the template page
ChristianVeelken - 12 June 2009 Added content (based on CMSSW_2_2_x)
ChristianVeelken - 18 December 2009 Updated tutorial to CMSSW_3_3_x

Responsible: ChristianVeelken
Last reviewed by: ChristianVeelken - 18 Dec 2009
