PAT Examples: Tracking, Vertexing and b-Tagging
Contents
Introduction
This tutorial is meant to deliver basic insight into CMSSW and the PAT framework. It is focused on the three following topics:
- Tracking
- Vertexing
- b-Tagging
This tutorial was presented the first time on a workshop at DESY, Hamburg, Germany. Therefore, all DESY/NAF site specific settings required during this tutorial can be found
here. They are marked by

in the sections below.
Preparation
This tutorial was set up using CMSSW_2_2_13. Basically it should work as well with other software versions. So in order to begin with the different exercises presented below, set up your CMSSW environment and enter the following directory:
CMSSW_2_2_13/src
.

For instructions on how to set up the CMSSW environment at DESY/NAF follow
this link.
Update: The tutorial was tested and verified to work with CMSSW_3_3_X releases (e.g. CMSSW_3_3_6) with the appropriate
RelVal samples.
How to get the code
The tutorial code will be available in future CMSSW releases in the package
PhysicsTools/PatExamples
. For currently available CMSSW versions, the CVS HEAD version of this package has to be used (additionally, the CVS tag: TUT_DESY_ws_tracking_btagging has been set up to only get the files related to this one tutorial).
To get a copy of the PAT tutorials/examples, use the following command (replace
<tag>
with
HEAD
or
TUT_DESY_ws_tracking_btagging
):
addpkg PhysicsTools/PatExamples <tag>
Alternatively you can use:
cvs co -r <tag> PhysicsTools/PatExamples

If your DESY/NAF username differs from your CERN lxplus username, then you should follow
these instructions.
All for this tutorial important files can be found in the directories
PhysicsTools/PatExamples/plugins/
and
PhysicsTools/PatExamples/test/
. More details about how to run the different tutorial examples are given in the appropriate subsections below.
Additional Information
Each time something in the
PhysicsTools/PatExamples/plugins/
directory is changed, one needs to compile the code to make the changes available to CMSSW. In case the tutorial code was just checked out, please invoke the compilation now by executing:
scram b

For the tutorial at DESY/NAF, a skimmed dataset has been prepared. Find more information about it
here. Keep in mind that you should replace the input dataset specified in the different tutorial configuration files, but this will be described again in the subsections below.
Exercise I - Tracking
Due to their sheer complexity, topics like running or understanding track
reconstruction in details are not covered here. Instead, we are restricting
ourselves to understanding the results of the track reconstruction by
looking at observables of the reconstructed tracks.
Let's have a look at the example:
In the
plugins
directory (inside the
src/PhysicsTools/PatExamples
directory, as for the rest of the tutorial) you can find the CMSSW
EDAnalyzer called
PatTrackAnalyzer
in the
PatTrackAnalyzer.cc
source
code file. We will discuss the contents of this file further below.
The corresponding CMSSW configuration file to execute this analyzer can be
found in the
test
directory and is called
analyzePatTracks_cfg.py
. Note
that the naming scheme of the analyzers and configuration files is
consistent for all the examples in this tutorial.
The file
test/analyzePatTracks_cfg.py
should look like the following:
import FWCore.ParameterSet.Config as cms
process = cms.Process("Test")
Here we are defining a regular CMSSW config file with a process named
Test
(arbitrary, really, as long as it does not clash with any other
process name).
4# initialize MessageLogger and output report
5process.load("FWCore.MessageLogger.MessageLogger_cfi")
6process.MessageLogger.cerr.threshold = 'INFO'
7process.MessageLogger.categories.append('PATSummaryTables')
8process.MessageLogger.cerr.INFO = cms.untracked.PSet(
9 default = cms.untracked.PSet( limit = cms.untracked.int32(0) ),
10 PATSummaryTables = cms.untracked.PSet( limit = cms.untracked.int32(-1) )
11)
12process.options = cms.untracked.PSet( wantSummary =
13cms.untracked.bool(True) )
14
15process.load("Configuration.StandardSequences.Geometry_cff")
16process.load("Configuration.StandardSequences.FrontierConditions_GlobalTag_cff")
17process.load("Configuration.StandardSequences.MagneticField_cff")
18
19process.GlobalTag.globaltag = cms.string('IDEAL_V12::All')
Here, a few generic things are set up and standard sequences included
(like geometry, calibrations and magnetic field map). They are mostly needed
in order to be able to run PAT on RECO or AOD files. If you are already
using PAT-tuples as input, they can principally also be removed (but don't
hurt if they are defined without being needed). The tag
IDEAL_V12::All
has to be compatible with the simulated misalignment and the software version
used (in our case
CMSSW_2_2_13
and without misalignment).
There are two ways one can get the information about the
globaltag
First option is to use DBS. This method is convenient as one need not have the RECO/AOD file on the disk.
* Visit the DBS site (
https://cmsweb.cern.ch/dbs_discovery/
)
* Do a search like (for example):
find dataset where file like /store/relval/CMSSW_3_3_6/RelValTTbar/GEN-SIM-RECO/STARTUP3X_V8H-v1/ and dataset.status like VALID*
This gives result similar to the one shown
here
.
* There you can click on "Conf.files" below the found results. This gives you the
config
which has been used to produce the dataset.
* Here you find:
process.GlobalTag.globaltag = 'STARTUP3X_V8H::All'
Secondly, one can get the above information by using
edmProvDump
on the RECO/AOD file assuming that the file is available on a local disk.
In addition, the message logger is configured. For CMSSW_3_3_X, you have to comment out the following four lines (leave it untouched in case of CMSSW_2_2_X):
8#process.MessageLogger.cerr.INFO = cms.untracked.PSet(
9# default = cms.untracked.PSet( limit = cms.untracked.int32(0) ),
10# PATSummaryTables = cms.untracked.PSet( limit = cms.untracked.int32(-1) )
11#)
12
20# produce PAT Layer 1
21process.load("PhysicsTools.PatAlgos.patSequences_cff")
22# switch old trigger matching off
23from PhysicsTools.PatAlgos.tools.trigTools import
24# switchOffTriggerMatchingOld
25switchOffTriggerMatchingOld( process )
This is needed to create PAT collections on the fly (again redundant when
using PAT-tuples). In case of CMSSW_3_3_X, comment out the two lines about old trigger matching. This is now by default the case in current releases.
26process.maxEvents = cms.untracked.PSet( input = cms.untracked.int32(-1) )
We'd like to run over all the events in the input files (hence -1, otherwise
we specify the number of events).
27process.source = cms.Source("PoolSource",
28 fileNames = cms.untracked.vstring(
29 '/store/mc/Fall08/TTJets-madgraph/GEN-SIM-RECO/...',
30 )
31)
We define the
EDM data files to run on. This examples chooses a RECO file
from Fall08 TTbar production. This is not a PAT-tuple, so we need to run
PAT on the fly. You can find other samples in DBS, make sure they are
compatible with the software release (i.e.
CMSSW_2_2_X
).

Due to time constraints at the DESY/NAF tutorial, please modify the config file as described
here in order to use skimmed data samples locally available.
32process.TFileService = cms.Service("TFileService",
33 fileName = cms.string("analyzePatTracks.root")
34)
The analyzer is using the
TFileService
to put histograms into a ROOT file.
Here we define the service and give it a filename to write to.
35process.analyzeTracks = cms.EDAnalyzer("PatTrackAnalyzer",
36 src = cms.InputTag("generalTracks"),
37 beamSpot = cms.InputTag("offlineBeamSpot"),
38 qualities = cms.vstring("loose", "tight", "highPurity")
39)
40
41process.analyzeMuons = cms.EDAnalyzer("PatTrackAnalyzer",
42 src = cms.InputTag("selectedLayer1Muons"),
43 beamSpot = cms.InputTag("offlineBeamSpot"),
44 qualities = cms.vstring("undefQuality")
45)
Here we define two instances of our
PatTrackAnalyzer
. The first instance
runs on the so-called
generalTracks
collection which is the most basic
track collection in CMSSW. It contains all kinds of tracks, reconstructed
with the default "Combinatorial Kalman Track Finder". It includes any kind
of charged tracks (pions, kaons, muons and so on) and tries to be as
inclusive as possible. In
CMSSW_2_2_12
it uses the three-step tracking at
this point, which means that it tries to include a wide pT spectrum and also
attempts to reconstruct detached tracks. Simple quality cuts have already
been applied (the "loose" selection), but it is possible to go to tigher
cuts by specifically selecting only tracks with the "tight" or even
"highPurity" flags.
The second instance runs on reconstructed muons. Inside the code we will
restrict us to muons that have a part reconstructed in the strip tracker.
In addition to that they shall also have a part reconstructed in the muon
system and a global track fit on both parts simulatenously.
Explanation of the different parameters
src
,
beamSpot
and
qualities
:
-
The
src
parameter is the parameter that is read by the analyzer and
chooses the EDM collection that contains the tracks you want to look at.
The analyzer is somewhat flexible and can deal with either plain RECO
tracks or PAT muons. The collections for the two analyzers are named
"generalTracks" and "selectedLayer1Muons". You can in principle also look
at any other track collection that is derived from reco tracks. If you click
through the content of an EDM file (e.g. using the TBrowser
within ROOT) you will find a
few such collections.
-
The
beamSpot
is the collection which contains the beam spot information (i.e.
where the interaction is supposed to take place). It is used to determine
the reference point for the track parameters for determination of impact
parameters (e.g. "d0" variable in the code). The reason it is needed is
because the beam spot in our simulation does not exactly sit at (0, 0), but
is slightly shifted (in the real world the beam will likely not sit at (0,
0) as well, so our simulation moves the spot to find potential bugs
in the code that are erroneously assuming this. So let's go with it as
well, we don't want to fall in that trap either, don't we?
-
Finally, the
qualities
parameter represents the list of track qualities that we want to look
at. For the generalTracks
we are looking at the three quality flags
mentioned earlier, for the muons we don't care and use the default
"undefQuality" which means we select all tracks (muons). The analyzer will
write out all histograms for all quality selections.
46process.p = cms.Path(
47 process.patDefaultSequence *
48 process.analyzeTracks *
49 process.analyzeMuons
50)
Here we define the sequence to run. If you are running on PAT-tuples, you
should comment out the first line (using a # in front), otherwise PAT will
be run on the fly. The other two entries here represent the two analyzers we just configured above.

At the DESY/NAF workshop, we use a prepared PAT-tuple, therefore you should comment the first line:
# process.patDefaultSequence *
.
Now, given that your input file is valid, the CMSSW environment is set up
correctly and that the analyzers compiled successfully (
scram b
), you
can run the example:
cmsRun analyzePatTracks_cfg.py
When the job is done, you will find a file named
analyzePatTracks.root
in the current directory, containing all the plots from the analyzers.
You can open the file using the ROOT command line:
root -l analyzePatTracks.root
and browse a bit by invoking
new TBrowser
within the ROOT command line interpreter. You will notice
that the plots for each analyzer can be found in a directory named after the analyzer.
There is also a ROOT macro prepared for automatic plotting of the file
contents including basic histogram formatting. You can call it by
executing:
root -l patTracks_showPlots.C
Have a look at the file and open in the editor. You can change the lines
with the track qualities and directory to also look at the plots for the
muons. Note how the number of hits in the different components are shown
in a stacked plot. It is using
THStack
for that purpose. We will discuss
the plots further below after discussing the source code for the analyzer.
Here the source code for the analyzer
PatTrackAnalyzer.cc
(It will be discussed in full detail here only this first time):
1#include <iostream>
2#include <cmath>
3#include <vector>
4#include <string>
5
6#include <TH1.h>
7#include <TProfile.h>
8
9#include "FWCore/Framework/interface/Frameworkfwd.h"
10#include "FWCore/Framework/interface/Event.h"
11#include "FWCore/Framework/interface/EventSetup.h"
12#include "FWCore/Framework/interface/EDAnalyzer.h"
13#include "FWCore/Utilities/interface/InputTag.h"
14#include "FWCore/ParameterSet/interface/ParameterSet.h"
15#include "FWCore/ServiceRegistry/interface/Service.h"
16
17#include "PhysicsTools/UtilAlgos/interface/TFileService.h"
18
19#include "DataFormats/Common/interface/View.h"
20#include "DataFormats/BeamSpot/interface/BeamSpot.h"
21#include "DataFormats/TrackReco/interface/Track.h"
22#include "DataFormats/TrackReco/interface/HitPattern.h"
23#include "DataFormats/PatCandidates/interface/Muon.h"
Header files we will need.
24class PatTrackAnalyzer : public edm::EDAnalyzer {
25 public:
26 // constructor and destructor
27 PatTrackAnalyzer(const edm::ParameterSet ¶ms);
28 ~PatTrackAnalyzer();
29
30 // virtual methods called from base class EDAnalyzer
31 virtual void beginJob(const edm::EventSetup&);
32 virtual void analyze(const edm::Event &event, const edm::EventSetup &es);
Here we define our analyzer by deriving from the CMSSW framework base class
EDAnalyzer
. We need to define a constructor that is passed a
ParameterSet
that will contain the configuration from the python config
file. We also define the methods
beginJob
and
analyze
which are called
by the base class, hence they are defined virtual.
33private:
34 // configuration parameters
35 edm::InputTag src_;
36 edm::InputTag beamSpot_;
37
38 // the list of track quality cuts to demand from the tracking
39 std::vector<std::string> qualities_;
Here we store the parameters from the CMSSW config file in class members.
The
InputTag
class contains references to
EDM products (i.e. ROOT file
branches) and are preferred over simple string labels.
40struct Plots {
41 TH1 *eta, *phi,;
42 TH1 *pt, *ptErr;
43 TH1 *invPt, *invPtErr;
44 TH1 *d0, *d0Err;
45 TH1 *nHits;
46
47 TProfile *pxbHitsEta, *pxeHitsEta;
48 TProfile *tibHitsEta, *tobHitsEta;
49 TProfile *tidHitsEta, *tecHitsEta;
50 };
51
52 std::vector<Plots> plots_;
53};
Here we store all our histograms for the variables we want to plot.
The profile histograms store the number of hits the track has in the
different subdetector components. The profile is a neat way to simple get
the average number of hits per pseudorapidity slice (eta) on the x-axis.
We use a vector of plots, so that we can store plots for each track quality
selection in parallel.
54PatTrackAnalyzer::PatTrackAnalyzer(const edm::ParameterSet ¶ms) :
55 src_(params.getParameter<edm::InputTag>("src")),
56 beamSpot_(params.getParameter<edm::InputTag>("beamSpot")),
57 qualities_(params.getParameter< std::vector<std::string> >("qualities"))
58{
59}
60
61PatTrackAnalyzer::~PatTrackAnalyzer()
62{
63}
Constructor and destructor. Not much to say here. Notice how elegantly
we can read the parameters from the CMSSW config file using the constructor
syntax. The part enclosed in quotes is the name of the variable to be read
and has to match the name specified in the CMSSW config file.
64void PatTrackAnalyzer::beginJob(const edm::EventSetup &es)
65{
66 // retrieve handle to auxiliary service
67 // used for storing histograms into ROOT file
68 edm::Service<TFileService> fs;
69
70 // now book the histograms, for each category
71 unsigned int nQualities = qualities_.size();
72
73 plots_.resize(nQualities);
74
75 for(unsigned int i = 0; i < nQualities; ++i) {
76 // the name of the quality flag
77 const char *quality = qualities_[i].c_str();
78
79 // the set of plots
80 Plots &plots = plots_[i];
81
82 plots.eta = fs->make<TH1F>(Form("eta_%s", quality),
83 Form("track \\eta (%s)", quality),
84 100, -3, 3);
85
86[...]
87}
The
beginJob
method is called once at the beginning of the CMSSW job.
The main difference with respect to the constructor from our point of view
is that here we have access to framework services and event setup, whereas
in the constructor these are not set up yet.
Here we first initialize the
TFileService
, i.e. get a reference to it
using the framework serivce concept with
edm::Service
. We resize the
vector of plots to the number of track qualities and then start booking the
histograms in a loop.
fs->make
is the
TFileService
way to call
new TH1F
. It makes
sure that the histograms are booked such that they will automatically end up
in the correct file.
We are building the histogram name and title using ROOT's
Form()
call.
It is essentially following the C langauge
sprintf
syntax, where %s
dennotes a place holder for a string, %d for an integer, %f for a float, and
so on. The parameters to fill into the placeholders are then passed as
argument. We need this here to append the track quality to the name of each
plot.
88void PatTrackAnalyzer::analyze(const edm::Event &event, const edm::EventSetup &es)
89{
90 // handles to kinds of data we might want to read
91 edm::Handle<reco::BeamSpot> beamSpot;
92 edm::Handle< edm::View<reco::Track> > tracksHandle;
93 edm::Handle< pat::MuonCollection > muonsHandle;
94
95 // read the beam spot
96 event.getByLabel(beamSpot_, beamSpot);
97[...]
Here we define the
analyze
method implementation, which is called for each
event. First we define three handles, one handle for each
EDM collection
that we want to (potentially) read.
We do the actual reading by calling the
getByLabel
method on the
event
argument. This method is passed the input tag (the label) and it will fill
the handle just defined with the contents of the collection.
The next part is a bit tricky, it will fill the tracks or muons. To do this
it uses some "magic" to detect whether it is running on tracks or muons (i.e.
what kind of collection you actually passed with the
src
label). You can
learn a few framework tricks by looking at this, but it is not essential for
this tutorial.
The important point is that we will fill this collection:
98std::vector<const reco::Track*> tracks;
which will contain the tracks for both the generalTracks and muon case.
99// we are done filling the tracks into our "tracks" vector.
100// now analyze them, once for each track quality category
101
102unsigned int nQualities = qualities_.size();
103for(unsigned int i = 0; i < nQualities; ++i) {
Here we loop over the qualities we want to look at.
104// we convert the quality flag from its name as a string
105// to the enumeration value used by the tracking code
106// (which is essentially an integer number)
107reco::Track::TrackQuality quality = reco::Track::qualityByName(qualities_[i]);
qualities_[i]
is a
std::string
that was passed from the configuration
file. We need to convert this into a different representation (an
enumeration value like predefined colors
kBlack
,
kRed
, which are
esentially integers).
Have a look at the header file
TrackBase.h
.
We will use these methods a lot in the next few lines. Note that the
reco::Track
class simply derives from the
reco::TrackBase
class, so you
can also have a look at the
Track.h
class itself. The
qualityByName
method converts the string into the
TrackQuality
enumeration for us.
108// now loop over the tracks
109for(std::vector<const reco::Track*>::const_iterator iter = tracks.begin();
110 iter != tracks.end(); ++iter) {
111 // this is our track
112 const reco::Track &track = **iter;
How this works should be pretty obvious, if you are familiar with C++ style
iterators. (If not: You should really get familiar with them!

)
We are dereferencing twice here (the two asterisks). First the iterator,
and second the pointer inside the collection, to store a reference to the
track in our
track
variable. It can sometimes be a pain to figure out
how to do that correctly, but most of the time trial and error or a little
bit of thinking and looking at header files works. Please ask an expert
before resorting to any kind of ugly hacks, there is usually a nice and
clean way to do so.
113// ignore tracks that fail the quality cut
114if (!track.quality(quality))
115 continue;
Here we use the
quality
method of a track to check if the track fulfills
our requirement.
116// and fill all the plots
117plots.eta->Fill(track.eta());
118plots.phi->Fill(track.phi());
119[...]
This should be pretty obvious.
120// the hit pattern contains information about
121// which modules of the detector have been hit
122const reco::HitPattern &hits = track.hitPattern();
Here we get information about the hit pattern of the track. This class
can tell us exactly which modules have been hit by the track (or more
precisely: which hits have been used to reconstruct the track). It also
tells us in which modules hits were expected, but not found. These are
called "lost" hits. Sometimes lost hits are expected, because the module is
known to be faulty, sometimes they're not. Have a look at the header files,
everything is explained in the comments:
HitPattern.h
123double absEta = std::abs(track.eta());
124// now fill the number of hits in a layer depending
125// on eta
126plots.pxbHitsEta->Fill(absEta, hits.numberOfValidPixelBarrelHits());
127plots.pxeHitsEta->Fill(absEta, hits.numberOfValidPixelEndcapHits());
128[...]
We fill the profile histograms with the number of hits in tracking
subdetectors with respect to the pseudo-rapidity.
Here is a nice excercise: Add in the three muon subsystems (DT, CSC, RPC),
rerun the analyzer on the muons and add them to the plotting macro and have
a look at the stacked plot.
Alternatively, you can have a look at the position of each individual hit,
but in order to do this you need to have a look at the methods in
Track.h
. Furthermore, you need to run on RECO for this (neither AOD nor
PAT-tuples will work, as the information is not available here). This is
outside of the scope of this tutorial.
129#include "FWCore/Framework/interface/MakerMacros.h"
130
131DEFINE_FWK_MODULE(PatTrackAnalyzer);
And finally these lines will turn our C++ class into a framework module.
Now have a look at the the plots from the ROOT macro. Can you explain the
observed quantities?
Does the subdetector hit distribution make sense?
You can compare it to:
http://www.ba.infn.it/~zito/cms/tvis.html
Exercise II - Vertexing
In this sub-tutorial we will look at properties of the reconstructed
primary vertex. What is done here software-wise is that to all tracks from the
global track collection (the
generalTracks
that we just looked at), some
quality cuts are applied (like required hits in the pixel detector, not too far
from the beam spot and so on). Then the tracks are clustered into groups
along the z-axis. Remember: With pile-up we can get multipled interactions,
so we can get multiple primary vertices along the beam spot.
Each group of tracks is then fitted using the
AdaptiveVertexFitter
and if
the resulting vertex passes some basic quality cuts (like minimum number of
compatible tracks, compatibility with the beam spot), it is added to the
offlinePrimaryVertices
collection. The vertices here are ordered by
descending sum of pT^2 of the tracks. The reason is that the signal vertex
usually has the highest-energetic tracks coming out of it and this selection is
close to 100% efficient.
Let's start with the analyzer code
PatVertexAnalyzer.cc
. After inclusion
of a bunch of header files, we will again define a C++ class named
PatvertexAnalyzer
which contains a constructor,
beginJob
and an
analyze
method. Two input tags are taken from the config file:
src
(the primary
vertex collection) and
mc
. The
mc
collection is
used to get the Monte Carlo truth, as we will be comparing the position of
the reconstructed vertex to the actual position where it was simulated.
[...]
private:
// configuration parameters
edm::InputTag src_;
edm::InputTag genParticles_;
TH1 *nVertices_, *nTracks_;
TH1 *x_, *y_, *z_;
TH1 *xErr_, *yErr_, *zErr_;
TH1 *xDelta_, *yDelta_, *zDelta_;
TH1 *xPull_, *yPull_, *zPull_;
};
We will plot the number of reconstructed primary vertices per event.
The remaining histograms will only contain nubmers for the first primary
vertex (supposedly the signal vertex). Here we plot the number of tracks at
the vertex, it's position in three dimensions and the errors on these three
coordinates. The
Delta
variables are the difference between the
reconstructed position of the vertex and the simulated, one for each
coordinate respectively. Finally, the
Pull
histograms is what is used by
the validation group to test whether everything is fine with the
reconstruction. It shows the distribution of (rec - sim) / error for the
three coordinates. If everything is fine with the reconstruction and the
errors are estimated correctly, these pull distributions should give a
gaussian distribution around zero with a width of 1.
Question:
- Why should it show such a behaviour?
Now let's skip the constructor and
beginJob
(the stuff in here is pretty
obvious) and go to the
analyze
method:
// handle to the primary vertex collection
edm::Handle<reco::VertexCollection> pvHandle;
event.getByLabel(src_, pvHandle);
// handle to the generator particles (i.e. the MC truth)
edm::Handle<reco::GenParticleCollection> genParticlesHandle;
event.getByLabel(genParticles_, genParticlesHandle);
Here we are defining the handle to hold the collection. Note that typically
type names like
reco::VertexCollection
are just typedef's to
std::vector<reco::Vertex>
.
The
reco::Vertex
class is defined in
Vertex.h
,
have a look at the header file.
You can see methods for accessing position information, error information
(which is in fact a full matrix, we are just using the diagonal information
to get the errors for x, y and z separately) and methods for accessing the tracks and a
few more things. The
tracks_begin
and
tracks_end
methods can be used as
in the previous example to loop over the tracks. We will do this later to obtain
secondary vertices relevant for e.g. b-tagging.
// extract the position of the simulated vertex
math::XYZPoint simPV = (*genParticlesHandle)[2].vertex();
We are extracting the simulated primary vertex from the list of generator
particles. If you are wondering why we are using the third particle (index
2, starting form 0), this is because the first two particles are the
incoming protons, and those do not have an incoming vertex inside the
detector, obviously. The third particle is a daughter of one of the protons
and is produced at the production vertex.
// the number of reconstructed primary vertices
nVertices_->Fill(pvHandle->size());
// if we have at least one, use the first (highest pt^2 sum)
if (!pvHandle->empty()) {
const reco::Vertex &pv = (*pvHandle)[0];
If there is at least one reconstructed primary vertex, we will access the
first element. Notice how
edm::Handle<...>
behaves like a C++ pointer.
We use -> or * to derefence it to get to the
std::vector<...>
that is
embedded inside it.
The rest below just fills the histograms as discussed above.
Now run the example by executing
cmsRun analyzePatVertex_cfg.py
. Make sure that
the input files are ok and you run the PAT sequence if it is not a
PAT-tuple file. In case of CMSSW_3_3_X, make sure to adapt the config file in a similar way as in the first exercise.

In case of the DESY/NAF tutorial, please change the input dataset as described
here. Additionally comment
# process.patDefaultSequence *
at the end of the configuration file:
analyzePatVertex_cfg.py
The two collections passed for the input tags of the
PatVertexAnalyzer
are
offlinePrimaryVertices
and
genParticles
(the latter containing the MC truth).
Now have a look at the results by using the
TBrowser
on
analyzePatVertex.root
directly.
Questions:
- Can you explain the number of primary vertetices per events distribution? These events are simulated without pile-up, why could there be more than one vertex in some events?
- Can you derive the position of the beam spot and its width from the plots?
- What is the precision of the resolution? Why is it different for the z coordinate?
- Do you agree that the pull distribution looks how it should? Why is that so?
- What is the efficiency of finding the primary vertex in this type of event?
Exercise III - b-Tagging
In this excercise we will make a straight jump to b-tagging and later on
combine the things learned with the two previous exercises.
b-tagging is essentially a flag that can be given to a jet of any kind to
indicate whether it is likely containing a b-hadron decay (and hence
originating from a b quark before hadronisation) or not.
How such an algorithm works is discussed in the following exercises.
Here we will simply have a look at what the official CMS algorithms are
giving us.
For this exercise we will be looking at PAT jets. PAT jets have the
advantage that they have all information available on AOD level directly
embedded into the C++ class and information does not need to be picked up
from a gazillion of individual collections and matched to the jet by hand.
Also, the flavour of a jet (or more precisely the flavour of the parton,
quark or gluon that this jet originates from) is embedded as MC truth
information in the PAT jet object.
So, the analyzer here will be rather trivial. We will be looping over all
PAT jets, retrieve the MC flavour truth and the result of a few b-tagging
algorithms (see
official CMS b-tagging algorithms
for the list of official algorithms) and just plot them.
Lets have a look at the with analyzer,
PatBJetTagAnalyzer.cc
. After the usual header
files we get to the class definition, where we look at the private members:
1// configuration parameters
2edm::InputTag jets_;
The label of our PAT jet collection.
1double jetPtCut_; // minimum (uncorrected) jet energy
2double jetEtaCut_; // maximum |eta| for jet
Since we are doing b-tagging, it makes no sense to tag jets outside of
the tracker acceptance, so we cut on eta. (we will set |eta| < 2.4).
Also, jets go down to infinitely small energies in principle until we get
individual particles and noise. In order to stay in perturbative QCD regime
and account for the bending of tracks in the magnetic field, we should
demand some minimum jet energy as well, so we allow to also cut on a minimum
jet transverse momentum.
1enum Flavour {
2 ALL_JETS = 0,
3 UDSG_JETS,
4 C_JETS,
5 B_JETS,
6 NONID_JETS,
7 N_JET_TYPES
8};
We define our own enumeration for the kinds of jets that we want to
distinguish. We will use this enumeration as array index later on, so start
with zero. The first histogram in each entry
ALL_JETS
will contain
the cumulative entries of all kinds of jets regardless of their flavour.
The second entry,
UDSG_JETS
will contain only light flavour jets, i.e.
jets from u, d, s quarks or gluons. These will not contain any long-lived
c- or b-hadron decay.
The third and fourth jets now are c- and b-jets, the fifth category for jets
that have no MC truth assigned (basically energy clusters in the calorimeter
that managed to survive the pT cut, but cannot be assigned a hard parton).
The last
N_JET_TYPES
is not an index, but will simply be the required size
of the arrays holding the histograms.
1TH1 * flavours_;
2
3// one group of plots per jet flavour;
4struct Plots {
5 TH1 *discrTC, *discrSSV, *discrCSV;
6} plots_[N_JET_TYPES];
flavours_
will contain a simple histogram with the flavour distribution,
i.e. bin 0 will contain the number of all jets, bin 1 the number of light
flavour jets, bin 3 the number of b-jets, ...
plots_
containts histograms with the discriminators for the three
algorithms we are selecting: "Track Counting", "Simple Secondary Vertex" and
"Combined Secondary Vertex".
We will then skip the constructor and histogram booking and go to the
analyze
method:
1void PatBJetTagAnalyzer::analyze(const edm::Event &event, const edm::EventSetup &es)
2{
3 // handle to the jets collection
4 edm::Handle<pat::JetCollection> jetsHandle;
5 event.getByLabel(jets_, jetsHandle);
6
7 // now go through all jets
8 for(pat::JetCollection::const_iterator jet = jetsHandle->begin();
9 jet != jetsHandle->end(); ++jet) {
This is the classic retrieval of a collection and looking over the entries,
which are PAT jets. The header file is
here
.
1// only look at jets that pass the pt and eta cut
2if (jet->pt() < jetPtCut_ ||
3 std::abs(jet->eta()) > jetEtaCut_)
4 continue;
Jets are "candidates" in terms of CMSSW software, and candidates are
particles are fourvectors. So these have a momentum and angles. We use the
pt()
and
eta()
members to implement our cuts.
1Flavour flavour;
2// find out the jet flavour (differs between quark and
3// anti-quark)
4switch(std::abs(jet->partonFlavour())) {
5 case 1:
6 case 2:
7 case 3:
8 case 21:
9 flavour = UDSG_JETS;
10 break;
11 case 4:
12 flavour = C_JETS;
13 break;
14 case 5:
15 flavour = B_JETS;
16 break;
17 default:
18 flavour = NONID_JETS;
19}
We convert the value returned by the
partonFlavour()
method and assign it
one of our
Flavour
indices. The value returned is the PDG id, which is
between 1 to 6 for d, u, s, c, b and t quarks, negative for respective
anti-quarks and 21 for gluons.
1// simply count the number of accepted jets
2flavours_->Fill(ALL_JETS);
3flavours_->Fill(flavour);
Obvious.
1double discrTC =jet->bDiscriminator("trackCountingHighEffBJetTags");
2double discrSSV = jet->bDiscriminator("simpleSecondaryVertexBJetTags");
3double discrCSV = jet->bDiscriminator("combinedSecondaryVertexBJetTags");
Here we return the discriminator values for the three algorithms.
As mentioned earlier these are stored directly inside the PAT jets.
You will notice that the discriminator are floating point values and not
boolean values. This means that as a user you have to cut on the value and
this allows you to choose a working point. The higher the value, the more
likely it is that the jet is actually a real b-jet. The range and shape of
the discriminator is highly dependent on the algorithm (see already the
different ranges in the histogram booking).
The lines below just fill the histograms with the discriminator
distribution.
Run the example by executing
cmsRun analyzePatBJetTags_cfg.py
. In case of CMSSW_3_3_X, make sure to adapt the config file in a similar way as in the first exercise.

In case of the DESY/NAF tutorial, please change the input dataset as described
here. Additionally comment
# process.patDefaultSequence *
at the end of the configuration file:
analyzePatBJetTags_cfg.py
You will see that we are cutting on a minimum pT of 30
GeV for the jets here (these are
uncorrected values and correspond roughly to 50
GeV corrected jets in the
central region). Otherwise nothing special in the config file.
You can either look at the output by using a plain
TBrowser
to get the
discriminator distribution, or use the provided macro
patBJetTags_efficiencies.C
which is much more sophisticated.
Executing it (
root -l patBJetTags_efficiencies.C
) will give you one canvas per algorithm. The leftmost plot
shows the discriminator distribution for the three kinds of jets: Light
flavour jets, which should clearly be distinguishable from the b-jets, and
c-jets which are harder, since they contain real lifetime and are often
therefore usually handled separately.
You can see that all light flavour jet discriminators peak to the left
of the histogram and b-jets have larger tails or peak to the right. The
"Track Counting" algorithm is based on the impact parameter significance and therefore
has an exponential tail, which represents the exponentially decaying real
lifetime distribution of the b-hadron. The "Simple Secondary Vertex"
distribution is esentially the same, only that it's an actual transverse
distance significance of the reconstructed secondary vertex and the primary vertex, which
is also an exponentially decaying function, except that it's taken as log(1 +
sig). The "Combined Secondary Vertex" finally is the output of a
Multivariate Analysis, and is in the used case a Likelihood Ratio of the type S / (S +
B) in particular and therefore limited to the range between 0 and 1.
The plot in the middle is derived from these discriminator
distributions. It shows the relative probability to tag a jet at a given
working point, i.e. discriminator cut, which is also called an efficiency.
The jet is tagged if the discriminator is above the cut value, so the
efficiency is defined as the number of jets which have a discriminator that
is above that cut, divided by all jets (of the same flavour). In other
words, the integral of the histogram from a certain discriminator cut up to
infinity, divided by the total number of jets.
Open the
patBJetTags_efficiencies.C
source code and see how this is done
in the
computeEffVsCut
method. It is taking a discriminator
distribution histogram as input, creates a new histogram with the same
x-axis and then scans through all the bins and stores the normalized
integral as bin content, as just discussed. The errors are binomial errors.
The total number of jets in a certain category is taken from the "flavour"
histogram. What you see in the histogram is that the efficiency to tag a
b-jet is always higher than that for a non-b-jet, obviously. The higher the
separating powers, the better. This curve can be used to choose the optimal
working point for your analysis.
The third plot on the right is essentially just a different representation
of efficiencies. Instead of showing the non-b-jet efficiencies in terms of
discriminator cut, they are show with respect to the b-tagging efficiency at
the same working point. This is done in the
computeEffVsBEff
method. The
histograms are scanned bin-wise and for each pair of b-efficiency and
non-b-efficiency in a given bin, an entry in the histogram is filled with
that pair. The non-b-jet efficiency is also called mistag, since tagging a
non-b-jet is what you want to avoid when doing b-tagging. So, the lower the
mistag rate and the higher your b-tagging efficiency is, the better your
algorithm performs.
Questions:
- The "Simple Secondary Vertex" can only tag jets, that have a reconstructed secondary vertex. With this information, can you explain why the mistag versus b-eff. curve stops before reaching the upper right corner?
- What significance does the endpoint have and what information does this contain?
Exercise IV - b-Tagging with tracks
Now we will try to construct a simplified version of the "Track Counting"
algorithm ourselves. Have a look at
PatBJetTrackAnalyzer.cc
. After the
usual header inclusions, class and method definition, we will have the
following members:
1private:
2 // configuration parameters
3 edm::InputTag jets_;
4 edm::InputTag tracks_;
5 edm::InputTag beamSpot_;
6 edm::InputTag primaryVertices_;
We will need jets (which we want to tag), tracks (we will need impact
parameters) and primary vertices. The latter are used to compute the impact
parameters. We will only use an approximation for that, as the exact
computation involves using a few more complicated steps with creation
of transient tracks, which we will not go into. The beam spot is needed
as well to get this approximation to work (as we will use the track's
parameterisation reference point, which is closest to the beam spot).
1double jetPtCut_; // minimum (uncorrected) jet energy
2double jetEtaCut_; // maximum |eta| for jet
3double maxDeltaR_; // angle between jet and tracks
The first two are the jet kinematics cuts, as already seen in the previous
excercise. The
maxDeltaR_
parameter is new, it is used to test whether a
track is considered to be "inside" a jet or not. We will use a Delta R
= sqrt(Delta phi^2 + Delta eta^2) criterium with respect to the center of
the jet to associate tracks in a cone around the jet center.
1double minPt_; // track pt quality cut
2unsigned int minPixelHits_; // minimum number of pixel hits
3unsigned int minTotalHits_; // minimum number of total hits
These are very important. b-tagging is extremely sensitive to badly
reconstructed tracks, so we are demanding a minimum transverse momentum (so
that there is not too much curvature, not too much energy loss, extrapolation uncertainty
and fake rate). In addition, a minimum number of total hits
and especially enough pixel hits for the extrapolation to the impact point is required as well.
1unsigned int nThTrack_; // n-th hightest track to choose
This is the key to the "Track Counting" algorithm. This algorithm uses
exactly one impact parameter significance of a certain track as
discriminator. This is how we choose this track: We simply compute the IP
significance for all tracks in the jet, then order them by IP significance
and take the n-th highest track. For
nThTrack_
a trade-off has to be
found. If the number is too small we risk of selecting too many badly
reconstructed tracks or tracks from unavoidable long-lived decays from
K_short or lambda for instance. On the other hand, the average charged decay
multiplicity of a b-hadron is around five, and with out quality cuts it is
effectively smaller than that. So we can't choose this number too large, or
we will miss the b-decay tracks. Good numbers are 2 or 3. We will go with
2, as it is the number chosen by the official "Track Counting High
Efficiency" algorithm. You are welcome to change this number to see what
happens.
The next part in the source file is used for our flavour definition and is identical to the previous
excercise, as well as some histogram booking, which we will skip and directly go to the
analyze
method:
It starts off with loading of collections into
edm::Handle
variables,
nothing new here.
1// rare case of no reconstructed primary vertex
2if (pvHandle->empty())
3 return;
4
5// extract the position of the (most probable) reconstructed vertex
6math::XYZPoint pv = (*pvHandle)[0].position();
We are lazy and will just skip events that don't have a primary vertex. If
it has one, we will save the position of the first (i.e. signal primary
vertex) for later.
Then we will loop over all jets, check for the pT and |eta| cut, and check
the jet flavour, as before.
1[...]
2// this vector will contain IP value / error pairs
3std::vector<Measurement1D> ipValErr;
This variable will be used to store the impact parameters for all tracks.
Measurement1D
is a useful little helper class for computing significances.
The header can be found
here
.
The impact parameter significance is just defined as the impact parameter
itself, divided by the error on its measurement. This gives us a
significance that the track is in fact displaced from the primary vertex.
Tracks originating from the primary vertex will have a gaussian IP
distribution, smeared around zero. If divided by the error, this gaussian
will be scaled to have a width of 1 (as discussed in the pull distribution
for the primary vertex fit in a previous exercise). Hence, the average
expected significance of a track that genuinely comes from the PV is
expected to be of the order of 1, whereas tracks from the b-decay should
exhibit large tails in the IP significance distribution.
The
Measurement1D
class stores a pair of value and error and has three
methods to access value, error or significance (the latter is just a simple
division).
1// now loop through all tracks
2for(reco::TrackCollection::const_iterator track = tracksHandle->begin();
3 track != tracksHandle->end(); ++track) {
For each jet we will now loop over the whole
generalTracks
track
collection.
1// check the quality criteria
2if (track->pt() < minPt_ ||
3 track->hitPattern().numberOfValidHits() < (int)minTotalHits_ ||
4 track->hitPattern().numberOfValidPixelHits() < (int)minPixelHits_)
5 continue;
And apply our quality cuts. You should be familiar with these calls from
the first exercise.
1// check the Delta R between jet axis and track
2// (Delta_R^2 = Delta_Eta^2 + Delta_Phi^2)
3double deltaR = ROOT::Math::VectorUtil::DeltaR(
4 jet->momentum(), track->momentum());
Here we use a helper function to compute the Delta R between jet and track.
Doing this by hand is also easily done, but you have to take into account
the rotation symmetry of phi. The
momentum()
calls return the
three-vector of the track and jet momentum. We then plot the
DeltaR
distribution in a histogram as a consistency check.
1// only look at tracks in jet cone
2if (deltaR > maxDeltaR_)
3 continue;
And then ignore any tracks outside of the cone.
The next part is a bit tricky. As mentioned earlier we will use a simple
approximation to compute the impact parameter. What we will use are the
dxy
method we have been using before. This method can be passed a
vertex. So we will pass our primary vertex. If you have a look on this method
in the
TrackBase.h
header file, it will tell you that a linear
extrapolation of the track from the reference point is used. Thankfully,
the primary vertex is close to the beam spot that is used for the reference
point, so this will work (it will not work for secondary vertices and you'll
have to use a complex and more precise machinery instead, which will not be
discussed here).
1double ipError = track->dxyError();
2double ipValue = std::abs(track->dxy(pv));
The next trick we are going to use is to compute a sign for the impact
parameter. Here we will use the fact that real b-decays will always be in
jet direction and never behind it, whereas measurement errors will cause the
impact parameters for genuine primary vertex tracks to appear half in front
and half behing the primary vertex (due to measurement errors, remember the
gaussian distribution around zero).
So we will define the sign as follows: If the closest point of approach on
the track to the primary vertex is in jet directon, we will give it a
positive sign. Otherwise a negative one. We can use the dot product of the
jet axis vector and vector pointing to the point of closest approach for
that. Again, we will use an approximation and just compare the track
reference point with the beam spot (in our approximation this problem is
simply symmetric with respect to the translation between beam spot and primary
vertex). Also, we are only computing the transverse impact parameter and
not the fully three-dimensional one, so we will have to set the z component
of the vector to zero.
1math::XYZVector closestPoint = track->referencePoint() - beamSpot->position();
2// only interested in transverse component, z -> 0
3closestPoint.SetZ(0.);
4double sign = closestPoint.Dot(jet->momentum());
5
6if (sign < 0)
7 ipValue = -ipValue;
8
9ipValErr.push_back(Measurement1D(ipValue, ipError));
So, this is what it looks like.
1// now order all tracks by significance (highest first)
2std::sort(ipValErr.begin(), ipValErr.end(), significanceHigher);
This call will now sort the vector of impact parameters by their
significance. This is a neat C++ tool and
signifianceHelper
is a simple
helper function that is defined further up in the source code and compares
the significance of two
Measurement1D
objects. After that call the vector
will be sorted by descending IP significances.
Now, what we have to do is to take the n-th track and plot it.
Actually, for cross-checks we will also plot the impact parameter
distributions (the values, the errors and the significances) for all tracks
first, and then also for the n-th track.
1// check if we have enough tracks to fulfill the
2// n-th track requirement
3if (ipValErr.size() < nThTrack_)
4 continue;
5
6// pick the n-th highest track
7const Measurement1D *trk = &ipValErr[nThTrack_ - 1];
8[...]
In addition, we will introduce a
negative tagger here, where we are taking the n-th track from the end of the
vector instead of from the beginning, effectively mirroring the whole
problem. What this can be used for will be discussed later.
1trk = &ipValErr[ipValErr.size() - nThTrack_];
2[...]
Now execute the example by calling
cmsRun analyzePatBJetTracks_cfg.py
. In case of CMSSW_3_3_X, make sure to adapt the config file in a similar way as in the first exercise.
You can look at the plots using the macro
patBJetTracks_showPlots.C
or
a
TBrowser
.

In case of the DESY/NAF tutorial, please change the input dataset as described
here. Additionally comment
# process.patDefaultSequence *
at the end of the configuration file:
analyzePatBJetTracks_cfg.py
The macro will produce a canvas for each set of variables. Each set of
impact parameter plots will show the value on the left, the error in the
middle and the significance on the right. The top row shows the "flavour
blind" distribution, i.e. for all kinds of jets, whereas the bottom row will
shows the curves separately for light flavour, c and b jets.
The first canvas (with the labels "signed IP for all tracks") shows the
distributions for all tracks in a jet. The green curve, the light flavour
tracks, will show an approximately gaussian peak at zero, as it is expected.
For b-jets you well see the large tail towards positive values, which means that
b-jets contain tracks with lifetime in the direction of the jet.
On the negative side you see that these tails are largely
suppresed, due to the selection of our impact parameter sign. A few tracks
apparently get the wrong sign, but this is unavoidable due to measurement
errors.
The errors in the middle column are identical for all kinds of tracks. They
should be, since reconstruction works the same way for all kinds of tracks.
On the right plot you see the significance distribution. The gaussian core
should have a width of 1 (
Exercise: fit a gaussian to prove that).
The tails are even more visible than in the left plots, which is because we are
correcting the value distribution for the errors. We will now use the tails
to do the b-jet tagging:
We will now look at the second canvas, showing the distribution one for one
track per jet, namely the n-th highest (second-highest) track, according to
the definition of the "Track Counting" algorithm. The plots are labelled
"signed IP for selected positive track". We are essentially getting the
same distribution as before, we are just biasing the distributions to the
right due to our selection. Especially, for b-jets we tend to select a
track from the b-decay itself most of the time. We can clearly see that the
gaussian core around zero is largely suppressed on favour of the tail.
This corresponds now exactly to the discriminator distribution for the
"Track Counting" algorithm as we've seen in the last exercise.
The third canvas, with the label ("signed IP for selected negative tracks")
are the distributions for the negative tagger (n-th last track). You may
ask what this is for. Obviously not for b-tagging, as the tails in b-jets
are largely suppresed (due to our sign definition). This is exactly the
point, as this can be used to get a sort of "clean" sample for impact
parameter distributions as expected from light flavour jets.
If you look at the distribution for all jets (top row), these distributions
look very much like the respective distributions for the light flavour jets
for the positive-side n-th track (you have to mirror the distribution around
zero in your head). We can use this to measure the b-tagging mistag rate on
data without knowing the jet flavour!
Finally, the last canvas shows a few control plots, like the Delta R
distribution between jet and tracks. You will see a peak at zero,
obviously, then a dip around 0.5. This is also obvious, since the jet
clustering algorithm (Iterative Cone 0.5) will produce jets that are exactly
that size. The increase in tracks above this threshold means that these are
already tracks from neighbouring jets. You will also see that b-jets
contain on average more tracks than c or light flavour jets, which is a fact
that is used in more complex b-tagging algorithms like the "Combined
Secondary Vertex".
Now, execute the macro
patBJetTracks_efficiencies.C
. This will produce
the efficiency plots as in the previous exercise, but now for our own
algorithm.
In the first canvas you will see that it behaves similarly like the official "Track
Counting" algorithm. The performance is just a bit worse. This is mostly
due to the fact that the official algorithm has a fancier track selection
with many more quality cuts and so picks up less fake tracks or tracks from
other long-lived decays.
The second canvas shows a comparison between the light flavour discriminator
distribution and the inverted negative tagger distribution (on all jets). As
expected, these distributions do in fact look very similar and can be used
to measure the mistag rate on data (i.e. efficiency versus discriminator cut
for non-b jets). They do not coincide perfectly. It can be made a bit
better using a few more tricks, but a small uncertainty always remains. The
world is after all not perfect.
Exercise V - b-Tagging with vertices
This last exercise allows us to learn a bit more about secondary vertices.
As in the previous example we will create two simple b-tagging algorithms.
One of them corresponds to the "Simple Secondary Vertex" algorithm that we
have already seen.
Let's go again straight to the analyzer code and open
PatBJetVertexAnalyzer.cc
. The whole part before the
analyze
method will be skipped here as there is nothing new to see. We will loop over the PAT jet
collection only, as everything we need is embedded here. Also we will see
the flavour definition used in the two previous exercices and the jet
kinematic cuts.
In the
analyze
methods we will loop over the jets again, cut on pT and
|eta| and assign the flavour. Here we go:
1// retrieve the "secondary vertex tag infos"
2// this is the output of the b-tagging reconstruction code
3// and contains secondary vertices in the jets
4const reco::SecondaryVertexTagInfo &svTagInfo = *jet->tagInfoSecondaryVertex();
We retrieve the "secondary vertex tag infos" from the jet.
The class is defined
here
.
It contains the result of the inclusive secondary vertex reconstruction
inside the jets. This means that it can contain zero, one or more secondary
vertices. If there is more than one vertex, they are sorted by quality,
i.e. the one with the smallest error is listed first.
1// count the number of secondary vertices
2plots_[ALL_JETS].nVertices->Fill(svTagInfo.nVertices());
3plots_[flavour].nVertices->Fill(svTagInfo.nVertices());
4
5// ignore jets without SV from now on
6if (svTagInfo.nVertices() < 1)
7 continue;
8
9// pick the first secondary vertex (the "best" one)
10const reco::Vertex &sv = svTagInfo.secondaryVertex(0);
Now we plot the number of reconstructed secondary vertices per jet.
If the jet does not have a secondary vertex, we skip the rest of the method,
as it will then concentrate on the first (i.e. best) secondary vertex.
The class definition for a vertex is the same as for primary vertices
earlier, it is useful to open up the class header in a browser now
(
Vertex.h
).
1// and plot number of tracks and chi^2
2plots_[ALL_JETS].nTracks->Fill(sv.tracksSize());
3plots_[flavour].nTracks->Fill(sv.tracksSize());
4
5plots_[ALL_JETS].chi2->Fill(sv.chi2());
6plots_[flavour].chi2->Fill(sv.chi2());
Here we fill the number of tracks and chi^2 of the vertex fit into histograms.
1// the precomputed transverse distance to the primary vertex
2Measurement1D distance = svTagInfo.flightDistance(0, true);
We can retrieve the distance between the secondary vertex and the primary
vertex from the tag infos (it is precomputed there). The "0" in the first
arguments means we want to have the first (best) secondary vertex, the
second argument "true" tells it that we want the result in two dimensions
(transverse in x-y plane). The result is a
Measurement1D
object with
value, error and significance, as for the impact parameters.
We then fill these three values into histograms.
We will try to produce a few additional observables. The next part
computes, for instance, the angle between the secondary vertex "flight
direction" and the jet axis using the
DeltaR
helper method we've seen in
the last exercise. This is not particularly exciting, so the explanation will be skipped here.
However, the last part is exciting. We are computing the invariant mass of
the secondary vertex! In order to do this, we need the fourvector sum of
all tracks at the vertex. In order to build a fourvector, we assume that
each track is a pion and assign it the charged pion mass. There are a few
ROOT fourvector classes that can be used for this purpose. Vector classes
are quite involved, so just the classes are used here (please refer to the
documentation for more details).
1// compute the invariant mass from a four-vector sum
2math::XYZTLorentzVector trackFourVectorSum;
3
4// loop over all tracks in the vertex
5for(reco::Vertex::trackRef_iterator track = sv.tracks_begin();
6 track != sv.tracks_end(); ++track) {
7 ROOT::Math::LorentzVector<ROOT::Math::PxPyPzM4D<double> > vec;
8 vec.SetPx((*track)->px());
9 vec.SetPy((*track)->py());
10 vec.SetPz((*track)->pz());
11 vec.SetM(0.13957); // pion mass
12 trackFourVectorSum += vec;
13}
You can see that we are using the
tracks_begin()
and
tracks_end()
methods to loop over the tracks of the vertex, then fill the fourvector with
the components, give the fourvector a mass and then sum them up.
1// get the invariant mass: sqrt(E² - px² - py² - pz²)
2double vertexMass = trackFourVectorSum.M();
This part is now trivial. You can also compute the mass by hand if it makes
you feel more comfortable.
Ok, we can now run the example by calling
cmsRun
with
analyzePatBJetVertex_cfg.py
and look at the results using
patBJetVertex_showPlots.C
with ROOT. In case of CMSSW_3_3_X, make sure to adapt the config file in a similar way as in the first exercise.

In case of the DESY/NAF tutorial, please change the input dataset as described
here. Additionally comment
# process.patDefaultSequence *
at the end of the configuration file:
analyzePatBJetVertex_cfg.py
The flight distance significance distribution is the same as for the "Simple
Secondary Vertex" algorithm, as seen earlier. See how the average b-vertex
also has a smaller error, which is because on average it contains more real
and genuinely displaced tracks. Compare with the number of vertices found
in the different type of jets and the average number of tracks at the
vertices.
If you look at the vertex mass, you will see that b-jets are clearly
distinguisable. This means that we can also use the vertex mass in order to
define a discriminator to do b-tagging. It can also be used for quite a
number of things, e.g. it is used in fits to do heavy flavour fraction
measurements and the like (or in the "Combined Secondary Vertex" it is also
used as input variable for b-tagging).
Now call
root -l patBJetVertex_efficiencies.C
and you will see the
performance curves if you use the two variables just discussed, the flight
distance significance and the vertex mass, as discriminators. You see, both
work quite well.
How to get more information
Review status
Responsible:
ArminScheurer,
ChristopheSaout
Last reviewed by: most recent reviewer and date