7.2.1 PAT Examples: Tracking, Vertexing and b-Tagging

Introduction

This tutorial is meant to deliver basic insight into CMSSW and the PAT framework. It is focused on the three following topics:

  • Tracking
  • Vertexing
  • b-Tagging
This tutorial was presented for the first time at a workshop at DESY, Hamburg, Germany. Therefore, all DESY/NAF site-specific settings required during this tutorial can be found here. They are marked by info in the sections below.

Preparation

This tutorial was set up using CMSSW_2_2_13. It should basically also work with other software versions. So, in order to begin with the different exercises presented below, set up your CMSSW environment and enter the directory CMSSW_2_2_13/src.

info For instructions on how to set up the CMSSW environment at DESY/NAF follow this link.

Update: The tutorial was tested and verified to work with CMSSW_3_3_X releases (e.g. CMSSW_3_3_6) with the appropriate RelVal samples.

How to get the code

The tutorial code will be available in future CMSSW releases in the package PhysicsTools/PatExamples. For currently available CMSSW versions, the CVS HEAD version of this package has to be used (additionally, the CVS tag TUT_DESY_ws_tracking_btagging has been set up to get only the files related to this tutorial).

To get a copy of the PAT tutorials/examples, use the following command (replace <tag> with HEAD or info TUT_DESY_ws_tracking_btagging):

addpkg PhysicsTools/PatExamples <tag>

Alternatively you can use:

cvs co -r <tag> PhysicsTools/PatExamples

info If your DESY/NAF username differs from your CERN lxplus username, then you should follow these instructions.

All files important for this tutorial can be found in the directories PhysicsTools/PatExamples/plugins/ and PhysicsTools/PatExamples/test/. More details on how to run the different tutorial examples are given in the appropriate subsections below.

Additional Information

Each time something in the PhysicsTools/PatExamples/plugins/ directory is changed, the code needs to be recompiled to make the changes available to CMSSW. In case the tutorial code was just checked out, please invoke the compilation now by executing:

scram b

info For the tutorial at DESY/NAF, a skimmed dataset has been prepared. Find more information about it here. Keep in mind that you should replace the input dataset specified in the different tutorial configuration files, but this will be described again in the subsections below.

Exercise I - Tracking

Due to their sheer complexity, topics like running or understanding track reconstruction in detail are not covered here. Instead, we restrict ourselves to understanding the results of the track reconstruction by looking at observables of the reconstructed tracks.

tracker.jpg

Let's have a look at the example:

In the plugins directory (inside the src/PhysicsTools/PatExamples directory, as for the rest of the tutorial) you can find the CMSSW EDAnalyzer called PatTrackAnalyzer in the source file PatTrackAnalyzer.cc. We will discuss the contents of this file further below.

The corresponding CMSSW configuration file to execute this analyzer can be found in the test directory and is called analyzePatTracks_cfg.py. Note that the naming scheme of the analyzers and configuration files is consistent for all the examples in this tutorial.

The file test/analyzePatTracks_cfg.py should look like the following:

import FWCore.ParameterSet.Config as cms

process = cms.Process("Test")

Here we are defining a regular CMSSW config file with a process named Test (the name is arbitrary, as long as it does not clash with any other process name).

    4# initialize MessageLogger and output report
    5process.load("FWCore.MessageLogger.MessageLogger_cfi")
    6process.MessageLogger.cerr.threshold = 'INFO'
    7process.MessageLogger.categories.append('PATSummaryTables')
    8process.MessageLogger.cerr.INFO = cms.untracked.PSet(
    9        default          = cms.untracked.PSet( limit = cms.untracked.int32(0)  ),
   10        PATSummaryTables = cms.untracked.PSet( limit = cms.untracked.int32(-1) )
   11)
   12process.options   = cms.untracked.PSet( wantSummary =
   13cms.untracked.bool(True) )
   14
   15process.load("Configuration.StandardSequences.Geometry_cff")
   16process.load("Configuration.StandardSequences.FrontierConditions_CMS.GlobalTag_cff")
   17process.load("Configuration.StandardSequences.MagneticField_cff")
   18
   19process.GlobalTag.globaltag = cms.string('IDEAL_V12::All')

Here, a few generic things are set up and standard sequences are included (geometry, calibrations and the magnetic field map). They are mostly needed in order to run PAT on RECO or AOD files. If you are already using PAT-tuples as input, they can in principle be removed (but they don't hurt if they are defined without being needed). The tag IDEAL_V12::All has to be compatible with the simulated misalignment and the software version used (in our case CMSSW_2_2_13 and no misalignment).

There are two ways to get the information about the global tag.

The first option is to use DBS. This method is convenient as one does not need to have the RECO/AOD file on disk.

* Visit the DBS site (https://cmsweb.cern.ch/dbs_discovery/)

* Do a search like (for example): find dataset where file like /store/relval/CMSSW_3_3_6/RelValTTbar/GEN-SIM-RECO/STARTUP3X_V8H-v1/ and dataset.status like VALID*

This gives a result similar to the one shown here.

* There you can click on "Conf.files" below the found results. This gives you the config that was used to produce the dataset.

* Here you find: process.GlobalTag.globaltag = 'STARTUP3X_V8H::All'

Secondly, one can get the above information by using edmProvDump on the RECO/AOD file assuming that the file is available on a local disk.

In addition, the message logger is configured. For CMSSW_3_3_X, you have to comment out the following four lines (leave them untouched in case of CMSSW_2_2_X):

    8#process.MessageLogger.cerr.INFO = cms.untracked.PSet(
    9#        default          = cms.untracked.PSet( limit = cms.untracked.int32(0)  ),
   10#        PATSummaryTables = cms.untracked.PSet( limit = cms.untracked.int32(-1) )
   11#)
   12

   20# produce PAT Layer 1
   21process.load("CMS.PhysicsTools.PatAlgos.patSequences_cff")
   22# switch old trigger matching off
   23from CMS.PhysicsTools.PatAlgos.tools.trigTools import
   24# switchOffTriggerMatchingOld
   25switchOffTriggerMatchingOld( process )

This is needed to create the PAT collections on the fly (again redundant when using PAT-tuples). In case of CMSSW_3_3_X, comment out the two lines about old trigger matching, since this is already the default in those releases.

   26process.maxEvents = cms.untracked.PSet( input = cms.untracked.int32(-1) )

We'd like to run over all events in the input files (hence -1; otherwise we would specify the number of events).

   27process.source = cms.Source("PoolSource",
   28        fileNames = cms.untracked.vstring(
   29                '/store/mc/Fall08/TTJets-madgraph/GEN-SIM-RECO/...',
   30        )
   31)

We define the EDM data files to run on. This example chooses a RECO file from the Fall08 TTbar production. This is not a PAT-tuple, so we need to run PAT on the fly. You can find other samples in DBS; make sure they are compatible with the software release (i.e. CMSSW_2_2_X).

info Due to time constraints at the DESY/NAF tutorial, please modify the config file as described here in order to use skimmed data samples locally available.

   32process.TFileService = cms.Service("TFileService",
   33        fileName = cms.string("analyzePatTracks.root")
   34)

The analyzer uses the TFileService to put histograms into a ROOT file. Here we define the service and give it a file name to write to.

   35process.analyzeTracks = cms.EDAnalyzer("PatTrackAnalyzer",
   36        src = cms.InputTag("generalTracks"),
   37        beamSpot = cms.InputTag("offlineBeamSpot"),
   38        qualities = cms.vstring("loose", "tight", "highPurity")
   39)
   40
   41process.analyzeMuons = cms.EDAnalyzer("PatTrackAnalyzer",
   42        src = cms.InputTag("selectedLayer1Muons"),
   43        beamSpot = cms.InputTag("offlineBeamSpot"),
   44        qualities = cms.vstring("undefQuality")
   45)

Here we define two instances of our PatTrackAnalyzer. The first instance runs on the so-called generalTracks collection, which is the most basic track collection in CMSSW. It contains all kinds of charged tracks (pions, kaons, muons and so on), reconstructed with the default "Combinatorial Kalman Track Finder", and tries to be as inclusive as possible. In CMSSW_2_2_12 it uses the three-step tracking at this point, which means that it tries to cover a wide pT spectrum and also attempts to reconstruct detached tracks. Simple quality cuts have already been applied (the "loose" selection), but it is possible to go to tighter cuts by specifically selecting only tracks with the "tight" or even "highPurity" flags.

The second instance runs on reconstructed muons. Inside the code we will restrict ourselves to muons that have a part reconstructed in the strip tracker, a part reconstructed in the muon system, and a global track fit on both parts simultaneously.

Explanation of the different parameters src, beamSpot and qualities:

  • The src parameter chooses the EDM collection that contains the tracks you want to look at. The analyzer is somewhat flexible and can deal with either plain RECO tracks or PAT muons. The collections for the two analyzers are named "generalTracks" and "selectedLayer1Muons". In principle you can also look at any other track collection that is derived from reco tracks; if you click through the content of an EDM file (e.g. using the TBrowser within ROOT) you will find a few such collections.
  • The beamSpot parameter names the collection which contains the beam spot information (i.e. where the interaction is supposed to take place). It is used as the reference point for the track parameters when determining impact parameters (e.g. the "d0" variable in the code). It is needed because the beam spot in our simulation does not sit exactly at (0, 0) but is slightly shifted; in the real world the beam will likely not sit at (0, 0) either, so our simulation moves the spot to find potential bugs in code that erroneously assumes this. Let's go along with it, since we don't want to fall into that trap either.
  • Finally, the qualities parameter is the list of track qualities that we want to look at. For the generalTracks we look at the three quality flags mentioned earlier; for the muons we don't care and use the default "undefQuality", which means we select all tracks (muons). The analyzer will write out all histograms for all quality selections.

   46process.p = cms.Path(
   47        process.patDefaultSequence *
   48        process.analyzeTracks *
   49        process.analyzeMuons
   50)

Here we define the sequence to run. If you are running on PAT-tuples, you should comment out the first line (using a # in front), otherwise PAT will be run on the fly. The other two entries represent the two analyzers we just configured above.

info At the DESY/NAF workshop, we use a prepared PAT-tuple, therefore you should comment the first line: # process.patDefaultSequence *.

Now, given that your input file is valid, the CMSSW environment is set up correctly and the analyzers compiled successfully (scram b), you can run the example:

cmsRun analyzePatTracks_cfg.py

When the job is done, you will find a file named analyzePatTracks.root in the current directory, containing all the plots from the analyzers. You can open the file using the ROOT command line:

root -l analyzePatTracks.root

and browse a bit by invoking new TBrowser within the ROOT command line interpreter. You will notice that the plots for each analyzer can be found in a directory named after the analyzer.

There is also a ROOT macro prepared for automatic plotting of the file contents including basic histogram formatting. You can call it by executing:

root -l patTracks_showPlots.C

Have a look at the file and open it in an editor. You can change the lines with the track qualities and the directory to also look at the plots for the muons. Note how the numbers of hits in the different components are shown in a stacked plot; it uses a THStack for that purpose. We will discuss the plots further below, after discussing the source code of the analyzer.
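
For illustration, the stacking itself boils down to something like the following sketch; the histogram names are hypothetical, while the directory name follows from the module label in the config (the real macro retrieves its own set of histograms and applies more formatting):

    // sketch only: build a stacked hit plot with THStack in a ROOT macro;
    // the histogram names below are guesses, the real names are defined
    // in PatTrackAnalyzer and used by patTracks_showPlots.C
    TFile *file = TFile::Open("analyzePatTracks.root");
    TH1 *pxb = (TH1*)file->Get("analyzeTracks/pxbHitsEta_loose");  // hypothetical name
    TH1 *tib = (TH1*)file->Get("analyzeTracks/tibHitsEta_loose");  // hypothetical name
    pxb->SetFillColor(kRed);
    tib->SetFillColor(kBlue);
    THStack *stack = new THStack("hits", "average number of hits vs. |#eta|");
    stack->Add(pxb);
    stack->Add(tib);
    stack->Draw("hist");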

Here is the source code for the analyzer, PatTrackAnalyzer.cc (it will be discussed in full detail only this first time):

    1#include <iostream>
    2#include <cmath>
    3#include <vector>
    4#include <string>
    5
    6#include <TH1.h>
    7#include <TProfile.h>
    8
    9#include "FWCore/Framework/interface/Frameworkfwd.h"
   10#include "FWCore/Framework/interface/Event.h"
   11#include "FWCore/Framework/interface/EventSetup.h"
   12#include "FWCore/Framework/interface/EDAnalyzer.h"
   13#include "FWCore/Utilities/interface/InputTag.h"
   14#include "FWCore/ParameterSet/interface/ParameterSet.h"
   15#include "FWCore/ServiceRegistry/interface/Service.h"
   16
   17#include "CMS.PhysicsTools/UtilAlgos/interface/TFileService.h"
   18
   19#include "DataFormats/Common/interface/View.h"
   20#include "DataFormats/BeamSpot/interface/BeamSpot.h"
   21#include "DataFormats/TrackReco/interface/Track.h"
   22#include "DataFormats/TrackReco/interface/HitPattern.h"
   23#include "DataFormats/PatCandidates/interface/Muon.h"

Header files we will need.

   24class PatTrackAnalyzer : public edm::EDAnalyzer  {
   25    public: 
   26        // constructor and destructor
   27        PatTrackAnalyzer(const edm::ParameterSet &params);
   28        ~PatTrackAnalyzer();
   29
   30        // virtual methods called from base class EDAnalyzer
   31        virtual void beginJob(const edm::EventSetup&);
   32        virtual void analyze(const edm::Event &event, const edm::EventSetup &es);

Here we define our analyzer by deriving from the CMSSW framework base class EDAnalyzer. We need to define a constructor that is passed a ParameterSet containing the configuration from the python config file. We also define the methods beginJob and analyze, which are called from the base class and are therefore declared virtual.

   33private:
   34        // configuration parameters
   35        edm::InputTag src_;
   36        edm::InputTag beamSpot_;
   37
   38        // the list of track quality cuts to demand from the tracking
   39        std::vector<std::string> qualities_;

Here we store the parameters from the CMSSW config file in class members. The InputTag class holds references to EDM products (i.e. ROOT file branches) and is preferred over simple string labels.

   40struct Plots {
   41                TH1 *eta, *phi;
   42                TH1 *pt, *ptErr;
   43                TH1 *invPt, *invPtErr;
   44                TH1 *d0, *d0Err;
   45                TH1 *nHits;
   46
   47                TProfile *pxbHitsEta, *pxeHitsEta;
   48                TProfile *tibHitsEta, *tobHitsEta;
   49                TProfile *tidHitsEta, *tecHitsEta;
   50        };
   51
   52        std::vector<Plots> plots_;
   53};

Here we store all the histograms for the variables we want to plot. The profile histograms store the number of hits the track has in the different subdetector components. A profile is a neat way to simply get the average number of hits per pseudorapidity slice (eta) on the x-axis.

We use a vector of plots, so that we can store plots for each track quality selection in parallel.

   54PatTrackAnalyzer::PatTrackAnalyzer(const edm::ParameterSet &params) :
   55        src_(params.getParameter<edm::InputTag>("src")),
   56        beamSpot_(params.getParameter<edm::InputTag>("beamSpot")),
   57        qualities_(params.getParameter< std::vector<std::string> >("qualities"))
   58{
   59}
   60
   61PatTrackAnalyzer::~PatTrackAnalyzer()
   62{
   63}

Constructor and destructor. Not much to say here. Notice how elegantly we can read the parameters from the CMSSW config file using the constructor initializer syntax. The part enclosed in quotes is the name of the variable to be read and has to match the name specified in the CMSSW config file.

   64void PatTrackAnalyzer::beginJob(const edm::EventSetup &es)
   65{
   66        // retrieve handle to auxiliary service
   67        // used for storing histograms into ROOT file
   68        edm::Service<TFileService> fs;
   69
   70        // now book the histograms, for each category
   71        unsigned int nQualities = qualities_.size();
   72
   73        plots_.resize(nQualities);
   74
   75        for(unsigned int i = 0; i < nQualities; ++i) {
   76                // the name of the quality flag
   77                const char *quality = qualities_[i].c_str();
   78
   79                // the set of plots
   80                Plots &plots = plots_[i];
   81
   82                plots.eta = fs->make<TH1F>(Form("eta_%s", quality),
   83                                           Form("track \\eta (%s)", quality),
   84                                           100, -3, 3);
   85
   86[...]
   87}

The beginJob method is called once at the beginning of the CMSSW job. The main difference with respect to the constructor from our point of view is that here we have access to framework services and event setup, whereas in the constructor these are not set up yet.

Here we first initialize the TFileService, i.e. get a reference to it using the framework service concept with edm::Service. We resize the vector of plots to the number of track qualities and then start booking the histograms in a loop.

fs->make is the TFileService way to call new TH1F. It makes sure that the histograms are booked such that they will automatically end up in the correct file.

We build the histogram name and title using ROOT's Form() call. It essentially follows the C language sprintf syntax, where %s denotes a placeholder for a string, %d for an integer, %f for a float, and so on. The values to fill into the placeholders are then passed as arguments. We need this here to append the track quality to the name of each plot.
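
The remaining bookings follow the same pattern; here is a minimal sketch (the binning values are illustrative and not necessarily those used in PatTrackAnalyzer.cc):

    // sketch: further bookings follow the same Form()/fs->make pattern;
    // the binning chosen here is illustrative only
    plots.pt = fs->make<TH1F>(Form("pt_%s", quality),
                              Form("track p_{T} (%s)", quality),
                              100, 0., 100.);
    plots.pxbHitsEta = fs->make<TProfile>(Form("pxbHitsEta_%s", quality),
                                          Form("pixel barrel hits vs. |#eta| (%s)", quality),
                                          30, 0., 3.);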

   88void PatTrackAnalyzer::analyze(const edm::Event &event, const edm::EventSetup &es)
   89{  
   90        // handles to kinds of data we might want to read
   91        edm::Handle<reco::BeamSpot> beamSpot;
   92        edm::Handle< edm::View<reco::Track> > tracksHandle;
   93        edm::Handle< pat::MuonCollection > muonsHandle;
   94
   95        // read the beam spot
   96        event.getByLabel(beamSpot_, beamSpot);
   97[...]

Here we define the analyze method implementation, which is called for each event. First we define three handles, one handle for each EDM collection that we want to (potentially) read.

We do the actual reading by calling the getByLabel method on the event argument. This method is passed the input tag (the label) and it will fill the handle just defined with the contents of the collection.

The next part is a bit tricky: it fills the tracks or muons into a common vector. To do this it uses some "magic" to detect whether it is running on tracks or muons (i.e. what kind of collection you actually passed with the src label). You can learn a few framework tricks by looking at this, but it is not essential for this tutorial.

The important point is that we will fill this collection:

   98std::vector<const reco::Track*> tracks;

which will contain the tracks for both the generalTracks and muon case.
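
One possible way such a filling could look is sketched below; this is not necessarily the exact code of PatTrackAnalyzer.cc (in particular, which muon track is pushed into the vector is an assumption here):

    // sketch: fill "tracks" either from a plain track collection or from
    // PAT muons, depending on which product the src label points to
    event.getByLabel(src_, tracksHandle);
    if (tracksHandle.isValid()) {
            // plain reco::Track case (e.g. generalTracks)
            for(edm::View<reco::Track>::const_iterator iter = tracksHandle->begin();
                iter != tracksHandle->end(); ++iter)
                    tracks.push_back(&*iter);
    } else {
            // pat::Muon case: use the global track fit (assumption)
            event.getByLabel(src_, muonsHandle);
            for(pat::MuonCollection::const_iterator iter = muonsHandle->begin();
                iter != muonsHandle->end(); ++iter)
                    if (iter->globalTrack().isNonnull())
                            tracks.push_back(iter->globalTrack().get());
    }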

   99// we are done filling the tracks into our "tracks" vector.
  100// now analyze them, once for each track quality category
  101
  102unsigned int nQualities = qualities_.size();
  103for(unsigned int i = 0; i < nQualities; ++i) {

Here we loop over the qualities we want to look at.

  104// we convert the quality flag from its name as a string
  105// to the enumeration value used by the tracking code
  106// (which is essentially an integer number)
  107reco::Track::TrackQuality quality = reco::Track::qualityByName(qualities_[i]);

qualities_[i] is a std::string that was passed from the configuration file. We need to convert it into a different representation (an enumeration value, like the predefined colors kBlack, kRed, which are essentially integers).

Have a look at the header file TrackBase.h. We will use its methods a lot in the next few lines. Note that the reco::Track class simply derives from the reco::TrackBase class, so you can also have a look at Track.h itself. The qualityByName method converts the string into the TrackQuality enumeration for us.

  108// now loop over the tracks
  109for(std::vector<const reco::Track*>::const_iterator iter = tracks.begin();
  110        iter != tracks.end(); ++iter) {
  111             // this is our track
  112             const reco::Track &track = **iter;

How this works should be pretty obvious if you are familiar with C++-style iterators. (If not: you should really get familiar with them! wink )

We dereference twice here (the two asterisks): first the iterator, and second the pointer inside the collection, to store a reference to the track in our track variable. It can sometimes be a pain to figure out how to do this correctly, but most of the time trial and error or a little bit of thinking and looking at header files works. Please ask an expert before resorting to any kind of ugly hacks; there is usually a nice and clean way to do it.

  113// ignore tracks that fail the quality cut
  114if (!track.quality(quality))
  115        continue;

Here we use the quality method of a track to check if the track fulfills our requirement.

  116// and fill all the plots
  117plots.eta->Fill(track.eta());
  118plots.phi->Fill(track.phi());
  119[...]

This should be pretty obvious.

  120// the hit pattern contains information about
  121// which modules of the detector have been hit
  122const reco::HitPattern &hits = track.hitPattern();

Here we get information about the hit pattern of the track. This class can tell us exactly which modules have been hit by the track (or more precisely: which hits have been used to reconstruct the track). It also tells us in which modules hits were expected, but not found. These are called "lost" hits. Sometimes lost hits are expected, because the module is known to be faulty, sometimes they're not. Have a look at the header files, everything is explained in the comments: HitPattern.h

  123double absEta = std::abs(track.eta());
  124// now fill the number of hits in a layer depending
  125// on eta
  126plots.pxbHitsEta->Fill(absEta, hits.numberOfValidPixelBarrelHits());
  127plots.pxeHitsEta->Fill(absEta, hits.numberOfValidPixelEndcapHits());
  128[...]

We fill the profile histograms with the number of hits in tracking subdetectors with respect to the pseudo-rapidity.

Here is a nice exercise: add the three muon subsystems (DT, CSC, RPC), rerun the analyzer on the muons, add them to the plotting macro and have a look at the stacked plot.
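
A sketch of what this extension could look like; the dtHitsEta, cscHitsEta and rpcHitsEta members are hypothetical additions to the Plots struct and would need to be booked like the others:

    // sketch: hypothetical additional profiles for the muon subsystems
    plots.dtHitsEta->Fill(absEta, hits.numberOfValidMuonDTHits());
    plots.cscHitsEta->Fill(absEta, hits.numberOfValidMuonCSCHits());
    plots.rpcHitsEta->Fill(absEta, hits.numberOfValidMuonRPCHits());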

Alternatively, you can have a look at the position of each individual hit, but in order to do this you need to look at the methods in Track.h. Furthermore, you need to run on RECO for this (neither AOD nor PAT-tuples will work, as the information is not available there). This is outside the scope of this tutorial.

  129#include "FWCore/Framework/interface/MakerMacros.h"
  130
  131DEFINE_FWK_MODULE(PatTrackAnalyzer);

And finally these lines will turn our C++ class into a framework module.

Now have a look at the plots from the ROOT macro. Can you explain the observed quantities?

Does the subdetector hit distribution make sense? You can compare it to: http://www.ba.infn.it/~zito/cms/tvis.html

Exercise II - Vertexing

In this sub-tutorial we will look at properties of the reconstructed primary vertex. Software-wise, what is done here is the following: quality cuts are applied to all tracks from the global track collection (the generalTracks we just looked at), e.g. requiring hits in the pixel detector and not being too far from the beam spot. Then the tracks are clustered into groups along the z-axis. Remember: with pile-up we can get multiple interactions, so we can get multiple primary vertices along the beam spot. Each group of tracks is then fitted using the AdaptiveVertexFitter, and if the resulting vertex passes some basic quality cuts (like a minimum number of compatible tracks and compatibility with the beam spot), it is added to the offlinePrimaryVertices collection. The vertices are ordered by descending sum of pT^2 of their tracks. The reason is that the signal vertex usually has the highest-energetic tracks coming out of it, and this selection is close to 100% efficient.

Let's start with the analyzer code PatVertexAnalyzer.cc. After inclusion of a bunch of header files, we again define a C++ class, named PatVertexAnalyzer, which contains a constructor, a beginJob and an analyze method. Two input tags are taken from the config file: src (the primary vertex collection) and mc. The mc collection is used to get the Monte Carlo truth, as we will be comparing the position of the reconstructed vertex to the actual position where it was simulated.

[...]
    private:
        // configuration parameters
        edm::InputTag src_;
        edm::InputTag genParticles_;

        TH1 *nVertices_, *nTracks_;
        TH1 *x_, *y_, *z_;
        TH1 *xErr_, *yErr_, *zErr_;
        TH1 *xDelta_, *yDelta_, *zDelta_;
        TH1 *xPull_, *yPull_, *zPull_;
};

We will plot the number of reconstructed primary vertices per event. The remaining histograms will only contain numbers for the first primary vertex (supposedly the signal vertex). Here we plot the number of tracks at the vertex, its position in three dimensions and the errors on these three coordinates. The Delta variables are the differences between the reconstructed and the simulated position of the vertex, one for each coordinate. Finally, the pull histograms are what the validation group uses to test whether everything is fine with the reconstruction. They show the distribution of (rec - sim) / error for the three coordinates. If everything is fine with the reconstruction and the errors are estimated correctly, these pull distributions should be gaussian, centered around zero with a width of 1.

Question:

  • Why should it show such a behaviour?

Now let's skip the constructor and beginJob (the stuff in here is pretty obvious) and go to the analyze method:

// handle to the primary vertex collection
edm::Handle<reco::VertexCollection> pvHandle;
event.getByLabel(src_, pvHandle);

// handle to the generator particles (i.e. the MC truth)
edm::Handle<reco::GenParticleCollection> genParticlesHandle;
event.getByLabel(genParticles_, genParticlesHandle);

Here we are defining the handles to hold the collections. Note that type names like reco::VertexCollection are typically just typedefs for std::vector<reco::Vertex>.

The reco::Vertex class is defined in Vertex.h, have a look at the header file.

You can see methods for accessing the position information, the error information (which is in fact a full covariance matrix; we just use the diagonal entries to get the errors on x, y and z separately), methods for accessing the tracks, and a few more things. The tracks_begin and tracks_end methods can be used as in the previous example to loop over the tracks. We will do this later for secondary vertices relevant for e.g. b-tagging.

// extract the position of the simulated vertex
math::XYZPoint simPV = (*genParticlesHandle)[2].vertex();

We extract the simulated primary vertex from the list of generator particles. If you are wondering why we use the third particle (index 2, starting from 0): the first two particles are the incoming protons, which obviously do not have a production vertex inside the detector. The third particle is a daughter of one of the protons and is produced at the primary interaction vertex.

// the number of reconstructed primary vertices
nVertices_->Fill(pvHandle->size());

// if we have at least one, use the first (highest pt^2 sum)
if (!pvHandle->empty()) {
      const reco::Vertex &pv = (*pvHandle)[0];

If there is at least one reconstructed primary vertex, we access the first element. Notice how edm::Handle<...> behaves like a C++ pointer: we use -> or * to dereference it and get to the std::vector<...> embedded inside.

The rest below just fills the histograms as discussed above.
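
For the x coordinate, a minimal sketch of that filling could look like this (the actual analyzer may differ in detail; y and z are analogous):

    // sketch: fill position, error, residual and pull for x
    // (y and z are handled analogously)
    nTracks_->Fill(pv.tracksSize());
    x_->Fill(pv.x());
    xErr_->Fill(pv.xError());
    xDelta_->Fill(pv.x() - simPV.X());
    xPull_->Fill((pv.x() - simPV.X()) / pv.xError());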

Now run the example by executing cmsRun analyzePatVertex_cfg.py. Make sure that the input files are ok and that you run the PAT sequence if the input is not a PAT-tuple. In case of CMSSW_3_3_X, make sure to adapt the config file in a similar way as in the first exercise.

info In case of the DESY/NAF tutorial, please change the input dataset as described here. Additionally comment # process.patDefaultSequence * at the end of the configuration file: analyzePatVertex_cfg.py

The two collections passed for the input tags of the PatVertexAnalyzer are offlinePrimaryVertices and genParticles (the latter containing the MC truth).

Now have a look at the results by using the TBrowser on analyzePatVertex.root directly.

Questions:

  • Can you explain the distribution of the number of primary vertices per event? These events are simulated without pile-up, so why could there be more than one vertex in some events?
  • Can you derive the position of the beam spot and its width from the plots?
  • What is the precision (resolution) of the vertex position? Why is it different for the z coordinate?
  • Do you agree that the pull distribution looks how it should? Why is that so?
  • What is the efficiency of finding the primary vertex in this type of event?

Exercise III - b-Tagging

In this exercise we will jump straight to b-tagging and later on combine it with what we learned in the two previous exercises. A b-tag is essentially a flag that can be given to a jet of any kind to indicate whether it likely contains a b-hadron decay (and hence originates from a b quark before hadronisation) or not. How such an algorithm works is discussed in the following exercises. Here we will simply have a look at what the official CMS algorithms give us.

btag.jpg

For this exercise we will be looking at PAT jets. PAT jets have the advantage that all information available at AOD level is directly embedded into the C++ class and does not need to be picked up from a gazillion individual collections and matched to the jet by hand. Also, the flavour of a jet (or more precisely the flavour of the parton, quark or gluon, that the jet originates from) is embedded as MC truth information in the PAT jet object.

So the analyzer here will be rather trivial. We will loop over all PAT jets, retrieve the MC flavour truth and the results of a few b-tagging algorithms (see the official CMS b-tagging algorithms for the list), and just plot them.

Let's have a look at the analyzer, PatBJetTagAnalyzer.cc. After the usual header files we get to the class definition, where we look at the private members:

    1// configuration parameters
    2edm::InputTag jets_;

The label of our PAT jet collection.

    1double jetPtCut_;               // minimum (uncorrected) jet energy
    2double jetEtaCut_;              // maximum |eta| for jet

Since we are doing b-tagging, it makes no sense to tag jets outside of the tracker acceptance, so we cut on eta (we will require |eta| < 2.4). Also, in principle jets go down to arbitrarily small energies, until we end up with individual particles and noise. In order to stay in the perturbative QCD regime and to account for the bending of tracks in the magnetic field, we should demand some minimum jet energy as well, so we also allow a cut on a minimum jet transverse momentum.

    1enum Flavour {
    2                ALL_JETS = 0,
    3                UDSG_JETS,   
    4                C_JETS,   
    5                B_JETS,
    6                NONID_JETS,
    7                N_JET_TYPES
    8};

We define our own enumeration for the kinds of jets that we want to distinguish. We will use this enumeration as an array index later on, so it starts at zero. The first entry, ALL_JETS, will contain the cumulative entries of all jets regardless of their flavour.

The second entry, UDSG_JETS, will contain only light flavour jets, i.e. jets from u, d, s quarks or gluons. These do not contain any long-lived c- or b-hadron decay.

The third and fourth entries are for c- and b-jets; the fifth category is for jets that have no MC truth assigned (basically energy clusters in the calorimeter that managed to survive the pT cut but cannot be assigned to a hard parton). The last entry, N_JET_TYPES, is not an index, but simply gives the required size of the arrays holding the histograms.

    1TH1 * flavours_;
    2
    3// one group of plots per jet flavour;
    4struct Plots {
    5        TH1 *discrTC, *discrSSV, *discrCSV;
    6} plots_[N_JET_TYPES];

flavours_ will contain a simple histogram of the flavour distribution, i.e. bin 0 will contain the number of all jets, bin 1 the number of light flavour jets, bin 2 the number of c-jets, bin 3 the number of b-jets, ...

plots_ contains histograms with the discriminators for the three algorithms we have selected: "Track Counting", "Simple Secondary Vertex" and "Combined Secondary Vertex".

We will then skip the constructor and histogram booking and go to the analyze method:

    1void PatBJetTagAnalyzer::analyze(const edm::Event &event, const edm::EventSetup &es)
    2{  
    3        // handle to the jets collection
    4        edm::Handle<pat::JetCollection> jetsHandle;
    5        event.getByLabel(jets_, jetsHandle);
    6
    7        // now go through all jets
    8        for(pat::JetCollection::const_iterator jet = jetsHandle->begin();
    9            jet != jetsHandle->end(); ++jet) {

This is the classic pattern of retrieving a collection and looping over its entries, which are PAT jets. The header file is here.

    1// only look at jets that pass the pt and eta cut
    2if (jet->pt() < jetPtCut_ ||
    3       std::abs(jet->eta()) > jetEtaCut_)
    4               continue;

Jets are "candidates" in terms of CMSSW software, and candidates are particles are fourvectors. So these have a momentum and angles. We use the pt() and eta() members to implement our cuts.

    1Flavour flavour;
    2// find out the jet flavour (differs between quark and
    3// anti-quark)
    4switch(std::abs(jet->partonFlavour())) {
    5                    case 1:
    6                    case 2:
    7                    case 3:
    8                    case 21:
    9                        flavour = UDSG_JETS;
   10                        break;
   11                    case 4:   
   12                        flavour = C_JETS;
   13                        break;
   14                    case 5:   
   15                        flavour = B_JETS;
   16                        break;
   17                    default:  
   18                        flavour = NONID_JETS;
   19}

We take the value returned by the partonFlavour() method and assign it to one of our Flavour indices. The value returned is the PDG id, which is 1 to 6 for d, u, s, c, b and t quarks, negative for the respective anti-quarks and 21 for gluons.

    1// simply count the number of accepted jets
    2flavours_->Fill(ALL_JETS);
    3flavours_->Fill(flavour); 

Obvious.

    1double discrTC = jet->bDiscriminator("trackCountingHighEffBJetTags");
    2double discrSSV = jet->bDiscriminator("simpleSecondaryVertexBJetTags");
    3double discrCSV = jet->bDiscriminator("combinedSecondaryVertexBJetTags");

Here we retrieve the discriminator values for the three algorithms. As mentioned earlier, these are stored directly inside the PAT jets. You will notice that the discriminators are floating-point values and not booleans. This means that as a user you have to cut on the value, which allows you to choose a working point: the higher the value, the more likely it is that the jet is actually a real b-jet. The range and shape of the discriminator distribution depend strongly on the algorithm (see already the different ranges in the histogram booking).

The lines below just fill the histograms with the discriminator distribution.
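
To make the working-point idea concrete, a tag decision is simply a cut on the discriminator value; the number used below is purely illustrative, not an official operating point:

    // sketch: turning a continuous discriminator into a yes/no decision
    bool isTagged = (discrTC > 1.7);   // 1.7 is an arbitrary example cut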

Run the example by executing cmsRun analyzePatBJetTags_cfg.py. In case of CMSSW_3_3_X, make sure to adapt the config file in a similar way as in the first exercise.

info In case of the DESY/NAF tutorial, please change the input dataset as described here. Additionally comment # process.patDefaultSequence * at the end of the configuration file: analyzePatBJetTags_cfg.py

You will see that we are cutting on a minimum pT of 30 GeV for the jets here (these are uncorrected values and correspond roughly to 50 GeV corrected jets in the central region). Otherwise there is nothing special in the config file.

You can either look at the output by using a plain TBrowser to get the discriminator distribution, or use the provided macro patBJetTags_efficiencies.C which is much more sophisticated.

Executing it (root -l patBJetTags_efficiencies.C) will give you one canvas per algorithm. The leftmost plot shows the discriminator distribution for the three kinds of jets: light flavour jets, which should be clearly distinguishable from the b-jets, and c-jets, which are harder since they also contain real lifetime and are therefore usually handled separately.

You can see that all light flavour jet discriminators peak on the left of the histogram, while b-jets have larger tails or peak on the right. The "Track Counting" algorithm is based on the impact parameter significance and therefore has an exponential tail, which reflects the exponentially decaying real lifetime distribution of the b-hadron. The "Simple Secondary Vertex" distribution is essentially the same, except that it is an actual transverse distance significance between the reconstructed secondary vertex and the primary vertex, which is also an exponentially decaying function, and it is taken as log(1 + sig). The "Combined Secondary Vertex", finally, is the output of a multivariate analysis, in this case a likelihood ratio of the type S / (S + B), and is therefore limited to the range between 0 and 1.

The plot in the middle is derived from these discriminator distributions. It shows the relative probability to tag a jet at a given working point, i.e. discriminator cut, which is also called the efficiency. A jet is tagged if its discriminator is above the cut value, so the efficiency is defined as the number of jets with a discriminator above that cut, divided by all jets (of the same flavour). In other words, it is the integral of the histogram from a certain discriminator cut up to infinity, divided by the total number of jets.

Open the patBJetTags_efficiencies.C source code and see how this is done in the computeEffVsCut method. It takes a discriminator distribution histogram as input, creates a new histogram with the same x-axis, then scans through all the bins and stores the normalized integral as bin content, as just discussed. The errors are binomial errors. The total number of jets in a given flavour category is taken from the flavours histogram. What you see in the plot is that the efficiency to tag a b-jet is always higher than that for a non-b-jet, obviously. The higher the separating power, the better. This curve can be used to choose the optimal working point for your analysis. A sketch of such a computation is shown below.
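
A minimal sketch of such an efficiency-vs-cut computation, in the spirit of computeEffVsCut (the macro's actual implementation may differ in detail):

    #include <cmath>
    #include <TH1.h>

    // sketch: efficiency vs. discriminator cut from a discriminator histogram;
    // "total" is the number of jets of this flavour (from the flavours histogram)
    TH1 *computeEffVsCutSketch(TH1 *discr, double total)
    {
            TH1 *eff = (TH1*)discr->Clone(Form("%s_eff", discr->GetName()));
            eff->Reset();
            int n = discr->GetNbinsX();
            for(int bin = 1; bin <= n; ++bin) {
                    // jets with a discriminator above the lower edge of this bin
                    double passed = discr->Integral(bin, n + 1);   // include overflow
                    double e = passed / total;
                    eff->SetBinContent(bin, e);
                    // binomial error on the efficiency
                    eff->SetBinError(bin, std::sqrt(e * (1. - e) / total));
            }
            return eff;
    }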

The third plot, on the right, is essentially just a different representation of the efficiencies. Instead of showing the non-b-jet efficiencies as a function of the discriminator cut, they are shown as a function of the b-tagging efficiency at the same working point. This is done in the computeEffVsBEff method. The histograms are scanned bin-wise, and for each pair of b-efficiency and non-b-efficiency in a given bin an entry is filled into the histogram. The non-b-jet efficiency is also called the mistag rate, since tagging a non-b-jet is what you want to avoid when doing b-tagging. So the lower the mistag rate and the higher the b-tagging efficiency, the better the algorithm performs.

Questions:

  • The "Simple Secondary Vertex" can only tag jets, that have a reconstructed secondary vertex. With this information, can you explain why the mistag versus b-eff. curve stops before reaching the upper right corner?
  • What significance does the endpoint have and what information does this contain?

Exercise IV - b-Tagging with tracks

Now we will try to construct a simplified version of the "Track Counting" algorithm ourselves. Have a look at PatBJetTrackAnalyzer.cc. After the usual header inclusions, class and method definitions, we have the following members:

    1private:
    2        // configuration parameters
    3        edm::InputTag jets_;
    4        edm::InputTag tracks_;
    5        edm::InputTag beamSpot_;
    6        edm::InputTag primaryVertices_;

We will need jets (which we want to tag), tracks (we need impact parameters) and primary vertices; the latter are used to compute the impact parameters. We will only use an approximation for that, as the exact computation involves a few more complicated steps with the creation of transient tracks, which we will not go into. The beam spot is needed as well to make this approximation work (we will use the track's parameterisation reference point, which is closest to the beam spot).

    1double jetPtCut_;               // minimum (uncorrected) jet energy
    2double jetEtaCut_;              // maximum |eta| for jet
    3double maxDeltaR_;              // angle between jet and tracks

The first two are the jet kinematics cuts, as already seen in the previous exercise. The maxDeltaR_ parameter is new; it is used to decide whether a track is considered to be "inside" a jet or not. We use a Delta R = sqrt(Delta eta^2 + Delta phi^2) criterion with respect to the jet axis to associate tracks in a cone around the jet.

    1double minPt_;                  // track pt quality cut
    2unsigned int minPixelHits_;     // minimum number of pixel hits
    3unsigned int minTotalHits_;     // minimum number of total hits

These are very important. b-tagging is extremely sensitive to badly reconstructed tracks, so we demand a minimum transverse momentum (so that there is not too much curvature, energy loss, extrapolation uncertainty or fake rate). In addition, a minimum number of total hits and especially enough pixel hits for the extrapolation to the impact point are required as well.

    1unsigned int nThTrack_;         // n-th hightest track to choose

This is the key to the "Track Counting" algorithm. This algorithm uses exactly one impact parameter significance of a certain track as discriminator. This is how we choose this track: we simply compute the IP significance for all tracks in the jet, order them by IP significance and take the n-th highest track. For nThTrack_ a trade-off has to be found: if the number is too small, we risk selecting too many badly reconstructed tracks or tracks from unavoidable long-lived decays, e.g. from K_short or Lambda. On the other hand, the average charged decay multiplicity of a b-hadron is around five, and with our quality cuts it is effectively smaller than that. So we can't choose this number too large either, or we will miss the b-decay tracks. Good numbers are 2 or 3. We will go with 2, as it is the number chosen by the official "Track Counting High Efficiency" algorithm. You are welcome to change this number and see what happens.

The next part of the source file contains our flavour definition and is identical to the previous exercise, as is some histogram booking, which we will skip in order to go directly to the analyze method:

It starts off with loading of collections into edm::Handle variables, nothing new here.

    1// rare case of no reconstructed primary vertex
    2if (pvHandle->empty())
    3        return;
    4
    5// extract the position of the (most probable) reconstructed vertex
    6math::XYZPoint pv = (*pvHandle)[0].position();

We are lazy and will simply skip events that don't have a reconstructed primary vertex. If the event has one, we save the position of the first (i.e. the signal primary vertex) for later.

Then we will loop over all jets, check for the pT and |eta| cut, and check the jet flavour, as before.

    1[...]
    2// this vector will contain IP value / error pairs
    3std::vector<Measurement1D> ipValErr;

This variable will be used to store the impact parameters of all tracks. Measurement1D is a useful little helper class for handling values with errors and computing significances. The header can be found here.

The impact parameter significance is defined as the impact parameter itself divided by the error on its measurement. This gives us a significance for the hypothesis that the track is in fact displaced from the primary vertex. Tracks originating from the primary vertex have a gaussian IP distribution smeared around zero. Divided by the error, this gaussian is scaled to have a width of 1 (as discussed for the pull distribution of the primary vertex fit in the previous exercise). Hence, the significance of a track that genuinely comes from the PV is expected to be of order 1, whereas tracks from the b-decay should exhibit large tails in the IP significance distribution.

The Measurement1D class stores a pair of value and error and has three methods to access value, error or significance (the latter is just a simple division).
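
To make this concrete, here is a tiny sketch of how Measurement1D is used (the numbers are made up):

    // sketch: Measurement1D stores a value and its error
    #include "DataFormats/GeometryCommonDetAlgo/interface/Measurement1D.h"

    Measurement1D ip(0.034, 0.012);   // value and error (made-up numbers, in cm)
    double sig = ip.significance();   // equals ip.value() / ip.error()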

    1// now loop through all tracks
    2for(reco::TrackCollection::const_iterator track = tracksHandle->begin();
    3          track != tracksHandle->end(); ++track) {

For each jet we will now loop over the whole generalTracks track collection.

    1// check the quality criteria
    2if (track->pt() < minPt_ ||
    3           track->hitPattern().numberOfValidHits() < (int)minTotalHits_ ||
    4           track->hitPattern().numberOfValidPixelHits() < (int)minPixelHits_)
    5                   continue;

And apply our quality cuts. You should be familiar with these calls from the first exercise.

    1// check the Delta R between jet axis and track
    2// (Delta_R^2 = Delta_Eta^2 + Delta_Phi^2)
    3double deltaR = ROOT::Math::VectorUtil::DeltaR(
    4          jet->momentum(), track->momentum());

Here we use a helper function to compute the Delta R between jet and track. This could also be done by hand, but then you have to take care of the wrap-around in phi. The momentum() calls return the three-vectors of the track and jet momenta. We then plot the Delta R distribution in a histogram as a consistency check.
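
For reference, the by-hand version might look like the following sketch (assuming the deltaPhi helper from DataFormats/Math, which handles the phi wrap-around):

    // sketch: Delta R computed by hand
    #include "DataFormats/Math/interface/deltaPhi.h"

    double dEta = jet->eta() - track->eta();
    double dPhi = reco::deltaPhi(jet->phi(), track->phi());
    double deltaRByHand = std::sqrt(dEta * dEta + dPhi * dPhi);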

    1// only look at tracks in jet cone
    2if (deltaR > maxDeltaR_)
    3            continue;

And then ignore any tracks outside of the cone.

The next part is a bit tricky. As mentioned earlier, we use a simple approximation to compute the impact parameter: the dxy method we have been using before. This method can be passed a vertex, so we pass our primary vertex. If you have a look at this method in the TrackBase.h header file, it will tell you that a linear extrapolation of the track from its reference point is used. Thankfully, the primary vertex is close to the beam spot that is used as the reference point, so this will work (it will not work for secondary vertices; there you have to use more complex and precise machinery, which is not discussed here).

    1double ipError = track->dxyError();
    2double ipValue = std::abs(track->dxy(pv));

The next trick is to compute a sign for the impact parameter. Here we use the fact that real b-decays will always be in the jet direction and never behind it, whereas measurement errors will cause the impact parameters of genuine primary vertex tracks to appear half in front of and half behind the primary vertex (remember the gaussian distribution around zero).

So we define the sign as follows: if the point of closest approach of the track to the primary vertex is in the jet direction, we give it a positive sign, otherwise a negative one. We can use the dot product of the jet axis vector and the vector pointing to the point of closest approach for that. Again, we use an approximation and just compare the track reference point with the beam spot (in our approximation the problem is simply symmetric with respect to the translation between beam spot and primary vertex). Also, we are only computing the transverse impact parameter and not the full three-dimensional one, so we have to set the z component of the vector to zero.

    1math::XYZVector closestPoint = track->referencePoint() - beamSpot->position();
    2// only interested in transverse component, z -> 0
    3closestPoint.SetZ(0.);
    4double sign = closestPoint.Dot(jet->momentum());
    5
    6if (sign < 0)
    7        ipValue = -ipValue;
    8
    9ipValErr.push_back(Measurement1D(ipValue, ipError));

So, this is what it looks like.

    1// now order all tracks by significance (highest first)
    2std::sort(ipValErr.begin(), ipValErr.end(), significanceHigher);

This call sorts the vector of impact parameters by their significance. std::sort is a neat C++ tool, and significanceHigher is a simple helper function, defined further up in the source code, that compares the significances of two Measurement1D objects. After the call, the vector is sorted by descending IP significance.
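
Such a comparator could look as follows (a sketch; the actual helper in PatBJetTrackAnalyzer.cc may be written slightly differently):

    // sketch: strict-weak-ordering comparator for std::sort,
    // sorting by descending significance
    static bool significanceHigher(const Measurement1D &a, const Measurement1D &b)
    {
            return a.significance() > b.significance();
    }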

Now what we have to do is take the n-th track and plot it. Actually, for cross-checks we will first also plot the impact parameter distributions (the values, the errors and the significances) for all tracks, and then for the n-th track.

    1// check if we have enough tracks to fulfill the
    2// n-th track requirement
    3if (ipValErr.size() < nThTrack_)
    4       continue;
    5
    6// pick the n-th highest track
    7const Measurement1D *trk = &ipValErr[nThTrack_ - 1];
    8[...]

In addition, we will introduce a negative tagger here, where we are taking the n-th track from the end of the vector instead of from the beginning, effectively mirroring the whole problem. What this can be used for will be discussed later.

    1trk = &ipValErr[ipValErr.size() - nThTrack_];
    2[...]

Now execute the example by calling cmsRun analyzePatBJetTracks_cfg.py. In case of CMSSW_3_3_X, make sure to adapt the config file in a similar way as in the first exercise.

You can look at the plots using the macro patBJetTracks_showPlots.C or a TBrowser.

info In case of the DESY/NAF tutorial, please change the input dataset as described here. Additionally comment # process.patDefaultSequence * at the end of the configuration file: analyzePatBJetTracks_cfg.py

The macro will produce a canvas for each set of variables. Each set of impact parameter plots shows the value on the left, the error in the middle and the significance on the right. The top row shows the "flavour blind" distribution, i.e. for all kinds of jets, whereas the bottom row shows the curves separately for light flavour, c and b jets.

The first canvas (with the label "signed IP for all tracks") shows the distributions for all tracks in a jet. The green curve, the light flavour tracks, shows an approximately gaussian peak at zero, as expected. For b-jets you will see the large tail towards positive values, which means that b-jets contain tracks with lifetime in the direction of the jet. On the negative side you see that these tails are largely suppressed, due to the choice of our impact parameter sign. A few tracks apparently get the wrong sign, but this is unavoidable due to measurement errors.

The errors in the middle column are identical for all kinds of tracks. They should be, since reconstruction works the same way for all kinds of tracks.

In the right plot you see the significance distribution. The tails are even more visible than in the left plots, because we are correcting the value distribution for the errors; we will use these tails to do the b-jet tagging. The gaussian core should have a width of 1 (exercise: fit a gaussian to prove it; a sketch of how to do this is given below).
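
One way to do the suggested fit in an interactive ROOT session; note that the output file and histogram names below are guesses and need to be adapted to what the analyzer actually writes out:

    // sketch: fit a gaussian to the core of the significance distribution
    TFile *f = TFile::Open("analyzePatBJetTracks.root");   // assumed output file name
    TH1 *h = (TH1*)f->Get("analyzeBJetTracks/allIPSig");   // hypothetical histogram path
    h->Fit("gaus", "", "", -2., 2.);                       // restrict the fit to the core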

We now look at the second canvas, showing the distribution for one track per jet, namely the n-th highest (second-highest) track, according to the definition of the "Track Counting" algorithm. The plots are labelled "signed IP for selected positive track". We essentially get the same distributions as before, just biased to the right by our selection. In particular, for b-jets we tend to select a track from the b-decay itself most of the time. We can clearly see that the gaussian core around zero is largely suppressed in favour of the tail.

This corresponds now exactly to the discriminator distribution for the "Track Counting" algorithm as we've seen in the last exercise.

The third canvas, with the label "signed IP for selected negative tracks", shows the distributions for the negative tagger (the n-th last track). You may ask what this is for. Obviously not for b-tagging, as the tails in b-jets are largely suppressed here (due to our sign definition). This is exactly the point: it can be used to get a sort of "clean" sample of impact parameter distributions as expected from light flavour jets. If you look at the distribution for all jets (top row), these distributions look very much like the respective distributions of the light flavour jets for the positive-side n-th track (you have to mirror the distribution around zero in your head). We can use this to measure the b-tagging mistag rate on data without knowing the jet flavour!

Finally, the last canvas shows a few control plots, like the Delta R distribution between jet and tracks. You will see a peak at zero, obviously, and then a dip around 0.5. This is also expected, since the jet clustering algorithm (Iterative Cone 0.5) produces jets of exactly that size. The increase in tracks above this threshold means that these are already tracks from neighbouring jets. You will also see that b-jets contain on average more tracks than c or light flavour jets, a fact that is used in more complex b-tagging algorithms like the "Combined Secondary Vertex".

Now, execute the macro patBJetTracks_efficiencies.C. This will produce the efficiency plots as in the previous exercise, but now for our own algorithm.

In the first canvas you will see that it behaves similarly to the official "Track Counting" algorithm. The performance is just a bit worse. This is mostly due to the fact that the official algorithm has a fancier track selection with many more quality cuts, so it picks up fewer fake tracks or tracks from other long-lived decays.

The second canvas shows a comparison between the light flavour discriminator distribution and the inverted negative tagger distribution (on all jets). As expected, these distributions do in fact look very similar and can be used to measure the mistag rate on data (i.e. the efficiency versus discriminator cut for non-b jets). They do not coincide perfectly; this can be improved a bit with a few more tricks, but a small uncertainty always remains. The world is, after all, not perfect.

Exercise V - b-Tagging with vertices

This last exercise allows us to learn a bit more about secondary vertices. As in the previous example we will create two simple b-tagging algorithms. One of them corresponds to the "Simple Secondary Vertex" algorithm that we have already seen.

Let's go again straight to the analyzer code and open PatBJetVertexAnalyzer.cc. The whole part before the analyze method is skipped here, as there is nothing new to see. We will loop over the PAT jet collection only, as everything we need is embedded there. We also reuse the flavour definition from the two previous exercises and the jet kinematic cuts.

In the analyze method we loop over the jets again, cut on pT and |eta| and assign the flavour. Here we go:

    1// retrieve the "secondary vertex tag infos"
    2// this is the output of the b-tagging reconstruction code
    3// and contains secondary vertices in the jets
    4const reco::SecondaryVertexTagInfo &svTagInfo = *jet->tagInfoSecondaryVertex();

We retrieve the "secondary vertex tag infos" from the jet. The class is defined here. It contains the result of the inclusive secondary vertex reconstruction inside the jets. This means that it can contain zero, one or more secondary vertices. If there is more than one vertex, they are sorted by quality, i.e. the one with the smallest error is listed first.

    1// count the number of secondary vertices
    2plots_[ALL_JETS].nVertices->Fill(svTagInfo.nVertices());
    3plots_[flavour].nVertices->Fill(svTagInfo.nVertices());
    4
    5// ignore jets without SV from now on
    6if (svTagInfo.nVertices() < 1)
    7       continue;
    8
    9// pick the first secondary vertex (the "best" one)
   10const reco::Vertex &sv = svTagInfo.secondaryVertex(0);

Now we plot the number of reconstructed secondary vertices per jet. If the jet does not have a secondary vertex, we skip the rest of the method; otherwise we concentrate on the first (i.e. best) secondary vertex from now on.

The class definition for a vertex is the same as for primary vertices earlier, it is useful to open up the class header in a browser now (Vertex.h).

    1// and plot number of tracks and chi^2
    2plots_[ALL_JETS].nTracks->Fill(sv.tracksSize());
    3plots_[flavour].nTracks->Fill(sv.tracksSize());
    4
    5plots_[ALL_JETS].chi2->Fill(sv.chi2());
    6plots_[flavour].chi2->Fill(sv.chi2());

Here we fill the number of tracks and chi^2 of the vertex fit into histograms.

    1// the precomputed transverse distance to the primary vertex
    2Measurement1D distance = svTagInfo.flightDistance(0, true);

We can retrieve the distance between the secondary vertex and the primary vertex from the tag info (it is precomputed there). The "0" as the first argument means that we want the first (best) secondary vertex; the second argument "true" indicates that we want the result in two dimensions (transverse, in the x-y plane). The result is a Measurement1D object with value, error and significance, as for the impact parameters. We then fill these three values into histograms.
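
For reference, accessing the three quantities from the Measurement1D object is straightforward (a short sketch):

    // sketch: the three quantities filled into the histograms
    double dist    = distance.value();        // transverse flight distance
    double distErr = distance.error();
    double distSig = distance.significance(); // dist / distErr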

We will try to produce a few additional observables. The next part computes, for instance, the angle between the secondary vertex "flight direction" and the jet axis using the DeltaR helper method we've seen in the last exercise. This is not particularly exciting, so the explanation will be skipped here.

However, the last part is exciting: we compute the invariant mass of the secondary vertex! In order to do this, we need the four-vector sum of all tracks at the vertex. To build a four-vector, we assume that each track is a pion and assign it the charged pion mass. There are a few ROOT four-vector classes that can be used for this purpose. The vector classes are quite involved, so they are simply used here; please refer to the documentation for more details.

    1// compute the invariant mass from a four-vector sum
    2math::XYZTLorentzVector trackFourVectorSum;
    3
    4// loop over all tracks in the vertex
    5for(reco::Vertex::trackRef_iterator track = sv.tracks_begin();
    6          track != sv.tracks_end(); ++track) {
    7                        ROOT::Math::LorentzVector<ROOT::Math::PxPyPzM4D<double> > vec;
    8                        vec.SetPx((*track)->px());
    9                        vec.SetPy((*track)->py());
   10                        vec.SetPz((*track)->pz());
   11                        vec.SetM(0.13957);      // pion mass
   12                        trackFourVectorSum += vec;
   13}

You can see that we use the tracks_begin() and tracks_end() methods to loop over the tracks of the vertex, fill a four-vector with the momentum components, give the four-vector a mass and then sum them up.

    1// get the invariant mass: sqrt(E² - px² - py² - pz²)
    2double vertexMass = trackFourVectorSum.M();

This part is now trivial. You can also compute the mass by hand if it makes you feel more comfortable.

Ok, we can now run the example by calling cmsRun with analyzePatBJetVertex_cfg.py and look at the results using patBJetVertex_showPlots.C with ROOT. In case of CMSSW_3_3_X, make sure to adapt the config file in a similar way as in the first exercise.

info In case of the DESY/NAF tutorial, please change the input dataset as described here. Additionally comment # process.patDefaultSequence * at the end of the configuration file: analyzePatBJetVertex_cfg.py

The flight distance significance distribution is the same as for the "Simple Secondary Vertex" algorithm seen earlier. Note how the average b-vertex also has a smaller error, which is because on average it contains more real and genuinely displaced tracks. Compare this with the number of vertices found in the different types of jets and the average number of tracks at the vertices.

If you look at the vertex mass, you will see that b-jets are clearly distinguishable. This means that we can also use the vertex mass to define a discriminator for b-tagging. It can also be used for quite a number of other things, e.g. in fits for heavy flavour fraction measurements and the like (and in the "Combined Secondary Vertex" it is used as an input variable for b-tagging).

Now call root -l patBJetVertex_efficiencies.C and you will see the performance curves for the two variables just discussed, the flight distance significance and the vertex mass, used as discriminators. As you can see, both work quite well.

How to get more information

Review status

Reviewer/Editor and Date | Comments
ArminScheurer - 17 Jun 2009 | Created the page
ArminScheurer - 22 Dec 2009 | Reviewed the page
JyothsnaK - 25-Feb-2010 | Reviewed the page and added information on the global tag

Responsible: ArminScheurer, ChristopheSaout

Topic attachments
  • btag.jpg (102.1 K, 2009-06-30, ArminScheurer)
  • tracker.jpg (204.8 K, 2009-06-30, ArminScheurer)
Topic revision: r22 - 2010-08-06 - SudhirMalik
 