Introduction

This page was created for the PAT Tutorial June 2009 and last updated for the PAT Tutorial December 2009. The corresponding talks are on the Indico agenda (June 2009, September 2009, December 2009).

This tutorial shows you how to use the Top Quark Analysis Framework (TQAF), taking the semi-leptonic ttbar channel with muons as the example.

In the following you will learn:

  • how to produce an object of the TtSemiLeptonicEvent class that contains six different event hypotheses plus generator information
  • how to access the candidates from the hypotheses, the generator particles and meta information from the TtSemiLeptonicEvent
  • how to use b-tag information and flavour dependent jet energy corrections for certain event hypotheses
  • how to change the algorithm that is used for the generator particle based jet-parton matching

Previous tutorials

Here you can find the Twiki pages from previous tutorials: WorkBookPATExampleTopQuarksSeptember2009

How to get the code

First check out the latest tags of PAT and TQAF for CMSSW_3_3_4 as described in the TQAF recipes. Then check out a specific version of the TQAF Examples package for this tutorial:

cmsrel CMSSW_3_3_4
cd CMSSW_3_3_4/src
cmsenv
cvs co -r V06-03-07 AnalysisDataFormats/TopObjects
cvs co -r V06-03-07 TopQuarkAnalysis
cvs co -r B33X_PatTutorial_December2009 TopQuarkAnalysis/Examples

How to run the code

Note: This section introduces the most important files and commands for this tutorial. They are described in detail in the subsequent section, in which you will also be asked to run the code again.

Don't forget to compile the code (maybe on more than one core, e.g. with four processes simultaneously, to make it faster):

scram b -j4

You can now run a first test:

cmsRun TopQuarkAnalysis/Examples/test/analyzeTopHypotheses_cfg.py

You will get a file named analyzeTopHypothesis.root, which you can inspect using a ROOT macro provided in the Examples package:

root -l
.x TopQuarkAnalysis/Examples/test/analyzeTopHypotheses.C+

Tip: The macro not only displays all histograms on your screen, it also writes them into a PostScript file. If drawing the canvases with the histograms takes too much time, you can speed things up significantly by running ROOT in batch mode and looking at the results afterwards with a tool like ghostview:

root -l -b -q TopQuarkAnalysis/Examples/test/analyzeTopHypotheses.C+
gv analyzeTopHypotheses.ps

In order to compare different results, re-configure the HypothesisAnalyzer, rename the output root file (e.g. analyzeTopHypothesisReconfig.root) and run the HypothesisAnalyzer again. Then inspect the differences via a second root macro provided in the Examples package:

root -l -b -q 'TopQuarkAnalysis/Examples/test/analyzeTopHypothesesComparison.C+("analyzeTopHypothesisReconfig.root")'

gv analyzeTopHypothesesComparison.ps

Note that the macro always compares the ROOT file given as a parameter (here: "analyzeTopHypothesisReconfig.root") with the file named analyzeTopHypothesis.root, so both files have to exist.

Find out more about the details

The HypothesisAnalyzer runs on input files that are part of a PAT tuple created from a Pythia ttbar sample. A standard event selection for the semileptonic ttbar channel with a muon has already been applied to them; it is defined in:

TopQuarkAnalysis/Examples/python/TtSemiLepEvtSelection_cff.py

If you look into the cff file, you can see how the selectedLayer1 objects and clones of the standard PAT filters are used for a very quick but realistic PAG-specific event selection.


import FWCore.ParameterSet.Config as cms

#
# require exactly one isolated muon above 20 GeV in the central region
#

from PhysicsTools.PatAlgos.selectionLayer1.muonSelector_cfi import *
leadingMuons = selectedLayer1Muons.clone()
leadingMuons.src = 'selectedLayer1Muons'
leadingMuons.cut = 'pt > 20. & abs(eta) < 2.1 & (trackIso+caloIso)/pt < 0.1'

from PhysicsTools.PatAlgos.selectionLayer1.muonCountFilter_cfi import *
countLeadingMuons = countLayer1Muons.clone()
countLeadingMuons.src = 'leadingMuons'
countLeadingMuons.minNumber = 1
countLeadingMuons.maxNumber = 1

#
# require at least four jets above 30 GeV in the central region
#

from PhysicsTools.PatAlgos.selectionLayer1.jetSelector_cfi import *
leadingJets = selectedLayer1Jets.clone()
leadingJets.src = 'selectedLayer1Jets'
leadingJets.cut = 'pt > 30. & abs(eta) < 2.4'

from PhysicsTools.PatAlgos.selectionLayer1.jetCountFilter_cfi import *
countLeadingJets = countLayer1Jets.clone()
countLeadingJets.src = 'leadingJets'
countLeadingJets.minNumber = 4

ttSemiLepEvtSelection = cms.Sequence(leadingMuons *
                                     countLeadingMuons *
                                     leadingJets *
                                     countLeadingJets)

The applied cuts are:

  • exactly one isolated muon above 20 GeV in the central detector region and
  • at least four jets above 30 GeV in the central detector region
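The logic of these two cuts can be sketched in plain Python, independent of CMSSW. This is only an illustration of what the cut strings and count filters above express; the Muon and Jet classes, the function names and the toy numbers are hypothetical and not part of PAT or TQAF.

```python
from dataclasses import dataclass


@dataclass
class Muon:
    pt: float         # transverse momentum in GeV
    eta: float        # pseudorapidity
    track_iso: float  # tracker isolation sum in GeV
    calo_iso: float   # calorimeter isolation sum in GeV


@dataclass
class Jet:
    pt: float   # transverse momentum in GeV
    eta: float  # pseudorapidity


def is_leading_muon(mu):
    """Mirror of the cut string configured for leadingMuons."""
    rel_iso = (mu.track_iso + mu.calo_iso) / mu.pt
    return mu.pt > 20.0 and abs(mu.eta) < 2.1 and rel_iso < 0.1


def is_leading_jet(jet):
    """Mirror of the cut string configured for leadingJets."""
    return jet.pt > 30.0 and abs(jet.eta) < 2.4


def passes_selection(muons, jets):
    """Exactly one leading muon and at least four leading jets."""
    n_muons = sum(1 for mu in muons if is_leading_muon(mu))
    n_jets = sum(1 for jet in jets if is_leading_jet(jet))
    return n_muons == 1 and n_jets >= 4


# Toy event; all values are invented for illustration only.
muons = [Muon(35.0, 0.4, 1.0, 1.5), Muon(8.0, 1.0, 0.2, 0.3)]
jets = [Jet(80.0, 0.1), Jet(55.0, -1.2), Jet(42.0, 2.0),
        Jet(31.0, 0.7), Jet(25.0, 0.3)]
print(passes_selection(muons, jets))
```

In the toy event only the first muon and the first four jets survive their cuts, so the event is kept.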

If you look at the file

TopQuarkAnalysis/Examples/test/analyzeTopHypotheses_cfg.py

you see three sequences in the path:


process.path = cms.Path(process.makeGenEvt *
                        process.makeTtSemiLepEvent *
                        process.analyzeHypotheses)

The sequences makeGenEvt, makeTtSemiLepEvent and analyzeHypotheses will be described in the following.

The TtGenEvent

The sequence makeGenEvt is used to produce the TtGenEvent, a container class that keeps the genParticles of the ttbar decay chain and additionally provides some meta information obtained from the Monte Carlo generator information for each event. This sequence is defined in:

TopQuarkAnalysis/TopEventProducers/python/sequences/ttGenEvent_cff.py

The TtSemiLeptonicEvent

The sequence makeTtSemiLepEvent produces the TtSemiLeptonicEvent, a container class that provides a set of different event hypotheses for semileptonic ttbar events plus some meta information. The hypotheses differ mainly in the algorithm used to derive the jet-parton association within the events. The makeTtSemiLepEvent sequence is defined in:

TopQuarkAnalysis/TopEventProducers/python/sequences/ttSemiLepEvtBuilder_cff.py

If you uncomment the line

process.ttSemiLepEvent.verbosity = 1

and run analyzeTopHypotheses_cfg.py as described above, you will get a printout like the following for each event:

++++++++++++++++++++++++++++++++++++++++++++++++++
 TtGenEvent says: Semi-leptonic TtBar, Muon Channel
 Number of available event hypothesis classes: 6
 - JetLepComb: LightP LightQ  HadB   LepB  Lepton
--------------------------------------------------
 Geom-Hypothesis:
 * JetLepComb:   1      2      3      0      0
--------------------------------------------------
 WMassMaxSumPt-Hypothesis:
 * JetLepComb:   1      2      3      0      0
--------------------------------------------------
 MaxSumPtWMass-Hypothesis:
 * JetLepComb:   1      2      3      0      0
--------------------------------------------------
 GenMatch-Hypothesis:
 * JetLepComb:   1      2      0      3      0
 * Sum(DeltaR) : 1.92811
 * Sum(DeltaPt): 281.127
--------------------------------------------------
 MVADisc-Hypothesis:
 * JetLepComb:   0      2      1      3      0
 * Method  : ProcLikelihood
 * Discrim.: 0.770072
--------------------------------------------------
 KinFit-Hypothesis:
 * JetLepComb:   1      2      0      3      0
 * Chi^2      : 0.0588122
 * Prob(Chi^2): 0.971022
++++++++++++++++++++++++++++++++++++++++++++++++++

This shows that all six existing algorithms for event hypotheses in the TtSemiLeptonicEvent were run. The tabular printout tells you which jet was assigned to which parton; the indices refer to the selectedLayer1Jets from PAT, which were used as input for the algorithms. The last column shows which of the selectedLayer1Muons was chosen as the lepton for the event hypothesis. The standard printout also contains some meta information, such as the summed deltaR and summed deltaPt of the GenMatch, the MVA discriminator value, and the final chi-square of the kinematic fit together with its probability.
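If you want to compare the jet-parton assignments of two hypotheses from such a printout, a small helper outside of CMSSW can do it. The following is a minimal sketch that assumes the line format shown in the printout above; the function name parse_jet_lep_comb is hypothetical and not part of TQAF.

```python
# Column order as given in the header line of the printout.
ROLES = ["LightP", "LightQ", "HadB", "LepB", "Lepton"]


def parse_jet_lep_comb(line):
    """Turn a ' * JetLepComb: 1 2 0 3 0' line into {role: index}."""
    _, _, values = line.partition("JetLepComb:")
    indices = [int(token) for token in values.split()]
    if len(indices) != len(ROLES):
        raise ValueError("expected %d indices, got %d"
                         % (len(ROLES), len(indices)))
    return dict(zip(ROLES, indices))


# The two lines are copied from the example printout above.
gen_match = parse_jet_lep_comb(" * JetLepComb:   1      2      0      3      0")
geom = parse_jet_lep_comb(" * JetLepComb:   1      2      3      0      0")

# Roles on which the two hypotheses disagree.
differing = [role for role in ROLES if gen_match[role] != geom[role]]
print(gen_match)
print(differing)
```

For the example event, the Geom hypothesis swaps the two b jets with respect to the GenMatch hypothesis, while the light-quark jets and the lepton agree.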

The HypothesisAnalyzer

Finally, the sequence analyzeHypotheses invokes the HypothesisAnalyzer. If you have a look at the file

TopQuarkAnalysis/Examples/python/HypothesisAnalyzer_cff.py

you see that actually three different clones of the corresponding module are run with this sequence.


import FWCore.ParameterSet.Config as cms

#
# make simple analysis plots for a comparison
# between a gen match and two simple algorithmic
# event hypotheses
#

# initialize analyzers
from TopQuarkAnalysis.Examples.HypothesisAnalyzer_cfi import *
analyzeGenMatch      = analyzeHypothesis.clone()
analyzeMaxSumPtWMass = analyzeHypothesis.clone()
analyzeGeom          = analyzeHypothesis.clone()

# configure analyzers
analyzeGenMatch.hypoClassKey      = "kGenMatch"
analyzeMaxSumPtWMass.hypoClassKey = "kMaxSumPtWMass"
analyzeGeom.hypoClassKey          = "kGeom"

# define sequence
analyzeHypotheses = cms.Sequence(analyzeGenMatch *
                                 analyzeMaxSumPtWMass *
                                 analyzeGeom)

Let's go through the HypothesisAnalyzer now. The relevant files are:

TopQuarkAnalysis/Examples/plugins/HypothesisAnalyzer.h
TopQuarkAnalysis/Examples/plugins/HypothesisAnalyzer.cc
TopQuarkAnalysis/Examples/python/HypothesisAnalyzer_cfi.py

You can see in the source file how the TtSemiLeptonicEvent is read as a product from the edm::Event and made available to the analyzer.


semiLepEvt_ (cfg.getParameter<edm::InputTag>("semiLepEvent"))

edm::Handle<TtSemiLeptonicEvent> semiLepEvt;
event.getByLabel(semiLepEvt_, semiLepEvt);

The member semiLepEvt_ just holds the corresponding InputTag, which is "ttSemiLepEvent" as defined in the cfi file.

A second parameter is read from the configuration: a simple string, which is internally converted into the type TtSemiLeptonicEvent::HypoClassKey.


hypoClassKey(cfg.getParameter<std::string>("hypoClassKey"))

The HypoClassKey enumerator is defined in the TtSemiLeptonicEvent and used for accessing the hypothesis classes. The only difference between the three clones of the HypothesisAnalyzer lies in the strings that are used for the hypoClassKey. As can be seen from HypothesisAnalyzer_cff.py, the strings are "kGenMatch", "kMaxSumPtWMass" and "kGeom". This means that the same analyzer is run three times to study first the GenMatch hypothesis, then the hypothesis named MaxSumPtWMass and finally the purely geometric hypothesis.
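The idea of converting a configuration string into an enumerator can be sketched in plain Python as follows. This is not the TQAF implementation: only the names kGenMatch, kMaxSumPtWMass and kGeom appear in HypothesisAnalyzer_cff.py; the other three names are assumed by analogy with the hypothesis names in the printout above, and the numeric values as well as the function key_from_string are arbitrary choices for this example.

```python
from enum import Enum


class HypoClassKey(Enum):
    # Names confirmed by HypothesisAnalyzer_cff.py:
    kGeom = 0
    kMaxSumPtWMass = 1
    kGenMatch = 2
    # ASSUMPTION: names inferred from the printout, not confirmed here.
    kWMassMaxSumPt = 3
    kMVADisc = 4
    kKinFit = 5


def key_from_string(name):
    """Look up the enumerator that belongs to a configuration string."""
    try:
        return HypoClassKey[name]
    except KeyError:
        raise ValueError("unknown hypothesis class key: %r" % name)


for name in ("kGenMatch", "kMaxSumPtWMass", "kGeom"):
    print(name, "->", key_from_string(name))
```

Failing loudly on an unknown string, as done here, makes a typo in the configuration visible immediately instead of silently selecting the wrong hypothesis.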

After having read in the hypoClassKey, it is checked whether the corresponding hypothesis is available and valid in this event.


if( !semiLepEvt->isHypoValid(hypoClassKey) ){
  edm::LogInfo("HypothesisAnalyzer") << "Hypothesis not valid for this event";
  return;
}

A hypothesis is not available if the module that produces the hypothesis was not run.

A hypothesis is not valid if it is not available or if, for example, there is no appropriate lepton candidate in the event or there are not enough jets. In addition, the GenMatch hypothesis is not valid if the event was not generated in the semileptonic channel. So, depending on the event selection, you might get quite a few events with hypotheses that are not valid.

The candidates for the hadronic W boson and the hadronic top quark are accessed from the hypotheses in the analyzer using the following two lines:


const reco::Candidate* hadTop = semiLepEvt->hadronicDecayTop(hypoClassKey);
const reco::Candidate* hadW   = semiLepEvt->hadronicDecayW  (hypoClassKey);

Similar getter functions could be used to access the other particles of the ttbar decay chain:

  • hadronicDecayTop(const HypoKey& key)
  • hadronicDecayB(const HypoKey& key)
  • hadronicDecayW(const HypoKey& key)
  • hadronicDecayQuark(const HypoKey& key)
  • hadronicDecayQuarkBar(const HypoKey& key)
  • leptonicDecayTop(const HypoKey& key)
  • leptonicDecayB(const HypoKey& key)
  • leptonicDecayW(const HypoKey& key)
  • singleNeutrino(const HypoKey& key)
  • singleLepton(const HypoKey& key)

In the HypothesisAnalyzer just the very basic properties pt(), eta() and mass() of the candidates for the hadronic W and the hadronic top are taken and filled into histograms.

In the next step, the corresponding generator particles are obtained:


const reco::GenParticle* genHadTop = semiLepEvt->hadronicDecayTop();
const reco::GenParticle* genHadW   = semiLepEvt->hadronicDecayW();

They are used in the analyzer to calculate the so-called pulls in pT, pseudorapidity and mass, i.e. the difference between the reconstructed and the generated value divided by the generated value. These are also filled into histograms.
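As a quick illustration of this definition, the quantity can be written as a one-line function. This is a sketch in plain Python, not code from the HypothesisAnalyzer; the function name and the numbers are invented for the example.

```python
def relative_residual(reco, gen):
    """(reco - gen) / gen, the quantity this tutorial calls a pull."""
    if gen == 0.0:
        raise ZeroDivisionError("generated value is zero")
    return (reco - gen) / gen


# Invented example: reconstructed mass of 180 GeV, generated mass of 160 GeV.
print(relative_residual(180.0, 160.0))
```

Note that the division by the generated value is ill-defined when that value is zero, which can in principle happen for the pseudorapidity; the sketch raises an error in that case.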

In the last step of the analyzer, a quality criterion of the GenMatch hypothesis, the summed deltaR between the jets and the partons, is read from the TtSemiLeptonicEvent via semiLepEvt->genMatchSumDR() and filled into a histogram.

The HypothesisAnalyzer as checked out from CVS will run on 1000 events. Please set the maxEvents parameter in the analyzeTopHypotheses_cfg.py to a larger number if you feel that you need higher statistics for the histograms.
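A configuration fragment for this could look as follows. It is a sketch that assumes analyzeTopHypotheses_cfg.py defines process.maxEvents in the usual CMSSW way, with an untracked parameter named input; the value 5000 is an arbitrary example. The line only works inside the config file, after the process has been defined.

```python
# ASSUMPTION: process.maxEvents already exists in the config file
# with the standard 'input' parameter.
# A value of -1 processes all events in the input files.
process.maxEvents.input = 5000
```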

If you then run the analyzer and afterwards inspect the results with the analyzeTopHypotheses.C ROOT macro, you will see three canvases. The distributions for the hadronic W can be found on the first canvas and those for the hadronic top on the second. The third canvas shows the summed deltaR of the GenMatch hypothesis.

On the third canvas you will see that in most events the summed deltaR between the partons and the jets in the GenMatch hypothesis is smaller than 0.5, but that there are also quite a few events with significantly larger values. To increase the purity of the GenMatch, you could use an outlier rejection and/or a different algorithm for the jet-parton matching. By including the following line in your config file, you choose the unambiguousOnly algorithm, which only accepts unambiguous jet-parton pairs within cones of 0.3 in deltaR:

process.ttSemiLepJetPartonMatch.algorithm = "unambiguousOnly"

This increases the purity of the GenMatch but at the same time reduces the efficiency, i.e. you will get fewer events with a valid GenMatch hypothesis.
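To make the trade-off between purity and efficiency concrete, here is a minimal sketch of one plausible reading of "unambiguous": every parton must have exactly one jet inside its cone, and no jet may be claimed by two partons. The actual TQAF implementation may differ in its details; all function names and the toy (eta, phi) values are hypothetical.

```python
import math


def delta_phi(phi1, phi2):
    """Azimuthal difference wrapped into (-pi, pi]."""
    dphi = phi1 - phi2
    while dphi > math.pi:
        dphi -= 2.0 * math.pi
    while dphi <= -math.pi:
        dphi += 2.0 * math.pi
    return dphi


def delta_r(a, b):
    """Distance in the (eta, phi) plane; a and b are (eta, phi) tuples."""
    return math.hypot(a[0] - b[0], delta_phi(a[1], b[1]))


def match_unambiguous(partons, jets, max_dr=0.3):
    """Return {parton index: jet index}, or None if any pair is ambiguous."""
    match = {}
    for p, parton in enumerate(partons):
        in_cone = [j for j, jet in enumerate(jets)
                   if delta_r(parton, jet) < max_dr]
        if len(in_cone) != 1:
            return None  # no jet, or more than one jet, in the cone
        match[p] = in_cone[0]
    if len(set(match.values())) != len(match):
        return None  # one jet lies in the cones of two partons
    return match


def sum_delta_r(partons, jets, match):
    """Summed deltaR over all matched parton-jet pairs."""
    return sum(delta_r(partons[p], jets[j]) for p, j in match.items())


# Toy (eta, phi) values, invented for illustration only.
partons = [(0.0, 0.0), (1.0, 1.0), (-1.0, 2.0), (2.0, -2.0)]
jets = [(0.1, 0.0), (1.0, 1.2), (-1.1, 2.1), (2.0, -2.1), (0.0, 3.0)]

match = match_unambiguous(partons, jets)
print(match)
print(sum_delta_r(partons, jets, match))

# A second jet close to the first parton makes the event ambiguous.
print(match_unambiguous(partons, jets + [(0.0, 0.2)]))
```

The last call returns None: the event is rejected rather than matched with a possibly wrong jet, which is exactly how purity is gained at the cost of efficiency.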

Now have a look into analyzeTopHypotheses_cfg.py again. You will find some lines about b-tagging:


## use b-tagging to distinguish between light and b jets
process.ttSemiLepHypGeom.useBTagging = False
## choose algorithm for b-tagging
process.ttSemiLepHypGeom.bTagAlgorithm = "trackCountingHighEffBJetTags"
## minimum b discriminator value required for b jets
process.ttSemiLepHypGeom.minBDiscBJets     = 1.90
## maximum b discriminator value allowed for non-b jets
process.ttSemiLepHypGeom.maxBDiscLightJets = 3.99

There you can switch on the use of b-tagging and decide which b-tagging algorithm is to be used. If you set useBTagging = True, only jets with a discriminator value above the minBDiscBJets value are taken into account for the b jet assignment, and only jets with a discriminator value below the maxBDiscLightJets value are taken into account for the light jet assignment.
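The effect of the two thresholds can be sketched in plain Python. This is an illustration of the rule described above, not the TQAF code; the function name and the toy discriminator values are hypothetical, and whether the real comparison is strict or inclusive is an assumption.

```python
MIN_BDISC_BJETS = 1.90      # value from the config lines above
MAX_BDISC_LIGHTJETS = 3.99  # value from the config lines above


def split_candidates(discriminators, use_btagging=True):
    """Return (b jet candidates, light jet candidates) as index lists."""
    indices = list(range(len(discriminators)))
    if not use_btagging:
        # Without b-tagging every jet may take either role.
        return indices, list(indices)
    # ASSUMPTION: strict comparisons; the tutorial only says above/below.
    b_candidates = [i for i in indices
                    if discriminators[i] > MIN_BDISC_BJETS]
    light_candidates = [i for i in indices
                        if discriminators[i] < MAX_BDISC_LIGHTJETS]
    return b_candidates, light_candidates


# Toy discriminator values for four jets, invented for illustration.
disc = [5.2, 0.3, 2.5, -1.0]
b_candidates, light_candidates = split_candidates(disc)
print(b_candidates)
print(light_candidates)
```

With these thresholds, a jet whose discriminator lies between 1.90 and 3.99 (the third jet in the toy example) remains a candidate for both roles.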

In addition you can specify the jet energy correction level:


process.ttSemiLepHypGeom.jetCorrectionLevel = "abs"

If you choose a flavour-dependent jet energy correction, the flavour is taken directly from the event hypothesis. As the hypotheses cannot distinguish between uds and c jets, a weighted mixture of the uds correction factor and the c correction factor is applied to the light jets.

Results

Please switch on the use of b-tagging (useBTagging = True), choose the hadron-level correction (process.ttSemiLepHypGeom.jetCorrectionLevel = "had") and rename the output file (fileName = cms.string('analyzeTopHypothesisBTagHadCor.root')). Then run the HypothesisAnalyzer again:

cmsRun TopQuarkAnalysis/Examples/test/analyzeTopHypotheses_cfg.py

Then compare the new results with the previous ones:

root -l -b -q TopQuarkAnalysis/Examples/test/analyzeTopHypothesesComparison.C+
gv analyzeTopHypothesesComparison.ps

You will finally get plots like these (for these plots the unambiguousOnly algorithm is not used):

[Figures: canvasHadTopEta.png, canvasHadTopMass.png, canvasHadWEta.png, canvasHadWMass.png, canvasGenMatchQuali.png]

How to get more information

  • The main TQAF TWiki page is the SWGuideTQAF.
  • Documentation of the classes TopGenEvent and TtSemiLeptonicEvent can be found in the SWGuideTQAFClasses.
  • Some more information on the MVA based jet-parton association as well as on the algorithms for the GenEvent based jet-parton matching can be found in the SWGuideTQAFLayer2.
  • The main documentation of the PhysicsTools MVA package is in the SWGuideMVAFramework.

Review status

Reviewer/Editor and Date | Comments
HolgerEnderle - 02 Dec 2009 | Updated for the PAT Tutorial December 2009

Responsible: SebastianNaumann

Topic revision: r1 - 2010-03-10 - RogerWolf
 