Contents

Introduction

This page was created for the PAT Tutorial March 2010. You can find the corresponding talk on the Indico agenda.

This tutorial will show you examples of usage for the Top Quark Analysis Framework (TQAF). For this purpose, the dileptonic ttbar channel with muons was chosen.

In the following you will learn:

  • how to produce an object of the TtFullLeptonicEvent class that contains two different event hypotheses plus generator information
  • how to access the candidates from the hypotheses, the generator particles and meta information from the TtFullLeptonicEvent
  • how to change the most important parameters of the hypotheses building algorithms
  • how to use b-tag information and different jet energy corrections for certain event hypotheses

Previous tutorials which concentrated on the single-muon channel

Here you can find the Twiki pages from previous tutorials: (June 2009, September 2009, December 2009).

How to get the code

For this tutorial we use CMSSW_3_5_4. You don't have to check out the whole TopQuarkAnalysisFramework but only the Examples which contain the files you need here and the latest tag of TopEventProducers to be able to use the last implemented features:

cmsrel CMSSW_3_5_4
cd CMSSW_3_5_4/src
cmsenv
cvs co -r V06-05-02 TopQuarkAnalysis/TopEventProducers
cvs co -r B35X_PatTutorial_March2010 TopQuarkAnalysis/Examples

How to run the code

Info This section introduces the most important files and commands for this tutorial. They will be described in detail in the subsequent section, in which you will also be asked again to run the code.

Don't forget to compile the code (maybe on more than one core, e.g. with four processes simultaneously, to make it faster):

scram b -j4

You can now run a first test:

cmsRun TopQuarkAnalysis/Examples/test/analyzeFullLepHypotheses_cfg.py

You will get a file named analyzeFullLepHypothesis.root which you can inspect using a root macro that is provided in the Examples package and writes the histograms into a postscript file:

root -l -b -q TopQuarkAnalysis/Examples/test/analyzeFullLepHypotheses.C
gv FullLepHypotheses.ps

In order to compare different results, re-configure the FullLepHypothesisAnalyzer, rename the output root file and run the FullLepHypothesisAnalyzer again. Then inspect the differences.

Find out more about the details

The FullLepHypothesisAnalzer runs on input files that are part of a PAT tuple created from a Madgraph ttbar sample. It is preselected on generator level and contains only dimuon events. In the configuration file

TopQuarkAnalysis/Examples/test/analyzeFullLepHypotheses_cfg.py
you see some inactive lines which are commented.


import FWCore.ParameterSet.Config as cms

process = cms.Process("TEST")

## configure message logger
process.load("FWCore.MessageLogger.MessageLogger_cfi")
process.MessageLogger.cerr.threshold = cms.untracked.string('INFO')
process.MessageLogger.cerr.FwkReport.reportEvery = 1 
process.MessageLogger.categories.append('TtFullLeptonicEvent')
process.MessageLogger.categories.append('BtagFilter')
process.MessageLogger.cerr.TtFullLeptonicEvent = cms.untracked.PSet(
    limit = cms.untracked.int32(-1)   
)

## define input
process.source = cms.Source("PoolSource",
    fileNames = cms.untracked.vstring(
      'rfio:/castor/cern.ch/user/d/dammann/pattutorial/ttbar_dimuon_pat.root'
    )                           
)

## define maximal number of events to loop over
process.maxEvents = cms.untracked.PSet(
    input = cms.untracked.int32(100)
)

## configure process options
process.options = cms.untracked.PSet(
    wantSummary = cms.untracked.bool(True)
)

## filter sequence
process.load("TopQuarkAnalysis.Examples.FullLeptonicSelection_cff")

## sequences for ttGenEvent and TtFullLeptonicEvent
process.load("TopQuarkAnalysis.TopEventProducers.sequences.ttGenEvent_cff")
process.load("TopQuarkAnalysis.TopEventProducers.sequences.ttFullLepEvtBuilder_cff")
## enable additional per-event printout from the TtFullLeptonicEvent
process.ttFullLepEvent.verbosity = 0

## helper functions to change parameters of event reconstruction
## (can be used from tag v-6-05-02 upwards
from TopQuarkAnalysis.TopEventProducers.sequences.ttFullLepEvtBuilder_cff import *

## change maximum number of jets taken into account per event (default: 4)
#setForAllTtFullLepHypotheses(process,"maxNJets",4)

## change input muon collection
#setForAllTtFullLepHypotheses(process,"muons","myMuons")

## change input jet collection
#setForAllTtFullLepHypotheses(process,"jets","myJets")

## change jet correction level
#setForAllTtFullLepHypotheses(process,"jetCorrectionLevel","part")


## filter for b-tagging
process.load("TopQuarkAnalysis.Examples.BtagFilter_cfi")
process.filterBtag.jets = cms.InputTag("myJets")

## load HypothesisAnalyzer
process.load("TopQuarkAnalysis.Examples.FullLepHypothesisAnalyzer_cff")

# register TFileService
process.TFileService = cms.Service("TFileService",
    fileName = cms.string('analyzeFullLepHypotheses.root')
)

## end path   
process.path = cms.Path(#process.fullLeptonicSelection *
                        process.makeGenEvt *
                        process.makeTtFullLepEvent *
			#process.filterBtag *
                        process.analyzeHypotheses)

If you look into the path


process.path = cms.Path(#process.fullLeptonicSelection *
                        process.makeGenEvt *
                        process.makeTtFullLepEvent *
			#process.filterBtag *
                        process.analyzeHypotheses)

you see that the fullLeptonicSelection and filterBtag are commented. They will be explained later. First the already included modules are described.

The TtGenEvent

The sequence makeGenEvt is used to produce the TtGenEvent, a container class that keeps the genParticles of the ttbar decay chain and additionally provides some meta information obtained from the Monte Carlo generator information for each event. This sequence is defined in:

TopQuarkAnalysis/TopEventProducers/python/sequences/ttGenEvent_cff.py

The TtFullLeptonicEvent

The sequence makeTtFullLepEvt produces the TtFullLeptonicEvent, a container class that provides twodifferent event hypotheses for dileptonic ttbar events plus some meta information. The main differences between the hypotheses consist in the algorithms used to derive the jet-parton association within the events. The makeTtFullLepEvt sequence is defined in:

TopQuarkAnalysis/TopEventProducers/pyhton/sequences/ttFullLepEvtBuilder_cff.py

If you change the value in the line

process.ttFullLepEvent.verbosity = 0

to 1 and run the analyzeFullLepHypotheses_cfg.py, as described above, you will get a printout like the following for each event:

+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 TtGenEvent says: Fully Leptonic TtBar, Muon-Muon-Channel
 Number of available event hypothesis classes: 2
 - JetLepComb:   b     bbar   e1(+)  e2(-)  mu1(+) mu2(-)
------------------------------------------------------------
 GenMatch-Hypothesis:
 * JetLepComb:   1      0      -1      -1      0      1
 * Sum(DeltaR) : 1.64556
 * Sum(DeltaPt): 61.0049
------------------------------------------------------------
 KinSolution-Hypothesis:
 * JetLepComb:   0      1      -1      -1      0      1
 * Weight      : 0.925721
 * isWrongCharge: 0
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

You can see that both existing algorithms the TtFullLeptonicEvent were run and you can read from the tabular printout which jet from the jet collection was assigned to be the b-jet and which one the bbar-jet. In this event example the GenMatch hypothesis matches the first jet in the collection with the index 0 to the bbar and the second leading jet with index 1 to the b-quark while the KinSolution hypothesis reconstructs the jets vice versa.

The next to numbers which are -1 in both hypotheses give the indices of the electrons used in the event reconstruction where -1 means that no electrons have been included in the hypothesis. Since we use a dimuon sample and reconstruct only dimuon events it is clear that the electron indices should be -1.

The last indices are those of the positive and the negative muon. In most cases the numbers should be the same for both hypotheses. In most cases the leading two muons are from the ttbar decay chain and the KinSolution hypothesis only takes these two muons into account.

Also you see for each hypothesis further information. For the GenMatch hypothesis the sum of the deltaR between generated partons and matched jets and deltaR between generated and reconstructed muons are shown. This quantity can be used as an indicator of the quality of the hypothesis. The same is true for Sum(DeltaPt) which is the same with deltaPt instead of deltaR.

For the KinSolution you see the weight of the best kinematic solution found in the event. It is a number between zero and one (or -1 if no solution has been found). The boolean isWrongCharge gives information if both leptons in the hypothesis have the same charge. Because by default only events with oppositely charged leptons are reconstructed this should always be zero.

The FullLepHypothesisAnalyzer

Finally, the sequence analyzeHypotheses invokes the FullLepHypothesisAnalyzer. If you have a look at the file

TopQuarkAnalysis/Examples/python/FullLepHypothesisAnalyzer_cff.py

you see that actually two different clones of the corresponding module are run with this sequence.


import FWCore.ParameterSet.Config as cms

#
# make simple analysis plots for a comparison
# between a kinematic reconstruction and a
# gen match event hypothesis
#

# initialize analyzers
from TopQuarkAnalysis.Examples.FullLepHypothesisAnalyzer_cfi import *
analyzeGenMatch    = analyzeFullLepHypothesis.clone()
analyzeKinSolution = analyzeFullLepHypothesis.clone()


# configure analyzers
analyzeGenMatch.hypoClassKey    = "ttFullLepHypGenMatch:Key"
analyzeKinSolution.hypoClassKey = "ttFullLepHypKinSolution:Key"

# define sequence
analyzeHypotheses = cms.Sequence(analyzeGenMatch *
                                 analyzeKinSolution)

Let's go through the FullLepHypothesisAnalyzer now. The relevant files are:

TopQuarkAnalysis/Examples/plugins/FullLepHypothesisAnalyzer.h
TopQuarkAnalysis/Examples/plugins/FullLepHypothesisAnalyzer.cc
TopQuarkAnalysis/Examples/python/FullLepHypothesisAnalyzer_cfi.py

You can see in the source file how the TtFullLeptonicEvent is read as a product from the edm::Event and made available to the analyzer.


fullLepEvt_ (cfg.getParameter<edm::InputTag>("fullLepEvent"))

edm::Handle<TtFullLeptonicEvent> fullLepEvt;
event.getByLabel(fullLepEvt_, fullLepEvt);

The member fullLepEvent_ just holds the corresponding InputTag which is "ttFullLepEvent" as defined in the cfi file.

A second product is read from the event: A simple string which is internally converted into the type of TtFullLeptonicEvent::HypoClassKey.


hypoClassKey(cfg.getParameter<std::string>("hypoClassKey"))

The HypoClassKey enumerator is defined in the TtFullLeptonicEvent and used for accessing the hypothesis classes. The only difference between the two clones of the FullLepHypothesisAnalyzer consists in the input tags that are used for the hypoClassKey. As can be seen from the FullLepHypothesisAnalyzer_cff.py, the first input tag belongs to the "GenMatch" and the second to the "KinSolution" hypothesis. This means, the same analyzer is run two times to study first the GenMatch hypothesis and then the hypothesis named !KinSolution.

After having read in the hypoClassKey, it is checked whether the corresponding hypothesis is available and valid in this event.


if( !fullLepEvt->isHypoAvailable(hypoClassKey) ){
  edm::LogInfo("FullLepHypothesisAnalyzer") << "Hypothesis not available for this event";
  return;
}

if( !fullLepEvt->isHypoValid(hypoClassKey) ){
  edm::LogInfo("FullLepHypothesisAnalyzer") << "Hypothesis not valid for this event";
  return;
}

A hypothesis is not available if the module that produces the hypothesis was not run.

A hypothesis is not valid if it is not available or if, for example, there is no appropriate lepton candidate in the event or if there are not enough jets. And the GenMatch hypothesis is not valid if the event was not generated in the dileptonic channel. So, depending on the event selection, you might get quite some events with hypotheses that are not valid.

The candidates for the top quarks and W-bosons are accessed from the hypotheses in the analyzer using the following lines:


const reco::Candidate* top    = fullLepEvt->top(hypoClassKey);
 const reco::Candidate* topBar = fullLepEvt->topBar(hypoClassKey);
 
 const reco::Candidate* wPlus  = fullLepEvt->wPlus(hypoClassKey);
 const reco::Candidate* wMinus = fullLepEvt->wMinus(hypoClassKe

Similar getter functions could be used to access the other particles of the ttbar decay chain:

  • top(const HypoKey& key)
  • b(const HypoKey& key)
  • wPlusW(const HypoKey& key)
  • leptonBar(const HypoKey& key)
  • neutrino(const HypoKey& key)
  • topBar(const HypoKey& key)
  • bBar(const HypoKey& key)
  • wMinus(const HypoKey& key)
  • lepton(const HypoKey& key)
  • neutrinoBar(const HypoKey& key)

In the FullLepHypothesisAnalyzer just the very basic properties pt(), eta() and mass() of the candidates for the top quarks and W-bosons are taken and filled into histograms.

In the next step, the corresponding generator particles of the top and the W^+ are obtained:


const reco::GenParticle* genTop   = fullLepEvt->genTop();
const reco::GenParticle* genWPlus = fullLepEvt->genWPlus();

They are used in the analyzer to calculate the so called pulls in pT, pseudorapidity and mass, i.e. the difference between the reconstructed and the generated values divided by the generated value, which are also filled into histograms.

After that the two-dimensional energy spectrums of the neutrinos is plottet for as well the reconstructed as the generated neutrinos

In the last step of the analyzer some histograms about the quality histograms are filled: for both hypotheses the difference between top and anti-top mass is plotted. The second histogram is for the weight of the kinematic event solution. For the GenMatch hypothesis the summed deltaR between the jets and the partons plus the deltaR between the generated leptons and leptons used for the hypothesis is the criterion for the quality.

The FullHypothesisAnalyzer as checked out from CVS will run on 100 events. Please set the maxEvents parameter in the analyzeTopHypotheses_cfg.py to a larger number if you feel that you need higher statistics for the histograms. The sample contains more than 3000 events.

If you then run the analyzer and afterwards inspect the results with the analyzeTopHypotheses.C root macro, you will see five pages of output.

Now have a look into analyzeFullLepHypotheses_cfg.py again. We now come to the commented sequence fullLeptonicSelection in the path. Also in the config you find the lines


## filter sequence
process.load("TopQuarkAnalysis.Examples.FullLeptonicSelection_cff

In this file the filter sequence is defined:

import FWCore.ParameterSet.Config as cms


## selectors to build new collection of objects fullfilling given cuts
from CMS.PhysicsTools.PatAlgos.selectionLayer1.muonSelector_cfi import *
from CMS.PhysicsTools.PatAlgos.selectionLayer1.jetSelector_cfi import *

## muon selection
myMuons = selectedPatMuons.clone(src = 'selectedPatMuons', 
                                    cut = 'pt > 20. & abs(eta) < 2.4' )
## jet selection
myJets = selectedPatJets.clone(src = 'selectedPatJets', 
                                  cut = 'pt > 30. & abs(eta) < 2.4' )



## count filters to cut on the number of objects in a given collection
from CMS.PhysicsTools.PatAlgos.selectionLayer1.muonCountFilter_cfi import *
from CMS.PhysicsTools.PatAlgos.selectionLayer1.jetCountFilter_cfi import *

## at least two muons
muonSelection  = countPatMuons.clone(src = 'myMuons', minNumber = 2)

## at least two jets
jetSelection  = countPatJets.clone(src = 'myJets', minNumber = 2)


## define filter sequences
buildCollections = cms.Sequence(myMuons * myJets)

filterObjects = cms.Sequence(muonSelection * jetSelection)

fullLeptonicSelection = cms.Sequence(buildCollections * filterObjects)

You see two PAT object selectors which are imported from muonSelector_cfi and jetSelector_cfi which read in the default collection selectedPatMuons and selectedPatJets and apply cuts on them. The muons resp. jets that fullfill this cuts are written into a new collection myMuons resp myJets.

Further you see two count filters where the first demands that an event has to contain at least two myMuons. The second count filter does the same for jets.

So as result the sequence fullLeptonicSelection filters on events that contain at least two muons with Pt > 20GeV and |eta| <2.4 and two jets with Pt>30GeV and |eta|<2.4.

Of course you apply such a selection with the idea to use only these objects from the new collections to build event hypotheses. To achieve this the config file contains the commented lines


## change input muon collection
#setForAllTtFullLepHypotheses(process,"muons","myMuons")

## change input jet collection
#setForAllTtFullLepHypotheses(process,"jets","myJets")

By uncommenting them you achieve that the input collections used by both event hypotheses are switched from their default values selectedPatMuons, selectedPatJets to the new collections myMuons and myJets.

There are two other commented helper functions


## change maximum number of jets taken into account per event (default: 4)
#setForAllTtFullLepHypotheses(process,"maxNJets",4)

## change jet correction level
#setForAllTtFullLepHypotheses(process,"jetCorrectionLevel","part")

to change the number of the jets taken into account by the hypothesis algorithms and two change the level of the jet energy correction.

As the last point a b-tagging selection can be applied. A simple b-tagging filter is implemented in the examples package. You find the code in

TopQuarkAnalysis/Examples/python/BtagFilter_cfi.py
TopQuarkAnalysis/Examples/plugins/BtagFilter.cc
TopQuarkAnalysis/Examples/plugins/BtagFilter.cc
In the cfi-file you see that the track counting algorithm trackCountingHighEffBJetTags is used. The values for the discriminator cuts are defined in the line
 bDiscriminator = cms.vdouble(2.,2.)
If only one argument is given only one b-tagged jet is demanded in the event.

The code itself gives an example how to access the original PAT objects from a given hypothesis. The particles stored in the FullLeptonicEvent are reco::Candidates and not PAT objects so the indices you have already seen in the event printout are used to access the original PATobjects as seen in the code of the BtagFilter:


int bJetIdx    = FullLepEvt->jetLeptonCombination(hypoKey)[0];
  int bBarJetIdx = FullLepEvt->jetLeptonCombination(hypoKey)[1];  

  const pat::Jet B    = jets->at(bJetIdx);   
  const pat::Jet BBar = jets->at(bBarJetIdx);   

From this pat::Jets you can directly access the b-tagging discriminator using
jet.bDiscriminator(const string algorithm);
as seen in the code. To use the same jet collection in the filter as in the hypotheses reconstruction is paramount. You see that the BtagFilter is included in our configuration file with the lines

## filter for b-tagging
process.load("TopQuarkAnalysis.Examples.BtagFilter_cfi")
process.filterBtag.jets = cms.InputTag("myJets")

where the input jet collection is already set to our personal collection myJets. You can activate the b-tagging by simply uncommenting the line
#process.filterBtag *
in your path.

Results

In this tutorial you have learned how to run the FullLeptonicEvent hypotheses and how two modify some of its most important parameters like e.g. input collections. Further the analysis path was combined with a PAT based object selection. With the indices stored in the event hypotheses you can get the original PAT objects and use their methods.

How to get more information

  • The main TQAF TWiki page is the SWGuideTQAF.
  • Documentation of the classes TopGenEvent and TtSemiLeptonicEvent can be found in the SWGuideTQAFClasses.

Review status

Reviewer/Editor and Date (copy from screen) Comments
DirkDammann - 11 Mar 2010 Updated for the PAT Tutorial March 2010

Responsible: DirkDammann

Edit | Attach | Watch | Print version | History: r5 < r4 < r3 < r2 < r1 | Backlinks | Raw View | WYSIWYG | More topic actions
Topic revision: r5 - 2010-03-12 - DirkDammann
 
    • Cern Search Icon Cern Search
    • TWiki Search Icon TWiki Search
    • Google Search Icon Google Search

    CMSPublic All webs login

This site is powered by the TWiki collaboration platform Powered by PerlCopyright & 2008-2019 by the contributing authors. All material on this collaboration platform is the property of the contributing authors.
Ideas, requests, problems regarding TWiki? Send feedback