TQAF Tutorial
Introduction
This tutorial has not been fully adapted to the structural changes that came along with CMSSW_2_2_10 and PAT_v2!
This section contains various examples of how to use TQAF Layer1 and 2 objects for analyses within the full framework, FWLite or SK. They are simple and focus on basic principles, such as accessing the information relevant for an analysis and creating histograms with the TFileService. All examples should run out of the box within CMSSW_2_2_X as indicated on the TQAF Installation page unless stated otherwise. In case of problems, check the sanity of your installation as described on the TQAF Sanity Tests page. If the problem persists, send a mail with a detailed description to the Top PAG Hypernews or to the indicated contact person.
Analyze TQAF Layer1 Objects
This section contains detailed examples of how to analyze TQAF Layer 1 objects within the full framework or FWLite. Within the full framework the corresponding layers can be produced on the fly (as indicated in the corresponding cfg files of each example) or in advance. For an example of how to produce TQAF Layer1 or 2 objects from fullsim AOD or from scratch with Fastsim, have a look at the TQAF Examples. In this section you will learn:
- how to use the TFileService
- how to access TQAF Layer 1 object information
- how to produce TQAF Layer 1 objects on the fly
- how to run an EDAnalyzer to analyze TQAF Layer 1 objects
- how to run a FWLite executable to analyze TQAF Layer 1 objects
Full Framework
Contact person
Maryam Zeinali
The EDAnalyzer class, or any other class that inherits from it, contains three member functions: beginJob, analyze and endJob. In beginJob you can open the root file to which the results of the analysis will be saved. All data analysis is performed in the analyze function. Final operations (e.g. histogram normalization) may be performed in the endJob function. More details on how to set up an analysis within the full framework can be found here in the Workbook. The following examples use the TFileService for histogram management; for more information on it have a look here. The source code of all examples can be found in the Examples package. To edit and play with it, check it out from cvs and compile it. In the Examples/test directory you find the following cfg files:
- analyzeTopElectron_cfg.py to analyze electrons
- analyzeTopMuon_cfg.py to analyze muons
- analyzeTopJet_cfg.py to analyze jets
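To make the interplay of beginJob, analyze and endJob more tangible, here is a minimal, framework-free sketch in plain Python. It is not CMSSW code: the class MockMuonAnalyzer, the function run and the toy events are made up for illustration, and the histogram is just a list of bin contents. The sketch only mimics the order in which the three functions are called.

```python
class MockMuonAnalyzer:
    """Stand-in for an EDAnalyzer; hypothetical, not CMSSW code."""

    def begin_job(self):
        # beginJob: book the histogram (5 bins of muon pt, 20 GeV wide each)
        self.bin_width = 20.0
        self.pt_hist = [0.0] * 5

    def analyze(self, event):
        # analyze: called once per event; fill the histogram
        for pt in event["muon_pt"]:
            index = int(pt // self.bin_width)
            if 0 <= index < len(self.pt_hist):
                self.pt_hist[index] += 1.0

    def end_job(self):
        # endJob: final operations, here a normalization to unit area
        total = sum(self.pt_hist)
        if total > 0:
            self.pt_hist = [entry / total for entry in self.pt_hist]


def run(analyzer, events):
    """Mimic the order in which the three member functions are called."""
    analyzer.begin_job()
    for event in events:
        analyzer.analyze(event)
    analyzer.end_job()
    return analyzer.pt_hist


if __name__ == "__main__":
    toy_events = [{"muon_pt": [25.0, 42.0]}, {"muon_pt": [31.0]}, {"muon_pt": []}]
    print(run(MockMuonAnalyzer(), toy_events))
```

In the real examples the booking is delegated to the TFileService, so the analyzer does not have to open or close the output file itself.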
Opened in your favorite editor, the file analyzeTopMuon_cfg.py should look like this:
[Listing of TopQuarkAnalysis/Examples/test/analyzeTopMuon_cfg.py not included]
The number of events to loop over is set to 100. To run over all events in the input, set it to -1. In the next block you find the lines that produce the TQAF Layer 1 on the fly from fullsim AOD. If the TQAF Layer 1 was already produced, you can comment out these lines. Next the TFileService is registered, defining the name of the output file. Finally the analysis module is initialized by including TopMuonAnalyzer_cfi.py from the package. The TopMuonAnalyzer_cfi.py file can be found in the python directory. It should look like this:
[Listing of TopQuarkAnalysis/Examples/python/TopMuonAnalyzer_cfi.py not included]
Each python config file has to start with the line:
import FWCore.ParameterSet.Config as cms
The following block defines the module for cmsRun: analyzeMuon is the name of the module, which is used throughout the cfg files, and TopMuonAnalyzer is the name of the plugin. Default values to initialize the module are also provided within the module declaration. As you see, the module expects an electron and a muon collection as input. For fun you can change the inputs to allLayer1Muons and allLayer1Electrons. For the available collections and their configurations have a look at the TQAF Layer 1 explanations here. To declare the number and order of modules to be called, the analyzeMuon module is finally added to the path p1. Several paths may be declared within a cmsRun job; each of them is run independently.
You find the implementation of the class in the plugins directory of the package. There you find the files:
- TopMuonAnalyzer.h
- TopMuonAnalyzer.cc
implementing a simple EDAnalyzer class. You can change and extend the implementation according to your needs. The plugin declaration (which allows cmsRun to find the class at runtime) is done in the SealModule.cc file in the same directory via the line:
DEFINE_FWK_MODULE(TopMuonAnalyzer);
Be aware that the argument of the macro has to correspond to the class name given in TopMuonAnalyzer_cfi.py. To run the example invoke:
cmsRun TopQuarkAnalysis/Examples/test/analyzeTopMuon_cfg.py
from your working directory. In case you changed the class implementation, don't forget to re-compile the package beforehand. After completion you will find a root file named analyzeTopMuon.root in your working directory. It contains a directory analyzeMuon (corresponding to the name of the module in the cfi file), which holds the histograms that were booked and filled in the EDAnalyzer. The directory is created by the TFileService to distinguish histograms (or trees) from different modules in the same root file. You can find similar examples for the analysis of electrons, taus and jets in the package.
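The idea of one directory per module in a shared output file can be pictured with a small toy model in plain Python. This is only a sketch of the bookkeeping principle and does not use ROOT or the TFileService: the class MockFileService, its book method and the second module label analyzeMuonLoose are hypothetical.

```python
class MockFileService:
    """Toy model of one output file with one directory per module label."""

    def __init__(self, file_name):
        self.file_name = file_name
        self.directories = {}

    def book(self, module_label, hist_name, n_bins):
        # Histograms are stored below the label of the module that booked them.
        directory = self.directories.setdefault(module_label, {})
        directory[hist_name] = [0] * n_bins
        return directory[hist_name]


service = MockFileService("analyzeTopMuon.root")
muon_pt = service.book("analyzeMuon", "pt", n_bins=4)
other_pt = service.book("analyzeMuonLoose", "pt", n_bins=4)  # hypothetical second module

muon_pt[1] += 1  # filling one histogram leaves the other one untouched
for label, hists in sorted(service.directories.items()):
    print(label, hists)
```

Because the module label is part of the path, two modules can book histograms with the same name without overwriting each other.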
FWLite
Contact person
Roger Wolf
Examples similar to those for the full framework exist for FWLite. You can find the implementation in the bin directory of the package. The executables provide exactly the same file structure and histograms as the EDAnalyzer example within the full framework. Opened in your favorite editor, one of them should look like this:
[Listing of TopQuarkAnalysis/Examples/bin/TopMuonFWLiteAnalyzer.cc not included]
You find five sections for:
- booking of the histograms,
- opening of the file and getting access to the branches of interest,
- the event loop (this corresponds to the analyze function within the full framework),
- saving of the histograms,
- freeing of the allocated space of the booked histograms.
As you see from the file opening section, both the root input file and the name of the process that created the file are expected as inputs to the executable. The executable needs the TQAF Layer 1 objects to be produced in advance. In order to run the example do the following:
cmsRun TopQuarkAnalysis/TopObjectProducers/test/tqafLayer1_fromAOD_full_cfg.py
This will produce a TQAFLayer1_Output.fromAOD_full.root file with the process TEST. You may increase the number of events in the loop if you like. Having a look at the file with the root TBrowser gives you an idea of which branches are kept by the TQAF Layer1 standard sequence. Check that you have the right branch name and run the executable:
TopMuonFWLiteAnalyzer TQAFLayer1_Output.fromAOD_full.root TEST
open file: TQAFLayer1_Output.fromAOD_full.root
start looping 100 events...
processing event: 10
processing event: 20
processing event: 30
...
close file
As for the full framework example you will find a file analyzeMuon.root with the same file structure and histograms as explained above. You can find similar examples for the analysis of electrons and jets in the bin directory of the package.
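The five sections of the executable listed above can be sketched without ROOT or FWLite in a few lines of plain Python. This is a toy under stated assumptions, not the actual TopMuonFWLiteAnalyzer: the function analyze_file, the layout of the toy events and the histogram name are made up, and a list of dictionaries replaces the input file.

```python
def analyze_file(events, report_every=10):
    """Toy version of the five steps; `events` replaces the real input file."""
    # 1) book the histograms (muon multiplicity; the last bin collects overflow)
    multiplicity = [0] * 5

    # 2) "open" the input and get access to the branch of interest
    muons_per_event = (event["muons"] for event in events)
    print("start looping %d events..." % len(events))

    # 3) event loop; corresponds to the analyze function of an EDAnalyzer
    for number, muons in enumerate(muons_per_event, start=1):
        if number % report_every == 0:
            print("processing event: %d" % number)
        multiplicity[min(len(muons), len(multiplicity) - 1)] += 1

    # 4) save the histograms (here: hand a copy back to the caller)
    result = {"muon_multiplicity": list(multiplicity)}

    # 5) free the booked histograms
    del multiplicity
    return result


if __name__ == "__main__":
    toy_events = [{"muons": [30.0] * (i % 3)} for i in range(100)]
    print(analyze_file(toy_events))
```

In the real executable, steps 2 and 4 are the places where the input file and process name given on the command line and the output root file come into play.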
Analyze TQAF Layer2 Objects
This section contains a detailed example of how to exploit the TtSemiLeptonicEvent as an example of a TQAF Layer 2 object within the full framework or FWLite. Within the full framework the layers 1 and 2 can be produced on the fly (as indicated in the corresponding cfg file of the example) or in advance. For an example of how to produce TQAF Layer1 and 2 objects from fullsim AOD or from scratch with Fastsim, have a look at the TQAF Examples. In this section you will learn:
- how to clone modules
- how to use the TtSemiLeptonicEvent as an example of a TQAF Layer 2 object
- how to implement your own event hypothesis as part of the TtSemiLeptonicEvent
- how to run an EDAnalyzer to access and analyze the TtSemiLeptonicEvent
- how to run a FWLite executable to access and analyze the TtSemiLeptonicEvent
Full Framework
Contact person
Roger Wolf
Event hypotheses are based on the principle of the CompositeCandidate. They allow a flexible and intuitive combination of base Candidates for combinatorial analyses, which is fully determined by the user and does not even have to reflect the topology of ttbar decays. They are well supported and their design is regularly adapted to the needs and use cases within analyses. Their structure, implementation and application are well separated within TQAF and allow an arbitrarily large number of different implementations without structural losses. The user is free to produce one or more favorite hypothesis implementations with different (user-defined) steerable settings, such as different working points of an MVA discriminant, or the best and second-best solution of a kinematic fit. For the semi-leptonic ttbar decay channel all hypotheses are stored in a dedicated event class called TtSemiLeptonicEvent, which can be made persistent and used within FWLite. The first section of this example shows how to use and read out the relevant information of already implemented event hypotheses. The second part shows how to implement your own event hypothesis in the case of semi-leptonic ttbar events.
Existing Implementations
An example file showing how to produce and analyze event hypotheses on the fly within TQAF is located in the test directory of the Examples package. There you will find the following lines:
[Listing of TopQuarkAnalysis/Examples/test/analyzeTopHypotheses_cfg.py not included]
Apart from its general steering the process is organized in three major blocks:
- In the first block the tqafLayer1 objects, which are the input for the event hypotheses, are produced in the standard sequence.
- In the second block the TtSemiLeptonicEvent structure, which holds the event hypotheses, is filled. All shown sequences are part of the tqafLayer2 standard sequence.
- In the third block the analysis of two different event hypotheses is steered.
Having a look at the HypothesisAnalyzer_cff.py file you will find the following lines:
[Listing of TopQuarkAnalysis/Examples/python/HypothesisAnalyzer_cff.py not included]
In the first part of the file the analyzeHypothesis module is cloned and configured to analyze the two different event hypotheses MaxSumPtWMass and GenMatch. The hypothesis of choice is steered via keys, which are stored in the event together with each hypothesis in the production chain. Currently available keys are listed below:
| Available Hypothesis | Key | Description |
| KinFit | kKinFit | based on a kinematic fit on an event-by-event basis |
| MVADisc | kMVADisc | based on a fully configurable MVA discriminator |
| GenMatch | kGenMatch | based on MC generator information |
| MaxSumPtWMass | kMaxSumPtWMass | based on a simple algorithm |
| WMassMaxSumPt | kWMassMaxSumPt | based on a simple algorithm |
| Geom | kGeom | based on a simple algorithm |
Feel free to investigate any of these hypotheses with this analyzer. If you want to compare more than two of them, just extend the number of clones in the HypothesisAnalyzer_cff.py file. Take care though: complex hypothesis types in particular, like those based on MVA or fit methods, should not be run blindly. They will deliver results, but definitely need training, parameter tuning and monitoring. This is why they were not chosen for this example. Be aware that the TtSemiLepJetCombMVAFileSource service for the MVA discriminant computation only points to a dummy MVA training file in order to get the TQAF Layer 2 standard sequence running without complaints. To arrive at reasonable results with the MVA discriminant method you will have to provide and understand a well-defined MVA training file. Please consult the SWGuideTQAFLayer2 to learn how to do it. To run the example invoke:
cmsRun TopQuarkAnalysis/Examples/test/analyzeTopHypotheses_cfg.py
which will produce a file analyzeHypotheses.root. In there you will find the structure defined by the TFileService and a few histograms to exploit the hypotheses. These histograms are booked and filled by the corresponding plugin, which resides in the plugins directory of the package.
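The cloning of the analyzeHypothesis module described above follows a simple principle: copy the full configuration of a module and override only the parameters that differ. The following plain Python sketch illustrates that principle only. It is not the CMSSW configuration language: the class MockModule, the plugin name and the parameter names semiLepEvent and hypoKey are assumptions made for this illustration. In a real cff file you would call the clone method of the configured module itself.

```python
import copy


class MockModule:
    """Stand-in for a configured module; hypothetical, not the CMSSW class."""

    def __init__(self, plugin, **parameters):
        self.plugin = plugin
        self.parameters = parameters

    def clone(self, **overrides):
        # Copy everything, then replace only the parameters that are given.
        new_module = copy.deepcopy(self)
        new_module.parameters.update(overrides)
        return new_module


# hypothetical default configuration of the analyzer
analyzeHypothesis = MockModule("HypothesisAnalyzer",
                               semiLepEvent="ttSemiLepEvent",
                               hypoKey="kGeom")

# one clone per hypothesis to be analyzed; the original stays unchanged
analyzeMaxSumPtWMass = analyzeHypothesis.clone(hypoKey="kMaxSumPtWMass")
analyzeGenMatch = analyzeHypothesis.clone(hypoKey="kGenMatch")

for module in (analyzeHypothesis, analyzeMaxSumPtWMass, analyzeGenMatch):
    print(module.plugin, module.parameters["hypoKey"])
```

Comparing a further hypothesis then only requires one more clone with a different key.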
Implementing New Hypotheses
All event hypotheses are produced in the TopJetCombination package. For the semi-leptonic ttbar decay channel the abstract TtSemiLepHypothesis base class defines the interface to the TtSemiLeptonicEvent container class and the way to build the CompositeCandidate from base Candidates. Each concrete implementation has to inherit from this class and to provide the following two functions:
// -----------------------------------------
// implement the following two functions
// for a concrete event hypothesis
// -----------------------------------------
/// build the event hypothesis key
virtual void buildKey() = 0;
/// build event hypothesis from the reco objects of a semi-leptonic event
virtual void buildHypo(edm::Event& event,
                       const edm::Handle<edm::View<reco::RecoCandidate> >& lepton,
                       const edm::Handle<std::vector<pat::MET> >& neutrino,
                       const edm::Handle<std::vector<pat::Jet> >& jets,
                       std::vector<int>& jetPartonAssociation,
                       const unsigned int iComb) = 0;
The algorithmic implementation is performed in the buildHypo function, in which the explicit choice of the basic Candidates is made. This choice can be based on generator information, a kinematic fit, special algorithms or MVA methods. The implementation is free to take any additional steering parameter for the algorithm (like the number of considered jets). The buildKey function associates the implementation with an enumerator in the TtSemiLeptonicEvent class. When implementing new event hypotheses please stick to the following guidelines and rules:
- The implementation derives from TtSemiLepHypothesis.
- Each hypothesis is identified by a key which is public to the TtSemiLeptonicEvent container class.
- The implementation resides in the plugins directory of the TopJetCombination package.
- Necessary cfi and cff files reside in the data directory of the TopJetCombination package.
- The hypothesis is made known to the TtSemiLepEventBuilder, which implements the TtSemiLeptonicEvent class, in the files TtSemiLepEvtBuilder_cfi.py and ttSemiLepEvtBuilder_cff.py.
Following these guidelines you are only left with the physics concerns of your algorithm. All hypotheses are collected in the TtSemiLeptonicEvent class, which hosts the following information:
- The initial partons of the ttbar production process (if available).
- The decay branches of the ttbar event on generator level (if available).
- All hypotheses chosen by the user (all by default).
- All relevant meta information to judge and interpret these hypotheses.
Be aware that the four vectors of status 3 particles represent the kinematics before parton showering and are therefore unphysical. The four vectors of all decay components (including the two top quarks) have therefore been recalculated! The user may exclude particular hypotheses from the TtSemiLeptonicEvent class by editing the files TtSemiLepEvtBuilder_cfi.py and ttSemiLepEvtBuilder_cff.py in the python directory of the TopEventProducers package accordingly.
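The pattern described in this section (an abstract base class that obliges every concrete hypothesis to provide a key and a building function) can be illustrated with a framework-free Python sketch. It is not the TtSemiLepHypothesis interface: the class names, the simplified build_hypo signature, the key kLeadingJets and the "four leading jets" algorithm are all made up for illustration, and jets are represented by their pt only.

```python
from abc import ABC, abstractmethod


class MockSemiLepHypothesis(ABC):
    """Toy counterpart of the abstract base class; not the CMSSW interface."""

    def __init__(self):
        self.key = None
        self.build_key()

    @abstractmethod
    def build_key(self):
        """Set self.key to the identifier of this hypothesis."""

    @abstractmethod
    def build_hypo(self, lepton, neutrino, jets):
        """Return the chosen candidates, or None if no hypothesis can be built."""


class LeadingJetsHypothesis(MockSemiLepHypothesis):
    """Hypothetical example: simply take the four jets with the highest pt."""

    def build_key(self):
        self.key = "kLeadingJets"  # made-up key, not part of TQAF

    def build_hypo(self, lepton, neutrino, jets):
        if len(jets) < 4:
            return None  # not enough jets for a semi-leptonic ttbar hypothesis
        leading = sorted(jets, reverse=True)[:4]
        return {"lepton": lepton, "neutrino": neutrino, "jets": leading}


if __name__ == "__main__":
    hypothesis = LeadingJetsHypothesis()
    print(hypothesis.key, hypothesis.build_hypo(40.0, 35.0, [20.0, 80.0, 55.0, 30.0, 65.0]))
```

As in the C++ case, a class that does not implement both functions cannot be instantiated, so an incomplete hypothesis is caught early.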
FWLite
Contact person
Roger Wolf
An example similar to the one for the full framework exists for FWLite. You can find the implementation in the bin directory of the package. The executable provides exactly the same file structure and histograms as the EDAnalyzer example within the full framework does for a single hypothesis. The feature of cloned EDAnalyzers described above is not mirrored in the binary though. Opened in your favorite editor, it should look like this:
[Listing of TopQuarkAnalysis/Examples/bin/TopHypothesisFWLiteAnalyzer.cc not included]
As you see from the file opening section, the following inputs are required:
- the root input file
- the process that created the file
- the hypothesis key
In case of wrong inputs, printouts will clarify which inputs are expected. The executable needs the TQAF objects to be produced in advance. In order to run the example do the following:
cmsRun TopQuarkAnalysis/TopEventProducers/test/tqafFromAOD_full_cfg.py
This will produce a tqafOutput.fromAOD_full.root file with the process TQAF. You may increase the number of events in the loop if you like. Having a look at the file with the root TBrowser gives you an idea of which branches are kept by the TQAF standard sequence. Check that you have the right branch name and run the executable:
TopHypothesisFWLiteAnalyzer tqafOutput.fromAOD_full.root TQAF kMaxSumPtWMass
open file: tqafOutput.fromAOD_full.root
start looping 100 events...
...
close file
Instead of kMaxSumPtWMass you can also try one of the other hypothesis keys. You might get
Hypothesis not valid for this event
messages during the event loop, which typically come from events that did not fulfil certain criteria of the hypothesis (e.g. a minimum number of jets). As for the full framework example you will find a file analyzeHypotheses.root with the same file structure and histograms as explained above.
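The handling of the three inputs and of events without a valid hypothesis can be sketched in plain Python as follows. This is a toy and not the code of TopHypothesisFWLiteAnalyzer: the functions parse_arguments and count_valid, the wording of the usage messages and the layout of the toy events are assumptions. Only the key names are taken from the table of available hypotheses.

```python
KNOWN_KEYS = ("kKinFit", "kMVADisc", "kGenMatch",
              "kMaxSumPtWMass", "kWMassMaxSumPt", "kGeom")


def parse_arguments(argv):
    """Expect: <input file> <process name> <hypothesis key>."""
    if len(argv) != 3:
        raise SystemExit("usage: <input file> <process name> <hypothesis key>")
    file_name, process, key = argv
    if key not in KNOWN_KEYS:
        raise SystemExit("unknown hypothesis key '%s'; choose one of: %s"
                         % (key, ", ".join(KNOWN_KEYS)))
    return file_name, process, key


def count_valid(events, key):
    """Loop over toy events and skip those without a valid hypothesis."""
    n_valid = 0
    for event in events:
        if key not in event["valid_hypotheses"]:
            print("Hypothesis not valid for this event")
            continue
        n_valid += 1  # a real analysis would fill its histograms here
    return n_valid


if __name__ == "__main__":
    args = parse_arguments(["tqafOutput.fromAOD_full.root", "TQAF", "kMaxSumPtWMass"])
    toy_events = [{"valid_hypotheses": {"kMaxSumPtWMass", "kGeom"}},
                  {"valid_hypotheses": set()}]
    print(args, count_valid(toy_events, args[2]))
```

Skipping invalid events inside the loop, rather than aborting, keeps the remaining events usable for the chosen hypothesis.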