TQAF Tutorial


Introduction

Warning: This tutorial has not been fully adapted to the structural changes that came with CMSSW_2_2_10 and PAT_v2!

This section contains various examples on how to use TQAF Layer 1 and 2 objects for analyses within the full framework, FWLite or SK. They are simple and focused on basic principles, such as access to the information relevant for analyses, creating histograms with the TFileService, and so on. All examples should run out of the box within CMSSW_2_2_X as indicated on the TQAF Installation page unless stated otherwise. In case of problems, check the sanity of your installation as indicated on the TQAF Sanity Tests page. If the problem persists, send a mail with a detailed description to the Top PAG Hypernews or to the indicated contact person.

Analyze TQAF Layer1 Objects

This section contains detailed examples on how to analyze TQAF Layer 1 objects within the full framework or FWLite. Within the full framework the corresponding layers can be produced on the fly (as indicated in the corresponding cfg files of each example) or in advance. For an example of how to produce TQAF Layer 1 or 2 objects from fullsim AOD or from scratch with Fastsim, have a look at the TQAF Examples. In this section you will learn:

  • how to use the TFileService
  • how to access TQAF Layer 1 object information
  • how to produce TQAF Layer 1 objects on the fly
  • how to run an EDAnalyzer to analyze TQAF Layer 1 objects
  • how to run a FWLite executable to analyze TQAF Layer 1 objects

Full Framework

Contact person: Maryam Zeinali

The EDAnalyzer class, or any other class that inherits from it, contains three member functions: beginJob, analyze and endJob. In beginJob you can open a root file to which the results of the analysis will be saved. All data analysis is performed in the analyze function. Final operations (e.g. histogram normalization) may be performed in the endJob function. More details on how to make an analysis within the full framework can be found in the Workbook. For the following examples the TFileService was used for histogram management. For more information on it, see the TFileService documentation. The source code of all examples can be found in the Examples package. To edit and play with it, check it out from cvs and compile it. In the Examples/test directory you find the following cfg files:

  • analyzeTopElectron_cfg.py to analyze electrons
  • analyzeTopMuon_cfg.py to analyze muons
  • analyzeTopJet_cfg.py to analyze jets

Opened in your favorite editor, the file analyzeTopMuon_cfg.py should look like this:



      [Listing unavailable: the include of http://cmssw.cvs.cern.ch/cgi-bin/cmssw.cgi/CMSSW/TopQuarkAnalysis/Examples/test/analyzeTopMuon_cfg.py?content-type=text%2Fplain&view=co failed]


The number of events to loop over is set to 100. To run over all events in the input, set it to -1. In the next block you find the lines that produce the TQAF Layer 1 on the fly from fullsim AOD. If the TQAF Layer 1 was already produced, you can comment out these lines. Next the TFileService is registered, which defines the name of the output file. Finally the analysis module is initialized by including TopMuonAnalyzer_cfi.py from the package. The TopMuonAnalyzer_cfi.py file can be found in the python directory. It should look like this:



      [Listing unavailable: the include of http://cmssw.cvs.cern.ch/cgi-bin/cmssw.cgi/CMSSW/TopQuarkAnalysis/Examples/python/TopMuonAnalyzer_cfi.py?content-type=text%2Fplain&view=co failed]


Each python config file has to start with the line:

import FWCore.ParameterSet.Config as cms

The following block defines the module for cmsRun: analyzeMuon is the name (label) of the module, which is used throughout the cfg files, and TopMuonAnalyzer is the name of the plugin. Default values to initialize the module are also provided within the module declaration. As you see, the module expects an electron and a muon collection as input. For fun you can change the inputs to allLayer1Muons and allLayer1Electrons. For the available collections and their configurations, have a look at the TQAF Layer 1 explanations. To declare the number and order of the modules to be called, the analyzeMuon module is finally added to the path p1. Several paths may be declared within cmsRun; each of them is run independently.

You find the implementation of the class in the plugins directory of the package. There you find the files:

  • TopMuonAnalyzer.h
  • TopMuonAnalyzer.cc

implementing a simple EDAnalyzer class. You can change and extend the implementation according to your needs. The plugin declaration (which allows cmsRun to find the class at runtime) is done in the SealModule.cc file in the same directory via the line:

DEFINE_FWK_MODULE(TopMuonAnalyzer);

Be aware that the argument of the macro has to correspond to the class name in TopMuonAnalyzer_cfi.py. To run the example invoke:

cmsRun TopQuarkAnalysis/Examples/test/analyzeTopMuon_cfg.py

from your working directory. If you changed the class implementation, don't forget to re-compile the package beforehand. After completion you will find a root file named analyzeTopMuon.root in your working directory. It contains a directory analyzeMuon (corresponding to the name of the module in the cfi file), which holds the histograms that were booked and filled in the EDAnalyzer. The directory is created by the TFileService to distinguish histograms (or trees) from different modules in the same root file. You can find similar examples for the analysis of electrons, taus and jets in the package.
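
The life cycle of beginJob, analyze and endJob described in this section can be illustrated without the framework. The following is only a minimal sketch in plain Python and not CMSSW code: the class ToyMuonAnalyzer, the function run_job and the event dictionaries are hypothetical stand-ins, and the "histogram" is just a list of bin contents. It also mimics the convention that an event limit of -1 means "run over all events".

```python
# Framework-free sketch of the EDAnalyzer life cycle (NOT CMSSW code).
# ToyMuonAnalyzer, run_job and the event dictionaries are hypothetical stand-ins.

class ToyMuonAnalyzer:
    """Mimics the three member functions beginJob, analyze and endJob."""

    def beginJob(self):
        # Book the "histogram": here just a list of bin contents.
        self.bin_width = 10.0              # GeV per bin (arbitrary choice)
        self.pt_hist = [0.0] * 5           # five bins covering 0-50 GeV

    def analyze(self, event):
        # Called once per event: fill the histogram with every muon pt.
        for pt in event["muon_pt"]:
            index = int(pt // self.bin_width)
            if 0 <= index < len(self.pt_hist):
                self.pt_hist[index] += 1.0

    def endJob(self):
        # Final operation: normalize the histogram to unit area.
        total = sum(self.pt_hist)
        if total > 0:
            self.pt_hist = [count / total for count in self.pt_hist]


def run_job(analyzer, events, max_events=-1):
    """Drive the analyzer; max_events = -1 means 'all events'."""
    analyzer.beginJob()
    for number, event in enumerate(events):
        if max_events >= 0 and number >= max_events:
            break
        analyzer.analyze(event)
    analyzer.endJob()
    return analyzer.pt_hist


toy_events = [{"muon_pt": [12.0, 33.5]}, {"muon_pt": [18.0]},
              {"muon_pt": []}, {"muon_pt": [47.0]}]
histogram = run_job(ToyMuonAnalyzer(), toy_events)
print(histogram)
```

In the real examples the booking is done through the TFileService and the event loop is driven by cmsRun; the sketch only shows which step belongs to which member function.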

FWLite

Contact person: Roger Wolf

Examples similar to those for the full framework exist for FWLite. You can find the implementation in the bin directory of the package. The executables provide exactly the same file structure and histograms as the EDAnalyzer examples within the full framework. Opened in your favorite editor, one of them should look like this:



      [Listing unavailable: the include of http://cmssw.cvs.cern.ch/cgi-bin/cmssw.cgi/CMSSW/TopQuarkAnalysis/Examples/bin/TopMuonFWLiteAnalyzer.cc?view=co failed]


You find five sections for:

  • booking of the histograms,
  • opening of the file and getting access to the branches of interest,
  • the event loop (this corresponds to the analyze function within the full framework),
  • saving of the histograms,
  • freeing of the space allocated for the booked histograms.

As you see from the file opening section, both the root input file and the name of the process that created the file have to be given to the executable as inputs. The executable needs the TQAF Layer 1 objects to be produced in advance. In order to run the example, do the following:

cmsRun TopQuarkAnalysis/TopObjectProducers/test/tqafLayer1_fromAOD_full_cfg.py

This will produce a file TQAFLayer1_Output.fromAOD_full.root with the process name TEST. You may increase the number of events in the loop if you like. Looking at the file with the root TBrowser gives you an idea of which branches are kept by the TQAF Layer 1 standard sequence:

[Figure TBrowser.png: branches of the output file as shown in the root TBrowser]

Check that you use the right branch name and run the executable:

TopMuonFWLiteAnalyzer TQAFLayer1_Output.fromAOD_full.root TEST
open  file: TQAFLayer1_Output.fromAOD_full.root
start looping 100 events...
  processing event: 10
  processing event: 20
  processing event: 30
  ...
close file

As for the full framework example you will find a file analyzeMuon.root with the same file structure and histograms as explained above. You can find similar examples for the analysis of electrons and jets in the bin directory of the package.
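
The branch names seen in the TBrowser follow the general EDM naming convention, in which the friendly class name, the module label, the product instance label and the process name are joined by underscores and terminated by a dot. This is why the executable needs the process name as an input. The helper below is a minimal sketch of that convention in plain Python; the function name edm_branch_name is hypothetical, and the combination of patMuons with allLayer1Muons is an assumption for illustration, so check the actual names in your own file.

```python
# Sketch of the EDM branch naming convention:
#   <friendly class name>_<module label>_<instance label>_<process name>.
# The helper name and the example labels below are assumptions for illustration.

def edm_branch_name(friendly_type, module_label, process_name, instance_label=""):
    """Compose the branch name under which a product shows up in the TBrowser."""
    for part in (friendly_type, module_label, process_name, instance_label):
        # Underscores separate the fields, so they cannot appear inside one.
        if "_" in part:
            raise ValueError("underscores are not allowed in '%s'" % part)
    return "%s_%s_%s_%s." % (friendly_type, module_label, instance_label, process_name)


branch = edm_branch_name("patMuons", "allLayer1Muons", "TEST")
print(branch)
```

With an empty instance label the two underscores end up next to each other, which explains the double underscore in most branch names.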

Analyze TQAF Layer2 Objects

This section contains a detailed example on how to exploit the TtSemiLeptonicEvent as an example of a TQAF Layer 2 object within the full framework or FWLite. Within the full framework the Layer 1 and 2 objects can be produced on the fly (as indicated in the corresponding cfg file of the example) or in advance. For an example of how to produce TQAF Layer 1 and 2 objects from fullsim AOD or from scratch with Fastsim, have a look at the TQAF Examples. In this section you will learn:

  • how to clone modules
  • how to use the TtSemiLeptonicEvent as an example of a TQAF Layer 2 object
  • how to implement your own event hypothesis as part of the TtSemiLeptonicEvent
  • how to run an EDAnalyzer to access and analyze the TtSemiLeptonicEvent
  • how to run a FWLite executable to access and analyze the TtSemiLeptonicEvent

Full Framework

Contact person: Roger Wolf

Event hypotheses are based on the principle of the CompositeCandidate. They allow a flexible and intuitive combination of base Candidates for combinatorial analyses, which is fully determined by the user and does not even have to reflect the topology of ttbar decays. They are well supported and their design is regularly adapted to the needs and use cases within analyses. Their structure, implementation and application are well separated within TQAF and allow an arbitrarily large number of different implementations without structural losses. The user is free to produce one or more favorite hypothesis implementations with different (user defined) steerable settings, like different working points of an MVA discriminant, or the best and second best solution of a kinematic fit. For the semi-leptonic ttbar decay channel all hypotheses are stored in a dedicated event class called TtSemiLeptonicEvent, which can be made persistent and used within FWLite. The first part of this example shows how to use and read out the relevant information of already implemented event hypotheses. The second part shows how to implement your own event hypothesis in the case of semi-leptonic ttbar events.

Existing Implementations

An example file showing how to produce and analyze event hypotheses on the fly within TQAF is located in the test directory of the Examples package. There you will find the following lines:



      [Listing unavailable: the include of http://cmssw.cvs.cern.ch/cgi-bin/cmssw.cgi/CMSSW/TopQuarkAnalysis/Examples/test/analyzeTopHypotheses_cfg.py?content-type=text%2Fplain&view=co failed]


Apart from its general steering the process is organized in three major blocks:

  • In the first block the tqafLayer1 objects, which are the input for the event hypotheses, are produced in the standard sequence.
  • In the second block the TtSemiLeptonicEvent structure, which holds the event hypotheses, is filled. All shown sequences are part of the tqafLayer2 standard sequence.
  • In the third block the analysis of two different event hypotheses is steered.

In the HypothesisAnalyzer_cff.py file you will find the following lines:



      [Listing unavailable: the include of http://cmssw.cvs.cern.ch/cgi-bin/cmssw.cgi/CMSSW/TopQuarkAnalysis/Examples/python/HypothesisAnalyzer_cff.py?content-type=text%2Fplain&view=co failed]


In the first part of the file the analyzeHypothesis module is cloned and configured to analyze the two different event hypotheses MaxSumPtWMass and GenMatch. The hypothesis of choice is selected via keys, which are stored in the event together with each hypothesis in the production chain. Currently available keys are listed below:

Available Hypothesis   Key              Description
KinFit                 kKinFit          based on a kinematic fit on an event by event basis
MVADisc                kMVADisc         based on a fully configurable MVA discriminator
GenMatch               kGenMatch        based on MC generator information
MaxSumPtWMass          kMaxSumPtWMass   based on a simple algorithm
WMassMaxSumPt          kWMassMaxSumPt   based on a simple algorithm
Geom                   kGeom            based on a simple algorithm
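
The idea of cloning one analyzer module per hypothesis key can be illustrated without the framework. The following is a minimal sketch in plain Python and does not use the CMSSW configuration API: the class AnalyzerConfig, its parameter names and the clone helper are hypothetical, and only the hypothesis names and keys are taken from the table above.

```python
# Framework-free sketch (NOT the CMSSW configuration API).
# AnalyzerConfig and its parameter names are hypothetical; only the
# hypothesis names and keys are taken from the table of available keys.
from dataclasses import dataclass, replace

HYPOTHESIS_KEYS = {
    "KinFit": "kKinFit",
    "MVADisc": "kMVADisc",
    "GenMatch": "kGenMatch",
    "MaxSumPtWMass": "kMaxSumPtWMass",
    "WMassMaxSumPt": "kWMassMaxSumPt",
    "Geom": "kGeom",
}


@dataclass(frozen=True)
class AnalyzerConfig:
    semi_lep_event: str = "ttSemiLepEvent"   # hypothetical input label
    hypo_key: str = "kKinFit"                # hypothetical default key


def clone(module, **overrides):
    """Copy a configured module and override selected parameters."""
    return replace(module, **overrides)


analyzeHypothesis = AnalyzerConfig()
analyzeMaxSumPtWMass = clone(analyzeHypothesis, hypo_key=HYPOTHESIS_KEYS["MaxSumPtWMass"])
analyzeGenMatch = clone(analyzeHypothesis, hypo_key=HYPOTHESIS_KEYS["GenMatch"])

print(analyzeMaxSumPtWMass.hypo_key, analyzeGenMatch.hypo_key)
```

Each clone keeps all defaults of the original module except for the overridden key, and the original stays untouched, so comparing a further hypothesis only requires one more clone.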

Feel free to investigate any of these hypotheses with this analyzer. If you want to compare more than two of them, just extend the number of clones in the HypothesisAnalyzer_cff.py file. Take care, though: especially complex hypothesis types, like those based on MVA or fit methods, should not be run blindly. They will deliver results, but they definitely need training, parameter tuning and monitoring. This is why they were not chosen for this example. Be aware that the TtSemiLepJetCombMVAFileSource service for the MVA discriminant computation only points to a dummy MVA training file in order to get the TQAF Layer 2 standard sequence running without complaints. To arrive at reasonable results using the MVA discriminant method you will have to provide and understand a well defined MVA training file. Please consult the SWGuideTQAFLayer2 to learn how to do it. To run the example invoke:

cmsRun TopQuarkAnalysis/Examples/test/analyzeTopHypotheses_cfg.py

which will produce a file analyzeHypotheses.root. In there you will find the structure defined by the TFileService and a few histograms to exploit the hypotheses. These histograms are booked and filled by the corresponding plugin in the plugins directory of the package.

Implementing New Hypotheses

All event hypotheses are produced in the TopJetCombination package. For the semi-leptonic ttbar decay channel the abstract TtSemiLepHypothesis base class defines the interface to the TtSemiLeptonicEvent container class and the way to build the CompositeCandidate from base Candidates. Each concrete implementation is obliged to inherit from this class and to provide the following two functions:

  // -----------------------------------------
  // implement the following two functions
  // for a concrete event hypothesis
  // -----------------------------------------
  /// build the event hypothesis key
  virtual void buildKey() = 0;
  /// build event hypothesis from the reco objects of a semi-leptonic event 
  virtual void buildHypo(edm::Event& event,
          const edm::Handle<edm::View<reco::RecoCandidate> >& lepton, 
          const edm::Handle<std::vector<pat::MET> >& neutrino, 
          const edm::Handle<std::vector<pat::Jet> >& jets, 
          std::vector<int>& jetPartonAssociation,
          const unsigned int iComb) = 0;

The algorithmic implementation is performed in the buildHypo function, in which the explicit choice of the basic Candidates is made. This choice can be based on generator information, on a kinematic fit, on special algorithms or on MVA methods. The implementation is free to take any additional steering parameter for the algorithm (like the number of considered jets). The buildKey function associates the implementation with an enumerator in the TtSemiLeptonicEvent class. When implementing new event hypotheses please stick to the following guidelines and rules:

  • The implementation derives from TtSemiLepHypothesis.
  • Each hypothesis is identified by a key which is public to the TtSemiLeptonicEvent container class.
  • The implementation resides in the plugins directory of the TopJetCombination package.
  • Necessary cfi and cff files reside in the data directory of the TopJetCombination package.
  • The hypothesis is made known to the TtSemiLepEventBuilder, which implements the TtSemiLeptonicEvent class in the files TtSemiLepEvtBuilder_cfi.py and ttSemiLepEvtBuilder_cff.py.
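
The two-function interface described above can be mimicked in a framework-free way. The following is only a minimal sketch in plain Python, not the C++ base class: ToyHypothesis, LeadingJetsHypothesis and the key kLeadingJets are hypothetical, the objects are reduced to plain pt values, and the toy algorithm (taking the jets with the highest pt) is not one of the existing implementations.

```python
# Framework-free sketch of the two-function interface (NOT the C++ base class).
# ToyHypothesis, LeadingJetsHypothesis and the key "kLeadingJets" are hypothetical.
from abc import ABC, abstractmethod


class ToyHypothesis(ABC):
    """Plays the role of the abstract base class: it fixes the interface only."""

    def __init__(self):
        self.key = None
        self.jet_indices = []
        self.build_key()

    @abstractmethod
    def build_key(self):
        """Associate the implementation with its key."""

    @abstractmethod
    def build_hypo(self, lepton, neutrino, jets):
        """Choose the candidates that make up the hypothesis."""

    def is_valid(self):
        # A semi-leptonic ttbar hypothesis needs four jets.
        return len(self.jet_indices) == 4


class LeadingJetsHypothesis(ToyHypothesis):
    """Toy algorithm: take the jets with the highest pt."""

    def __init__(self, max_jets=4):
        self.max_jets = max_jets              # additional steering parameter
        super().__init__()

    def build_key(self):
        self.key = "kLeadingJets"

    def build_hypo(self, lepton, neutrino, jets):
        # lepton and neutrino are unused by this toy algorithm.
        ranked = sorted(range(len(jets)), key=lambda i: jets[i], reverse=True)
        self.jet_indices = ranked[:self.max_jets]


hypo = LeadingJetsHypothesis()
hypo.build_hypo(lepton=35.0, neutrino=28.0, jets=[41.0, 88.0, 23.0, 60.0, 30.0])
print(hypo.key, hypo.jet_indices, hypo.is_valid())
```

As in the C++ design, the base class cannot be instantiated on its own, and a concrete hypothesis only has to supply its key and its selection algorithm.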

Following these guidelines you are only left with the physics concerns of your algorithm. All hypotheses are collected in the TtSemiLeptonicEvent class, which hosts the following information:

  • The initial partons of the ttbar production process (if available).
  • The decay branches of the ttbar event on generator level (if available).
  • All hypotheses chosen by the user (all by default).
  • All relevant meta information to judge and interpret these hypotheses.

Be aware that the four vectors of status 3 particles represent the kinematics before parton showering and are therefore unphysical. The four vectors of all decay components (including the two top quarks) have therefore been recalculated! The user may exclude particular hypotheses from the TtSemiLeptonicEvent class by editing the files TtSemiLepEvtBuilder_cfi.py and ttSemiLepEvtBuilder_cff.py in the python directory of the TopEventProducers package accordingly.

FWLite

Contact person: Roger Wolf

An example similar to the one for the full framework exists for FWLite. You can find the implementation in the bin directory of the package. For a single hypothesis, the executable provides exactly the same file structure and histograms as the EDAnalyzer example within the full framework. The feature of cloned EDAnalyzers as described above is not mirrored in the binary, though. Opened in your favorite editor, it should look like this:



      [Listing unavailable: the include of http://cmssw.cvs.cern.ch/cgi-bin/cmssw.cgi/CMSSW/TopQuarkAnalysis/Examples/bin/TopHypothesisFWLiteAnalyzer.cc?view=co failed]


As you see from the file opening section the following inputs are required:

  • the root input file
  • the process that created the file
  • the hypothesis key

In case of wrong inputs, printouts will clarify which inputs are expected. The executable needs the TQAF objects to be produced in advance. In order to run the example, do the following:

cmsRun TopQuarkAnalysis/TopEventProducers/test/tqafFromAOD_full_cfg.py

This will produce a file tqafOutput.fromAOD_full.root with the process name TQAF. You may increase the number of events in the loop if you like. Looking at the file with the root TBrowser gives you an idea of which branches are kept by the TQAF standard sequence. Check that you use the right branch name and run the executable:

TopHypothesisFWLiteAnalyzer tqafOutput.fromAOD_full.root TQAF kMaxSumPtWMass
open file: tqafOutput.fromAOD_full.root
start looping 100 events...
   ...
close file

Instead of kMaxSumPtWMass you can also try one of the other hypothesis keys. You might get "Hypothesis not valid for this event" messages during the event loop, which typically come from events that did not fulfil certain criteria of the hypothesis (e.g. a minimum number of jets). As for the full framework example, you will find a file analyzeHypotheses.root with the same file structure and histograms as explained above.
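
The handling of the three inputs and of events without a valid hypothesis can be sketched in a framework-free way. The following plain Python code is an illustration only and not the FWLite executable itself: the functions check_inputs and count_valid and the toy event dictionaries are hypothetical, and the list of keys simply repeats the keys given earlier on this page.

```python
# Framework-free sketch (NOT the FWLite executable itself).
# check_inputs, count_valid and the toy event dictionaries are hypothetical.

VALID_KEYS = ("kKinFit", "kMVADisc", "kGenMatch",
              "kMaxSumPtWMass", "kWMassMaxSumPt", "kGeom")


def check_inputs(arguments):
    """Return (file name, process name, hypothesis key) or raise ValueError."""
    if len(arguments) != 3:
        raise ValueError("expected: <root file> <process name> <hypothesis key>")
    file_name, process_name, key = arguments
    if key not in VALID_KEYS:
        raise ValueError("unknown key '%s'; choose one of: %s"
                         % (key, ", ".join(VALID_KEYS)))
    return file_name, process_name, key


def count_valid(events, key):
    """Loop over events and skip those for which the hypothesis is not valid."""
    n_valid = 0
    for event in events:
        if key not in event["valid_keys"]:
            print("Hypothesis not valid for this event")
            continue
        n_valid += 1          # a real analysis would fill histograms here
    return n_valid


file_name, process_name, key = check_inputs(
    ["tqafOutput.fromAOD_full.root", "TQAF", "kMaxSumPtWMass"])
toy_events = [{"valid_keys": {"kMaxSumPtWMass", "kGeom"}},
              {"valid_keys": set()},
              {"valid_keys": {"kMaxSumPtWMass"}}]
n_valid = count_valid(toy_events, key)
print(n_valid)
```

Skipping invalid events instead of aborting keeps the loop running over the full input, at the price that the histograms only contain the events for which the chosen hypothesis could be built.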

Topic revision: r31 - 2011-10-17 - SebastianNaumann
 