PAT Exercise 05: Collecting Information of the Relation of Objects of Different Type with PAT

Objectives

  • Learn how PAT supports the adding of Cross Object Collection information (COC).
  • Learn how to configure and how to make use of the COC information support of PAT.
Note:

This web course is part of the PAT Tutorial, which takes place regularly at CERN and in other places. When following the PAT Tutorial, the answers to questions marked in RED should be filled into the exercise form that was introduced at the beginning of the tutorial. The solutions to the exercises should also be filled into the form. The exercises are marked in three colours, indicating whether an exercise is basic (obligatory), continuative (recommended) or optional (free). The colour coding is summarized in the table below:

Color Code   Explanation
red          Basic exercise, which is obligatory for the PAT Tutorial.
yellow       Continuative exercise, which is recommended for the PAT Tutorial to deepen what has been learned.
green        Optional exercise, which shows interesting applications of what has been learned.

Basic exercises (red) are obligatory, and their solutions should be filled into the exercise form during the PAT Tutorial.

Introduction

The initial purpose of COC stems from the 'traditional', detector-based way of reconstructing high-level analysis objects. Electrons are usually reconstructed from energy deposits in the calorimeters and tracks in the silicon detectors. The clustering algorithms run over the calorimeter objects and create clusters, and clusters of clusters called super clusters, which are one of the elements of the electron reconstruction at CMS. Problems first occur when a CaloTower that includes the energy deposits of the electron is used as the seed of a CaloJet. The energy of the electron in the detector is then reconstructed twice: once as an electron and once as a jet.

In the era of particle flow, which provides a full, unambiguous particle-based interpretation of the reconstructed event, there are more sophisticated ways to do object disambiguation. Examples are given in Exercise 7. PAT nevertheless still provides capabilities to analyze, at the python level, associations between objects that are part of different high-level reconstruction objects.

Example:
Think of a cut that is usually applied in analyses with isolated leptons and jets, where, to be on the safe side, a minimal distance is required between the isolated lepton and the closest jet. Jets within a certain radius around the lepton are not taken into account for the analysis, or events for which this is the case are excluded from further consideration. PAT Cross Object Collection (COC) information provides you with python-configurable tools to relate a collection of selected objects of any kind to other object collections. This information is saved within the selected objects.
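
The distance criterion in this example can be sketched in plain Python. This is a minimal illustration that does not use CMSSW; the function names delta_phi and delta_r, the toy (eta, phi) values and the radius of 0.5 are assumptions made for this sketch only.

```python
import math

def delta_phi(phi1, phi2):
    """Azimuthal difference wrapped into the range (-pi, pi]."""
    dphi = (phi1 - phi2) % (2.0 * math.pi)
    if dphi > math.pi:
        dphi -= 2.0 * math.pi
    return dphi

def delta_r(eta1, phi1, eta2, phi2):
    """Distance in the eta-phi plane: sqrt(deta^2 + dphi^2)."""
    return math.hypot(eta1 - eta2, delta_phi(phi1, phi2))

# toy lepton and jets, each given as (eta, phi)
lepton = (0.10, 3.10)
jets = [(0.20, -3.10), (1.50, 0.00)]

# jets that lie within a radius of 0.5 around the lepton
close_jets = [j for j in jets
              if delta_r(lepton[0], lepton[1], j[0], j[1]) < 0.5]
print(close_jets)
```

Note that the first toy jet is close to the lepton only because the azimuthal difference is wrapped around at pi; a naive subtraction of the two phi values would place it far away.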

Example:
Easy access to back-to-back objects such as a jet and a photon for jet calibration.

This page will guide you through a tutorial and corresponding exercises to learn more about how to use and how to configure COC information with PAT. Though there will be hints and reminders throughout the exercises it will require the following knowledge from you:

If you feel uncomfortable with one of these points please follow the links given above and make yourself familiar with it.

Setting up the environment

We assume that you are logged in on lxplus and are in your work directory. If not, you can follow the instructions given here.

ssh your_lxplus_Name@lxplus.cern.ch
[.... enter password...]
cd scratch0/
mkdir exercise05
cd exercise05
cmsrel CMSSW_7_4_1_patch4
cd CMSSW_7_4_1_patch4/src 
cmsenv
git cms-addpkg PhysicsTools/PatAlgos
git cms-merge-topic -u CMS-PAT-Tutorial:CMSSW_7_4_1_patTutorial
scram b -j 4

PAT COC default configuration

The event content of the default pat::Tuple that you get from the release contains the selectedPatCandidates, which do not carry any COC information but only have some recommended cuts applied to the physics objects. You can learn more details about default pat::Tuple creation and configuration in WorkBookPATTupleCreationExercise.

As a first step, let's make a default pat::Tuple:

cmsRun PhysicsTools/PatAlgos/test/patTuple_standard_cfg.py 

Convince yourself that this PAT tuple really only contains selectedPatCandidates using the edmDumpEventContent tool:

edmDumpEventContent patTuple_standard.root

Type                                  Module                   Label          Process   
----------------------------------------------------------------------------------------
edm::OwnVector<reco::BaseTagInfo,edm::ClonePolicy<reco::BaseTagInfo> >    "selectedPatJets"        "tagInfos"     "PAT"     
vector<CaloTower>                     "selectedPatJets"        "caloTowers"   "PAT"     
vector<pat::Electron>                 "selectedPatElectrons"   ""             "PAT"     
vector<pat::Jet>                      "selectedPatJets"        ""             "PAT"     
vector<pat::MET>                      "patMETs"                ""             "PAT"     
vector<pat::Muon>                     "selectedPatMuons"       ""             "PAT"     
vector<pat::Photon>                   "selectedPatPhotons"     ""             "PAT"     
vector<pat::Tau>                      "selectedPatTaus"        ""             "PAT"     
vector<reco::GenJet>                  "selectedPatJets"        "genJets"      "PAT"  

The example COC configuration is defined in the cleaningLayer1 directory of the PhysicsTools/PatAlgos package. Please note that this configuration should be understood only as an example. To turn on this toy example COC configuration, modify PhysicsTools/PatAlgos/test/patTuple_standard_cfg.py in the following way:

  • add cleaned* collections to your output:

from PhysicsTools.PatAlgos.patEventContent_cff import patEventContent
process.out.outputCommands = cms.untracked.vstring('drop *', *patEventContent ) 

  • add cleaning modules to your configuration, so they are executed during unscheduled execution:

process.load("PhysicsTools.PatAlgos.cleaningLayer1.cleanPatCandidates_cff")

The output of the edmDumpEventContent command, run on the patTuple produced with the cleaning sequences switched on, should look like this:

edm::OwnVector<reco::BaseTagInfo,edm::ClonePolicy<reco::BaseTagInfo> >    "selectedPatJets"     "tagInfos"     "PAT"     
vector<CaloTower>                     "selectedPatJets"     "caloTowers"   "PAT"     
vector<pat::Electron>                 "cleanPatElectrons"   ""             "PAT"     
vector<pat::Jet>                      "cleanPatJets"        ""             "PAT"     
vector<pat::MET>                      "patMETs"             ""             "PAT"     
vector<pat::Muon>                     "cleanPatMuons"       ""             "PAT"     
vector<pat::Photon>                   "cleanPatPhotons"     ""             "PAT"     
vector<pat::Tau>                      "cleanPatTaus"        ""             "PAT"     
vector<reco::GenJet>                  "selectedPatJets"     "genJets"      "PAT" 

Next we will create a file with customized COC information. For this purpose we have prepared a second configuration file:

PhysicsTools/PatExamples/test/patTuple_addCOC_cfg.py 

We are going to investigate our COC configuration using the python interpreter. You can also open the files directly in your favourite editor to check the actual steps of configuration in the cfg file.

import FWCore.ParameterSet.Config as cms

process = cms.Process("COC")

#unscheduled mode
process.options = cms.untracked.PSet(allowUnscheduled = cms.untracked.bool(True) )

## MessageLogger
process.load("FWCore.MessageLogger.MessageLogger_cfi")

process.source = cms.Source("PoolSource",
    fileNames = cms.untracked.vstring("file:patTuple_standard.root")
)

## Maximal Number of Events
process.maxEvents = cms.untracked.PSet( input = cms.untracked.int32(-1) )

## load the configuration for the customized COC running
process.load("PhysicsTools.PatExamples.customizedSelection_cff")
process.load("PhysicsTools.PatExamples.customizedCOC_cff")

## define the name and content of the output file
process.out = cms.OutputModule("PoolOutputModule",
                               fileName = cms.untracked.string('cocTuple.root'),
                               outputCommands = cms.untracked.vstring(
                                   'keep *',
                                   'drop *_selectedPatJets_*_*',
                                   'keep *_*_caloTowers_*',
                                   'keep *_*_genJets_*'
                                   )
                                )

process.outpath = cms.EndPath(process.out)
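
The effect of the outputCommands above can be sketched in plain Python. This is a toy model and not the CMSSW implementation: it assumes that the keep/drop commands are applied in order, so that the last matching command decides, and it uses illustrative branch names of the form type_module_instance_process. The helper kept_branches and the exact branch names are assumptions made for this sketch.

```python
from fnmatch import fnmatchcase

def kept_branches(branches, output_commands):
    """Apply 'keep'/'drop' commands in order; the last matching command wins."""
    kept = []
    for branch in branches:
        keep = False  # nothing is kept unless a command says so
        for command in output_commands:
            action, pattern = command.split(None, 1)
            if fnmatchcase(branch, pattern):
                keep = (action == 'keep')
        if keep:
            kept.append(branch)
    return kept

# illustrative branch names (type_module_instance_process)
branches = [
    'patJets_selectedPatJets__PAT',
    'CaloTowers_selectedPatJets_caloTowers_PAT',
    'recoGenJets_selectedPatJets_genJets_PAT',
    'recoBaseTagInfosOwned_selectedPatJets_tagInfos_PAT',
    'patJets_cocPatJets__COC',
]
commands = [
    'keep *',
    'drop *_selectedPatJets_*_*',
    'keep *_*_caloTowers_*',
    'keep *_*_genJets_*',
]
result = kept_branches(branches, commands)
for name in result:
    print(name)
```

In this toy model the selectedPatJets collection and its tagInfos are dropped, while the caloTowers and genJets products of the same module survive because a later keep command matches them again.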

We want to follow a typical use case of an analysis with isolated muons and/or electrons and jets. The first thing we did was to create two new collections to contain the objects of interest for us, the isolated leptons, in a special cms.Sequence that we call customSelection. You can investigate it in the python interpreter doing the following:

python -i PhysicsTools/PatExamples/test/patTuple_addCOC_cfg.py
>>> process.customSelection

cms.Sequence(isolatedPatElectrons+isolatedPatMuons)

Have a look into the configuration of the individual modules. We give an example for the electrons:

>>> process.isolatedPatElectrons

cms.EDFilter("PATElectronSelector",
    src = cms.InputTag("selectedPatElectrons"),
    cut = cms.string('pt>10 & abs(eta)<2.5 & (trackIso+caloIso)/pt< 5')
)
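
To make the selection string easier to read, here is a minimal plain-Python equivalent of the cut shown above. The function name and argument names are hypothetical; in PAT the cut is evaluated by the string parser on the member functions of pat::Electron, not by Python code.

```python
def passes_isolated_electron_cut(pt, eta, track_iso, calo_iso):
    """Plain-Python reading of 'pt>10 & abs(eta)<2.5 & (trackIso+caloIso)/pt< 5'."""
    # the pt requirement is checked first, so the division is never done for pt <= 10
    return pt > 10 and abs(eta) < 2.5 and (track_iso + calo_iso) / pt < 5

# central electron with small isolation sums: accepted
print(passes_isolated_electron_cut(pt=30.0, eta=1.2, track_iso=2.0, calo_iso=3.0))
# same electron outside the eta acceptance: rejected
print(passes_isolated_electron_cut(pt=30.0, eta=2.7, track_iso=2.0, calo_iso=3.0))
```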

You will find our customized COC part in the file PhysicsTools/PatExamples/python/customizedCOC_cff.py. The sequence of interest is called customCOC:

>>> process.customCOC

cms.Sequence(cocPatJets)

As you see we decided to add cross object information for jets. We want to add cross links to isolated leptons in the vicinity of the jet. Later on we want to make use of these cross links to quickly point from the jets to the objects in their surroundings. Let's have a look into the configuration of the jets:

>>> process.cocPatJets

cms.EDProducer("PATJetCleaner",
    src = cms.InputTag("selectedPatJets"), 

    # preselection (any string-based cut on pat::Jet)
    preselection = cms.string(''),

    # overlap checking configurables
    checkOverlaps = cms.PSet(

        isolatedMuons = cms.PSet(
           src       = cms.InputTag("isolatedPatMuons"),
           algorithm = cms.string("byDeltaR"),
           preselection        = cms.string(""),
           deltaR              = cms.double(0.5),
           checkRecoComponents = cms.bool(False), # don't check if they share some AOD object ref
           pairCut             = cms.string(""),
           requireNoOverlaps   = cms.bool(False), # overlaps don't cause the jet to be discarded
        ),                                
        isolatedElectrons = cms.PSet(
           src       = cms.InputTag("isolatedPatElectrons"),
           algorithm = cms.string("byDeltaR"),
           preselection        = cms.string(""),
           deltaR              = cms.double(0.5),
           checkRecoComponents = cms.bool(False), # don't check if they share some AOD object ref
           pairCut             = cms.string(""),
           requireNoOverlaps   = cms.bool(False), # overlaps don't cause the jet to be discarded
        )
    ),
    # finalCut (any string-based cut on pat::Jet)
    finalCut = cms.string(''),
)

Note: as you see, the structure of the cocPatJets module is quite complex. On the other hand, you will find the same structure for any other collection that you might want to configure yourself in the future, so once you have worked through this example there is nothing more to come. The most important individual parameters are also fairly intuitive. We will now go through them step by step:

  • src: this is the input collection that we want to add the COC information to. As you see, for the jets we chose the collection of (pre)selected jets (selectedPatJets). The output of the module, the cocPatJets, will be a collection identical to the selectedPatJets but with the extra COC information added. Once the new collection has been produced, you can drop the old one.
  • preselection: you can apply a selection string to the input collection before any COC information is added.
  • finalCut: you can apply a selection string to the input collection after the COC information has been added. Wondering what the difference between preselection and finalCut is? Remember that in the selection string you can make use of any member function of the pat::Jet. So in the final selection you can already apply a selection based on the presence of the added COC information.
  • checkOverlaps: this is the core of the COC information configuration. We will therefore discuss it in more detail below.

The core of the COC information configuration is the set of edm::ParameterSets (PSets) inside checkOverlaps. The first thing to know is that you can add as many of those as you like. You are also completely free in naming these parameter sets. In our example we chose a PSet called isolatedMuons and a PSet called isolatedElectrons. The only important part of these parameter sets is their structure, e.g. for the isolatedMuons PSet:

isolatedMuons = cms.PSet( 
    src = cms.InputTag("isolatedPatMuons"), 
    deltaR = cms.double(0.5), 
    pairCut = cms.string(''), 
    checkRecoComponents = cms.bool(False), 
    algorithm = cms.string('byDeltaR'), 
    preselection = cms.string(''), 
    requireNoOverlaps = cms.bool(False) 
) 

Note: we list and explain the main parameters of the parameter set below:

  • src: this is the input collection whose cross information should be added to the cocPatJets. In our example we want to have a cross link to any isolatedPatMuon that might be located in the vicinity of the jet.
  • algorithm: here we decide to add the COC information based on deltaR. The distance in deltaR is the customary way to do that. There are also alternatives (e.g. photons and electrons might use the same super cluster seed), but they are somewhat more involved, and most of the time the deltaR criterion is sufficient. Have a look at SWGuidePATCrossCleaning for more details.
  • deltaR: we add COC information based on a judgement of whether the muon is in the vicinity of the jet or not. Have a look at the description of the algorithm parameter for some more details. In our example we choose deltaR<0.5.
  • pairCut: this is a more specialized parameter that you will rarely need. You can, for instance, apply a minimal cut on the invariant mass of the jet and the muon under consideration.
  • preselection: here, too, you can apply a preselection, but this time it is applied to the muons and not to the jet. We could have chosen, e.g., to consider only isolated muons (trackIso<3). In this case only the muons fulfilling the isolation criterion would have been considered.
  • requireNoOverlaps: you will find this switch set to False in most cases. Setting it to True will drop the jet from the jet collection when a muon that fulfills the preselection requirements is found in the vicinity of this jet. This switch was sometimes used in the times of 'traditional' cross object cleaning, to prevent double counting as discussed above. Nowadays it can be used in a selection to require that the objects are well separated.
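
The interplay of these parameters can be sketched as a toy model in plain Python. This is not the PATJetCleaner code: the function check_overlaps, the dictionary layout and the toy objects are assumptions made for illustration, and pairCut, checkRecoComponents and finalCut are left out to keep the sketch short.

```python
import math

def delta_r(a, b):
    """Distance in the eta-phi plane between two toy objects."""
    dphi = (a['phi'] - b['phi'] + math.pi) % (2.0 * math.pi) - math.pi
    return math.hypot(a['eta'] - b['eta'], dphi)

def check_overlaps(jets, tests):
    """tests maps a label to a dict with src, deltaR, preselection, requireNoOverlaps."""
    output = []
    for jet in jets:
        overlaps, discard = {}, False
        for label, test in tests.items():
            # candidates must pass the preselection and lie within deltaR of the jet
            found = [c for c in test['src']
                     if test['preselection'](c) and delta_r(jet, c) < test['deltaR']]
            if found and test['requireNoOverlaps']:
                discard = True  # the jet is dropped from the output collection
                break
            overlaps[label] = found
        if not discard:
            output.append(dict(jet, overlaps=overlaps))
    return output

muons = [{'pt': 25.0, 'eta': 0.1, 'phi': 0.2}]
jets = [{'pt': 50.0, 'eta': 0.0, 'phi': 0.0},
        {'pt': 40.0, 'eta': 2.0, 'phi': 1.0}]

def make_tests(require_no_overlaps):
    return {'isolatedMuons': {'src': muons, 'deltaR': 0.5,
                              'preselection': lambda c: True,
                              'requireNoOverlaps': require_no_overlaps}}

linked = check_overlaps(jets, make_tests(False))
cleaned = check_overlaps(jets, make_tests(True))
print([len(j['overlaps']['isolatedMuons']) for j in linked])
print(len(cleaned))
```

With requireNoOverlaps set to False both toy jets are kept and the first one carries a link to the nearby muon; with True the first jet is removed and only the well-separated jet remains.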

Question 5 a) You can find a similar parameter set (PSet) for isolatedElectrons. What is the difference in the configuration? How would you configure it to get overlap information with selectedPatMuons in addition?


You should be set now to create your own pat::Candidates including COC information:

cmsRun PhysicsTools/PatExamples/test/patTuple_addCOC_cfg.py 

Note that this configuration file takes the patTuple_standard.root that we produced above as input. The output that we are going to produce now is called cocTuple.root. It is still a PAT tuple; we just gave it a different name. Check the output again using edmDumpEventContent:

Type                       Module                   Label          Process   
-----------------------------------------------------------------------------
vector<CaloTower>          "selectedPatJets"        "caloTowers"   "PAT"     
vector<pat::Electron>      "selectedPatElectrons"   ""             "PAT"     
vector<pat::MET>           "patMETs"                ""             "PAT"     
vector<pat::Muon>          "selectedPatMuons"       ""             "PAT"     
vector<pat::Photon>        "selectedPatPhotons"     ""             "PAT"     
vector<pat::Tau>           "selectedPatTaus"        ""             "PAT"     
vector<reco::GenJet>       "selectedPatJets"        "genJets"      "PAT"        
vector<pat::Electron>      "isolatedPatElectrons"   ""             "COC"     
vector<pat::Jet>           "cocPatJets"             ""             "COC"     
vector<pat::Muon>          "isolatedPatMuons"       ""             "COC"  

Note: You see that some new collections popped up in cocTuple.root when compared to the patTuple_standard.root that we used as input. We also dropped the selectedPatJets collection, which was identical to the cocPatJets collection except that the cocPatJets collection has some new information added. You can have a look at the configuration of the module out in the patTuple_addCOC_cfg.py file to see how this replacement takes place.

Question 5 b) We will not use the isolatedPatMuons collection, but we still kept it in the event on purpose. Do you know why we did this? If not, re-create the cocTuple.root file with the isolatedPatMuons collection dropped and run the exercise below. Can you explain what is happening?

We will now analyze the COC information in the newly created pat::Jets. For this we will use a dedicated FWLiteAnalyzer. To use it, do the following:

PatCOCExercise PhysicsTools/PatExamples/bin/analyzePatCOC_cfg.py

As you see, we run the executable together with a cfg file. It is possible to use some features of python cfg files also within FWLite. To learn more about that, have a look at Exercise 04. We will have a short look into the cfg file:

import FWCore.ParameterSet.Config as cms

process = cms.Process("FWLitePlots")

process.FWLiteParams = cms.PSet( 
    inputFile = cms.string('file:cocTuple.root'), 
    outputFile = cms.string('analyzePatCOC.root'), 
    jets = cms.InputTag('cocPatJets'), 
    overlaps = cms.string('isolatedElectrons') 
) 

Note: As you see, four parameters are defined:

  • inputFile: denotes the input file we are going to use.
  • outputFile: denotes the output file we are going to use.
  • jets: is the cms.InputTag for the jet collection we want to analyze.
  • overlaps: is a cms.string label for the overlapping objects we want to analyze.
You see that for the first go we chose to analyse the overlaps of jets with electrons. You can use any other parameter set that was defined in the configuration of the cocPatJets before. The executable creates a root file named analyzePatCOC.root. You can open it and view the histograms in the TBrowser.

A closer look at the FWLite macro

To see how this histogram was filled, open the file PatCOCExercise.cc with your favourite editor (you can find the file in PhysicsTools/PatExamples/bin). The first important thing is that, in addition to including other needed header files, we need to include the header files for FWLite and PAT:

#include "DataFormats/Math/interface/deltaR.h" 
#include "DataFormats/FWLite/interface/Event.h" 
#include "DataFormats/Common/interface/Handle.h" 
#include "DataFormats/PatCandidates/interface/Jet.h" 
#include "FWCore/ParameterSet/interface/ProcessDesc.h" 
#include "FWCore/FWLite/interface/AutoLibraryLoader.h" 
#include "PhysicsTools/FWLite/interface/TFileService.h" 
#include "FWCore/PythonParameterSet/interface/PythonProcessDesc.h" 

In the main method, first we need to read the input argument and set the input file name. We will do this using the config file features of FWLite:

// get the python configuration 
PythonProcessDesc builder(argv[1]); 
const edm::ParameterSet& fwliteParameters = builder.processDesc()->getProcessPSet()->getParameter<edm::ParameterSet>("FWLiteParams");

// now get each parameter 
std::string input_ ( fwliteParameters.getParameter<std::string>("inputFile" ) ); 
std::string output_ ( fwliteParameters.getParameter<std::string>("outputFile" ) ); 
std::string overlaps_( fwliteParameters.getParameter<std::string>("overlaps") ); 
edm::InputTag jets_ ( fwliteParameters.getParameter<edm::InputTag>("jets" ) ); 

As you see, we read back the parameters we defined above in the analyzePatCOC_cfg.py FWLite config file. After the file name is set, a set of histograms is booked via the TFileService.

// book a set of histograms 
fwlite::TFileService fs = fwlite::TFileService(output_.c_str()); 
TFileDirectory theDir = fs.mkdir("analyzePatCOC"); 
TH1F* deltaRElecJet_ = theDir.make<TH1F>("deltaRElecJet" , "#DeltaR (elec, jet)" , 10, 0., 0.5); 
TH1F* elecOverJet_ = theDir.make<TH1F>("elecOverJet" , "E_{elec}/E_{jet}" , 100, 0., 2.); 
TH1F* nOverlaps_ = theDir.make<TH1F>("nOverlaps" , "Number of overlaps" , 5, 0., 5.); 

Note: We did not book the histograms directly in the file but created a directory named 'analyzePatCOC' beforehand. We do this for a better overview. Next we loop over the events in the input file:

// open input file (can be located on eos or remote storage via xrootd) 
TFile* inFile = TFile::Open(input_.c_str());

// loop the events 
unsigned int iEvent=0; 
fwlite::Event ev(inFile); 
for(ev.toBegin(); !ev.atEnd(); ++ev, ++iEvent){ 
  edm::EventBase const & event = ev;

  // break loop after end of file is reached 
  // or after 1000 events have been processed 
  if( iEvent==1000 ) break;

  // simple event counter 
  if(iEvent > 0 && iEvent%1==0){ 
    std::cout << " processing event: " << iEvent << std::endl; 
  }

  // handle to jet collection 
  edm::Handle<std::vector<pat::Jet> > jets;
  event.getByLabel(jets_, jets);

  // loop over the jets in the event 
  for( std::vector<pat::Jet>::const_iterator jet = jets->begin(); jet != jets->end(); jet++ ){ 
    if(jet->pt() > 20 && jet==jets->begin()){ 
      ... 

We read in the cocPatJets via edm::InputTag, loop over all jets and fill our histograms with COC info of each leading jet with pt greater than 20 GeV. This will be detailed in the next section.

Making use of the additional COC information

In the next sections you will see how the COC information can be accessed from the pat::Jet. Technically this is done with minimal overhead in space consumption using edm::Ptr. You can read the COC information that has been added to the jets by calling the method hasOverlaps(label), where label denotes the object type you want to check the overlap with. This is done in the lower part of the jet loop:

... 
if(jet->hasOverlaps(overlaps_)){ 
  //get all overlaps 
  const reco::CandidatePtrVector overlaps = jet->overlaps(overlaps_); 
  nOverlaps_->Fill( overlaps.size() ); 
  //loop over the overlaps 
  for( reco::CandidatePtrVector::const_iterator overlap = overlaps.begin(); overlap != overlaps.end(); overlap++){ 
    float deltaR = reco::deltaR( (*overlap)->eta(), (*overlap)->phi(), jet->eta(), jet->phi() ); 
    deltaRElecJet_->Fill( deltaR ); 
    elecOverJet_->Fill( (*overlap)->energy()/jet->energy() ); 
 } 
} ... 

Note: In fact, each pat::Candidate has such a method, which is inherited from the pat::PATObject class. For more information about the hierarchy of pat::Candidates you can have a look at WorkBookPATDataFormats. As already mentioned above, the hasOverlaps(label) method needs one argument, which is the name of the overlap class that has been defined in the configuration of the cocPatJets module. Remember that you are free to choose any name for the PSet in the configuration. Of course this determines which checks for COC information are available in your cocPatJets. In our example the available classes are isolatedElectrons and isolatedMuons. The function returns a Boolean, indicating whether there was an object of that type in the vicinity of the jet or not.

Checking whether an overlap exists is not the only thing we can do with the stored COC information. For those jets that have another object in their vicinity, we next want to access the reference pointers to these other objects in the isolatedElectrons collection to analyze them a bit further. Have a look at the other histograms that have been filled in the FWLite executable and see whether they show what you would have expected. Just as an example of what you can do, we filled the following histograms:

  • nOverlaps: the number of overlapping objects (which can be larger than 1, as you will see!).
  • elecOverJet: the fraction of the (jet energy scale corrected) jet energy that the corresponding overlapping electron carries.
  • deltaRElecJet: the distance between the axes of the overlapping electron and the jet.
Note: A technical remark: the reference pointers you get are to objects of type reco::Candidate, so you can easily access all information of a reco::Candidate. Nevertheless, as we know that we are pointing into a pat::Electron collection, you can use a C++ dynamic_cast to 'expand' the data type from reco::Candidate to pat::Electron. In this way you can access all pat::Electron information and, e.g., easily apply an electronID cut on the overlapping electrons in later stages of your analysis. Have a look into Lecture 3.1, slide 15 of the June 2011 Tutorial to find out how to do this.

Exercises

Before leaving this page try to do the following exercises:

red Exercise 5 a):
Run the PatCOCExercise macro with the different inputs of 'overlaps' that we have configured in the TWiki above.

yellow Exercise 5 b):
Find out which overlaps are checked for the cleanPatJets in the PAT example configuration. To do this, inspect patTuple_standard_cfg.py, with the example PAT COC configuration added (see above), in the python interpreter. Add the list of PSet names of objects for which COC information is added to the submission form.

Note:

In case of problems don't hesitate to contact the SWGuidePAT#Support. Having successfully finished Exercise 5, you might want to proceed to the other exercises of the WorkBookPATTutorial to learn more about how PAT supports and facilitates many common high level analysis tasks.

Review status

Reviewer/Editor and Date (copy from screen) Comments
RogerWolf - 17 March 2012 added color coding.

Responsible: HamedBakhshianSohi

Topic revision: r112 - 2015-06-29 - TaeJeongKim