PAT Exercise 05: Collecting Information of the Relation of Objects of Different Type with PAT
Objectives
- Learn how PAT supports the adding of Cross Object Collection information (COC).
- Learn how to configure and how to make use of the COC information support of PAT.
Note:
This web course is part of the
PAT Tutorial, which takes place regularly at CERN and in other places. When following the PAT Tutorial, the answers to questions marked in
RED should be entered into the exercise form that was introduced at the beginning of the tutorial. The solutions to the
Exercises should also be entered into the form. The exercises are marked in three colours, indicating whether an exercise is basic (obligatory), continuative (recommended) or optional (free). The colour coding is summarized in the table below:
| Color Code | Explanation |
| | Basic exercise, which is obligatory for the PAT Tutorial. |
| | Continuative exercise, which is recommended for the PAT Tutorial to deepen what has been learned. |
| | Optional exercise, which shows interesting applications of what has been learned. |
Basic exercises are obligatory, and their solutions should be entered into the exercise form during the PAT Tutorial.
Introduction
The initial purpose of COC stems from the 'traditional', detector-based way of reconstructing high-level analysis objects. Electrons are usually reconstructed from energy deposits in the calorimeters and tracks in the silicon detectors. The clustering algorithms run over the calorimeter objects and create clusters, and clusters of clusters called super clusters, which are one of the elements of the electron reconstruction at CMS. Problems first occur when a CaloTower that includes the energy deposits of the electron is used as the seed of a CaloJet. The energy of the electron in the detector is then reconstructed twice: once as an electron and once as a jet.
In the era of particle flow, which provides a complete and unambiguous particle-based interpretation of the event, there are more sophisticated ways to do object disambiguation. Examples are given in
Exercise 7. PAT nevertheless still provides the capability to analyze, at the python level, associations between objects that belong to different high-level reconstruction objects.
Example: Think of a cut that is commonly applied in analyses with isolated leptons and jets, where, to be on the safe side, a minimal distance is required between the isolated lepton and the closest jet. Jets within a certain radius around the lepton are not taken into account for the analysis, or events for which this is the case are excluded from further consideration. PAT Cross Object Collection (COC) information provides you with python-configurable tools to relate a collection of selected objects of any kind to other object collections. This information is saved within the selected objects.
Example: Easy access on back-to-back objects such as a jet and a photon for jet calibration.
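The distance requirement in the first example can be illustrated with a few lines of plain Python. This is only a sketch that runs without CMSSW: the objects are toy dictionaries, and the helper names delta_phi, delta_r and jets_near are assumptions made for this illustration, not part of PAT.

```python
import math


def delta_phi(phi1, phi2):
    """Azimuthal difference wrapped into the range (-pi, pi]."""
    dphi = (phi1 - phi2) % (2.0 * math.pi)
    if dphi > math.pi:
        dphi -= 2.0 * math.pi
    return dphi


def delta_r(eta1, phi1, eta2, phi2):
    """Distance in the eta-phi plane, sqrt(deta^2 + dphi^2)."""
    return math.hypot(eta1 - eta2, delta_phi(phi1, phi2))


def jets_near(lepton, jets, max_dr=0.5):
    """Return the jets closer than max_dr to the lepton."""
    return [
        jet for jet in jets
        if delta_r(lepton["eta"], lepton["phi"], jet["eta"], jet["phi"]) < max_dr
    ]


# Toy objects (hypothetical values, not taken from any event).
lepton = {"eta": 0.10, "phi": 3.10}
jets = [
    {"name": "jetA", "eta": 0.20, "phi": -3.10},  # close by, across the phi wrap-around
    {"name": "jetB", "eta": 1.50, "phi": 0.00},   # far away
]

close = jets_near(lepton, jets)
print([jet["name"] for jet in close])
```

The wrap-around in delta_phi matters: without it, jetA would appear to be more than 6 units away in phi although it is right next to the lepton.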
This page will guide you through a tutorial and corresponding exercises to learn more about how to use and how to configure COC information with PAT. Although there will be hints and reminders throughout the exercises, you will need the following knowledge:
If you feel uncomfortable with one of these points please follow the links given above and make yourself familiar with it.
Setting up of the environment
We assume that you are logged in on
lxplus
and are in your work directory. If not you can follow the instruction given
here.
ssh your_lxplus_Name@lxplus.cern.ch
[.... enter password...]
cd scratch0/
mkdir exercise05
cd exercise05
cmsrel CMSSW_7_4_1_patch4
cd CMSSW_7_4_1_patch4/src
cmsenv
git cms-addpkg PhysicsTools/PatAlgos
git cms-merge-topic -u CMS-PAT-Tutorial:CMSSW_7_4_1_patTutorial
scram b -j 4
PAT COC default configuration
The event content of the default
pat::Tuple that you get from the release contains the
selectedPatCandidates, that do not contain any COC information, only some recommended cuts on the physics objects. You can learn more details about default
pat::Tuple creation and configuration in
WorkBookPATTupleCreationExercise.
As a first step, let's make a default
pat::Tuple:
cmsRun PhysicsTools/PatAlgos/test/patTuple_standard_cfg.py
Convince yourself that this PAT tuple really only contains
selectedPatCandidates using the
edmDumpEventContent tool:
edmDumpEventContent patTuple_standard.root
Type Module Label Process
----------------------------------------------------------------------------------------
edm::OwnVector<reco::BaseTagInfo,edm::ClonePolicy<reco::BaseTagInfo> > "selectedPatJets" "tagInfos" "PAT"
vector<CaloTower> "selectedPatJets" "caloTowers" "PAT"
vector<pat::Electron> "selectedPatElectrons" "" "PAT"
vector<pat::Jet> "selectedPatJets" "" "PAT"
vector<pat::MET> "patMETs" "" "PAT"
vector<pat::Muon> "selectedPatMuons" "" "PAT"
vector<pat::Photon> "selectedPatPhotons" "" "PAT"
vector<pat::Tau> "selectedPatTaus" "" "PAT"
vector<reco::GenJet> "selectedPatJets" "genJets" "PAT"
The example COC configuration is defined in the
cleaningLayer1 directory of the
PhysicsTools/PatAlgos package. Please note that this configuration should be understood as an example only. To turn on this toy example COC configuration, modify
PhysicsTools/PatAlgos/test/patTuple_standard_cfg.py in the following way:
- add cleaned* collections to your output:
from PhysicsTools.PatAlgos.patEventContent_cff import patEventContent
process.out.outputCommands = cms.untracked.vstring('drop *', *patEventContent )
- add cleaning modules to your configuration, so they are executed during unscheduled execution:
process.load("PhysicsTools.PatAlgos.cleaningLayer1.cleanPatCandidates_cff")
The output of the
edmDumpEventContent command, run on the PAT tuple produced with the cleaning sequences switched on, should look like this:
edm::OwnVector<reco::BaseTagInfo,edm::ClonePolicy<reco::BaseTagInfo> > "selectedPatJets" "tagInfos" "PAT"
vector<CaloTower> "selectedPatJets" "caloTowers" "PAT"
vector<pat::Electron> "cleanPatElectrons" "" "PAT"
vector<pat::Jet> "cleanPatJets" "" "PAT"
vector<pat::MET> "patMETs" "" "PAT"
vector<pat::Muon> "cleanPatMuons" "" "PAT"
vector<pat::Photon> "cleanPatPhotons" "" "PAT"
vector<pat::Tau> "cleanPatTaus" "" "PAT"
vector<reco::GenJet> "selectedPatJets" "genJets" "PAT"
Next we will create a file with customized COC information. For this purpose we have prepared a second configuration file:
PhysicsTools/PatExamples/test/patTuple_addCOC_cfg.py
We are going to investigate our COC configuration using the
python interpreter. You can also open the files directly in your favourite editor to check the actual steps of configuration in the cfg file.
import FWCore.ParameterSet.Config as cms

process = cms.Process("COC")

## unscheduled mode
process.options = cms.untracked.PSet(allowUnscheduled = cms.untracked.bool(True) )

## MessageLogger
process.load("FWCore.MessageLogger.MessageLogger_cfi")

process.source = cms.Source("PoolSource",
    fileNames = cms.untracked.vstring("file:patTuple_standard.root")
)

## Maximal Number of Events
process.maxEvents = cms.untracked.PSet( input = cms.untracked.int32(-1) )

## load the configuration for the customized COC running
process.load("PhysicsTools.PatExamples.customizedSelection_cff")
process.load("PhysicsTools.PatExamples.customizedCOC_cff")

## define the name and content of the output file
process.out = cms.OutputModule("PoolOutputModule",
    fileName = cms.untracked.string('cocTuple.root'),
    outputCommands = cms.untracked.vstring(
        'keep *',
        'drop *_selectedPatJets_*_*',
        'keep *_*_caloTowers_*',
        'keep *_*_genJets_*'
    )
)

process.outpath = cms.EndPath(process.out)
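The outputCommands above are evaluated in order, and a later command overrides an earlier one for every branch it matches. The following plain-Python sketch models this behaviour with shell-style wildcards. It is a simplified model rather than the framework implementation, the helper select_branches is hypothetical, and the branch names are illustrative assumptions following the Type_module_instance_process pattern.

```python
from fnmatch import fnmatchcase


def select_branches(branches, commands):
    """Apply 'keep'/'drop' commands in order; the last matching command wins."""
    kept = []
    for branch in branches:
        keep = False  # a branch is dropped unless some command keeps it
        for command in commands:
            action, pattern = command.split()
            if fnmatchcase(branch, pattern):
                keep = (action == "keep")
        if keep:
            kept.append(branch)
    return kept


# Illustrative branch names (assumed for this sketch).
branches = [
    "patJets_selectedPatJets__PAT",
    "recoBaseTagInfosOwned_selectedPatJets_tagInfos_PAT",
    "CaloTowers_selectedPatJets_caloTowers_PAT",
    "recoGenJets_selectedPatJets_genJets_PAT",
    "patElectrons_selectedPatElectrons__PAT",
    "patJets_cocPatJets__COC",
]

# Same commands as in the PoolOutputModule configuration above.
commands = [
    "keep *",
    "drop *_selectedPatJets_*_*",
    "keep *_*_caloTowers_*",
    "keep *_*_genJets_*",
]

kept = select_branches(branches, commands)
for name in kept:
    print(name)
```

In this model the pat::Jet and tagInfos branches of selectedPatJets are dropped, while the caloTowers and genJets branches are kept again by the two later keep commands.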
We want to follow a typical use case of an analysis with isolated muons and/or electrons and jets. The first thing we did was to create two new collections to contain the objects of interest for us, the isolated leptons, in a special
cms.Sequence that we call
customSelection. You can investigate it in the
python interpreter doing the following:
python -i PhysicsTools/PatExamples/test/patTuple_addCOC_cfg.py
>>> process.customSelection
cms.Sequence(isolatedPatElectrons+isolatedPatMuons)
Have a look into the configuration of the individual modules. We give an example for the electrons:
>>> process.isolatedPatElectrons
cms.EDFilter("PATElectronSelector",
src = cms.InputTag("selectedPatElectrons"),
cut = cms.string('pt>10 & abs(eta)<2.5 & (trackIso+caloIso)/pt< 5')
)
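To make the cut string of the electron selector easier to read, here is the same logic written as a plain Python function. This is a sketch only: in PAT the string is evaluated by the string-based cut parser on pat::Electron member functions, whereas the function name and the toy values below are assumptions made for illustration.

```python
def passes_isolated_electron_cut(pt, eta, track_iso, calo_iso):
    """Mirror of 'pt>10 & abs(eta)<2.5 & (trackIso+caloIso)/pt< 5'."""
    return pt > 10 and abs(eta) < 2.5 and (track_iso + calo_iso) / pt < 5


# Toy electrons: (pt, eta, trackIso, caloIso), hypothetical values.
electrons = [
    (25.0, 1.0, 2.0, 3.0),    # passes all three requirements
    (8.0, 0.5, 0.0, 0.0),     # fails the pt requirement
    (30.0, 2.7, 0.0, 0.0),    # fails the eta requirement
    (12.0, 0.2, 40.0, 30.0),  # fails the relative isolation requirement
]

decisions = [passes_isolated_electron_cut(*electron) for electron in electrons]
print(decisions)
```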
You find our customized COC part in the file
PhysicsTools/PatExamples/python/customizedCOC_cff.py. The sequence of interest is called
customCOC:
>>> process.customCOC
As you see we decided to add cross object information for jets. We want to add cross links to isolated leptons in the vicinity of the jet. Later on we want to make use of these cross links to quickly point from the jets to the objects in their surroundings. Let's have a look into the configuration of the jets:
>>> process.cocPatJets
cms.EDProducer("PATJetCleaner",
    src = cms.InputTag("selectedPatJets"),
    # preselection (any string-based cut on pat::Jet)
    preselection = cms.string(''),
    # overlap checking configurables
    checkOverlaps = cms.PSet(
        isolatedMuons = cms.PSet(
            src = cms.InputTag("isolatedPatMuons"),
            algorithm = cms.string("byDeltaR"),
            preselection = cms.string(""),
            deltaR = cms.double(0.5),
            checkRecoComponents = cms.bool(False), # don't check if they share some AOD object ref
            pairCut = cms.string(""),
            requireNoOverlaps = cms.bool(False), # overlaps don't cause the jet to be discarded
        ),
        isolatedElectrons = cms.PSet(
            src = cms.InputTag("isolatedPatElectrons"),
            algorithm = cms.string("byDeltaR"),
            preselection = cms.string(""),
            deltaR = cms.double(0.5),
            checkRecoComponents = cms.bool(False), # don't check if they share some AOD object ref
            pairCut = cms.string(""),
            requireNoOverlaps = cms.bool(False), # overlaps don't cause the jet to be discarded
        )
    ),
    # finalCut (any string-based cut on pat::Jet)
    finalCut = cms.string(''),
)
Note: as you see the structure of the
cocPatJets module is quite complex. On the other hand, you will find the same structure for any other collection that you might want to configure yourself in the future, so once you have worked through this example there is nothing more to come. Moreover, the most important individual parameters are fairly intuitive. We will go through them step by step:
- src: this is the input collection that we want to add the COC information to. As you see, for the jets we chose the collection of (pre)selected jets ( selectedPatJets). The output of the module, the cocPatJets, will be a collection identical to the selectedPatJets but with the extra COC information added. Once the new collection has been produced, you can drop the old one.
- preselection: a selection string that is applied to the input collection before any COC information is added.
- finalCut: a selection string that is applied to the input collection after the COC information has been added. You wonder what the difference between preselection and finalCut is? Remember that in the selection string you can make use of any member function of the pat::Jet. So in the final selection you can already apply a selection based on the presence of the added COC information.
- checkOverlaps: this is the core of the COC information configuration. We will therefore discuss it in more detail below.
The core of the COC information configuration consists of the
edm::ParameterSets, or
PSets, nested inside checkOverlaps. The first thing to know is that you can add as many of those as you like. You are also free to name these parameter sets as you wish. In our example we chose a
PSet called
isolatedMuons and a
PSet called
isolatedElectrons. The only important part of these parameter sets is their structure, e.g. for the
isolatedMuons PSet:
isolatedMuons = cms.PSet(
src = cms.InputTag("isolatedPatMuons"),
deltaR = cms.double(0.5),
pairCut = cms.string(''),
checkRecoComponents = cms.bool(False),
algorithm = cms.string('byDeltaR'),
preselection = cms.string(''),
requireNoOverlaps = cms.bool(False)
)
Note: we list and explain the main parameters of the parameter set below:
- src: this is the input collection whose cross information should be added to the cocPatJets. In our example we want to have a cross link to any isolatedPatMuon that might be located in the vicinity of a jet in cocPatJets.
- algorithm: here we decide to add the COC information based on deltaR. The distance in deltaR is the customary way to do that. There are also alternatives to this (e.g. photons and electrons might use the same super cluster seed), but they are somewhat more involved, and most of the time the deltaR criterion is sufficient. Have a look at SWGuidePATCrossCleaning for more details.
- deltaR: we add COC information based on a judgement of whether the muon is in the vicinity of the jet or not. Have a look at the description of the algorithm parameter for some more details. In our example we choose deltaR<0.5.
- pairCut: this is a rather special parameter that you will rarely need. You can, for instance, apply a minimal cut on the invariant mass of the jet and the muon under consideration.
- preselection: here, too, you can apply a preselection, but this time it is applied to the muons and not to the jet. We could have chosen, e.g., to consider only isolated muons ( trackIso<3). In this case only the muons fulfilling the isolation criterion would have been considered.
- requireNoOverlaps: you will find this switch set to False in most cases. Setting it to True drops the jet from the jet collection when a muon that fulfills the preselection requirements is found in the vicinity of this jet. This switch was sometimes used in the times of 'traditional' cross object cleaning to prevent double counting, as discussed above. Nowadays it can be used in a selection to require that the objects are well separated.
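The interplay of preselection, the checkOverlaps parameter sets, requireNoOverlaps and finalCut can be sketched in plain Python. This is a simplified model under stated assumptions, not the PATJetCleaner implementation: objects are toy dictionaries, cuts are Python callables instead of cut strings, only the deltaR criterion is modelled (no pairCut, no checkRecoComponents), and the function add_overlaps is hypothetical.

```python
import math


def delta_r(a, b):
    """Distance in the eta-phi plane between two toy objects."""
    dphi = (a["phi"] - b["phi"]) % (2.0 * math.pi)
    if dphi > math.pi:
        dphi -= 2.0 * math.pi
    return math.hypot(a["eta"] - b["eta"], dphi)


def add_overlaps(jets, check_overlaps,
                 preselection=lambda jet: True,
                 final_cut=lambda jet: True):
    """Return copies of the jets with overlap information attached."""
    output = []
    for jet in jets:
        if not preselection(jet):           # cut before any overlap check
            continue
        jet = dict(jet, overlaps={})        # copy, the input stays untouched
        discard = False
        for label, cfg in check_overlaps.items():
            matches = [
                cand for cand in cfg["src"]
                if cfg["preselection"](cand) and delta_r(jet, cand) < cfg["deltaR"]
            ]
            if matches:
                jet["overlaps"][label] = matches
                if cfg["requireNoOverlaps"]:
                    discard = True          # an overlap vetoes the jet
        if not discard and final_cut(jet):  # cut that may use the overlaps
            output.append(jet)
    return output


# Toy input collections (hypothetical values).
jets = [{"name": "j1", "eta": 0.0, "phi": 0.0},
        {"name": "j2", "eta": 2.0, "phi": 1.0}]
muons = [{"name": "m1", "eta": 0.1, "phi": 0.1}]
electrons = [{"name": "e1", "eta": 2.0, "phi": 1.2},
             {"name": "e2", "eta": 2.1, "phi": 0.9}]

check_overlaps = {
    "isolatedMuons": {"src": muons, "deltaR": 0.5,
                      "preselection": lambda cand: True,
                      "requireNoOverlaps": False},
    "isolatedElectrons": {"src": electrons, "deltaR": 0.5,
                          "preselection": lambda cand: True,
                          "requireNoOverlaps": False},
}

result = add_overlaps(jets, check_overlaps)

# Same configuration, but now a nearby muon removes the jet.
strict = {label: dict(cfg) for label, cfg in check_overlaps.items()}
strict["isolatedMuons"]["requireNoOverlaps"] = True
vetoed = add_overlaps(jets, strict)

# A final cut that relies on the overlap information added before.
with_electron = add_overlaps(
    jets, check_overlaps,
    final_cut=lambda jet: "isolatedElectrons" in jet["overlaps"])
```

With the default settings both jets survive and merely carry their overlap lists. The last call shows why finalCut differs from preselection: it can select on information that only exists after the overlap check.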
Question 5 a) You can find a similar parameter set (
PSet) for
isolatedElectrons. What is the difference in the configuration?
How would you configure it to get overlapping information with selectedPatMuons in addition?
You should now be ready to create your own
pat::Candidates including COC information:
cmsRun PhysicsTools/PatExamples/test/patTuple_addCOC_cfg.py
Note that this configuration file takes the
patTuple_standard.root that we have produced above as input. The output that we are going to produce now is called
cocTuple.root. It is still a PAT tuple; we just gave it a different name. Check the output again using
edmDumpEventContent:
Type Module Label Process
-----------------------------------------------------------------------------
vector<CaloTower> "selectedPatJets" "caloTowers" "PAT"
vector<pat::Electron> "selectedPatElectrons" "" "PAT"
vector<pat::MET> "patMETs" "" "PAT"
vector<pat::Muon> "selectedPatMuons" "" "PAT"
vector<pat::Photon> "selectedPatPhotons" "" "PAT"
vector<pat::Tau> "selectedPatTaus" "" "PAT"
vector<reco::GenJet> "selectedPatJets" "genJets" "PAT"
vector<pat::Electron> "isolatedPatElectrons" "" "COC"
vector<pat::Jet> "cocPatJets" "" "COC"
vector<pat::Muon> "isolatedPatMuons" "" "COC"
Note: You see that some new collections popped up in the
cocTuple.root when compared to the
patTuple_standard.root that we used as input. We also dropped the
selectedPatJets collection, which was identical to the
cocPatJets collection except that the
cocPatJets collection has some new information added. You can have a look at the configuration of the module out in the
patTuple_addCOC_cfg.py file to see how this replacement was done.
Question 5 b) We will not use the
isolatedPatMuons collection, but still we kept it in the event on purpose. Do you know why we did this? If not, re-create the cocTuple.root file with the
isolatedPatMuons collection dropped and run the exercise below.
Can you explain what's happening?
We will now analyze the COC information in the newly created
pat::Jets. For this we will use a dedicated FWLiteAnalyzer. To use it, do the following:
PatCOCExercise PhysicsTools/PatExamples/bin/analyzePatCOC_cfg.py
As you see we run the executable together with a cfg file. It is possible to use some features of python cfg files also within FWLite. To learn more about that have a look at
Exercise 04. We will have a short look into the cfg file:
import FWCore.ParameterSet.Config as cms
process = cms.Process("FWLitePlots")
process.FWLiteParams = cms.PSet(
inputFile = cms.string('file:cocTuple.root'),
outputFile = cms.string('analyzePatCOC.root'),
jets = cms.InputTag('cocPatJets'),
overlaps = cms.string('isolatedElectrons')
)
Note: As you see, four parameters are defined:
- inputFile: denotes the input file we are going to use.
- outputFile: denotes the output file we are going to use.
- jets: is the cms.InputTag for the jet collection we want to analyze.
- overlaps: is a cms.string label for the overlapping objects we want to analyze.
You see that for the first go we chose to analyse the overlaps of jets with
electrons. You can use any other parameter set that was defined in the configuration of the
cocPatJets before. The executable creates a root file named
analyzePatCOC.root. You can open it and view the histograms in the
TBrowser.
A closer look at the FWLite macro
To see how this histogram was filled, open the file
PatCOCExercise.cc with your favourite editor (you can find the file in
PhysicsTools/PatExamples/bin). The first important thing is that, in addition to including other needed header files, we need to include the header files for FWLite and PAT:
#include "DataFormats/Math/interface/deltaR.h"
#include "DataFormats/FWLite/interface/Event.h"
#include "DataFormats/Common/interface/Handle.h"
#include "DataFormats/PatCandidates/interface/Jet.h"
#include "FWCore/ParameterSet/interface/ProcessDesc.h"
#include "FWCore/FWLite/interface/AutoLibraryLoader.h"
#include "PhysicsTools/FWLite/interface/TFileService.h"
#include "FWCore/PythonParameterSet/interface/PythonProcessDesc.h"
In the
main method, first we need to read the input argument and set the input file name. We will do this using the config file features of FWLite:
// get the python configuration
PythonProcessDesc builder(argv[1]);
const edm::ParameterSet& fwliteParameters = builder.processDesc()->getProcessPSet()->getParameter<edm::ParameterSet>("FWLiteParams");
// now get each parameter
std::string input_ ( fwliteParameters.getParameter<std::string>("inputFile" ) );
std::string output_ ( fwliteParameters.getParameter<std::string>("outputFile" ) );
std::string overlaps_( fwliteParameters.getParameter<std::string>("overlaps") );
edm::InputTag jets_ ( fwliteParameters.getParameter<edm::InputTag>("jets" ) );
As you see we read back the parameters we had defined above in the
analyzePatCOC_cfg.py FWLite config file. After the file name is set, a set of histograms is booked via the TFileService.
// book a set of histograms
fwlite::TFileService fs = fwlite::TFileService(output_.c_str());
TFileDirectory theDir = fs.mkdir("analyzePatCOC");
TH1F* deltaRElecJet_ = theDir.make<TH1F>("deltaRElecJet" , "#DeltaR (elec, jet)" , 10, 0., 0.5);
TH1F* elecOverJet_ = theDir.make<TH1F>("elecOverJet" , "E_{elec}/E_{jet}" , 100, 0., 2.);
TH1F* nOverlaps_ = theDir.make<TH1F>("nOverlaps" , "Number of overlaps" , 5, 0., 5.);
Note: We did not book the histograms directly in the file but first created a directory named 'analyzePatCOC'. We do this for a better overview. Next we loop over the events in the input file:
// open input file (can be located on eos or remote storage via xrootd)
TFile* inFile = TFile::Open(input_.c_str());

// loop the events
unsigned int iEvent=0;
fwlite::Event ev(inFile);
for(ev.toBegin(); !ev.atEnd(); ++ev, ++iEvent){
  edm::EventBase const & event = ev;
  // break loop after end of file is reached
  // or after 1000 events have been processed
  if( iEvent==1000 ) break;

  // simple event counter
  if(iEvent > 0 && iEvent%1==0){
    std::cout << " processing event: " << iEvent << std::endl;
  }

  // handle to jet collection
  edm::Handle<std::vector<pat::Jet> > jets;
  event.getByLabel(jets_, jets);

  // loop over the jets in the event
  for( std::vector<pat::Jet>::const_iterator jet = jets->begin(); jet != jets->end(); jet++ ){
    if(jet->pt() > 20 && jet==jets->begin()){
      ...
We read in the
cocPatJets via
edm::InputTag, loop over all jets and fill our histograms with COC info of each leading jet with
pt greater than 20 GeV. This will be detailed in the next section.
Making use of the additional COC information
In the next sections you will see how the COC information can be accessed from the
pat::Jet. Technically this is done with minimal overhead in space consumption using
edm::Ptr. You can read the COC information that has been added to the jets by calling the method
hasOverlaps(label), where
label denotes the object type you want to check the overlap with. This is done in the lower part of the jet loop:
...
if(jet->hasOverlaps(overlaps_)){
  // get all overlaps
  const reco::CandidatePtrVector overlaps = jet->overlaps(overlaps_);
  nOverlaps_->Fill( overlaps.size() );
  // loop over the overlaps
  for( reco::CandidatePtrVector::const_iterator overlap = overlaps.begin(); overlap != overlaps.end(); overlap++){
    float deltaR = reco::deltaR( (*overlap)->eta(), (*overlap)->phi(), jet->eta(), jet->phi() );
    deltaRElecJet_->Fill( deltaR );
    elecOverJet_->Fill( (*overlap)->energy()/jet->energy() );
  }
}
...
Note: In fact each
pat::Candidate has such a method, which is inherited from the
pat::PATObject class. For more information about the hierarchy of
pat::Candidates you can have a look at
WorkBookPATDataFormats. As already mentioned above, the
hasOverlaps(label) method needs one argument, which is the name of the overlap class that has been defined in the configuration of the
cocPatJets module. Remember that you are free to choose any name for the
PSet in the configuration. Of course this choice determines which checks for COC information are available in your
cocPatJets. In our example the available classes are
isolatedElectrons and
isolatedMuons. The function returns a
Boolean, indicating whether there was an object of that type in the vicinity of the jet or not.
But checking whether an overlap exists is not the only thing we can do with the stored COC information. For those jets that have another object in their vicinity, we next want to access the reference pointers to these other objects in the
isolatedElectrons collection to analyze them a bit further. Have a look at the other histograms that have been filled in the FWLite executable and see whether they show what you would have expected. Just as an example of what you can do, we filled the following histograms:
- nOverlaps: The number of overlapping objects (which can be larger than 1 as you will see!),
- elecOverJet: The fraction that the corresponding overlapping electron carries from the (jet energy scale corrected) jet energy
- deltaRElecJet: The distance between the overlapping electron and the jet axes.
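The three quantities listed above can be reproduced on toy data in plain Python. This sketch only mirrors the arithmetic of the FWLite loop. The jet is a dictionary that already carries its overlaps, and the function overlap_quantities as well as all numbers are assumptions made for illustration.

```python
import math


def delta_r(eta1, phi1, eta2, phi2):
    """Distance in the eta-phi plane with the phi difference wrapped."""
    dphi = (phi1 - phi2) % (2.0 * math.pi)
    if dphi > math.pi:
        dphi -= 2.0 * math.pi
    return math.hypot(eta1 - eta2, dphi)


def overlap_quantities(jet, label):
    """Collect the values that would be filled into the three histograms."""
    # An unknown label yields an empty list here, whereas the FWLite loop
    # only fills the histograms when hasOverlaps(label) is true.
    overlaps = jet["overlaps"].get(label, [])
    return {
        "nOverlaps": len(overlaps),
        "deltaR": [delta_r(o["eta"], o["phi"], jet["eta"], jet["phi"])
                   for o in overlaps],
        "energyFraction": [o["energy"] / jet["energy"] for o in overlaps],
    }


# Toy jet with two overlapping electrons (hypothetical values).
jet = {
    "eta": 0.5, "phi": 1.0, "energy": 100.0,
    "overlaps": {
        "isolatedElectrons": [
            {"eta": 0.6, "phi": 1.0, "energy": 30.0},
            {"eta": 0.5, "phi": 1.3, "energy": 10.0},
        ],
    },
}

summary = overlap_quantities(jet, "isolatedElectrons")
print(summary["nOverlaps"])
```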
Note: A technical remark: the reference pointers you get point to objects of type
reco::Candidate, so you can easily access all information of a
reco::Candidate. Nevertheless, as we know that we are pointing into a
pat::Electron collection, you can use a
C++ dynamic_cast to 'expand' the data type from
reco::Candidate to
pat::Electron. In this way you can access all
pat::Electron
information and, e.g., easily apply an electronID cut on the overlapping electrons in later stages of your analysis. Have a look into
Lecture 3.1, slide 15
of the June 2011 Tutorial to find out how to do this.
Exercises
Before leaving this page try to do the following exercises:
Exercise 5 a):
Run the PatCOCExercise macro with the different inputs of 'overlaps' that we have configured in the TWiki above.
Exercise 5 b):
Find out which overlaps are checked for the cleanPatJets in the PAT example configuration. To do this have a look into the patTuple_standard_cfg.py with the added configuration for example PAT COC (see above) with the python interpreter. Add the list of PSet names of objects for which COC information is added to the submission form.
Note:
In case of problems don't hesitate to contact the
SWGuidePAT#Support. Having successfully finished
Exercise 5 you might want to proceed to the other exercises of the
WorkBookPATTutorial to learn more about how PAT supports and facilitates many common high level analysis tasks.