PAT Examples: Exotica Analysis
Contents
Example: A Generic Analyzer
Introduction
Here we introduce a very simple exotic-like signature based search with very basic PAT utilities.
Within Exotica PAG, the physics analyses could be very different among topics and subgroups.
For example, the analysis could be closer to the EWK group for the Z' searches, and other studies involving excited
quarks may be similar to TOP studies. Hence, we introduce a 'quasi-generic' study with lepton+jets final state in this tutorial.
Assuming a new physics model X produces a very unique/special final state, like pp -> Z+top+bottom, and we would like to
see if it exists in our data or not (Surely, this model X does not exist so far).
The production cross section is assumed to be 10 pb at 10
TeV LHC collision.
Here are a line-up of the issues of this virtual new physics model search:
- The final state is assumed to be Z+t+b, and here we take Z->ee or mumu with a full hadronic top, for example.
- Although the final state is defined, but we have no idea what are the good variables that could be used in the analysis.
- Although we can guess the dominant background should be Z+jets, but it's unclear what are the other background sources.
- And, generally, understanding of the standard model background is more important.
Based on these requirements and in order get a quick 'feeling' of this virtual analysis at the beginning, the following steps are introduced:
- A generic analyzer (see the 'coding' part below) is introduced.
- The events should pass through signal electron trigger or signal muon trigger.
- We store root trees (instead of fixed histograms) to get higher freedom after the work of analyzer.
- Use some simple root codes to get the final physics plots.
How to get the code
First step is to follow the
PAT recipes:
To get the "older" version of cmssw, please "setenv SCRAM_ARCH slc5_ia32_gcc434"
cmsrel CMSSW_3_5_1 [is alreday deprecated, can be used with CMSSW_3_5_2 curently available]
cd CMSSW_3_5_1/src
cmsenv
And copy the example 'ObjectCounter' analyzer:
tar xfvz /afs/cern.ch/user/k/kfjack/public/ObjectCounter/fullsrc.tar.gz
Basically the example code is consistent of a single analysis module (
ObjectCounter.cc
), one testing python configure file (
test.py
),
and testing root script (
proj.cc
) with a tree format definition (
format.h
).
NOTE: for the release CMSSW_3_4_2, please apply the following modifications for backward capability.
Replace the following lines (264-267) in
MyAna/ObjectCounter/src/ObjectCounter.cc
iEvent.getByLabel("cleanPatMuons", MuonHandle);
iEvent.getByLabel("cleanPatElectrons", ElectronHandle);
iEvent.getByLabel("cleanPatJets", JetHandle);
iEvent.getByLabel("patMETs", METHandle);
by
iEvent.getByLabel("cleanLayer1Muons", MuonHandle);
iEvent.getByLabel("cleanLayer1Electrons", ElectronHandle);
iEvent.getByLabel("cleanLayer1Jets", JetHandle);
iEvent.getByLabel("layer1METs", METHandle);
This is due to the changes made in the PAT branch naming schema after CMSSW_3_5_1.
Replace the following part in
MyAna/ObjectCounter/test/test.py
(this is for the right input and tag for release 3_4_2):
process.source = cms.Source("PoolSource",
fileNames = cms.untracked.vstring(
'rfio:/castor/cern.ch/cms/store/relval/CMSSW_3_4_2/RelValTTbar/GEN-SIM-RECO/
STARTUP3X_V15-v1/0011/06A5351A-7F13-DF11-9E89-001A9281172E.root'
)
)
and
process.GlobalTag.globaltag = cms.string('STARTUP3X_V14::All')
How to run the code
First is to build the example analyzer:
scramv1 b
and go to the test directory and run the testing configuration (loading some summer08 ttbar testing events):
cd MyAna/ObjectCounter/test
cmsRun test.py
There should be a single root output (
results.root
) generated afterwards.
It can be easily checked by root TBrowser interface, e.g.
root -l results.root
root [1] TBrowser b;
Or, by running the simple root code,
proj.cc
, a simple plot consisting of inclusive jets' pt
can be generated:
root -l proj.cc
Find out more about the details
This example analyzer is actually very simple. We made no modification on the default PAT configurations. As the
test.py
testing configurations,
the only required parts (in addition to PAT) are declaring the analyzer, placing a
TFileService for storing output root branches, and adding the
analyzer to the main job path.
process.ObjectCounter = cms.EDAnalyzer("ObjectCounter")
process.TFileService = cms.Service("TFileService",
fileName = cms.string('results.root')
)
process.p = cms.Path(
process.patDefaultSequence
+ process.ObjectCounter
)
This example analyzer only loads the standard PAT objects (electrons, muons, jets, and met), and generator information (genParticles)
and fill the corresponding information into several root branches (and no real fancy stuff).
The root branches are defined at the beginning of the
ObjectCounter.cc
. As an example,
here shows the definition of the
LepInfoBranches
. Basically it's just a simply array of some common lepton information that may be needed for further analysis.
class LepInfoBranches {
public:
int Size;
int Index[MAX_LEPTONS];
int LeptonType[MAX_LEPTONS];
int Charge[MAX_LEPTONS];
float Pt[MAX_LEPTONS];
float Eta[MAX_LEPTONS];
float Phi[MAX_LEPTONS];
float TrackIso[MAX_LEPTONS];
float EcalIso[MAX_LEPTONS];
float HcalIso[MAX_LEPTONS];
float GenPt[MAX_LEPTONS];
float GenEta[MAX_LEPTONS];
float GenPhi[MAX_LEPTONS];
int GenPdgID[MAX_LEPTONS];
int GenMCTag[MAX_LEPTONS]; // 0: unknown, 1: decay from W, 2: decay from Z
Candidate* CandRef[MAX_LEPTONS]; // backward pointer to the PAT objects
void Register(TTree *root) {
root->Branch("LepInfo.Size" , &Size , "LepInfo.Size/I" );
root->Branch("LepInfo.Index" , &Index[0] , "LepInfo.Index[LepInfo.Size]/I" );
root->Branch("LepInfo.LeptonType" , &LeptonType[0] , "LepInfo.LeptonType[LepInfo.Size]/I" );
root->Branch("LepInfo.Charge" , &Charge[0] , "LepInfo.Charge[LepInfo.Size]/I" );
root->Branch("LepInfo.Pt" , &Pt[0] , "LepInfo.Pt[LepInfo.Size]/F" );
root->Branch("LepInfo.Eta" , &Eta[0] , "LepInfo.Eta[LepInfo.Size]/F" );
root->Branch("LepInfo.Phi" , &Phi[0] , "LepInfo.Phi[LepInfo.Size]/F" );
root->Branch("LepInfo.TrackIso" , &TrackIso[0] , "LepInfo.TrackIso[LepInfo.Size]/F" );
root->Branch("LepInfo.EcalIso" , &EcalIso[0] , "LepInfo.EcalIso[LepInfo.Size]/F" );
root->Branch("LepInfo.HcalIso" , &HcalIso[0] , "LepInfo.HcalIso[LepInfo.Size]/F" );
root->Branch("LepInfo.GenPt" , &GenPt[0] , "LepInfo.GenPt[LepInfo.Size]/F" );
root->Branch("LepInfo.GenEta" , &GenEta[0] , "LepInfo.GenEta[LepInfo.Size]/F" );
root->Branch("LepInfo.GenPhi" , &GenPhi[0] , "LepInfo.GenPhi[LepInfo.Size]/F" );
root->Branch("LepInfo.GenPdgID" , &GenPdgID[0] , "LepInfo.GenPdgID[LepInfo.Size]/I" );
root->Branch("LepInfo.GenMCTag" , &GenMCTag[0] , "LepInfo.GenMCTag[LepInfo.Size]/I" );
}
};
The main code block,
ObjectCounter::analyze(const edm::Event& iEvent, const edm::EventSetup& iSetup)
, simply load the
hard coded PAT objects (here is muon, as an example). For simplification, we apply a pT threshold of 20
GeV/c in the code, as well as the 'GlobalMuonPromptTight'
muon ID before filling the branches. For the generator information, we do not only fill the matched genParticle, but also identify if the 'mother' particle is Z or W bosons.
edm::Handle<std::vector<pat::Muon> > MuonHandle;
iEvent.getByLabel("selectedLayer1Muons", MuonHandle);
memset(&LepInfo,0x00,sizeof(LepInfo));
for( std::vector<pat::Muon>::const_iterator it_mu = MuonHandle->begin();
it_mu != MuonHandle->end(); it_mu++ ) {
if (LepInfo.Size>=MAX_LEPTONS) {
fprintf(stderr,"ERROR: number of leptons exceeds the size of array.\n");
exit(1);
}
if (it_mu->pt()<20.) continue;
if (!it_mu->muonID("GlobalMuonPromptTight")) continue;
if (it_mu->innerTrack()->d0()>=0.2 || it_mu->innerTrack()->numberOfValidHits()<11) continue;
LepInfo.Index [LepInfo.Size] = LepInfo.Size;
LepInfo.LeptonType [LepInfo.Size] = 13;
LepInfo.Charge [LepInfo.Size] = it_mu->charge();
LepInfo.Pt [LepInfo.Size] = it_mu->pt();
LepInfo.Eta [LepInfo.Size] = it_mu->eta();
LepInfo.Phi [LepInfo.Size] = it_mu->phi();
LepInfo.TrackIso [LepInfo.Size] = it_mu->trackIso();
LepInfo.EcalIso [LepInfo.Size] = it_mu->ecalIso();
LepInfo.HcalIso [LepInfo.Size] = it_mu->hcalIso();
const reco::GenParticle* gen = it_mu->genLepton();
if (gen!=NULL) {
LepInfo.GenPt [LepInfo.Size] = gen->pt();
LepInfo.GenEta [LepInfo.Size] = gen->eta();
LepInfo.GenPhi [LepInfo.Size] = gen->phi();
LepInfo.GenPdgID [LepInfo.Size] = gen->pdgId();
const reco::Candidate* genCand = gen;
while(genCand!=NULL &&
genCand->numberOfMothers()==1) {
genCand = genCand->mother(0);
if (LepInfo.GenMCTag[LepInfo.Size]==0) {
if (abs(genCand->pdgId())==23) LepInfo.GenMCTag[LepInfo.Size] = 2;
else
if (abs(genCand->pdgId())==24) LepInfo.GenMCTag[LepInfo.Size] = 1;
}
}
}
LepInfo.CandRef [LepInfo.Size] = (reco::Candidate*)&(*it_mu);
LepInfo.Size++;
}
Based on the output root branches, we are already able to do many simple 'analysis works'. Here are a summary of the pre-selections done by this analysis code:
Post-analyzer studies
Some 10
TeV MC samples have been processed for this tutorial. They can be found at
/afs/cern.ch/user/k/kfjack/public/ObjectCounter
. A further pre-selection (
at least two lepton candidates in an event) is applied
in order to reduce the sample size (by uncommenting line 367 of the
ObjectCounter.cc
).
Here are the summary table of these tutorial samples (note: these are the resulting root branches, not the PAT-tuples):
Process |
generator |
cross section |
sample size |
file name |
model-X (the signal), Z->ee/mumu |
madgraph |
10 pb |
20K |
objcounter_modelX.root |
tt+jets |
madgraph |
317 pb (LO) |
1M |
objcounter_ttjets.root |
Z+jets |
madgraph |
3700 pb (LO) |
1M |
objcounter_Zjets.root |
W+jets |
madgraph |
40000 pb (LO) |
10M |
objcounter_Wjets.root |
WW |
pythia |
74 pb (LO) |
200K |
objcounter_WW.root |
ZZ |
pythia |
10.5 pb (LO) |
200K |
objcounter_ZZ.root |
WZ |
pythia |
32 pb (LO) |
200K |
objcounter_WZ.root |
Then let's produce some physics plots using these samples. This is a demonstration of what can be done after such a simply analyzer (less then 600 lines actually).
Please note the following stuff are not PAT anymore, they are simply root codes.
Basic object kinematic distributions
Please refer to the root code
/afs/cern.ch/user/k/kfjack/public/ObjectCounter/show_obj.cc
. This cint script will produce several plots for basic objects (electrons, muons, jets, met, and pairs), using the root branches given above. For example, by typing
root -l 'show_obj.cc+("elec")'
A canvas will pop up and shows 4 basic distributions of electron candidates. A summary of the possible options of this root macro is given as following:
- "elec" - electron related plots (pt, eta, track isolation, calorimeter isolation).
- "muon" - muon related plots (pt, eta, track isolation, calorimeter isolation).
- "jet" - jet plots including corrected pt, raw pt, eta, jet charge, no. of calortowers, b-tagging.
- "met" - Missing energy (MET, and MET-phi).
- "pair" - masses of pairings - it shows the M(e+e-), M(mu+mu-), generic M(jj), M(jj) with MC matched to a W.
The macro loads
objcounter_modelX.root
by default (= signal distributions).
Signal+background stacking distributions
Please refer to the root code
/afs/cern.ch/user/k/kfjack/public/ObjectCounter/show_phys.cc
.
This example is more interesting since the root code will load all the samples, and calculate the scaling factors according to the cross sections and target luminosity.
Stacking histograms including both signal and background are produced and shown in a single TCanvas.
The command is very similar:
root -l 'show_phys.cc+("ht")'
The HT distributions for electron channel and muon channel will be shown. The following cuts (for further reducing the background contributions)
are included in the root code:
- Only one good di-electron or di-muon pair in an event.
- 80 < M(e+e-) or M(mu+mu-) < 100 GeV/c^2.
- N(jets) >=4, jets are excluded if overlapped (dR<0.3) with electron candidates.
- leading jet pt > 80 GeV/c
Surely there are other available options for this root macro:
- "met" - Missing energy distributions, after the requirements listed above.
- "njets" - No. of jets after the requirements listed above except N(jet)>=4
- "mll" - M(ee) or M(mumu) distributions, with the requirements listed above except the mass cut.
- "m3j" - Invariant mass of the 2nd,3rd,4th jets (more or less equal to the mass of the hadronic top), after the full given cuts.
- "j1pt" - pt of the leading jet (highest pt jet) in the event, with the requirements listed above except pt > 80 GeV/c.
- "j1btag" - b-tagging variable of the leading jet (it supports to be the hard b-jet) in the event, after the full given cuts.
The open area of the resulting plots represents the signal and the filled histograms are the background. The light green area is the contributions from Z+jets (the rest of the background events are shown in navy). It is clear that we could have further background suppression power with b-tagging, or other possible variables.
Example: The HEEP Analyzer
Using the HEEP Selector
In case anything is unclear, please email me Sam Harper at sharper@(you know where).ch and I will be happy to clarify.
CMSSW_3_1_2 is the version used for summer production. The HEEPSelector has been only been tested in this version. It should work in later 31X versions although this is untested.
A class named heep::EleSelector performs the heep selection on an electron and returns an int with individual bits set if the cut they correspond to fails. Thus a value of 0x0 is all cuts have passed. More details on this class
There are three ways you can use high energy electrons in CMSSW_3_1_2 with the heep selection.
- with bare PAT / reco::CMS.GsfElectron, example analyzer is HEEPAnalyzerBarePAT
- with heep processed PAT, example analyzer is HEEPAnalyzerHEEPPAT
- with heep::Ele and heep::Event, example analyser is HEEPAnalyzer
The main disadvantage of bare PAT/ reco::!CMS.GsfElectron is that you have to manually correct the electrons energy to the ecal energy every time you wish to access the fourmomentum, energy or Et and that the selection result is not stored in the object. The module HEEPAttStatusToPAT written by M. Mozer addresses this issue by making a new collection of pat electrons with the energy reset to the ecal energy and the heep selection result accessable by pat::Electron::userInt("HEEPId"), simplifing the use of the electrons. Finally there is the heep::Event and heep::Ele classes which also store the result and the correct energy. These are HEEP group maintained classes which aim to simply the use of high energy electrons. However with recent developments in reco and pat objects, the heep::Ele class is mostly redundant currently although it does have the advantage of being able to make quick changes as needed in the future. Each method has an example module with makes the mass spectrum of EB-EB and EB-EE events passing the heep selection.
Instructions
scramv1 proj -n CMSSW_312_heepPat CMSSW CMSSW_3_1_2
cd CMSSW_312_heepPat/src
cvs co -r V00-00-13 -d SHarper/HEEPAnalyzer UserCode/SHarper/HEEPAnalyzer
scramv1 b
Overview of HEEP Analyzer
The HEEP analyzer is a collection of classes whose aim to facilitate the use of high energy electrons. The are designed to wrap around standard RECOs and PAT objects, making them more user friendly. Note that you do not have to use these classes, you can achieve everything with bare PAT. All these classes do is a) serve as a reference implementation and b) save time by automating common tasks. For example for each event, it will automatically get the superclusters, pat electrons in the standard way, saving you having to explicitly get the handles to the collections yourself.
The HEEP Analyzer config files do two seperate things. First they configure the PAT to the needs of the HEEP group although currently default PAT works fine for us. After this stage, users can use the pat as normal with the caveate that that the selection will need to be performed manually. There is a class available which will do this, heep::EleSelector. The second stage is configuring the heep::Event and the heep selection. This is an optional step but these classes are designed to make the easier to use and they are designed to be nimble and flexible to react quickly to changes in electron ID, etc. Therefore these classes will change in the future to reflect the best knowledge that I have of electron ID. Users are not encouraged to write them to file because a) it is unnecessary as they are simple wrappers and b) it means changes will risk breaking backwards compatibility.
heep::Event
The HEEP event consists of a selection of handles to products of interest to the HEEP group. The aim of this class is to be able to access everything we wish to know about the event from this class without having to require any additional information (eg input tag of electron collection). Like all HEEP classes, it has a reference to the RECO/PAT object to pass into non HEEP functions.
heep::Ele
The HEEP electron wraps around a PAT or GSF electron but simplifies some of the methods. For example deltaPhiSuperClusterTrackAtVtx() is remapped to dPhiIn() its more common name. It main purpose is to faciliate the addition of new electron data members faster than that possible for the PAT (eg we had sigmaIEtaIEta, e2x5, etc way ahead of the PAT). It also makes our custom isolations easier to use (ele.hcalDepth2() is a lot clearer than ele.userIso(1) in my opinion). It also offers two additional peices of information, the HEEP cuts it failed and the triggers it passed. It is not possible to use the standard electron ID implemented in the PAT as that just gives an inadequate pass/fail with no information on what cuts actually failed. Again, a reference to the original PAT or GSF electron is available so it can be used with all functions requiring PAT or GSF electrons.
heep::EleSelector
This is a class which automatically selects the correct cuts (barrel or endcap) for the electron and returns a int with each bit corresponding to whether a particular cut has failed or not. The bit is set if its failed so a value of 0 indicates a pass. The bits are defined in heep::CutCodes together with tools for efficiently accessing them. The cut values are stored in heep::EleCutValues and defined in the config file. It can take heep::Ele, reco::CMS.GsfElectron or pat::Electron objects as input.
heep::EventHelper
This class keeps track of all the labels for the collections heep::Event requires and to set the handles in heep::Event for each new event. It essentially creates the heep::Event from an edm::Event. It also creates the heep::Ele objects. This is the core of the HEEPAnalyzer which contains all the information to make a heep::Event. It is expected that users using the HEEP software package will make a new analyzer and create a new object of heep::EventHelper in that analyzer to access all the HEEP tools.
heep::TrigCodes, heep::trigtools
heep::TrigCodes defines the format used to store HLT information in the HEEP electrons. TrigCodes defines a TrigBitSet which is a std::bitset<64> meaning we can have 64 possible triggers and the tools to manipulate it. The reasoning behind this is that its much simpler and faster to work with trigger bits than trigger names. Eg if you want to know if you pass the Ele15 and Ele15LW triggers which are assigned bits 0 and 2, while you have a total of 10 triggers defined. Using strings, you would have to sort them at creation and then perform a binary search for the Ele15 and
Ele15LW trigger names. As HLT filter names are very long hltL1NonIsoHLTNonIsoSingleElectronLWEt15TrackIsolFilter and typically differ at the end, this is not speedy. Using bits the same operation is
(0x5 & trigBits) == 0x5
for both and
(0x5 & trigBits) !=0x0
for at least one. Note it is very important to release that bitwise operators (&,|,^) are of lower presedence than == and != so you have to use the brackets as shown above.
Another reason I do not use the PATs trigger matching is that to match a PAT ele with a trigger object requires me to define and run two additional modules which quickly mounts up when you consider all the trigger paths. Still you can use the pat trigger matching if you wish, HEEP and PAT use identical requirements.
Note that definitions of each trigger bit are unique to the job (ie you change what filters you are looking for, you change the bit definitions) so you should never store them outside of the job. Internally within the job you should store the bit not the string to take advantage of the speed boost and ease of use. eg this is pseudo c++
module::module(...)
{
heep::TrigCodes::TrigBitSet triggersToCheck_ = TrigCodes::getCode("hltL1NonIsoHLTNonIsoSingleElectronLWEt15TrackIsolFilter:hltL1NonIsoHLTNonIsoSingleElectronEt15TrackIsolFilter"); //trig name separated by ":"
....
}
void module::analyze(...)
{
...
if( (heepEle.trigBits() & triggerToCheck_) !=0x0 ) //do what you wanted to do if ele passed trigger
...
}
How to get more information
For more information regarding the Exotica analyses, please refer to the main Exotica PAG TWiki.
https://twiki.cern.ch/twiki/bin/view/CMS/EXOTICA
Review status
Responsible:
KaiFengChen and
KatiLassilaPerini
Last reviewed by: most recent reviewer and date