PAT Examples: Exotica Analysis

Contents

Example: A Generic Analyzer

Introduction

Here we introduce a very simple exotic-like signature based search with very basic PAT utilities.

Within Exotica PAG, the physics analyses could be very different among topics and subgroups. For example, the analysis could be closer to the EWK group for the Z' searches, and other studies involving excited quarks may be similar to TOP studies. Hence, we introduce a 'quasi-generic' study with lepton+jets final state in this tutorial. Assuming a new physics model X produces a very unique/special final state, like pp -> Z+top+bottom, and we would like to see if it exists in our data or not (Surely, this model X does not exist so far). The production cross section is assumed to be 10 pb at 10 TeV LHC collision. Here are a line-up of the issues of this virtual new physics model search:

  • The final state is assumed to be Z+t+b, and here we take Z->ee or mumu with a full hadronic top, for example.
  • Although the final state is defined, but we have no idea what are the good variables that could be used in the analysis.
  • Although we can guess the dominant background should be Z+jets, but it's unclear what are the other background sources.
  • And, generally, understanding of the standard model background is more important.

Based on these requirements and in order get a quick 'feeling' of this virtual analysis at the beginning, the following steps are introduced:

  • A generic analyzer (see the 'coding' part below) is introduced.
  • The events should pass through signal electron trigger or signal muon trigger.
  • We store root trees (instead of fixed histograms) to get higher freedom after the work of analyzer.
  • Use some simple root codes to get the final physics plots.

How to get the code

First step is to follow the PAT recipes:

To get the "older" version of cmssw, please "setenv SCRAM_ARCH slc5_ia32_gcc434"
cmsrel CMSSW_3_5_1 [is alreday deprecated, can be used with CMSSW_3_5_2 curently available]
cd CMSSW_3_5_1/src
cmsenv

And copy the example 'ObjectCounter' analyzer:

tar xfvz /afs/cern.ch/user/k/kfjack/public/ObjectCounter/fullsrc.tar.gz

Basically the example code is consistent of a single analysis module (ObjectCounter.cc), one testing python configure file (test.py), and testing root script (proj.cc) with a tree format definition (format.h).

NOTE: for the release CMSSW_3_4_2, please apply the following modifications for backward capability.

Replace the following lines (264-267) in MyAna/ObjectCounter/src/ObjectCounter.cc

  iEvent.getByLabel("cleanPatMuons",     MuonHandle);
  iEvent.getByLabel("cleanPatElectrons", ElectronHandle);
  iEvent.getByLabel("cleanPatJets",      JetHandle);
  iEvent.getByLabel("patMETs",           METHandle);
by
  iEvent.getByLabel("cleanLayer1Muons",     MuonHandle);
  iEvent.getByLabel("cleanLayer1Electrons", ElectronHandle);
  iEvent.getByLabel("cleanLayer1Jets",      JetHandle);
  iEvent.getByLabel("layer1METs",           METHandle);
This is due to the changes made in the PAT branch naming schema after CMSSW_3_5_1.

Replace the following part in MyAna/ObjectCounter/test/test.py (this is for the right input and tag for release 3_4_2):

process.source = cms.Source("PoolSource",
    fileNames = cms.untracked.vstring(
        'rfio:/castor/cern.ch/cms/store/relval/CMSSW_3_4_2/RelValTTbar/GEN-SIM-RECO/
STARTUP3X_V15-v1/0011/06A5351A-7F13-DF11-9E89-001A9281172E.root'
    )
)
and
process.GlobalTag.globaltag = cms.string('STARTUP3X_V14::All')

How to run the code

First is to build the example analyzer:

scramv1 b

and go to the test directory and run the testing configuration (loading some summer08 ttbar testing events):

cd MyAna/ObjectCounter/test 
cmsRun test.py

There should be a single root output (results.root) generated afterwards. It can be easily checked by root TBrowser interface, e.g.

root -l results.root 
root [1] TBrowser b;

Or, by running the simple root code, proj.cc, a simple plot consisting of inclusive jets' pt can be generated:

root -l proj.cc

Find out more about the details

This example analyzer is actually very simple. We made no modification on the default PAT configurations. As the test.py testing configurations, the only required parts (in addition to PAT) are declaring the analyzer, placing a TFileService for storing output root branches, and adding the analyzer to the main job path.

process.ObjectCounter = cms.EDAnalyzer("ObjectCounter")

process.TFileService = cms.Service("TFileService",
    fileName = cms.string('results.root')
)

process.p = cms.Path(
                  process.patDefaultSequence
                + process.ObjectCounter
            )

This example analyzer only loads the standard PAT objects (electrons, muons, jets, and met), and generator information (genParticles) and fill the corresponding information into several root branches (and no real fancy stuff). The root branches are defined at the beginning of the ObjectCounter.cc. As an example, here shows the definition of the LepInfoBranches. Basically it's just a simply array of some common lepton information that may be needed for further analysis.

class LepInfoBranches {
public:
  int   Size; 
  int   Index[MAX_LEPTONS];
  int   LeptonType[MAX_LEPTONS];
  int   Charge[MAX_LEPTONS];
  float Pt[MAX_LEPTONS];
  float Eta[MAX_LEPTONS];
  float Phi[MAX_LEPTONS];
  float TrackIso[MAX_LEPTONS];
  float EcalIso[MAX_LEPTONS];
  float HcalIso[MAX_LEPTONS];
  float GenPt[MAX_LEPTONS];
  float GenEta[MAX_LEPTONS];
  float GenPhi[MAX_LEPTONS];
  int   GenPdgID[MAX_LEPTONS];
  int   GenMCTag[MAX_LEPTONS];   // 0: unknown, 1: decay from W, 2: decay from Z
  
  Candidate* CandRef[MAX_LEPTONS]; // backward pointer to the PAT objects
  
  void Register(TTree *root) {
       root->Branch("LepInfo.Size"       , &Size          , "LepInfo.Size/I"                     );
       root->Branch("LepInfo.Index"      , &Index[0]      , "LepInfo.Index[LepInfo.Size]/I"      );
       root->Branch("LepInfo.LeptonType" , &LeptonType[0] , "LepInfo.LeptonType[LepInfo.Size]/I" );
       root->Branch("LepInfo.Charge"     , &Charge[0]     , "LepInfo.Charge[LepInfo.Size]/I"     );
       root->Branch("LepInfo.Pt"         , &Pt[0]         , "LepInfo.Pt[LepInfo.Size]/F"         );
       root->Branch("LepInfo.Eta"        , &Eta[0]        , "LepInfo.Eta[LepInfo.Size]/F"        );
       root->Branch("LepInfo.Phi"        , &Phi[0]        , "LepInfo.Phi[LepInfo.Size]/F"        );
       root->Branch("LepInfo.TrackIso"   , &TrackIso[0]   , "LepInfo.TrackIso[LepInfo.Size]/F"   );
       root->Branch("LepInfo.EcalIso"    , &EcalIso[0]    , "LepInfo.EcalIso[LepInfo.Size]/F"    );   
       root->Branch("LepInfo.HcalIso"    , &HcalIso[0]    , "LepInfo.HcalIso[LepInfo.Size]/F"    ); 
       root->Branch("LepInfo.GenPt"      , &GenPt[0]      , "LepInfo.GenPt[LepInfo.Size]/F"      );
       root->Branch("LepInfo.GenEta"     , &GenEta[0]     , "LepInfo.GenEta[LepInfo.Size]/F"     );
       root->Branch("LepInfo.GenPhi"     , &GenPhi[0]     , "LepInfo.GenPhi[LepInfo.Size]/F"     );
       root->Branch("LepInfo.GenPdgID"   , &GenPdgID[0]   , "LepInfo.GenPdgID[LepInfo.Size]/I"   );
       root->Branch("LepInfo.GenMCTag"   , &GenMCTag[0]   , "LepInfo.GenMCTag[LepInfo.Size]/I"   );
  }  
};

The main code block, ObjectCounter::analyze(const edm::Event& iEvent, const edm::EventSetup& iSetup), simply load the hard coded PAT objects (here is muon, as an example). For simplification, we apply a pT threshold of 20 GeV/c in the code, as well as the 'GlobalMuonPromptTight' muon ID before filling the branches. For the generator information, we do not only fill the matched genParticle, but also identify if the 'mother' particle is Z or W bosons.

edm::Handle<std::vector<pat::Muon> >     MuonHandle;
iEvent.getByLabel("selectedLayer1Muons",     MuonHandle);

memset(&LepInfo,0x00,sizeof(LepInfo));
  
for( std::vector<pat::Muon>::const_iterator it_mu = MuonHandle->begin(); 
     it_mu != MuonHandle->end(); it_mu++ ) {
       
  if (LepInfo.Size>=MAX_LEPTONS) {
        fprintf(stderr,"ERROR: number of leptons exceeds the size of array.\n");
        exit(1);
  }
    
  if (it_mu->pt()<20.) continue;
  if (!it_mu->muonID("GlobalMuonPromptTight")) continue;
  if (it_mu->innerTrack()->d0()>=0.2 || it_mu->innerTrack()->numberOfValidHits()<11) continue;

  LepInfo.Index      [LepInfo.Size] = LepInfo.Size;
  LepInfo.LeptonType [LepInfo.Size] = 13;
  LepInfo.Charge     [LepInfo.Size] = it_mu->charge();
  LepInfo.Pt         [LepInfo.Size] = it_mu->pt();
  LepInfo.Eta        [LepInfo.Size] = it_mu->eta();
  LepInfo.Phi        [LepInfo.Size] = it_mu->phi();
  LepInfo.TrackIso   [LepInfo.Size] = it_mu->trackIso();
  LepInfo.EcalIso    [LepInfo.Size] = it_mu->ecalIso();
  LepInfo.HcalIso    [LepInfo.Size] = it_mu->hcalIso();
    
  const reco::GenParticle* gen = it_mu->genLepton();
  if (gen!=NULL) {
        LepInfo.GenPt    [LepInfo.Size] = gen->pt();
        LepInfo.GenEta   [LepInfo.Size] = gen->eta();
        LepInfo.GenPhi   [LepInfo.Size] = gen->phi();
        LepInfo.GenPdgID [LepInfo.Size] = gen->pdgId();
    
        const reco::Candidate* genCand = gen;
        
        while(genCand!=NULL &&
              genCand->numberOfMothers()==1) {
                    
              genCand = genCand->mother(0);
                        
              if (LepInfo.GenMCTag[LepInfo.Size]==0) {
                      if (abs(genCand->pdgId())==23) LepInfo.GenMCTag[LepInfo.Size] = 2;
              else
                      if (abs(genCand->pdgId())==24) LepInfo.GenMCTag[LepInfo.Size] = 1;        
              }
        }
  }
    
  LepInfo.CandRef [LepInfo.Size] = (reco::Candidate*)&(*it_mu);
  LepInfo.Size++;
}

Based on the output root branches, we are already able to do many simple 'analysis works'. Here are a summary of the pre-selections done by this analysis code:

Object Root branches Selections
Muon LepInfoBranches pt>20, GlobalMuonPromptTight ID, inner track d0<0.2, N(hits)>11
Electron LepInfoBranches pt>20, Robust-loose ID
Jet JetInfoBranches pt>30
MET EvtInfoBranches none
di-lepton Pairs PairInfoBranches passing the lepton selections above + invariant mass < 200 GeV
di-jet Pairs PairInfoBranches passing the jet selections above + invariant mass < 200 GeV

Post-analyzer studies

Some 10 TeV MC samples have been processed for this tutorial. They can be found at /afs/cern.ch/user/k/kfjack/public/ObjectCounter. A further pre-selection (at least two lepton candidates in an event) is applied in order to reduce the sample size (by uncommenting line 367 of the ObjectCounter.cc). Here are the summary table of these tutorial samples (note: these are the resulting root branches, not the PAT-tuples):

Process generator cross section sample size file name
model-X (the signal), Z->ee/mumu madgraph 10 pb 20K objcounter_modelX.root
tt+jets madgraph 317 pb (LO) 1M objcounter_ttjets.root
Z+jets madgraph 3700 pb (LO) 1M objcounter_Zjets.root
W+jets madgraph 40000 pb (LO) 10M objcounter_Wjets.root
WW pythia 74 pb (LO) 200K objcounter_WW.root
ZZ pythia 10.5 pb (LO) 200K objcounter_ZZ.root
WZ pythia 32 pb (LO) 200K objcounter_WZ.root

Then let's produce some physics plots using these samples. This is a demonstration of what can be done after such a simply analyzer (less then 600 lines actually). Please note the following stuff are not PAT anymore, they are simply root codes.

Basic object kinematic distributions

Please refer to the root code /afs/cern.ch/user/k/kfjack/public/ObjectCounter/show_obj.cc. This cint script will produce several plots for basic objects (electrons, muons, jets, met, and pairs), using the root branches given above. For example, by typing

root -l 'show_obj.cc+("elec")'
A canvas will pop up and shows 4 basic distributions of electron candidates. A summary of the possible options of this root macro is given as following:

  • "elec" - electron related plots (pt, eta, track isolation, calorimeter isolation).
  • "muon" - muon related plots (pt, eta, track isolation, calorimeter isolation).
  • "jet" - jet plots including corrected pt, raw pt, eta, jet charge, no. of calortowers, b-tagging.
  • "met" - Missing energy (MET, and MET-phi).
  • "pair" - masses of pairings - it shows the M(e+e-), M(mu+mu-), generic M(jj), M(jj) with MC matched to a W.

The macro loads objcounter_modelX.root by default (= signal distributions).

Signal+background stacking distributions

Please refer to the root code /afs/cern.ch/user/k/kfjack/public/ObjectCounter/show_phys.cc. This example is more interesting since the root code will load all the samples, and calculate the scaling factors according to the cross sections and target luminosity. Stacking histograms including both signal and background are produced and shown in a single TCanvas. The command is very similar:

root -l 'show_phys.cc+("ht")'
The HT distributions for electron channel and muon channel will be shown. The following cuts (for further reducing the background contributions) are included in the root code:
  • Only one good di-electron or di-muon pair in an event.
  • 80 < M(e+e-) or M(mu+mu-) < 100 GeV/c^2.
  • N(jets) >=4, jets are excluded if overlapped (dR<0.3) with electron candidates.
  • leading jet pt > 80 GeV/c
Surely there are other available options for this root macro:

  • "met" - Missing energy distributions, after the requirements listed above.
  • "njets" - No. of jets after the requirements listed above except N(jet)>=4
  • "mll" - M(ee) or M(mumu) distributions, with the requirements listed above except the mass cut.
  • "m3j" - Invariant mass of the 2nd,3rd,4th jets (more or less equal to the mass of the hadronic top), after the full given cuts.
  • "j1pt" - pt of the leading jet (highest pt jet) in the event, with the requirements listed above except pt > 80 GeV/c.
  • "j1btag" - b-tagging variable of the leading jet (it supports to be the hard b-jet) in the event, after the full given cuts.

The open area of the resulting plots represents the signal and the filled histograms are the background. The light green area is the contributions from Z+jets (the rest of the background events are shown in navy). It is clear that we could have further background suppression power with b-tagging, or other possible variables.

Example: The HEEP Analyzer

Using the HEEP Selector

In case anything is unclear, please email me Sam Harper at sharper@(you know where).ch and I will be happy to clarify.

CMSSW_3_1_2 is the version used for summer production. The HEEPSelector has been only been tested in this version. It should work in later 31X versions although this is untested.

A class named heep::EleSelector performs the heep selection on an electron and returns an int with individual bits set if the cut they correspond to fails. Thus a value of 0x0 is all cuts have passed. More details on this class

There are three ways you can use high energy electrons in CMSSW_3_1_2 with the heep selection.

  • with bare PAT / reco::CMS.GsfElectron, example analyzer is HEEPAnalyzerBarePAT
  • with heep processed PAT, example analyzer is HEEPAnalyzerHEEPPAT
  • with heep::Ele and heep::Event, example analyser is HEEPAnalyzer

The main disadvantage of bare PAT/ reco::!CMS.GsfElectron is that you have to manually correct the electrons energy to the ecal energy every time you wish to access the fourmomentum, energy or Et and that the selection result is not stored in the object. The module HEEPAttStatusToPAT written by M. Mozer addresses this issue by making a new collection of pat electrons with the energy reset to the ecal energy and the heep selection result accessable by pat::Electron::userInt("HEEPId"), simplifing the use of the electrons. Finally there is the heep::Event and heep::Ele classes which also store the result and the correct energy. These are HEEP group maintained classes which aim to simply the use of high energy electrons. However with recent developments in reco and pat objects, the heep::Ele class is mostly redundant currently although it does have the advantage of being able to make quick changes as needed in the future. Each method has an example module with makes the mass spectrum of EB-EB and EB-EE events passing the heep selection.

Instructions

scramv1 proj -n CMSSW_312_heepPat CMSSW CMSSW_3_1_2
cd CMSSW_312_heepPat/src
cvs co -r V00-00-13 -d SHarper/HEEPAnalyzer UserCode/SHarper/HEEPAnalyzer 
scramv1 b

Overview of HEEP Analyzer

The HEEP analyzer is a collection of classes whose aim to facilitate the use of high energy electrons. The are designed to wrap around standard RECOs and PAT objects, making them more user friendly. Note that you do not have to use these classes, you can achieve everything with bare PAT. All these classes do is a) serve as a reference implementation and b) save time by automating common tasks. For example for each event, it will automatically get the superclusters, pat electrons in the standard way, saving you having to explicitly get the handles to the collections yourself.

The HEEP Analyzer config files do two seperate things. First they configure the PAT to the needs of the HEEP group although currently default PAT works fine for us. After this stage, users can use the pat as normal with the caveate that that the selection will need to be performed manually. There is a class available which will do this, heep::EleSelector. The second stage is configuring the heep::Event and the heep selection. This is an optional step but these classes are designed to make the easier to use and they are designed to be nimble and flexible to react quickly to changes in electron ID, etc. Therefore these classes will change in the future to reflect the best knowledge that I have of electron ID. Users are not encouraged to write them to file because a) it is unnecessary as they are simple wrappers and b) it means changes will risk breaking backwards compatibility.

heep::Event

The HEEP event consists of a selection of handles to products of interest to the HEEP group. The aim of this class is to be able to access everything we wish to know about the event from this class without having to require any additional information (eg input tag of electron collection). Like all HEEP classes, it has a reference to the RECO/PAT object to pass into non HEEP functions.

heep::Ele

The HEEP electron wraps around a PAT or GSF electron but simplifies some of the methods. For example deltaPhiSuperClusterTrackAtVtx() is remapped to dPhiIn() its more common name. It main purpose is to faciliate the addition of new electron data members faster than that possible for the PAT (eg we had sigmaIEtaIEta, e2x5, etc way ahead of the PAT). It also makes our custom isolations easier to use (ele.hcalDepth2() is a lot clearer than ele.userIso(1) in my opinion). It also offers two additional peices of information, the HEEP cuts it failed and the triggers it passed. It is not possible to use the standard electron ID implemented in the PAT as that just gives an inadequate pass/fail with no information on what cuts actually failed. Again, a reference to the original PAT or GSF electron is available so it can be used with all functions requiring PAT or GSF electrons.

heep::EleSelector

This is a class which automatically selects the correct cuts (barrel or endcap) for the electron and returns a int with each bit corresponding to whether a particular cut has failed or not. The bit is set if its failed so a value of 0 indicates a pass. The bits are defined in heep::CutCodes together with tools for efficiently accessing them. The cut values are stored in heep::EleCutValues and defined in the config file. It can take heep::Ele, reco::CMS.GsfElectron or pat::Electron objects as input.

heep::EventHelper

This class keeps track of all the labels for the collections heep::Event requires and to set the handles in heep::Event for each new event. It essentially creates the heep::Event from an edm::Event. It also creates the heep::Ele objects. This is the core of the HEEPAnalyzer which contains all the information to make a heep::Event. It is expected that users using the HEEP software package will make a new analyzer and create a new object of heep::EventHelper in that analyzer to access all the HEEP tools.

heep::TrigCodes, heep::trigtools

heep::TrigCodes defines the format used to store HLT information in the HEEP electrons. TrigCodes defines a TrigBitSet which is a std::bitset<64> meaning we can have 64 possible triggers and the tools to manipulate it. The reasoning behind this is that its much simpler and faster to work with trigger bits than trigger names. Eg if you want to know if you pass the Ele15 and Ele15LW triggers which are assigned bits 0 and 2, while you have a total of 10 triggers defined. Using strings, you would have to sort them at creation and then perform a binary search for the Ele15 and Ele15LW trigger names. As HLT filter names are very long hltL1NonIsoHLTNonIsoSingleElectronLWEt15TrackIsolFilter and typically differ at the end, this is not speedy. Using bits the same operation is
 (0x5 & trigBits) == 0x5 
for both and
 (0x5 & trigBits) !=0x0 
for at least one. Note it is very important to release that bitwise operators (&,|,^) are of lower presedence than == and != so you have to use the brackets as shown above.

Another reason I do not use the PATs trigger matching is that to match a PAT ele with a trigger object requires me to define and run two additional modules which quickly mounts up when you consider all the trigger paths. Still you can use the pat trigger matching if you wish, HEEP and PAT use identical requirements.

Note that definitions of each trigger bit are unique to the job (ie you change what filters you are looking for, you change the bit definitions) so you should never store them outside of the job. Internally within the job you should store the bit not the string to take advantage of the speed boost and ease of use. eg this is pseudo c++

module::module(...)
{
heep::TrigCodes::TrigBitSet triggersToCheck_ = TrigCodes::getCode("hltL1NonIsoHLTNonIsoSingleElectronLWEt15TrackIsolFilter:hltL1NonIsoHLTNonIsoSingleElectronEt15TrackIsolFilter"); //trig name separated by ":"
....
}

void module::analyze(...)
{
...
if( (heepEle.trigBits() & triggerToCheck_) !=0x0 ) //do what you wanted to do if ele passed trigger
...
}

How to get more information

For more information regarding the Exotica analyses, please refer to the main Exotica PAG TWiki. https://twiki.cern.ch/twiki/bin/view/CMS/EXOTICA

Review status

Reviewer/Editor and Date (copy from screen) Comments
SamirGuragain - June 22 2012 Added how to setup the environment for older cmssw release
KaiFengChen - 15 Mar 2010 Add the HEEP example
KaiFengChen - 20 Feb 2010 Update to CMSSW_3_5_1 release
KaiFengChen - 14 June 2009 Added tutorial description
RogerWolf - 13 May 2009 Created the template page

Responsible: KaiFengChen and KatiLassilaPerini
Last reviewed by: most recent reviewer and date

Edit | Attach | Watch | Print version | History: r13 < r12 < r11 < r10 < r9 | Backlinks | Raw View | WYSIWYG | More topic actions
Topic revision: r13 - 2012-06-22 - unknown
 
    • Cern Search Icon Cern Search
    • TWiki Search Icon TWiki Search
    • Google Search Icon Google Search

    CMSPublic All webs login

This site is powered by the TWiki collaboration platform Powered by PerlCopyright &© 2008-2023 by the contributing authors. All material on this collaboration platform is the property of the contributing authors.
or Ideas, requests, problems regarding TWiki? use Discourse or Send feedback