Chapter 8: Physics Group Anylysis Examples



No permission to view CMS.BPhysicsPAT

PAT Examples: Electroweak Analysis

Contents

Introduction

In this tutorial we will show how to make a PAT-based analysis in the context of the EWK PAG. The inclusive Z->mumu analysis, which exploits PAT objects as muons and tracks, will be taken as an example. If you are interested in EWK electron analysis, please refer to the EWK Electron tutorial

The Z->mumu analysis is performed in two steps. In the first step we run the dimuon skim over the original AOD collections. The skim makes a HLT-based preselection of events, creates PAT objects from the "official" muons and tracks provided by the CMS reconstruction, and builds dimuon candidates to be stored into the skim output. In the second step we run an analyzer on the skim output which selects the dimuon candidates compatible with a Z->mumu decay and plots their invariant mass distribution.

You will learn:

  • how to run the dimuon skim
  • how to access some variables from the PAT objects in order to make a signal selection

How to get the dimuon skim code

The tutorial analysis works in CMSSW_4_2_4 release. So, first create a project area based on 4_2_4:
cmsrel CMSSW_4_2_4
cd CMSSW_4_2_4/src/
cmsenv

Then, get the package for the skim and compile:

addpkg ElectroWeakAnalysis/Skimming V01-01-12
cd ElectroWeakAnalysis/Skimming
scram b

How to run the dimuon skim

Go to the test directory of the package:

cd ElectroWeakAnalysis/Skimming/test/

Specify a Z->mumu MC collection to be given in input by editing the following parameter in the cfg file dimuonsSkim.py:

process.source = cms.Source("PoolSource", 
     fileNames = cms.untracked.vstring(
     'rfio:/castor/cern.ch/user/f/fabozzi/tutorial/sep2010/62C86D62-BFAF-DF11-85B3-003048678A6C.root'
    )
)

Specify the number of events to be processed (a few hundred is enough for the tutorial):

process.maxEvents = cms.untracked.PSet( input = cms.untracked.int32(200) )

Run the skim [Before running, make sure the input file is accessible and the corresponding global tag]:

cmsRun dimuonsSkim.py

At the end you will get a skimmed collection containing AOD + dimuon candidates objects:

testDimuonSkim.root

Find out more about the skim configuration

Look at files into ElectroWeakAnalysis/Skimming/python/ if you want to inspect the configuration of the dimuon skim.

In order to check the relevant configurations for PAT muons and tracks, have a look at:

ElectroWeakAnalysis/Skimming/python/patCandidatesForDimuonsSequences_cff.py
Comment lines in this file should help you in following the description below.

PAT muons are created by calling standard PAT configuration files:

CMS.PhysicsTools/PatAlgos/python/mcMatchLayer0/muonMatch_cfi.py
CMS.PhysicsTools/PatAlgos/python/producersLayer1/muonProducer_cfi.py
CMS.PhysicsTools/PatAlgos/python/selectionLayer1/muonSelector_cfi.py
Note that there are changes of the default parameters of the module making MC matching of PAT muons to generated muons:
from CMS.PhysicsTools.PatAlgos.mcMatchLayer0.muonMatch_cfi import *
muonMatch.maxDeltaR = 0.15
muonMatch.maxDPtRel = 1.0
muonMatch.resolveAmbiguities = True
muonMatch.resolveByMatchQuality = True

We also compute custom tracker, ecal, and hcal isolation variables to be embedded in PAT muons:

from CMS.PhysicsTools.PatAlgos.producersLayer1.muonProducer_cfi import *
patMuons.isoDeposits = cms.PSet(
        tracker = cms.InputTag("muIsoDepositTk"),
        ecal    = cms.InputTag("muIsoDepositCalByAssociatorTowers","ecal"),
        hcal    = cms.InputTag("muIsoDepositCalByAssociatorTowers","hcal"),
)
patMuons.userIsolation = cms.PSet(
        hcal = cms.PSet(
            src = cms.InputTag("muIsoDepositCalByAssociatorTowers","hcal"),
            deltaR = cms.double(0.3)
        ),
        tracker = cms.PSet(
            veto = cms.double(0.015),
            src = cms.InputTag("muIsoDepositTk"),
            deltaR = cms.double(0.3),
            threshold = cms.double(1.5)
            ),
        ecal = cms.PSet(
            src = cms.InputTag("muIsoDepositCalByAssociatorTowers","ecal"),
            deltaR = cms.double(0.3)
        )
    )

We apply a (dummy) selection to patMuons obtaining the PAT muon objects called selectedPatMuons. Finally we perform trigger matching between selectedPatMuons and the HLT muons firing the HLT_Mu9 trigger path. The collection of PAT muons embedding the trigger matching result is called selectedPatMuonsTriggerMatch.

Standard and custom configuration files are used to make PAT tracks. Intermediate candidates (patAODTrackCands) are first created. Only tracks above a certain pT threshold (10 GeV/c) are considered. Note that at this level, IsoDeposits in tracker and calorimeters associated to the tracks are computed from AOD, and MC matching to generated muons is performed.

PAT tracks (allPatTracks) are built by the standard producer for PAT generic particles:

CMS.PhysicsTools/PatAlgos/python/producersLayer1/genericParticleProducer_cfi.py
The following parameters are specified in order to embed some variables into PAT tracks (MC match to generated muons, IsoDeposits, tracker, ecal, and hcal pre-computed isolation values):
allPatTracks = patGenericParticles.clone(
    src = cms.InputTag("patAODTrackCands"),
    # isolation configurables
    userIsolation = cms.PSet(
      tracker = cms.PSet(
        veto = cms.double(0.015),
        src = cms.InputTag("patAODTrackIsoDepositCtfTk"),
        deltaR = cms.double(0.3),
        threshold = cms.double(1.5)
      ),
      ecal = cms.PSet(
        src = cms.InputTag("patAODTrackIsoDepositCalByAssociatorTowers","ecal"),
        deltaR = cms.double(0.3)
      ),
      hcal = cms.PSet(
        src = cms.InputTag("patAODTrackIsoDepositCalByAssociatorTowers","hcal"),
        deltaR = cms.double(0.3)
      ),
    ),
    isoDeposits = cms.PSet(
      tracker = cms.InputTag("patAODTrackIsoDepositCtfTk"),
      ecal = cms.InputTag("patAODTrackIsoDepositCalByAssociatorTowers","ecal"),
      hcal = cms.InputTag("patAODTrackIsoDepositCalByAssociatorTowers","hcal")
    ),
    addGenMatch = cms.bool(True),
    genParticleMatch = cms.InputTag("trackMuMatch")
)
We apply a selection to the tracks (again a pT threshold of 10 GeV/c) and thus we are left with the PAT track objects called selectedPatTracks. No track cleaning is applied.

Dimuon candidates are built starting from selectedPatMuonsTriggerMatch and SelectedPatTracks. You can check the following files: dimuons_cfi.py, dimuonsOneTrack_cfi.py.

Event preselection based on HLT is defined in dimuonsHLTFilter_cfi.py. The skim paths are defined into dimuons_SkimPaths_cff.py. The event content of the skim output is defined into dimuonsOutputModule_cfi.py.

How to get the analyzer code

The tutorial analyzer has been tested in CMSSW_3_8_2 release. So, in a 3_8_2 project area checkout and compile the analyzer in the CMS.PhysicsTools/PatExamples package:
cvs co -r CMSSW_3_8_2 PhysicsTools/PatExamples
cd PhysicsTools/PatExamples/plugins
scram b

How to run the analyzer

Go to the test directory of the package:
cd PhysicsTools/PatExamples/test/

You can run on the skimmed collection made in the previous step. Alternatively, if you want to run on more events, a skimmed collection (made from 1007 events) is available on castor. Specify the path of the skimmed collection by editing the following parameter in the cfg file zMuMuTutorialAnalysis.py:

process.source = cms.Source(
    "PoolSource",
    fileNames = 
cms.untracked.vstring("rfio:/castor/cern.ch/user/f/fabozzi/tutorial/sep2010/testDimuonSkim.root")
    )

Run the analysis [The file is dead. The instructions will be updated soon.]:

cmsRun zMuMuTutorialAnalysis.py
At the end you will get an output root file containing a few invariant mass histograms:
zmumu_plot.root

Find out more about the analyzer

Have a look at CMS.PhysicsTools/PatExamples/plugins/ZMuMuTutorialAnalyzer.cc. The comment lines in the file should help you in following the description below.

The module takes as input parameter a certain category of dimuons candidates and applies cuts to the muon daughters in order to select candidates compatible with a Z->mu+mu- decay. The selection is the following:

  • Both muons must be global muons
  • The net charge of the pair must be 0
  • Each muon must satisfy geometrical and kinematical acceptance cuts
  • Each muon must satisfy an isolation cut. Note the two alternative ways to get muon isolation. In the first case we just get the pre-computed tracker isolation variable. In the second case we compute the isolation variable from the embedded IsoDeposits (this is useful when, for instance, you want to test alternative isolation definitions).
  • The invariant mass of the pair must be within a certain interval
The invariant mass of the dimuon candidates passing these cuts is plotted. We study also the generated muons which are MC-matched to the Z candidates daughters: we exploits the mother-daughter relationships of the generated particles in order to identify the Z candidates which correspond to a true Z->mumu(gamma) decay.

The dimuon candidates to be analyzed and the analysis cuts are specified at cfg level in zMuMuTutorialAnalysis.py.

Make changes by yourself

  • Plot the pT spectrum of the highest momentum muon coming from the Z and the pT spectrum of the lowest momentum muon coming from the Z. Do that for reconstructed muons and generated muons.
  • Plot the pre-computed Ecal isolation for both muons coming from the Z (before applying tracker isolation). Make the same for the sum of Ecal and Tracker isolations. Try also to compute by yourself an Ecal isolation from IsoDeposits by using a dR isolation cone of 0.2.
  • Repeat the analysis but require that one muon must be a global muon and the other muon must be a standalone muon
  • Repeat the analysis but give in input the dimuonsOneTrack collection. Require in the analysis that the PAT muon must be a global muon.

How to get more information

Refer to the EWK twiki for more general informations about PAG analyses and MC collections:

https://twiki.cern.ch/twiki/bin/view/CMS/TWikiEWK

EWK Electron analysis tutorial

Refer to the following twiki:

https://twiki.cern.ch/twiki/bin/view/CMS/PATExampleElectroweakElectrons

Review status

Reviewer/Editor and Date (copy from screen) Comments
SamirGuragain - 17 May 2012 Updated tutorial partially for 4_2_4 release and the analyzer part is incomplete and the work is in progress.
FrancescoFabozzi - 03 Sep 2010 Updated tutorial for 3_8_2 release
FrancescoFabozzi - 10 Mar 2010 Updated tutorial for 3_5_4 release
FrancescoFabozzi - 28 Dec 2009 Added link to EWK Electron tutorial
FrancescoFabozzi - 27 Dec 2009 Updated tutorial for 3_3_5 release
FrancescoFabozzi - 03 Sep 2009 Updated tutorial for 3_1_2 release
FrancescoFabozzi - 08 Jun 2009 Added tutorial description
RogerWolf - 13 May 2009 Created the template page

Responsible: FrancescoFabozzi
Last reviewed by: most recent reviewer and date



PAT Examples: Exotica Analysis

Contents

Example: A Generic Analyzer

Introduction

Here we introduce a very simple exotic-like signature based search with very basic PAT utilities.

Within Exotica PAG, the physics analyses could be very different among topics and subgroups. For example, the analysis could be closer to the EWK group for the Z' searches, and other studies involving excited quarks may be similar to TOP studies. Hence, we introduce a 'quasi-generic' study with lepton+jets final state in this tutorial. Assuming a new physics model X produces a very unique/special final state, like pp -> Z+top+bottom, and we would like to see if it exists in our data or not (Surely, this model X does not exist so far). The production cross section is assumed to be 10 pb at 10 TeV LHC collision. Here are a line-up of the issues of this virtual new physics model search:

  • The final state is assumed to be Z+t+b, and here we take Z->ee or mumu with a full hadronic top, for example.
  • Although the final state is defined, but we have no idea what are the good variables that could be used in the analysis.
  • Although we can guess the dominant background should be Z+jets, but it's unclear what are the other background sources.
  • And, generally, understanding of the standard model background is more important.

Based on these requirements and in order get a quick 'feeling' of this virtual analysis at the beginning, the following steps are introduced:

  • A generic analyzer (see the 'coding' part below) is introduced.
  • The events should pass through signal electron trigger or signal muon trigger.
  • We store root trees (instead of fixed histograms) to get higher freedom after the work of analyzer.
  • Use some simple root codes to get the final physics plots.

How to get the code

First step is to follow the PAT recipes:

To get the "older" version of cmssw, please "setenv SCRAM_ARCH slc5_ia32_gcc434"
cmsrel CMSSW_3_5_1 [is alreday deprecated, can be used with CMSSW_3_5_2 curently available]
cd CMSSW_3_5_1/src
cmsenv

And copy the example 'ObjectCounter' analyzer:

tar xfvz /afs/cern.ch/user/k/kfjack/public/ObjectCounter/fullsrc.tar.gz

Basically the example code is consistent of a single analysis module (ObjectCounter.cc), one testing python configure file (test.py), and testing root script (proj.cc) with a tree format definition (format.h).

NOTE: for the release CMSSW_3_4_2, please apply the following modifications for backward capability.

Replace the following lines (264-267) in MyAna/ObjectCounter/src/ObjectCounter.cc

  iEvent.getByLabel("cleanPatMuons",     MuonHandle);
  iEvent.getByLabel("cleanPatElectrons", ElectronHandle);
  iEvent.getByLabel("cleanPatJets",      JetHandle);
  iEvent.getByLabel("patMETs",           METHandle);
by
  iEvent.getByLabel("cleanLayer1Muons",     MuonHandle);
  iEvent.getByLabel("cleanLayer1Electrons", ElectronHandle);
  iEvent.getByLabel("cleanLayer1Jets",      JetHandle);
  iEvent.getByLabel("layer1METs",           METHandle);
This is due to the changes made in the PAT branch naming schema after CMSSW_3_5_1.

Replace the following part in MyAna/ObjectCounter/test/test.py (this is for the right input and tag for release 3_4_2):

process.source = cms.Source("PoolSource",
    fileNames = cms.untracked.vstring(
        'rfio:/castor/cern.ch/cms/store/relval/CMSSW_3_4_2/RelValTTbar/GEN-SIM-RECO/
STARTUP3X_V15-v1/0011/06A5351A-7F13-DF11-9E89-001A9281172E.root'
    )
)
and
process.GlobalTag.globaltag = cms.string('STARTUP3X_V14::All')

How to run the code

First is to build the example analyzer:

scramv1 b

and go to the test directory and run the testing configuration (loading some summer08 ttbar testing events):

cd MyAna/ObjectCounter/test 
cmsRun test.py

There should be a single root output (results.root) generated afterwards. It can be easily checked by root TBrowser interface, e.g.

root -l results.root 
root [1] TBrowser b;

Or, by running the simple root code, proj.cc, a simple plot consisting of inclusive jets' pt can be generated:

root -l proj.cc

Find out more about the details

This example analyzer is actually very simple. We made no modification on the default PAT configurations. As the test.py testing configurations, the only required parts (in addition to PAT) are declaring the analyzer, placing a TFileService for storing output root branches, and adding the analyzer to the main job path.

process.ObjectCounter = cms.EDAnalyzer("ObjectCounter")

process.TFileService = cms.Service("TFileService",
    fileName = cms.string('results.root')
)

process.p = cms.Path(
                  process.patDefaultSequence
                + process.ObjectCounter
            )

This example analyzer only loads the standard PAT objects (electrons, muons, jets, and met), and generator information (genParticles) and fill the corresponding information into several root branches (and no real fancy stuff). The root branches are defined at the beginning of the ObjectCounter.cc. As an example, here shows the definition of the LepInfoBranches. Basically it's just a simply array of some common lepton information that may be needed for further analysis.

class LepInfoBranches {
public:
  int   Size; 
  int   Index[MAX_LEPTONS];
  int   LeptonType[MAX_LEPTONS];
  int   Charge[MAX_LEPTONS];
  float Pt[MAX_LEPTONS];
  float Eta[MAX_LEPTONS];
  float Phi[MAX_LEPTONS];
  float TrackIso[MAX_LEPTONS];
  float EcalIso[MAX_LEPTONS];
  float HcalIso[MAX_LEPTONS];
  float GenPt[MAX_LEPTONS];
  float GenEta[MAX_LEPTONS];
  float GenPhi[MAX_LEPTONS];
  int   GenPdgID[MAX_LEPTONS];
  int   GenMCTag[MAX_LEPTONS];   // 0: unknown, 1: decay from W, 2: decay from Z
  
  Candidate* CandRef[MAX_LEPTONS]; // backward pointer to the PAT objects
  
  void Register(TTree *root) {
       root->Branch("LepInfo.Size"       , &Size          , "LepInfo.Size/I"                     );
       root->Branch("LepInfo.Index"      , &Index[0]      , "LepInfo.Index[LepInfo.Size]/I"      );
       root->Branch("LepInfo.LeptonType" , &LeptonType[0] , "LepInfo.LeptonType[LepInfo.Size]/I" );
       root->Branch("LepInfo.Charge"     , &Charge[0]     , "LepInfo.Charge[LepInfo.Size]/I"     );
       root->Branch("LepInfo.Pt"         , &Pt[0]         , "LepInfo.Pt[LepInfo.Size]/F"         );
       root->Branch("LepInfo.Eta"        , &Eta[0]        , "LepInfo.Eta[LepInfo.Size]/F"        );
       root->Branch("LepInfo.Phi"        , &Phi[0]        , "LepInfo.Phi[LepInfo.Size]/F"        );
       root->Branch("LepInfo.TrackIso"   , &TrackIso[0]   , "LepInfo.TrackIso[LepInfo.Size]/F"   );
       root->Branch("LepInfo.EcalIso"    , &EcalIso[0]    , "LepInfo.EcalIso[LepInfo.Size]/F"    );   
       root->Branch("LepInfo.HcalIso"    , &HcalIso[0]    , "LepInfo.HcalIso[LepInfo.Size]/F"    ); 
       root->Branch("LepInfo.GenPt"      , &GenPt[0]      , "LepInfo.GenPt[LepInfo.Size]/F"      );
       root->Branch("LepInfo.GenEta"     , &GenEta[0]     , "LepInfo.GenEta[LepInfo.Size]/F"     );
       root->Branch("LepInfo.GenPhi"     , &GenPhi[0]     , "LepInfo.GenPhi[LepInfo.Size]/F"     );
       root->Branch("LepInfo.GenPdgID"   , &GenPdgID[0]   , "LepInfo.GenPdgID[LepInfo.Size]/I"   );
       root->Branch("LepInfo.GenMCTag"   , &GenMCTag[0]   , "LepInfo.GenMCTag[LepInfo.Size]/I"   );
  }  
};

The main code block, ObjectCounter::analyze(const edm::Event& iEvent, const edm::EventSetup& iSetup), simply load the hard coded PAT objects (here is muon, as an example). For simplification, we apply a pT threshold of 20 GeV/c in the code, as well as the 'GlobalMuonPromptTight' muon ID before filling the branches. For the generator information, we do not only fill the matched genParticle, but also identify if the 'mother' particle is Z or W bosons.

edm::Handle<std::vector<pat::Muon> >     MuonHandle;
iEvent.getByLabel("selectedLayer1Muons",     MuonHandle);

memset(&LepInfo,0x00,sizeof(LepInfo));
  
for( std::vector<pat::Muon>::const_iterator it_mu = MuonHandle->begin(); 
     it_mu != MuonHandle->end(); it_mu++ ) {
       
  if (LepInfo.Size>=MAX_LEPTONS) {
        fprintf(stderr,"ERROR: number of leptons exceeds the size of array.\n");
        exit(1);
  }
    
  if (it_mu->pt()<20.) continue;
  if (!it_mu->muonID("GlobalMuonPromptTight")) continue;
  if (it_mu->innerTrack()->d0()>=0.2 || it_mu->innerTrack()->numberOfValidHits()<11) continue;

  LepInfo.Index      [LepInfo.Size] = LepInfo.Size;
  LepInfo.LeptonType [LepInfo.Size] = 13;
  LepInfo.Charge     [LepInfo.Size] = it_mu->charge();
  LepInfo.Pt         [LepInfo.Size] = it_mu->pt();
  LepInfo.Eta        [LepInfo.Size] = it_mu->eta();
  LepInfo.Phi        [LepInfo.Size] = it_mu->phi();
  LepInfo.TrackIso   [LepInfo.Size] = it_mu->trackIso();
  LepInfo.EcalIso    [LepInfo.Size] = it_mu->ecalIso();
  LepInfo.HcalIso    [LepInfo.Size] = it_mu->hcalIso();
    
  const reco::GenParticle* gen = it_mu->genLepton();
  if (gen!=NULL) {
        LepInfo.GenPt    [LepInfo.Size] = gen->pt();
        LepInfo.GenEta   [LepInfo.Size] = gen->eta();
        LepInfo.GenPhi   [LepInfo.Size] = gen->phi();
        LepInfo.GenPdgID [LepInfo.Size] = gen->pdgId();
    
        const reco::Candidate* genCand = gen;
        
        while(genCand!=NULL &&
              genCand->numberOfMothers()==1) {
                    
              genCand = genCand->mother(0);
                        
              if (LepInfo.GenMCTag[LepInfo.Size]==0) {
                      if (abs(genCand->pdgId())==23) LepInfo.GenMCTag[LepInfo.Size] = 2;
              else
                      if (abs(genCand->pdgId())==24) LepInfo.GenMCTag[LepInfo.Size] = 1;        
              }
        }
  }
    
  LepInfo.CandRef [LepInfo.Size] = (reco::Candidate*)&(*it_mu);
  LepInfo.Size++;
}

Based on the output root branches, we are already able to do many simple 'analysis works'. Here are a summary of the pre-selections done by this analysis code:

Object Root branches Selections
Muon LepInfoBranches pt>20, GlobalMuonPromptTight ID, inner track d0<0.2, N(hits)>11
Electron LepInfoBranches pt>20, Robust-loose ID
Jet JetInfoBranches pt>30
MET EvtInfoBranches none
di-lepton Pairs PairInfoBranches passing the lepton selections above + invariant mass < 200 GeV
di-jet Pairs PairInfoBranches passing the jet selections above + invariant mass < 200 GeV

Post-analyzer studies

Some 10 TeV MC samples have been processed for this tutorial. They can be found at /afs/cern.ch/user/k/kfjack/public/ObjectCounter. A further pre-selection (at least two lepton candidates in an event) is applied in order to reduce the sample size (by uncommenting line 367 of the ObjectCounter.cc). Here are the summary table of these tutorial samples (note: these are the resulting root branches, not the PAT-tuples):

Process generator cross section sample size file name
model-X (the signal), Z->ee/mumu madgraph 10 pb 20K objcounter_modelX.root
tt+jets madgraph 317 pb (LO) 1M objcounter_ttjets.root
Z+jets madgraph 3700 pb (LO) 1M objcounter_Zjets.root
W+jets madgraph 40000 pb (LO) 10M objcounter_Wjets.root
WW pythia 74 pb (LO) 200K objcounter_WW.root
ZZ pythia 10.5 pb (LO) 200K objcounter_ZZ.root
WZ pythia 32 pb (LO) 200K objcounter_WZ.root

Then let's produce some physics plots using these samples. This is a demonstration of what can be done after such a simply analyzer (less then 600 lines actually). Please note the following stuff are not PAT anymore, they are simply root codes.

Basic object kinematic distributions

Please refer to the root code /afs/cern.ch/user/k/kfjack/public/ObjectCounter/show_obj.cc. This cint script will produce several plots for basic objects (electrons, muons, jets, met, and pairs), using the root branches given above. For example, by typing

root -l 'show_obj.cc+("elec")'
A canvas will pop up and shows 4 basic distributions of electron candidates. A summary of the possible options of this root macro is given as following:

  • "elec" - electron related plots (pt, eta, track isolation, calorimeter isolation).
  • "muon" - muon related plots (pt, eta, track isolation, calorimeter isolation).
  • "jet" - jet plots including corrected pt, raw pt, eta, jet charge, no. of calortowers, b-tagging.
  • "met" - Missing energy (MET, and MET-phi).
  • "pair" - masses of pairings - it shows the M(e+e-), M(mu+mu-), generic M(jj), M(jj) with MC matched to a W.

The macro loads objcounter_modelX.root by default (= signal distributions).

Signal+background stacking distributions

Please refer to the root code /afs/cern.ch/user/k/kfjack/public/ObjectCounter/show_phys.cc. This example is more interesting since the root code will load all the samples, and calculate the scaling factors according to the cross sections and target luminosity. Stacking histograms including both signal and background are produced and shown in a single TCanvas. The command is very similar:

root -l 'show_phys.cc+("ht")'
The HT distributions for electron channel and muon channel will be shown. The following cuts (for further reducing the background contributions) are included in the root code:
  • Only one good di-electron or di-muon pair in an event.
  • 80 < M(e+e-) or M(mu+mu-) < 100 GeV/c^2.
  • N(jets) >=4, jets are excluded if overlapped (dR<0.3) with electron candidates.
  • leading jet pt > 80 GeV/c
Surely there are other available options for this root macro:

  • "met" - Missing energy distributions, after the requirements listed above.
  • "njets" - No. of jets after the requirements listed above except N(jet)>=4
  • "mll" - M(ee) or M(mumu) distributions, with the requirements listed above except the mass cut.
  • "m3j" - Invariant mass of the 2nd,3rd,4th jets (more or less equal to the mass of the hadronic top), after the full given cuts.
  • "j1pt" - pt of the leading jet (highest pt jet) in the event, with the requirements listed above except pt > 80 GeV/c.
  • "j1btag" - b-tagging variable of the leading jet (it supports to be the hard b-jet) in the event, after the full given cuts.

The open area of the resulting plots represents the signal and the filled histograms are the background. The light green area is the contributions from Z+jets (the rest of the background events are shown in navy). It is clear that we could have further background suppression power with b-tagging, or other possible variables.

Example: The HEEP Analyzer

Using the HEEP Selector

In case anything is unclear, please email me Sam Harper at sharper@(you know where).ch and I will be happy to clarify.

CMSSW_3_1_2 is the version used for summer production. The HEEPSelector has been only been tested in this version. It should work in later 31X versions although this is untested.

A class named heep::EleSelector performs the heep selection on an electron and returns an int with individual bits set if the cut they correspond to fails. Thus a value of 0x0 is all cuts have passed. More details on this class

There are three ways you can use high energy electrons in CMSSW_3_1_2 with the heep selection.

  • with bare PAT / reco::CMS.GsfElectron, example analyzer is HEEPAnalyzerBarePAT
  • with heep processed PAT, example analyzer is HEEPAnalyzerHEEPPAT
  • with heep::Ele and heep::Event, example analyser is HEEPAnalyzer

The main disadvantage of bare PAT/ reco::!CMS.GsfElectron is that you have to manually correct the electrons energy to the ecal energy every time you wish to access the fourmomentum, energy or Et and that the selection result is not stored in the object. The module HEEPAttStatusToPAT written by M. Mozer addresses this issue by making a new collection of pat electrons with the energy reset to the ecal energy and the heep selection result accessable by pat::Electron::userInt("HEEPId"), simplifing the use of the electrons. Finally there is the heep::Event and heep::Ele classes which also store the result and the correct energy. These are HEEP group maintained classes which aim to simply the use of high energy electrons. However with recent developments in reco and pat objects, the heep::Ele class is mostly redundant currently although it does have the advantage of being able to make quick changes as needed in the future. Each method has an example module with makes the mass spectrum of EB-EB and EB-EE events passing the heep selection.

Instructions

scramv1 proj -n CMSSW_312_heepPat CMSSW CMSSW_3_1_2
cd CMSSW_312_heepPat/src
cvs co -r V00-00-13 -d SHarper/HEEPAnalyzer UserCode/SHarper/HEEPAnalyzer 
scramv1 b

Overview of HEEP Analyzer

The HEEP analyzer is a collection of classes whose aim to facilitate the use of high energy electrons. The are designed to wrap around standard RECOs and PAT objects, making them more user friendly. Note that you do not have to use these classes, you can achieve everything with bare PAT. All these classes do is a) serve as a reference implementation and b) save time by automating common tasks. For example for each event, it will automatically get the superclusters, pat electrons in the standard way, saving you having to explicitly get the handles to the collections yourself.

The HEEP Analyzer config files do two seperate things. First they configure the PAT to the needs of the HEEP group although currently default PAT works fine for us. After this stage, users can use the pat as normal with the caveate that that the selection will need to be performed manually. There is a class available which will do this, heep::EleSelector. The second stage is configuring the heep::Event and the heep selection. This is an optional step but these classes are designed to make the easier to use and they are designed to be nimble and flexible to react quickly to changes in electron ID, etc. Therefore these classes will change in the future to reflect the best knowledge that I have of electron ID. Users are not encouraged to write them to file because a) it is unnecessary as they are simple wrappers and b) it means changes will risk breaking backwards compatibility.

heep::Event

The HEEP event consists of a selection of handles to products of interest to the HEEP group. The aim of this class is to be able to access everything we wish to know about the event from this class without having to require any additional information (eg input tag of electron collection). Like all HEEP classes, it has a reference to the RECO/PAT object to pass into non HEEP functions.

heep::Ele

The HEEP electron wraps around a PAT or GSF electron but simplifies some of the methods. For example deltaPhiSuperClusterTrackAtVtx() is remapped to dPhiIn() its more common name. It main purpose is to faciliate the addition of new electron data members faster than that possible for the PAT (eg we had sigmaIEtaIEta, e2x5, etc way ahead of the PAT). It also makes our custom isolations easier to use (ele.hcalDepth2() is a lot clearer than ele.userIso(1) in my opinion). It also offers two additional peices of information, the HEEP cuts it failed and the triggers it passed. It is not possible to use the standard electron ID implemented in the PAT as that just gives an inadequate pass/fail with no information on what cuts actually failed. Again, a reference to the original PAT or GSF electron is available so it can be used with all functions requiring PAT or GSF electrons.

heep::EleSelector

This is a class which automatically selects the correct cuts (barrel or endcap) for the electron and returns a int with each bit corresponding to whether a particular cut has failed or not. The bit is set if its failed so a value of 0 indicates a pass. The bits are defined in heep::CutCodes together with tools for efficiently accessing them. The cut values are stored in heep::EleCutValues and defined in the config file. It can take heep::Ele, reco::CMS.GsfElectron or pat::Electron objects as input.

heep::EventHelper

This class keeps track of all the labels for the collections heep::Event requires and to set the handles in heep::Event for each new event. It essentially creates the heep::Event from an edm::Event. It also creates the heep::Ele objects. This is the core of the HEEPAnalyzer which contains all the information to make a heep::Event. It is expected that users using the HEEP software package will make a new analyzer and create a new object of heep::EventHelper in that analyzer to access all the HEEP tools.

heep::TrigCodes, heep::trigtools

heep::TrigCodes defines the format used to store HLT information in the HEEP electrons. TrigCodes defines a TrigBitSet which is a std::bitset<64> meaning we can have 64 possible triggers and the tools to manipulate it. The reasoning behind this is that its much simpler and faster to work with trigger bits than trigger names. Eg if you want to know if you pass the Ele15 and Ele15LW triggers which are assigned bits 0 and 2, while you have a total of 10 triggers defined. Using strings, you would have to sort them at creation and then perform a binary search for the Ele15 and Ele15LW trigger names. As HLT filter names are very long hltL1NonIsoHLTNonIsoSingleElectronLWEt15TrackIsolFilter and typically differ at the end, this is not speedy. Using bits the same operation is
 (0x5 & trigBits) == 0x5 
for both and
 (0x5 & trigBits) !=0x0 
for at least one. Note it is very important to release that bitwise operators (&,|,^) are of lower presedence than == and != so you have to use the brackets as shown above.

Another reason I do not use the PATs trigger matching is that to match a PAT ele with a trigger object requires me to define and run two additional modules which quickly mounts up when you consider all the trigger paths. Still you can use the pat trigger matching if you wish, HEEP and PAT use identical requirements.

Note that definitions of each trigger bit are unique to the job (ie you change what filters you are looking for, you change the bit definitions) so you should never store them outside of the job. Internally within the job you should store the bit not the string to take advantage of the speed boost and ease of use. eg this is pseudo c++

module::module(...)
{
heep::TrigCodes::TrigBitSet triggersToCheck_ = TrigCodes::getCode("hltL1NonIsoHLTNonIsoSingleElectronLWEt15TrackIsolFilter:hltL1NonIsoHLTNonIsoSingleElectronEt15TrackIsolFilter"); //trig name separated by ":"
....
}

void module::analyze(...)
{
...
if( (heepEle.trigBits() & triggerToCheck_) !=0x0 ) //do what you wanted to do if ele passed trigger
...
}

How to get more information

For more information regarding the Exotica analyses, please refer to the main Exotica PAG TWiki. https://twiki.cern.ch/twiki/bin/view/CMS/EXOTICA

Review status

Reviewer/Editor and Date (copy from screen) Comments
SamirGuragain - June 22 2012 Added how to setup the environment for older cmssw release
KaiFengChen - 15 Mar 2010 Add the HEEP example
KaiFengChen - 20 Feb 2010 Update to CMSSW_3_5_1 release
KaiFengChen - 14 June 2009 Added tutorial description
RogerWolf - 13 May 2009 Created the template page

Responsible: KaiFengChen and KatiLassilaPerini
Last reviewed by: most recent reviewer and date



PAT Examples: Higgs Analysis: H->ZZ->4l analyses

This example comes from the link: here.

Contents:

Physics Motivation/Goals

The purpose of the High Mass H->ZZ exercise is to run the full H->ZZ->4l analysis on 2010+2011 data and Summer11 Monte Carlo samples. The 4e, 4mu and 2e2mu final state will be treated at the same time consistently. Emphasys on the strategy to suppress the background from Zbbar and ttbar will be given.
The event yields at each step of the selection will be studied with Monte Carlo simulation for backgrounds and signal with different masses.
Distributions of the relevant observables for data and MC will be looked at different selection steps.
The students will be requested to evaluate the significance of the various cuts and optimize them: are you able to find even better cuts than the standard analysis?

Contacts

Sara Bolognesi & Nicola De Filippis & Alexey Drozdetskiy & Marco Meneghelli

H->ZZ->4l analysis in a nutshell:

Introductory slides here

  • Signatures: 4e, 4mu and 2e2mu final state
  • Smaller rate compared to H->WW but very clean signal and mass peak reconstruction
  • Very small backgrounds:
    • irreducible ZZ
    • reducible Zbb, tt with leptons from b/c hadrons decays
    • reducible Z+jets, W+jets, QCD with fake leptons
  • Selection strategy and observables:
    • 4 well reconstructed and isolated leptons
    • leptons coming from the primary vertex
    • di-lepton fitting at least one on-mass shell Z

Pre-requisites

  • Basic knowledge of C++ and CMSSW
  • Basic knowledge of physics objects and the identification techniques: electrons and muons
  • Knowledge of ROOT and ability to write, compile and execute macros
  • Short Exercises suggested:
    • Electrons at link here
    • Muons at link: here

Getting started

Setup CMSSW_4_4_2 locally (eg, in your cmslcp account or lxplus account). Four configurations are provided to do the exericses: one for "FNAL", one for "CERN", one for "PISA", one for a generic local PBS batch scheduler.

At FNAL:

bash (->to move to bash shell)
source /uscmst1/prod/sw/cms/bashrc prod
cmscvsroot CMSSW
cd nobackup

AT CERN:

bash (->to move to bash shell)
cd scratch0 

At PISA: setup here

bash (->to move to bash shell)
source /afs/pi.infn.it/grid_exp_sw/cms/scripts/setcms.sh
cmscvsroot CMSSW
cd /gpfs/gpfsddn/cms/user/`whoami`

Go ahead:

scramv1 project CMSSW CMSSW_4_4_2
cd CMSSW_4_4_2/src/
eval `scramv1 runtime -sh`

Download the HiggsToZZ4leptons code and compile it

cvs co HiggsAnalysis/HiggsToZZ4Leptons
cvs co HiggsAnalysis/Skimming
cvs co Configuration/Skimming/python/PDWG_HZZSkim_cff.py
scramv1 b 

This code run on the samples (MC and data) produced centrally and it stores the relevant variables into root-tuples. The configuration files that apply the skim and produce the root-ples are:

HiggsAnalysis/HiggsToZZ4Leptons/test/HiggsToZZ_HZZSkim_mc.py
HiggsAnalysis/HiggsToZZ4Leptons/test/HiggsToZZ_HZZSkim_data.py

You don't need to run this code since it's slow and we will provide root-tuples to you. You can try to look at it and run it in your spare time wink

Location of root-ples

  • At FNAL:
/pnfs/cms/WAX/11/store/user/cmsdas/2012/HZZHighMassExercise/data  for 2010 and 2011 data
/pnfs/cms/WAX/11/store/user/cmsdas/2012/HZZHighMassExercise/sig    for MC H->ZZ->4l signal (gluon fusion and VBF)
/pnfs/cms/WAX/11/store/user/cmsdas/2012/HZZHighMassExercise/bkg   for Summer11 background samples

  • At CERN:
/castor/cern.ch/user/n/ndefilip/DAS/data for 2010 and 2011 data
/castor/cern.ch/user/n/ndefilip/DAS/sig for MC H->ZZ->4l signal (gluon fusion and VBF)
/castor/cern.ch/user/n/ndefilip/DAS/bkg for Summer11 background samples
/castor/cern.ch/user/m/mene/HIGGS or the exercise on monitoring

  • At PISA:
/gpfs/gpfsddn/srm/cms/store/user/cmsdas/2012/HZZHighMassExercise/data  for 2010 and 2011 data
/gpfs/gpfsddn/srm/cms/store/user/cmsdas/2012/HZZHighMassExercise/sig    for MC H->ZZ->4l signal (gluon fusion and VBF)
/gpfs/gpfsddn/srm/cms/store/user/cmsdas/2012/HZZHighMassExercise/bkg   for Summer11 background samples
/gpfs/gpfsddn/srm/cms/store/user/cmsdas/2012/HZZHighMassExercise/HIGGS for the exercise on monitoring

Preliminary setup

Step 1: The input files

Go in the HiggsAnalysis/HiggsToZZ4Leptons/test/macros directory. A lot of material is provided.

cd HiggsAnalysis/HiggsToZZ4Leptons/test/macros
Create the tree of directories which will be used later to store the output:
bash createdir.sh

In case of CERN the input files are stored on castor (as reported above). We have to be sure that the files are copied into disk (and not only on tape), i.e. they are "STAGED". To do that run this script:

bash stager_get.sh

Copy locally one of the tuple and look into it. If you are on lxplus you can copy it into /tmp/ directory, for instance.
In case of FNAL configuration:

/opt/d-cache/dcap/bin/dccp  /pnfs/cms/WAX/11/store/user/cmsdas/2012/HZZHighMassExercise/data/roottree_leptons_DoubleMu_Cert_160404-177515_7TeV_PromptReco_Collisions11_JSON_03Oct2011-v1.root some_local_path_you_like
In case of CERN configuration:
rfcp /castor/cern.ch/user/n/ndefilip/DAS/data/roottree_leptons_DoubleMu_Cert_160404-177515_7TeV_PromptReco_Collisions11_JSON_03Oct2011-v1.root some_local_path_you_like
In case of PISA configuration:
cp  /gpfs/gpfsddn/srm/cms/store/user/cmsdas/2012/HZZHighMassExercise/data/roottree_leptons_DoubleMu_Cert_160404-177515_7TeV_PromptReco_Collisions11_JSON_03Oct2011-v1.root some_local_path_you_like
Open the root-ple copied locally with the command:
root -l some_local_path_you_like/roottree_leptons_DoubleMu_Cert_160404-177515_7TeV_PromptReco_Collisions11_JSON_03Oct2011-v1.root 

Content of root-ples and brief explanation is provided here.
Let's look just at some basic variables:
How many events does it contains?
How many muons per events?
What is their muon pT spectrum?
What is their isolation distribution?

Step 2: The analysis code

Caveat: you don't need to know these macros in details, just try to run them. If you are interested, you can look further at them in your spare time wink

Software macros to analyze the roottuple are
BaselineMacro4mu.C, BaselineMacro4e.C, BaselineMacro2e2mu.C.
Those macros run the full selection for 4mu, 4e and 2e2mu and save histograms in output files in ROOT format. They also print some info in text format for each event.

Those macros are run by the following script
compilebaseline_4mu_single.C, compilebaseline_4e_single.C, compilebaseline_2e2mu_single.C
Let's modify the 4mu script putting a signal input file with our favorite mass:

  • for CERN /castor/cern.ch/user/n/ndefilip/DAS/sig/roottree_leptons_GluGluToHToZZTo4L_M-120_7TeV-powheg-pythia6.root
  • for FNAL dcap://cmsgridftp.fnal.gov:24125/pnfs/cms/WAX/11/store/user/cmsdas/2012/HZZHighMassExercise/sig/roottree_leptons_GluGluToHToZZTo4L_M-120_7TeV-powheg-pythia6.root
  • for PISA /gpfs/gpfsddn/srm/cms/store/user/cmsdas/2012/HZZHighMassExercise/sig/roottree_leptons_GluGluToHToZZTo4L_M-120_7TeV-powheg-pythia6.root
Compile all the macros:
cmsenv; bash compilebaseline.sh

Thus you created several executables called RunBaselineMacro*
Let's run them. You should put an argument corresponding to the site configuration: CERN or FNAL. Moreover you may redirect the output into a local file (e.g., on /tmp/)

./RunBaselineMacro4mu FNAL > some_local_path_you_like/out.txt &
or
./RunBaselineMacro4mu CERN  > some_local_path_you_like/out.txt &
or
./RunBaselineMacro4mu PISA  > some_local_path_you_like/out.txt &

While the script is running, let's look at the output text file. The macro has also produced a file with histograms, let's look at it.
How many events survived the cut?
How much is the resolution on the Higgs mass peak?

Try to answer to this question more quantitatively. In the folder MASS_AN/ you can find some macros for fits. The main one is Analyze_Mass.C
First copy the file with mass histogram in the folder MASS_AN/ (better changing its name)

cp output.root MASS_AN/output_4mu.root

Then run Analyze_Mass.C giving as input the file name, the file typr (S signal or B background) and the fit type.

cd MASS_AN
root -l
.x Analyze_Mass.C("output_4mu.root","S","gaus")
Try the various fits proposed:

Change the range for fitting in Analyze_Mass.C according to the Higgs mass range.

Which is the best one? What is the mass resolution?
Do the same for 4e and 2e2mu final state.


Run now on one sample of signal (let's say 4mu with mH=150), the ttbar sample and the ZZ and try to superimpose the histograms for mZ1, isolation and impact parameter for the 3 samples normalizing to the same area. Try to understand which cuts are the best to discriminate the signal from the background. Copy the histograms file from the samples area:

At FNAL:

/opt/d-cache/dcap/bin/dccp /pnfs//cms/WAX/11/store/user/cmsdas/2012/HZZHighMassExercise/output_GluGluToHToZZTo4L_M-150_7TeV-powheg-pythia6.root .
/opt/d-cache/dcap/bin/dccp /pnfs//cms/WAX/11/store/user/cmsdas/2012/HZZHighMassExercise/output_TTTo2L2Nu2B_7TeV-powheg-pythia6_H150.root . 
/opt/d-cache/dcap/bin/dccp /pnfs//cms/WAX/11/store/user/cmsdas/2012/HZZHighMassExercise/output_ZZTo4mu_7TeV-powheg-pythia6_H150.root . 

At CERN:

rfcp /castor/cern.ch/user/n/ndefilip/DAS/output_GluGluToHToZZTo4L_M-150_7TeV-powheg-pythia6.root .
rfcp /castor/cern.ch/user/n/ndefilip/DAS/output_TTTo2L2Nu2B_7TeV-powheg-pythia6_H150.root . 
rfcp /castor/cern.ch/user/n/ndefilip/DAS/output_ZZTo4mu_7TeV-powheg-pythia6_H150.root . 

At Pisa:

cp  /gpfs/gpfsddn/srm/cms/store/user/cmsdas/2012/HZZHighMassExercise/output_GluGluToHToZZTo4L_M-150_7TeV-powheg-pythia6.root .
cp  /gpfs/gpfsddn/srm/cms/store/user/cmsdas/2012/HZZHighMassExercise/output_TTTo2L2Nu2B_7TeV-powheg-pythia6_H150.root . 
cp  /gpfs/gpfsddn/srm/cms/store/user/cmsdas/2012/HZZHighMassExercise/output_ZZTo4mu_7TeV-powheg-pythia6_H150.root . 


You may find useful the macro Sgn_Bkg_superimpose/Sgn_Bkg_superimpose.C, designed to normalize and plot the histograms of mZ1, isolation and impact parameter for a set of files given as input:. First copy the root files inside the folder Sgn_Bkg_superimpose/, then:

root -l
.x Sgn_Bkg_superimpose.C("file1","file2","file3")
If you want you can add some other variables, by editing the macro.
Try also to fit the ZZ background with the dedicated fit "ZZ" (ZZ.C) (it is a complicate function, for reference see CMS AN 202/2011).

Step 3: Run the analysis code iteratively, on all the samples, using the batch system:

In this step we will run the previous executable on all the list of samples: data, signal and background Monte Carlo (Skip this step if at PISA).
The ingredients are the following:
  • The list of all the samples with their cross section and weights is reported here:
    data_input_all.txt, bkg_input_all.txt, sig_input_all.txt
  • Script to run the executable and save the stdout/stderr:
    submit_HZZ4LeptonsAnalysis_FNAL.sh (to run at FNAL, open it and modify the path to your working directory)
    submit_HZZ4LeptonsAnalysis_CERN.sh (to run at CERN, open it and modify the path to your working directory)
  • A template for running with CONDOR at FNAL: condor_template.cfg ( open it and modify the email address to for notification)
  • Scripts to submit jobs on all the samples of data, MC signal and bkg:
    loopcheck_signal_2e2mu.sh loopcheck_signal_4e.sh loopcheck_signal_4mu.sh
    loopcheck_bkg_2e2mu.sh loopcheck_bkg_4e.sh loopcheck_bkg_4mu.sh
    loopcheck_data_2e2mu.sh loopcheck_data_4e.sh loopcheck_data_4mu.sh

Each loopcheck script does the following:

  • need an argument: FNAL, CERN or something generic to run at FNAL or CERN
  • loops on the numbers of samples in the input txt files (data_input_all.txt, bkg_input_all.txt, sig_input_all.txt)
  • creates cards for each samples saved in BkgCards4mu, BkgCards4e, BkgCards2e2mu
  • prepares a bash script for each job saved in
    • jobs/submit_HZZ4LeptonsAnalysis_h150_sample name_4mu.sh
    • jobs/submit_HZZ4LeptonsAnalysis_h150_sample name_4e.sh
    • jobs/submit_HZZ4LeptonsAnalysis_h150_sample name_2e2mu.sh
  • creates scripts for CONDOR scheduler at FNAL for each shell submit_*.sh script.
  • submits each job to the local scheduler

Let's run them! As usual, put CERN or FNAL as argument if you are running at CERN or FNAL.

chmod +x loopcheck*

For 2e2mu analysis at FNAL or CERN:
./loopcheck_data_2e2mu.sh FNAL (or CERN)
./loopcheck_bkg_2e2mu.sh FNAL (or CERN)
./loopcheck_signal_2e2mu.sh FNAL (or CERN)

For 4mu analysis at FNAL or CERN:
./loopcheck_data_4mu.sh FNAL (or CERN)
./loopcheck_bkg_4mu.sh FNAL (or CERN)
./loopcheck_signal_4mu.sh FNAL (or CERN)

For 4e analysis at FNAL or CERN:
./loopcheck_data_4e.sh FNAL (or CERN)
./loopcheck_bkg_4e.sh FNAL (or CERN)
./loopcheck_signal_4e.sh FNAL (or CERN)

The output files with histograms are saved in histos4mu, histos4e, histos2e2mu directories. Those files are used as input to produced plots of the analysis.
The files with standard output are saved in

*jobs/RunBaselineMacro<4l>_<type of data>_<sample name>_<4l>.log*  for FNAL configuration
*$CASTOR_HOME/DAS/RunBaselineMacro<4l>_<type of data>_<sample name>_<4l>.log*  for CERN configuration

This output is the same as we looked into the previous step but now is produced for all the samples (data and MC). In the next step we will use macros to look at the physics content of what we just produced.

Exercise 1: Make your own favorite plots

The macro PlotEvent4mu.C makes a plot with the number of event after each selection cut for all samples and the data. Let's run it.

root -l
.L PlotEvent4mu.C+
PlotEvent4mu()
.q
display plots/h_nEvent_4l_new_4mu_log.png

Look at the plot:
Each bin on x-axis corresponds to a selections step, full description of the cuts is provided here and this is a summary:

  1. First Z: a pair of lepton candidates of opposite charge and matching flavour satisfying m1,2 > 50 GeV/c^2, pT,1 > 20 GeV/c and pT,2 > 10 GeV/c, Riso,1 + Riso,2 < 0.35 and |SIP3D1,2| < 4; the pair with reconstructed mass closest to the nominal Z boson mass is retained and denoted Z1.
  2. Three or more leptons: at least another lepton candidate of any flavour or charge.
  3. Four or more leptons and a matching pair: a fourth lepton candidate with the flavour of the third lepton candidate from the previous step, and with opposite charge.
  4. Choice of the ``best 4l'' and Z1, Z2 assignments: retain a second lepton pair, denoted Z2, among all the remaining l+l- combinations with MZ2 > 12 GeV/c^2 and such that the reconstructed four-lepton invariant mass satisfies mZ1Z2 > 100 GeV/c^2. For the 4e and 4mu final states, at least three of the four combinations of opposite sign pairs must satisfy mll > 12 GeV/c^2. If more than one Z2 combination satisfies all the criteria, the one built from leptons of highest pT is chosen.
  5. Relative isolation for selected leptons: for any combination of two leptons i and j, irrespective of flavour or charge, the sum of the combined relative isolation Riso,j + Riso,i < 0.35.
  6. Impact parameter for selected leptons: the significance of the 3D impact parameter to the event vertex, SIP3D, is required to satisfy |SIP3D| = < 4 for each lepton
  7. Z1 and Z2 kinematics:
    Low-Mass selection : 50 < MZ1 < 120 GeV/c^2 and 12 < MZ2 < 120 GeV/c^2 (best for significance at MH < 130 GeV/c^2)
    Baseline selection : 60 < MZ1 < 120 GeV/c^2 and 20 < MZ2 < 120 GeV/c^2 (best for significance at 130 < MH < 180 GeV/c^2)
    High-Mass selection : 60 < MZ1 < 120 GeV/c^2 and 60 < MZ2 < 120 GeV/c^2 (best for significance at MH > 180 GeV/c^2)

What is the main background before the cuts? What is the main background after all the cuts?
What cuts are more effective in cutting the various backgrounds?
What is the S/B (signal over background) before the cuts and after all the cuts?
Are there big differences between 4mu and 4e final state?

Another useful macro is PlotStack4l.C which creates plot of interesting variables at different selection step for all the samples and the data. By default it produce plots summing up the distribution for 4e, 4mu and 2e2mu for the samples listed in the input file filelist_4l.txt. You can run separately on each channel editing the macro and replacing filelist_4l.txt with filelist_4e.txt, filelist_4mu.txt, filelist_2e2mu.txt according to the final state.

root -l
.L PlotStack4l.C+
PlotStack4l()
.q
display plots/h_hafterbestZ1_Z1mass_4l_log.png
This macro produce a plot of the mass of the first Z at the beginning of the selection (just after the cut 1.).
Which background has a real Z in it? Do you understand the shape of the distributions for each backgorund source ?

Let's try to produce the plot for another variable: the mass of the 4 lepton after cut 1.,2.,3.,4. and after the full selection.

  • comment the line
     std::string histolabel = "hafterbestZ1_Z1mass"; 
  • uncomment the line
     std::string histolabel = "hfourlepbestmass_4l_afterpresel";  
  • change the X and Y range of the histogram in the line
    TH2F *hframe= new TH2F("hframe","hframe",500,95,605.,500,0.0004,18.);//mass 
  • change the binning using the variable nRebin (e.g., change in the macro nRebin =10)
  • change the scale from logarithmic to linear: change in the macro useLogY = false
  • You need also to change the label on x and y axis in the lines:
         hframe->SetYTitle("Events/10 GeV/c^{2}");
         hframe->SetXTitle("M_{4l} [GeV/c^{2}]");
         
  • run again the macro

Try to produce the same plot after the full selection:

std::string histolabel = "hfourlepbestmass_4l_afterSel_new"; 

Let's look at isolation (e.g., hafterbestZ1_X) and impact parameter (e.g., hafterbestZ1_IP) distribution. Which are the backgrounds most rejected with these cuts?

Please produce the plots for the 3 channels separately and alltogheter.

For more experts: please produce a plot of the effieicny of the selection vs the higgs mass at the main step of the selection.

Exercise 2: Monitor the main variables

The search for the Higgs boson in the decay channel H → ZZ → 4l requires, as a first step, the reconstruction and selection of a Z boson decaying into a pair of electrons or muons. It is therefore important to monitor continuously the behavior of the selection of Z bosons decaying leptonically.

In order to achieve this, we divide data recorded by CMS in 2011 in successive luminosity slots of 20 /pb each. For each lumi-slot we will monitor 4 variables (after the first step of selection, the Z1 reconstruction):

  • a variable called Z-yield", defined as the number of lepton pairs Nll , with l = e, μ, with a reconstructed invariant mass mll lying in the range [80, 100] GeV/c2 , divided by L, the luminosity of the lumi-slot.With this definition Zyield is equivalent to the cross section σ ( pp → Z ) times the branching ratio BR( Z → ll ) times the overall selection efficiency.
  • The (mean value of) transverse momentum: pt of Z1
  • The (mean value of) isolation of leptons: Riso = Riso_l1 + Riso_l2
  • The (mean value of) significance of impact parameters SIP3D = max( SIP3D_l1,SIP3D_l2 )

The code you need for this exercise is in the folder: HiggsAnalysis/HiggsToZZ4Leptons/test/macros/ES_PISA

The .root files with all the 2011 statistics and the associated .csv files, containing the lumi info, are in CASTOR at CERN:

 /castor/cern.ch/user/m/mene/HIGGS/Mu_ALL_7nov.root 
 /castor/cern.ch/user/m/mene/HIGGS/Mu_ALL_7nov.csv 
 /castor/cern.ch/user/m/mene/HIGGS/Ele_ALL_7nov.root 
 /castor/cern.ch/user/m/mene/HIGGS/Ele_ALL_7nov.csv 

or in Pisa in

/gpfs/gpfsddn/srm/cms/store/user/cmsdas/2012/HZZHighMassExercise/HIGGS
directory.

First run the code to divide the 2011 data (4.64 /fb) in 20 /pb lumi-slots, to apply the Z1 selection and construct the histograms for the variables you want to monitor. You will get 4 histos for each lumi-slot, one per variable: Z-yield, pt, Iso, SIP.
You have to do this twice, for muons Z → 2 μ and electrons Z → 2 e.
The code for this is in the files ZmumuMonitor.C and ZeleeleMonitor.C, take a look before to run it..
Run the code (this might take hours..):

  • Real-time: open the files and modify the path to your working directory
     
         CERN:  rfcp /castor/cern.ch/user/m/mene/HIGGS/Ele_ALL_7nov.csv .       or     PISA: cp /gpfs/gpfsddn/srm/cms/store/user/cmsdas/2012/HZZHighMassExercise/HIGGS/Ele_ALL_7:nov.csv .   
         CERN:  rfcp /castor/cern.ch/user/m/mene/HIGGS/Mu_ALL_7nov.csv .       or     PISA: cp /gpfs/gpfsddn/srm/cms/store/user/cmsdas/2012/HZZHighMassExercise/HIGGS/Mu_ALL_7:nov.csv .
         root -q -b submit_ZeleeleMonitor.C
         root -q -b submit_ZmumuMonitor.C 
         
  • Via batch: open the files and modify the path to your working directory
     
         bsub -q 1nd -J Mu_CERN < submit_ZmumuMonitor.sh    (at CERN)
         bsub -q 1nd -J Ele_CERN < submit_ZeleeleMonitor.sh     (at CERN)
         
     
         bsub -q local < submit_ZmumuMonitor.sh    (at PISA)
         bsub -q lcal < submit_ZeleeleMonitor.sh       (at PISa)
         
Once you have run the code, you will have the histograms stored in the files: ZmumuMass.histo and ZeleeleMass.histo
(Copies of) these files can be found in CASTOR ( /castor/cern.ch/user/m/mene/HIGGS/ ).
Plot one or more of these histograms for the various variables you want to monitor. What about these distributions?

Run on the histos to obtain the variables you want to monitor as functions of the lumi-slot number (what do you expect to see?):

     root -l
     .x  PlotZmumuX.C
     
     root -l
     .x  PlotZeleeleX.C
     

You should get 4 plots for each Z decay channel ( μμ and ee ), one for every variable.
Comment on these plots! Are they as you thought they should have looked? Can you see any anomalies? If yes, do you have explanations?

Try to think about other variables that (in your opinion) are worth to be monitored.. Edit the files (.C) to monitor them.
In this exercise we stopped after the first step of selection chain (Z1 selection), why? Want you try to go further in the selection chain to monitor other variables?

Exercise 3: Optimize the cuts

The script cut_optimization.C is a template one can use to re-optimize some final cuts of the HZZ4l analysis. It runs on 4e final states
It runs on the set of ASCII input files prepared in the previous section
(You can even copy them locally on your laptop and run from there)

Let's look at the macro and run it.

root -l
.L cut_optimization.C+
analysis()

Output is the table with data/MC-background/MC-signal yields after sets of cuts implemented and measures of signal vs. background discrimination. There are various ways of estimating the "powerfulness" of the cuts in enhancing the signal and rejecting the background (aka "significance")
In the macros two estimators are used

  • simple ratio on number of signal events over number of background event: s/b
  • a more correct estimator is the significance defined by the log likelihood ratio, ScL: sqrt(2.0*(s+b)*log(1+s/b)-2.0*s).

Now, edit the macro and try to find more optimized set of cuts in terms of signal vs. background. Try to changes the macro to run also on 4mu and 2e2mu

Useful links

SwGuide for the HZZ4leptons package here

Contents

Introduction

Goal: Build and run the H->ZZ->4l analyses by using PAT objects algorithms

How to achieve that goal:

  • setup PATLayers to run on current HZZ skim samples
  • produce PATtuples with relevant collection in the event
  • make the analysis code working with edm::View to use the same code on RECO and PAT objects
  • prepare configuration files to use PAT objects and modules for the analysis
  • run the analysis with PAT configuration files
  • extract the results (plots and numbers)

How to get the code

Create a CMSSW working area:

 
   scramv1 project CMSSW CMSSW_2_2_13
   cd CMSSW_2_2_13/src

Download the relevant tags for HiggsAnalysis code:

 
   cvs co -r hzz4l_2213_V01_01_04 HiggsAnalysis/HiggsToZZ4Leptons
   cvs co -r V00-02-18 HiggsAnalysis/Skimming
   cvs co -r V00-01-11-1 RecoEgamma/ElectronIdentification
   cvs co -r CMSSW_2_2_13 PhysicsTools/Utilities/interface/AndSelector.h
   cvs co -r CMSSW_2_2_13 PhysicsTools/Utilities/interface/OrSelector.h
   cvs co -r CMSSW_2_2_13 PhysicsTools/UtilAlgos/interface/EventSetupInitTrait.h

How to setup the code to create PAT-tuples:

Modify some files:

 
   PhysicsTools/Utilities/interface/AndSelector.h
   PhysicsTools/Utilities/interface/OrSelector.h
   PhysicsTools/UtilAlgos/interface/EventSetupInitTrait.h

replacing the string "helpers" with "newhelpers" everywhere.

Follow the recipe to setup PATv2 in CMSSW_2_2_13 at link:

 
   addpkg CondFormats/JetMETObjects  V01-08-04
   addpkg PhysicsTools/RecoAlgos V08-06-16-06-02
   addpkg PhysicsTools/PFCandProducer V03-01-16
   addpkg RecoMET/Configuration V00-04-02-17
   addpkg RecoMET/METAlgorithms V02-05-00-21
   addpkg RecoMET/METProducers V02-08-02-17
   addpkg DataFormats/METReco V00-06-02-09
   addpkg DataFormats/MuonReco V07-02-12-03
   addpkg JetMETCorrections/Type1MET VB04-00-02-04
   addpkg RecoJets/JetAssociationAlgorithms V01-04-03
   addpkg JetMETCorrections/Algorithms V01-08-02-01
   addpkg JetMETCorrections/Configuration V01-08-15
   addpkg JetMETCorrections/JetPlusTrack V03-02-06
   addpkg JetMETCorrections/Modules V02-09-02

Modify PhysicsTools/PatAlgos/python/patEventContent_cff.py to add some collections needed for the analysis

 
   patEventContent = [
       'keep *_cleanLayer1Photons_*_*', 
       'keep *_cleanLayer1Electrons_*_*', 
       'keep *_cleanLayer1Muons_*_*', 
       'keep *_cleanLayer1Taus_*_*', 
       'keep *_cleanLayer1Jets_*_*',
       'keep *_layer1METs_*_*',
       'keep *_cleanLayer1Hemispheres_*_*',
       'keep *_cleanLayer1PFParticles_*_*',
       'keep *_offlinePrimaryVertices_*_*',
       'keep recoGsfTrackExtras_pixelMatchGsfFit_*_*',
       'keep recoTrackExtras_pixelMatchGsfFit_*_*',
       'keep recoTracks_generalTracks_*_*',
       'keep recoTrackExtras_generalTracks_*_*',
       'keep *_offlineBeamSpot_*_*'
   ]

Modify PhysicsTools/PatAlgos/python/recoLayer0/electronId_cff.py to run the same loose electronId we are currently using for the HZZ analyses

 
   import FWCore.ParameterSet.Config as cms

   from RecoEgamma.ElectronIdentification.electronIdCutBasedClassesExt_cfi import *
   import RecoEgamma.ElectronIdentification.electronIdCutBasedClassesExt_cfi
 
   eidRobustHighEnergy = 
RecoEgamma.ElectronIdentification.electronIdCutBasedClassesExt_cfi.eidCutBasedClassesExt.clone()

   patElectronId = cms.Sequence(
     eidRobustHighEnergy
   )

Modify the file PhysicsTools/PatAlgos/python/patSequences_cff.py by removing the trigger matching:

 
   beforeLayer1Objects = cms.Sequence(
      patAODReco +  # use '+', as there is no dependency
      patMCTruth   # among these sequences
   )

How to run the code to create PAT-tuples:

* Compile the code:*

 
   cd CMSSW_2_2_13/src
   scramv1 b

* Run the code to produce PAT-tuples:*

   
   cd HiggsAnalysis/HiggsToZZ4Leptons/test  
   cmsRun patLayer1_fromAOD_full.cfg.py

* The output PATtuple will include the following collections:*

PAToutput.bmp

How to produce PAT-tuples:

 
      /afs/cern.ch/user/n/ndefilip/public/crab_pat_H350_ZZ_4l_10TeV_GEN_HLT_4mu.cfg
      

Selection for HZZ analysis

Preselection consists of:

   -- PAT Layers for cleaning and electronID with the same tuning
   -- PAT Layers for building <PAT::object>  objects
   -- at least 2 PAT electrons with pT> 5 GeV/c irrespective of the charge
   -- at least 2 PAT muons with pT > 5 GeV/c  irrespective of the charge
   -- candidate combiner to build Zs and H 
   -- at least 1 Z->ee candidate with mll  >  12 GeV/c2
   -- at least 1 Z->mumu candidate with mll  >  12 GeV/c2
   -- at least one H candidate    with mllll > 100 GeV/c2
   -- two loose isolated electrons and muons

Full Selection consists of:

   -- tight isolation on leptons
   -- impact parameter constraint
   -- 2dIso vs pT cuts, pT cuts, mZ, mZ*, mH cuts --> not included in the python sequences

A simplified schema of the analysis could be found in this schema.

How to setup the HZZ analysis code

  • Use of edm::View to access physics objects (RECO or PAT), such as:

     edm::Handle<edm::View<Muon> > muons;
     edm::Handle<edm::View<CMS.GsfElectron> > electrons;

  • Prepare configuration files to use PAT for preselection and complete analysis, such as:

    HiggsAnalysis/HiggsToZZ4Leptons/python/hTozzTo4leptonsPreselectionPAT_2e2mu_cff.py
    HiggsAnalysis/HiggsToZZ4Leptons/python/hTozzTo4leptonsCompleteAnalysisPAT_2e2mu_cff.py

  • Example of a preselection python cfg file for 2e2mu analysis:

# Electron selection
from CMS.PhysicsTools.PatAlgos.selectionLayer1.electronSelector_cfi import *
import CMS.PhysicsTools.PatAlgos.selectionLayer1.electronSelector_cfi 
hTozzTo4leptonsElectronSelector=
CMS.PhysicsTools.PatAlgos.selectionLayer1.electronSelector_cfi.selectedLayer1Electrons.clone()
hTozzTo4leptonsElectronSelector.src = cms.InputTag("cleanLayer1Electrons")
hTozzTo4leptonsElectronSelector.cut = cms.string('pt > 5. & abs(eta) < 2.5')

# Muon selection
from CMS.PhysicsTools.PatAlgos.selectionLayer1.muonSelector_cfi import *
import CMS.PhysicsTools.PatAlgos.selectionLayer1.muonSelector_cfi 
hTozzTo4leptonsMuonSelector=
CMS.PhysicsTools.PatAlgos.selectionLayer1.muonSelector_cfi.selectedLayer1Muons.clone()
hTozzTo4leptonsMuonSelector.src = cms.InputTag("cleanLayer1Muons")
hTozzTo4leptonsMuonSelector.cut = 
cms.string('(pt > 5. & abs(eta) < 1.1) | (pt > 3. & p > 9. & abs(eta) >= 1.1)')

# zToEE
from HiggsAnalysis.HiggsToZZ4Leptons.zToEE_cfi import *
# zToMuMu
from HiggsAnalysis.HiggsToZZ4Leptons.zToMuMu_cfi import *                       

# hTozzToEEMuMu
from HiggsAnalysis.HiggsToZZ4Leptons.hTozzTo4leptons_cfi import *

# Electron loose isolation
from CMS.PhysicsTools.PatAlgos.selectionLayer1.electronSelector_cfi import *
import CMS.PhysicsTools.PatAlgos.selectionLayer1.electronSelector_cfi 
hTozzTo4leptonsElectronIsolationProducer=
CMS.PhysicsTools.PatAlgos.selectionLayer1.electronSelector_cfi.selectedLayer1Electrons.clone()
hTozzTo4leptonsElectronIsolationProducer.src = cms.InputTag("hTozzTo4leptonsElectronSelector")
hTozzTo4leptonsElectronIsolationProducer.cut = cms.string('trackIso/pt < 0.7')

# Muon loose isolation
from CMS.PhysicsTools.PatAlgos.selectionLayer1.muonSelector_cfi import *
import CMS.PhysicsTools.PatAlgos.selectionLayer1.muonSelector_cfi 
hTozzTo4leptonsMuonIsolationProducer=
CMS.PhysicsTools.PatAlgos.selectionLayer1.muonSelector_cfi.selectedLayer1Muons.clone()
hTozzTo4leptonsMuonIsolationProducer.src = cms.InputTag("hTozzTo4leptonsMuonSelector")
hTozzTo4leptonsMuonIsolationProducer.cut = 
cms.string('(2.0*trackIso+1.5*ecalIso+1.*hcalIso) < 60')

# Common preselection 
from HiggsAnalysis.HiggsToZZ4Leptons.hTozzTo4leptonsCommonPreselectionSequences_cff import *

How to run the HZZ analysis code

  • Configuration files to run the preselection in HiggsAnalysis/HiggsToZZ4Leptons/test:
   - HiggsToZZPreselection_2e2mu.py for 2e2mu preselection
   - HiggsToZZPreselection_4e.py for 4e preselection
   - HiggsToZZPreselection_4mu.py for 4mu preselection
   - HiggsToZZPreselection_4l.py for 2e2mu,4e and 4mu preselection

  • Configuration files to run the full analysis in HiggsAnalysis/HiggsToZZ4Leptons/test:
   - HiggsToZZCompleteAnalysis_2e2mu.py for 2e2mu full analysis
   - HiggsToZZCompleteAnalysis_4e.py for 4e full analysis
   - HiggsToZZCompleteAnalysis_4mu.py for 4mu full analysis
   - HiggsToZZCompleteAnalysis_4l.py for 2e2mu,4e and 4mu full analysis

  • Edit a configuration file and set a flag for PAT usage to 'true'
   usePAT='true'

  • Be sure that the input list of files is built with PAT-tuples such as
   /castor/cern.ch/user/n/ndefilip/PAT/PATLayer1_Output.fromAOD_full_1_h150_2e2mu.root
   /castor/cern.ch/user/n/ndefilip/PAT/PATLayer1_Output.fromAOD_full_1_h150_4e.root
   /castor/cern.ch/user/n/ndefilip/PAT/PATLayer1_Output.fromAOD_full_1_h150_4mu.root

  • Run the analysis:
   - cmsRun HiggsToZZCompleteAnalysis_2e2mu.py  #  for 2e2mu full analysis
   - cmsRun HiggsToZZCompleteAnalysis_4e.py       #  for 4e full analysis
   - cmsRun HiggsToZZCompleteAnalysis_4mu.py     #  for 4mu full analysis

  • Output files for 2e2mu analysis:
   preselect2e2mu.out             --> preselection efficiency for 2e2mu
   offselect2e2mu.out              --> offline selection efficiency for 2e2mu
   hTozzToEEMuMuCSA07.root  --> EDM file with filtered events
   roottree_2e2mu.root            --> ROOT tree with relevant variables

How to extract results: numbers and plots

Preselection:

cat preselect2e2mu.out
*********************** 
Preselection efficiency 
*********************** 

nSkim           : 155
nElec           : 107
nMuon           : 95
Z->EE           : 93
Z->MuMu         : 87
H->ZZ           : 87
loose IsolEle   : 87
loose IsolMu    : 82

If you run on all the signal samples a use the ROOT macro:

/afs/cern.ch/user/n/ndefilip/public/HZZ2e2muEfficiency.C
root -q HZZ2e2muEfficiency.C

you could compile a plot like this:

Effpresel.bmpf.

Plots after the complete analysis could be done with the macro:

/afs/cern.ch/user/n/ndefilip/public/simpleplots.C
root -q simpleplots.C

and you could obtain distributions like: masees.gif

Review status

Reviewer/Editor and Date (copy from screen) Comments
RogerWolf - 13 May 2009 Created the template page
SamirGuragain - 31 May 2012 Copied the recent instructions from https://twiki.cern.ch/twiki/bin/view/CMS/SWGuideCMSDataAnalysisSchoolHighMassHiggsSearchExercise and pasted here. The existing instructions in deprecated CMSSW release are in show/hide under the Useful links.

Responsible: KatiLassilaPerini
Last reviewed by: Samir Guragain 05-31-2012



PAT Examples: QCD Analysis

Contents

Introduction

  • QCD analysis of inclusive jet distribution, as preferred by the High pT QCD group, is presented as an example.
  • The object that play an important role in this analysis are
    • PAT::Jet

How to get the code

  • Login to lxplus
  • cd to your CMSSW base folder ($CMSSW_BASE/src) * This tutorial was tested and commissioned for CMSSW_3_8_2 [Special instruction: "setenv SCRAM_ARCH slc5_ia32_gcc434" is required to get this version of cmssw. ]
  • At the command prompt
    • >cmsenv
    • >cvs co UserCode/QCDPATExample/
    • >scramv1 b
  • This example predominantly uses the PAT default sequence and requires no additional code to be checked out.

How to run the code on Data:

  • To run the code
    • To make PAT tuples from RECO from real data:
      • >cd UserCode/QCDPATExample/test
      • >cmsRun QCD_PAT_Example_Data_cfg.py
      • This example produces a set of PAT objects to be analyzed offline, from a preselected, centrally located file containing actual CMS data from 2010. The configuration file can be used with any analysis in CMSSW_3_8_2 by moving the file to the test directory of your software package (or by simply running it here as shown in this example).
    • To produce plots:
      • >cmsRun qcdpatexample_cfg.py
    • This configuration file runs off of the output of the previous step. It produces a short series of histograms in a file labelled "PATTutorial.root" that you may wish to use as a starting point for producing more rigorous analysis plots.

How to run the code on Monte Carlo

  • To compile the code
    • $CMSSW_BASE/src/UserCode/QCDAnalysisExample
    • scram b
  • To run the code
    • To make PAT tuples from RECO from Monte Carlo:
      • >cd UserCode/QCDPATExample/test
      • >cmsRun QCD_PAT_Example_MC_cfg.py
      • This example produces a set of PAT objects to be analyzed offline, from a small sample of Monte Carlo QCD Jets from the Release Validation of CMSSW_3_8_2.
    • To produce plots:
      • >cmsRun qcdpatexample_mc_cfg.py
    • This configuration file runs off of the output of the previous step. It produces a short series of histograms in a file labelled "PATTutorialMC.root"

Find out more about the details

  • PAT object production using the jetSkimPAT.py is mainly using the PAT defaults. Let's take a look at the configuration file itself, specifically the version that runs on Data:


   from CMS.PhysicsTools.PatAlgos.patTemplate_cfg import *

   from CMS.PhysicsTools.PatAlgos.tools.coreTools import *
   removeMCMatching(process, ['All'])

  • We begin by importing a basic template for running the PAT default sequence, adding an additional statement that removes all matching to MC-generated objects. Obviously the last of these lines should be removed if you are running on MC.

 

   from CMS.PhysicsTools.PatAlgos.tools.metTools import *
   addPfMET(process, 'PF')

  • In this example, we will be performing our analysis on jets reconstructed using the Anti-kT algorithm and the Particle Flow (PF) scheme. Particle Flow uses its own brand of MET, separate from that in the default PAT sequence. These lines add that brand into our PAT Ntuple.

   from CMS.PhysicsTools.PatAlgos.tools.jetTools import *
   addJetCollection(process, cms.InputTag('ak5PFJets'), 'AK5', 'PF',
     doJTA            = False,
     doBTagging       = False,
     jetCorrLabel     = ('AK5', 'PF'),
     doType1MET       = False,
     genJetCollection = cms.InputTag("ak5GenJets"),
     doJetID          = False,
   )

  • Here we choose the jet algorithm we are going to use with the addJetCollection command from the jetTools PAT toolkit. The arguments are as follows:
    • Jet Algorithm: In this case, Anti-kT of size .5.
    • Jet Type: Here, Particle Flow. For both jet type and algorithm, see the above jet Tools documentation for lists of the most commonly used and accepted entries for these arguments.
    • doJTA, doBTagging: Run the jet-track associator, and perform B-Tagging respectively. Set to true by default.
    • jetCorrLabel: Choose the type of jet response corrections you wish to use (generally, this should correspond to the algorithm and type you specified in the first two arguments)
    • doType1MET: Triggers whether to use this particular MET correction for the jet collection. Non-Calojets do not use this correction, so for this example it is set to False.
    • genJetCollection: Select the collection of MC jets you wish to perform matching against. Unless you are looking at MC, this will not be relevant.
    • doJetID: Add JetID variables to the jet collection. Set to false here as Particle Flow Jets have the variables used as Jet ID build into the Jet object itself.
    • jetIDlabel: Select the type of jet ID variables you would like to use. Should, again, correspond to the algorithm selected above. Not used here as Jet ID is turned off.


   process.filterJets = cms.EDFilter("CandViewSelector",
      cut = cms.string  ('pt > 10.0'),
      src = cms.InputTag("ak5CMS.CaloJets"),
   )

   process.countJets = cms.EDFilter("PATCandViewCountFilter",
     minNumber = cms.uint32  (4),
     maxNumber = cms.uint32  (999999),
     src       = cms.InputTag("filterJets"),
   )

Here we do some basic jet screening as a form of event selection, filtering only on events containing a certain number of jets satisfying basic kinematic requirements.

  
   process.scrapingVeto = cms.EDFilter("FilterOutScraping",
     applyfilter = cms.untracked.bool  (True),
     debugOn     = cms.untracked.bool  (False),
     numtrack    = cms.untracked.uint32(10),
     thresh      = cms.untracked.double(0.25)
   )
 
   process.primaryVertexFilter = cms.EDFilter("GoodVertexFilter",
     vertexCollection = cms.InputTag('offlinePrimaryVertices'), 
     minimumNDOF      = cms.uint32  (4),
     maxAbsZ          = cms.double  (24.0),
     maxd0            = cms.double  (2.0),
   )

*These are further event selection cuts on beam scraping and vertex reconstruction, based on the general recommendations for QCD analyses at large by the tracking group, etc. The remainder of the file is the basic establishemnt of input/output files and selection of data objects to maintain in your PATtuple, which should be covered elsewhere in the tutorial.

Looking at the Executable

  • Now we'll take a look at the compiled code that runs over the PATtuple produced in the previous step, to produce some basic histograms. This code can be found by heading to: *>cd $CMSSW_BASE/src/UserCode/QCDPATExample/src/QCDPATExample.cc


   using namespace pat;
   edm::Handle<edm::View<pat::Jet> > jets;
   iEvent.getByLabel(jetCollection_,jets);

  • Here we acquire our PAT Jet collection. The label of this particular collection is "selectedPatJetsAK5PF", corresponding to the addJetCollection command from the previous steps of the tutorial, in which we instructed PAT to save jets from the AK5 algorithm and PF scheme. You can change the input to this step of the tutorial by editing the corresponding line in the qcdpatexample_cfg.py file

   int iJetMult=0;

   float fEtaMin=-2.4;
   float fEtaMax=2.4;
   
   float fPtMin=10.0;

   int NConst_min = 2;
   int NCharged_min = 1;
   float Charged_hadron_frac_min = 0.;
   float Charged_em_frac_max = 1.;
   float Neutral_hadron_frac_max = 0.9;
   float Neutral_em_frac_max = 0.9;

   

  • Next we set some jet ID variables to cut on. In this case, we are restricting ourselves to the inner edge of the forward region (abs(eta) < 2.4), applying the same pT cut we applied in the PAT filter, and additionally setting the approved noise thresholds for Particle Flow jets.


   for(edm::View<pat::Jet>::const_iterator i_jet=jets->begin(); i_jet!=jets->end(); ++i_jet){

      if(i_jet->pt()<fPtMin||i_jet->eta()<fEtaMin||i_jet->eta()>fEtaMax){
          continue;
        }
    
        if((abs(i_jet->getPFConstituents().size()) > NConst_min) &&
      (i_jet->chargedMultiplicity() > NCharged_min) &&
      (i_jet->chargedHadronEnergyFraction() > Charged_hadron_frac_min) &&
      (i_jet->chargedEmEnergyFraction() < Charged_em_frac_max) &&
      (i_jet->neutralHadronEnergyFraction() < Neutral_hadron_frac_max) &&
      (i_jet->neutralEmEnergyFraction() < Neutral_em_frac_max)){
          continue;
        }

  • Here we actually apply the cuts by accessing the corresponding cuts by looping over and accessing the individual PAT jet information. The remainder is simply filling histograms. If you follow these steps exactly with no modifications to input files, you should have plots similar to those shown below.

How to get more information

Review status

Reviewer/Editor and Date (copy from screen) Comments
VasundharaChetluru - 01 Jun 2009 Created the page from the template

Responsible: VasundharaChetluru
Last reviewed by: most recent reviewer and date

  • Jet Eta:
    JetEta.gif

  • Jet p_T:
    JetPt.gif


PAT Examples: Top Quark Analysis

Contents

Introduction

After having discussed many details and aspects of PAT during Exercise01 to Exercise10 of the regular PAT Tutorial we would like to flash a more comprehensive example for the use of PAT including the interplay of many of these aspects in a real life experiment. We have chosen the TopPAG Lepton+Jets reference selection that was used for the synchronisation of different analyses in the preparation for the latest results for the ICHEP2010. You can find these results summarised here.

We will concentrate on the selection of events containing top quark pairs in the semi-leptonic decay channel with a muon in the final state. These events are characterised by:

  • an isolated high pt muon with large transverse momentum.
  • four jets with large transverse momentum.
  • two jets originating from b-quarks.
  • missing transverse momentum (MET).

You can find a sketch of such an event below:

TTbar.png

The reference selection mostly makes use of the isolated high pt muon. It consists of a loose and a tight selection path, which are divided in up to seven steps. We give a slightly reduced table as it was used for the synchronisation exercises below:

Step Selection Comments/details/code snippets
1 HLT_Mu9 pass single muon trigger (trigger menu 8E29)
  NO other bits needed (bit 0, tech triggers, beam halo ...)
2 Primary vertex existence of a good primary vertex, satisfying
not(isFake) && ndof>4 && abs(z)<15 && ρ<2.0
3a Muon selection exactly one muon with the following requirements:
o GlobalMuon && TrackerMuon
o pT>20 GeV
o fabs(eta)<2.1
o relIso=(mu.trackIso()+mu.ecalIso()+mu.hcalIso())/muon.pT<0.05 (isolation cones dR=0.3)
o isGlobalMuonPromptTight, i.e.:
- muon.globalTrack()->normalizedChi2()<10.0
- muon.globalTrack()->hitPattern().numberOfValidMuonHits()>0
o Number of Hits in the silicon tracker NHits>10
muon.innerTrack()->numberOfValidHits()>10;
o ΔR(μ,jet)>0.3 where jet is any jet passing the jet requirements listed in step 6
o absolute 2D impact parameter d0(Bsp)<0.02cm using muon.dB() of the pat::Muon (i.e. w.r.t. the average beam spot using the innerTrack (tracker track) of the muon)
NOTE! You must set process.patMuons.usePV = False to set the dB to the beamspot.
3b Alternatively to 3a same as 3a, but relIso<0.1
4 Loose muon veto Reject event containing an additional (looser) muon, defined as:
o GlobalMuon
o pT>10 GeV, fabs(eta)<2.5
o relIso=(mu.trackIso()+mu.ecalIso()+mu.hcalIso())/muon.pT<0.2 (isolation cones dR=0.3)
5 Electron veto Reject event containing an electron, defined as:
o ET>15 GeV, fabs(eta)<2.5
o relIso=(ele.dr03TkSumPt()+ele.dr03EcalRecHitSumEt()+ele.dr03HcalTowerSumEt())/ele.et<0.2 (isolation cones dR=0.3)
6a,b,c >= 1,2,3 jets using antikt5CaloJets (L2L3 corrected) fulfilling:
o pT>30 GeV
o fabs(eta)<2.4
o jet ID, emf (electromagnetic fraction)>0.01 (patJets->emf())
o jet ID, n90Hits>1 (minimal number of RecHits containing 90% of the jet energy) (patJets->jetID().n90Hits)
o jet ID, fHPD<0.98 (fraction of energy in the hottest HPD readout) (patJets->jetID().fHPD)
7 >= 4 jets same as step 6, but requesting at least 4 jets

Data and selection strategy

For this example we use ~2 pb-1 of the PromptReco Muon skim ( /Mu/Run2010A-PromptReco-v4/RECO ) as indicated on DBS. The considered simulated samples and event numbers are given below:

Sample Cross Section Considered Events Corresp. Lumi link to DBS
ttbar (madgraph) 165 pb 1,000 6.06 pb-1 here
zjets (madgraph) 3,110 pb 10,000 3.21 pb-1 here
wjets (madgraph) 28,000 pb 100,000 3.57 pb-1 here
QCD (pythia) (1) 86,101 pb 3000,000 3.48 pb-1 here

(1) Corresponding to a muon enriched QCD sample.

We reduced the number of simulated events that we take into account to keep the amount of processing time manageable. The selection strategy starts from a customised PAT tuple, based on selectedPatCandidates. In a real life measurement you would most probably produce these PAT tuples using crab as detailed in Exercise 3.

Based on these PAT tuples we apply each of the seven selection steps following two selection paths corresponding to the tight (3a) and loose (3b) selection scenario with tighter or more relaxed requirements on the muon isolation. For the monitoring we extended the PatBasicAnalyzer from Exercise 4 to get a rough picture of the involved objects. To monitor them we interleave each selection step with a monitoring unit. Our choice fell on a full CMSSW implementation as we can easily make use of the SWGuidePhysicsCutParser for an intuitive and flexible object selection and re-apply the monitoring module before and/or after each selection step. The idea is to compare the event yield from data with the expectation from the simulation.

ALERT! Note: For fairness we should mention that for an analysis of this scale it is more suited to run on a batch system and to divide the work into smaller subjobs. To do the exercise indicated below on a shorter time scale you might want to divide all event yields by a factor of 10 or more.

Setting up of the environment

First of all connect to lxplus and go to some working directory. You can choose any directory, provided that you have enough space. You need ~5 MB of free disc space for this exercise. We recommend you to use your ~/scratch0 space. In case you don't have this (or do not even know what it is) check your quota typing fs lq and follow this link. If you don't have enough space, you may instead use the temporary space (/tmp/your_user_name), but be aware that this is lost once you log out of lxplus (or within something like a day). We will expect in the following that you have such a ~/scratch0 directory.

ssh lxplus
[ ... enter password ... ]
cd scratch0/

Create a directory for this exercise (to avoid interference with code from the other exercises).

mkdir topPAG
cd topPAG

Create a local release area and enter it.

cmsrel CMSSW_5_3_14
cd CMSSW_5_3_14/src 

The first command creates all directories needed in a local release area. Setting up the environment is done invoking the following script:

cmsenv

PAT tuple production

We start the exercise with the configuration file for the customised PAT tuple production. Checkout and compile the PatExamples package.

addpkg PhysicsTools/PatExamples V00-05-23
scram b PhysicsTools/PatExamples
(Instead of "addpkg PhysicsTools/PatExamples V00-05-23", "git cms-addpkg PhysicsTools/PatExamples" worked (Aug 16, 2014).)

ALERT! Note: You might want to use more than one core to compile the package making use of the scram compiler flag -j followed by the number of cores you would like to make use of.

To safe some time we have already produced the customised PAT tuples for you, starting from the patTuple_standard_cfg.py file as described in Exercise 2. You can find these PAT tuples for the data and the various samples under:

rfdir /castor/cern.ch/user/c/cmssup/
-rw-r--r--   1 cmssup   zh                  172500977 Sep 16 19:12 patTuple_wjets_0.root
-rw-r--r--   1 cmssup   zh                  172832053 Sep 16 19:12 patTuple_wjets_1.root
-rw-r--r--   1 cmssup   zh                  173885178 Sep 16 19:13 patTuple_wjets_2.root
-rw-r--r--   1 cmssup   zh                  173885178 Sep 16 19:13 patTuple_wjets_3.root
-rw-r--r--   1 cmssup   zh                  173885178 Sep 16 19:13 patTuple_wjets_4.root
-rw-r--r--   1 cmssup   zh                  173885178 Sep 16 19:13 patTuple_wjets_42X.root
-rw-r--r--   1 cmssup   zh                  173885178 Sep 16 19:13 patTuple_wjets_5.root
-rw-r--r--   1 cmssup   zh                  173885178 Sep 16 19:13 patTuple_wjets_6.root
-rw-r--r--   1 cmssup   zh                  173885178 Sep 16 19:13 patTuple_wjets_7.root
-rw-r--r--   1 cmssup   zh                  173885178 Sep 16 19:13 patTuple_wjets_8.root
-rw-r--r--   1 cmssup   zh                  173885178 Sep 16 19:13 patTuple_wjets_9.root
...

ALERT! Note: You will find these files only on castor at cern. When you are trying to do this exercise from some workgroup server pool outside cern you should check that these pat tuples are available and accessible there.

Have a look into the file patTuple_topPreproduction_cfg.py to learn more about the customisations that we applied:

## import skeleton process
from PhysicsTools.PatAlgos.patTemplate_cfg import *

## ---
## adjust inputs if necessary
## ---
##from PhysicsTools.PatAlgos.tools.cmsswVersionTools import run36xOn35xInput
##run36xOn35xInput(process)

## --- 
## adjust workflow to need in TopPAG
## ---
from PhysicsTools.PatAlgos.tools.coreTools import *
removeCleaning(process)
removeMCMatching(process, ['All'])
removeSpecificPATObjects(process, ['Photons','Taus'])

## ---
## adjust content
## ---
process.patMuons.usePV      = False
process.patMuons.embedTrack = True

## define event content
from PhysicsTools.PatAlgos.patEventContent_cff import patEventContentNoCleaning
process.out.outputCommands = cms.untracked.vstring('drop *', *patEventContentNoCleaning ) 
process.out.outputCommands+= [ 'keep edmTriggerResults_*_*_*',
                               'keep *_offlinePrimaryVertices_*_*'
                               ] 

process.p = cms.Path(
    process.patDefaultSequence
)

ALERT! Note: Due to technical reasons we had to adjust the workflow for the simulated events that had been produced within the CMSSW_3_5_X release series. You don't have to do this when testing the configuration file with the standard ttbar input sample that we used for the PAT Tutorial. Therefore we commented these lines here. ( In general it turns out that often when starting an analysis you have to check the exact event content of the samples that you are planning to use sample by sample and adjust the configuration accordingly. ) We dropped the photons and tau leptons from the content as we will not make further use of them and switched the MC matching and object cross cleaning off. We made sure to make use of the most actual JEC constants, added L5Flavor corrections as derived from simulated ttbar events, defined the impact parameter for muons with respect to the beamspot (in contrary to the primary vertex) and embedded the reco::Track information into the pat:.Muon as we later on want to apply selection requirements on it. We saved all edmTriggerResults (for selection step 1), the offlinePrimaryVertices collection (for selection step 2) and all remaining selectedPatCandidates into the PAT tuple. Many of the functions and steps applied in this file you encountered during Exercise 5 and Exercise 6.

You should run this configuration file once on a few events of the standard ttbar input sample that we used for the PAT Tutorial to convince yourself that it works and that you understand what it is doing.

ALERT! Note: The patTuple_topPreproduction_cfg.py_configuration file makes use of the a default file used for data validation at cern! When taken literally this example only works when working on _lxplus. If you want to try this example from somewhere else you have to adapt the input file.

[rwolf@lxplus231]~/scratch0/CMSSW_5_3_14/src% cmsRun PhysicsTools/PatExamples/test/patTuple_topPreproduction_cfg.py
removed from lepton counter: taus
---------------------------------------------------------------------
INFO   : some objects have been removed from the sequence. Switching 
         off PAT cross collection cleaning, as it might be of limited
         sense now. If you still want to keep object collection cross
         cleaning within PAT you need to run and configure it by hand
---------------------------------------------------------------------
INFO   : cleaning has been removed. Switch output from clean PAT     
         candidates to selected PAT candidates.
************** MC dependence removal ************
removing MC dependencies for photons
removing MC dependencies for electrons
removing MC dependencies for muons
removing MC dependencies for taus
WARNING: called applyPostfix for module/sequence tauGenJets which is not in patDefaultSequence!
WARNING: called applyPostfix for module/sequence tauGenJetsSelectorAllHadrons which is not in patDefaultSequence!
WARNING: called applyPostfix for module/sequence tauGenJetMatch which is not in patDefaultSequence!
removing MC dependencies for jets
WARNING: called applyPostfix for module/sequence cleanPatCandidates which is not in patDefaultSequence!
---------------------------------------------------------------------
INFO   : cleaning has been removed. Switch output from clean PAT     
         candidates to selected PAT candidates.
20-Sep-2010 19:02:04 CEST  Initiating request to open file rfio:/castor/cern.ch/cms/store/relval/CMSSW_3_8_0_pre8/RelValTTbar/GEN-SIM-RECO/START38_V6-v1/0004/847D00B0-608E-DF11-A37D-003048678FA0.root
20-Sep-2010 19:02:16 CEST  Successfully opened file rfio:/castor/cern.ch/cms/store/relval/CMSSW_3_8_0_pre8/RelValTTbar/GEN-SIM-RECO/START38_V6-v1/0004/847D00B0-608E-DF11-A37D-003048678FA0.root
Begin processing the 1st record. Run 1, Event 1001, LumiSection 777787 at 20-Sep-2010 19:02:25 CEST
Begin processing the 2nd record. Run 1, Event 1002, LumiSection 777787 at 20-Sep-2010 19:02:37 CEST
Begin processing the 3rd record. Run 1, Event 1003, LumiSection 777787 at 20-Sep-2010 19:02:37 CEST
Begin processing the 4th record. Run 1, Event 1004, LumiSection 777787 at 20-Sep-2010 19:02:37 CEST
Begin processing the 5th record. Run 1, Event 1005, LumiSection 777787 at 20-Sep-2010 19:02:37 CEST
Begin processing the 6th record. Run 1, Event 1006, LumiSection 777787 at 20-Sep-2010 19:02:37 CEST
Begin processing the 7th record. Run 1, Event 1007, LumiSection 777787 at 20-Sep-2010 19:02:37 CEST
Begin processing the 8th record. Run 1, Event 1008, LumiSection 777787 at 20-Sep-2010 19:02:37 CEST
Begin processing the 9th record. Run 1, Event 1009, LumiSection 777787 at 20-Sep-2010 19:02:37 CEST
...

You can check the event content of the resulting PAT tuple making use of the edmDumpEventContent tool:

[rwolf@lxplus231]~/scratch0/CMSSW_%PATRELEASE%/src% edmDumpEventContent patTuple.root 

  *******************************************
  *                                         *
  * Welcome to my .rootlogon.C!!!           *
  *                                         *
  * ...Define MyStyle 22.01.07...           *
  * ...Load CMSSW libraries...              *
  *                                         *
  * ...done                                 *
  *                                         *
  *******************************************

edm::TriggerResults               "TriggerResults"        ""            "HLT."         
edm::TriggerResults               "TriggerResults"        ""            "RECO."        
vector<reco::Vertex>              "offlinePrimaryVertices"    ""            "RECO."        
edm::OwnVector<reco::BaseTagInfo,edm::ClonePolicy<reco::BaseTagInfo> >     "selectedPatJets"       "tagInfos"    "PAT."         
edm::TriggerResults               "TriggerResults"        ""            "PAT."         
vector<CaloTower>                 "selectedPatJets"       "caloTowers"    "PAT."         
vector<pat::Electron>             "selectedPatElectrons"    ""            "PAT."         
vector<pat::Jet>                  "selectedPatJets"       ""            "PAT."         
vector<pat::MET>                  "patMETs"               ""            "PAT."         
vector<pat::Muon>                 "selectedPatMuons"      ""            "PAT."         
vector<reco::GenJet>              "selectedPatJets"       "genJets"     "PAT."         
[rwolf@lxplus231]~/scratch0/CMSSW_3_8_3/src% 

Event selection

For the event selection we make use of the file analyzeTopSelection_cfg.py in the test directory of the PatExamples package. It contains the pre-processed PAT tuples for the simulated ttbar events as input, you can just execute it as it is:

cmsRun PhysicsTools/PatExamples/test/analyzeTopSelection_cfg.py

We are going to have a closer look into the structure of the file while the first job is running. Apart from the obligatory input source and TFileService declaration it consists of four sections. We will go through them step by step in the following:

analyzerTopSelection_cfg.py

## ----------------------------------------------------------------
## Apply object selection according to TopPAG reference selection
## for ICHEP 2010. This will result in 5 additional collections:
##
## * goodJets
## * vetoElecs
## * vetoMuons
## * looseMuons
## * tightMuons
##
## Have a look ont the cff file to learn more about the exact
## selection citeria.
## ----------------------------------------------------------------
process.load("PhysicsTools.PatExamples.topObjectSelection_cff")
process.topObjectProduction = cms.Path(
    process.topObjectSelection
)

ALERT! Note: In this section all objects that will be needed for the event selection are created beforehand. These are object collections for which we applied the object requirements as defined in the selection table above. We have chosen the module labels pretty self-explaining if you have the above table in mind:

  • tightMuons
  • looseMuons
  • vetoMuons
  • vetoElecs
  • goodJets

All of them are pat::Candidate collections that have the corresponding default selectedPatCandidate collections as input. You can find their definitions in the indicated cff file. We will only discuss the muon collections here, the others are derived in an analogue way:

topObjectSelection_cff.py

## ---
##
## this cff file keep all object selections used for the TopPAG
## reference selection for ICHEP 2010
##
## ---
from PhysicsTools.PatAlgos.cleaningLayer1.muonCleaner_cfi import *
looseMuons = cleanPatMuons.clone(
    preselection =
    'isGlobalMuon & isTrackerMuon &'
    'pt > 20. &'
    'abs(eta) < 2.1 &'
    '(trackIso+caloIso)/pt < 0.1 &'
    'innerTrack.numberOfValidHits > 10 &'
    'globalTrack.normalizedChi2 < 10.0 &'
    'globalTrack.hitPattern.numberOfValidMuonHits > 0 &'
    'abs(dB) < 0.02',
    checkOverlaps = cms.PSet(
      jets = cms.PSet(
        src                 = cms.InputTag("goodJets"),
        algorithm           = cms.string("byDeltaR"),
        preselection        = cms.string(""),
        deltaR              = cms.double(0.3),
        checkRecoComponents = cms.bool(False),
        pairCut             = cms.string(""),
        requireNoOverlaps   = cms.bool(True),
      )
    )
)

tightMuons = cleanPatMuons.clone(
    src = 'looseMuons',
    preselection = '(trackIso+caloIso)/pt < 0.05'
)

from PhysicsTools.PatAlgos.selectionLayer1.muonSelector_cfi import *
vetoMuons = selectedPatMuons.clone(
    src = 'selectedPatMuons',
    cut =
    'isGlobalMuon &'
    'pt > 10. &'
    'abs(eta) < 2.5 &'
    '(trackIso+caloIso)/pt < 0.2'
)

ALERT! Note: We start off from the looseMuons collection for which we clone the cleanPatMuons as defined in the python/cleaningLayer1 directory of the PatAlgos package. We do this to implement the ΔR(μ,jet)>0.3 requirement between the muon and the closest selected jet in <η, φ> space. As this is a requirement on a relation between two objects we cannot simply apply it via the SWGuidePhysicsCutParser. The cut is applied in the parameter set checkOverlaps, where the jet collection we want to check the muons against (goodJets), the overlap checking algorithm (byDeltaR) and the value of deltaR are defined. The switch requireNoOverlaps takes care that muons, which have an overlap are indeed discarded from the collection. For the rest of the selection we make use of the preselection parameter of the cleanPatMuons module. Thus the PAT cleaning that you encountered in Exercise 7 appears in a new light here!

For convenience the tightMuons, which only adds stronger requirements on top of the looseMuons have the latter as input. We do not have to apply the full selection strings again. Note that all parameters, which are not replaced remain the same as defined in the looseMuons module. Note that for the vetoMuons collection there is no ΔR(μ,jet) requirement. It is therefore enough to stay with the selectedPatMuon module. Also for the jet collection and the electron veto collection there is no need to make use of the cleaning module, as there all requirements are applied just on the objects themselves. Accordingly you will see that we cloned the corresponding selectedPatCandidate modules as defined in the python/selectionLayer1 directory of the PatAlgos package to configure the modules for these collections.

We now come to the second block in the

analyzerTopSelection_cfg.py

## ----------------------------------------------------------------
## Define the steps for the TopPAG reference selection for ICHEP
## 2010. Have a look at the WorkBookPATExampleTopQuarks. These
## are event selections. They make use of the object selections
## applied in the step above.
## ----------------------------------------------------------------

## Trigger bit (HLT_mu9)
from HLTrigger.HLTfilters.hltHighLevel_cfi import *
process.step1  = hltHighLevel.clone(TriggerResultsTag = "TriggerResults::HLT", HLTPaths = ["HLT_Mu9"])
## Vertex requirement
process.step2  = cms.EDFilter("VertexSelector", src = cms.InputTag("offlinePrimaryVertices"), cut = cms.string("!isFake && ndof > 4 && abs(z) < 15 && position.Rho < 2"), filter = cms.bool(True))
## Exact one tight muon
from PhysicsTools.PatAlgos.selectionLayer1.muonCountFilter_cfi import *
process.step3a = countPatMuons.clone(src = 'tightMuons', minNumber = 1, maxNumber = 1)
## Exact one loose muon
process.step3b = countPatMuons.clone(src = 'looseMuons', minNumber = 1, maxNumber = 1)
## Veto on additional muons 
process.step4  = countPatMuons.clone(src = 'vetoMuons' , maxNumber = 1)
## Veto on additional electrons
from PhysicsTools.PatAlgos.selectionLayer1.electronCountFilter_cfi import *
process.step5  = countPatMuons.clone(src = 'vetoElecs' , maxNumber = 0)
## Different jet multiplicity selections
from PhysicsTools.PatAlgos.selectionLayer1.jetCountFilter_cfi import *
process.step6a = countPatJets.clone(src = 'goodJets'   , minNumber = 1)
process.step6b = countPatJets.clone(src = 'goodJets'   , minNumber = 2)
process.step6c = countPatJets.clone(src = 'goodJets'   , minNumber = 3)
process.step7  = countPatJets.clone(src = 'goodJets'   , minNumber = 4)

ALERT! Note: Here we define the event selection steps, making use of the candidateCountFilters as defined in the python/selectionLayer1 directory of the PatAlgos package. You can very easily connect the module names with the selection steps defined in the selection table. Note that these modules are event filters! While the modules for object selection were producers that created new collections containing the objects that passed the imposed requirements (which could also result into an empty collection) these modules will really discard events in cases where the collections do not fulfill the requirements.

In the third section we configure a set of modules to fill a set of very basic monitoring histograms, with according replacements for the input collections. The purpose of each module instance is pretty clear from its module label.

analyzerTopSelection_cfg.py

## ----------------------------------------------------------------
## Define monitoring modules for the event selection. You should
## few this only as an example for an analyses technique including
## full CMSSW features, not as a complete analysis.
## ----------------------------------------------------------------

from PhysicsTools.PatExamples.PatTopSelectionAnalyzer_cfi import *
process.monStart  = analyzePatTopSelection.clone(jets='goodJets')
process.monStep1  = analyzePatTopSelection.clone(jets='goodJets')
process.monStep2  = analyzePatTopSelection.clone(jets='goodJets')
process.monStep3a = analyzePatTopSelection.clone(muons='tightMuons', jets='goodJets')
process.monStep4  = analyzePatTopSelection.clone(muons='vetoMuons' , jets='goodJets')
process.monStep5  = analyzePatTopSelection.clone(muons='vetoMuons', elecs='vetoElecs', jets='goodJets')
process.monStep6a = analyzePatTopSelection.clone(muons='vetoMuons', elecs='vetoElecs', jets='goodJets')
process.monStep6b = analyzePatTopSelection.clone(muons='vetoMuons', elecs='vetoElecs', jets='goodJets')
process.monStep6c = analyzePatTopSelection.clone(muons='vetoMuons', elecs='vetoElecs', jets='goodJets')
process.monStep7  = analyzePatTopSelection.clone(muons='vetoMuons', elecs='vetoElecs', jets='goodJets')

ALERT! Note: You can find the module definition in the https://github.com/cms-sw/cmssw/blob/CMSSW_8_1_X/PhysicsTools/PatExamples/python/PatTopSelectionAnalyzer_cfi.py file in the python directory of the PatExamples package. We shortly flash it below:

analyzePatTopSelection = cms.EDAnalyzer("PatTopSelectionAnalyzer",
    elecs = cms.untracked.InputTag("selectedPatElectrons"),
    muons = cms.untracked.InputTag("selectedPatMuons"),                                             
    jets  = cms.untracked.InputTag("selectedPatJets"),
    met   = cms.untracked.InputTag("patMETs")
)

Indeed it's a modification of the PatBasicAnalyzer that you encountered in Exercise 4. The histograms that we are going to monitor are defined in the implementation of the module. We show the important excerpt below:

// book histograms:
  hists_["yield"   ]=fs->make<TH1F>("yield"   , "electron multiplicity",   1, 0.,   1.);
  hists_["elecMult"]=fs->make<TH1F>("elecMult", "electron multiplicity",  10, 0.,  10.);
  hists_["elecIso" ]=fs->make<TH1F>("elecIso" , "electron isolation"   ,  20, 0.,   1.);
  hists_["elecPt"  ]=fs->make<TH1F>("elecPt"  , "electron pt"          ,  30, 0., 150.);
  hists_["muonMult"]=fs->make<TH1F>("muonMult", "muon multiplicity"    ,  10, 0.,  10.);
  hists_["muonIso" ]=fs->make<TH1F>("muonIso" , "muon isolation"       ,  20, 0.,   1.);
  hists_["muonPt"  ]=fs->make<TH1F>("muonPt"  , "muon pt"              ,  30, 0., 150.);
  hists_["jetMult" ]=fs->make<TH1F>("jetMult" , "jet multiplicity"     ,  15, 0.,  15.);
  hists_["jet0Pt"  ]=fs->make<TH1F>("jet0Pt"  , "1. leading jet pt"    ,  50, 0., 250.);
  hists_["jet1Pt"  ]=fs->make<TH1F>("jet1Pt"  , "1. leading jet pt"    ,  50, 0., 250.);
  hists_["jet2Pt"  ]=fs->make<TH1F>("jet2Pt"  , "1. leading jet pt"    ,  50, 0., 200.);
  hists_["jet3Pt"  ]=fs->make<TH1F>("jet3Pt"  , "1. leading jet pt"    ,  50, 0., 200.);
  hists_["met"     ]=fs->make<TH1F>("met"     , "missing E_{T}"        ,  25, 0., 200.);

We have taken it from the PatTopSelectionAnalyzer.cc file in the plugins directory of the PatExamples package. Just have a look into the file yourself to learn more about the exact implementation and histogram filling.

The fourth and final section of the analyzeTopSelection_cfg.py file is dedicated to the process paths:

analyzerTopSelection_cfg.py

## ----------------------------------------------------------------
## Define the analysis paths: we define two selection paths to 
## monitor the cutflow according to the TopPAG reference selection
## for ICHEP 2010. All necessary object collections have been pro-
## duced in the cms.Path topObjectProduction before hand. The out-
## put report is switched on to get a quick overview of the number
## number of events after each selection step. 
## ----------------------------------------------------------------

## Switch output report on
process.options   = cms.untracked.PSet( wantSummary = cms.untracked.bool(True) )

## Define loose event selection path
process.looseEventSelection = cms.Path(
    process.step1      *
    process.step2      *
    process.step3b     *
    process.step4      *
    process.step5      *
    process.step6a     *
    process.step6b     *
    process.step6c
    )

## Define tight event selection path
process.tightEventSelection = cms.Path(
    process.monStart   * 
    process.step1      *
    process.monStep1   *         
    process.step2      *
    process.monStep2   * 
    process.step3a     *
    process.monStep3a  *     
    process.step4      *
    process.monStep4   *     
    process.step5      *
    process.monStep5   *     
    process.step6a     *
    process.monStep6a  *     
    process.step6b     *
    process.monStep6b  *     
    process.step6c     *
    process.monStep6c  *
    process.step7      *
    process.monStep7     
    )

ALERT! Note: You see that we switched the wantSummary option to True. We want to use it for our cutflow tables. We defined two paths as we want to follow a tight and a loose selection scenario. As after the object count filters whole events are discarded we cannot have the tight and the loose selection in one path, but nothing is wrong with having both paths in parallel. For simplicity reasons we fill monitor histograms only for the tight selection scenario up to selection step 7.

Once your process is finished you should see an output of this type in your shell:

TrigReport ---------- Modules in Path: topObjectProduction ------------
TrigReport  Trig Bit#    Visited     Passed     Failed      Error Name
TrigReport     1    0        100        100          0          0 goodJets
TrigReport     1    0        100        100          0          0 vetoElecs
TrigReport     1    0        100        100          0          0 vetoMuons
TrigReport     1    0        100        100          0          0 looseMuons
TrigReport     1    0        100        100          0          0 tightMuons

TrigReport ---------- Modules in Path: looseEventSelection ------------
TrigReport  Trig Bit#    Visited     Passed     Failed      Error Name
TrigReport     1    1        100         28         72          0 step1
TrigReport     1    1         28         28          0          0 step2
TrigReport     1    1         28         18         10          0 step3b
TrigReport     1    1         18         18          0          0 step4
TrigReport     1    1         18         15          3          0 step5
TrigReport     1    1         15         15          0          0 step6a
TrigReport     1    1         15         14          1          0 step6b
TrigReport     1    1         14         12          2          0 step6c

TrigReport ---------- Modules in Path: tightEventSelection ------------
TrigReport  Trig Bit#    Visited     Passed     Failed      Error Name
TrigReport     1    2        100        100          0          0 monStart
TrigReport     1    2        100         28         72          0 step1
TrigReport     1    2         28         28          0          0 monStep1
TrigReport     1    2         28         28          0          0 step2
TrigReport     1    2         28         28          0          0 monStep2
TrigReport     1    2         28         18         10          0 step3a
TrigReport     1    2         18         18          0          0 monStep3a
TrigReport     1    2         18         18          0          0 step4
TrigReport     1    2         18         18          0          0 monStep4
TrigReport     1    2         18         15          3          0 step5
TrigReport     1    2         15         15          0          0 monStep5
TrigReport     1    2         15         15          0          0 step6a
TrigReport     1    2         15         15          0          0 monStep6a
TrigReport     1    2         15         14          1          0 step6b
TrigReport     1    2         14         14          0          0 monStep6b
TrigReport     1    2         14         12          2          0 step6c
TrigReport     1    2         12         12          0          0 monStep6c
TrigReport     1    2         12          6          6          0 step7
TrigReport     1    2          6          6          0          0 monStep7

ALERT! Note: This example was only run on 100 events for demonstration purposes. You can easily find the paths back that we defined in our process:

  • topObjectProduction: to produce the selected object collections we used as input for the event selection.
  • looseEventSelection: the loose event selection scenario.
  • tightEventSelection: the tight event selection scenario.

Question Exercise a):
You can read the event numbers from the latter two path summaries when looking into the column passed. Run over all remaining samples and fill out the tables below:

Tight Selection Data Top Wjets Zjets QCD
step1 1206415       233807
step2 1194487       233739
step3a 10727       918
step4 10354       916
step5 10309       913
step6a 1907       361
step6b 372       46
step6c 81       5
step7 20       0
Weight Factor -        

Add the numbers you find for the following samples and steps in the tutorial form:

  • Ttbar, step 3a
  • Wjets, step 5
  • Zjets, step 6b

Save the histogram files as:

  • analyzePatTopSelection.root (data)
  • analyzePatTopSelection_ttbar.root
  • analyzePatTopSelection_wjets.root
  • analyzePatTopSelection_zjets.root
  • analyzePatTopSelection_qcd.root

ALERT! ATTENTION ALERT! The simulated events that have been used so far have been subject to a different pre-selection than the data. Therefore they are not fully comparable up to selection step 3a. In real life the agreement is much better. We will replaced the simulation by proper files in the next CMSSW update cycle.

Question Exercise b):
When finished with Exercise a) start root and run the macro indicated blelow:

[rwolf@lxplus231]~/scratch0/CMSSW_5_3_14/src% ls
PhysicsTools                     analyzePatTopSelection_ttbar.root  analyzePatTopSelection_zjets.root
analyzePatTopSelection_qcd.root  analyzePatTopSelection_wjets.root  patTuple.root
[rwolf@lxplus231]~/scratch0/CMSSW_5_3_14/src% root- l 
.x PhysicsTools/PatExamples/bin/monitorTopSelection.C+("yield", "Step1")
Info in : creating shared library /afs/cern.ch/user/r/rwolf/scratch0/CMSSW_3_8_3/src/./PhysicsTools/PatExamples/bin/monitorTopSelection_C.so
/usr/bin/ld: skipping incompatible /usr/lib64/libm.so when searching for -lm
/usr/bin/ld: skipping incompatible /usr/lib64/libm.a when searching for -lm
/usr/bin/ld: skipping incompatible /usr/lib64/libc.so when searching for -lc
/usr/bin/ld: skipping incompatible /usr/lib64/libc.a when searching for -lc

This should result in a set of plots of this kind:

Yield.png MuonPt.png

(left/upper) Event yield after each selection step. (right/lower) Transverse momentum of the leading muon after selection step 3a.

ALERT! Note: The macro monitorTopSelection.C expects all input files to present in the src directory of your working environment. It takes two arguments, histName indicating the physics quantity to be checked and selectionStep indicating the level at which to check this quantity. Both parameters have been set to default values for the beginning, resulting in the histogram shown above. You can check the following

histName Physics Quantity
yield yield plot
elecMult electron multiplicity
elecIso relIso for the leading electron
elecPt pt for the leading electron
muonMult muon multiplicity
muonIso relIso for the leading muon
muonPt pt for the leading muon
jetMult jet multipliciy
jet1Pt pt for the 1. leading jet
jet2Pt pt for the 2. leading jet
jet3Pt pt for the 3. leading jet
jet4Pt pt for the 4. leading jet
met missing transverse momentum (1)

(1) from default patMET, defined as the calorimeter based MET after type1 and muon corrections.

corresponding to the names that have been used for the histogram booking above. You may use the following selection steps Step1 till Step7 as second argument corresponding to the directories in the corresponding root histogram file. ( Indeed for yield the second argument selectionStep will be ignored. ) Have a look into one of the root histogram files to understand the file structure and a look into the implementation of the macro to see how it works. After all the macro is not too complicated.

ALERT! Note: You can find a prepared set of root histogram files, which you can copy into your working directory at

/afs/cern.ch/cms/Tutorials/TWIKI_DATA/root/

More complex analyses of top quarks

In the above example we followed a very of an analysis, the selection of events containing top quarks pairs. For high level analyses of top quark pair quantities the TopPAG supports an own toolkit:

The Top Quark Analysis Framework (TQAF) is a toolkit to facilitate the analysis of final states containing top quark pairs in the dileptonic, semi-leptonic and full hadronic decay channel. It is an integral part of CMSSW and build up on the Physics Analysis Toolkit (PAT). For more details on the TQAF have a look to the SWGuideTQAF. You will find the links to a set of examples on the use of the Top Quark Analysis Framework (TQAF) below. Each chapter will list the existing and planned tutorials for each corresponding decay channel fo the top antitop quark pair.

Review status

Reviewer/Editor and Date (copy from screen) Comments
LukasKreczko - 2016-07-07 CVS → Github
SamirGuragain - 11 June 2012 Went over the instructions.
RogerWolf - 7 July 2011 Update to 42X with some frictional losses.

Responsible: Volker Adler, Sebastian Naumann-Emme, and Kati Lassila Perini

Topic attachments
I Attachment History Action Size Date Who Comment
PNGpng 0107366249d1faea2b8acab040249225.png   manage 0.2 K 2006-12-20 - 07:51 UnknownUser  
PNGpng 146c613534e5b801c120dc3a1b91ff87.png   manage 0.4 K 2006-12-20 - 07:51 UnknownUser  
PNGpng 2796af5074a7f27ecccd3cd17e165d53.png   manage 0.2 K 2006-12-20 - 07:51 UnknownUser  
PNGpng 5edb782479fb92eaa1f65d62075eee1c.png   manage 0.3 K 2006-12-20 - 07:51 UnknownUser  
PNGpng 75511ce7f520a4d76f51938f35e05981.png   manage 0.3 K 2007-03-15 - 09:59 UnknownUser  
PNGpng 967878d1da852d4b07a961e3168b0fff.png   manage 0.2 K 2006-12-20 - 07:51 UnknownUser  
PNGpng c931be5cf7972a42e3540b594fbc95a1.png   manage 0.3 K 2006-12-20 - 07:51 UnknownUser  
PNGpng cbc93ea5c2e797d783673051f2e7cfdf.png   manage 0.2 K 2006-12-20 - 07:51 UnknownUser  
Edit | Attach | Watch | Print version | History: r27 < r26 < r25 < r24 < r23 | Backlinks | Raw View | WYSIWYG | More topic actions
Topic revision: r27 - 2010-01-19 - KatiLassilaPerini


ESSENTIALS

ADVANCED TOPICS


 
    • Cern Search Icon Cern Search
    • TWiki Search Icon TWiki Search
    • Google Search Icon Google Search

    CMSPublic All webs login

This site is powered by the TWiki collaboration platform Powered by PerlCopyright & 2008-2019 by the contributing authors. All material on this collaboration platform is the property of the contributing authors.
Ideas, requests, problems regarding TWiki? Send feedback