PAT v2 Cleaning


In the new PAT workflow, the cleaning tools have been changed to take advantage of the fact that they now run on PAT Objects.

Generally speaking, a cleaning is a series of steps (CMSSW modules), each of which produces one collection of clean items, starting from the inclusive lists of objects and from the outputs of the previous cleaning steps.

Each cleaning step starts from one input collection and does the following:

  1. Discards from its input collection any items already present in one or more collections of clean objects of the same type (e.g. to make two exclusive lists of jets, one that excludes the jets overlapping with electrons and one that contains only those).
    Warning: this step has not been implemented yet.
  2. Applies one generic preselection cut to the input objects.
  3. Checks for overlaps with one or more input collections. Depending on how each overlap test is configured, items that have overlaps can be kept or discarded. The algorithm saves in the clean objects EDM pointers to the items they overlap with, so that these can be retrieved in later steps of the analysis (as long as they are still in the ROOT file; otherwise, one can only know the number of overlapping items for each test).
  4. Applies one generic final selection cut to the objects before saving them in the event.
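
The sequence of steps above can be sketched in plain Python as a toy model. This is not CMSSW code: the function `clean_step`, the `OverlapCheck` tuple and the dictionary-based items are hypothetical names chosen for illustration, the eta-only matching is a crude stand-in for a real overlap algorithm, and step 1 is left out because it is not implemented.

```python
from collections import namedtuple

# Hypothetical description of one overlap test (not a CMSSW type).
OverlapCheck = namedtuple("OverlapCheck", "others matches require_no_overlaps")


def clean_step(items, preselection, checks, final_cut):
    """Toy model of one cleaning step; items are plain dicts."""
    clean = []
    for item in items:
        # Step 2: generic preselection cut on the input object.
        if not preselection(item):
            continue
        # Step 3: run every configured overlap test and remember the matches.
        overlaps = {}
        discard = False
        for label, check in checks.items():
            found = [o for o in check.others if check.matches(item, o)]
            if found:
                overlaps[label] = found
                discard = discard or check.require_no_overlaps
        if discard:
            continue
        candidate = dict(item, overlaps=overlaps)
        # Step 4: final selection cut, which can already see the overlaps.
        if final_cut(candidate):
            clean.append(candidate)
    return clean


electrons = [{"name": "e0", "eta": 0.1}]
jets = [{"name": "j0", "eta": 0.1, "pt": 40.0},
        {"name": "j1", "eta": 2.0, "pt": 30.0},
        {"name": "j2", "eta": 1.0, "pt": 5.0}]

# Crude stand-in for a real overlap algorithm: compare eta only.
close_in_eta = lambda a, b: abs(a["eta"] - b["eta"]) < 0.3

checks = {"electrons": OverlapCheck(electrons, close_in_eta, False)}
clean_jets = clean_step(jets, lambda j: j["pt"] > 10.0, checks, lambda j: True)
for jet in clean_jets:
    print(jet["name"], "overlaps:", sorted(jet["overlaps"]))
```

With `require_no_overlaps` set to False the overlapping jet is kept and merely annotated; setting it to True for the same test would drop that jet instead.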

Default configuration

In the default PAT cleaning configuration:

  1. Muons: all muons are considered clean; no preselection or final selection cut is applied, and no overlaps are checked for.
  2. Electrons: clean muons that overlap by delta R (< 0.3) are saved in the PAT electrons, but the electrons are not discarded. No preselection or final selection cut is applied.
  3. Photons: no preselection or final selection cut is applied; photons are checked for overlap against the clean electrons by supercluster seed, and photons that overlap are discarded.
  4. Taus: by default only taus that pass the discriminator by isolation are accepted; overlaps by delta R (< 0.3) with electrons and muons are saved, but no cuts are applied.
  5. Jets: by default all jets are kept. Overlaps with all the above collections are saved (delta R < 0.5). In addition, another overlap test is performed against tracker-isolated electrons (delta R < 0.3; electron trackIso < 3; electron pt > 10 GeV), and its results are saved with the label "tkIsoElectrons" (this is identical to the old flag in PAT v1).

Note: there is no MET cleaning. It would not make sense, since there is one and only one MET in the event.

This configuration is there mostly for historical reasons, and will soon be reviewed to make it more sensible.

Looking at the overlaps in the output PAT Objects

If a PAT object has passed through the overlap checking part of a cleaner, and it was not discarded because of the overlaps (e.g. because requireNoOverlaps was left at False), it is possible to inspect the overlaps directly from the object at any later step of the analysis.

  • To see if there was any overlap with some collection, you can use the hasOverlaps method, passing the name of the overlap check.
  • To get the list of items that overlap with a given PAT object, you can use the overlaps method, again passing the name of the overlap check. The result is a CandidatePtrVector, a vector of EDM pointers to Candidate objects, so you can:
    • Ask for its size() to know how many items there were
    • Get the basic Candidate quantities, like kinematics, for one of the items (note: this works only if the collection they live in is still in the ROOT file).
    • Use dynamic_cast to convert the pointers to the specific PAT Object type, to access variables which are not in the base Candidate class
    • Check if two edm::Ptrs point to the same object, by comparing them with '=='
    Example:
      const reco::CandidatePtrVector & elecs = myJet.overlaps("electrons");
      std::cout << "This jet overlaps with " << elecs.size() << " electrons." << std::endl;
      for (size_t i = 0; i < elecs.size(); ++i) {
           std::cout << "  electron " << i << " pt = " << elecs[i]->pt() << std::endl;
           // try to convert to a pat::Electron (the cast returns a null pointer if the type does not match)
           const pat::Electron *elec = dynamic_cast<const pat::Electron *>(&*elecs[i]);
           if (elec) {
                 std::cout << "  electron " << i << " electron id " << elec->electronID("eidRobustTight") << std::endl;
           }
      }
  • Get the full list of overlap checks that found at least one overlap, to do some inspection or debugging, by using overlapLabels()
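
As a compact illustration of what these three accessors return, here is a small Python stand-in. The class `ToyPatObject` and its snake_case method names are hypothetical and only mirror the behaviour described above; the real interface is the C++ one on the PAT objects, and the strings used as overlapping items are placeholders for EDM pointers.

```python
class ToyPatObject:
    """Hypothetical stand-in for a cleaned PAT object (illustration only)."""

    def __init__(self, overlaps_by_label):
        # Keep only the checks that actually found something, matching the
        # description of overlapLabels() above.
        self._overlaps = {k: list(v) for k, v in overlaps_by_label.items() if v}

    def has_overlaps(self, label):
        return label in self._overlaps

    def overlaps(self, label):
        # In this toy, an unknown or empty check yields an empty list.
        return self._overlaps.get(label, [])

    def overlap_labels(self):
        return sorted(self._overlaps)


jet = ToyPatObject({"electrons": ["ele#3"], "muons": [], "tkIsoElectrons": ["ele#3"]})
print(jet.overlap_labels())
print(jet.has_overlaps("muons"), len(jet.overlaps("electrons")))
```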

Easy changes to the configuration

Some simple examples of changes that can be made to the PAT cleaning configuration:

  • The muon preselection cut can be changed e.g. to apply some standard muon ID, pt or isolation cut.
    process.cleanLayer1Muons.preselection = "isGood('GlobalMuonPromptTight')"
  • The electron preselection cut can be changed e.g. to apply some standard electron ID, pt or isolation cut.
    process.cleanLayer1Electrons.preselection = "(electronID('eidRobustLoose') > 0) && (trackIso < 3)"
  • You can choose to accept also photons that share the same supercluster seed with the electrons
    process.cleanLayer1Photons.checkOverlaps.electrons.requireNoOverlaps = False
  • You can choose to swap the order in which electrons and photons are cleaned
    # take away electrons from their current position in the sequence...
    process.cleanLayer1Objects.remove(process.cleanLayer1Electrons)
    # ...and put them after the photons
    process.cleanLayer1Objects.replace(process.cleanLayer1Photons, process.cleanLayer1Photons * process.cleanLayer1Electrons)
    # don't remove electrons from photons
    process.cleanLayer1Photons.checkOverlaps = cms.PSet()
    # remove photons from electrons
    process.cleanLayer1Electrons.checkOverlaps = cms.PSet(
         photons = cms.PSet(
               src       = cms.InputTag("cleanLayer1Photons"),
               algorithm = cms.string("bySuperClusterSeed"),
               requireNoOverlaps = cms.bool(True), # discard electrons that overlap!
         )
    )
  • You can add to the sequence a copy of a cleaner module with different parameters
    # make a copy of e.g. the electron cleaner
    process.myCleanLayer1Electrons = process.cleanLayer1Electrons.clone()
    # modify some configuration (e.g. the preselection)
    process.myCleanLayer1Electrons.preselection = 'pt > 5'
    # add it next to the electron cleaner
    process.cleanLayer1Objects.replace(process.cleanLayer1Electrons, process.cleanLayer1Electrons + process.myCleanLayer1Electrons)
    # (optional) add it to the cleanLayer1Summary, to see how many items pass
    process.cleanLayer1Summary.candidates.append( cms.InputTag("myCleanLayer1Electrons") )

Detailed Description

Preselection and Final Cut

Preselection and final selection cuts use the standard physics tools cut parser, and can access all the variables and methods of the specific object (e.g. all pat::Electron methods in the PATElectronCleaner ).

In the final selection cut the overlaps have already been added to the item, so one can use them. Just as an example, even if not particularly sound physics-wise, you can select taus that overlap with exactly one muon and with no electron through the cut string !hasOverlaps("electrons") && (overlaps("muons").size() == 1)

Overlap checking

Overlap checking can be used to mark or select items that have overlaps with others.

Different algorithms can be used; currently there is one generic overlap algorithm (by deltaR, optionally checking also for shared references to AOD objects, e.g. tracks, among the two) and a specific algorithm that selects objects that share a supercluster seed; the latter can also serve as an example for developing similar specific algorithms.

The configuration of the overlap checking is as follows:

checkOverlaps = cms.PSet(
       someCheck = cms.PSet(
               src       = cms.InputTag("collection to check overlap against"),
               algorithm = cms.string("the algorithm to use to search for the overlap"),
               # ... parameters specific to the algorithm, e.g. a deltaR cut ...
               requireNoOverlaps = cms.bool(True), # True if objects with overlaps must be discarded in the cleaning, False otherwise
       ),
       otherCheck = cms.PSet(
               # ...
       ),
)

Default Overlap by deltaR

The default overlap algorithm, based on delta R distance, is named byDeltaR and does the following checks:

  1. A preselection on the list of items to check overlaps against (e.g. to flag only jets that overlap with isolated electrons). This is controlled by the parameter preselection , which is a generic cut string and has access to all variables of PAT objects (of course you will get an exception if you try to use an electron-specific variable when the input collection contains PAT Muons instead of PAT Electrons).
  2. A delta R matching, with cone size equal to the deltaR parameter
  3. If requested, it uses the Candidate Overlap Checker to check if the two items share a reference to the same RECO object (e.g. if a PAT Muon and a PAT GenericParticle use the same reco::Track ). This is controlled by the parameter checkRecoComponents .
  4. It can apply a combined cut on the variables of the two objects of the pair, e.g. 0.5 < ... < 1.5 . You can refer to the two members of the pair by their type ( "ele", "mu", "tau", "gam" for Photon, "jet", "met", "part" for GenericParticle and "pf" for PFParticle ). If the two particles are of the same type, you need to specify the number (e.g. "ele1", "mu2"); particle 1 is the one from the collection you're cleaning, particle 2 is the one from the other collection.
    You can also access the deltaR ( as deltaR ) and the total 4-momentum ( as totalP4 ).

If the collection you're checking the overlap against does not contain PAT objects, you can generically refer to its items as "cand2" ("cand1" being the item you are cleaning).
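
The first two checks of byDeltaR (preselection, then cone matching) can be sketched in plain Python as follows. This is a minimal illustration rather than the CMSSW implementation: the function names and dictionary-based objects are hypothetical, the strict "<" comparison against the cone size is an assumption, and the numerical cuts simply reuse the tracker-isolated electron test from the default jet configuration. The one subtle point it shows is that the azimuthal difference must be wrapped around before computing delta R.

```python
import math


def delta_phi(phi1, phi2):
    """Signed azimuthal difference wrapped into [-pi, pi)."""
    return (phi1 - phi2 + math.pi) % (2.0 * math.pi) - math.pi


def delta_r(a, b):
    """Distance in the (eta, phi) plane between two objects."""
    return math.hypot(a["eta"] - b["eta"], delta_phi(a["phi"], b["phi"]))


def overlaps_by_delta_r(item, others, max_delta_r, preselection=lambda o: True):
    """Return the preselected `others` within max_delta_r of `item`."""
    return [o for o in others
            if preselection(o) and delta_r(item, o) < max_delta_r]


jet = {"eta": 0.50, "phi": 3.10}
electrons = [
    {"pt": 25.0, "trackIso": 1.0, "eta": 0.45, "phi": -3.10},  # close across the phi wrap
    {"pt": 8.0, "trackIso": 0.5, "eta": 0.50, "phi": 3.10},    # on top of the jet, but fails the pt cut
    {"pt": 30.0, "trackIso": 1.0, "eta": -1.50, "phi": 0.00},  # far away
]
isolated = lambda e: e["pt"] > 10.0 and e["trackIso"] < 3.0
matched = overlaps_by_delta_r(jet, electrons, 0.3, isolated)
print(len(matched), "overlapping isolated electron(s)")
```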

Overlap checker by SuperCluster Seed

This algorithm is there both as an example and because it was present in the previous versions of PAT cleaning. It is called bySuperClusterSeed and doesn't take any additional parameters; it just checks whether the two superclusters refer to the same seed.


  • Both objects should be of some type inheriting from reco::RecoCandidate , and should have a non-null SuperCluster reference (e.g. Electron, Photon or PAT GenericParticle made from RecoEcalCandidate).
  • The SuperClusters must be accessible, either because their collection is in the ROOT file or because they have been embedded in the PAT objects
  • The seeds, instead, don't have to actually be available, as the algorithm just checks if the two seed references point to the same seed and doesn't look at the seed contents.
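
The last point can be illustrated with a toy model of references. Here a reference is represented as a (collection id, index) pair, which is an assumption made for this sketch rather than a description of the EDM classes, and the "basicClusters" label is only a placeholder. Comparing two such pairs never requires reading the seed itself.

```python
from collections import namedtuple

# Hypothetical reference: identifies an element without holding its contents.
Ref = namedtuple("Ref", "collection_id index")


def share_supercluster_seed(sc1, sc2):
    """Compare the seed references of two toy superclusters.

    Only the references are compared; the seed contents are never read,
    so they do not need to be available.
    """
    return sc1["seed"] == sc2["seed"]


electron_sc = {"energy": 41.7, "seed": Ref("basicClusters", 5)}
photon_sc = {"energy": 42.3, "seed": Ref("basicClusters", 5)}
other_sc = {"energy": 18.0, "seed": Ref("basicClusters", 9)}

print(share_supercluster_seed(electron_sc, photon_sc))
print(share_supercluster_seed(electron_sc, other_sc))
```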

Reviewer/Editor and Date (copy from screen) Comments
GiovanniPetrucciani - 14 Jan 2009 created page

Responsible: PAT Team

Topic revision: r5 - 2009-06-17 - FredericRonga