Jet Flavour Identification (MC Truth)

Introduction

The package PhysicsTools/JetMCAlgos provides tools to identify the flavour of reconstructed jets based on the Monte Carlo truth information provided in the GenParticle collection. The identification is based on the hadrons or the partons. Data formats have been created in SimDataFormats/JetMatching which allow for saving the jet flavour information in the event.

Hadron(+parton)-based jet flavour definition

NOTE: Included in CMSSW_5_3_X, 6_2_X_SLHC, 7_0_X, and newer releases.

There is no unambiguous answer to the correct underlying flavour of a reconstructed jet. Hence, several different jet flavour definitions have existed in CMS, reflecting different points of view on the subject. However, since the introduction of the original jet flavour tools (see the "Legacy parton-based jet flavour definition" section), several developments have occurred which made it necessary to update the existing jet flavour tools. The first development was the introduction of new C++-based Monte Carlo event generators/hadronizers (such as Pythia8, Herwig++, and Sherpa) which follow the HepMC particle status code convention. Since the original jet flavour tools were written specifically for Pythia6, which follows the older HEPEVT particle status code convention for Fortran-based Monte Carlo event generators (such as Pythia6 and Herwig6), it was necessary to update the tools to support additional event generators/hadronizers. To avoid defining generator-specific jet flavour algorithms, b and c hadrons are now used to define b and c jets. Nevertheless, to further distinguish different types of light-flavour jets, partons still need to be used. The second development was the introduction of various jet substructure techniques, with fat jets and subjets of varying sizes. To address this second issue, the jet flavour is defined using the jet clustering algorithms.

The jet flavour is determined by re-clustering the jet constituents together with selected hadrons and partons. The hadron and parton four-momenta are rescaled by a very small number (the default rescale factor is 10^-18), which turns them into so-called "ghosts". The "ghost" hadrons and partons are clustered together with all of the jet constituents. It is important to use the same clustering algorithm and jet size as for the original input jet collection. Since the "ghost" hadrons and partons are extremely soft, the resulting jet collection will be practically identical to the original one but now with "ghost" hadrons and partons clustered inside jets. The jet flavour is determined based on the "ghost" hadrons clustered inside a jet:

  • jet is considered a b jet if there is at least one b "ghost" hadron clustered inside it (hadronFlavour=5)
  • jet is considered a c jet if there is at least one c and no b "ghost" hadrons clustered inside it (hadronFlavour=4)
  • jet is considered a light-flavour jet if there are no b or c "ghost" hadrons clustered inside it (hadronFlavour=0)

To further assign a more specific flavour to light-flavour jets, "ghost" partons are used:

  • jet is considered a b jet if there is at least one b "ghost" parton clustered inside it (partonFlavour=5)
  • jet is considered a c jet if there is at least one c and no b "ghost" partons clustered inside it (partonFlavour=4)
  • jet is considered a light-flavour jet if there are light-flavour and no b or c "ghost" partons clustered inside it. The jet is assigned the flavour of the hardest light-flavour "ghost" parton clustered inside it (partonFlavour=1, 2, 3, or 21)
  • jet has an undefined flavour if there are no "ghost" partons clustered inside it (partonFlavour=0)

In rare instances a conflict between the hadron- and parton-based flavours can occur. In such cases it is possible to keep both flavours or to give priority to the hadron-based flavour. This is controlled by the hadronFlavourHasPriority switch. The priority is given to the hadron-based flavour as follows:

  • if hadronFlavour==0 && (partonFlavour==4 || partonFlavour==5): partonFlavour is set to the flavour of the hardest light-flavour parton clustered inside the jet if such parton exists. Otherwise, the parton flavour is left undefined
  • if hadronFlavour!=0 && hadronFlavour!=partonFlavour: partonFlavour is set equal to hadronFlavour
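A minimal sketch of the priority rules above, assuming the flavours are plain integers as in the text. The function name and the hardest_light_parton argument are hypothetical and only serve the illustration.

```python
def apply_hadron_priority(hadron_flavour, parton_flavour, hardest_light_parton=0):
    """Return the parton flavour after giving priority to the hadron flavour.

    hardest_light_parton is the flavour (1, 2, 3 or 21) of the hardest
    light-flavour "ghost" parton inside the jet, or 0 if there is none.
    """
    if hadron_flavour == 0 and parton_flavour in (4, 5):
        # Heavy parton but no heavy hadron: fall back to the light parton
        return hardest_light_parton
    if hadron_flavour != 0 and hadron_flavour != parton_flavour:
        # Heavy hadron found: the parton flavour follows the hadron flavour
        return hadron_flavour
    return parton_flavour
```

Jets for which the two flavours already agree, or light-flavour jets with a light parton flavour, are left unchanged.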

The flavour can also be assigned to subjets of fat jets. In that case, three input jet collections are required:

  • jets, in this case represented by fat jets
  • groomed jets, which is a collection of fat jets from which the subjets are derived (e.g. pruned, filtered, soft drop, top-tagged, etc. jets)
  • subjets, derived from the groomed fat jets

The "ghost" hadrons and partons clustered inside a fat jet are assigned to the closest subjet in the rapidity-phi space. Once hadrons and partons have been assigned to subjets, the subjet flavour is determined in the same way as for jets. Three jet collections are required as input for two reasons: to avoid possible inconsistencies between the fat jet and subjet flavours (such as a non-b fat jet having a b subjet and vice versa), and because re-clustering the constituents of groomed fat jets will generally result in a jet collection different from the input groomed fat jets. Also note that "ghost" particles generally cannot be clustered inside subjets in the same way this is done for fat jets. This is because some of the jet grooming techniques could reject such very soft particles. So instead, the "ghost" particles are assigned to the closest subjet.
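The assignment of a "ghost" particle to the closest subjet can be sketched as follows. The dictionary layout with "y" (rapidity) and "phi" keys is a hypothetical stand-in for the real jet and particle objects; the only point being illustrated is the distance in rapidity-phi space with the phi difference wrapped into (-pi, pi].

```python
import math


def delta_phi(phi1, phi2):
    """Difference of two azimuthal angles wrapped into (-pi, pi]."""
    dphi = phi1 - phi2
    while dphi > math.pi:
        dphi -= 2.0 * math.pi
    while dphi <= -math.pi:
        dphi += 2.0 * math.pi
    return dphi


def closest_subjet(ghost, subjets):
    """Index of the subjet closest to the ghost in (rapidity, phi).

    Returns None if the list of subjets is empty.
    """
    best_index, best_d2 = None, float("inf")
    for index, subjet in enumerate(subjets):
        d2 = ((ghost["y"] - subjet["y"]) ** 2
              + delta_phi(ghost["phi"], subjet["phi"]) ** 2)
        if d2 < best_d2:
            best_index, best_d2 = index, d2
    return best_index
```

The phi wrapping matters for fat jets close to phi = ±pi, where a ghost at phi = -3.1 is in fact close to a subjet at phi = 3.0.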

Finally, "ghost" leptons can also be clustered inside jets but they are not used in any way to determine the jet flavour. This functionality is optional and is potentially useful to identify jets from hadronic taus.

NOTE 1: Presently, the hadron-based jet flavour cannot be run on CaloJets. The reason is that CaloJets are reconstructed from CaloTowers with a primary vertex correction applied (the primary interaction vertex is set as the reference point for the calculation of the CaloTower four-momentum), and this correction is presently not implemented in the jet flavour tools. In addition, CaloTowers (*_towerMaker_*_*) are dropped from AOD starting from CMSSW_6_1_0, making it impossible to recluster CaloJets.

NOTE 2: The hadron-based jet flavour should not be run on groomed jets or any modification of the original jets where some of the jet constituents have been removed. This is because reclustering constituents of such jets will generally result in a different jet configuration (only original jets with all of their constituents included are stable under reclustering).

NOTE 3: Using "ghost" hadrons and partons seems like a natural way to define the jet flavour since the "ghost" particles participate in the jet clustering but don't change individual jets or the overall jet configuration and in that way end up clustered in only one jet or their own pure "ghost"-like jet. Hence, the jet flavour definition based on the catchment area of jets has a built-in ambiguity resolution and naturally handles different jet sizes. However, because of the extreme softness of the "ghost" particles and the way the pairwise distance measure is defined for different clustering algorithms, for the Cambridge/Aachen and kT clustering algorithms it is possible that two or more "ghost" particles first get clustered together into a pure "ghost"-like pseudojet and only then with one of the actual jet constituents (for the anti-kT algorithm the "ghost" particles will first get clustered with one of the actual jet constituents before being clustered with any of the other "ghost" particles). This means that the jet flavour for Cambridge/Aachen and kT jets will have a slight dependence on what "ghost" particles are being clustered (for example, the hadron-based flavour for individual jets might change slightly depending on whether "ghost" leptons are being clustered or not). Note that this affects only the flavour assignment: the jets themselves are unchanged, and the Cambridge/Aachen and kT algorithms remain infrared safe.
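The different behaviour of the three algorithms can be seen directly from the pairwise distance of the generalised-kT family, d_ij = min(pT_i^2p, pT_j^2p) ΔR_ij^2 / R^2, with p = 1 for kT, p = 0 for Cambridge/Aachen, and p = -1 for anti-kT. The sketch below compares the ghost-ghost distance with the ghost-constituent distance for one arbitrarily chosen configuration. The momenta and angular separations are illustrative assumptions, and the beam distance is ignored.

```python
def dij(pt_i, pt_j, delta_r, r_param, p):
    """Pairwise distance of the generalised-kT family.

    p = 1: kT, p = 0: Cambridge/Aachen, p = -1: anti-kT.
    """
    return min(pt_i ** (2 * p), pt_j ** (2 * p)) * (delta_r / r_param) ** 2


GHOST_PT = 20.0 * 1e-18  # a 20 GeV hadron rescaled by the default factor
HARD_PT = 5.0            # an actual jet constituent, in GeV
R_PARAM = 0.4            # jet size parameter


def ghosts_merge_first(p):
    """True if two nearby ghosts (dR = 0.05) are closer to each other than
    a ghost is to the nearest actual constituent (dR = 0.10)."""
    ghost_ghost = dij(GHOST_PT, GHOST_PT, 0.05, R_PARAM, p)
    ghost_hard = dij(GHOST_PT, HARD_PT, 0.10, R_PARAM, p)
    return ghost_ghost < ghost_hard
```

In this configuration the two ghosts merge with each other first for kT and Cambridge/Aachen, while for anti-kT the ghost-ghost distance is enormous (it scales with the inverse square of the ghost momentum) and each ghost is clustered with an actual constituent first.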

DataFormats

JetFlavourInfo, defined in SimDataFormats/JetMatching/interface/JetFlavourInfo.h, is a class where the following information is stored:

  • vector of EDM references to clustered b hadrons
  • vector of EDM references to clustered c hadrons
  • vector of EDM references to clustered partons
  • vector of EDM references to clustered leptons
  • the hadron-based flavour by value
  • the parton-based flavour by value

JetFlavourInfoMatchingCollection, defined in SimDataFormats/JetMatching/interface/JetFlavourInfoMatching.h, is a data format where an object of the JetFlavourInfo type is associated (using edm::AssociationVector) to each jet.

Plugins to get the jet flavour

All the following routines are defined in PhysicsTools/JetMCAlgos/plugins/. First a subset of the GenParticle collection needed to run the jet flavour definition is built using the HadronAndPartonSelector.cc plugin. The plugin selects hadrons, partons, and leptons from the GenParticle collection and stores vectors of EDM references to these particles in the event. The following hadrons are selected:

  • b hadrons that do not have other b hadrons as daughters
  • c hadrons that do not have other c hadrons as daughters
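The "no daughters of the same flavour" requirement keeps only the last heavy-flavour hadron of each decay chain, so that excited states are not counted in addition to the ground-state hadron they decay into. A minimal sketch, assuming particles are represented as dictionaries with "pdgId" and "daughters" keys (a hypothetical layout) and using a deliberately small, illustrative subset of b-hadron PDG IDs instead of a complete flavour test:

```python
# Small illustrative subset of b-hadron PDG IDs (not a complete list)
B_HADRON_IDS = {511, 521, 531, 513, 523, 5122}


def is_b_hadron(particle):
    return abs(particle["pdgId"]) in B_HADRON_IDS


def select_last_hadrons(particles, is_hadron_of_flavour):
    """Keep hadrons of a given flavour that have no daughters of that flavour."""
    return [
        p for p in particles
        if is_hadron_of_flavour(p)
        and not any(is_hadron_of_flavour(d) for d in p["daughters"])
    ]


# Toy decay chain: B*0 -> B0 gamma, B0 -> D- pi+
pion = {"pdgId": 211, "daughters": []}
d_meson = {"pdgId": -411, "daughters": []}
photon = {"pdgId": 22, "daughters": []}
b_zero = {"pdgId": 511, "daughters": [d_meson, pion]}
b_star = {"pdgId": 513, "daughters": [b_zero, photon]}

selected = select_last_hadrons([b_star, b_zero, d_meson, pion, photon], is_b_hadron)
```

In the toy chain only the B0 is selected. The B*0 is skipped because it has a b hadron among its daughters.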

As mentioned above, older Fortran-based Monte Carlo event generators/hadronizers (Pythia6 and Herwig6) follow the HEPEVT particle status code convention while newer C++-based Monte Carlo event generators/hadronizers (Pythia8, Herwig++, and Sherpa) follow the HepMC particle status code convention. However, both conventions give considerable freedom in defining the status codes of intermediate particle states. Hence, the parton selection is generator-dependent. Using the provenance information of the GenEventInfoProduct, the HadronAndPartonSelector.cc plugin attempts to automatically determine what generator/hadronizer was used to hadronize events and based on that information decides what parton selection mode to use. It is also possible to enforce any of the supported parton selection modes. The following partons are selected:

  • status=2 partons for Pythia6, Herwig6, and Herwig++
  • status=11 partons for Sherpa
  • for Pythia8, partons that don't have other partons as daughters, i.e., partons from the end of the parton showering sequence (in releases earlier than CMSSW_7_5_5 and 7_6_1, status=71 or 72 partons were selected instead, but this selection was found to be not inclusive enough)

In addition, the following leptons are selected:

  • status=1 electrons and muons
  • status=2 taus
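The generator-dependent parton selection and the lepton selection listed above can be sketched as follows. This is not the code of the HadronAndPartonSelector.cc plugin: the dictionary layout is hypothetical, the set of PDG IDs treated as partons (u, d, s, c, b quarks and gluons) is an assumption, and the automatic mode detection is not covered.

```python
PARTON_IDS = {1, 2, 3, 4, 5, 21}  # assumption: u, d, s, c, b quarks and gluons


def is_parton(particle):
    return abs(particle["pdgId"]) in PARTON_IDS


def select_partons(particles, mode):
    """Select partons according to the parton selection mode."""
    if mode in ("Pythia6", "Herwig6", "Herwig++"):
        return [p for p in particles if is_parton(p) and p["status"] == 2]
    if mode == "Sherpa":
        return [p for p in particles if is_parton(p) and p["status"] == 11]
    if mode == "Pythia8":
        # Partons from the end of the parton showering sequence
        return [p for p in particles
                if is_parton(p)
                and not any(is_parton(d) for d in p["daughters"])]
    raise ValueError("unsupported parton selection mode: " + mode)


def select_leptons(particles):
    """Select status=1 electrons and muons and status=2 taus."""
    return [p for p in particles
            if (abs(p["pdgId"]) in (11, 13) and p["status"] == 1)
            or (abs(p["pdgId"]) == 15 and p["status"] == 2)]
```

Note that the Pythia8 branch does not look at the status code at all, which is what makes it more inclusive than a fixed list of status values.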

The HadronAndPartonSelector.cc plugin has the following configurable parameters:

  • InputTag src: the GenEventInfoProduct name
  • InputTag particles: the GenParticle collection
  • string partonMode: the parton selection mode (supported modes are Auto, Pythia6, Pythia8, Herwig6, Herwig++, and Sherpa; default is Auto)

The selected hadrons and partons are finally used by the JetFlavourClustering.cc plugin that determines the jet flavour. The JetFlavourClustering.cc plugin has the following configurable parameters:

  • InputTag jets: the input jet collection
  • InputTag groomedJets: the input groomed jet collection (optional)
  • InputTag subjets: the input subjet collection (optional)
  • InputTag bHadrons: the input b hadrons (from the HadronAndPartonSelector.cc plugin)
  • InputTag cHadrons: the input c hadrons (from the HadronAndPartonSelector.cc plugin)
  • InputTag partons: the input partons (from the HadronAndPartonSelector.cc plugin)
  • InputTag leptons: the input leptons (from the HadronAndPartonSelector.cc plugin, optional)
  • string jetAlgorithm: the jet clustering algorithm (supported algorithms are CambridgeAachen, Kt, and AntiKt)
  • double rParam: the jet size
  • double ghostRescaling: the ghost momentum rescale factor (default is 10^-18)
  • bool hadronFlavourHasPriority: the switch to give priority to the hadron-based flavour
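As an illustration of how the two plugins and their parameters fit together, a configuration fragment could look like the sketch below. Several names in it are assumptions rather than facts taken from this page: the plugin names registered with the framework are assumed to match the file names, the module labels selectedHadronsAndPartons and jetFlavourInfosAK4PFJets are hypothetical, generator and genParticles are assumed to be the labels of the GenEventInfoProduct and GenParticle collections, and the product instance labels bHadrons, cHadrons, partons, and leptons should be checked against the release being used.

```python
import FWCore.ParameterSet.Config as cms

process = cms.Process("USER")

# Select hadrons, partons, and leptons from the GenParticle collection
process.selectedHadronsAndPartons = cms.EDProducer("HadronAndPartonSelector",
    src = cms.InputTag("generator"),           # assumed GenEventInfoProduct label
    particles = cms.InputTag("genParticles"),  # assumed GenParticle label
    partonMode = cms.string("Auto")
)

# Determine the jet flavour by re-clustering jets with "ghost" particles
process.jetFlavourInfosAK4PFJets = cms.EDProducer("JetFlavourClustering",
    jets = cms.InputTag("ak4PFJets"),
    # Instance labels below are assumptions
    bHadrons = cms.InputTag("selectedHadronsAndPartons", "bHadrons"),
    cHadrons = cms.InputTag("selectedHadronsAndPartons", "cHadrons"),
    partons = cms.InputTag("selectedHadronsAndPartons", "partons"),
    leptons = cms.InputTag("selectedHadronsAndPartons", "leptons"),
    # Must match the algorithm and size of the input jet collection
    jetAlgorithm = cms.string("AntiKt"),
    rParam = cms.double(0.4),
    ghostRescaling = cms.double(1e-18),
    hadronFlavourHasPriority = cms.bool(False)
)
```

The jetAlgorithm and rParam values have to be kept in sync with the input jet collection by hand, since the re-clustering only reproduces the original jets if both match.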

The jet flavour information is stored in the event as a JetFlavourInfoMatchingCollection which associates an object of type JetFlavourInfo to each of the jets. If groomed jets and subjets are defined, an additional JetFlavourInfoMatchingCollection is produced that provides the flavour information for subjets.

NOTE: Herwig++ status codes in CMSSW currently break the HepMC convention. Hence, the selection of status=2 partons is expected to change once the status codes are fixed.

Examples

An example is given in PhysicsTools/JetExamples/test/ with a simple analyzer and config file:

printJetFlavourInfo.py and printJetFlavourInfo.cc

Jet flavour in PAT

NOTE: Starting from CMSSW_7_4_12 and CMSSW_7_5_3 in the respective release cycles, the hadron-based flavour no longer gets priority over the parton-based jet flavour, and both lepton clustering and embedding of the full JetFlavourInfo object are enabled by default. In addition, the selection of generator partons being matched to jets was updated to allow defining the physics definition of the jet flavour.

Starting from CMSSW_7_0_5_patch1 the hadron-based jet flavour is the default jet flavour in PAT. In PAT the jet flavour ID is configured in the following sequence:

PhysicsTools/PatAlgos/python/mcMatchLayer0/jetFlavourId_cff.py

Users now have the option of separately accessing the hadron-based flavour via patJet->hadronFlavour() and the parton-based flavour via patJet->partonFlavour(). The parton-based definition closely resembles the legacy algorithmic definition. The hadron-based definition is also very similar in spirit to the legacy algorithmic definition but has one major advantage: it is generator-independent. However, its main disadvantage is that it cannot be used to distinguish different types of light-flavour jets. Hence, the main use case for the hadron-based flavour is b tagging. In case of conflicts between the hadron- and parton-based definitions, it is left to the user to decide how to handle such cases, which could depend on the specific use case. Finally, users interested in the physics definition can obtain it via patJet->genParton().pdgId().

If changing the default jet collection using the switchJetCollection(...) function, make sure the algo and rParam parameters are set correctly as in the following example

switchJetCollection(process,
    jetSource=cms.InputTag('ak4PFJetsCHS'),
    algo='AK',
    rParam=0.4,
    ...
)

Similarly, if adding a new jet collection, also make sure the algo and rParam parameters are set correctly as in the following example

addJetCollection(process,
    labelName='AK8PFCHS',
    jetSource=cms.InputTag('ak8PFJetsCHS'),
    algo='AK',
    rParam=0.8,
    ...
)

Note that the algo parameter is used to determine the jet clustering algorithm. The code looks for one of the following three two-character strings inside this parameter (case insensitive): ak (AntiKt), ca (Cambridge/Aachen), kt (Kt). In cases in which the hadron-based jet flavour either cannot work or is not supported, you can either switch to the legacy flavour (see the "Legacy jet flavour in PAT" section) or you can disable the jet flavour by adding getJetMCFlavour=False to the switchJetCollection(...) or addJetCollection(...) function.
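The case-insensitive lookup of the clustering algorithm can be pictured with the following sketch. It is not the PAT implementation: the function name is hypothetical, and the order in which the three substrings are tested is an assumption.

```python
def jet_algorithm_from_label(label):
    """Map a label such as 'AK', 'ak5' or 'CA8' to a clustering algorithm name."""
    lowered = label.lower()
    # Assumption: substrings are tested in this order
    for key, name in (("ak", "AntiKt"), ("ca", "CambridgeAachen"), ("kt", "Kt")):
        if key in lowered:
            return name
    raise ValueError("cannot determine the jet algorithm from '%s'" % label)
```

A label that contains none of the three substrings gives no way to determine the algorithm, which is why the parameter has to be set explicitly when switching or adding jet collections.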

In the 6_2_X_SLHC (starting from CMSSW_6_2_0_SLHC13) and 5_3_X (starting from 5_3_20) release cycles the hadron-based jet flavour is available but needs to be explicitly enabled since the legacy jet flavour is used by default. For the default jet collection in 5_3_X this can be done using the switchJetCollection(...) function where the jetIdLabel, rParam, and useLegacyFlavour parameters need to be specified as in the following example

switchJetCollection(process,
    jetCollection=cms.InputTag('ak5PFJets'),
    jetIdLabel='ak5',
    rParam=0.5,
    useLegacyFlavour=False,
    ...
)

Similarly, for added jets it is necessary to specify the rParam and useLegacyFlavour parameters in the addJetCollection(...) call as in the following example

addJetCollection(process,
    jetCollection=cms.InputTag('ak5PFJets'),
    algoLabel='AK5',
    typeLabel='PF',
    rParam=0.5,
    useLegacyFlavour=False,
    ...
)

Note that the jetIdLabel and algoLabel parameters are used to determine the jet clustering algorithm. The code looks for one of the following three two-character strings inside these parameters (case insensitive): ak (AntiKt), ca (Cambridge/Aachen), kt (Kt).

Support for subjets

Starting from CMSSW_7_3_0, subjet flavour is also supported in PAT. All you have to do is to specify the algo, rParam, fatJets, and groomedFatJets parameters in the addJetCollection(...) call as in the following example

addJetCollection(process,
    labelName = 'AK8PFCHSPrunedSubjets',
    jetSource = cms.InputTag('ak8PFJetsCHSPruned','SubJets'),
    algo = 'AK',
    rParam = 0.8,
    fatJets = cms.InputTag("ak8PFJetsCHS"),
    groomedFatJets = cms.InputTag("ak8PFJetsCHSPruned"),
    ...
)

Note that here the algo and rParam parameters refer to the clustering algorithm and jet size parameter for fat jets.

More detailed flavour info

It is also possible to get more detailed jet flavour information by accessing the full JetFlavourInfo object via patJet->jetFlavourInfo(), at which point you can loop over the clustered hadrons and partons in the same way it is done here. For example, with the full JetFlavourInfo object available, you can define as gluon splitting candidates those jets that have two b hadrons/quarks clustered inside them, etc. Possible ways to define the gluon splitting candidate jets are the following ones:

  • patJet->genParton().pdgId()==21 && patJet->jetFlavourInfo().getbHadrons().size()==2, or more simply
  • patJet->jetFlavourInfo().getbHadrons().size()==2

However, it is important to keep in mind that the former definition, while more likely to identify genuine gluon splitting jets, is less robust against hard final-state radiation, and vice versa for the latter definition.

Hadron-based origin identification of heavy-flavour jets

In some cases, such as events with multiple heavy-flavour jets, it might be important to have more detailed information on the origin of the different heavy-flavour jets. A tool has been developed that traces the hadron lineage back to the hard process particles and provides more detailed information on the heavy-flavour jet origin. See GenHFHadronMatcher for more details.

Legacy parton-based jet flavour definition

The following jet flavour definitions were used during LHC Run 1. The corresponding software tools were specifically designed for the Pythia6 event record, which follows the HEPEVT particle status code convention. There are three definitions, reflecting three different points of view:

  • Physics definition:
    • Match reconstructed jets to “initial” (status=3) partons from the primary physics process (within ΔR < 0.3 of the reconstructed jet axis). For example, for tt events, the initial partons would be: 2 b jets from the decays of the top quarks, 2 non-b jets per hadronic W decay, and no initial gluon jets.
    • No matching if hard final-state radiation (FSR) occurred and the parton direction changes significantly
    • No flavour assignment if there is no unambiguous answer (> 1 matched initial parton)
    • No flavour assignment if a status=2 b or c quark not originating from the matched initial parton found within ΔR<0.7. However, the flavour assignment is kept if the matched status=2 parton and the matched initial parton are both c quarks
    • Gluon jets splitting to c or b quarks are labeled "gluon"

  • Algorithmic definition:
    • Try to find the parton that most likely determines the properties of the jet and assign that flavour as the true flavour
    • Here, the “final state” partons (after showering, radiation) are analyzed (also within ΔR < 0.3 of the reconstructed jet axis). Partons selected for the algorithmic definition are those partons that don't have other partons as daughters, without any explicit requirement on their status (for Pythia6 these are status=2 partons).
    • Jets from radiation are matched with full efficiency
    • If there is a b/c within the jet cone: label as b/c
    • Otherwise: assign flavour of the hardest parton

  • Energetic definition:
    • This definition applies to GenJets where the constituents of a jet are a set of GenParticles
    • A variable is built for each jet computing the fraction of the energy of the jet which comes from b (c) hadrons
    • The 2 variables bRatio and cRatio can be used to attribute the flavour to the GenJet
    • A matched CaloJet (for example by ΔR) can get the same flavour as the matched GenJet.
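The matching logic of the physics and algorithmic definitions can be sketched as follows. This is a simplified illustration and not the JetPartonMatcher.cc code: the dictionary layout is hypothetical, "hardest" is assumed to mean highest transverse momentum, the sign of the PDG ID is dropped, and the physics definition omits the veto on nearby status=2 b or c quarks (ΔR<0.7) described above.

```python
import math


def delta_r(a, b):
    """Distance in (eta, phi) between two objects with 'eta' and 'phi' keys."""
    dphi = (a["phi"] - b["phi"] + math.pi) % (2.0 * math.pi) - math.pi
    return math.hypot(a["eta"] - b["eta"], dphi)


def algorithmic_flavour(jet, final_partons, cone=0.3):
    """b/c if a b/c parton is in the cone, else the hardest parton's flavour."""
    in_cone = [p for p in final_partons if delta_r(jet, p) < cone]
    flavours = [abs(p["pdgId"]) for p in in_cone]
    if 5 in flavours:
        return 5
    if 4 in flavours:
        return 4
    if in_cone:
        # Assumption: the hardest parton is the one with the highest pt
        return abs(max(in_cone, key=lambda p: p["pt"])["pdgId"])
    return 0


def physics_flavour(jet, initial_partons, cone=0.3):
    """Flavour of the single matched initial parton, else 0 (no assignment).

    Simplification: the veto on nearby status=2 b or c quarks is omitted.
    """
    matched = [p for p in initial_partons if delta_r(jet, p) < cone]
    if len(matched) != 1:
        return 0
    return abs(matched[0]["pdgId"])
```

With a hard gluon and a soft b quark both inside the cone, the algorithmic definition returns 5, which is the behaviour responsible for the gluon splitting "contamination" of b and c jets.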

Main differences:

  • Gluon splitting
    • Only algorithmic and energetic definitions see gluon splitting, physics definition “blind” to it
    • Important for some channels, e.g. if QCD is a serious background or any other channel with initial hard gluons, e.g. ttjj
    • Algorithmic definition causes some “contamination” from gluon splitting to b and c jets, leading to a dependence of the performance on the sample composition
    • All three definitions can be applied to GenJets while only the first two (Physics and Algorithmic definitions) can be applied to BasicJets, CaloJets and PFJets.
  • JEC
    • Only physics definition finds gluons in Herwig6/Herwig++, algorithmic does not (this is because Herwig splits gluons into qq pairs before the hadronization while the flavour code selects the last generation of parton shower partons)
    • JEC flavour uncertainties are produced with physics definition, with unmatched jets considered as gluons; this definition needs to be used for consistency
    • Algorithmic definition finds uds and g fractions much closer to 50% than physics definition does; this is assumed to be due to "randomization" taking place in the first step of the parton shower when status=3 partons turn into status=2 partons: g(3)->q(2)q(2) or g(2)g(2), q(3)->g(2)q(2) or q(2)g(2), with the hardest status=2 parton giving the flavour assignment

NOTE 1: Algorithmic definition is used by default for all b-tagging purposes.

NOTE 2: Physics definition is used for JEC flavour uncertainties.

NOTE 3: The code implementing the energetic definition probably does not work correctly for generators other than Pythia6/Pythia8. This is because of the assumption that particles in the decay chain of b hadrons have only one mother particle. However, in Herwig and Sherpa this is not always the case because of intermediate quarks introduced in the decay chain.

DataFormats

The flavour of a jet (calculated from the Algorithmic and/or Physics definition) is defined by associating to each jet a reference to a parton (u, d, s, c, b, or g) from the GenParticle collection.

MatchedPartons, defined in SimDataFormats/JetMatching/interface/MatchedPartons.h, is a class where 5 partons (using EDM references) are saved:

  • the heaviest flavour in the signal cone
  • the nearest status=2 parton
  • the nearest status=3 parton
  • the Physics definition parton
  • the Algorithmic definition parton

JetMatchedPartonsCollection, defined in SimDataFormats/JetMatching/interface/JetMatchedPartons.h, is a data format where an object of the MatchedPartons type is associated (using edm::AssociationVector) to each jet.

If you want to drop the GenParticle collection from the event (for example, to save space), there is another option to save in the event the flavour of a jet (by value).

JetFlavour, defined in SimDataFormats/JetMatching/interface/JetFlavour.h, allows saving the jet flavour (using the parton's PDG ID code), the parton momentum (using LorentzVector), and the vertex position.

JetFlavourMatchingCollection, defined in SimDataFormats/JetMatching/interface/JetFlavourMatching.h, is similar to JetMatchedPartonsCollection and allows saving in the event the JetFlavour object associated to each jet.

Finally, the Energetic definition is saved in the event using the JetFloatAssociation::Container objects defined in DataFormats/JetReco/interface/JetFloatAssociation.h.

Plugins to get the jet flavour

All the following routines are defined in PhysicsTools/JetMCAlgos/plugins/. First, a subset of the GenParticle collection needed to run the Physics and Algorithmic definitions is built using the PartonSelector.cc plugin. This reduces the number of particles over which the loop runs by a factor of 10 to 20. When you run on thousands of events, the saved CPU time is large. You can also choose to include status=3 leptons if needed.

The output of the PartonSelector.cc plugin is given as input to the JetPartonMatcher.cc plugin. Here both the Algorithmic and Physics definition are implemented. The parameters you can configure are the following:

  • InputTag jets: the jet collection for which you want to run the flavour identification
  • InputTag partons: the GenParticle collection to be used in the definition (usually what you get from the PartonSelector.cc plugin)
  • double coneSizeToAssociate: the size of the signal cone around the jet direction

If you want to drop the GenParticle collection from the event, you can use the JetFlavourIdentifier.cc plugin which has the following configurable parameters:

  • InputTag srcByReference: the output of JetPartonMatcher.cc
  • bool physicsDefinition: switch to choose between the Phys and Algo flavour definition.

If you need to save both definitions, you have to run the JetFlavourIdentifier.cc plugin twice changing the physicsDefinition value. After that, you are allowed to drop both the GenParticle collection and the JetMatchedPartonsCollection from the event keeping the jet flavour information in the JetFlavourMatchingCollection.

Finally, the Energetic definition is obtained through the GenJetBCEnergyRatio.cc plugin. The only InputTag needed is the GenJet collection. The producer saves in the event 2 JetFloatAssociation::Container objects: the bRatio and cRatio values are associated to each jet in the GenJet collection.
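The meaning of the bRatio and cRatio variables can be illustrated with the sketch below. It is not the GenJetBCEnergyRatio.cc code: the dictionary layout is hypothetical, and the ancestry of each constituent is assumed to be already known and stored in an "ancestor_flavours" set, whereas the plugin has to walk the decay chain of each GenParticle to find it.

```python
def energy_ratio(constituents, flavour):
    """Fraction of the jet energy carried by constituents that descend
    from a hadron of the given flavour (5 for bRatio, 4 for cRatio)."""
    total = sum(c["energy"] for c in constituents)
    if total <= 0.0:
        return 0.0
    tagged = sum(c["energy"] for c in constituents
                 if flavour in c["ancestor_flavours"])
    return tagged / total


# Toy GenJet: 30 GeV from a b -> c decay chain, 10 GeV from a prompt
# c hadron, and 60 GeV from constituents with no heavy-flavour ancestor
toy_constituents = [
    {"energy": 30.0, "ancestor_flavours": {5, 4}},
    {"energy": 10.0, "ancestor_flavours": {4}},
    {"energy": 60.0, "ancestor_flavours": set()},
]
b_ratio = energy_ratio(toy_constituents, 5)
c_ratio = energy_ratio(toy_constituents, 4)
```

How the two ratios are turned into a flavour label (for example by thresholds) is left to the user and is not specified here.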

Examples

Examples are given in PhysicsTools/JetExamples/test/ with two simple analyzers and config files:

printJetFlavour.py and printJetFlavour.cc (for the Algorithmic and Physics definition)

printGenJetRatio.py and printGenJetRatio.cc (for the Energetic definition)

Legacy jet flavour in PAT

In PAT the jet flavour ID is configured in the following sequence:

PhysicsTools/PatAlgos/python/mcMatchLayer0/jetFlavourId_cff.py

The default flavour definition is the algorithmic definition. You get the flavour via patJet->partonFlavour().

Note, however, that starting from CMSSW_7_0_5_patch1 the hadron-based jet flavour is the default flavour and to continue using the legacy flavour, you will need to explicitly enable it in your PAT configuration file by doing

getattr(process,'patJets').useLegacyJetMCFlavour = cms.bool(True)

where 'patJets' is the name of a PAT jet collection for which the legacy flavour is being enabled.

Review Status

Editor/Reviewer and date | Comments
ThomasSpeer - 14 Nov 2006 | page author (Thomas Speer)
JennyWilliams - 28 Feb 2007 | moved into SWGuide
AttilioSantocchia - 29 Oct 2007 | RECO/AOD tools included
DinkoFerencek - 15 Apr 2014 | Added description of hadron-based jet flavour definition
DinkoFerencek - 28 Apr 2015 | Replaced LXR code links with GitHub CMSSW_7_4_0 links
DinkoFerencek - 21 Sep 2015 | Updated TWiki to reflect the latest code changes

Responsible: ThomasSpeer
Last reviewed by: Reviewer

Topic revision: r41 - 2016-05-16 - DinkoFerencek