The CMS NanoAOD data tier

The NanoAOD format is an ntuple-like format, readable with bare ROOT, that contains the per-event information needed in most generic analyses. A NanoAOD file is not a CMSSW EDM file, but several EDM features are available in this simplified format too.

The size per event of NanoAOD is of order 1 kB. Samples can be centrally produced on a timescale of O(weeks), and central production can be triggered a few times per year if many new features become available in "hot periods". Users can easily extend NanoAOD for a specific study by making a private production when needed (suggested for samples that are not too large or for special needs), or can request an extension of the central format (suggested when running on multi-billion-event samples) by writing in the XPOG CMSTalk category or by opening an issue in the XPOG coordination GitLab.

Centrally produced nanoAOD samples

NanoAOD samples are centrally produced. All the information about NanoAOD versions, their content and changes, which CMSSW versions were used, and where to find the samples can be found in nanoAOD campaigns.

NanoAOD format

A NanoAOD file contains a main TTree named Events.

Additional auxiliary TTrees, namely: LuminosityBlocks, Runs, MetaData, ParameterSets, are also included in the nanoAOD file.

Content auto-documentation

Each branch has built-in documentation that is distributed with every file. To get the description of the content of a branch named e.g. SoftActivityJetNjets5, it is enough to do

root [1] Events->GetBranch("SoftActivityJetNjets5")->GetTitle()
(const char *) "number of soft activity jet pt, pt >5"
root [2] 

A dump of the content documentation for different releases is available here (NB: for some generator quantities the actual content of a branch can depend on the input sample).

The "Events" TTree

The main TTree contains only scalar branches or simple array branches. No special ROOT dictionary is needed to read NanoAODs. Features of physics objects are grouped via naming conventions and by sharing the same array dimensionality. Auto-documentation is provided, so simply browsing through the nanoAOD event content is a useful way to understand each branch.

As an example, there is a branch named Muon_pt[nMuon] and a branch named Muon_eta[nMuon]; both are arrays of length nMuon. Access to whole objects can be obtained with light frameworks.
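As a rough illustration of the naming convention, the sketch below regroups parallel arrays that share a prefix into one record per object. The helper name collect_objects and the dictionary standing in for an event are assumptions made for this example. They are not part of any NanoAOD framework.

```python
def collect_objects(event, prefix):
    """Group all '<prefix>_<field>' arrays of one event into per-object dicts.

    `event` is a plain dict standing in for one entry of the Events tree
    (hypothetical layout used only for this sketch).
    """
    n_objects = event["n" + prefix]
    tag = prefix + "_"
    fields = {
        name[len(tag):]: values
        for name, values in event.items()
        if name.startswith(tag)
    }
    return [
        {field: values[i] for field, values in fields.items()}
        for i in range(n_objects)
    ]


# Made-up values for one event with two muons.
event = {
    "nMuon": 2,
    "Muon_pt": [25.0, 10.0],
    "Muon_eta": [0.1, -1.2],
    "nJet": 0,
}
muons = collect_objects(event, "Muon")
```

Each element of muons is then a small record with pt and eta keys, which is roughly the kind of object view that light frameworks provide.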

References across objects are implemented via branches with an Idx suffix. These are bare indices into the collection they point to, e.g. Electron_jetIdx gives the index of the jet, if available, that contains a given electron. A value of -1 means the reference is null, while a value i >= 0 means that the given Electron is associated with the i-th object in the Jet collection. Another example is Jet_genJetIdx, the index into the GenJet collection. An index can also point beyond the end of the collection if some elements in the tail of the collection have been dropped. For example, before accessing the GenJet associated with a Jet, one should verify that 0 <= Jet_genJetIdx < nGenJet.
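The bounds check described above can be sketched as follows. The function names and the example values are hypothetical, and the lists stand in for the Jet_genJetIdx and GenJet_pt arrays of a single event.

```python
def resolve_index(idx, collection_size):
    """Return idx if it points inside the collection, otherwise None.

    Covers both the null reference (-1) and indices that point past the
    end of a collection whose tail was dropped.
    """
    if 0 <= idx < collection_size:
        return idx
    return None


def matched_values(indices, target_values):
    """Look up target_values[idx] for each index, or None if unresolved."""
    resolved = (resolve_index(idx, len(target_values)) for idx in indices)
    return [None if i is None else target_values[i] for i in resolved]


# Made-up event: four jets, three stored gen jets.
jet_gen_jet_idx = [0, -1, 5, 2]
gen_jet_pt = [52.3, 31.0, 18.7]
matched_gen_pt = matched_values(jet_gen_jet_idx, gen_jet_pt)
```

Here the second jet has no match (-1) and the third points to a gen jet that was not stored (index 5 with only three gen jets), so both resolve to None.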

HLT bits are automatically generated based on the HLT bits available in the input file. Each time files are merged (in CMSSW or via the haddnano.py utility), if the lists of bits do not match, the bits missing on either side are zero-filled so that the branches stay properly aligned.
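The zero-filling behaviour can be pictured with the toy merge below. This is only a sketch of the idea, not the haddnano.py implementation. The input layout (a tuple of event count and a dict of per-event bits) and the path names HLT_PathA/B/C are invented for the example.

```python
def merge_trigger_bits(inputs):
    """Merge per-file trigger bits, zero-filling bits a file does not have.

    `inputs` is a list of (n_events, bits) tuples, where `bits` maps a
    trigger branch name to a list of n_events booleans.
    """
    all_names = sorted({name for _, bits in inputs for name in bits})
    merged = {name: [] for name in all_names}
    for n_events, bits in inputs:
        for name in all_names:
            # A bit missing from this input is filled with False (zero).
            merged[name].extend(bits.get(name, [False] * n_events))
    return merged


file_a = (2, {"HLT_PathA": [True, False], "HLT_PathB": [False, True]})
file_b = (1, {"HLT_PathA": [True], "HLT_PathC": [True]})
merged = merge_trigger_bits([file_a, file_b])
```

After the merge every branch has three entries, and a zero in HLT_PathB or HLT_PathC may simply mean that the bit did not exist in the originating file.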

Object systematic uncertainties (e.g. all JES variations) are not stored persistently but, instead, the per-event information needed to compute those corrections is saved.

Many variables are stored with limited precision (i.e. less than that of a full 32-bit float) by zeroing a given number of bits in the float mantissa, which results in better compression when stored on disk.
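To give a feeling for the size of the effect, the sketch below zeroes the low bits of a 32-bit float mantissa. It assumes plain truncation with no rounding step, which may differ from what the production code does, and the function name is invented for this example.

```python
import struct


def truncate_mantissa(value, kept_bits):
    """Keep only the `kept_bits` most significant of the 23 mantissa bits.

    The value is first converted to a 32-bit float, then the remaining
    low mantissa bits are set to zero (assumption: no rounding).
    """
    if not 0 <= kept_bits <= 23:
        raise ValueError("kept_bits must be between 0 and 23")
    (bits,) = struct.unpack("<I", struct.pack("<f", value))
    mask = (0xFFFFFFFF << (23 - kept_bits)) & 0xFFFFFFFF
    (reduced,) = struct.unpack("<f", struct.pack("<I", bits & mask))
    return reduced


reduced = truncate_mantissa(3.14159, 10)
relative_error = abs(reduced - 3.14159) / 3.14159
```

With 10 mantissa bits kept, the relative error stays below 2^-10, i.e. about 0.1%. The trailing zero bits are what makes the stored values compress better.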

No skimming is applied at production level.

Some linking between reco-level and gen-level objects is available. No cross-collection cleaning is applied, but many objects are cross-referenced (with indices, as discussed above). The cross-referencing is (mostly) based on the sharing of common PF constituents, not on DeltaR (there are some exceptions, such as electrons and photons, which are matched based on the sharing of ECAL clusters). DeltaR matching is less accurate and can be reproduced from NanoAOD if needed, while PF matching can only be done while reading MiniAOD. These matching indices can then be used in user code to apply any chosen cross-cleaning logic.
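As one possible cross-cleaning choice, the sketch below drops every jet that is referenced by a selected electron through Electron_jetIdx. The function name, the selection flags, and the example values are assumptions for illustration. An analysis may well want a different logic.

```python
def jets_not_overlapping(n_jet, electron_jet_idx, electron_selected):
    """Return indices of jets not referenced by any selected electron.

    electron_jet_idx stands in for Electron_jetIdx; electron_selected is
    a per-electron boolean from some user-defined selection.
    """
    overlapping = {
        idx
        for idx, selected in zip(electron_jet_idx, electron_selected)
        if selected and 0 <= idx < n_jet
    }
    return [j for j in range(n_jet) if j not in overlapping]


# Made-up event: 4 jets, 3 electrons; the second electron fails selection.
clean_jets = jets_not_overlapping(
    n_jet=4,
    electron_jet_idx=[1, 2, -1],
    electron_selected=[True, False, True],
)
```

Only jet 1 is removed here: jet 2 is referenced by an electron that fails the selection, and the third electron has a null reference.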

Specific information on some branches/objects

Photons

SlimmedPhotons with pT > 5 GeV are stored.

Up to NanoAODv10, residual energy scale and resolution corrections are applied to the stored photons to match the data. The original four-momentum (as stored in MiniAOD) can be obtained by rescaling by the reciprocal of Photon_eCorr.

Starting from NanoAODv11, the corrections need to be applied on top of NanoAOD using correctionlib.

More details in EG POG nanoAOD Twiki

Electrons

SlimmedElectrons with pT > 5 GeV are stored.

Up to NanoAODv10, residual energy scale and resolution corrections are applied to the stored electrons to match the data: example. The original four-momentum (as stored in MiniAOD) can be obtained by rescaling by the reciprocal of Electron_eCorr.
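The rescaling can be sketched as below for versions where the residual correction is already applied to the stored quantities. Scaling a four-momentum by a scalar changes pT and mass but leaves η and φ untouched, so only those two are returned. The function name and the numbers are made up, and the same arithmetic would apply with Photon_eCorr.

```python
def undo_residual_correction(pt, mass, e_corr):
    """Rescale pT and mass by 1 / e_corr to undo the residual correction.

    e_corr stands in for Electron_eCorr (or Photon_eCorr); eta and phi
    are unchanged by a scalar rescaling of the four-momentum.
    """
    if e_corr <= 0.0:
        raise ValueError("e_corr must be positive")
    scale = 1.0 / e_corr
    return pt * scale, mass * scale


# Made-up electron: stored pT of 50.5 GeV with a 1% upward correction.
original_pt, original_mass = undo_residual_correction(50.5, 0.0, 1.01)
```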

Starting from NanoAODv11, the corrections need to be applied on top of NanoAOD using correctionlib.

More details in EG POG nanoAOD Twiki

Muons

SlimmedMuons passing the following selection are stored in nanoAOD:

  • pT(mu) > 3 GeV
  • pass one of the muonID: 'CutBasedIdLoose' || 'SoftCutBasedId' || 'SoftMvaId' || 'CutBasedIdGlobalHighPt' || 'CutBasedIdTrkHighPt'

Branches related to the muon quality and properties (dxy, dz, number of matched stations, etc.), the isolation (PF isolation and miniIsolation), and matching to GenMuon, Jets, etc. are also stored.

No residual correction to the muon momentum is applied.

Taus

Taus with pT > 18 GeV that pass the weakest working point of at least one tau identification discriminator are stored. More details about the available discriminators and corrections can be found in the TAU ID Twiki.

Jets

SlimmedJets, i.e. AK4 PF jets with PUPPI (for Run 3) or CHS (for Run 2, up to NanoAODv9) and with JECs applied, are stored if they have pT > 15 GeV.

  • Jets are corrected with the latest available JECs and provided with the latest available JetID, PU ID, b-tag and QGL.
    • Jet_jetId is a jet ID flag corresponding to the different working points. In this case: the flag represents passlooseID*1+passtightID*2+passtightLepVetoID*4. Then,
      • For EOY 2016 samples:
        • jetId==1 means: pass loose ID, fail tight, fail tightLepVeto
        • jetId==3 means: pass loose and tight ID, fail tightLepVeto
        • jetId==7 means: pass loose, tight, tightLepVeto ID.
      • For EOY 2017 and 2018, and UL 2016/2017/2018 samples:
        • jetId==2 means: pass tight ID, fail tightLepVeto
        • jetId==6 means: pass tight and tightLepVeto ID.
    • Jet_puId is a PU JetID flag corresponding to the different working points. PU JetID should be applied only to AK4 CHS jets with pt < 50 GeV. In this case: the flag represents passlooseID*4+passmediumID*2+passtightID*1. Then,
      • puId==0 means 000: fail all PU ID;
      • puId==4 means 100: pass loose ID, fail medium, fail tight;
      • puId==6 means 110: pass loose and medium ID, fail tight;
      • puId==7 means 111: pass loose, medium, tight ID.
      • NOTE (for NanoAODv9 2016 UL): Please consult the 2016 UL PU ID twiki. Special care is needed because of a bug in the NanoAODv9 production.

  • No systematic variations of JECs are stored, but the quantities (ρ, area, pT, η) needed to recompute JECs or JEC systematic variations are available.
  • b-jet energy regression, developed for final states with one or more H->bb, is also available.
  • Jets of MC samples are not smeared.
  • The factor stored in Jet_muonSubtrFactor allows computing raw pT with muons subtracted (NanoAOD configuration and C++ code), using the same procedure as in the standard implementation of the type 1 correction to missing pT.
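The Jet_jetId and Jet_puId encodings listed above can be unpacked with simple bit masks. The sketch below follows the bit assignments given in the list (loose*1 + tight*2 + tightLepVeto*4 for jetId, loose*4 + medium*2 + tight*1 for puId). The function names are invented for this example.

```python
def decode_jet_id(jet_id):
    """Unpack Jet_jetId: bit 0 = loose, bit 1 = tight, bit 2 = tightLepVeto."""
    return {
        "loose": bool(jet_id & 1),
        "tight": bool(jet_id & 2),
        "tightLepVeto": bool(jet_id & 4),
    }


def decode_pu_id(pu_id):
    """Unpack Jet_puId: bit 2 = loose, bit 1 = medium, bit 0 = tight."""
    return {
        "loose": bool(pu_id & 4),
        "medium": bool(pu_id & 2),
        "tight": bool(pu_id & 1),
    }


jet_flags = decode_jet_id(6)  # tight and tightLepVeto, no loose bit
pu_flags = decode_pu_id(6)    # loose and medium, not tight
```

Note that the loose jet ID bit is not set for the 2017, 2018, and UL samples listed above, so a selection such as "passes tight ID" is more safely written as a test on bit 1 than as a comparison with a specific integer.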

Branches CorrT1METJet_* provide a reduced set of properties of jets with corrected pT < 15 GeV. They are needed to propagate JEC uncertainties or JER smearing into missing pT.

FatJets

SlimmedJetsAK8, i.e. AK8 PF jets with PUPPI and JECs applied, are stored if they have pT > 175 GeV.

MET

The default missing pT (MET_*) has the type 1 correction based on AK4 PF CHS jets applied. It does not include other corrections such as the φ modulation or propagation of JER smearing. The uncertainties for the default type 1 corrected PF MET can be obtained using the NanoAOD-tools package, specifically with jetmetHelperRun2, and the type 1 corrections can also be re-applied for a newer set of JECs.

Other types of missing pT such as PUPPI MET (type 1 corrected), CaloMET, TkMET, or raw PF (PUPPI) MET are also available. A set of uncertainties for PUPPI MET (type 1 corrected) are stored in the central NanoAOD production for the latest version.

NanoAOD versions up to and including v7 do not apply the correct Type-1 corrections to PUPPI MET, but rather reuse the corrections applied during MiniAOD production, as reported in this issue. This mainly affects Run2018, where 2017 JECs are used for the Type-1 corrections, while for other years potentially preliminary JEC versions have been applied. While this can be covered by JEC uncertainties across part of the phase space, please report any data/MC disagreement in Type-1 PUPPI MET that is not covered by the JEC uncertainties. The size of the effect on the MET response in Run2018D has been found to be of order 2% in these slides.

For 2017, an alternative missing pT with the EE noise mitigation applied is provided in branches METFixEE2017_*. Except for this mitigation, it is defined in the same way as the default missing pT (MET_*).

Since nanoAOD-v8:

Trigger

Trigger bit information is available for all triggers in the input files. If multiple input files with different trigger bits are given, the bits not available in some events are filled with zeros. When merging two NanoAOD files with haddnano.py, the same algorithm is used to fill missing trigger bits (i.e. the output file will have zeros for events in which a given trigger bit was not available).

Trigger objects are stored in NanoAOD for some filters, as defined here. If you need the p4 of some additional object to be stored, please open an issue.

Generator, dressed leptons, LHE

Several collections are available for gen level information:

  • A subset of gen particles, selected according to this pruner configuration
  • Dressed leptons are also available, as configured here
  • Gen jets with pT > 10 GeV are stored. Parton and hadron flavour, as well as the link to the gen jet, are available in jets
  • Some LHE information is stored (see here)

For reconstructed jets, the index of the matched particle-level jet is provided in branch Jet_genJetIdx. As explained above, a value of -1 indicates there is no match. The standard angular matching is used, with a cut of ΔR < 0.4.
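For readers who want to reproduce an angular matching from NanoAOD quantities, the sketch below picks the closest candidate within a ΔR cone. Choosing the nearest candidate and returning -1 when none is found are assumptions of this example. The production matcher may resolve ambiguities differently.

```python
import math


def delta_phi(phi1, phi2):
    """Signed azimuthal difference wrapped into [-pi, pi)."""
    return (phi1 - phi2 + math.pi) % (2.0 * math.pi) - math.pi


def delta_r(eta1, phi1, eta2, phi2):
    """Angular distance in the (eta, phi) plane."""
    return math.hypot(eta1 - eta2, delta_phi(phi1, phi2))


def match_index(eta, phi, cand_eta, cand_phi, max_dr=0.4):
    """Index of the closest candidate with delta R < max_dr, or -1."""
    best_index, best_dr = -1, max_dr
    for i, (c_eta, c_phi) in enumerate(zip(cand_eta, cand_phi)):
        dr = delta_r(eta, phi, c_eta, c_phi)
        if dr < best_dr:
            best_index, best_dr = i, dr
    return best_index


# Made-up jet near phi = pi, with two gen-jet candidates.
gen_idx = match_index(0.0, 3.1, cand_eta=[1.0, 0.1], cand_phi=[0.0, -3.1])
```

The second candidate is matched even though its φ has the opposite sign, because the azimuthal difference is wrapped around ±π before computing ΔR.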

Vertices and pileup

The expected (AKA ‘true’) number of pileup interactions in simulation is provided in branch Pileup_nTrueInt. Note that this should be a real number, but in NanoAOD up to and including NanoAODv9 it gets truncated to an integer (as in std::trunc, not std::round).
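The difference between truncation and rounding matters when building pileup profiles, since truncation shifts every non-integer value down to the integer below it. A minimal illustration with made-up values:

```python
import math

# Made-up 'true' pileup means, as real numbers.
true_means = [34.2, 34.9]

# What a truncated integer branch would contain (as in std::trunc).
truncated = [math.trunc(x) for x in true_means]

# What rounding to the nearest integer would have given instead.
rounded = [round(x) for x in true_means]
```

Both values end up in the same integer bin after truncation, whereas rounding would have separated them. Any pileup reweighting based on the truncated branch should use a binning consistent with this behaviour.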

Weights

The nominal generator-level weight is stored in branches genWeight and Generator_weight, which both contain the same value obtained from GenEventInfoProduct::weight() (here and here). This is typically the product of the nominal weight from LHE (if available) and the nominal weight from Pythia. The nominal LHE weight is available as LHEWeight_originalXWGTUP, filled from LHEEventProduct::originalXWGTUP() (here).

The table with generator weights (configuration) produces the following branches for systematic variations:

  • LHEScaleWeight: Variations in the factorization scale and the renormalization scale in the matrix element.
  • PSWeight: Variations in the renormalization scale in the parton shower.
  • LHEPdfWeight: Weights from a single PDF set. This is the first PDF set from the preference list given in the configuration, that is present in the sample. LHAPDF IDs of the weights that have actually been stored are mentioned in the title of the branch.
  • LHEReweightingWeight.

The "LuminosityBlocks" TTree

Contains the following branches:

  • run
  • genEventCount

The "Runs" TTree

Stores GEN-level information. The branches are filled in GenWeightsTableProducer.

The "MetaData" TTree

The "ParameterSets" TTree

Producing private (customized) NanoAODs

Please see here: https://gitlab.cern.ch/cms-nanoAOD/nanoaod-doc/-/wikis/Instructions/Private-production

Contributing to NanoAOD

Please see here: https://gitlab.cern.ch/cms-nanoAOD/nanoaod-doc/-/wikis/Instructions/Contributing-to-NanoAOD

-- SebastienWertz - 2023-01-27

Topic revision: r105 - 2023-01-30 - SebastienWertz
 