The CMS NanoAOD data tier

The NanoAOD format is an ntuple-like format, readable with bare ROOT, that contains the per-event information needed in most generic analyses. A NanoAOD file is not a CMSSW EDM file, but several EDM features are available in this simplified format too.

The size per event of NanoAOD is of order 1 kB. Samples can be centrally produced in about 1-2 weeks, and central production can be triggered often if many new features become available in "hot periods". Users can easily extend NanoAOD for their specific study with a private production when needed (suggested for specific studies on samples that are not too large, or for special needs), or can ask for new features to be included in the central format (suggested when one needs to run on multiple billions of events).

NanoAOD format

A NanoAOD file contains a main TTree named Events.

Additional auxiliary TTrees, namely LuminosityBlocks, Runs, MetaData and ParameterSets, are also included in the NanoAOD file.

The "Events" TTree

The main TTree contains only scalar branches or simple array branches. No special data format is needed to read NanoAOD files. The features of physics objects are grouped via naming conventions and by sharing the same array dimensionality. Auto-documentation is provided, so simply browsing through the NanoAOD event content should be enough to understand each branch.

As an example, there is a branch named Muon_pt[nMuon] and a branch named Muon_eta[nMuon]; both are arrays of length nMuon. Access as whole objects can be obtained with light frameworks (see e.g. the NanoAOD-Tools).
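
The grouping by naming convention can be illustrated with a minimal Python sketch. It assumes the event has already been read into a plain dictionary of lists; the helper `collect` and the numerical values are hypothetical and are not part of NanoAOD-Tools.

```python
from collections import namedtuple

def collect(event, prefix, fields):
    """Build one record per object from flat, parallel NanoAOD-style arrays."""
    n = event["n" + prefix]                 # e.g. nMuon
    Obj = namedtuple(prefix, fields)
    return [Obj(*(event[f"{prefix}_{f}"][i] for f in fields)) for i in range(n)]

# Hypothetical event content, for illustration only.
event = {"nMuon": 2, "Muon_pt": [41.5, 12.25], "Muon_eta": [0.5, -1.75]}

muons = collect(event, "Muon", ["pt", "eta"])
for mu in muons:
    print(mu.pt, mu.eta)
```

Every branch sharing the Muon_ prefix has length nMuon, so the i-th entries of all such branches describe the same muon.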

References across objects are implemented via branches with the Idx suffix. These are bare indices into the collection they point to: e.g. Electron_jetIdx is the index of the jet, if available, that contains a given electron. A value of -1 means the reference is null; a value i >= 0 means that the given Electron is associated with the i-th object in the Jet collection. Another example is Jet_genJetIdx, the index into the GenJet collection. An index can also point outside the boundary of the collection if some elements in the tail of the collection were dropped. For example, before accessing the GenJet associated with a Jet, one should verify that 0 <= Jet_genJetIdx < nGenJet.
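
A minimal sketch of the bounds check described above, in plain Python. The helper name `matched_index` and the example values are hypothetical.

```python
def matched_index(idx, n_target):
    """Return idx if it points to a stored object, otherwise None.

    Covers both the null reference (-1) and indices that point past the
    end of the target collection because its tail was dropped.
    """
    return idx if 0 <= idx < n_target else None

# Hypothetical content of one event: three jets, two stored gen jets.
jet_genJetIdx = [0, -1, 3]
genjet_pt = [52.0, 31.5]          # nGenJet = 2

matches = [matched_index(idx, len(genjet_pt)) for idx in jet_genJetIdx]
for i, m in enumerate(matches):
    if m is None:
        print(f"jet {i}: no usable gen jet")
    else:
        print(f"jet {i}: gen jet pt = {genjet_pt[m]}")
```

Note that in Python a bare -1 would silently index the last element of a list, so the explicit check matters even for the null case.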

HLT bits are automatically generated based on the HLT bits available in the input file. Each time files are merged (in CMSSW or via the haddnano.py utility), if the lists of bits do not match, the new/old bits are zero-filled so that the branches are properly aligned.
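
The zero-filling can be pictured with the following sketch. It is not the haddnano.py implementation; it only illustrates the idea on per-event dictionaries, and the path names HLT_PathA and HLT_PathB are invented for the example.

```python
def merge_hlt(*files):
    """Concatenate events from several inputs, zero-filling missing HLT bits.

    Each input is a list of per-event dicts mapping bit name -> bool.
    """
    all_events = [ev for events in files for ev in events]
    names = sorted(set().union(*(ev.keys() for ev in all_events)))
    # A bit that is absent from an event's input file is stored as False (0).
    return [{name: ev.get(name, False) for name in names} for ev in all_events]

# Hypothetical inputs: the second file knows one more trigger path.
file_a = [{"HLT_PathA": True}]
file_b = [{"HLT_PathA": False, "HLT_PathB": True}]

merged = merge_hlt(file_a, file_b)
print(merged)
```

A zero in the merged output therefore does not distinguish "trigger did not fire" from "trigger did not exist in that input".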

Object systematic uncertainties (e.g. all JES variations) are not stored persistently but, instead, the per-event information needed to compute those corrections is saved.

Many variables are stored with limited precision (i.e. less than a full 32-bit float) by zeroing a given number of bits in the float mantissa; this results in higher compression when stored on disk.
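
As an illustration of what zeroing mantissa bits does to a value, here is a minimal sketch using only the Python standard library. It implements plain truncation of an IEEE 754 single-precision float; whether the production code also rounds before truncating is not stated here, so treat that as an open assumption. The function name `truncate_mantissa` is hypothetical.

```python
import struct

def truncate_mantissa(value, keep_bits):
    """Keep only the top keep_bits of the 23-bit mantissa of a 32-bit float."""
    if not 0 <= keep_bits <= 23:
        raise ValueError("keep_bits must be between 0 and 23")
    (bits,) = struct.unpack("<I", struct.pack("<f", value))
    mask = (0xFFFFFFFF << (23 - keep_bits)) & 0xFFFFFFFF   # zero the low bits
    (out,) = struct.unpack("<f", struct.pack("<I", bits & mask))
    return out

reduced = truncate_mantissa(3.14159, 10)
print(reduced)
```

With 10 mantissa bits kept, the relative precision is about 2^-10, i.e. roughly 0.1%; the trailing zero bits are what makes the branch compress better.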

No skimming is applied at production level.

Some linking to gen and trigger objects is available. No cleaning is applied, but objects are cross-referenced (with indices, as discussed above). The cross-referencing is based on common PF constituents, not on DeltaR. DeltaR matching is less accurate and can be reproduced from NanoAOD if needed, while PF matching can only be done while reading MiniAOD. Predefined cleaning bitmasks can be used to store persistently a given cleaning choice (e.g. one specific to a whole PAG), so that the user can clean by simply checking the proper bit.

Specific information on some branches/objects

Photons

SlimmedPhotons with pT > 5 GeV are stored.

The full list of features stored in nanoAOD with CMSSW_10_6_19 (i.e. nanoAOD-v8) is here: nanoAOD:Photons

More details in EG POG nanoAOD Twiki

Electrons

SlimmedElectrons with pT > 5 GeV are stored.

Residual energy scale and resolution corrections are applied to the stored electrons to match the data: example. The original four-momentum (as stored in MiniAOD) can be obtained by rescaling by the reciprocal of Electron_eCorr.
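
The rescaling can be sketched as follows. This assumes that the whole four-momentum was scaled by Electron_eCorr, so that pT and mass scale while the direction (η, φ) is unchanged; the function name and the numbers are hypothetical.

```python
def uncorrected_electron(pt, eta, phi, mass, e_corr):
    """Undo the residual correction by scaling the four-momentum by 1 / e_corr."""
    scale = 1.0 / e_corr
    # Scaling a four-vector changes pt and mass but not its direction.
    return pt * scale, eta, phi, mass * scale

# Hypothetical electron to which a 1% upward correction had been applied.
pt0, eta0, phi0, mass0 = uncorrected_electron(50.5, 1.2, -0.3, 0.000511, 1.01)
print(pt0)
```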

The full list of features stored in nanoAOD with CMSSW_10_6_19 (i.e. nanoAOD-v8) is here: nanoAOD:Electrons

More details in EG POG nanoAOD Twiki

Muons

SlimmedMuons passing the following selection are stored in nanoAOD:

  • pT(mu) > 3 GeV
  • pass one of the muonID: 'CutBasedIdLoose' || 'SoftCutBasedId' || 'SoftMvaId' || 'CutBasedIdGlobalHighPt' || 'CutBasedIdTrkHighPt'

Branches related to the muon quality and properties (dxy, dz, number of matched stations, etc.), the isolation (PF isolation and miniIsolation), and the matching to GenMuon, Jets, etc. are also stored.

The full list of features stored in nanoAOD with CMSSW_10_6_19 (i.e. nanoAOD-v8) is here: nanoAOD:Muons

Taus

Taus with pT > 18 GeV that pass at least one of the loosest working points of the tau identification discriminators are stored. More details about the available discriminators and corrections can be found in the TAU ID Twiki.

Jets

SlimmedJets, i.e. AK4 PF CHS jets with JECs applied, with pT > 15 GeV are stored.

  • Jets are corrected with the latest available JECs and provided with the latest available JetID, PU ID, b-tag and QGL.
    • Jet_jetId is a jet ID flag encoding the different working points as passlooseID*1 + passtightID*2 + passtightLepVetoID*4. Then,
      • For EOY 2016 samples:
        • jetId==1 means: pass loose ID, fail tight, fail tightLepVeto
        • jetId==3 means: pass loose and tight ID, fail tightLepVeto
        • jetId==7 means: pass loose, tight, tightLepVeto ID.
      • For EOY 2017 and 2018, and UL 2016/2017/2018 samples:
        • jetId==2 means: pass tight ID, fail tightLepVeto
        • jetId==6 means: pass tight and tightLepVeto ID.
    • Jet_puId is a PU JetID flag encoding the different working points as passlooseID*4 + passmediumID*2 + passtightID*1. PU JetID should be applied only to AK4 CHS jets with pT < 50 GeV. Then,
      • puId==0 means 000: fail all PU ID;
      • puId==4 means 100: pass loose ID, fail medium, fail tight;
      • puId==6 means 110: pass loose and medium ID, fail tight;
      • puId==7 means 111: pass loose, medium, tight ID.
  • No systematic variations of JECs are stored, but the quantities (ρ, area, pT, η) needed to recompute JECs or JEC systematic variations are available. Dedicated nanoAOD-tools modules exist to compute those quantities.
  • b-jet energy regression, developed for final states with one or more H->bb, is also available.
  • Jets of MC samples are not smeared (dedicated nanoAOD-tools modules exist to apply the smearing).
  • The factor stored in Jet_muonSubtrFactor allows computing raw pT with muons subtracted (NanoAOD configuration and C++ code), using the same procedure as in the standard implementation of the type 1 correction to missing pT.
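
The Jet_jetId and Jet_puId encodings listed above can be decoded with simple bit tests. The sketch below is plain Python with hypothetical helper names; treating jets with pT of 50 GeV or more as automatically passing the PU ID is an assumption made here to reflect the statement that the PU ID should only be applied below 50 GeV.

```python
def decode_jet_id(jet_id):
    """Jet_jetId = passlooseID*1 + passtightID*2 + passtightLepVetoID*4."""
    return {
        "loose": bool(jet_id & 1),
        "tight": bool(jet_id & 2),
        "tightLepVeto": bool(jet_id & 4),
    }

def decode_pu_id(pu_id):
    """Jet_puId = passlooseID*4 + passmediumID*2 + passtightID*1."""
    return {
        "loose": bool(pu_id & 4),
        "medium": bool(pu_id & 2),
        "tight": bool(pu_id & 1),
    }

def passes_pu_id(pt, pu_id, wp="loose"):
    """Apply the PU ID only to jets with pt < 50 GeV (assumption: others pass)."""
    return pt >= 50.0 or decode_pu_id(pu_id)[wp]

print(decode_jet_id(6))   # tight and tightLepVeto set, loose bit not set
print(decode_pu_id(6))    # loose and medium set, tight not set
```

For the 2017, 2018 and UL samples the loose bit of Jet_jetId is never set, which is why only the values 2 and 6 appear in the list above.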

Branches CorrT1METJet_* provide a reduced set of properties of jets with corrected pT < 15 GeV (selection). They are needed to propagate JEC uncertainties or JER smearing into missing pT.

FatJets

SlimmedJetsAK8, i.e. AK8 PF jets with PUPPI and JECs applied, with pT > 175 GeV are stored.

The full list of features stored in nanoAOD with CMSSW_10_6_19 (i.e. nanoAOD-v8) is here: nanoAOD:FatJets

NB: In nanoAOD-v8, AK8 PF-PUPPI jets (FatJets) use PUPPI tune v11 (i.e. not the v15 tune used for PuppiMET). The upgrade to v15 is postponed to nanoAOD-v9.

MET

The default missing pT (MET_*) has the type 1 correction based on AK4 PF CHS jets applied. It does not include other corrections such as the φ modulation or propagation of JER smearing. The uncertainties for the default type 1 corrected PF MET can be obtained using the NanoAOD-tools package, specifically with jetmetHelperRun2, and the type 1 corrections can also be re-applied for a newer set of JECs.

Other types of missing pT such as PUPPI MET (type 1 corrected), CaloMET, TkMET, or raw PF (PUPPI) MET are also available. A set of uncertainties for PUPPI MET (type 1 corrected) are stored in the central NanoAOD production for the latest version.

NanoAOD versions up to and including v7 do not apply the correct type 1 corrections to PUPPI MET, but rather reuse the corrections applied during MiniAOD production, as reported in this issue. This mainly affects Run2018, where 2017 JECs are used for the type 1 corrections, while for other years potentially preliminary JEC versions have been applied. While this can be covered by JEC uncertainties across part of the phase space, please report any data/MC disagreement in type 1 PUPPI MET that is not covered by the JEC uncertainties. The size of the effect on the MET response in Run2018D has been found to be of order 2% in these slides.

For 2017, an alternative missing pT with the EE noise mitigation applied is provided in branches METFixEE2017_*. Except for this mitigation, it is defined in the same way as the default missing pT (MET_*).

Since nanoAOD-v8:

Trigger

Trigger bit information is available for all triggers in the input files. If multiple input files with different trigger bits are given, the bits not available in some events are filled with zeros. When merging two NanoAOD files with haddnano.py, the same algorithm is used to fill missing trigger bits (i.e. the output file will have zeros for events in which a given trigger bit was not available).

Trigger objects are stored in NanoAOD for some filters, as defined here. If you need the p4 of additional objects to be stored, please open an issue.

Generator, dressed leptons, LHE

Several collections are available for gen level information:

  • A subset of gen particles, selected according to this pruner configuration.
  • Dressed leptons are also available, as configured here.
  • Gen jets with pT > 10 GeV are stored. Parton and hadron flavour, as well as the link to the gen jet, are available in jets.
  • Some LHE information is stored (see here).

For reconstructed jets, the index of the matched particle-level jet is provided in branch Jet_genJetIdx. As explained above, a value of -1 indicates there is no match. The standard angular matching is used, with a cut of ΔR < 0.4.
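
For reference, an angular match of this kind can be reproduced from the stored η and φ values. The sketch below is a simple nearest-neighbour match in plain Python and is not the CMSSW matcher (it does not, for instance, resolve two jets claiming the same gen jet); the function names and example coordinates are hypothetical.

```python
import math

def delta_phi(phi1, phi2):
    """Azimuthal difference wrapped into [-pi, pi)."""
    return (phi1 - phi2 + math.pi) % (2.0 * math.pi) - math.pi

def delta_r(eta1, phi1, eta2, phi2):
    """Angular distance in the (eta, phi) plane."""
    return math.hypot(eta1 - eta2, delta_phi(phi1, phi2))

def closest_match(eta, phi, candidates, max_dr=0.4):
    """Index of the closest candidate (eta, phi) within max_dr, or -1 if none."""
    best_idx, best_dr = -1, max_dr
    for i, (c_eta, c_phi) in enumerate(candidates):
        dr = delta_r(eta, phi, c_eta, c_phi)
        if dr < best_dr:
            best_idx, best_dr = i, dr
    return best_idx

# Hypothetical gen jets as (eta, phi) pairs.
gen_jets = [(1.0, 0.5), (0.05, -3.1)]
print(closest_match(0.0, 3.1, gen_jets))   # second gen jet, across the phi wrap
print(closest_match(-2.0, 0.0, gen_jets))  # nothing within 0.4
```

The wrap in delta_phi matters for objects on either side of φ = ±π, which would otherwise appear far apart.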

Vertices and pileup

The expected (AKA ‘true’) number of pileup interactions in simulation is provided in branch Pileup_nTrueInt. Normally, this is a real number, but in NanoAOD it gets truncated to an integer (as in std::trunc, not std::round).
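
The difference between truncation and rounding is easy to see in a short Python sketch; `math.trunc` behaves like `std::trunc` for this purpose, and the input value is hypothetical.

```python
import math

def stored_n_true_int(n_true):
    """Value that ends up in the branch: truncated towards zero, not rounded."""
    return math.trunc(n_true)

n_true = 31.7                        # hypothetical true pileup value
print(stored_n_true_int(n_true))     # truncated value
print(round(n_true))                 # what rounding would have given instead
```

All true values in the interval [n, n+1) are therefore stored as n.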

Weights

The nominal generator-level weight is stored in branches genWeight and Generator_weight, which both contain the same value obtained from GenEventInfoProduct::weight() (here and here). This is typically the product of the nominal weight from LHE (if available) and the nominal weight from Pythia. The nominal LHE weight is available as LHEWeight_originalXWGTUP, filled from LHEEventProduct::originalXWGTUP() (here).

The table with generator weights (configuration) produces the following branches for systematic variations:

  • LHEScaleWeight: Variations in the factorization scale and the renormalization scale in the matrix element.
  • PSWeight: Variations in the renormalization scale in the parton shower.
  • LHEPdfWeight: Weights from a single PDF set. This is the first PDF set from the preference list given in the configuration, that is present in the sample. LHAPDF IDs of the weights that have actually been stored are mentioned in the title of the branch.
  • LHEReweightingWeight.

The "LuminosityBlocks" TTree

Contains the following branches:

  • run
  • genEventCount

The "Runs" TTree

Stores GEN-level information. The branches are filled in GenWeightsTableProducer.

The "MetaData" TTree

The "ParameterSets" TTree

Content auto-documentation

Each branch has built-in documentation distributed with every file. To get the description of the content of a branch named, e.g., SoftActivityJetNjets5, it is enough to do:

root [1] Events->GetBranch("SoftActivityJetNjets5")->GetTitle()
(const char *) "number of soft activity jet pt, pt >5"
root [2] 

A dump of the documentation of the content for different releases is available here (NB: the actual content of a branch could depend on the input sample for some generator quantities):

Centrally produced nanoAOD samples

NanoAOD are centrally produced. Available campaigns are

Correspondence between NanoAOD campaigns and CMSSW releases:

Campaign     CMSSW release
NanoAODv8    CMSSW_10_6_19_patch2
NanoAODv7    CMSSW_10_2_22
NanoAODv6    CMSSW_10_2_18
NanoAODv5    CMSSW_10_2_15

More information on the differences with respect to previous NanoAOD versions, and details on the configuration used for each production, are summarized in nanoAOD campaigns.

The NanoAOD-Tools

A simple Python-based post-processing framework, the NanoAOD-Tools, has been developed to aid analysers. It provides tools to skim events, compute various quantities, apply POG scale factors, submit CRAB jobs, etc. More information is collected in the relevant NanoAOD-Tools documentation.

-- AndreaRizzi - 2017-09-21

Topic revision: r100 - 2021-03-16 - LoukasGouskos
 