The CMS NanoAOD data tier
The NanoAOD format is an ntuple-like format, readable with bare ROOT, that contains the per-event information needed in most generic analyses. A NanoAOD file is not a CMSSW EDM file, but several EDM features are available in this simplified format too.
The size per event of NanoAOD is of order 1 kB. Samples can be centrally produced in O(weeks), and a central production can be triggered a few times per year if many new features become available in "hot periods". Users can easily extend NanoAOD for their specific study, either by making a private production when needed (suggested for specific studies on samples that are not too large, or for special needs) or by requesting additions to the central format (suggested when running on multi-billion-event samples) by writing on the XPOG CMStalk category or by opening an issue in the XPOG coordination gitlab.
Centrally produced nanoAOD samples
NanoAOD samples are centrally produced. All the information about NanoAOD versions, their content and changes, which CMSSW versions were used, and where to find the samples can be found in nanoAOD campaigns.
NanoAOD format
A NanoAOD file contains a main TTree named Events. Additional auxiliary TTrees, namely LuminosityBlocks, Runs, MetaData and ParameterSets, are also included in the file.
Content auto-documentation
Each branch has built-in documentation that is distributed with every file. To get the description of the content of a branch named, e.g., SoftActivityJetNjets5, it is enough to do:
root [1] Events->GetBranch("SoftActivityJetNjets5")->GetTitle()
(const char *) "number of soft activity jet pt, pt >5"
root [2]
A dump of the content documentation for different releases is available here (NB: the actual content of a branch can depend on the input sample for some generator quantities).
The "Events" TTree
The main TTree contains only scalar branches or simple array branches. No special ROOT dictionary is needed to read NanoAODs. Features of physics objects are grouped via naming conventions and by sharing the same array dimensionality. Auto-documentation is provided, so simply browsing through the NanoAOD event content is a useful way to understand each branch.
As an example, there is a branch named Muon_pt[nMuon] and a branch named Muon_eta[nMuon]; both are arrays of length nMuon. Access as whole objects can be obtained with light frameworks.
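The naming convention can be illustrated with a minimal sketch that regroups the flat arrays of one event into per-object records. The event dictionary, its values and the helper name `zip_collection` are hypothetical; real analyses would typically rely on one of the light frameworks mentioned above.

```python
def zip_collection(event, prefix):
    """Group all `<prefix>_*` array branches of one event into per-object dicts."""
    n_objects = event["n" + prefix]
    # Map the part of the branch name after "<prefix>_" to its array.
    fields = {
        name[len(prefix) + 1:]: values
        for name, values in event.items()
        if name.startswith(prefix + "_")
    }
    return [
        {field: values[i] for field, values in fields.items()}
        for i in range(n_objects)
    ]


# Hypothetical event with two muons (values are made up for illustration).
event = {"nMuon": 2, "Muon_pt": [45.2, 12.7], "Muon_eta": [0.31, -1.84]}
muons = zip_collection(event, "Muon")
for muon in muons:
    print(muon["pt"], muon["eta"])
```

Because the grouping relies only on the shared prefix and the shared length nMuon, the same helper would work for any other collection that follows the convention.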
References across objects are implemented via branches with an Idx suffix. These are bare indices whose name indicates the collection they point to. For example, Electron_jetIdx holds the index of the jet, if any, that contains a given electron: a value of -1 means the reference is null, while a value i >= 0 means that the electron is associated with the i-th object in the Jet collection. Another example is Jet_genJetIdx, the index into the GenJet collection. An index can also point outside the boundary of the collection if some elements in the tail of the collection are dropped. For example, before accessing the GenJet associated with a Jet, one should verify that 0 <= Jet_genJetIdx < nGenJet.
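The bounds check described above can be sketched as follows. The helper name `deref` and the numbers are hypothetical; the arrays stand in for the Jet_genJetIdx and GenJet_pt branches of one event.

```python
def deref(idx, collection):
    """Return collection[idx] if idx is a valid reference, else None.

    Covers both the null reference (-1) and indices that point past the
    end of a collection whose tail has been dropped.
    """
    if 0 <= idx < len(collection):
        return collection[idx]
    return None


# Hypothetical event: three jets, three stored gen jets.
jet_gen_jet_idx = [0, -1, 5]      # stands in for Jet_genJetIdx
gen_jet_pt = [52.0, 33.5, 20.1]   # stands in for GenJet_pt, nGenJet = 3

matched_gen_pt = [deref(idx, gen_jet_pt) for idx in jet_gen_jet_idx]
print(matched_gen_pt)  # [52.0, None, None]
```

Note that a plain Python or NumPy lookup with index -1 would silently return the last element, so the explicit check matters even for the null case.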
HLT bits are automatically generated based on the HLT bits available in the input file. Each time files are merged (in CMSSW or via the haddnano.py utility), if the lists of bits do not match, the new/old bits are zero-filled so that the branches are properly aligned.
Object systematic uncertainties (e.g. all JES variations) are not stored persistently; instead, the per-event information needed to compute those corrections is saved.
Many variables are stored with limited precision (i.e. less than the full precision of a 32-bit float) by zeroing a given number of bits in the float mantissa (this results in better compression when stored on disk).
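The effect of zeroing mantissa bits can be illustrated with a minimal sketch. It uses plain truncation of the low mantissa bits of an IEEE 754 single-precision float; the helper name `reduce_mantissa` is hypothetical, and the actual production code may differ in detail (for instance by rounding instead of truncating).

```python
import struct


def reduce_mantissa(value, bits_kept):
    """Keep only the `bits_kept` most significant of the 23 mantissa bits."""
    # Reinterpret the float32 bit pattern as an unsigned 32-bit integer.
    (raw,) = struct.unpack("<I", struct.pack("<f", value))
    # Zero the (23 - bits_kept) least significant mantissa bits;
    # sign and exponent bits are left untouched.
    mask = 0xFFFFFFFF & ~((1 << (23 - bits_kept)) - 1)
    (reduced,) = struct.unpack("<f", struct.pack("<I", raw & mask))
    return reduced


pt = 45.123456  # made-up value for illustration
pt_reduced = reduce_mantissa(pt, 10)
print(pt_reduced, abs(pt_reduced - pt) / pt)
```

With 10 mantissa bits kept, the relative error stays below 2^-10 (about 0.1%), while the long run of trailing zero bits compresses well on disk.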
No skimming is applied at production level.
Some linking between reco-level and gen-level objects is available.
No cross-collection cleaning is applied, but many objects are cross-referenced (with indices, as discussed above). The cross-referencing is (mostly) based on the sharing of common PF constituents, not on DeltaR (there are some exceptions, such as electrons and photons, which are matched based on the sharing of ECAL clusters). DeltaR matching is less accurate and can be reproduced from NanoAOD if needed, while PF matching can only be done while reading MiniAOD.
Those matching indices can then be used in the user code to apply any chosen cross-cleaning logic.
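As one possible cross-cleaning logic, the sketch below removes jets that are referenced by a selected electron through Electron_jetIdx. The function name, the selection flags and the numbers are hypothetical; which objects to clean against which is an analysis choice.

```python
def clean_jets(n_jet, electron_jet_idx, electron_selected):
    """Return the indices of jets not associated with any selected electron."""
    overlapping = {
        idx
        for idx, selected in zip(electron_jet_idx, electron_selected)
        # Ignore null (-1) and out-of-range references.
        if selected and 0 <= idx < n_jet
    }
    return [i for i in range(n_jet) if i not in overlapping]


# Hypothetical event: 4 jets, 3 electrons.
electron_jet_idx = [1, -1, 3]             # stands in for Electron_jetIdx
electron_selected = [True, True, False]   # result of a user-defined electron selection

print(clean_jets(4, electron_jet_idx, electron_selected))  # [0, 2, 3]
```

Jet 1 is dropped because it contains a selected electron, while jet 3 is kept because the electron pointing to it fails the selection.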
Specific information on some branches/objects
Photons
SlimmedPhotons with pT > 5 GeV are stored.
Up to NanoAODv10, residual energy scale and resolution corrections are applied to the stored photons to match the data. The original four-momentum (as stored in MiniAOD) can be obtained by rescaling by the reciprocal of Photon_eCorr.
Starting from NanoAODv11, the corrections need to be applied on top of NanoAOD using correctionlib.
More details in the EG POG nanoAOD Twiki.
Electrons
SlimmedElectrons with pT > 5 GeV are stored.
Up to NanoAODv10, residual energy scale and resolution corrections are applied to the stored electrons to match the data (example). The original four-momentum (as stored in MiniAOD) can be obtained by rescaling by the reciprocal of Electron_eCorr.
Starting from NanoAODv11, the corrections need to be applied on top of NanoAOD using correctionlib.
More details in the EG POG nanoAOD Twiki.
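For samples where the corrections are already applied, the rescaling by the reciprocal of Electron_eCorr (or Photon_eCorr) can be sketched as below. This assumes that the correction multiplies the whole four-momentum by a single factor, so that pT scales with it while η and φ are unchanged; the function name and the numbers are hypothetical.

```python
def uncorrected_pt(pt, e_corr):
    """Undo the stored energy correction on pT by dividing by eCorr.

    Assumption: the correction scales the full four-momentum by `e_corr`,
    so eta and phi are unaffected and only the magnitude changes.
    """
    return pt / e_corr


electron_pt = 50.5      # stands in for Electron_pt (corrected), made-up value
electron_e_corr = 1.01  # stands in for Electron_eCorr, made-up value

print(uncorrected_pt(electron_pt, electron_e_corr))
```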
Muons
SlimmedMuons passing the following selection are stored in nanoAOD:
- pT(mu) > 3 GeV
- pass at least one of the muon IDs: 'CutBasedIdLoose' || 'SoftCutBasedId' || 'SoftMvaId' || 'CutBasedIdGlobalHighPt' || 'CutBasedIdTrkHighPt'
Branches related to the muon quality and properties (dxy, dz, number of matched stations, etc.), the isolation (PF isolation and miniIsolation), and the matching to GenMuon, Jets, etc. are also stored.
No residual correction to the muon momentum is applied.
Taus
Taus with pT > 18 GeV that pass at least one of the weakest working points of the tau identification discriminators are stored.
More details about the available discriminators and corrections can be found in the TAU ID Twiki.
Jets
SlimmedJets, i.e. ak4 PFJets PUPPI (for Run3) or CHS (for Run2, up to NanoAODv9) with JECs applied, with pT > 15 GeV are stored.
- Jets are corrected with the latest available JECs and provided with the latest available JetID, PU ID, b-tag and QGL.
- Jet_jetId is a jet ID flag corresponding to the different working points. In this case the flag represents passlooseID*1+passtightID*2+passtightLepVetoID*4. Then,
  - For EOY 2016 samples:
    - jetId==1 means: pass loose ID, fail tight, fail tightLepVeto
    - jetId==3 means: pass loose and tight ID, fail tightLepVeto
    - jetId==7 means: pass loose, tight, tightLepVeto ID.
  - For EOY 2017 and 2018, and UL 2016/2017/2018 samples:
    - jetId==2 means: pass tight ID, fail tightLepVeto
    - jetId==6 means: pass tight and tightLepVeto ID.
- Jet_puId is a PU JetID flag corresponding to the different working points. PU JetID should be applied only to AK4 CHS jets with pT < 50 GeV. In this case the flag represents passlooseID*4+passmediumID*2+passtightID*1. Then,
  - puId==0 means 000: fail all PU ID;
  - puId==4 means 100: pass loose ID, fail medium, fail tight;
  - puId==6 means 110: pass loose and medium ID, fail tight;
  - puId==7 means 111: pass loose, medium, tight ID.
- *NOTE (For NanoAODv9 2016 UL)*: Please consult the 2016 UL PU ID twiki. Special care is needed for a bug in NanoAODv9 production.
- No systematic variations of JECs are stored, but the quantities (ρ, area, pT, η) needed to recompute JECs or JEC systematic variations are available.
- b-jet energy regression, developed for final states with one or more H->bb, is also available.
- Jets of MC samples are not smeared.
- The factor stored in Jet_muonSubtrFactor allows computing raw pT with muons subtracted (NanoAOD configuration and C++ code), using the same procedure as in the standard implementation of the type 1 correction to missing pT.
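The bit encodings of Jet_jetId and Jet_puId listed above can be unpacked with a minimal sketch like the following. The function names are hypothetical, and the PU ID decoder follows the bit layout quoted in this section; it does not account for the NanoAODv9 2016 UL issue mentioned in the note.

```python
def decode_jet_id(jet_id):
    """Unpack Jet_jetId: loose*1 + tight*2 + tightLepVeto*4."""
    return {
        "loose": bool(jet_id & 1),
        "tight": bool(jet_id & 2),
        "tightLepVeto": bool(jet_id & 4),
    }


def decode_pu_id(pu_id):
    """Unpack Jet_puId: loose*4 + medium*2 + tight*1."""
    return {
        "loose": bool(pu_id & 4),
        "medium": bool(pu_id & 2),
        "tight": bool(pu_id & 1),
    }


print(decode_jet_id(6))  # tight and tightLepVeto pass, loose bit not set
print(decode_pu_id(6))   # loose and medium pass, tight fails
```

Testing individual bits (for example `jet_id & 2` for the tight ID) is more robust than comparing against full values such as 2 or 6, since it keeps working regardless of the other bits.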
Branches CorrT1METJet_* provide a reduced set of properties of jets with corrected pT < 15 GeV. They are needed to propagate JEC uncertainties or JER smearing into missing pT.
SlimmedJetsAK8, i.e. ak8 PFJets with PUPPI and JECs applied, with pT > 175 GeV are stored.
MET
The default missing pT (MET_*) has the type 1 correction based on AK4 PF CHS jets applied. It does not include other corrections such as the φ modulation or the propagation of JER smearing. The uncertainties for the default type 1 corrected PF MET can be obtained using the NanoAOD-tools package, specifically with jetmetHelperRun2, and the type 1 corrections can also be re-applied for a newer set of JECs.
Other types of missing pT, such as PUPPI MET (type 1 corrected), CaloMET, TkMET, or raw PF (PUPPI) MET, are also available. A set of uncertainties for PUPPI MET (type 1 corrected) is stored in the central NanoAOD production for the latest version.
NanoAOD versions up to and including v7 do not apply the correct Type-1 corrections to PUPPI MET, but rather reuse the corrections applied during MiniAOD production, as reported in this issue.
This mainly affects Run2018, where 2017 JECs are used for the Type-1 corrections, while for other years potentially preliminary JEC versions have been applied. While this can be covered by JEC uncertainties across part of the phase space, please report any data/MC disagreement in Type-1 PUPPI MET that is not covered by the JEC uncertainties. The size of the effect on the MET response in Run2018D has been found to be of order 2% in these slides.
For 2017, an alternative missing pT with the EE noise mitigation applied is provided in branches METFixEE2017_*. Except for this mitigation, it is defined in the same way as the default missing pT (MET_*).
Since nanoAOD-v8:
Trigger
Trigger bit information is available for all triggers in the input files. If multiple input files with different trigger bits are given, the bits not available in some events are filled with zeros. When merging two NanoAOD files with haddnano.py, the same algorithm is used to fill missing trigger bits (i.e. the output file will have zeros for events in which a given trigger bit was not available).
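The zero-filling behaviour can be illustrated with a minimal sketch. This is not the haddnano.py implementation; the function name, the in-memory representation of a file as an (event count, branches) pair, and the example contents are hypothetical and serve only to show how the merged branches stay aligned.

```python
def merge_trigger_bits(files):
    """Merge per-file trigger branches, filling missing bits with False.

    `files` is a list of (n_events, {branch_name: [bool, ...]}) pairs.
    """
    all_names = sorted({name for _, bits in files for name in bits})
    merged = {name: [] for name in all_names}
    for n_events, bits in files:
        for name in all_names:
            # A bit absent from this file is zero-filled for all its events.
            merged[name].extend(bits.get(name, [False] * n_events))
    return merged


# Illustrative inputs: the second file has a trigger bit the first one lacks.
files = [
    (2, {"HLT_IsoMu24": [True, False]}),
    (1, {"HLT_IsoMu24": [True], "HLT_Ele32": [True]}),
]
merged = merge_trigger_bits(files)
print(merged)
```

A consequence worth keeping in mind is that a zero in the merged file can mean either "trigger did not fire" or "trigger was not available for this event".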
Trigger objects are stored in NanoAOD for some filters, as defined here. If you need the p4 of some additional object to be stored, please open an issue.
Generator, dressed leptons, LHE
Several collections are available for gen level information:
- a subset of gen particles, selected according to this pruner configuration
- dressed leptons are also available, as configured here
- gen jets with pT > 10 GeV are stored. Parton and hadron flavour, as well as the link to the gen jet, are available in jets
- some LHE information is stored (see here)
For reconstructed jets, the index of the matched particle-level jet is provided in branch Jet_genJetIdx. As explained above, a value of -1 indicates there is no match. The standard angular matching is used, with a cut of ΔR < 0.4.
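For reference, an angular match with a ΔR < 0.4 cut can be reproduced in user code along these lines. This is a simple nearest-neighbour sketch, not the matching code used in production (which may, for example, resolve ambiguities differently); function names and numbers are hypothetical.

```python
import math


def delta_phi(phi1, phi2):
    """Difference in azimuth wrapped into [-pi, pi)."""
    return (phi1 - phi2 + math.pi) % (2 * math.pi) - math.pi


def delta_r(eta1, phi1, eta2, phi2):
    """Angular distance in the (eta, phi) plane."""
    return math.hypot(eta1 - eta2, delta_phi(phi1, phi2))


def match_gen_jet(jet_eta, jet_phi, gen_eta, gen_phi, max_dr=0.4):
    """Return the index of the closest gen jet within max_dr, or -1 if none."""
    best_idx, best_dr = -1, max_dr
    for i, (eta, phi) in enumerate(zip(gen_eta, gen_phi)):
        dr = delta_r(jet_eta, jet_phi, eta, phi)
        if dr < best_dr:
            best_idx, best_dr = i, dr
    return best_idx


# The first gen jet sits just across the phi = +-pi boundary from the jet.
print(match_gen_jet(0.5, 3.1, [0.5, 2.0], [-3.1, 0.0]))
```

The wrap-around in `delta_phi` is the step most often forgotten: without it, two directions at φ = 3.1 and φ = -3.1 would appear to be far apart.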
Vertices and pileup
The expected (AKA ‘true’) number of pileup interactions in simulation is provided in branch Pileup_nTrueInt. Note that this should be a real number, but in NanoAOD up to and including NanoAODv9 it gets truncated to an integer (as in std::trunc, not std::round).
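The practical difference between truncation and rounding can be seen in a short sketch. The value and the helper name are made up for illustration; the point is that a stored integer n stands for a true value anywhere in [n, n + 1), not in [n - 0.5, n + 0.5).

```python
import math


def true_value_range(stored_n_true_int):
    """Range of true values that truncate to the stored integer (non-negative case)."""
    return (stored_n_true_int, stored_n_true_int + 1)


n_true_int = 34.7                     # hypothetical true number of interactions
stored = math.trunc(n_true_int)       # what the truncation keeps
rounded = round(n_true_int)           # what rounding would have given instead

print(stored, rounded, true_value_range(stored))
```

When building pileup reweighting histograms from truncated values, this suggests using bin edges at integer values so that each stored integer falls in the bin covering its true range.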
Weights
The nominal generator-level weight is stored in branches genWeight and Generator_weight, which both contain the same value obtained from GenEventInfoProduct::weight() (here and here). This is typically the product of the nominal weight from LHE (if available) and the nominal weight from Pythia. The nominal LHE weight is available as LHEWeight_originalXWGTUP, filled from LHEEventProduct::originalXWGTUP() (here).
The table with generator weights (configuration) produces the following branches for systematic variations:
- LHEScaleWeight: Variations in the factorization scale and the renormalization scale in the matrix element.
- PSWeight: Variations in the renormalization scale in the parton shower.
- LHEPdfWeight: Weights from a single PDF set. This is the first PDF set from the preference list given in the configuration that is present in the sample. LHAPDF IDs of the weights that have actually been stored are mentioned in the title of the branch.
- LHEReweightingWeight.
The "LuminosityBlocks" TTree
Contains the following branches:
The "Runs" TTree
Stores GEN-level related information. The branches are filled in GenWeightsTableProducer.
The "MetaData" TTree
The "ParameterSets" TTree
Producing private (customized) NanoAODs
Please see here:
https://gitlab.cern.ch/cms-nanoAOD/nanoaod-doc/-/wikis/Instructions/Private-production
Contributing to NanoAOD
Please see here:
https://gitlab.cern.ch/cms-nanoAOD/nanoaod-doc/-/wikis/Instructions/Contributing-to-NanoAOD
--
SebastienWertz - 2023-01-27