Storing Ntuple content with the EDM
Purpose
Storing custom ROOT trees from an analysis job has several disadvantages:
- lack of a common format among different studies
- lack of integration with common software and computing tools:
  - lack of book-keeping and integration with the GRID
  - lack of provenance tracking
- etc.
The proposed alternative replaces custom trees with user-defined data stored in "ntuple-like" format using the CMSSW EDM. It offers the same flexibility and simplicity as custom ROOT trees and, in addition:
- possibility to store "ntuple" content together with standard AOD and/or RECO collections
- integration with the Framework and EDM
- integration with cmsRun production tools
- integration with DBS for output data
Using the EDM to Store User-defined Quantities
The CMSSW EDM stores events in ROOT trees that can be accessed interactively from the ROOT prompt. Any type of data can be added to the event tree using an appropriate procedure.
Adding simple data types, like float or int, requires much simpler code than the general case, because many of the required software items (namely, dictionaries) are defined centrally.
A Generic Configurable Ntuple Dumper - the CandViewNtpProducer tool
A generic module template is provided to dump variables corresponding to object methods to EDM ntuples. This module can be specialized to any collection type. The following example shows how to use the specialization for a collection storing objects inheriting from reco::Candidate, such as muons, electrons or Z→μ+μ- candidates:
goodZToMuMuEdmNtuple = cms.EDProducer(
    "CandViewNtpProducer",
    src = cms.InputTag("goodZToMuMuAtLeast1HLTLoose"),
    lazyParser = cms.untracked.bool(True),
    prefix = cms.untracked.string("z"),
    eventInfo = cms.untracked.bool(True),
    variables = cms.VPSet(
        cms.PSet(
            tag = cms.untracked.string("Mass"),
            quantity = cms.untracked.string("mass")
        ),
        cms.PSet(
            tag = cms.untracked.string("Pt"),
            quantity = cms.untracked.string("pt")
        ),
        cms.PSet(
            tag = cms.untracked.string("Eta"),
            quantity = cms.untracked.string("eta")
        ),
        cms.PSet(
            tag = cms.untracked.string("Phi"),
            quantity = cms.untracked.string("phi")
        ),
    )
)
Users should specify whether to enable the lazy parser mode through the lazyParser parameter (the default value is False). Setting lazyParser to True enables lazy parsing of the input expression strings.
For each variable, users should specify two parameters: tag, the name to be used as ROOT alias and in the branch name (Mass, Pt, Eta, Phi in this example), and quantity, the expression specifying the quantity to be stored (mass, pt, eta, phi).
The prefix, if specified, will be added to the ROOT alias.
Event number, run number and luminosity block branches are added to the EDM ntuple by default. To disable this feature, set the eventInfo parameter to False.
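The relation between prefix, tag and quantity described above can be sketched in plain Python, without CMSSW. The helper name ntuple_aliases and the use of (tag, quantity) tuples are illustrative assumptions, and the alias is assumed here to be the simple concatenation of prefix and tag; the exact strings produced by CandViewNtpProducer should be checked in the output file.

```python
# Plain-Python sketch (no CMSSW needed); all names here are hypothetical.
# Each entry mirrors one cms.PSet of the "variables" parameter: (tag, quantity).
variables = [
    ("Mass", "mass"),
    ("Pt", "pt"),
    ("Eta", "eta"),
    ("Phi", "phi"),
]

def ntuple_aliases(prefix, variables):
    """Return (alias, quantity) pairs, ASSUMING alias = prefix + tag."""
    return [(prefix + tag, quantity) for tag, quantity in variables]

aliases = ntuple_aliases("z", variables)
for alias, quantity in aliases:
    print(alias, "<-", quantity)
```

Under that assumption, the configuration above with prefix "z" would give the aliases zMass, zPt, zEta and zPhi.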
A more complete example of how to configure the CandViewNtpProducer can be found in the ZMuMuAnalysisNtpProducer configuration file.
To store variables of the composite candidate's daughters, which inherit from reco::Candidate (in this example the daughters are reco::Muon objects), follow this example:
cms.PSet(
    tag = cms.untracked.string("Dau1Pt"),
    quantity = cms.untracked.string("daughter(0).masterClone.pt")
),
cms.PSet(
    tag = cms.untracked.string("Dau2Pt"),
    quantity = cms.untracked.string("daughter(1).masterClone.pt")
)
Variables of interest to the user could be, for instance, those of pat::Muon or pat::GenericParticle objects.
Users need to enable the lazy parser mode to access this information and store it in the ntuple.
The corresponding methods can be called in the same way as the reco::Candidate ones:
cms.PSet(
    tag = cms.untracked.string("Dau1NofHit"),
    quantity = cms.untracked.string("daughter(0).masterClone.numberOfValidHits")
),
cms.PSet(
    tag = cms.untracked.string("Dau1NofHitTk"),
    quantity = cms.untracked.string("daughter(0).masterClone.innerTrack.numberOfValidHits")
),
cms.PSet(
    tag = cms.untracked.string("Dau1NofHitSta"),
    quantity = cms.untracked.string("daughter(0).masterClone.outerTrack.numberOfValidHits")
)
It is also possible to access user-defined variables stored in pat::Objects as UserFloat or UserInt and to add them to the EDM ntuple with the CandViewNtpProducer. This works in the same way as for standard reco::Candidate and pat::Candidate variables: call the method userFloat (or userInt) with the variable label as argument:
cms.PSet(
    tag = cms.untracked.string("TrueMass"),
    quantity = cms.untracked.string("userFloat('TrueMass')")
),
cms.PSet(
    tag = cms.untracked.string("TruePt"),
    quantity = cms.untracked.string("userFloat('TruePt')")
),
cms.PSet(
    tag = cms.untracked.string("Dau1dxyFromPV"),
    quantity = cms.untracked.string("daughter(0).masterClone.userFloat('zDau_dxyFromPV')")
),
cms.PSet(
    tag = cms.untracked.string("Dau2dxyFromPV"),
    quantity = cms.untracked.string("daughter(1).masterClone.userFloat('zDau_dxyFromPV')")
)
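Since the daughter entries above differ only in the daughter index and the method called, the (tag, quantity) strings could be generated rather than typed by hand. The following is a minimal plain-Python sketch of that idea; the helper daughter_variables and its arguments are hypothetical and not part of CMSSW.

```python
# Plain-Python sketch; daughter_variables is a hypothetical helper, not a CMSSW API.
def daughter_variables(n_daughters, methods):
    """Build (tag, quantity) pairs for every daughter and method.

    methods is a list of (tag_suffix, expression) pairs,
    e.g. ("Pt", "pt") or ("dxyFromPV", "userFloat('zDau_dxyFromPV')").
    """
    pairs = []
    for i in range(n_daughters):
        for suffix, expression in methods:
            # Tags are numbered from 1 (Dau1, Dau2), daughters from 0.
            tag = "Dau%d%s" % (i + 1, suffix)
            quantity = "daughter(%d).masterClone.%s" % (i, expression)
            pairs.append((tag, quantity))
    return pairs

pairs = daughter_variables(
    2,
    [("Pt", "pt"), ("dxyFromPV", "userFloat('zDau_dxyFromPV')")],
)
for tag, quantity in pairs:
    print(tag, "=", quantity)
```

In a real configuration file, each pair would then be wrapped as cms.PSet(tag = cms.untracked.string(tag), quantity = cms.untracked.string(quantity)), as in the examples above.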
Note: to learn how to add UserData to your PAT collection, please refer to SWGuidePATUserData.
Another example of using the CandViewNtpProducer to make EDM ntuples comes from the Single Top analysis. The configuration file can be found on the SingleTopNtuplizer CVS page.
A More Complete Example
A more complete example storing a larger number of variables is available below, using the Z→l+l- skim event output:
The configuration script is the following:
A more detailed how-to on writing an EDProducer is available at the workbook page WorkBookEDMTutorialProducer.
Writing an additional n-tupling module
If the default ntupling module does not provide the functionality you are looking for, it is straightforward to create your own module that adds more numbers to the EDM n-tuple. The following sections explain how.
Supported Data Types
We will assume that you want to store one of the basic data types supported centrally. Among the most common data types, the following are supported:
- int, unsigned int
- short, unsigned short
- long, unsigned long
- char, unsigned char
- bool
- std::string
- std::vector<int>, std::vector<unsigned int>
- std::vector<short>, std::vector<unsigned short>
- std::vector<long>, std::vector<unsigned long>
- std::vector<char>, std::vector<unsigned char>
- std::vector<bool>
- std::vector<std::string>
Mathematical types are also available. The most commonly used types are listed below:
- math::XYZVector
- math::RhoEtaPhiVector
- math::XYZPoint
- math::PtEtaPhiELorentzVector
- math::PtEtaPhiMLorentzVector
- math::XYZTLorentzVector
- std::vector<math::XYZVector>
- std::vector<math::RhoEtaPhiVector>
- std::vector<math::XYZPoint>
- std::vector<math::PtEtaPhiELorentzVector>
- std::vector<math::PtEtaPhiMLorentzVector>
- std::vector<math::XYZTLorentzVector>
If you want to access these in FWLite/PyROOT, you need to use the un-typedef'd name with the correct spacing. For example, to get a branch of std::vector<math::XYZTLorentzVector> you need a line like
trueTausHandle = Handle("std::vector<ROOT::Math::LorentzVector<ROOT::Math::PxPyPzE4D<double> > >")
One way to find the type that ROOT expects is to disable the FWLite library loading and look at the list of dictionaries that ROOT complains it cannot find.
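The "correct spacing" mentioned above refers to the space separating consecutive closing angle brackets of nested templates. As a minimal sketch, the type string could be assembled with a small plain-Python helper; template_name is a hypothetical name introduced only for this illustration and is not part of FWLite or ROOT.

```python
# Plain-Python sketch; template_name is a hypothetical helper, not a ROOT/FWLite API.
def template_name(outer, inner):
    """Return 'outer<inner>', adding a space before the closing '>'
    when inner itself ends with '>' (giving '> >' instead of '>>')."""
    separator = " " if inner.endswith(">") else ""
    return "%s<%s%s>" % (outer, inner, separator)

# Un-typedef'd form of std::vector<math::XYZTLorentzVector>, built inside out.
coordinates = template_name("ROOT::Math::PxPyPzE4D", "double")
lorentz_vector = template_name("ROOT::Math::LorentzVector", coordinates)
branch_type = template_name("std::vector", lorentz_vector)
print(branch_type)
```

The resulting string is the one passed to Handle(...) in the line above.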
Defining the EDProducer Module
An EDProducer module should be defined to add user-defined data to the event. A skeleton is shown below:
class MyEdmNtupleDumper : public edm::EDProducer {
public:
  MyEdmNtupleDumper( const edm::ParameterSet & );
private:
  void produce( edm::Event &, const edm::EventSetup & );
  edm::InputTag src_; /// tag of input collection(s)
};
Declaring the Data Products
The output data products should be declared in the EDProducer constructor through the produces<...> statement. Below is an example declaring vectors of float:
MyEdmNtupleDumper::MyEdmNtupleDumper( const ParameterSet & cfg ) :
  src_( cfg.getParameter<InputTag>( "src" ) ) {
  produces<std::vector<float> >( "ZMass" ).setBranchAlias( "ZMass" );
  produces<std::vector<float> >( "ZPt" ).setBranchAlias( "ZPt" );
  produces<std::vector<float> >( "ZEta" ).setBranchAlias( "ZEta" );
}
The setBranchAlias(alias) call is useful because in ROOT you can then do Events->Scan(alias) to dump the values without knowing the 3km-long full branch name.
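To give an idea of what the full branch name looks like, the sketch below assembles one in plain Python. It assumes the usual EDM naming pattern (friendly type name, module label, product instance name and process name joined by underscores, followed by a dot); the helper edm_branch_name, the module label "myNtupleDumper" and the process name "ANA" are hypothetical, and the friendly name "floats" for std::vector<float> is likewise an assumption. The actual names should be checked in the output file, for instance with edmDumpEventContent.

```python
# Plain-Python sketch; edm_branch_name is a hypothetical helper and the
# naming pattern is an assumption to be verified against a real file.
def edm_branch_name(friendly_type, module_label, instance, process):
    """Join the four name components in the assumed EDM pattern."""
    return "%s_%s_%s_%s." % (friendly_type, module_label, instance, process)

full_name = edm_branch_name("floats", "myNtupleDumper", "ZMass", "ANA")
alias = "ZMass"  # the short name set with setBranchAlias
print(full_name, "->", alias)
```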
Filling and Storing Data Products
The data products should be filled in the produce( . . . ) member function. Below is an example of filling the ntuple-like content from a collection of particles (e.g. Z→l+l-).
void MyEdmNtupleDumper::produce( Event & evt, const EventSetup & ) {
  using namespace edm; using namespace std; using namespace reco;
  // retrieve the input collection of Z candidates
  Handle<CandidateCollection> zColl;
  evt.getByLabel( src_, zColl );
  // one output vector per stored quantity
  auto_ptr<vector<float> > zMass( new vector<float> );
  auto_ptr<vector<float> > zPt( new vector<float> );
  auto_ptr<vector<float> > zEta( new vector<float> );
  // number of candidates in this event
  size_t zSize = zColl->size();
  for( size_t i = 0; i < zSize; ++ i ) {
    const Candidate & z = (*zColl)[ i ];
    zMass->push_back( z.mass() );
    zPt->push_back( z.pt() );
    zEta->push_back( z.eta() );
  }
  // store the products under the instance names declared in the constructor
  evt.put( zMass, "ZMass" );
  evt.put( zPt, "ZPt" );
  evt.put( zEta, "ZEta" );
}
Review Status
Responsible: LucaLista
Last reviewed by: