Storing Ntuple content with the EDM

Purpose

Storing custom ROOT trees from an analysis job has several disadvantages:

  • lack of a common format among different studies
  • lack of integration with common software and computing tools:
    • lack of book-keeping and integration with GRID
    • lack of provenance tracking
    • etc.

The proposed replacement, which stores user-defined data in an "ntuple-like" format using the CMSSW EDM, keeps the flexibility and simplicity of custom ROOT trees and adds:

  • possibility to store "ntuple" content together with standard AOD and/or RECO collections
  • integration with Framework and EDM
  • integration with cmsRun production tools
  • integration with DBS for output data

Using the EDM to Store User-defined Quantities

The CMSSW EDM stores events in ROOT trees that can be accessed interactively from the ROOT prompt. Any type of data can be added to the event tree using an appropriate procedure.

Adding simple data types, like float or int, requires much less code than the general case, because many of the required software items (namely, the dictionaries) are defined centrally.

A Generic Configurable Ntuple Dumper - the CandViewNtpProducer tool

A generic module template is provided to dump variables, obtained by calling object methods, into EDM ntuples. This module can be specialized for any collection type. The following example shows how to use the specialization for a collection of objects inheriting from reco::Candidate, such as muons, electrons, or Z→μ+μ- candidates:

import FWCore.ParameterSet.Config as cms

goodZToMuMuEdmNtuple = cms.EDProducer(
    "CandViewNtpProducer",
    src = cms.InputTag("goodZToMuMuAtLeast1HLTLoose"),
    lazyParser = cms.untracked.bool(True),
    prefix = cms.untracked.string("z"),
    eventInfo = cms.untracked.bool(True),
    variables = cms.VPSet(
        cms.PSet(
            tag = cms.untracked.string("Mass"),
            quantity = cms.untracked.string("mass")
        ),
        cms.PSet(
            tag = cms.untracked.string("Pt"),
            quantity = cms.untracked.string("pt")
        ),
        cms.PSet(
            tag = cms.untracked.string("Eta"),
            quantity = cms.untracked.string("eta")
        ),
        cms.PSet(
            tag = cms.untracked.string("Phi"),
            quantity = cms.untracked.string("phi")
        ),
    )
)

Users should specify whether to enable the lazy parser mode (the default value is False). Setting the lazyParser parameter to True makes the module resolve the expression strings at run time against the actual type of each object, which is needed to call methods that reco::Candidate itself does not declare. For each variable, users should specify the tag, used as the ROOT alias and in the branch name (Mass, Pt, Eta, Phi in this example), and, through the quantity parameter, the expression defining the value to be stored (mass, pt, eta, phi). The prefix, if specified, is added to the ROOT alias. Event number, run number and luminosity block branches are added to the EDM ntuple by default; to disable this feature, set the eventInfo parameter to False.
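Since every variable needs the same cms.PSet with a tag and a quantity, long variable lists are repetitive to type. As a minimal sketch, the plain-Python helper below prints the text of a variables block from (tag, quantity) pairs, ready to paste into a configuration file. The function names render_pset and render_variables are hypothetical and not part of CMSSW; inside a real configuration, the same loop could presumably build the cms.PSet objects directly.

```python
def render_pset(tag, quantity, indent="        "):
    """Return the text of one cms.PSet entry of the variables list."""
    inner = indent + "    "
    return (
        f'{indent}cms.PSet(\n'
        f'{inner}tag = cms.untracked.string("{tag}"),\n'
        f'{inner}quantity = cms.untracked.string("{quantity}")\n'
        f'{indent})'
    )


def render_variables(pairs):
    """Return the text of a complete 'variables = cms.VPSet(...)' block."""
    body = ",\n".join(render_pset(tag, quantity) for tag, quantity in pairs)
    return "    variables = cms.VPSet(\n" + body + "\n    )"


if __name__ == "__main__":
    # Same four variables as in the CandViewNtpProducer example.
    pairs = [("Mass", "mass"), ("Pt", "pt"), ("Eta", "eta"), ("Phi", "phi")]
    print(render_variables(pairs))
```

The helper only produces text; it does not check that a quantity string is a valid expression for the input collection.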

A more complete example of how to configure the CandViewNtpProducer can be found in the ZMuMuAnalysisNtpProducer configuration file. To store variables of the daughters of a composite candidate, where the daughters inherit from reco::Candidate (in this example they are reco::Muons), follow this example:

cms.PSet(
    tag = cms.untracked.string("Dau1Pt"),
    quantity = cms.untracked.string("daughter(0).masterClone.pt")
),
cms.PSet(
    tag = cms.untracked.string("Dau2Pt"),
    quantity = cms.untracked.string("daughter(1).masterClone.pt")
)

Variables of interest to the user could be, for instance, those of pat::Muon or pat::GenericParticle objects. Users need to enable the lazy parser mode to access this information and store it in the ntuple. The corresponding methods can be called in the same way as the reco::Candidate ones:

cms.PSet(
    tag = cms.untracked.string("Dau1NofHit"),
    quantity = cms.untracked.string("daughter(0).masterClone.numberOfValidHits")
),
cms.PSet(
    tag = cms.untracked.string("Dau1NofHitTk"),
    quantity = cms.untracked.string("daughter(0).masterClone.innerTrack.numberOfValidHits")
),
cms.PSet(
    tag = cms.untracked.string("Dau1NofHitSta"),
    quantity = cms.untracked.string("daughter(0).masterClone.outerTrack.numberOfValidHits")
)

User-defined variables stored in pat::Objects as UserFloat or UserInt can also be accessed and added to the EDM ntuple by the CandViewNtpProducer, in the same way as standard reco::Candidate and pat::Candidate variables, by calling the method userFloat (userInt) with the variable label as argument:

cms.PSet(
    tag = cms.untracked.string("TrueMass"),
    quantity = cms.untracked.string("userFloat('TrueMass')")
),
cms.PSet(
    tag = cms.untracked.string("TruePt"),
    quantity = cms.untracked.string("userFloat('TruePt')")
),
cms.PSet(
    tag = cms.untracked.string("Dau1dxyFromPV"),
    quantity = cms.untracked.string("daughter(0).masterClone.userFloat('zDau_dxyFromPV')")
),
cms.PSet(
    tag = cms.untracked.string("Dau2dxyFromPV"),
    quantity = cms.untracked.string("daughter(1).masterClone.userFloat('zDau_dxyFromPV')")
)

Note: If you wish to know how to add UserData to your PAT collection, please refer to SWGuidePATUserData.
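The daughter and userFloat expressions in the examples above follow a regular pattern: the quantity string is daughter(i).masterClone. followed by a method call, and the tag is numbered from 1. As an illustrative sketch in plain Python (the helper names are hypothetical and not part of CMSSW), the strings can be generated rather than typed by hand:

```python
def daughter_quantity(index, method):
    """Expression calling `method` on the master clone of daughter `index`."""
    return f"daughter({index}).masterClone.{method}"


def user_float(label):
    """Expression reading the userFloat stored under `label`."""
    return f"userFloat('{label}')"


def daughter_variables(n_daughters, name, method):
    """(tag, quantity) pairs tagged Dau1<name>, Dau2<name>, ... as in the examples."""
    return [
        (f"Dau{i + 1}{name}", daughter_quantity(i, method))
        for i in range(n_daughters)
    ]


if __name__ == "__main__":
    for tag, quantity in daughter_variables(2, "Pt", "pt"):
        print(tag, quantity)
    dxy = user_float("zDau_dxyFromPV")
    for tag, quantity in daughter_variables(2, "dxyFromPV", dxy):
        print(tag, quantity)
```

Each resulting pair would then go into one cms.PSet of the variables list, as tag and quantity respectively.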

Another application of the CandViewNtpProducer to make EDM ntuples is the Single Top analysis. The configuration file can be found at the SingleTopNtuplizer CVS page.

A More Complete Example

A more complete example storing a larger number of variables is available below, using the Z→l+l- skim event output:

The configuration script is the following:

A more detailed how-to on writing an EDProducer can be found at the WorkBook page WorkBookEDMTutorialProducer.

Writing an additional n-tupling module

If the default ntupling module does not provide the functionality you are looking for, it is straightforward to create your own module that adds more quantities to the EDM ntuple. The following sections explain how.

Supported Data Types

We will assume that you want to store one of the basic data types supported centrally. Among the most common data types, the following are supported:

  • int, unsigned int
  • short, unsigned short
  • long, unsigned long
  • char, unsigned char
  • bool
  • std::string
  • std::vector<int>, std::vector<unsigned int>
  • std::vector<short>, std::vector<unsigned short>
  • std::vector<long>, std::vector<unsigned long>
  • std::vector<char>, std::vector<unsigned char>
  • std::vector<bool>
  • std::vector<std::string>

Mathematical types are also available. The most commonly used types are listed below:

  • math::XYZVector
  • math::RhoEtaPhiVector
  • math::XYZPoint
  • math::PtEtaPhiELorentzVector
  • math::PtEtaPhiMLorentzVector
  • math::XYZTLorentzVector
  • std::vector<math::XYZVector>
  • std::vector<math::RhoEtaPhiVector>
  • std::vector<math::XYZPoint>
  • std::vector<math::PtEtaPhiELorentzVector>
  • std::vector<math::PtEtaPhiMLorentzVector>
  • std::vector<math::XYZTLorentzVector>

If you want to access these in FWLite/PyROOT, you need to use the un-typedef'd name with the correct spacing. For example, to get a branch of std::vector<math::XYZTLorentzVector> you need a line like:

trueTausHandle = Handle("std::vector<ROOT::Math::LorentzVector<ROOT::Math::PxPyPzE4D<double> > >")

One way to find the type that ROOT expects is to disable the FWLite library loading and look at the list of dictionaries that ROOT complains it cannot find.
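The spacing rule can be captured in a small helper. This is only a sketch in plain Python: the function vector_of and the table EXPANDED are hypothetical names. The first expansion is the one used in the Handle line above; the other two are assumed from the DataFormats/Math typedefs and should be checked against your release.

```python
# Expanded names of some math:: typedefs. Only the first entry is taken from
# the example above; the other two are assumptions to be verified.
EXPANDED = {
    "math::XYZTLorentzVector":
        "ROOT::Math::LorentzVector<ROOT::Math::PxPyPzE4D<double> >",
    "math::PtEtaPhiMLorentzVector":
        "ROOT::Math::LorentzVector<ROOT::Math::PtEtaPhiM4D<double> >",
    "math::PtEtaPhiELorentzVector":
        "ROOT::Math::LorentzVector<ROOT::Math::PtEtaPhiE4D<double> >",
}


def vector_of(type_name):
    """Return the std::vector<...> type string for a Handle."""
    inner = EXPANDED.get(type_name, type_name)
    # Nested templates are closed with "> >", never ">>".
    closing = " >" if inner.endswith(">") else ">"
    return f"std::vector<{inner}{closing}"


if __name__ == "__main__":
    print(vector_of("math::XYZTLorentzVector"))
    print(vector_of("float"))
```

The returned string would then be passed to Handle(...), as in the line above.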

Defining the EDProducer Module

An EDProducer module should be defined to add user-defined data to the event. A skeleton is shown below:

class MyEdmNtupleDumper : public edm::EDProducer {
public:
  MyEdmNtupleDumper( const edm::ParameterSet & );
private:
  void produce( edm::Event &, const edm::EventSetup & );
  edm::InputTag src_; /// tag of input collection(s)
};

Declaring the Data Products

The output data products should be declared in the EDProducer constructor through the produces<...> statement. Below is an example declaring vectors of float:

MyEdmNtupleDumper::MyEdmNtupleDumper( const ParameterSet & cfg ) :
  src_( cfg.getParameter<InputTag>( "src" ) ) {
    produces<std::vector<float> >( "ZMass" ).setBranchAlias( "ZMass" );
    produces<std::vector<float> >( "ZPt" ).setBranchAlias( "ZPt" );
    produces<std::vector<float> >( "ZEta" ).setBranchAlias( "ZEta" );
  }

The setBranchAlias(alias) call is useful because in ROOT you can then do Events->Scan(alias) to dump the values without knowing the very long full branch name.

Filling and Storing Data Products

The data products should be filled in the produce( . . . ) member function. Below is an example of filling the ntuple-like content from a collection of particles (e.g. Z→l+l-).

void MyEdmNtupleDumper::produce( Event & evt, const EventSetup & ) {
   using namespace edm; using namespace std; using namespace reco;
   // retrieve the input collection
   Handle<CandidateCollection> zColl;
   evt.getByLabel( src_, zColl );
   // one output vector per variable, one entry per candidate
   auto_ptr<vector<float> > zMass( new vector<float> );
   auto_ptr<vector<float> > zPt( new vector<float> );
   auto_ptr<vector<float> > zEta( new vector<float> );
   const size_t zSize = zColl->size();
   for( size_t i = 0; i < zSize; ++ i ) {
      const Candidate & z = (*zColl)[ i ];
      zMass->push_back( z.mass() );
      zPt->push_back( z.pt() );
      zEta->push_back( z.eta() );
   }
   // store the products under the instance names declared in the constructor
   evt.put( zMass, "ZMass" );
   evt.put( zPt, "ZPt" );
   evt.put( zEta, "ZEta" );
}
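The layout written by produce can be pictured in plain Python: each event holds one list per variable, and entry i of every list belongs to candidate i. The sketch below is only a model of that layout, not CMSSW code; Candidate is a hypothetical stand-in for reco::Candidate with the three quantities used above.

```python
from collections import namedtuple

# Hypothetical stand-in for reco::Candidate, reduced to three quantities.
Candidate = namedtuple("Candidate", ["mass", "pt", "eta"])


def fill_columns(candidates):
    """Return one list per variable, each with one entry per candidate."""
    columns = {"ZMass": [], "ZPt": [], "ZEta": []}
    for z in candidates:
        columns["ZMass"].append(z.mass)
        columns["ZPt"].append(z.pt)
        columns["ZEta"].append(z.eta)
    return columns


if __name__ == "__main__":
    event = [Candidate(91.2, 30.0, 0.5), Candidate(89.7, 12.5, -1.1)]
    for name, values in fill_columns(event).items():
        print(name, values)
```

An event without candidates still yields the three lists, each of them empty, which mirrors the C++ code putting empty vectors into the event.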

Review Status

Editor/Reviewer and date | Comments
LucaLista - 16 Oct 2007 | Page author
GiovanniPetrucciani - 07 Mar 2008 | Some fixes
BenediktHegner - 15 Mar 2010 | Update to current features
AnnapaolaDeCosa - 16 Mar 2010 | Update CandViewNtpProducer configuration example
Responsible: LucaLista
Topic revision: r16 - 2011-12-04 - RogerWolf



 