Getting Data From an Event

Goals of page

This page describes how to access data from a triggered or simulated physics event.

Introduction

The edm::Event class provides access to all data (raw data, HLT, reconstructed, analysis, etc.) associated with a triggered or simulated event. The Event class provides several access methods whose use depends on the needs of the physicist.

Note: The module or source that produces a product does not immediately add the product to the event. It simply places the product in a queue. Immediately after the module or source returns successfully, the framework puts any queued products into the event. This means that a product cannot be obtained from the event immediately by the same module instance that produced it.
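The queuing behaviour described in the note can be pictured with a small toy model. This is only a sketch under stated assumptions: ToyEvent, produceAndPeek and runModule are hypothetical names invented for illustration and are not part of the framework; the real edm::Event is considerably more involved.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Toy stand-in for an event; NOT the real edm::Event.
class ToyEvent {
public:
  // Called by a module while it runs: the product is only queued.
  void put(const std::string& label, int value) {
    pending_.emplace_back(label, value);
  }

  // Returns nullptr if no committed product carries this label.
  const int* get(const std::string& label) const {
    auto it = committed_.find(label);
    return it == committed_.end() ? nullptr : &it->second;
  }

  // Called by the "framework" after the module returns successfully.
  void commit() {
    for (const auto& entry : pending_) {
      committed_[entry.first] = entry.second;
    }
    pending_.clear();
  }

private:
  std::vector<std::pair<std::string, int>> pending_;
  std::map<std::string, int> committed_;
};

// A toy module: it puts a product and immediately tries to read it back.
// Returns true if the product was visible inside the same module call.
bool produceAndPeek(ToyEvent& event) {
  event.put("foo", 42);
  return event.get("foo") != nullptr;
}

// Mimics the framework: run the module, then commit what it queued.
void runModule(ToyEvent& event) {
  const bool visibleInsideModule = produceAndPeek(event);
  assert(!visibleInsideModule);  // queued products are not yet retrievable
  event.commit();
}
```

In this model the product only becomes retrievable after runModule has called commit(), which mirrors why a module instance cannot get back its own product from the event.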

Identifying data in an Event

At a minimum, the C++ type of some datum (not necessarily that of the top-level product stored in the Event) must be specified in order to access that datum. However, the type alone may not be sufficient, since multiple instances of the same type can appear in an Event. Therefore the EDM provides several ways of identifying data in an Event: labels, ProductID, and provenance. The combination of type, module label, product instance label, and process name uniquely identifies one product in an Event. A ProductID also uniquely identifies one product in an Event. Other information in the full provenance can be used to find a product. Labels, ProductIDs and provenance are all persistent.

C++ type

All data access from the Event relies on specifying the C++ type in which the physicist is interested. This is determined 'implicitly' based on the type of the variable passed to the Event methods. Physicists will never have to 'cast' objects returned from the Event.

Labels

There are three string 'labels' that are used to uniquely identify a datum of a certain C++ type.

module label

The module label is the persistent label assigned in the configuration file used by the job that actually created the datum. For example,

    process.foo = cms.EDProducer("MyFooProducer")
'foo' is the module label, and all objects created by that module will be assigned that label.

If the datum was first created by an input source, the string "source" is the module label (but see CAVEAT below). For example,

    process.source = cms.Source("PythiaSource")
'source' is the module label, and all objects created by that source will be assigned that module label.

NOTE: Input sources such as PoolSource that just read pre-existing events do not create or modify any module labels, ProductIDs or provenance. The module label, ProductID, and provenance from the original producer or input source are retained.

CAVEAT: This standardization of the module labels for input sources was added in CMSSW_0_7_0_pre4. Prior to this change, the module labels were the same as the module names (e.g. PythiaSource).

The plan is to define standard module labels that will be used for all versions of production.

product instance label

The product instance label (persistent) is assigned to a product by the producer or source (either set at compile time or via a parameter). It is used in the producer or source when registering the product (using the produces<> method) and in the Event::put() method. A default value (the empty string) is allowed. When one module produces more than one product of the same type, each must be given a different product instance label.

process name

The process name is the name used in the process statement in the configuration file for the job that created the datum. E.g.

    process = cms.Process("RECO")
    ...

Then 'RECO' would be the process name for all data items created by EDProducers in this job.

ProductID

Every top level product in an event is assigned a unique persistent integer value called the ProductID. The correspondence from ProductID to product is maintained across writing the product to a file and reading it back in. The ProductID is primarily used for maintaining 'links' between different data (e.g., what towers are associated with a Jet). Note that the ProductID of a product may not have the same value in different jobs or even in different events within the same job if the input is merged data from different processes. The Framework maintains lookup tables that ensure in a single event a saved ProductID will always uniquely identify the same product.
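The idea of using a ProductID to maintain links between data (such as the towers belonging to a jet) can be sketched with a toy model. All names here (ToyProductID, ToyTowerEvent, ToyRef, ToyJet, jetEnergy) are hypothetical and chosen for illustration only; the framework's own reference classes are not shown and work differently in detail.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <vector>

using ToyProductID = unsigned int;  // hypothetical integer identifier

// Toy event holding collections of tower energies keyed by product ID.
struct ToyTowerEvent {
  std::map<ToyProductID, std::vector<double>> towerCollections;
};

// A persistent "link": which collection, and which element within it.
struct ToyRef {
  ToyProductID id;
  std::size_t index;
};

// A jet stores links to its towers instead of copies of them.
struct ToyJet {
  std::vector<ToyRef> towers;
};

// Follow the links to sum the energy of the towers in a jet.
double jetEnergy(const ToyTowerEvent& event, const ToyJet& jet) {
  double sum = 0.0;
  for (const ToyRef& ref : jet.towers) {
    auto found = event.towerCollections.find(ref.id);
    assert(found != event.towerCollections.end());  // ID must exist in this event
    assert(ref.index < found->second.size());       // index must be in range
    sum += found->second[ref.index];
  }
  return sum;
}
```

Because the link stores only an identifier and an index, it stays meaningful after the data are written out and read back, provided the identifier still maps to the same collection within that event.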

EDGetToken

NOTE: EDGetToken was added in CMSSW_6_2_0_pre6

When registering what data is to be requested (see consumes below), the registration function returns an edm::EDGetToken or edm::EDGetTokenT<T> which can be used to get the specific data. Unlike the ProductID, the token does not globally uniquely identify data. Instead, the token only locally identifies data that was registered by a particular module. That is, two modules could have tokens which internally hold identical values but which identify completely different data. Therefore, a token can only be used with respect to the edm::Event (or edm::Run or edm::LuminosityBlock) which is passed into a particular module.

NOTE: It is possible to convert an edm::EDGetTokenT<T> to an edm::EDGetToken via a constructor call, but it is not possible to go the other way.
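The local nature of tokens can be illustrated with a toy model in which a token is nothing more than an index into the list of requests registered by one module. ToyToken and ToyModule are hypothetical names, and the assumption that a token is a plain index is made purely for illustration; it is not a statement about how the framework stores tokens internally.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Toy token: just an index into the owning module's list of requests.
struct ToyToken {
  std::size_t index;
};

// Toy module that records which product labels it intends to read.
// NOT the real framework module base class.
class ToyModule {
public:
  // Register a request and hand back a token valid only for this module.
  ToyToken consumes(const std::string& label) {
    requests_.push_back(label);
    return ToyToken{requests_.size() - 1};
  }

  // Resolve a token against this module's own registrations.
  const std::string& resolve(ToyToken token) const {
    assert(token.index < requests_.size());  // token must fit this module
    return requests_[token.index];
  }

private:
  std::vector<std::string> requests_;
};
```

If two ToyModule instances each register one request, both tokens hold the value 0, yet they resolve to different labels. Handing one module's token to the other silently yields the wrong data, which is why a token is only meaningful within the module that registered it.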

Provenance

Provenance keeps track of how a datum was produced. This includes
  • what type of EDProducer or InputSource first created the data
  • the label assigned to the instance of the EDProducer in the job
  • the parameters passed to the EDProducer or InputSource
  • the product instance name associated with the datum
  • what objects were gotten from the Event by the EDProducer

If you get a Handle to a datum in the Event, as described below, then you can access information about its provenance, such as the name of the process that produced it. For example:

edm::Handle<T> handle;
event.getByToken(token, handle);
edm::Provenance const& provenance = *handle.provenance();
std::string const& processName = provenance.processName();

(See the header file DataFormats/Provenance/interface/Provenance.h for the rest of the interface.) There is also a function that one can use to easily get the ParameterSet of the producer of a product.

#include "FWCore/Framework/interface/getProducerParameterSet.h"
...
edm::ParameterSet const* producerPset = edm::getProducerParameterSet(provenance);
int par = producerPset->getParameter<int>("aParameterName");

Registering for data access

NOTE: Registration of data access is enforced in CMSSW_7_0

Before a module, or a helper class delegated by a module, can access data, it must have registered with the framework ahead of time that it will be making a data access request. This registration must happen either in the module's constructor or in the beginJob method. Registration is accomplished by calling the following member functions of the module:

consumes

template<typename T>
edm::EDGetTokenT<T> consumes<T>(edm::InputTag const&)

edm::EDGetToken consumes(edm::TypeID const&, edm::InputTag const&)

template<typename T>
void consumesMany<T>()

Each of these methods tells the framework that the data is always or almost always used each time the module is called. consumes<T> is the function most often used. The template argument T denotes the C++ class type used to get the data. In the case where the exact type isn't known by the module at compile time (because a helper class is doing the retrieval), the form taking an edm::TypeID can be used. The class which knows the type can create an edm::TypeID by doing

edm::TypeID typeFoo( typeid(Foo) );

For both forms of consumes the edm::InputTag provides the module label, product instance label and process name required for full datum identification. consumesMany is used in the rare cases where a module uses an edm::Event::getManyByType call. If the datum is retrieved using an edm::View<T>, then the template argument or edm::TypeID construction should include the edm::View when specifying the type.

may consume

If a module infrequently requests additional data, then you can optimize the access by using the 'may consume' interfaces:
template<typename T>
edm::EDGetTokenT<T> mayConsume<T>(edm::InputTag const&)

edm::EDGetToken mayConsume(edm::TypeID const&, edm::InputTag const&)

The arguments to the functions are identical to those used by the consumes interface.

If you are unsure whether you should use consumes or mayConsume, we suggest consumes. The 'may consume' data are actually more complicated for the framework to deal with, and there is only likely to be a real performance gain if those data are rarely requested.

Tokens

We recommend storing the edm::EDGetTokenT<T> and edm::EDGetToken tokens returned from registration as member data of the module. As for the edm::InputTag needed for the registration, we suggest that it be provided by a configuration parameter.

Consumes and Helpers

When a module uses a helper class that needs to access data, the registration with the framework must also be done during the module's construction or in the beginJob method. This can be done by passing an edm::ConsumesCollector to the helper. Modules can directly pass an instance of this class to helpers. edm::ConsumesCollector has the same consumes interface as a module, and returns an edm::EDGetTokenT<T> that the helper can keep and use later to access the data through the getByToken method. The helpers must therefore be instantiated during the construction of the module that uses them, and cannot be instantiated at some other time, e.g. when processing an event.

Here is a full example (note the two ampersands in the constructor of the helper):

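// Helper that registers its own data request through the ConsumesCollector.
class MyHelper {
public:
    // consumesCollector() returns a temporary, hence the rvalue reference (&&).
    MyHelper(edm::ParameterSet const& iPS, edm::ConsumesCollector && iC){
      m_token = iC.consumes<Bars>(iPS.getParameter<edm::InputTag>("bars"));
      ...
    }
    ...
private:
    edm::EDGetTokenT<Bars> m_token;
};

class FooProd : public edm::EDProducer {
public:
    FooProd(edm::ParameterSet const& iPS) {
      // The helper must be created while the module is being constructed.
      m_helper = new MyHelper(iPS,consumesCollector());
      produces<Foos>();
    }
    ...
};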

Configuring MessageLogger to print missing consumes messages

In CMSSW_7_0_X we can now print an INFO message the first time a module requests data for which it never did a 'consumes' registration. However, nearly all jobs turn off INFO messages by default. Therefore, to turn on just those missing consumes messages, add the following at the end of your process configuration:

process.MessageLogger.categories.extend(["GetManyWithoutRegistration","GetByLabelWithoutRegistration"])
_messageSettings = cms.untracked.PSet(
    reportEvery = cms.untracked.int32(1),
    optionalPSet = cms.untracked.bool(True),
    limit = cms.untracked.int32(10000000)
)

process.MessageLogger.cerr.GetManyWithoutRegistration = _messageSettings
process.MessageLogger.cerr.GetByLabelWithoutRegistration = _messageSettings

Event methods for data access

All Event data access methods use the edm::Handle<T>, where T is the C++ type of the requested object, to hold the result of an access. The edm::Handle class behaves like a pointer to T and also gives access to the edm::Provenance instance that describes the provenance of the obtained datum.

Because most of the data objects stored in an Event are collections, and because it is sometimes useful to obtain access to information about a certain class of reconstruction products (e.g. electrons) without the need to know what variety of container was used to contain those objects (e.g. std::vector<Electron> or edm::SortedCollection<Electron>), the class template edm::View<T> is provided. An instance of View<T> can be used as the template argument for a handle (edm::Handle<View<T> >). This handle can be used with many of the Event 'get' methods below to obtain a View<T>, which 'points into' any collection of T objects in the Event, without regard to the actual type of the collection in which those T objects reside.

The Event data access methods come in two forms: one that returns only one datum and one that can return many. The 'get one' methods take an edm::Handle<T>& as argument, whereas the 'get many' methods take a std::vector< edm::Handle<T> >&. If a 'get one' method is unable to find a unique datum matching the data request, an exception will be thrown at the point that the returned handle is dereferenced. A handle can be queried to ask if it is valid before dereferencing it. If a 'get many' method is unable to find data matching the data request, an empty vector will be returned. An exception can also be thrown in the case where the system encountered a problem while trying to honor your request (e.g., the input service could have a catastrophic failure or multiple products are found when only one was requested).
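The "fail on dereference, not on lookup" behaviour of a 'get one' call can be sketched with a toy handle. ToyHandle and valueOr are hypothetical names used only for illustration; this is not the real edm::Handle, whose interface is defined in the framework headers.

```cpp
#include <cassert>
#include <memory>
#include <stdexcept>
#include <string>
#include <utility>

// Toy handle; NOT the real edm::Handle. It either points at a product
// or remembers why the lookup failed.
template <typename T>
class ToyHandle {
public:
  ToyHandle() = default;
  explicit ToyHandle(std::shared_ptr<const T> product)
      : product_(std::move(product)) {}

  // Record a failed lookup instead of throwing right away.
  static ToyHandle failed(std::string reason) {
    assert(!reason.empty());  // a failure should always say why
    ToyHandle h;
    h.reason_ = std::move(reason);
    return h;
  }

  bool isValid() const { return product_ != nullptr; }

  // The failure only surfaces when the handle is dereferenced.
  const T& operator*() const {
    if (!product_) throw std::runtime_error(reason_);
    return *product_;
  }

private:
  std::shared_ptr<const T> product_;
  std::string reason_ = "handle was never filled";
};

// Safe access pattern: check validity first, fall back otherwise.
int valueOr(const ToyHandle<int>& handle, int fallback) {
  return handle.isValid() ? *handle : fallback;
}
```

Code that can tolerate a missing product checks isValid() first, as valueOr does; code that cannot simply dereferences the handle and lets the exception propagate.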

getByToken

NOTE: getByToken became available in CMSSW_6_2_0_pre6

The call to consumes or mayConsume returns an edm::EDGetTokenT<T> or edm::EDGetToken instance. This token can be used to get data via

void Event::getByToken(edm::EDGetTokenT<T> const& tag,
                       edm::Handle<T>& result)
void Event::getByToken(edm::EDGetToken const& tag,
                       edm::Handle<T>& result)

This method is faster than getByLabel and guarantees that you've registered to request the data. Therefore this is the method we recommend using.

getByLabel

Before the advent of getByToken, we recommended using the following function to get data:

void Event::getByLabel(edm::InputTag const& tag,
                       edm::Handle<T>& result)

The InputTag can hold the module label and the product instance name. The module label for a product in the event will never be empty. The product instance name for a product is often the empty string. Optionally, the InputTag can also hold a process name. This gets the datum of type T described by the module label and product instance name. If the process name is not an empty string, it also selects the product from the requested process. Otherwise, the datum from the most recent process is selected.

It is recommended the InputTag be a data member of the module calling the getByLabel function and that it be set via a configuration parameter. The InputTag includes a cache that automatically saves indices determined while looking up the data. If the InputTag is saved as a data member between calls to getByLabel, this caching will improve performance.

The following will also work. They are supported mainly for backward compatibility. In cases where performance is not an issue, they might also be slightly easier to use. One cannot specify the process with these, so the matching product from the most recent process is always selected. If the product instance name is not provided, it is assumed to be the empty string.

void Event::getByLabel(std::string const& moduleLabel, 
                       edm::Handle<T>& result)
void Event::getByLabel(std::string const& moduleLabel,
                       std::string const& productInstanceLabel, 
                       edm::Handle<T>&    result) 

If there is more than one product of type T with the specified moduleLabel and productInstanceLabel, the products must have distinct process names (this is actually enforced by the framework). The product produced by the most recently run process will be returned.
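The selection rule (match module label and product instance label, then prefer the most recent process unless a process name is given) can be sketched with a toy lookup. ToyProduct and findProduct are hypothetical names, and the explicit oldest-to-newest process history vector is an assumption of this sketch, not a description of the framework's internal bookkeeping.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy description of one stored product; NOT a framework class.
struct ToyProduct {
  std::string moduleLabel;
  std::string instanceLabel;
  std::string processName;
  int value;
};

// Toy lookup. `processHistory` lists process names from oldest to newest.
// An empty `processName` means "take the most recent matching process".
// Returns nullptr if nothing matches.
const ToyProduct* findProduct(const std::vector<ToyProduct>& products,
                              const std::vector<std::string>& processHistory,
                              const std::string& moduleLabel,
                              const std::string& instanceLabel = "",
                              const std::string& processName = "") {
  assert(!moduleLabel.empty());  // a module label is never empty
  // Walk the history from the newest process to the oldest.
  for (auto proc = processHistory.rbegin(); proc != processHistory.rend(); ++proc) {
    if (!processName.empty() && *proc != processName) continue;
    for (const ToyProduct& p : products) {
      if (p.moduleLabel == moduleLabel && p.instanceLabel == instanceLabel &&
          p.processName == *proc) {
        return &p;
      }
    }
  }
  return nullptr;
}
```

With products labelled 'foo' from both an earlier and a later process, a call that omits the process name returns the one from the later process, while naming the earlier process explicitly selects that one instead.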

GetterOfProducts

There may be situations in which getByLabel cannot be used, or is otherwise inappropriate. This alternative to getByLabel allows one to get more than one product at a time. It also allows one to define the selection in a more general way. One still must require the type of the product, but in addition to that one can select based on anything in the BranchDescription (or nothing and only require the type).

Internally, GetterOfProducts uses getByToken and benefits from getByToken's performance optimizations.

The GetterOfProducts automatically calls the appropriate consumes interface of the module using the GetterOfProducts.

Typically one would use this as follows:

Add these headers:

#include "FWCore/Framework/interface/GetterOfProducts.h"
#include "FWCore/Framework/interface/ProcessMatch.h"

Add this data member:

    edm::GetterOfProducts<YourDataType> getterOfProducts_;

Add these to the constructor (the first line is usually in the data member initializer list and the second line in the body of the constructor):

    getterOfProducts_(edm::ProcessMatch(processName_), this) {
    callWhenNewProductsRegistered(getterOfProducts_);

Add this to the method called for each event:

    std::vector<edm::Handle<YourDataType> > handles;
    getterOfProducts_.fillHandles(event, handles);

And that is all you need in most cases. The fillHandles function will add an entry to the vector for each product that is actually present in the event and matches the type and other selection requirements. If there are none, it will return an empty vector. In the above example, "YourDataType" is the type of the product you want to get and processName_ is the name of the process you want data from. There are some variations to the above recipe for special cases:

Use an extra argument to the constructor for products in the Run or LuminosityBlock.

    getterOfProducts_ = edm::GetterOfProducts<Thing>(edm::ProcessMatch(processName_), this, edm::InRun);

You can use multiple GetterOfProducts instances in the same module. The only tricky part is to use a lambda as follows to register the callbacks:

    callWhenNewProductsRegistered([this](edm::BranchDescription const& bd) {
      getterOfProducts1_(bd);
      getterOfProducts2_(bd);
    });

One can use "*" for the processName_ to select from all processes (this will just select based on type).

You can define your own predicate to replace ProcessMatch in the above example and select based on anything in the BranchDescription. See ProcessMatch.h for an example of how to write this predicate. ModuleLabelMatch.h is already defined in the Framework. Others may be added in the future.

getManyByType

void Event::getManyByType( std::vector<edm::Handle<T> >& result) 

Returns all data items of type T that are in the Event. The exact data returned in the list can vary from event to event as well as from job to job. This is mainly supported for backward compatibility. There are performance and other advantages to using GetterOfProducts instead, although getManyByType is easier to use: it takes one line of code versus six lines for GetterOfProducts.

get using ProductID

void Event::get(const edm::ProductID& id,
                edm::Handle<T>& result)
If there is a datum with product id equal to id and that datum is of type T it is returned, else an exception is thrown. The exact datum returned is stable over a job, but not necessarily between jobs. It is, however, stable across writing data to persistent store and reading the data back.

get using a View

In the standard case getByLabel does not support polymorphism and only looks for the exact type specified, i.e. it ignores possibly matching products of subclasses. Using a View brings the polymorphism back. It is possible to obtain a View<Base> into an EDProduct whose concrete type is a sequence of Derived objects, if Base is a public base of Derived. Code like the following would be able to read both reco::Muon and pat::Muon products:

  edm::Handle< edm::View<reco::Muon> > muons;
  event.getByLabel("some label", muons);
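Why a view restores polymorphic access can be sketched with a toy model that stores pointers to the base part of each element. BaseMuon, DerivedMuon, ToyView and sumPt are hypothetical names for illustration; this is not the real edm::View, and the toy view must not outlive the collection it points into.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-ins for a base and a derived product type.
struct BaseMuon {
  explicit BaseMuon(double ptValue) : pt(ptValue) {}
  double pt;
};
struct DerivedMuon : BaseMuon {
  DerivedMuon(double ptValue, int flag) : BaseMuon(ptValue), userFlag(flag) {}
  int userFlag;
};

// Toy view; NOT the real edm::View. It stores pointers to the Base part of
// each element, so the concrete container and element type are hidden.
template <typename Base>
class ToyView {
public:
  template <typename Collection>
  explicit ToyView(const Collection& collection) {
    for (const auto& item : collection) {
      items_.push_back(&item);  // Derived* converts to Base* for public bases
    }
  }

  std::size_t size() const { return items_.size(); }

  const Base& operator[](std::size_t i) const {
    assert(i < items_.size());
    return *items_[i];
  }

private:
  std::vector<const Base*> items_;
};

// Code written against the view works for collections of Base or Derived.
double sumPt(const ToyView<BaseMuon>& muons) {
  double total = 0.0;
  for (std::size_t i = 0; i < muons.size(); ++i) total += muons[i].pt;
  return total;
}
```

The same sumPt function accepts a view built from a std::vector<BaseMuon> or from a std::vector<DerivedMuon>, which is the property that lets one piece of analysis code read both kinds of muon product.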

Obsolete methods

Several functions that were used to get products were eliminated from the CMSSW code in 2013. These include getByType and the functions named get and getMany that took an argument of type Selector.

Review Status

Reviewer/Editor and Date (copy from screen) Comments
Main.lsexton - 16 Oct 2006 page author
ChrisDJones - 08 Nov 2006 page last content editor
JennyWilliams - 01 Feb 2007 editing to include in SWGuide
ChrisDJones - 02 Mar 2007 added info about process name
Main.paterno - 27 Jun 2007 added info about views
DavidDagenhart - 15 August 2012 added info about GetterOfProducts, deprecated methods, plus many updates to other parts of document

Responsible: Main.lsexton
Last reviewed by: Sudhir Malik- 24 January 2009.

Topic revision: r32 - 2017-09-15 - DavidDagenhart