4.1.1 More on CMSSW Framework
Goals of this page
When you finish this page, you should understand:
- the modular architecture of the CMSSW framework and the Event Data Model (EDM)
- how data are uniquely identified in an Event
- how Event data are processed - AOD and miniAOD structures
- the Framework Services, including the EventSetup
Introduction
The overall collection of software, referred to as CMSSW, is built around a Framework, an Event Data Model (EDM), and Services needed by the simulation, calibration and alignment, and reconstruction modules that process event data so that physicists can perform analysis. The primary goal of the Framework and
EDM is to facilitate the development and deployment of reconstruction and analysis software.
Modular Event Content
It is important to emphasize that the event data architecture is modular, just like the framework. Different data layers (using different
data formats) can be configured, and a given application can use any layer or layers. The branches (which map one to one with event data objects) can be loaded or dropped on demand by the application. The following diagram illustrates this concept:
You can reprocess event data at virtually any stage. For instance, if the available AOD doesn't contain exactly what you want, you might want to reprocess the RECO (e.g., to apply a new calibration) to produce the desired AOD.
Custom quantities (data produced by a user or analysis group) can be added to an event and associated with existing objects at any processing stage (RECO/AOD -> candidates -> user data). Thus the distinction between "CMS data" and "user data" may change during the lifetime of the experiment.
Identifying Data in the Event
Data within the Event are uniquely identified by four quantities:
- C++ class type of the data
- E.g., edm::PSimHitContainer or reco::TrackCollection.
- module label
- the label that was assigned to the module that created the data. E.g., "SimG4Objects" or "TrackProducer".
- product instance label
- the label assigned to the object from within the module (defaults to an empty string). This is convenient when several objects of the same C++ type are put into the edm::Event from within a single module.
- process name
- the process name as set in the job that created the data
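In code, these identifiers (apart from the C++ type, which is carried by the handle) are bundled into an edm::InputTag. A minimal sketch, using labels that appear in the dump below:

```cpp
// An edm::InputTag holds the module label, the product instance label,
// and (optionally) the process name that identify a product in the Event.
edm::InputTag tag("tevMuons", "firstHit", "RECO");
```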
For example, if you run the following command (you can find the file MYCOPY.root here):
edmDumpEventContent MYCOPY.root
you get this output:
vector<reco::TrackExtra>    "electronGsfTracks"    ""    "RECO"
vector<reco::TrackExtra>    "generalTracks"    ""    "RECO"
vector<reco::TrackExtra>    "globalMuons"    ""    "RECO"
vector<reco::TrackExtra>    "globalSETMuons"    ""    "RECO"
vector<reco::TrackExtra>    "pixelTracks"    ""    "RECO"
vector<reco::TrackExtra>    "standAloneMuons"    ""    "RECO"
vector<reco::TrackExtra>    "standAloneSETMuons"    ""    "RECO"
vector<reco::TrackExtra>    "tevMuons"    "default"    "RECO"
vector<reco::TrackExtra>    "tevMuons"    "firstHit"    "RECO"
vector<reco::TrackExtra>    "tevMuons"    "picky"    "RECO"
In the above output:
- vector<reco::TrackExtra> is the C++ class type of the data
- globalMuons is a module label
- firstHit is a product instance label
- RECO is the process name
Getting data from the Event
All Event data access methods use edm::Handle<type>, where type is the C++ type of the datum, to hold the result of an access.
To request data from an Event, in your module use a form of one of the following:
- get, which either returns one object or throws a C++ exception.
- getMany, which returns a list of zero or more matches to the data request.
After get or getMany, indicate how to identify the data, e.g., getByLabel or getManyByType, and then use the name associated with the handle type, as shown in the example below.
Sample EDAnalyzer Code
Here is a snippet from the EDAnalyzer code called DemoAnalyzer.cc (used in the next section), showing how data are identified and accessed by a module. Notes follow:
void DemoAnalyzer::analyze(const edm::Event& iEvent, const edm::EventSetup& iSetup)
{
// This declaration creates a handle called "tracks" for the type of collection
// ("reco::TrackCollection") that you want to retrieve from the event "iEvent".
using namespace edm;
edm::Handle<reco::TrackCollection> tracks;
// Pass the handle "tracks" to the method "getByLabel", which is used to
// retrieve one and only one instance of the type in question with
// the label specified out of event "iEvent". If more than one instance
// exists in the event, then an exception is thrown immediately when
// "getByLabel" is called. If zero instances exist which pass
// the search criteria, then an exception is thrown when the handle
// is used to access the data. (You can use the "failedToGet" function
// of the handle to determine whether the "get" found its data before
// using the handle)
iEvent.getByLabel("generalTracks", tracks);
.....................
.....................
}
Notes:
- Line 1: The method analyze receives a reference iEvent to the edm::Event object, which contains all the event data.
- Middle section: A container handle is declared for each type of event data to be retrieved, using the object edm::Handle.
- Last section: iEvent.getByLabel (passing the handle for the type of event data) retrieves the data from the event and stores them in a container in memory.
No matter which way you request the data, the results of the request will be returned in a smart pointer (C++ handle) of type edm::Handle<>.
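Before dereferencing a handle, you can check whether the get actually found its data. A minimal sketch, using the same generalTracks label as in the example above:

```cpp
edm::Handle<reco::TrackCollection> tracks;
iEvent.getByLabel("generalTracks", tracks);
// isValid() reports whether the handle was successfully filled,
// so dereferencing it inside this block cannot throw.
if (tracks.isValid()) {
  unsigned int nTracks = tracks->size();
}
```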
You may refer to the code in Section 4.1.2, called DemoAnalyzer.cc, to see a use case.
The Processing Model
Events are processed by passing the Event through a sequence of modules. The exact sequence of modules is specified by the user via a path statement in a configuration file. A path is an ordered list of Producer/Filter/Analyzer modules which sets the exact execution order of all the modules. When an Event is passed to a module, that module can get data from the Event and put data back into the Event. When data is put into the Event, the provenance information about the module that created the data is stored with the data in the Event. The components involved in the framework and EDM are shown here:
The Standard Input Source shown above uses ROOT I/O. The Event is then passed to the execution paths. The paths can then be ordered into a list that makes up the schedule for the process.

Note that the same module may appear in multiple paths, but the framework guarantees that a module is executed only once per Event. Since it will ask for exactly the same products from the event and produce the same result independent of which path it is in, it makes no sense to execute it twice. On the other hand, a user designing a trigger path should not have to worry about the full schedule (which could involve hundreds of modules). Each path should be executable by itself, in that modules within the path only ask for things they know have been produced by a previous module in the same path or by the input source.

In a perfect world, the order of execution of the paths should not matter. However, due to the existence of bugs, it is always possible that there is an order dependence. Such dependencies should be removed during validation of the job.
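In the Python configuration language, a path is declared with cms.Path. A minimal sketch, reusing the DemoAnalyzer example from this page (the process name and module label here are illustrative):

```python
import FWCore.ParameterSet.Config as cms

process = cms.Process("Demo")
# Register the analyzer module under the label "demo"; the C++ class
# name "DemoAnalyzer" matches the example earlier on this page.
process.demo = cms.EDAnalyzer("DemoAnalyzer")
# The path fixes the execution order of the modules it contains.
process.p = cms.Path(process.demo)
```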
Framework Services
ServiceRegistry System
The ServiceRegistry is used to deliver services such as the error logger or a debugging service which provides feedback
about the state of the Framework (e.g., what module is presently running). Services are informed about the present state of the Framework, e.g., the start of a new Event or the completion of a certain module. Such information is useful for producing meaningful error messages from the error logger or for debugging. The services to be used in a job and the exact configuration of those services are set in the user's configuration file via a ParameterSet. For further information look
here.
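For example, the MessageLogger (the Framework's error logger service) is configured through a ParameterSet in the user's Python configuration file. A minimal sketch; the parameter values shown are illustrative:

```python
import FWCore.ParameterSet.Config as cms

process = cms.Process("Demo")
# Services are declared with cms.Service and configured via a ParameterSet.
process.MessageLogger = cms.Service("MessageLogger",
    destinations = cms.untracked.vstring("cout"),
    cout = cms.untracked.PSet(threshold = cms.untracked.string("INFO"))
)
```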
Event Setup
To be able to fully process an event, one has to take into account potentially changing and periodically updated information about the detector environment and status. This information (non-event data) is not tied to a given event, but rather to the time period for which it is valid. This time period is called its
interval of validity or IOV, and an IOV typically spans many events. Examples of this type of non-event data include calibrations, alignments, geometry descriptions, magnetic field and run conditions recorded during data acquisition. The IOV of one piece of non-event data is not necessarily related to that of another. The EventSetup system handles this type of non-event data for which the IOV is longer than one Event. (Note that non-Event data initiated by the DAQ, such as the Event or a Run transition, are handled by the Event system.)
The figure illustrates the varying IOVs of different non-event data (calibrations and alignments), and how their values at the time of a given event are read by the EventSetup system.
The EventSetup system design uses two categories
of modules to do its work:
ESSource and
ESProducer. These components are
configured using the same configuration mechanism as their Event counterparts, i.e., via a
ParameterSet.
- ESSource
- is responsible for determining the IOV of a Record (or a set of Records). (A Record is an EventSetup construct that holds data and services which have identical IOVs.) The ESSource may also deliver data/services. For example, a user can request the ECAL pedestals via an ESSource that reads the appropriate values from a database.
- ESProducer
- an ESProducer is, conceptually, an algorithm whose inputs are dependent on data with IOVs. The ESProducer's algorithm is run whenever there is an IOV change for the Record to which the ESProducer is bound. For example, an ESProducer is used to read the ideal geometry of the tracker as well as the alignment corrections, and then create the aligned tracker geometry from those two pieces of information. This ESProducer is told by the EventSetup system to create a new aligned tracker geometry whenever the alignment changes.
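Inside a module, EventSetup products are retrieved through an edm::ESHandle, in close analogy to Event data. A minimal sketch of the legacy interface, using the aligned tracker geometry mentioned above (the record and type names are real CMSSW types, shown here only for illustration):

```cpp
void DemoAnalyzer::analyze(const edm::Event& iEvent, const edm::EventSetup& iSetup)
{
  // Ask the EventSetup for the tracker geometry; the Record groups
  // data and services that share the same IOV.
  edm::ESHandle<TrackerGeometry> geometry;
  iSetup.get<TrackerDigiGeometryRecord>().get(geometry);
  // "geometry" now points to the aligned tracker geometry valid
  // for this event's IOV.
}
```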
For further information look
here.
Provenance Tracking
The CMS Offline framework stores provenance information within CMS's standard ROOT event data files. The provenance information is used to track how every data product was constructed, including which other data products were read in order to do the construction. We record information to understand the history of how data were produced and chosen; provenance information does not have to be sufficient to allow an exact replay of a process. Storing provenance in output files is crucial to ensure trust in the data, given the large-scale, highly distributed nature of production, especially for physicists' personal skims, which are not centrally managed. Using provenance information one can track the source of a problem seen in one file but not another, guarantee compatibility when reading multiple files in a job, confirm that an analysis was done using the proper data, track why two analyses get different results, etc. A good source of information is a
talk
by Chris Jones given at CHEP09. Also refer to
WorkBook 2.3. Also see
http://iopscience.iop.org/1742-6596/219/3/032011.
Review status
Responsible:
SudhirMalik
Last reviewed by:
SudhirMalik - 26 Nov 2009
-- AltanCakir - 09 Oct 2017