2.3 CMSSW Application Framework

Goals of this page

When you finish this page, you should understand:
  • the modular architecture of the CMSSW framework
  • what comprises an Event
  • how data are uniquely identified in an Event
  • how Event data are processed
  • the basics of provenance tracking
  • ROOT access methods for viewing data

Introduction

The overall collection of software, referred to as CMSSW, is built around a Framework, an Event Data Model (EDM), and Services needed by the simulation, calibration and alignment, and reconstruction modules that process event data so that physicists can perform analysis. The primary goal of the Framework and EDM is to facilitate the development and deployment of reconstruction and analysis software.

CMSSW

The CMSSW framework implements a software bus model wherein there is one executable, called cmsRun, and many plug-in modules which run algorithms. The same executable is used for both detector and Monte Carlo data. This framework is distinct from a more traditional approach in which several executables are used, one per task or set of tasks.

The CMSSW executable, cmsRun, is configured at run time by the user's job-specific configuration file. This file tells cmsRun which data to use, which modules to run, which parameter settings to use for each module, and in what order to run the modules. Required modules are dynamically loaded at the beginning of the job.
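
To make this concrete, here is a minimal sketch of such a configuration file, written in the Python configuration syntax; the input file name and the module label demoAnalyzer are illustrative rather than taken from a real workflow:

import FWCore.ParameterSet.Config as cms

process = cms.Process("Demo")    # the process name, recorded with any data produced

# which data to use:
process.source = cms.Source("PoolSource",
    fileNames = cms.untracked.vstring("file:myEvents.root")
)
process.maxEvents = cms.untracked.PSet(input = cms.untracked.int32(100))

# which modules to run, with their parameter settings:
process.demoAnalyzer = cms.EDAnalyzer("DemoAnalyzer1",
    tracks = cms.InputTag("TrackProducer")
)

# in what order to run them:
process.p = cms.Path(process.demoAnalyzer)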

Event Data Model

The CMS Event Data Model (EDM) is centered around the concept of an Event as a C++ object container for all RAW and reconstructed data pertaining to a physics event. During processing, data are passed from one module to the next via the Event, and are accessed only through the Event. All objects in the Event may be individually or collectively stored in ROOT files, and are thus directly browsable in ROOT. This allows tests to be run on individual modules in isolation. Auxiliary information needed to process an Event is accessed via the EventSetup.

Access CMSSW Code

The CMSSW code is contained in a single CVS repository, under the project name CMSSW. You can browse this huge amount of code, or search it, using the CMSSW Software Cross-Reference. You can also access it directly under /afs/cern.ch/cms/sw/slc4_ia32_gcc345/cms/cmssw/. Packages are organized by functionality; this is a change from the older CMS framework, in which they were organized by subdetector component.

Framework

The Framework provides ways to guarantee reproducibility by automatically maintaining and recording sufficient provenance information for all application results.

Modular Architecture

A module is a piece (or component) of CMSSW code that can be plugged into the CMSSW executable cmsRun. Each module encapsulates a unit of clearly defined event-processing functionality. Modules are implemented as plug-ins (core libraries and services). They are compiled into fully bound shared libraries and must be declared to the plug-in manager in order to be registered with the framework. The framework loads the plug-in and instantiates the module when the job configuration (sometimes called a "card file") requests it. There is no need to build binary executables for user code!

When preparing an analysis job, the user selects which module(s) to run, and specifies a ParameterSet for each via a configuration file. The module is called for every event according to the path statement in the configuration file.

There are six types of dynamically loadable processing modules, whose interface is specified by the framework:

Source
Reads an Event from a ROOT file, or creates empty Events that generator modules later fill with content (see WorkBookGeneration); gives the Event status information (such as the Event number); and can add data directly or set up a call-back system to retrieve the data on first request. Examples include the DaqSource, which reads Events from the global DAQ, and the PoolSource, which reads Events from a ROOT file.
EDProducer
CMSSW uses the concept of producer modules and products: a producer module (EDProducer) reads data in one format from the Event, produces something from those data, and puts the product, in a different format, back into the Event. A succession of modules used in an analysis may produce a series of intermediate products, all stored in the Event. An EDProducer example is the RoadSearchTrackCandidateMaker module; it creates a TrackCandidateCollection, a collection of found tracks that have not yet had their final fit.
EDFilter
Reads data from the Event and returns a Boolean value that determines whether processing of that Event should continue for that path. An example is the StopAfterNEvents filter, which halts event processing after the module has processed a set number of events.
EDAnalyzer
Studies properties of the Event. An EDAnalyzer reads data from the Event but is neither allowed to add data to the Event nor affect the execution of the path. Typically an EDAnalyzer writes output, e.g., to a ROOT Histogram.
EDLooper
A module that controls 'multi-pass' looping over an input source's data. It can also modify the EventSetup at well-defined times. This type of module is used in the track-based alignment procedure.
OutputModule
Reads data from the Event and, once all paths have been executed, stores the output to external media. An example is the PoolOutputModule, which writes data to a standard CMS-format ROOT file.

The user configures the modules in the job configuration file using the module-specific ParameterSets. ParameterSets may hold other ParameterSets. Modules cannot be reconfigured during the lifetime of the job.
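
For example, a module configuration with one ParameterSet nested inside another might look like this in the Python syntax (the module and parameter names are hypothetical):

process.demoProducer = cms.EDProducer("DemoProducer",
    seedCut = cms.double(3.0),
    fitConfig = cms.PSet(            # a ParameterSet held inside another ParameterSet
        maxIterations = cms.int32(5),
        tolerance = cms.double(0.0001)
    )
)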

Once a job is submitted, the Framework takes care of instantiating the modules. Each bit of code described in the config file is dynamically loaded. The process is as follows:

  1. First cmsRun reads in the config file and creates a string for each class that needs to be dynamically loaded.
  2. It passes this string to the plug-in manager (the program used to manage the plug-in functionality).
  3. The plug-in manager consults the string-to-library mapping, and delivers to the framework the libraries that contain the requested C++ classes.
  4. The framework loads these libraries.
  5. The framework creates a ParameterSet (PSet) object from the contents of the (loaded) process block in the config file, and hands it to each module's constructor.
  6. Each module's constructor creates an instance of that module.
  7. The executable cmsRun runs each module in the order specified in the config file.

About Events

Events as formed in the Trigger System

Physically, an event is the result of a single readout of the detector electronics and of the signals that will (in general) have been generated by the particles (tracks and energy deposits) present in a number of bunch crossings.

The task of the online Trigger and Data Acquisition System (TriDAS) is to select, out of the millions of events produced in the detector, the most interesting 100 or so per second, and to store them for further analysis. An event has to pass two independent sets of tests, or Trigger Levels, in order to qualify. The tests range from simple and short in duration (Level-1) to sophisticated ones requiring significantly more time to run (Levels 2 and 3, collectively called the High-Level Trigger, or HLT). In the end, the HLT system creates RAW data events containing:

  • the detector data,
  • the level 1 trigger result
  • the result of the HLT selections (HLT trigger bits)
  • and some of the higher-level objects created during HLT processing.

Events from a software point of view: The Event Data Model (EDM)

In software terms, an Event starts as a collection of the RAW data from a detector or MC event, stored as a single entity in memory, a C++ type-safe container called edm::Event. Any C++ class can be placed in an Event; there is no requirement to inherit from a common base class. As the event data are processed, products (of producer modules) are stored in the Event as reconstructed (RECO) data objects. The Event thus holds all data that were taken during a triggered physics event as well as all data derived from them. The Event also contains metadata describing the configuration of the software used for the reconstruction of each contained data object, and the conditions and calibration data used for that reconstruction. The Event data are output to files browsable by ROOT, so the event can be analyzed with ROOT and used as an n-tuple for final analysis.

Products in an Event are stored in separate containers, organizational units within an Event used to collect particular types of data separately. There are particle containers (one per particle type), hit containers (one per subdetector), and service containers for things like provenance tracking.

The full event data (FEVT) in an Event is the RAW plus the RECO data. Analysis Object Data (AOD) is a subset of the RECO data in an event; AOD alone is sufficient for most kinds of physics analysis. RAW, AOD and FEVT are described further in WorkBookDataFormats. The tier-structured CMS Computing Model governs which portions of the Event data are available at a given tier. For event grouping, the model supports both physicist abstractions, such as dataset and event collection, as well as physical packaging concepts native to the underlying computing and Grid systems, such as files. This is described in Data Organization. Here is a framework diagram illustrating how an Event changes as data processing occurs:

framework diagram

Modular Event Content

It is important to emphasize that the event data architecture is modular, just as the framework is. Different data layers (using different data formats) can be configured, and a given application can use any layer or layers. The branches (which map one to one with event data objects) can be loaded or dropped on demand by the application. The following diagram illustrates this concept:

modular_event_products.gif
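
In a configuration file, this loading and dropping of branches is typically expressed through an output module's event-content commands. A sketch is below; the keep/drop strings follow the type_label_instance_process branch-naming pattern, and the particular selections are illustrative:

process.out = cms.OutputModule("PoolOutputModule",
    fileName = cms.untracked.string("slimmed.root"),
    outputCommands = cms.untracked.vstring(
        "drop *",                  # start from an empty event content
        "keep recoTracks_*_*_*"    # then keep every reco::Track collection
    )
)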

You can reprocess event data at virtually any stage. For instance, if the available AOD doesn't contain exactly what you want, you might want to reprocess the RECO (e.g., to apply a new calibration) to produce the desired AOD.

reprocess.gif

Custom quantities (data produced by a user or analysis group) can be added to an event and associated with existing objects at any processing stage (RECO/AOD -> candidates -> user data). Thus the distinction between "CMS data" and "user data" may change during the lifetime of the experiment.

user_data_in_event.gif

Identifying Data in the Event

Data within the Event are uniquely identified by four quantities:

C++ class type of the data
E.g., edm::PSimHitContainer or reco::TrackCollection.
module label
the label that was assigned to the module that created the data. E.g., "SimG4Objects" or "TrackProducer".
product instance label
the label assigned to the object from within the module (defaults to an empty string). This is convenient when a single module puts several objects of the same C++ type into the edm::Event.
process name
the process name as set in the job that created the data
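
In configuration files, the last three of these quantities are commonly bundled into an InputTag (the C++ type is implied by what the consuming module requests). A sketch with illustrative labels:

tracks  = cms.InputTag("TrackProducer")                # module label only
simHits = cms.InputTag("SimG4Objects", "TrackerHits")  # module label plus product instance label
rawData = cms.InputTag("rawDataCollector", "", "HLT")  # label, (empty) instance, and process name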

Getting data from the Event

All Event data access methods use an edm::Handle<type>, where type is the C++ type of the datum, to hold the result of an access.

To request data from an Event, in your module, use a form of one of the following:

  • get, which either returns one object or throws a C++ exception.
  • getMany, which returns a list of zero or more matches to the data request.

After get or getMany, indicate how to identify the data, e.g., getByLabel or getManyByType, and then use the name associated with the handle type, as shown in the example below.

Sample EDAnalyzer Code

Here is some sample EDAnalyzer code from DemoAnalyzer1.cc showing how data is identified and accessed by a module. Notes follow:
void DemoAnalyzer1::analyze(edm::Event const& e, edm::EventSetup const& iSetup) {
  // These declarations create handles to the types of records that you want
  // to retrieve from event "e".
  //
  edm::Handle<reco::TrackCollection> trk_hits;
  edm::Handle<HcalTBTriggerData> triggerD;

  // Pass the handle to the method "getByLabel", which is used to
  // retrieve one and only one instance of the type in question with
  // the specified label out of event "e". If more than one instance
  // exists in the event, then an exception is thrown immediately when
  // "getByLabel" is called.  If zero instances exist which pass
  // the search criteria, then an exception is thrown when the handle
  // is used to access the data.  (You can use the "failedToGet" function
  // of the handle to determine whether the "get" found its data before
  // using the handle.)
  e.getByLabel("TrackProducer", trk_hits);

  // Note that getByType is discouraged.  It only makes sense for raw data
  // items of which we absolutely know there will only ever be one in the
  // event.  Raw trigger data is an example.
  //
  e.getByType(triggerD);
}
Notes:
  • Line 1: The method analyze receives a reference e to the edm::Event object, which contains all the event data.
  • Middle section: Containers are provided for each type of event data and can be obtained via an edm::Handle object.
  • Last section: e.getBy* (with a handle to a type of event data) retrieves the data from the event and stores them in a container in memory.

No matter which way you request the data, the results of the request will be returned in a smart pointer (C++ handle) of type edm::Handle<>.

The Processing Model

Events are processed by passing the Event through a sequence of modules. The exact sequence of modules is specified by the user via a path statement in a configuration file. A path is an ordered list of Producer/Filter/Analyzer modules which sets the exact execution order of all the modules. When an Event is passed to a module, that module can get data from the Event and put data back into the Event. When data is put into the Event, the provenance information about the module that created the data will be stored with the data in the Event. The components involved in the framework and EDM are shown here:

fw_edm.gif

In a second figure below, we see a Source that provides the Event to the framework. (The standard source, which uses POOL, is shown; POOL combines C++ object-streaming technology, such as ROOT I/O, with a transaction-safe relational database store.) The Event is then passed to the execution paths. The paths can be ordered into a list that makes up the schedule for the process. Note that the same module may appear in multiple paths, but the framework guarantees that a module is executed only once per Event: since it will ask for exactly the same products from the Event and produce the same result regardless of which path it is in, it makes no sense to execute it twice. On the other hand, a user designing a trigger path should not have to worry about the full schedule (which could involve hundreds of modules). Each path should be executable by itself, in the sense that modules within the path ask only for things they know have been produced by a previous module in the same path or by the input source. Ideally, the order of execution of the paths should not matter; in practice, however, bugs can introduce an order dependence, and such dependencies should be removed during validation of the job.

processing model
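
In the Python configuration syntax, paths and the schedule might be expressed as in the sketch below (the module labels are hypothetical); note that producerA appears in both paths but is executed only once per Event:

process.p1 = cms.Path(process.producerA * process.filterB)
process.p2 = cms.Path(process.producerA * process.analyzerC)
process.schedule = cms.Schedule(process.p1, process.p2)    # the ordered list of paths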

Framework Services

ServiceRegistry System

The ServiceRegistry is used to deliver services such as the error logger or a debugging service which provides feedback about the state of the Framework (e.g., what module is presently running). Services are informed about the present state of the Framework, e.g., the start of a new Event or the completion of a certain module. Such information is useful for producing meaningful error messages from the error logger or for debugging. The services to be used in a job and the exact configuration of those services are set in the user's configuration file via a ParameterSet.
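
Services are configured just like modules. For example, the standard MessageLogger service can be loaded and adjusted as sketched below (the reportEvery value is purely illustrative):

process.load("FWCore.MessageService.MessageLogger_cfi")  # the standard message logger service
process.MessageLogger.cerr.FwkReport.reportEvery = 100   # report framework progress every 100 events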

Event Setup

To be able to fully process an event, one has to take into account potentially changing and periodically updated information about the detector environment and status. This information (non-event data) is not tied to a given event, but rather to the time period for which it is valid. This time period is called its interval of validity (IOV), and an IOV typically spans many events. Examples of this type of non-event data include calibrations, alignments, geometry descriptions, the magnetic field, and run conditions recorded during data acquisition. The IOV of one piece of non-event data is not necessarily related to that of another. The EventSetup system handles this type of non-event data, for which the IOV is longer than one Event. (Note that non-event data initiated by the DAQ, such as Run transitions, are handled by the Event system.)

The figure illustrates the varying IOVs of different non-event data (calibrations and alignments), and how their values at the time of a given event are read by the EventSetup system.

Event setup (from Paolo's presentation)

The EventSetup system design uses two categories of modules to do its work: ESSource and ESProducer. These components are configured using the same configuration mechanism as their Event counterparts, i.e., via a ParameterSet.

ESSource
is responsible for determining the IOV of a Record (or a set of Records). (A Record is an EventSetup construct that holds data and services which have identical IOVs.) The ESSource may also deliver data/services. For example, a user can request the ECAL pedestals via an ESSource that reads the appropriate values from a database.

ESProducer
an ESProducer is, conceptually, an algorithm whose inputs are dependent on data with IOVs. The ESProducer's algorithm is run whenever there is an IOV change for the Record to which the ESProducer is bound. For example, an ESProducer is used to read the ideal geometry of the tracker as well as the alignment corrections, and then create the aligned tracker geometry from those two pieces of information. This ESProducer is told by the EventSetup system to create a new aligned tracker geometry whenever the alignment changes.
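
As a sketch, an ESSource serving a calibration Record from the conditions database might be configured as below; PoolDBESSource is the standard database-backed ESSource, while the connection string and tag name are illustrative. ESProducers are configured the same way, via cms.ESProducer:

process.ecalConditions = cms.ESSource("PoolDBESSource",
    connect = cms.string("frontier://FrontierProd/CMS_CONDITIONS"),  # illustrative connection
    toGet = cms.VPSet(cms.PSet(
        record = cms.string("EcalPedestalsRcd"),    # the Record whose IOV this source manages
        tag = cms.string("EcalPedestals_example")   # illustrative tag name
    ))
)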

Provenance Tracking

To aid in understanding the full history of an analysis, the framework accumulates provenance for all data stored in the standard ROOT output files. The provenance is recorded in a hierarchical fashion. First, configuration information for each Producer in the job (and all previous jobs contributing data to this job) is stored in the output file. The configuration information includes the ParameterSet used to configure the Producer, the C++ type of the Producer and the software version. Second, each datum stored in the Event is associated with the Producer which created it. Third, for each Event, the data requested by each Producer (when running its algorithm) are recorded. In this way the actual interdependencies between data in the Event are captured.

Examining the Output in ROOT

One main goal of the CMSSW EDM is to make the ROOT files that are read and created by the framework useable directly in ROOT. Several different ROOT access methods are available: bare, Framework Lite (with libraries), and full framework. These are described in Different Ways to Make an Analysis.
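
As an example of the Framework Lite approach, the FWLite Python bindings available in a CMSSW environment allow a file written by the framework to be read back interactively; a minimal sketch (the file name and module label are illustrative):

from DataFormats.FWLite import Events, Handle

events = Events("output.root")
handle = Handle("std::vector<reco::Track>")
for event in events:
    event.getByLabel("TrackProducer", handle)  # identify the data by module label
    tracks = handle.product()
    print("tracks in this event:", tracks.size())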

Information Sources

CMS Computing TDR 4.1 (cmsdoc.cern.ch/cms/ppt/tdr); discussions with J. Yarba (3/17/06) and with Liz Sexton-Kennedy and Oliver Gutsche.
ECAL CMSSW Tutorial (Paolo Meridiani, INFN Roma, ECAL week, April 27, 2006)
Framework tutorial
Physics TDR
Discussion with Luca Lista, and his presentation RECO/AOD Issues for Analysis Tools.
Recovering data from CMS software black hole, G. Zito

Responsible: KatiLassilaPerini
Last reviewed by: DavidDagenhart - 25 Feb 2008
