Analysis Patterns
Introduction
This page summarizes the main supported data access modes for physics analysis, with references to more detailed documentation and concrete examples.
Purpose
The purpose is to define the output objects of CMS reconstruction with a uniform look and feel in their interfaces.
Contact persons from each detector and analysis PRS group have been asked to specify their requirements.
CMS Data Formats
CMSSW provides three main data formats:
- FEVT: full event. Most of the intermediate products of the simulation and reconstruction steps are stored in this format.
- RECO: the output of the reconstruction is stored. The information is sufficient to allow reprocessing of the event reconstruction. For simulated data, Geant4 tracks and vertices and HepMC generator information are also stored (RECOSIM format).
- AOD: a subset of the reconstruction output, sufficient for a large fraction of analyses, is stored. This information allows neither complete event reprocessing nor track refitting. For simulated data, the HepMC generator information is also stored (AODSIM format).
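The trade-offs between the three formats can be summarized as a small lookup. The sketch below is a hypothetical helper, not part of CMSSW: the capability names (`analysis`, `reprocessing`, `track_refit`, `intermediate_products`) are labels invented here to paraphrase the list above, and `minimal_tier` simply returns the smallest format that covers all requested needs.

```python
# Hypothetical summary of the three formats described above (not a CMSSW API).
# Formats are listed from smallest to largest event content.
TIER_CAPABILITIES = {
    "AOD": {"analysis"},
    "RECO": {"analysis", "reprocessing", "track_refit"},
    "FEVT": {"analysis", "reprocessing", "track_refit", "intermediate_products"},
}
TIER_ORDER = ["AOD", "RECO", "FEVT"]


def minimal_tier(needs, simulated=False):
    """Return the smallest format whose capabilities include every item in `needs`.

    For simulated samples, the SIM variant named in the text (AODSIM, RECOSIM)
    is returned instead of the plain format name.
    """
    needs = set(needs)
    for tier in TIER_ORDER:
        if needs <= TIER_CAPABILITIES[tier]:
            if simulated and tier in ("AOD", "RECO"):
                return tier + "SIM"
            return tier
    raise ValueError(f"no format supports {sorted(needs)}")
```

For example, an analysis that needs to refit tracks on simulated data would be pointed to RECOSIM, while a plain analysis is served by AOD.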
Data Format Documentation
Documentation of the RECO/AOD format definition is part of the CMSSW Reference Manual, which is produced for every CMSSW release.
Interactive Analysis
Instant access to CMS data can be achieved using interactive applications.
The supported interactive data access modes are listed below.
Bare ROOT Analysis
CMS data in EDM format can be inspected with ROOT without the need to install any CMSSW code.
This can even be done locally on a laptop.
More details on:
FWLite Interactive Analysis
ROOT can interactively invoke member functions of objects that are stored in data files in EDM format.
The CMS libraries 'advertise' which C++ data classes they contain. ROOT uses this advertisement to automatically load the correct library the first time it needs a particular data class.
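The "advertise, then load on first use" idea can be sketched in a few lines of Python. This only illustrates the pattern, assuming Python modules stand in for CMS shared libraries; `AutoLoader`, `advertise` and `get_class` are hypothetical names, not the FWLite or ROOT interface.

```python
import importlib


class AutoLoader:
    """Hypothetical sketch of advertisement-driven lazy library loading.

    Each 'library' (here a Python module) advertises the class names it
    provides; the module is imported only the first time one of its
    classes is requested.
    """

    def __init__(self):
        self._advertised = {}  # class name -> module name
        self._loaded = {}      # module name -> imported module

    def advertise(self, module_name, class_names):
        # Record which classes a library provides, without loading it.
        for name in class_names:
            self._advertised[name] = module_name

    def get_class(self, class_name):
        # Raises KeyError if no library advertised this class.
        module_name = self._advertised[class_name]
        if module_name not in self._loaded:
            # First request for any class of this library: load it now.
            self._loaded[module_name] = importlib.import_module(module_name)
        return getattr(self._loaded[module_name], class_name)

    def is_loaded(self, module_name):
        return module_name in self._loaded
```

In this sketch, standard-library modules such as `fractions` can play the role of a data-format library: advertising `Fraction` costs nothing, and the import happens only when `get_class("Fraction")` is first called.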
More details on:
Python Analysis
Interactive ROOT analysis can be done using the Python scripting language.
More details on:
Batch Analysis
Analysis of large data samples with CPU-intensive processing is better
performed with compiled batch applications than interactively.
The supported batch applications are listed below.
Compiled ROOT Applications
Compiled ROOT applications are more stable than interactive macros.
Any CMSSW library can be linked if the application is compiled within
a local CMSSW release area.
More details on:
Full Framework Analysis
Using Framework modules gives access to all CMSSW services, including
calibrations and alignment, and allows writing new event files in the
EDM format with configurable output content. Analysis applications can be organized into Framework
modules to achieve better modularity.
The subsections below describe how to use the different Framework module types for
analysis.
EDAnalyzer Modules
EDAnalyzer modules allow read-only access to the Event.
They are useful for producing histograms, reports, statistics, etc.
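A minimal sketch of the read-only pattern, written in Python for brevity. In CMSSW an EDAnalyzer is a C++ class whose `analyze` method receives the event as a `const edm::Event&`; here a hypothetical `MuonPtAnalyzer` receives a `MappingProxyType` view instead, so any attempt to add or replace a top-level event entry raises `TypeError`. The event layout (`muon_pt` as a list of values in GeV) is an assumption made for illustration.

```python
from collections import Counter
from types import MappingProxyType


class MuonPtAnalyzer:
    """Hypothetical analyzer: reads each event and fills a histogram."""

    def __init__(self, bin_width=10.0):
        self.bin_width = bin_width
        self.histogram = Counter()  # bin lower edge -> number of entries
        self.n_events = 0

    def analyze(self, event):
        # `event` is a read-only view of the event's top-level entries.
        self.n_events += 1
        for pt in event["muon_pt"]:
            lower_edge = (pt // self.bin_width) * self.bin_width
            self.histogram[lower_edge] += 1

    def report(self):
        # Summary produced at the end of the job.
        return {"events": self.n_events, "bins": dict(sorted(self.histogram.items()))}


def run(analyzer, events):
    for event in events:
        analyzer.analyze(MappingProxyType(event))  # pass a read-only view
    return analyzer.report()
```

The analyzer accumulates its results in its own state (the histogram and the event counter) rather than in the event, which is what makes read-only access sufficient.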
More details on:
EDProducer Modules
EDProducer modules are useful for defining new data to be stored
in the Event for the final analysis.
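The producer pattern can be sketched the same way. The hypothetical `DimuonMassProducer` below computes the invariant mass of the first two muons, m^2 = E^2 - |p|^2 in natural units, and adds it to the event under a new label. The event layout, with each muon given as an `(E, px, py, pz)` tuple, is assumed for illustration, and the refusal to overwrite an existing entry mirrors the EDM convention that products already in the event are not modified.

```python
import math


class DimuonMassProducer:
    """Hypothetical producer: computes a new quantity and adds it to the event."""

    label = "dimuonMass"

    def produce(self, event):
        # Never overwrite an existing product; only add new ones.
        if self.label in event:
            raise KeyError(f"product '{self.label}' is already in the event")
        muons = event["muons"]  # list of (E, px, py, pz) tuples
        if len(muons) < 2:
            event[self.label] = None
            return
        # Sum the four-vectors of the first two muons component by component.
        e, px, py, pz = (sum(component) for component in zip(*muons[:2]))
        m2 = e * e - px * px - py * py - pz * pz
        # Guard against tiny negative values caused by rounding.
        event[self.label] = math.sqrt(max(m2, 0.0))
```

Downstream modules, or the final analysis, can then read `dimuonMass` from the event instead of recomputing it.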
More details on:
EDFilter Modules
EDFilter modules are useful for filtering the event sample
according to user-defined criteria. If an event does not satisfy the specified
criteria, event processing stops after the filter module.
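The stop-on-reject behavior of a filter can be illustrated with a short Python loop. The module representation (`(kind, callable)` pairs) and `run_path` are hypothetical; in CMSSW the sequence of modules is declared in the job configuration, and the filter is a C++ module whose `filter` method returns a boolean.

```python
def run_path(event, modules):
    """Run modules in order, stopping as soon as a filter rejects the event.

    `modules` is a list of (kind, callable) pairs, where kind is "filter"
    (the callable returns True to accept the event) or any other module
    kind (the callable is simply run on the event).
    Returns True if the event passed every filter on the path.
    """
    for kind, module in modules:
        if kind == "filter":
            if not module(event):
                return False  # modules after this filter are not run
        else:
            module(event)
    return True
```

Placing a cheap filter early in the sequence therefore saves the cost of every module that follows it for rejected events.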
Data samples can be skimmed: only the desired subset of events is selected from a data sample, and only the specified event data products are saved to the output.
More details on:
TSelector Analysis usable on Parallel PROOF farms
TSelectors are ROOT utilities that allow running analysis on parallel PROOF farms.
TSelectors can be written and compiled within CMSSW.
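The split-then-merge structure that makes TSelectors suitable for PROOF can be mimicked in plain Python. The method names loosely follow TSelector's `SlaveBegin`, `Process` and `SlaveTerminate`, and the final merge plays the role of PROOF collecting and merging the workers' outputs before `Terminate` runs on the client. Everything here is a hypothetical stand-in, not PyROOT, and the workers run sequentially rather than on a farm.

```python
from collections import Counter


class CountingSelector:
    """Hypothetical mimic of a TSelector's worker-side life cycle."""

    def slave_begin(self):
        # Per-worker output object, analogous to a selector's output list.
        self.output = Counter()

    def process(self, entry):
        # Called once per entry assigned to this worker.
        self.output[entry["category"]] += 1
        return True

    def slave_terminate(self):
        # Hand the partial result back for merging.
        return self.output


def run_on_farm(entries, n_workers):
    """Split entries across workers, run one selector per worker, merge outputs."""
    chunks = [entries[i::n_workers] for i in range(n_workers)]
    merged = Counter()
    for chunk in chunks:  # sequential here; PROOF would run workers in parallel
        selector = CountingSelector()
        selector.slave_begin()
        for entry in chunk:
            selector.process(entry)
        merged += selector.slave_terminate()  # merge partial results
    return merged
```

Because each worker only sees its own chunk and the partial results are mergeable, the final result does not depend on how the entries were distributed.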
More details on:
Skimming Data Samples
It may be useful for analysis to apply a preselection to your data sample
in order to reduce the number of events and possibly also the
event content. The format of the events saved in the output may
differ from the input format, allowing data reduction.
This process is called
skimming.
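A skim combines an event selection with a reduction of the event content. The sketch below, assuming events are plain dictionaries, applies ordered `keep`/`drop` rules with shell-style wildcards, where the last matching rule wins and products matched by no rule are dropped. This is loosely modelled on the keep/drop statements of a CMSSW output module's `outputCommands`, but the helper names and product names here are hypothetical, and real CMSSW branch names follow a more structured pattern.

```python
from fnmatch import fnmatchcase


def product_is_kept(name, commands):
    """Apply ordered 'keep'/'drop' rules to a product name; the last match wins."""
    kept = False  # in this sketch, products matched by no rule are dropped
    for command in commands:
        action, pattern = command.split(maxsplit=1)
        if action not in ("keep", "drop"):
            raise ValueError(f"unknown action in rule: {command!r}")
        if fnmatchcase(name, pattern):
            kept = action == "keep"
    return kept


def skim(events, select, commands):
    """Yield only the selected events, each reduced to the kept products."""
    for event in events:
        if select(event):
            yield {name: value for name, value in event.items()
                   if product_is_kept(name, commands)}
```

With the rules `["drop *", "keep muon*"]`, for example, only products whose names start with `muon` survive in the selected events.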
More details on:
Why no more ExRootAnalysis?
Many ORCA users used to run their analysis on ROOT trees, or "ntuples", produced
with applications like ExRootAnalysis in an analysis-specific format.
This approach is no longer needed in CMSSW. The
EDM data format is flexible
and configurable enough that data in the final analysis format can be
stored in
EDM files.
Review Status
Responsible: LucaLista
Last reviewed by: Reviewer