Analysis Patterns


Introduction

This page summarizes the main supported data access modes for physics analysis with references to more detailed documentation and concrete examples.

Purpose

Define the output objects of CMS reconstruction with a uniform look and feel in the interface. Contact persons from each detector and analysis PRS group have been asked to set the specific requirements.

CMS Data Formats

CMSSW provides three main data formats:

  • FEVT: Full event. Most of the intermediate products of the simulation and reconstruction steps are stored in this format.
  • RECO: The output of reconstruction. The stored information is sufficient to rerun the event reconstruction. For simulated data, Geant4 tracks and vertices and HepMC generator information are also stored (RECOSIM format).
  • AOD: A selected subset of the reconstruction output, sufficient for a large fraction of analyses. The information allows neither complete event reprocessing nor track refitting. For simulated data, the HepMC generator information is also stored (AODSIM format).

Data Format Documentation

Documentation of the RECO/AOD format definition is part of the CMSSW Reference Manual, which is produced for every CMSSW release.

Interactive Analysis

Instant access to CMS data can be achieved using interactive applications. The supported interactive data access modes are listed below.

Bare ROOT Analysis

CMS data in EDM format can be inspected with ROOT without installing any CMSSW code. This can even be done locally on a laptop.
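
As a rough sketch of what bare ROOT access can look like from Python: EDM files keep event data in a ROOT tree named Events, with one branch per product, named from the product's type, module label, instance name and process name. The helper below builds such a branch name and a second function lists the branches of a file. The function names and the example labels are hypothetical, and the listing function assumes a ROOT installation with Python bindings.

```python
def edm_branch_name(friendly_type, module_label, instance="", process="RECO"):
    """Build a branch name of the form type_label_instance_process.

    The trailing dot is part of the branch name. All arguments are
    illustrative; real values depend on the file being inspected.
    """
    return "{}_{}_{}_{}.".format(friendly_type, module_label, instance, process)


def list_event_branches(path):
    """Return the branch names of the 'Events' tree in a ROOT file.

    ROOT is imported lazily so that edm_branch_name can be used
    without a ROOT installation (assumption: PyROOT is available
    when this function is called).
    """
    import ROOT

    root_file = ROOT.TFile.Open(path)
    events = root_file.Get("Events")
    names = [branch.GetName() for branch in events.GetListOfBranches()]
    root_file.Close()
    return names
```

Listing the branches first is usually the safest way to find out which products a given file contains before drawing or scanning anything.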

More details on:

FWLite Interactive Analysis

ROOT can interactively invoke member functions of objects that are stored in data files in EDM format.

The CMS libraries 'advertise' what C++ data classes they contain. We can use this advertisement to allow ROOT to automatically load the correct library the first time ROOT needs a particular data class.
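
The advertise-then-load-on-first-use idea can be illustrated with a small toy model. This is not the CMSSW or ROOT mechanism itself: the class LazyLibraryLoader, the load callback and the class-to-library mapping are hypothetical names chosen only for illustration.

```python
class LazyLibraryLoader:
    """Toy model: load a library the first time one of its classes is needed."""

    def __init__(self, advertised, load):
        # advertised: mapping from class name to the library that contains it
        self._advertised = dict(advertised)
        self._load = load    # callback that actually loads a library
        self.loaded = []     # libraries loaded so far, in load order

    def require(self, class_name):
        """Make sure the library advertising class_name is loaded, once."""
        library = self._advertised[class_name]
        if library not in self.loaded:
            self._load(library)
            self.loaded.append(library)
        return library
```

In this sketch nothing is loaded at start-up; the cost of loading a library is only paid when a class from it is first requested, and repeated requests are free.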

More details on:

Python Analysis

Interactive ROOT analysis can be done using the Python scripting language.
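
A minimal sketch of such a Python event loop is given below. It assumes the FWLite Python interface (Events and Handle from DataFormats.FWLite) as found in more recent CMSSW releases, which may differ from the release this page was written for. The function names, the track collection type and any module label passed in are illustrative assumptions.

```python
def collect_track_pts(events, handle, label):
    """Loop over events and return the pT of every track found.

    'events' is any iterable of event objects offering getByLabel,
    and 'handle' any object offering product(), as in FWLite.
    """
    pts = []
    for event in events:
        event.getByLabel(label, handle)   # fill the handle for this event
        for track in handle.product():
            pts.append(track.pt())
    return pts


def open_fwlite(path):
    """Open an EDM file with FWLite (assumes a CMSSW environment)."""
    from DataFormats.FWLite import Events, Handle

    return Events(path), Handle("std::vector<reco::Track>")
```

Keeping the loop separate from the file opening means the loop can be tried out on stand-in objects before running on real data.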

More details on:

Batch Analysis

Analysis of large data samples with CPU-intensive processing is better performed with a compiled batch application than interactively.

The supported batch applications are listed below.

Compiled ROOT Applications

Compiled ROOT applications are more stable than interactive macros. Any CMSSW library can be linked if the application is compiled within a local CMSSW release area.

More details on:

Full Framework Analysis

Framework modules give access to all CMSSW services, including calibrations and alignment, and can write new event files in the EDM format with configurable output content. Analysis applications can be organized into Framework modules to achieve better modularity.

The following sections indicate how to use the different Framework module types for analysis.

EDAnalyzer Modules

EDAnalyzer modules provide read-only access to the Event. They are useful for producing histograms, reports, statistics, etc.
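
The read-only pattern can be sketched with a toy model in plain Python. This is not the CMSSW EDAnalyzer interface: the class PtHistogramAnalyzer, the run function and the product label "trackPts" are hypothetical, and the event is modelled as a simple dictionary.

```python
from collections import Counter
from types import MappingProxyType


class PtHistogramAnalyzer:
    """Toy analyzer: reads track pT values and fills a binned histogram."""

    def __init__(self, bin_width=10.0):
        self.bin_width = bin_width
        self.histogram = Counter()   # bin index -> number of entries

    def analyze(self, event):
        # The event is only read here; nothing is added or changed.
        for pt in event["trackPts"]:
            self.histogram[int(pt // self.bin_width)] += 1


def run(analyzer, events):
    """Pass each event to the analyzer through a read-only view."""
    for products in events:
        analyzer.analyze(MappingProxyType(products))
    return analyzer.histogram
```

The read-only view makes any attempt to modify the event inside analyze fail, which mirrors the intent that analyzers only observe.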

More details on:

EDProducer Modules

EDProducer modules are useful to define new data to be stored in the Event for the final analysis.
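
As a toy illustration of the producer pattern (again not the CMSSW interface), the sketch below adds a derived quantity to an event under a new label. ToyEvent, LeadingPtProducer and the labels "trackPts" and "leadingPt" are hypothetical names; refusing to overwrite an existing product is a design choice of this sketch.

```python
class ToyEvent:
    """Toy event: a set of labelled products that can be read and extended."""

    def __init__(self, products):
        self._products = dict(products)

    def get(self, label):
        return self._products[label]

    def put(self, label, product):
        # Existing products are never overwritten in this sketch.
        if label in self._products:
            raise KeyError("product already exists: " + label)
        self._products[label] = product


class LeadingPtProducer:
    """Toy producer: stores the highest track pT of the event as a new product."""

    def produce(self, event):
        pts = event.get("trackPts")
        event.put("leadingPt", max(pts) if pts else 0.0)
```

Modules that run later can then read "leadingPt" like any other product, without recomputing it.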

More details on:

EDFilter Modules

EDFilter modules are useful to filter the event sample according to user-defined criteria. The event processing stops after the filter module if the event does not match the specified criteria.

Data samples can be skimmed by selecting only the desired subset of events and saving only the specified event data products to the output.
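
The stop-after-a-failed-filter behaviour can be sketched with a toy model. This is plain Python and not the CMSSW scheduler: run_path, has_high_pt_track, select_events and the "trackPts" label are hypothetical, and a module is modelled as a callable that rejects the event by returning False.

```python
def run_path(modules, event):
    """Run (name, callable) modules in order on one event.

    A module returning False acts as a rejecting filter: the path stops
    there and later modules are not run. Any other return value continues.
    """
    executed = []
    for name, module in modules:
        executed.append(name)
        if module(event) is False:
            return False, executed
    return True, executed


def has_high_pt_track(event, threshold=20.0):
    """Toy filter: accept events with at least one track above threshold."""
    return any(pt > threshold for pt in event["trackPts"])


def select_events(modules, events):
    """Return only the events accepted by every filter on the path."""
    return [event for event in events if run_path(modules, event)[0]]
```

Placing the cheapest and most selective filters first means that expensive modules further down the path run only on events that are still of interest.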

More details on:

TSelector Analysis usable on Parallel PROOF farms

TSelectors are ROOT utilities that allow an analysis to run on parallel PROOF farms. TSelectors can be written and compiled under CMSSW.
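
The way a selector-style analysis is split over workers and merged again can be sketched with a toy model. ROOT's TSelector defines hooks such as SlaveBegin, Process and Terminate; the Python names below (MultiplicitySelector, slave_begin, process, run_on_workers) are hypothetical analogues, and the workers are run one after the other here rather than in parallel.

```python
from collections import Counter


class MultiplicitySelector:
    """Toy selector: histogram of the track multiplicity of each entry."""

    def slave_begin(self):
        # Runs once on each worker before any entry is processed.
        self.output = Counter()

    def process(self, entry):
        # Runs once per entry; here an entry is just a track count.
        self.output[entry] += 1


def run_on_workers(selector_class, entries, n_workers):
    """Split entries over workers, run a selector on each, merge the outputs."""
    merged = Counter()
    for worker in range(n_workers):
        selector = selector_class()
        selector.slave_begin()
        for entry in entries[worker::n_workers]:
            selector.process(entry)
        merged.update(selector.output)   # merging step: counts are summed
    return merged
```

Because each worker fills its own output and the outputs are only summed at the end, the result does not depend on how many workers the entries were split over.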

More details on:

Skimming Data Samples

For analysis it may be useful to apply a preselection to your data sample in order to reduce the number of events and possibly the event content. The format of the events saved in the output may differ from that of the input, allowing data reduction. This process is called skimming.
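
Reducing the event content is typically expressed as an ordered list of keep and drop statements with wildcards. The toy sketch below applies such statements to a list of branch names. It uses Python's fnmatch for the pattern matching rather than the CMSSW matcher, the function name and branch names are hypothetical, and dropping anything that no statement matches is an assumption of this sketch.

```python
from fnmatch import fnmatchcase


def select_products(branch_names, output_commands):
    """Apply ordered 'keep <pattern>' / 'drop <pattern>' statements.

    Statements are applied in order, so for each branch the last
    matching statement decides whether it is kept.
    """
    kept = []
    for name in branch_names:
        keep = False   # assumption: unmatched branches are dropped
        for command in output_commands:
            action, pattern = command.split(None, 1)
            if fnmatchcase(name, pattern):
                keep = (action == "keep")
        if keep:
            kept.append(name)
    return kept
```

Starting with a statement that drops everything and then keeping only what the analysis needs tends to give the smallest output files.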

More details on:

Why no more ExRootAnalysis?

Many ORCA users used to run their analysis on ROOT trees, or "ntuples", produced in an analysis-specific format by applications such as ExRootAnalysis.

This approach is no longer needed in CMSSW. The EDM data format is flexible and configurable in such a way that the data in the final analysis format can be stored in EDM files.

Review Status

Reviewer/Editor and Date       Comments
LucaLista - 01 Nov 2006        page author
MichaelCase - 02 Feb 2007      page last content editor
JennyWilliams - 05 Feb 2007    editing to include in SWGuide

Responsible: LucaLista
Last reviewed by: Reviewer

Topic revision: r22 - 2007-07-02 - JennyWilliams



 