SuperComputing 2007


The CMS Grid Demo exercises the standard analysis workflow within the CMS computing model. A physicist will use various CMS and general GRID services to analyze distributed data and MC samples.

CMS computing model

The CMS computing model distributes its resources in a tiered structure around the world:

T0: CERN (accelerator and detector)

T1: regional centers (FNAL is one of seven regional centers)

T2: analysis centers (25-50 centers, each associated with one T1 for support; 7 in the US, all associated with FNAL)

There are two data flow models in CMS:

Recorded data

Data is recorded by the detector and filtered by the first-level (hardware) and higher-level (software) trigger systems, which reduce the 40 MHz bunch crossing rate to 150-200 Hz of recorded events. These raw events are stored on tape at the T0 (CERN). This copy is not accessible to physicists (cold copy). At CERN, the raw data is reconstructed and split into primary datasets according to their trigger bit composition. With an estimated total overlap of 10% (10% of the recorded events are in two or more primary datasets), each primary dataset alone is sufficient to perform a specific physics analysis (corresponding to the trigger channel used).

The primary datasets are distributed among the 7 T1 centers. Each individual primary dataset is located at only one T1. The T1 centers also store the primary datasets on tape (custodial copy, accessible to physicists). At the T1s, physics groups skim the primary datasets to reduce their size and retain only events of special interest. The primary datasets will also be re-reconstructed at the T1s when new calibration constants and improved reconstruction algorithms become available.

Skimmed datasets and AOD datasets (Analysis Object Data, an extract of the reconstructed event data format, very small and sufficient for 90% of the analyses) are transported to the T2 level (which acts like a cache). Physicists will run their analyses only at the T2 level on these skimmed datasets.
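To make the scale of the trigger reduction concrete, the following minimal sketch computes the reduction factor implied by the rates quoted above (40 MHz bunch crossing rate, 150-200 Hz recorded). The function name `reduction_factor` is chosen here for illustration only and is not part of any CMS software.

```python
# Rates taken from the text: 40 MHz bunch crossings, 150-200 Hz recorded.
BUNCH_CROSSING_RATE_HZ = 40_000_000


def reduction_factor(input_rate_hz, output_rate_hz):
    """Return how many input events correspond to one recorded event."""
    if output_rate_hz <= 0:
        raise ValueError("output rate must be positive")
    return input_rate_hz / output_rate_hz


for recorded_rate_hz in (150, 200):
    factor = reduction_factor(BUNCH_CROSSING_RATE_HZ, recorded_rate_hz)
    print(f"{recorded_rate_hz} Hz recorded: about 1 in {factor:,.0f} bunch crossings kept")
```

With these numbers, the trigger systems keep roughly one bunch crossing in 200,000 to 270,000.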

Monte Carlo

Monte Carlo is simulated only at the T2 level and archived back at the associated T1 center and stored on tape. The MC samples are then skimmed by physics groups and the skims are treated like the recorded data skims, transported back to T2 level for analysis.

Distribution consequences

Because of the distributed nature of the model, analysis will be driven by data location: analysis jobs are sent to the sites where the data resides. All centers have to be accessible to all physicists on a fair-share basis. CMS developed global data catalogs, which provide information about which datasets are available and where copies are accessible, and a wrapper around the GRID tools, which hides the various GRID interactions and provides the user with a simple interface.
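The idea of data-location-driven analysis can be sketched as a catalog lookup followed by job assignment. This is an illustration under stated assumptions, not the actual CMS catalog or CRAB interface: the catalog dictionary, the dataset name, the site names, and the function `plan_jobs` are all hypothetical, and the simple round-robin assignment stands in for whatever fair-share scheduling the real sites apply.

```python
from collections import defaultdict

# Hypothetical catalog: dataset name -> sites holding a copy.
# Dataset and site names are invented for illustration.
EXAMPLE_CATALOG = {
    "/HiggsMC/example-skim": ["T2_US_SiteA", "T2_US_SiteB", "T2_US_SiteC"],
}


def plan_jobs(catalog, dataset, n_jobs):
    """Assign n_jobs to the sites that host the dataset (round-robin)."""
    sites = catalog.get(dataset)
    if not sites:
        raise LookupError(f"no site hosts dataset {dataset!r}")
    plan = defaultdict(int)
    for job_index in range(n_jobs):
        # Jobs only go to sites that already hold the data.
        plan[sites[job_index % len(sites)]] += 1
    return dict(plan)


if __name__ == "__main__":
    print(plan_jobs(EXAMPLE_CATALOG, "/HiggsMC/example-skim", 10))
```

The point of the sketch is that the job plan is derived from where the data already is, rather than moving the data to a chosen site.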


The user GRID tool CRAB will be used to run ~10 distributed jobs on MC samples transported to various US CMS T2 centers. The example analysis will reconstruct the Higgs signal and store key distributions as histograms in ROOT files. These ROOT files are then retrieved and merged to display the reconstructed Higgs mass.
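The merge step amounts to adding the per-job histograms bin by bin. The sketch below shows this with plain Python lists of bin contents. It assumes all jobs use identical binning and does not use ROOT or CRAB; the function `merge_histograms` and the sample bin contents are hypothetical.

```python
def merge_histograms(histograms):
    """Sum histograms bin by bin; all inputs must have the same binning."""
    if not histograms:
        raise ValueError("need at least one histogram")
    n_bins = len(histograms[0])
    if any(len(h) != n_bins for h in histograms):
        raise ValueError("all histograms must have the same number of bins")
    # zip(*histograms) groups the contents of each bin across all jobs.
    return [sum(bin_contents) for bin_contents in zip(*histograms)]


if __name__ == "__main__":
    # Invented bin contents standing in for the output of three jobs.
    per_job = [[1, 4, 2], [0, 5, 3], [2, 6, 1]]
    print(merge_histograms(per_job))
```

Because the per-job histograms are simply summed, the jobs can run at different T2 centers in any order and the merged distribution is the same.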

The demo is described in many places, some of which are the following TWiki pages:

with probably the most explanations at:


The following talk was given at SC2006 and will be updated for SC2007:

More information about the CMS computing model and networking is also available here

Relation to other parts of the SC2007 FNAL booth

This demo is closely related to the network demonstration presented in another corner of the booth. Without strong and fast network connections, CMS cannot follow the distributed computing model and move data all around the world.

Topic revision: r1 - 2007-08-19 - OliverGutsche