DRAFT Analysis Model and Skimming DRAFT


This page explains the analysis model and the skimming procedures for the startup phase, including technical details.

Multiple tier concept

Primary Datasets (PD), central skims, group skims, user skims

Central Skims (Primary Skims)

To distribute data to Tier-2 centres, Central Skims will be run at Tier-1 centres. The Central Skims extend the Primary Datasets; selection is foreseen to be based on trigger paths, with carefully judged exceptions. The standard event content to be kept in the startup phase is RECO.

The processing of central skims will be handled by data ops.

To add: the foreseen frequency of skimming. Open question: does a re-reco automatically trigger re-creation of the skims?

Group Skims (Secondary Skims)

The next step in the processing chain is the Group Skims, which are defined by the corresponding PAGs. The PAGs themselves are responsible for running the skims via CRAB. The skim output is expected to be EDM files in official data formats. Additional requirements for exceptional cases are written down in XYZ.

Because knowledge of the processing history of Group Skims is an essential part of the skimming infrastructure, all C++ code used for skimming must be part of the release that is used. Integration of the code is handled by the PAG contact people and the Offline Analysis Tools group.

To add: the foreseen frequency of skimming.

Further skimming or analysis steps

Subgroups or individual physicists can either run directly on the output of the Group Skims or create their own sub-skim and pull it to their local institute cluster or Tier-3. Transfers other than the one at creation time will not be supported; in particular, such datasets will not be replicated.

Code used at this stage only needs to follow the common analysis guidelines, which facilitate the later physics approval.

Dataset publication and distribution

The central skims produced at the Tier-1 centres will be published in the global DBS and can be requested for transfer using the PhEDEx service.

Group skims created at Tier-2s are registered in the local DBS of the site. At the request and under the control of the PAG conveners, datasets can be published to the global DBS so that they can be replicated. A prerequisite for this step is that the skims are created using standard releases without any further additions.

Further sub-skims will only be registered in a local DBS and are not available for replication.
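The publication rules above can be summarised as a small decision function. The following is only an illustrative sketch in plain Python, not part of any CMS tool: the class SkimDataset, its fields, and the functions dbs_instance and is_replicable are hypothetical names introduced here to make the rules explicit.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SkimDataset:
    """Hypothetical description of a skim output (illustration only)."""
    name: str
    level: str                           # "central", "group", or "sub"
    standard_release_only: bool = True   # made with a standard release, no additions
    convener_approved: bool = False      # PAG conveners requested global publication


def dbs_instance(ds: SkimDataset) -> str:
    """Return 'global' or 'local' following the rules described in the text."""
    if ds.level == "central":
        # Central skims are always published in the global DBS.
        return "global"
    if ds.level == "group":
        # Group skims start in the local DBS; global publication needs the
        # conveners' request and a standard release without additions.
        if ds.convener_approved and ds.standard_release_only:
            return "global"
        return "local"
    if ds.level == "sub":
        # Further sub-skims are only ever registered locally.
        return "local"
    raise ValueError(f"unknown skim level: {ds.level!r}")


def is_replicable(ds: SkimDataset) -> bool:
    """Only datasets in the global DBS can be requested for transfer."""
    return dbs_instance(ds) == "global"
```

For example, a group skim built with private code additions stays in the local DBS even if the conveners ask for publication, so is_replicable returns False for it.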

To add: how quotas are discussed and agreed

Technical details

From the technical point of view, a skim is defined by the sequences to be run, the event content to be stored, and selection criteria based on HLT bits and paths. These need to be declared in the following way:

import FWCore.ParameterSet.Config as cms

AnalysisSkim = cms.FilteredStream(
   responsible = '<name>',
   name = '<the name>',
   paths  = <list of paths>,
   content = <output commands>,
   selectEvents = <select events statement>,
   dataTier = cms.untracked.string('USER')
)

The final configuration is created from such snippets using cmsDriver.py. Further descriptions will appear here soon.
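Before a skim definition is handed over, it can be useful to check that all the fields listed in the template are filled in. The sketch below does this in plain Python on a dictionary, independently of CMSSW. The helper check_skim_definition and all example values (ExampleSkim, examplePath, the keep/drop statements) are made up for illustration and are not an official tool or a real skim.

```python
# Field names taken from the FilteredStream template above.
REQUIRED_FIELDS = ("responsible", "name", "paths", "content",
                   "selectEvents", "dataTier")


def check_skim_definition(skim: dict) -> list:
    """Return a list of problems found in a skim definition (empty if none)."""
    problems = [f"missing field: {key}"
                for key in REQUIRED_FIELDS if key not in skim]
    # Free-text fields must be non-empty strings.
    for key in ("responsible", "name", "dataTier"):
        value = skim.get(key)
        if key in skim and (not isinstance(value, str) or not value.strip()):
            problems.append(f"{key} must be a non-empty string")
    # Paths and output commands must be non-empty lists.
    for key in ("paths", "content"):
        value = skim.get(key)
        if key in skim and (not isinstance(value, (list, tuple)) or len(value) == 0):
            problems.append(f"{key} must be a non-empty list")
    return problems


# Hypothetical example values, for illustration only.
example = {
    "responsible": "<name>",
    "name": "ExampleSkim",
    "paths": ["examplePath"],
    "content": ["drop *", "keep *_example_*_*"],
    "selectEvents": "examplePath",
    "dataTier": "USER",
}

print(check_skim_definition(example))
```

An empty list means that every field of the template has been provided; each entry otherwise names one field that is missing or empty.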

Documents and Talks

Review status

Reviewer/Editor and Date Comments
BenediktHegner - 24 Feb 2009 created page

Responsible: BenediktHegner
Last reviewed by: Never reviewed

Topic revision: r3 - 2009-03-03 - BenediktHegner