How to read events from an EDM/ROOT file

Goal of this page

This page contains some examples of how to specify an EDM/ROOT file, or several files, as the source of events.

Reading events from an EDM/ROOT file

Configurable Parameters for PoolSource

The configurable parameters for PoolSource can be found here:

https://twiki.cern.ch/twiki/bin/view/CMSPublic/SWGuideEDMParametersForModules#PoolSource

Example 1:

process.maxEvents = cms.untracked.PSet( input = cms.untracked.int32(20) )
process.source = cms.Source ("PoolSource",
                        fileNames=cms.untracked.vstring('file:myFile.root'),
                        skipEvents=cms.untracked.uint32(5)
)

This specifies the file 'myFile.root' in the current working directory as the input file. It also specifies that only 20 events are to be read, after skipping the first 5 events. The string 'file:myFile.root' is a ROOT connection string for a locally stored file. The protocol prefix (here file:) is passed directly to ROOT, so any protocol that ROOT supports may be used (e.g. rfio:, dcap:, castor:).
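The combined effect of skipEvents and maxEvents can be pictured with a short pure-Python sketch. It is not CMSSW code: select_events is a hypothetical helper that only models the behaviour described above (skip the first N events, then read at most M, where -1 means no limit), using a range of integers as stand-in events.

```python
from itertools import islice

def select_events(events, skip_events=0, max_events=-1):
    """Hypothetical model of skipEvents + maxEvents for one input file.

    Skips the first `skip_events` events, then returns at most
    `max_events` of the remaining ones (-1 means no limit).
    """
    remaining = islice(events, skip_events, None)
    if max_events < 0:
        return list(remaining)
    return list(islice(remaining, max_events))

# Model a file holding events numbered 1..100.
selected = select_events(range(1, 101), skip_events=5, max_events=20)
print(selected[0], selected[-1], len(selected))  # 6 25 20
```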

Example 2: Multiple input files

process.maxEvents = cms.untracked.PSet( input = cms.untracked.int32(-1) )
process.source = cms.Source ("PoolSource",
                        fileNames=cms.untracked.vstring(
                            'file:myFile1.root',
                            'file:myFile2.root',
                            'file:myFile3.root'
                        ),
                        firstRun = cms.untracked.uint32(2),
                        firstEvent = cms.untracked.uint32(4)
)

This specifies that three input files are read sequentially, with no limit on the number of events. Since -1 (i.e. no limit) is the default for maxEvents, that specification is optional. If a limit on the number of events is specified, it applies to the job as a whole, not independently to each file.
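The point that the event limit applies to the whole job, not to each file, can be illustrated with a small model. This is only a sketch: read_files is a hypothetical helper, and each "file" is represented as a plain list of event labels read in the order the files are listed.

```python
from itertools import chain, islice

def read_files(files, max_events=-1):
    """Hypothetical model: read files sequentially with one job-wide limit.

    `files` maps each file name to its list of events; the files are read
    in the order given, and the limit counts events across all files.
    """
    stream = chain.from_iterable(files.values())
    if max_events < 0:
        return list(stream)
    return list(islice(stream, max_events))

files = {
    "file:myFile1.root": ["a1", "a2", "a3"],
    "file:myFile2.root": ["b1", "b2", "b3"],
    "file:myFile3.root": ["c1", "c2", "c3"],
}
print(read_files(files, max_events=4))  # ['a1', 'a2', 'a3', 'b1']
print(len(read_files(files)))           # 9
```

With a limit of 4, reading stops partway through the second file rather than taking 4 events from each file.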

This also specifies that reading begins at run 2, event 4: events are skipped until the first event with

run > 2  or  (run == 2 and event >= 4)

is found. In the two examples above, the colon (':') in the input file string indicates that the file name is a physical file name. No lookup is done in the input file catalog; indeed, an input file catalog is not required.
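The start condition above can be written as a one-line predicate. The sketch below is a hypothetical model, not framework code; it also shows that the condition is the same as comparing (run, event) pairs lexicographically against (firstRun, firstEvent).

```python
def at_or_after_start(run, event, first_run, first_event):
    """True once reading should begin: run > firstRun, or same run and event >= firstEvent."""
    return run > first_run or (run == first_run and event >= first_event)

# (run, event) pairs in file order
events = [(1, 7), (2, 1), (2, 3), (2, 4), (2, 5), (3, 1)]

# Skip events until the first one that satisfies the condition (firstRun=2, firstEvent=4).
start = next(i for i, (r, e) in enumerate(events) if at_or_after_start(r, e, 2, 4))
print(events[start:])  # [(2, 4), (2, 5), (3, 1)]
```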

Example 3: More than 255 input files

Python versions before 3.7 limit a function call to 255 explicit arguments, so a very long list of files cannot be written out directly as arguments in one call. The solution is simple: build the sequence of file names first and pass it as a variable-length argument:

process.maxEvents = cms.untracked.PSet( input = cms.untracked.int32(-1) )

# pass as many files as you wish this way (further names elided)
myfilelist = ('file:myfile1.root', 'file:myfile2.root',  # ...
              'file:myfile255.root', 'file:myfile256.root')  # ...
process.source = cms.Source ("PoolSource",
                      fileNames = cms.untracked.vstring( *myfilelist )
)
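Rather than typing hundreds of names by hand, the sequence can be generated in Python and then unpacked into the call. The sketch below uses a hypothetical stand-in function, vstring, in place of cms.untracked.vstring so that it runs without CMSSW; the file-name pattern is also hypothetical.

```python
def vstring(*names):
    """Stand-in for cms.untracked.vstring: collects its arguments into a list."""
    return list(names)

# Generate 300 hypothetical file names: file:myfile1.root ... file:myfile300.root
file_names = ['file:myfile%d.root' % i for i in range(1, 301)]

# Argument unpacking passes every element as a separate argument.
file_list = vstring(*file_names)
print(len(file_list), file_list[0], file_list[-1])
# 300 file:myfile1.root file:myfile300.root
```

In a real configuration the same generated list would be passed as fileNames = cms.untracked.vstring(*file_names).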

Example 4: Logical File Name

process.maxEvents = cms.untracked.PSet( input = cms.untracked.int32(20) )
process.source = cms.Source ("PoolSource",
                fileNames=cms.untracked.vstring('myFile.root'),
                skipEvents=cms.untracked.uint32(5)
)

This example is the same as Example 1, except that no protocol is specified in the file name. In this case, the file name is interpreted as a logical file name. The input file catalog (the trivial file catalog) is read if present, and if an entry for the logical file name 'myFile.root' is found, the logical file name is translated to the corresponding physical file name. Otherwise, an exception is thrown.
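The choice between a physical and a logical file name can be modelled as below. This is only a toy: resolve and the dictionary catalog are hypothetical, whereas the real trivial file catalog translates names using site-specific rules rather than a fixed table.

```python
def resolve(file_name, catalog):
    """Toy model of PoolSource file-name handling.

    A name containing ':' (e.g. 'file:') is treated as a physical file
    name and used unchanged; any other name is a logical file name that
    must be found in the catalog, otherwise an exception is raised.
    """
    if ':' in file_name:
        return file_name
    try:
        return catalog[file_name]
    except KeyError:
        raise LookupError('logical file name not in catalog: ' + file_name) from None

# Hypothetical catalog entry mapping a logical name to a physical one.
catalog = {'myFile.root': 'file:/data/store/myFile.root'}
print(resolve('file:myFile.root', catalog))  # file:myFile.root
print(resolve('myFile.root', catalog))       # file:/data/store/myFile.root
```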

Example 5: Merging files with different branches (different data tiers)

process.maxEvents = cms.untracked.PSet( 
                            input = cms.untracked.int32(-1)
              )
process.source = cms.Source ("PoolSource",
                fileNames=cms.untracked.vstring(
                        'file:myRecoFile1.root',
                        'file:myRecoFile2.root',
                        'file:myRecoFile3.root'   
                ),
                secondaryFileNames=cms.untracked.vstring(
                        'file:yourRawDataFileA.root',
                        'file:yourRawDataFileB.root'
                )  
        )

In this example, events are read from the primary files specified in the fileNames parameter, as in the previous examples. However, after an event is read from a primary file, the corresponding event is searched for in the secondary files. If it is found, products from any branches that were dropped in the primary files may be read from the secondary file.
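The primary/secondary lookup can be sketched as a dictionary merge. This is a simplified model under stated assumptions: merge_with_secondary is hypothetical, events are keyed only by (run, event), and products are plain dictionary entries. Products present in the primary event are kept; only missing ones are filled in from the matching secondary event.

```python
def merge_with_secondary(primary_events, secondary_index):
    """Toy model of primary/secondary reading.

    primary_events: list of (run, event, products) in primary-file order.
    secondary_index: dict (run, event) -> products from the secondary files.
    Products dropped from the primary are filled in from the matching
    secondary event; products present in the primary are kept as they are.
    """
    merged = []
    for run, event, products in primary_events:
        combined = dict(secondary_index.get((run, event), {}))
        combined.update(products)  # primary products take precedence
        merged.append((run, event, combined))
    return merged

primary = [(1, 1, {'reco': 'R11'}), (1, 2, {'reco': 'R12'})]
secondary = {(1, 1): {'raw': 'RAW11'}}  # event (1, 2) has no secondary match
merged = merge_with_secondary(primary, secondary)
print(merged)
```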

The secondary files must be ancestors of the primary files. For example, if the primary files have been processed by processes "FIRST", "SECOND", and "THIRD", the secondary files must have been processed by "FIRST", or "FIRST" and "SECOND", or "FIRST", "SECOND", and "THIRD".
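The ancestry rule amounts to saying that the secondary file's process history must be a leading part (prefix) of the primary file's history. The check below is a hypothetical sketch of that rule only; it is not how CMSSW itself performs the validation.

```python
def is_valid_secondary(primary_history, secondary_history):
    """Ancestry rule: the secondary file's process history must be a
    non-empty prefix of the primary file's process history."""
    n = len(secondary_history)
    return 0 < n <= len(primary_history) and list(primary_history[:n]) == list(secondary_history)

primary = ["FIRST", "SECOND", "THIRD"]
print(is_valid_secondary(primary, ["FIRST"]))            # True
print(is_valid_secondary(primary, ["FIRST", "SECOND"]))  # True
print(is_valid_secondary(primary, ["SECOND"]))           # False
```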

There need not be a one-to-one correspondence between the primary and secondary files. However, for performance it is highly recommended that both the primary and the secondary files be ordered by run number and event number as far as possible. Note that DBS can provide users with configuration fragments that include matching primary and secondary file specifications.

Per-event products are always read from the secondary files. Luminosity block and run products in the secondary files that are constant throughout the run (like the run number or an integrated luminosity per run) are available in the same way as event data from the secondary files. However, for performance reasons, any object that summarizes only the events contained in a particular file (such as accumulated histograms) is ignored. Such objects should always be copied forward if they are needed for downstream processing.

Example 6: Selecting input lumi blocks, specific lumi blocks, or events

import FWCore.ParameterSet.Config as cms
process = cms.Process("PROCESSNAME")
...
process.maxLuminosityBlocks = cms.untracked.PSet(
    input = cms.untracked.int32(20)
)

process.source = cms.Source("PoolSource",
                            fileNames = cms.untracked.vstring(
                                'myFile1.root',
                                'myFile2.root'),
                            firstLumi = cms.untracked.uint32(5)
)

This example shows how to control which luminosity blocks are read from the input files. Reading starts at lumi block 5 (earlier lumi blocks are skipped), and then at most 20 lumi blocks are read in and processed.

import FWCore.ParameterSet.Config as cms
process = cms.Process("PROCESSNAME")
...
process.maxLuminosityBlocks = cms.untracked.PSet(
    input = cms.untracked.int32(-1)
)

process.source = cms.Source("PoolSource",
                            fileNames = cms.untracked.vstring(
                                'myFile1.root',
                                'myFile2.root'),
                            eventsToProcess = cms.untracked.VEventRange('1:1-1:6','2:100-3:max'),
                            eventsToSkip = cms.untracked.VEventRange('1:1-1:6','2:100-3:max'),
                            lumisToProcess = cms.untracked.VLuminosityBlockRange('1:1-1:6','2:100-3:max'),
                            lumisToSkip = cms.untracked.VLuminosityBlockRange('1:1-1:6','2:100-3:max')
)

This example shows fine-grained control over which lumis or events are read. This particular configuration would not work as written (for instance, it asks to both process and skip the same ranges), but it illustrates the four settings: eventsToProcess, eventsToSkip, lumisToProcess, and lumisToSkip. In all of these, the string '1:1-1:6' means run 1, event (or lumi) 1 through run 1, event (or lumi) 6. The string '2:100-3:max' means run 2, event (or lumi) 100 through the last event (or lumi) of run 3.
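The range strings can be read as inclusive intervals of (run, number) pairs, with 'max' meaning "up to the end of the run". The sketch below is a hypothetical model: it handles only the 'run:number-run:number' form used on this page, and the rule in is_selected (keep if inside a "to process" range, or if none is given, and not inside any "to skip" range) is an assumption made for illustration.

```python
import math

def parse_range(text):
    """Parse 'run:item-run:item' (item may be 'max') into two (run, item) tuples."""
    def parse_point(point):
        run, item = point.split(':')
        return (int(run), math.inf if item == 'max' else int(item))
    begin, end = text.split('-')
    return parse_point(begin), parse_point(end)

def in_ranges(run, item, ranges):
    """True if (run, item) lies inside any of the range strings (inclusive)."""
    return any(lo <= (run, item) <= hi for lo, hi in map(parse_range, ranges))

def is_selected(run, item, to_process=(), to_skip=()):
    """Assumed combination rule: inside a 'to process' range (or none given)
    and not inside any 'to skip' range."""
    if to_process and not in_ranges(run, item, to_process):
        return False
    return not in_ranges(run, item, to_skip)

ranges = ['1:1-1:6', '2:100-3:max']
print(in_ranges(1, 3, ranges), in_ranges(2, 99, ranges), in_ranges(3, 123456, ranges))
# True False True
print(is_selected(1, 3, to_process=ranges, to_skip=ranges))  # False
```

The last line shows why the configuration above selects nothing: every range that is processed is also skipped.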

The framework throws an exception if the requested combination of events and lumis to skip or process is inconsistent.

Example 7: Selecting branches for input

PoolSource has a configurable parameter, inputCommands. This parameter is a vector of strings carrying zero or more "commands" that determine which branches in the Event (and also in the Run and LuminosityBlock) are dropped on input.

An example configuration is:

process.maxEvents = cms.untracked.PSet( input = cms.untracked.int32(20) )
process.source = cms.Source ("PoolSource",
          fileNames=cms.untracked.vstring('file:myFile.root'),
          dropDescendantsOfDroppedBranches=cms.untracked.bool(False),
          inputCommands=cms.untracked.vstring(
                  'keep *',
                  'drop *_*_*_HLT',
                  'keep FEDRawDataCollection_*_*_*'
          )
)

This configuration tells the system to:

  1. Disable the default automatic dropping of branches that are derived from branches that are explicitly dropped.
  2. Read all branches, except ...
  3. Do not read (drop on input) all branches from the HLT process, except ...
  4. Read all products of type FEDRawDataCollection.

Commands are applied in the order in which they appear in the vector. Each command carries an action (keep or drop) and a specification (e.g. *_*_*_HLT). The specification is compared to each branch name, whose four fields (friendly class name, module label, product instance name, and process name) are separated by underscores. The specification matches the branch name if all four fields of the branch name match the corresponding fields of the specification.

A special case is when the entire specification is the single character "*", which is interpreted as "*_*_*_*". In all other cases, exactly 3 underscores must appear explicitly in the specification. The fields must contain only alphanumeric characters or one of two wildcards: "*" matches zero or more arbitrary characters, and "?" matches exactly one arbitrary character. Wildcards only match characters within a single one of the four fields; they can match an entire field or part of a field, but never extend across an underscore.
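The matching rules above can be sketched with Python's fnmatch module applied field by field, which keeps the wildcards from crossing an underscore. This is a model of the described rules, not the CMSSW implementation; matches is a hypothetical helper and the branch names are made-up examples (an empty product instance name shows up as two adjacent underscores).

```python
from fnmatch import fnmatchcase

def matches(spec, branch_name):
    """Match an inputCommands specification against a branch name.

    '*' alone means '*_*_*_*'. Otherwise the specification must contain
    exactly 3 underscores; each of its 4 fields is matched against the
    corresponding field of the branch name, with '*' and '?' wildcards
    that never cross an underscore.
    """
    if spec == '*':
        spec = '*_*_*_*'
    spec_fields = spec.split('_')
    if len(spec_fields) != 4:
        raise ValueError('specification must contain exactly 3 underscores: ' + spec)
    branch_fields = branch_name.split('_')
    return len(branch_fields) == 4 and all(
        fnmatchcase(field, pattern) for field, pattern in zip(branch_fields, spec_fields))

raw = 'FEDRawDataCollection_rawDataCollector__HLT'   # hypothetical branch
tracks = 'recoTracks_generalTracks__RECO'            # hypothetical branch
print(matches('*_*_*_HLT', raw), matches('*_*_*_HLT', tracks))  # True False
print(matches('reco?racks_*_*_*', tracks))                      # True
```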

If the "inputCommands" parameter is specified, there is an implicit "drop *" specification as the first specification. We therefore recommend that, if "inputCommands" is specified, the first explicit specification should be either "keep *" or "drop *", to avoid confusion.

If no parameter named "inputCommands" is found, then a default value is used. The default value is:

cms.untracked.vstring(
        'keep *'
)

and so by default no objects are dropped (all are kept).
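Putting the ordering rules together: commands are applied in sequence, so for each branch the last matching command decides; an implicit "drop *" precedes any explicit commands, and "keep *" is the default when no commands are given. The sketch below models only that logic under these assumptions (is_kept and _matches are hypothetical helpers, and dropDescendantsOfDroppedBranches is not modelled). It reproduces the four steps of the example configuration above.

```python
from fnmatch import fnmatchcase

def _matches(spec, branch_name):
    # Simplified four-field wildcard match ('*' alone means '*_*_*_*').
    if spec == '*':
        spec = '*_*_*_*'
    fields, patterns = branch_name.split('_'), spec.split('_')
    return len(fields) == len(patterns) == 4 and all(
        fnmatchcase(f, p) for f, p in zip(fields, patterns))

def is_kept(branch_name, input_commands=None):
    """Apply inputCommands in order; the last matching command wins.

    No commands: the default 'keep *' applies.
    Commands given: an implicit 'drop *' is placed in front of them.
    """
    if input_commands is None:
        commands = ['keep *']
    else:
        commands = ['drop *'] + list(input_commands)
    kept = False
    for command in commands:
        action, spec = command.split()
        if _matches(spec, branch_name):
            kept = (action == 'keep')
    return kept

commands = ['keep *', 'drop *_*_*_HLT', 'keep FEDRawDataCollection_*_*_*']
print(is_kept('recoTracks_generalTracks__RECO', commands))                # True
print(is_kept('triggerTriggerEvent_hltTriggerSummaryAOD__HLT', commands)) # False
print(is_kept('FEDRawDataCollection_rawDataCollector__HLT', commands))    # True
```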

Review Status

Reviewer/Editor and Date    Comments
Main.tomalini - 09 Oct 2006 page last content editor (Ian Tomalin)
JennyWilliams - 31 Jan 2007 editing to include in SWGuide
WilliamTanenbaum - 26 Mar 2007 editing for maxEvents changes
WilliamTanenbaum - 04 Mar 2008 add Example with secondary files
PavelDemin - 12 Feb 2009 add missing commas in the python examples
WilliamTanenbaum - 18 Feb 2009 add Example for drop on input
FreyaBlekman - 2009-09-16 add Example on more than 256 input files
WilliamTanenbaum - 15 Mar 2012 clarify implicit 'drop *' in inputCommands

Responsible: WilliamTanenbaum
Last reviewed by: Sudhir Malik - 24 January 2009

Topic revision: r36 - 2015-09-30 - AndreaBocci



 
This site is powered by the TWiki collaboration platform. Copyright 2008-2019 by the contributing authors. All material on this collaboration platform is the property of the contributing authors.