How to read events from an EDM/ROOT file
Goal of this page
This page contains examples of how to specify one or more EDM/ROOT files as a source of events.
Reading events from an EDM/ROOT file
Configurable Parameters for PoolSource
The configurable parameters for PoolSource can be found here: https://twiki.cern.ch/twiki/bin/view/CMSPublic/SWGuideEDMParametersForModules#PoolSource
Example 1:
process.maxEvents = cms.untracked.PSet( input = cms.untracked.int32(20) )
process.source = cms.Source ("PoolSource",
fileNames=cms.untracked.vstring('file:myFile.root'),
skipEvents=cms.untracked.uint32(5)
)
This specifies the file 'myFile.root' in the current working directory as the input file. It also specifies that only 20 events are to be read, after skipping the first 5 events. 'file:myFile.root' is a ROOT connection string to a locally stored file. The protocol prefix (here file:) is passed directly to ROOT, so any supported protocol may be used (e.g. rfio:, dcap:, castor:).
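As a rough illustration of how the two settings in Example 1 interact, the following plain-Python sketch models the input file as a sequence of event numbers. The helper select_events is hypothetical and not part of the framework; it only mirrors the behavior described above (skip first, then apply the limit).

```python
from itertools import islice

def select_events(events, skip_events=0, max_events=-1):
    """Toy model: skip the first `skip_events` entries, then read at most
    `max_events` entries (-1 means no limit)."""
    stop = None if max_events < 0 else skip_events + max_events
    return list(islice(events, skip_events, stop))

# A hypothetical file holding events numbered 1..100
events_in_file = range(1, 101)
selected = select_events(events_in_file, skip_events=5, max_events=20)
print(selected[0], selected[-1], len(selected))
```

With these assumed inputs, the first event read is the sixth one in the file, and reading stops after 20 events.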
Example 2: Multiple input files
process.maxEvents = cms.untracked.PSet( input = cms.untracked.int32(-1) )
process.source = cms.Source ("PoolSource",
fileNames=cms.untracked.vstring(
'file:myFile1.root',
'file:myFile2.root',
'file:myFile3.root'
),
firstRun = cms.untracked.uint32(2),
firstEvent = cms.untracked.uint32(4)
)
This specifies that three input files are to be read sequentially, and that there is no limit on the number of events. -1 (i.e. no limit) is the default for maxEvents, so that specification is optional. If a limit on the number of events is specified, it applies to the job as a whole, not independently to each file.
This also specifies that reading begins at run #2, event #4. Events will be skipped until the first event with run > 2, or with run == 2 and event >= 4, is found.
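The skip condition for firstRun and firstEvent can be sketched in plain Python. This is only a toy model of the rule stated above, not framework code: the function starting_at and the file contents are invented for illustration, and events are represented as (run, event) pairs read from the files in sequence.

```python
from itertools import chain, dropwhile

def starting_at(events, first_run, first_event):
    """Drop (run, event) pairs until one satisfies
    run > first_run, or run == first_run and event >= first_event."""
    def before_start(ev):
        run, event = ev
        return not (run > first_run or (run == first_run and event >= first_event))
    return list(dropwhile(before_start, events))

# Hypothetical contents of three files, as (run, event) pairs
file1 = [(1, 1), (1, 2), (2, 1)]
file2 = [(2, 3), (2, 4), (2, 5)]
file3 = [(3, 1), (3, 2)]

kept = starting_at(chain(file1, file2, file3), first_run=2, first_event=4)
print(kept)
```

In this sketch, skipping stops at the first event that satisfies the condition, and everything after that point is read.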
In the two examples above, the presence of the colon (':') in the input file string indicates that the file name is a physical file name. No lookup is done in the input file catalog. Indeed, an input file catalog is not required.
Example 3: More than 255 input files
In Python versions before 3.7, a function call can take at most 255 explicit arguments, so very long lists of files cannot be created in one step. The solution is simple: pass a tuple as a variable-length argument:
process.maxEvents = cms.untracked.PSet( input = cms.untracked.int32(-1) )
# pass as many files as you wish this way
process.source = cms.Source ("PoolSource",
fileNames = cms.untracked.vstring( *('file:myfile1.root', 'file:myfile2.root', ..., 'file:myfile255.root', 'file:myfile256.root', ...) )
)
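When the file names follow a pattern, the tuple can be generated rather than typed out. The sketch below shows only the Python mechanics of unpacking a tuple with *. The function collect is a hypothetical stand-in for any callable that accepts a variable number of string arguments, and the naming scheme and file count are assumptions.

```python
def collect(*names):
    """Stand-in for a callable that accepts a variable number of
    string arguments; it simply returns them as a list."""
    return list(names)

# Hypothetical naming scheme: myfile1.root ... myfile300.root
file_names = tuple('file:myfile%d.root' % i for i in range(1, 301))

# Unpacking the tuple with * passes every element as a separate argument
all_files = collect(*file_names)
print(len(all_files), all_files[0], all_files[-1])
```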
Example 4: Logical File Name
process.maxEvents = cms.untracked.PSet(
input = cms.untracked.int32(20)
)
process.source = cms.Source ("PoolSource",
fileNames=cms.untracked.vstring(
'myFile.root'
),
skipEvents=cms.untracked.uint32(5)
)
This example is the same as Example 1, except that there is no protocol specified in the file name. In this case, the file name is interpreted as a logical file name. The input file catalog (the trivial file catalog) is read if present, and if a logical file name entry with a value 'myFile.root' is found, the logical file name is translated to the corresponding physical file name. Otherwise, an exception will be thrown.
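The distinction between physical and logical file names can be modeled in a few lines of plain Python. This is a simplified sketch, not the framework's implementation: the function resolve is hypothetical, the catalog is represented as a dictionary purely for illustration (the real catalog is not a Python dictionary), and the physical path is invented.

```python
def resolve(file_name, catalog):
    """Toy model of file-name handling.
    A name containing ':' is treated as a physical file name and returned
    unchanged; any other name is looked up in `catalog` as a logical file name."""
    if ':' in file_name:
        return file_name
    try:
        return catalog[file_name]
    except KeyError:
        raise LookupError('no catalog entry for logical file name %r' % file_name)

# Hypothetical catalog contents; the physical path is invented for illustration
catalog = {'myFile.root': 'file:/data/store/myFile.root'}

print(resolve('file:myFile.root', catalog))  # physical name, no lookup
print(resolve('myFile.root', catalog))       # logical name, translated
```

A logical name with no catalog entry raises an exception in this sketch, mirroring the behavior described above.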
Example 5: Merging files with different branches (different data tiers)
process.maxEvents = cms.untracked.PSet(
input = cms.untracked.int32(-1)
)
process.source = cms.Source ("PoolSource",
fileNames=cms.untracked.vstring(
'file:myRecoFile1.root',
'file:myRecoFile2.root',
'file:myRecoFile3.root'
),
secondaryFileNames=cms.untracked.vstring(
'file:yourRawDataFileA.root',
'file:yourRawDataFileB.root'
)
)
In this example, events will be read from the primary files specified in the fileNames parameter, as in the previous examples. However, after an event is read from a primary file, a search is done for the corresponding event in the secondary files. If found, additional products may be read from the secondary file from any branches that have been dropped in the primary files.
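The lookup described above can be pictured with a small plain-Python model. Everything here is an assumption made for illustration: files are dictionaries keyed by (run, lumi, event), branch names and product values are invented, and read_event is a hypothetical helper rather than a framework function.

```python
def read_event(event_id, primary, secondary):
    """Toy model: start from the products stored in the primary file and
    add products found only in the secondary file for the same event."""
    products = dict(primary[event_id])
    for branch, product in secondary.get(event_id, {}).items():
        # Only branches missing from the primary file are taken from the secondary
        products.setdefault(branch, product)
    return products

# Hypothetical contents keyed by (run, lumi, event); branch names are invented
primary = {(1, 1, 7): {'recoTracks': 'tracks#7'},
           (1, 1, 8): {'recoTracks': 'tracks#8'}}
secondary = {(1, 1, 7): {'rawData': 'raw#7'}}

print(read_event((1, 1, 7), primary, secondary))
print(read_event((1, 1, 8), primary, secondary))
```

In this model an event with no match in the secondary files is still read, only without the additional products.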
The secondary files must be ancestors of the primary files. For example, if the primary files have been processed by processes "FIRST", "SECOND", and "THIRD", the secondary files must have been processed by "FIRST", or "FIRST" and "SECOND". This is not just a requirement on process names. The requirement extends to the entire ProcessConfiguration in the ProcessHistory, which includes the process-level ParameterSetID. (The tables that support Ptrs and Refs will not work if this requirement is not met. There may be other things that require this as well. It would not be easy to relax this requirement.)
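The ancestor requirement on process names can be expressed as a prefix check. The sketch below assumes that "ancestor" means a proper prefix of the primary file's process history, as in the example above. It compares process names only, whereas the real requirement covers the entire ProcessConfiguration; is_ancestor is a hypothetical helper.

```python
def is_ancestor(secondary_history, primary_history):
    """Return True if `secondary_history` is a proper prefix of
    `primary_history` (both are lists of process names, oldest first)."""
    n = len(secondary_history)
    return 0 < n < len(primary_history) and primary_history[:n] == secondary_history

primary = ['FIRST', 'SECOND', 'THIRD']
print(is_ancestor(['FIRST'], primary))            # allowed
print(is_ancestor(['FIRST', 'SECOND'], primary))  # allowed
print(is_ancestor(['SECOND'], primary))           # not an ancestor
```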
It is not necessary that there be a one-to-one correspondence between the primary and secondary files. However, it is highly recommended for performance that both the primary and secondary files be properly ordered by run number and event number to the greatest extent possible. Note that DBS can provide users with configuration fragments that include both primary and secondary file specifications that match.
Only per-event products are always read from the secondary files. Luminosity block and run products in the secondary files that are constant throughout the run (like the run number or an integrated luminosity per run) will be available in the same way event data from the secondary files are available. However, due to performance concerns, any object that represents only the event data contained in the file (such as accumulated histograms) will be ignored. Such objects should always be copied forward if they are needed for downstream processing.
Example 6: Selecting Input Lumi Blocks, specific lumiblocks, or events
import FWCore.ParameterSet.Config as cms
process = cms.Process("PROCESSNAME")
...
process.maxLuminosityBlocks = cms.untracked.PSet(
input = cms.untracked.int32(20)
)
process.source = cms.Source("PoolSource",
fileNames = cms.untracked.vstring(
'myFile1.root',
'myFile2.root' ),
firstLumi = cms.untracked.uint32(5),
)
This example shows how to control which luminosity blocks are read from the input file. The first five lumi blocks in the file will be skipped, and then the next 20 lumi blocks will be read in and processed.
import FWCore.ParameterSet.Config as cms
process = cms.Process("PROCESSNAME")
...
process.maxLuminosityBlocks = cms.untracked.PSet(
input = cms.untracked.int32(-1)
)
process.source = cms.Source("PoolSource",
fileNames = cms.untracked.vstring(
'myFile1.root',
'myFile2.root' ),
eventsToProcess = cms.untracked.VEventRange('1:1-1:6','2:100-3:max'),
eventsToSkip = cms.untracked.VEventRange('1:1-1:6','2:100-3:max'),
lumisToProcess = cms.untracked.VLuminosityBlockRange('1:1-1:6','2:100-3:max'),
lumisToSkip = cms.untracked.VLuminosityBlockRange('1:1-1:6','2:100-3:max'),
)
This example shows fine-grained control over which lumis or events are read. This particular example would not, of course, work, but it illustrates the four settings: eventsToProcess, eventsToSkip, lumisToProcess, and lumisToSkip. In all these cases the string '1:1-1:6' means event 1 of run 1 through event 6 of run 1 for eventsToProcess and eventsToSkip, and lumi 1 of run 1 through lumi 6 of run 1 for lumisToProcess and lumisToSkip. The string '2:100-3:max' means event 100 of run 2 through the last event of run 3 (for events), and lumi 100 of run 2 through the last lumi of run 3 (for lumis).
The framework will throw an exception if the combination of events and lumis to skip or process does not make sense.
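To make the range strings concrete, here is a minimal plain-Python sketch that parses the two forms shown above and tests whether a given (run, number) pair falls inside them. The functions parse_range and in_ranges are hypothetical and handle only the 'run:number-run:number' and 'run:number-run:max' forms; the framework's own parser may accept other forms and is not reproduced here.

```python
import math

def parse_range(text):
    """Parse 'run:first-run:last' into ((run, first), (run, last)).
    'max' is mapped to infinity so it compares above any number."""
    def endpoint(part):
        run, number = part.split(':')
        return (int(run), math.inf if number == 'max' else int(number))
    low, high = text.split('-')
    return endpoint(low), endpoint(high)

def in_ranges(run, number, ranges):
    """True if (run, number) falls inside any of the parsed ranges."""
    return any(low <= (run, number) <= high for low, high in ranges)

ranges = [parse_range(r) for r in ('1:1-1:6', '2:100-3:max')]
print(in_ranges(1, 3, ranges))     # inside '1:1-1:6'
print(in_ranges(2, 99, ranges))    # before the start of '2:100-3:max'
print(in_ranges(3, 5000, ranges))  # inside '2:100-3:max'
```

The sketch relies on Python's tuple ordering, so a range that spans two runs also covers the rest of the first run and the start of the second.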
Example 7: Selecting branches for input
PoolSource has a configurable parameter inputCommands. This parameter is of type vector of string, and carries zero or more "commands" that determine which branches in the Event (and in Run and LuminosityBlock as well) will be dropped on input.
An example configuration is:
process.maxEvents = cms.untracked.PSet( input = cms.untracked.int32(20) )
process.source = cms.Source ("PoolSource",
fileNames=cms.untracked.vstring('file:myFile.root'),
dropDescendantsOfDroppedBranches=cms.untracked.bool(False),
inputCommands=cms.untracked.vstring(
'keep *',
'drop *_*_*_HLT',
'keep FEDRawDataCollection_*_*_*'
)
)
This configuration tells the system to:
- Disable the default automatic dropping of branches that are derived from branches that are explicitly dropped.
- Read all branches, except ...
- Do not read (drop on input) all branches from the HLT process, except ...
- Read all products of type FEDRawDataCollection.
Commands are applied in the order of presentation in the vector.
Each command carries an action (keep or drop) and a specification (e.g. *_*_*_HLT).
The specification is compared to each branch name.
The specification matches the branch name
if all four fields of the branch name
match the corresponding fields of the specification.
The four fields are separated by underscores.
A special case is when the entire specification is the
one character "*". This is interpreted as "*_*_*_*".
In all other cases, exactly 3 underscores must appear
in the specification. The fields must contain only
alphanumeric characters or one of two available wildcards.
The wildcard "*" will match zero or more characters and these
characters can be anything.
The wildcard "?" will match exactly one character and that
character can be anything. One restriction is that these
wildcards will only match a sequence of characters contained
inside one of the four fields. They can match an entire field or
some smaller part of a field, but nothing larger. All 3 underscores
must explicitly appear in the specification.
If the "inputCommands" parameter is specified, there is an implicit
"drop *" specification as the first specification. We therefore
recommend that, if "inputCommands" is specified, the first explicit
specification should be either "keep *" or "drop *", to avoid confusion.
If no parameter named "inputCommands" is found,
then a default value is used. The default value is:
cms.untracked.vstring(
'keep *'
)
and so by default no objects are dropped (all are kept).
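The matching and ordering rules described in this example can be sketched in plain Python. This is a toy model under stated assumptions, not the framework's code: the functions spec_to_regexes, matches, and is_kept are hypothetical, branch names are treated as four underscore-separated fields, the branch names in the usage part are invented, and the dropDescendantsOfDroppedBranches behavior is not modeled.

```python
import re

def spec_to_regexes(spec):
    """Turn a specification such as '*_*_*_HLT' into four per-field regexes."""
    if spec == '*':
        spec = '*_*_*_*'
    fields = spec.split('_')
    if len(fields) != 4:
        raise ValueError('specification must contain exactly 3 underscores: %r' % spec)
    regexes = []
    for field in fields:
        if not re.fullmatch(r'[A-Za-z0-9*?]*', field):
            raise ValueError('invalid character in field %r' % field)
        # '*' matches zero or more characters, '?' exactly one, within one field
        pattern = field.replace('*', '.*').replace('?', '.')
        regexes.append(re.compile(pattern))
    return regexes

def matches(spec, branch):
    """Compare each field of the branch name with the matching field of the spec."""
    branch_fields = branch.split('_')
    if len(branch_fields) != 4:
        return False
    return all(rx.fullmatch(f) for rx, f in zip(spec_to_regexes(spec), branch_fields))

def is_kept(branch, commands=('keep *',)):
    """Apply commands in order after an implicit 'drop *'; the last match wins."""
    kept = False
    for command in commands:
        action, spec = command.split()
        if matches(spec, branch):
            kept = (action == 'keep')
    return kept

# The commands from the example configuration, applied to invented branch names
commands = ['keep *', 'drop *_*_*_HLT', 'keep FEDRawDataCollection_*_*_*']
for branch in ('recoTracks_generalTracks__RECO',
               'edmTriggerResults_TriggerResults__HLT',
               'FEDRawDataCollection_source__HLT'):
    print(branch, is_kept(branch, commands))
```

Because the wildcards are translated field by field after splitting on underscores, they cannot match across a field boundary, which is the restriction described above.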
Review Status
Responsible: WilliamTanenbaum
Last reviewed by: Sudhir Malik - 24 January 2009