4.1.3 Introduction to CMS Configuration Files
Goals of this page
After reading this page you will understand the general structure and use of configuration files and configuration file fragments for running CMS analysis jobs.
This discussion-only tutorial provides an introduction to the configuration files used to configure cmsRun jobs. A complete working example is given.
Introduction
The CMS software framework uses a “software bus” model, in which data are stored in the event, and the event is passed to a series of modules. A single executable, cmsRun, is used, and the modules are loaded at runtime. A configuration file defines which modules are loaded, in which order they are run, and with which configurable parameters they are run. Note that this is not an interactive system. The entire configuration is defined once, at the beginning of the job, and cannot be changed while the job is running. This design facilitates the tracking of event provenance, that is, the processing history of the event.
Full details about configuration files are given in SWGuideAboutPythonConfigFile.
Configuration files in CMS
All CMS code is run by passing a config file (_cfg.py) to the CMSSW executable, cmsRun:
cmsRun <Configuration File>
for example:
cmsRun MyConfig_cfg.py
Configurations are written using the Python language. Using the Python interpreter, one can quickly check the Python syntax of the configuration and run many (but not all) of the checks in the CMS Python module FWCore.ParameterSet.Config:
python MyConfig_cfg.py
(Python interpreter)
After Python finishes importing and executing the configuration file,
all components will have been loaded into the program.
Contents of a typical configuration file
A config file consists (typically) of the following parts as data members of a "cms.Process" object (of your naming):
- A source (which might read Events from a file or create new empty events)
- A collection of modules (e.g. EDAnalyzer, EDProducer, EDFilter) which you wish to run, along with customised settings for parameters you wish to change from default values
- An output module to create a ROOT file which stores all the event data. (When running an Analyzer module, the histograms produced are not event data, so an output module is not needed in that case)
- A path which will list in order the modules to be run
Each config file is created from discrete building blocks, each of which specifies a component of the cmsRun program and configures it via its parameter set.
A configuration file written using the Python language can be created as:
- a top level file, which is a full process definition (naming convention is _cfg.py) and which might import other configuration files
- external Python file fragments, which are of two types:
  - those used for module initialization (naming convention is _cfi.py)
  - those used as configuration fragments (naming convention is _cff.py)
The fragments are often imported into the top level configuration file using the process.load() method, which also attaches the imported objects to the process. Fragments are usually imported into other fragments using one of the import statements of the Python language.
Note: All imports create references, not copies. If a module is imported at two different places in a configuration, the imported symbols (variables) reference the same objects, so changing an object in one place changes it everywhere else.
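The note above can be illustrated with plain Python, without any CMS software. This is a minimal sketch: the fragment name toy_fragment_cfi and the dictionary standing in for a module's parameters are hypothetical, and the fragment is built in memory only so that the example is self-contained.

```python
import sys
import types

# Build a tiny in-memory "fragment" (hypothetical name) so the example is
# self-contained. In a real release this would be a file in a python/ directory.
fragment = types.ModuleType("toy_fragment_cfi")
fragment.tracker = {"threshold": 5.6}  # stand-in for a module's parameters
sys.modules["toy_fragment_cfi"] = fragment

# Two different places in a configuration import the same symbol.
from toy_fragment_cfi import tracker as tracker_a
from toy_fragment_cfi import tracker as tracker_b

# Changing the object through one name...
tracker_a["threshold"] = 9.9

# ...is visible through the other, because both names refer to one object.
print(tracker_b["threshold"])
print(tracker_a is tracker_b)
```

Both names are bound to the same object, so the change made through tracker_a is also seen through tracker_b.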
Most Python config files will start with the line
import FWCore.ParameterSet.Config as cms
which imports our CMS-specific Python classes and functions.
Standard fragments are available in the CMSSW release's Configuration/StandardSequences/python/ area. They can be read in using syntax like
process.load("Configuration.StandardSequences.Geometry_cff")
In the Python language, a line that starts with "#" is a comment.
The word "module" has two meanings. In Python, a module is a file containing Python code, and the word also refers to the object created by importing such a file. In CMSSW, EDProducers, EDFilters, EDAnalyzers, and OutputModules are called modules.
In order to make sure that your Python module can be imported by other Python modules:
- Place it in the python subdirectory of a package
- Be sure your SCRAM environment is set up
- Go to your package and do scram b or scram b python
The above steps are needed only once. The correctness of a Python config is checked at a basic level every time the scram command is used.
Examples of configuration files in Python
In the example here, the configuration opens the files test.root and anotherTest.root. It runs a producer that creates a track collection and adds it to the data. The combined set of events from the two input files, along with the new collection of tracks, is saved in the file test2.root. For each event, the MyRandomAnalyzer module is run on the event.
_cfg file
import FWCore.ParameterSet.Config as cms
# set up a process named RECO
processName = "RECO"
process = cms.Process(processName)
# configures the source that reads the input files
process.source = cms.Source("PoolSource",
    fileNames = cms.untracked.vstring(
        'file:test.root',
        'file:anotherTest.root'
    )
)
# configures the producer TrackFinderProducer
process.tracker = cms.EDProducer('TrackFinderProducer',
    threshold = cms.untracked.double(5.6)
)
# configures your analyzer
process.MyModule = cms.EDAnalyzer('MyRandomAnalyzer',
    numBins = cms.untracked.int32(100),
    minBin = cms.untracked.double(0),
    maxBin = cms.untracked.double(100)
)
# configures the output module that writes test2.root
process.out = cms.OutputModule("PoolOutputModule",
    fileName = cms.untracked.string("test2.root")
)
# Defines which modules and sequences to run, and in which order
process.mypath = cms.Path(process.tracker*process.MyModule)
# A list of analyzers or output modules to be run after all paths have been run.
process.outpath = cms.EndPath(process.out)
Many examples of _cfg.py files can be found in /CMSSW/PhysicsTools/PatAlgos/test and /CMSSW/PhysicsTools/PatExamples/test.
_cfi.py file
This example is taken from /CMSSW/PhysicsTools/PatAlgos/python/cleaningLayer1/electronCleaner_cfi.py.
import FWCore.ParameterSet.Config as cms
cleanPatElectrons = cms.EDProducer("PATElectronCleaner",
    ## pat electron input source
    src = cms.InputTag("selectedPatElectrons"),
    # preselection (any string-based cut for pat::Electron)
    preselection = cms.string(''),
    # overlap checking configurables
    checkOverlaps = cms.PSet(
        muons = cms.PSet(
            src = cms.InputTag("cleanPatMuons"),
            algorithm = cms.string("byDeltaR"),
            preselection = cms.string(""), # don't preselect the muons
            deltaR = cms.double(0.3),
            checkRecoComponents = cms.bool(False), # don't check if they share some AOD object ref
            pairCut = cms.string(""),
            requireNoOverlaps = cms.bool(False), # overlaps don't cause the electron to be discarded
        )
    ),
    # finalCut (any string-based cut for pat::Electron)
    finalCut = cms.string(''),
)
_cff.py file
This example is taken from /CMSSW/PhysicsTools/PatAlgos/python/cleaningLayer1/cleanPatCandidates_cff.py.
The module initialization file above is included using the line
from PhysicsTools.PatAlgos.cleaningLayer1.electronCleaner_cfi import *
in the _cff.py fragment file below:
import FWCore.ParameterSet.Config as cms
from PhysicsTools.PatAlgos.cleaningLayer1.electronCleaner_cfi import *
from PhysicsTools.PatAlgos.cleaningLayer1.muonCleaner_cfi import *
from PhysicsTools.PatAlgos.cleaningLayer1.tauCleaner_cfi import *
from PhysicsTools.PatAlgos.cleaningLayer1.photonCleaner_cfi import *
from PhysicsTools.PatAlgos.cleaningLayer1.jetCleaner_cfi import *
from PhysicsTools.PatAlgos.producersLayer1.hemisphereProducer_cfi import *
#FIXME ADD MHT
# One module to count objects
cleanPatCandidateSummary = cms.EDAnalyzer("CandidateSummaryTable",
    logName = cms.untracked.string("cleanPatCandidates|PATSummaryTables"),
    candidates = cms.VInputTag(
        cms.InputTag("cleanPatElectrons"),
        cms.InputTag("cleanPatMuons"),
        cms.InputTag("cleanPatTaus"),
        cms.InputTag("cleanPatPhotons"),
        cms.InputTag("cleanPatJets"),
    )
)
# The sequence runs the cleaners in the order listed, then the summary module
cleanPatCandidates = cms.Sequence(
    cleanPatMuons *
    cleanPatElectrons *
    cleanPatPhotons *
    cleanPatTaus *
    cleanPatJets *
    cleanPatCandidateSummary
)
Using Python Interactively
One can use Python interactively to understand the config files a bit more.
python -i MyConfig_cfg.py
This takes you to the Python prompt (>>>), where you can type Python statements interactively. Many things are possible. Two examples follow.
- Print the entire configuration out in Python format with all the imported objects expanded. This might be much larger than the top level configuration file if many objects are imported.
>>> print(process.dumpPython())
- Print one particular attribute of the process. For example, if the process contains a path labeled "p", then print the path.
>>> process.p
OR
>>> print(process.p.dumpPython())
When you are done, type CONTROL-D to quit the Python interpreter.
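Individual parameters can be inspected and changed in the same way. The following is a minimal sketch that needs a CMSSW environment (so that FWCore.ParameterSet.Config can be imported). The process name DEMO is an assumption made for illustration, and the tracker module is copied from the _cfg example on this page.

```python
import FWCore.ParameterSet.Config as cms  # requires a CMSSW environment

# Hypothetical process name; the module definition mirrors the _cfg example.
process = cms.Process("DEMO")
process.tracker = cms.EDProducer('TrackFinderProducer',
    threshold = cms.untracked.double(5.6)
)

# Show the current value of one parameter.
print(process.tracker.threshold.value())

# Change it in this Python session only; the file on disk is not modified.
process.tracker.threshold = 7.5

# Dump just this module to confirm the change.
print(process.tracker.dumpPython())
```

A change made this way lasts only for the current Python session, so it is a way to explore a configuration rather than to edit it.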
Cloning of Python Process
As mentioned above, all imports create references, not copies: changing an object in one place changes it everywhere else. Thus, if standard module configurations are imported, parameter replacements should be made with care, because they take effect globally and other configs could be affected. The standard solution to this problem is to clone the module and change the parameters while doing so.
The standard syntax for cloning is
from aPackage import oldName
newName = oldName.clone (changedParameter = 42)
or
from aPackage import oldName as _oldName
newName = _oldName.clone (changedParameter = 42)
The second form is better if the symbol oldName is not needed and
this occurs in a fragment that might be imported with the process
load function or a "from aModule import *" statement. Symbols starting
with an underscore are not imported in these cases.
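The underscore rule for "from aModule import *" is standard Python behaviour and can be checked without any CMS software. The sketch below builds a hypothetical fragment called toy_fragment_cff in memory; the dictionaries are only stand-ins for module objects, and dict(...) plays the role of clone() purely for illustration.

```python
import sys
import types

# Hypothetical in-memory fragment. _oldName is "private" (leading underscore),
# newName is the modified copy that the fragment wants to export.
fragment = types.ModuleType("toy_fragment_cff")
exec(
    "_oldName = {'changedParameter': 0}\n"
    "newName = dict(_oldName, changedParameter=42)\n",
    fragment.__dict__,
)
sys.modules["toy_fragment_cff"] = fragment

# Import everything from the fragment into a fresh namespace.
namespace = {}
exec("from toy_fragment_cff import *", namespace)

# Ignore Python's own double-underscore entries and list what was imported.
imported = sorted(name for name in namespace if not name.startswith("__"))
print(imported)
```

Only newName reaches the importing namespace; _oldName stays inside the fragment.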
An example is below. Here we are NOT importing the module but defining it right here, under the name patMuonBenchmarkGeneric. Cloning it avoids the problem mentioned at the beginning of this section, saves a lot of repetition, and lets us change only the parameters we need to, in this case InputTruthLabel, BenchmarkLabel and InputRecoLabel.
import FWCore.ParameterSet.Config as cms
patMuonBenchmarkGeneric = cms.EDAnalyzer("GenericBenchmarkAnalyzer",
    OutputFile = cms.untracked.string('benchmark.root'),
    InputTruthLabel = cms.InputTag('muons'),
    minEta = cms.double(-1),
    maxEta = cms.double(2.8),
    recPt = cms.double(0.0),
    deltaRMax = cms.double(0.3),
    PlotAgainstRecoQuantities = cms.bool(True),
    OnlyTwoJets = cms.bool(False),
    BenchmarkLabel = cms.string('selectedPatMuons'),
    InputRecoLabel = cms.InputTag('selectedLayer1Muons')
)
# Each clone copies all parameters and overrides only the three listed
patElectronBenchmarkGeneric = patMuonBenchmarkGeneric.clone(
    InputTruthLabel = 'pixelMatchGsfElectrons',
    BenchmarkLabel = 'selectedPatElectrons',
    InputRecoLabel = 'selectedPatElectrons',
)
patJetBenchmarkGeneric = patMuonBenchmarkGeneric.clone(
    InputTruthLabel = 'iterativeCone5CaloJets',
    BenchmarkLabel = 'selectedPatJets',
    InputRecoLabel = 'selectedPatJets',
)
patPhotonBenchmarkGeneric = patMuonBenchmarkGeneric.clone(
    InputTruthLabel = 'photons',
    BenchmarkLabel = 'selectedPatPhotons',
    InputRecoLabel = 'selectedPatPhotons',
)
patTauBenchmarkGeneric = patMuonBenchmarkGeneric.clone(
    InputTruthLabel = 'pfRecoTauProducer',
    BenchmarkLabel = 'selectedPatTaus',
    InputRecoLabel = 'selectedPatTaus',
)
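The effect of cloning can be sketched in plain Python. ToyModule below is a hypothetical stand-in written for this illustration. It is not the FWCore class, and it makes no claim about how the real clone() is implemented; it only shows the behaviour described above, namely that the clone carries its own copy of the parameters and the original is left untouched.

```python
import copy

class ToyModule:
    """Toy stand-in for a configurable module; NOT the real FWCore class."""

    def __init__(self, type_name, **params):
        self.type_name = type_name
        self.params = dict(params)

    def clone(self, **overrides):
        # Copy everything first, then apply the changed parameters to the copy.
        new = copy.deepcopy(self)
        new.params.update(overrides)
        return new

muon = ToyModule("GenericBenchmarkAnalyzer",
                 InputTruthLabel="muons", deltaRMax=0.3)
electron = muon.clone(InputTruthLabel="pixelMatchGsfElectrons")

print(muon.params["InputTruthLabel"])      # original is unchanged
print(electron.params["InputTruthLabel"])  # clone has the new value
print(electron.params["deltaRMax"])        # untouched parameters are inherited
```

Because the clone is an independent object, later changes to it cannot leak into other configurations that use the original.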
Information Sources
These information sources were used when the original versions of this TWiki page were written. They are out of date and contain much obsolete information. The second one requires a password:
Further Documentation
A complete description of the configuration file language and of the parameters used, along with some Python tips, is provided in the following software guide pages:
Review status
Responsible:
SudhirMalik
Last reviewed by:
DavidDagenhart - 15 Dec 2016