4.1.3 Introduction to CMS Configuration Files


Goals of this page

After reading this page you will understand the general structure and use of configuration files and configuration file fragments for running CMS analysis jobs.

This discussion-only tutorial introduces the configuration files that are used to configure cmsRun jobs. A complete example configuration is given.

Introduction

The CMS software framework uses a “software bus” model, where data is stored in the event which is passed to a series of modules. A single executable, cmsRun, is used, and the modules are loaded at runtime. A configuration file defines which modules are loaded, in which order they are run, and with which configurable parameters they are run. Note that this is not an interactive system. The entire configuration is defined once, at the beginning of the job, and cannot be changed during running. This design facilitates the tracking of event provenance, that is, the processing history of the event.
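As a rough illustration of the software-bus idea, the plain-Python toy below passes each event through a fixed, ordered list of modules. Each module reads products that earlier modules put into the event and then adds its own. Everything here (the Event class, make_hits, make_tracks, run_job) is hypothetical and is not CMSSW code.

```python
# Toy illustration of the "software bus" model (NOT CMSSW code).
# Event, make_hits, make_tracks and run_job are hypothetical names.

class Event:
    """Holds the data products of one event, keyed by product label."""

    def __init__(self, number):
        self.number = number
        self.products = {}
        self.history = []  # labels of the modules that processed this event

    def put(self, label, value):
        # In this toy, products may be added but never overwritten.
        if label in self.products:
            raise KeyError(f"product {label!r} already exists")
        self.products[label] = value

    def get(self, label):
        return self.products[label]


def make_hits(event):
    event.put("hits", [1.0, 2.5, 7.0])


def make_tracks(event):
    # Reads a product that an earlier module put on the "bus".
    hits = event.get("hits")
    event.put("tracks", [h for h in hits if h > 2.0])


def run_job(modules, n_events):
    # The module list is frozen before the event loop starts, mirroring
    # a cmsRun configuration that cannot change while the job runs.
    modules = tuple(modules)
    events = []
    for i in range(n_events):
        event = Event(i)
        for module in modules:
            module(event)
            event.history.append(module.__name__)
        events.append(event)
    return events


events = run_job([make_hits, make_tracks], n_events=2)
print(events[0].get("tracks"), events[0].history)
```

Recording which modules touched each event is a very small analogue of the provenance tracking mentioned above.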

Full details about configuration files are given in SWGuideAboutPythonConfigFile.

Configuration files in CMS

All CMS code is run by passing a config file (_cfg.py) to the CMSSW executable, cmsRun.

cmsRun <Configuration File> 

for example:

cmsRun MyConfig_cfg.py

Configurations are written using the Python language. Using the Python interpreter, one can quickly check the Python syntax of the configuration and run many (but not all) of the checks in the CMS Python module FWCore.ParameterSet.Config.

python MyConfig_cfg.py

After Python finishes importing and executing the configuration file, all components will have been loaded into the program.
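If you only want to check the Python syntax of a configuration, for example on a machine without a CMSSW environment, a minimal sketch using the standard ast module follows. The helper name check_config_syntax is hypothetical. Parsing never executes the file, so none of the CMS-specific checks mentioned above are run.

```python
# Sketch: check only the Python *syntax* of a configuration, without
# executing it. check_config_syntax is a hypothetical helper; it does not
# import FWCore.ParameterSet.Config and runs no CMS-specific checks.
import ast


def check_config_syntax(source, filename="<config>"):
    """Return None if `source` parses as Python, else a short error message."""
    try:
        ast.parse(source, filename=filename)
    except SyntaxError as err:
        return f"{filename}:{err.lineno}: {err.msg}"
    return None


good = 'import FWCore.ParameterSet.Config as cms\nprocess = cms.Process("RECO")\n'
bad = "tag = cms.InputTag( 'selectedMuons'')\n"  # stray quote on line 1

print(check_config_syntax(good))
print(check_config_syntax(bad))
```

To check a real file, pass the contents read from disk together with its path as filename.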

Contents of a typical configuration file

A configuration file typically consists of the following parts, stored as attributes of a cms.Process object that you create and name:

  • A source (which might read Events from a file or create new empty events)
  • A collection of modules (e.g. EDAnalyzer, EDProducer, EDFilter) which you wish to run, along with customised settings for parameters you wish to change from default values
  • An output module to create a ROOT file that stores all the event data. (When running an analyzer module, the histograms it produces are not event data, so an output module is not needed in that case.)
  • A path, which lists the modules to be run in order

Each config file is created from discrete building blocks which specify a component of the cmsRun program and configure it via its parameter set.

A configuration file written using the Python language can be created as:

  • a top-level file, which is a full process definition (naming convention is _cfg.py) and which might import other configuration files
  • external Python fragment files, which are of two types:
    • those used for module initialization (naming convention is _cfi.py)
    • those used as configuration fragments (naming convention is _cff.py)

The fragments are often imported into the top-level configuration file using the process.load() method, which also attaches the imported objects to the process. Fragments are usually imported into other fragments using one of the Python import statements.

Note: All imports create references, not copies. If a module is imported in two different places in a configuration, the imported symbols (variables) reference the same objects, so changing an object in one place changes it everywhere.
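This reference behavior is ordinary Python and can be seen without CMSSW. In the sketch below, a hypothetical in-memory module named fragment_cfi stands in for a _cfi.py file. Because Python caches each imported module in sys.modules, two imports of settings yield the same dict object.

```python
# Plain-Python sketch (no CMSSW needed): imports share objects.
# "fragment_cfi" and its "settings" dict are hypothetical stand-ins.
import sys
import types

frag = types.ModuleType("fragment_cfi")
frag.settings = {"threshold": 5.6}
sys.modules["fragment_cfi"] = frag  # pretend the file was already imported

from fragment_cfi import settings as settings_in_a  # import in one place
from fragment_cfi import settings as settings_in_b  # ... and in another

settings_in_a["threshold"] = 10.0      # change it in one place ...
print(settings_in_b["threshold"])      # ... and the other name sees 10.0
print(settings_in_a is settings_in_b)  # True: one object, two names
```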

Most Python configuration files start with the line

import FWCore.ParameterSet.Config as cms
which imports our CMS-specific Python classes and functions.

Standard fragments are available in the CMSSW release's Configuration/StandardSequences/python/ area. They can be read in using syntax like

process.load("Configuration.StandardSequences.Geometry_cff")

In Python, a "#" starts a comment: the rest of that line is ignored.

The word "module" has two meanings. In Python, a module is a file containing Python code, and the word also refers to the object created by importing that file. In the other meaning, EDProducers, EDFilters, EDAnalyzers, and OutputModules are called modules.

In order to make sure that your Python module can be imported by other Python modules:

  • Place it in the python subdirectory of a package
  • Be sure your SCRAM environment is set up
  • Go to your package directory and run scram b or scram b python

The above steps are needed only once. The correctness of a Python config is checked at a basic level every time the scram command is used.
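The package layout described above determines the name used with process.load() and with import statements: the python directory is dropped, so Configuration/StandardSequences/python/Geometry_cff.py becomes "Configuration.StandardSequences.Geometry_cff". The sketch below derives that name from a file path. load_name is a hypothetical helper, and it assumes the standard Subsystem/Package/python/... layout.

```python
# Hypothetical helper: map a file in a package's python/ directory to the
# dotted name used by process.load() or an import statement.
# Assumes the layout <Subsystem>/<Package>/python/<optional/subdirs>/<file>.py
from pathlib import PurePosixPath


def load_name(relative_path):
    parts = PurePosixPath(relative_path).with_suffix("").parts
    if len(parts) < 4 or parts[2] != "python":
        raise ValueError(f"not in a package python/ directory: {relative_path}")
    subsystem, package, _python_dir, *rest = parts
    return ".".join([subsystem, package, *rest])


print(load_name("Configuration/StandardSequences/python/Geometry_cff.py"))
print(load_name("PhysicsTools/PatAlgos/python/cleaningLayer1/electronCleaner_cfi.py"))
```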

Examples of configuration files in Python

In the example below, the configuration opens the files test.root and anotherTest.root. It runs a producer that creates a track collection and adds it to the event data. The combined set of events from the two input files, along with the new collection of tracks, is saved in the file test2.root. The MyRandomAnalyzer module is also run on each event.

_cfg.py file

import FWCore.ParameterSet.Config as cms

#set up a process named RECO
processName = "RECO"
process = cms.Process(processName)

# configures the source that reads the input files
process.source = cms.Source ("PoolSource",
    fileNames=cms.untracked.vstring(
        'file:test.root',
        'file:anotherTest.root'
    )
)

# defines the producer TrackFinderProducer, labeled "tracker"
process.tracker=cms.EDProducer('TrackFinderProducer',
    threshold=cms.untracked.double(5.6)
)

# defines your analyzer, labeled "MyModule"
process.MyModule = cms.EDAnalyzer('MyRandomAnalyzer',
    numBins = cms.untracked.int32(100),
    minBin  = cms.untracked.double(0),
    maxBin  = cms.untracked.double(100)
)

# configures the output module that writes test2.root
process.out = cms.OutputModule("PoolOutputModule",
    fileName = cms.untracked.string("test2.root")
)

# Defines which modules and sequences to run
process.mypath = cms.Path(process.tracker*process.MyModule)

# A list of analyzers or output modules to be run after all paths have been run.
process.outpath = cms.EndPath(process.out)
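To make the ordering expressed by the * operator concrete, here is a toy model in plain Python. ToyModule and ToySequence are hypothetical classes written for this sketch, not the FWCore.ParameterSet.Config implementation; they only record the order in which modules are combined.

```python
# Toy model (NOT the real cms.Path/cms.Sequence classes) of how "*"
# in cms.Path(process.tracker*process.MyModule) expresses order.

class ToyModule:
    def __init__(self, label):
        self.label = label

    def __mul__(self, other):
        # module * X starts a sequence with this module first
        return ToySequence([self]) * other


class ToySequence:
    def __init__(self, modules):
        self.modules = list(modules)

    def __mul__(self, other):
        # append the right-hand module(s) after everything already here
        extra = other.modules if isinstance(other, ToySequence) else [other]
        return ToySequence(self.modules + extra)

    def labels(self):
        return [m.label for m in self.modules]


tracker = ToyModule("tracker")
my_module = ToyModule("MyModule")
mypath = tracker * my_module
print(mypath.labels())
```

The same operator is used when building the cms.Sequence in the _cff.py example.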



Many examples of _cfg.py files can be found in /CMSSW/PhysicsTools/PatAlgos/test and /CMSSW/PhysicsTools/PatExamples/test.

_cfi.py file

This example is taken from /CMSSW/PhysicsTools/PatAlgos/python/cleaningLayer1/electronCleaner_cfi.py.

import FWCore.ParameterSet.Config as cms

cleanPatElectrons = cms.EDProducer("PATElectronCleaner",
    ## pat electron input source
    src = cms.InputTag("selectedPatElectrons"), 

    # preselection (any string-based cut for pat::Electron)
    preselection = cms.string(''),

    # overlap checking configurables
    checkOverlaps = cms.PSet(
        muons = cms.PSet(
           src       = cms.InputTag("cleanPatMuons"),
           algorithm = cms.string("byDeltaR"),
           preselection        = cms.string(""),  # don't preselect the muons
           deltaR              = cms.double(0.3),
           checkRecoComponents = cms.bool(False), # don't check if they share some AOD object ref
           pairCut             = cms.string(""),
           requireNoOverlaps   = cms.bool(False), # overlaps don't cause the electron to be discarded
        )
    ),

    # finalCut (any string-based cut for pat::Electron)
    finalCut = cms.string(''),
)

_cff.py file

This example is taken from /CMSSW/PhysicsTools/PatAlgos/python/cleaningLayer1/cleanPatCandidates_cff.py.

The module initialization file above is included in the _cff.py fragment file below using the line "from PhysicsTools.PatAlgos.cleaningLayer1.electronCleaner_cfi import *":

import FWCore.ParameterSet.Config as cms

from PhysicsTools.PatAlgos.cleaningLayer1.electronCleaner_cfi import *
from PhysicsTools.PatAlgos.cleaningLayer1.muonCleaner_cfi import *
from PhysicsTools.PatAlgos.cleaningLayer1.tauCleaner_cfi import *
from PhysicsTools.PatAlgos.cleaningLayer1.photonCleaner_cfi import *
from PhysicsTools.PatAlgos.cleaningLayer1.jetCleaner_cfi import *
from PhysicsTools.PatAlgos.producersLayer1.hemisphereProducer_cfi import *
#FIXME ADD MHT

# One module to count objects
cleanPatCandidateSummary = cms.EDAnalyzer("CandidateSummaryTable",
    logName = cms.untracked.string("cleanPatCandidates|PATSummaryTables"),
    candidates = cms.VInputTag(
        cms.InputTag("cleanPatElectrons"),
        cms.InputTag("cleanPatMuons"),
        cms.InputTag("cleanPatTaus"),
        cms.InputTag("cleanPatPhotons"),
        cms.InputTag("cleanPatJets"),
    )
)


cleanPatCandidates = cms.Sequence(
    cleanPatMuons     *
    cleanPatElectrons *
    cleanPatPhotons   *
    cleanPatTaus      *
    cleanPatJets      *
    cleanPatCandidateSummary
)

Using Python Interactively

You can use Python interactively to explore a configuration in more detail:

python -i MyConfig_cfg.py

This runs the configuration and then leaves you at the Python prompt (>>>), where you can type Python statements interactively. Many things are possible; two examples follow.

  • Print the entire configuration in Python format, with all the imported objects expanded. This output might be much larger than the top-level configuration file if many objects are imported.
        >>> print(process.dumpPython())
  • Print one particular attribute of the process. For example, if the process contains a path labeled "p", print that path with either of:
        >>> process.p
        OR
        >>> print(process.p.dumpPython())

When you are done, type CONTROL-D to quit the Python interpreter.

Cloning of Python Modules

As mentioned above, all imports create references, not copies, so changing an object in one place changes it everywhere it is referenced. Thus, if standard module configurations are imported, their parameters should be modified with care: the changes happen globally, so other configurations could be affected. The standard solution to this problem is to clone the module and change parameters in the clone:

The standard syntax for cloning is

from aPackage import oldName
newName = oldName.clone(changedParameter = 42)

or

from aPackage import oldName as _oldName
newName = _oldName.clone(changedParameter = 42)

The second form is better when the symbol oldName is not needed afterwards and the code is in a fragment that might be loaded with process.load() or imported with a "from aModule import *" statement: symbols starting with an underscore are not imported in those cases.
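The underscore rule is ordinary Python behavior and can be checked without CMSSW. The sketch below registers a hypothetical in-memory module named benchmark_cff and shows that "from benchmark_cff import *" brings in newName but not _oldName. It covers only the import * case, not process.load().

```python
# Plain-Python sketch (no CMSSW needed): without __all__, names starting
# with "_" are skipped by "from module import *".
# "benchmark_cff" and its contents are hypothetical.
import sys
import types

fragment = types.ModuleType("benchmark_cff")
fragment._oldName = "template imported under a private alias"
fragment.newName = "clone that should be exported"
sys.modules["benchmark_cff"] = fragment

namespace = {}
exec("from benchmark_cff import *", namespace)
exported = sorted(k for k in namespace if not k.startswith("__"))
print(exported)  # ['newName']
```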

An example is below. Here the module patMuonBenchmarkGeneric is NOT imported but defined in the same file. By cloning it, we avoid the problem described at the beginning of this section, save a lot of repetition, and change only the parameters that need to differ, in this case InputTruthLabel, BenchmarkLabel and InputRecoLabel.


import FWCore.ParameterSet.Config as cms

patMuonBenchmarkGeneric = cms.EDAnalyzer("GenericBenchmarkAnalyzer",
    OutputFile = cms.untracked.string('benchmark.root'),
    InputTruthLabel = cms.InputTag('muons'),
    minEta = cms.double(-1),
    maxEta = cms.double(2.8),
    recPt = cms.double(0.0),
    deltaRMax = cms.double(0.3),
    PlotAgainstRecoQuantities = cms.bool(True),
    OnlyTwoJets = cms.bool(False),
    BenchmarkLabel = cms.string( 'selectedPatMuons' ),
    InputRecoLabel = cms.InputTag( 'selectedLayer1Muons' )
)

patElectronBenchmarkGeneric = patMuonBenchmarkGeneric.clone(
   InputTruthLabel = 'pixelMatchGsfElectrons',
   BenchmarkLabel = 'selectedPatElectrons',
   InputRecoLabel = 'selectedPatElectrons',
)

patJetBenchmarkGeneric= patMuonBenchmarkGeneric.clone(
   InputTruthLabel = 'iterativeCone5CaloJets',
   BenchmarkLabel = 'selectedPatJets',
   InputRecoLabel = 'selectedPatJets',
)

patPhotonBenchmarkGeneric= patMuonBenchmarkGeneric.clone(
   InputTruthLabel = 'photons',                   
   BenchmarkLabel = 'selectedPatPhotons', 
   InputRecoLabel = 'selectedPatPhotons', 
)

patTauBenchmarkGeneric= patMuonBenchmarkGeneric.clone(
   InputTruthLabel = 'pfRecoTauProducer',
   BenchmarkLabel = 'selectedPatTaus',   
   InputRecoLabel = 'selectedPatTaus',   
)
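The effect of clone() can be pictured with the toy class below: the clone starts from a deep copy of the template's parameters and then applies the overrides, so changing the clone never changes the template. ToyAnalyzer is a hypothetical stand-in, not the real cms.EDAnalyzer, and it does none of the parameter type checking that the cms types perform.

```python
# Toy sketch of the clone-with-overrides pattern (NOT the real
# cms.EDAnalyzer.clone()). ToyAnalyzer is hypothetical.
import copy


class ToyAnalyzer:
    """A type name plus a dict of parameters."""

    def __init__(self, type_name, **params):
        self.type_name = type_name
        self.params = params

    def clone(self, **overrides):
        # Deep-copy so nested mutable values (e.g. a nested parameter set)
        # are not shared between the template and the clone ...
        params = copy.deepcopy(self.params)
        # ... then apply only the parameters that should differ.
        params.update(overrides)
        return ToyAnalyzer(self.type_name, **params)


muon_benchmark = ToyAnalyzer("GenericBenchmarkAnalyzer",
                             InputTruthLabel="muons",
                             BenchmarkLabel="selectedPatMuons",
                             deltaRMax=0.3)
electron_benchmark = muon_benchmark.clone(InputTruthLabel="pixelMatchGsfElectrons",
                                          BenchmarkLabel="selectedPatElectrons")

electron_benchmark.params["deltaRMax"] = 0.5  # modify the clone only
print(muon_benchmark.params["deltaRMax"])     # the template still has 0.3
```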


Information Sources

These information sources were used when the original versions of this TWiki page were written. They are out of date and contain much obsolete information. The second one requires a password:

Further Documentation

A complete description of the configuration file language and the parameters used, along with some Python tips, is provided in the following software guide pages:

Review status

Reviewer/Editor and Date | Comments
JennyWilliams - 08 Aug 2007 | Page author
ChrisDJones - 24 Oct 2007 | minor corrections
RickWilkinson - 08 Feb 2008 | emphasize includes and reuse
DavidDagenhart - 15 Dec 2016 | reviewed and made many updates

Responsible: SudhirMalik

Last reviewed by: DavidDagenhart - 15 Dec 2016

Topic revision: r35 - 2016-12-15 - DavidDagenhart


Copyright © 2008-2019 by the contributing authors. All material on this collaboration platform is the property of the contributing authors.