Description of the cmsRun Python Configuration Syntax
Goals of this page
This page describes how to use the Python language to create configuration files.
For an introductory discussion of configuration files, see the WorkBook page
WorkBookConfigFileIntro. (It covers some topics not covered here, so it might be a good idea to read or skim that before or after reading this page.)
Introduction
A configuration document, written using the Python language, is used to configure the cmsRun executable. A Python configuration program specifies which modules, inputs, outputs and services are to be loaded during execution, how to configure these modules and services, and in what order to execute them. It is a file which can be self-contained, or can read in any number of external Python configuration fragments using Python's standard import statement. Here is an illustration of the structure of a configuration file:
# Import CMS python class definitions such as Process, Source, and EDProducer
import FWCore.ParameterSet.Config as cms
# Import contents of a file
import Foo.Bar.somefile_cff
# Set up a process, named RECO in this case
process = cms.Process("RECO")
# Configure the object that reads the input file
process.source = cms.Source("PoolSource",
    fileNames = cms.untracked.vstring("test.root")
)
# Configure an object that produces a new data object
process.tracker = cms.EDProducer("TrackFinderProducer")
# Configure the object that writes an output file
process.out = cms.OutputModule("PoolOutputModule",
    fileName = cms.untracked.string("test2.root")
)
# Add the contents of Foo.Bar.somefile_cff to the process
# Note that more commonly in CMS, we call process.load("Foo.Bar.somefile_cff")
# which both performs the import and calls extend.
process.extend(Foo.Bar.somefile_cff)
# Configure a path and endpath to run the producer and output modules
process.p = cms.Path(process.tracker)
process.ep = cms.EndPath(process.out)
There must be exactly one cms.Process with the name process in a top level configuration file.
Python Quick Tips
SWGuidePythonTips
Python has a few features that take some getting used to.
Python program flow is controlled by the indentation of the statements. For example, the content of a function definition is indented more than the first line that declares the function name. Python knows the function content has ended when the indentation goes back to what it was. Other program blocks like "if" conditional blocks are handled similarly, so there's no need for 'endif' statements or curly braces to mark the end of a block of statements.
The way arguments are passed into a function is different in Python and C++. The official "Python Tutorial" says arguments are passed using "call by value (where the value is always an object reference, not the value of the object)". A footnote there adds: "Actually, call by object reference would be a better description, since if a mutable object is passed, the caller will see any changes the callee makes to it (items inserted into a list)."
Another difference between C++ and Python is that when you assign an object to a variable using the '=' sign, Python doesn't make a copy of the object. In general, assume that Python shares references unless you explicitly make a copy, so when you modify an object, you modify it everywhere it's used.
Here is an example that may clarify the last three points.
x = []
def f(x):
    x = [1]
f(x)
print(x)
def g(x):
    y = x
    y.append(1)
g(x)
print(x)
If you copy the above to a file and run it with python,
you see the first print statement prints an empty list. The second prints
a list that contains a 1. What happens? First an empty list is created
and a symbol 'x' that exists outside the functions is created and set to
refer (or point) to that list. When function f is called a new symbol 'x'
is created that refers to the same list. Then a new list is created
and the symbol 'x' in the function is set to refer to the new list. It no longer
points at the original list. The original list is unchanged. The symbol
'x' outside the function still points at the original list.
Then function g is called and another symbol 'x' is created which
is set to refer to the original list. A symbol 'y' is created and is set
to refer to the same list 'x' refers to.
Then using y, a function of the original list is called
that modifies the original list. After the function returns that change
is visible. If you understand this little example, then you will understand
some important things about how Python works.
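Building on the example above, the following sketch shows the difference between sharing a reference and making an explicit copy. The variable names are illustrative only, and the code uses only the Python standard library.

```python
import copy

original = [1, [2, 3]]

alias = original                 # no copy: both names refer to the same list
shallow = list(original)         # new outer list, but the inner list is shared
deep = copy.deepcopy(original)   # fully independent copy

alias.append(4)                  # visible through 'original' too
shallow[1].append(99)            # modifies the inner list shared with 'original'
deep[1].append(-1)               # affects only the deep copy

print(original)
print(alias is original, shallow is original, deep is original)
```

Only the deep copy is completely isolated from later changes. This is why cloning or copying a configuration object is the safe choice when you want to modify it without affecting other users of the original.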
Objects are built using constructor syntax. It's easy to forget the commas.
You can validate the syntax of your configuration using python
python <your file>
Python allows you to explore your configuration from the Python command prompt.
This is described here.
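As an illustration of the kind of mistake a syntax check catches, the sketch below uses Python's built-in compile function to test two configuration-like snippets without executing them. The helper name has_valid_syntax and the snippet contents are made up for this example. Because compile only checks syntax, it does not need CMSSW and does not verify that the parameters make sense.

```python
def has_valid_syntax(source_text):
    """Return True if source_text parses as Python, False otherwise."""
    try:
        compile(source_text, "<config snippet>", "exec")
    except SyntaxError:
        return False
    return True


good = '''
VtxSmeared = cms.EDProducer("VertexGenerator",
    MeanX = cms.double(0.),
    MeanY = cms.double(0.)
)
'''

# Same snippet with the comma after the first parameter forgotten
bad = '''
VtxSmeared = cms.EDProducer("VertexGenerator",
    MeanX = cms.double(0.)
    MeanY = cms.double(0.)
)
'''

print(has_valid_syntax(good))
print(has_valid_syntax(bad))
```

Running the configuration file with python, as described above, goes further than this sketch because it also executes the imports and constructors.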
Python and SCRAM
To make objects in your python files visible to other python files, the other files need to import your python files. For the import to succeed your python files first need to be put in your package's python directory, or a subdirectory of python. Then, you need to run scram build once in your python area, to initialize the symbolic links. You only need to build once. If later you change your python file, you won't need to re-build.
Many types of objects (e.g. EDProducers) can have a function called fillDescriptions defined in their C++ class. This function specifies which parameters are allowed to exist in the module's configuration, along with other things like parameter default values. If this function has been defined for a C++ class, it can trigger automatic generation of the cfi file for modules of that class. The automatic generation is part of the build that executes when the command scram build is given. The automatically generated cfi files are stored in the cfipython subdirectory of a CMSSW working area instead of the src subdirectory. See SWGuideConfigurationValidationAndHelp for more details.
Objects
A Python configuration program is structured as a hierarchy of
objects, nested or adjacent with respect to each other. Each object specifies a recognizable component or feature of the
cmsRun program. Each object is given a name which is used to refer to it
in the Python code. In CMS we often call those names "labels". Each object has a python type. Not all, but many types can
be configured by a parameter set and store that parameter set. And frequently, a particular object will correspond to one
C++ class that is related but not the same as the python type. Commonly, an object definition looks like this:
aLabel = pythonType("C++ClassName",
    aParameterName = anotherPythonType(constructor arguments)
    # more parameter definitions
    ...
)
Below is an example where the label is VtxSmeared, the python type is cms.EDProducer, and the C++ class name is VertexGenerator.
The example also shows that comments may be added within a block's
scope using "#" where the comment continues to the end of the line.
It is clearest to define each parameter on a separate line in your configuration file.
VtxSmeared = cms.EDProducer("VertexGenerator",
    # Setting parameters: <-- First comment
    MeanX = cms.double(0.),
    MeanY = cms.double(0.),
    MeanZ = cms.double(0.),
    # Second comment
    SigmaX = cms.double(0.015),
    SigmaY = cms.double(0.015),
    SigmaZ = cms.double(53.0) # in mm (as in COBRA/OSCAR) <-- Third comment
)
Labelled vs Unlabelled Objects
I considered deleting this section from the documentation because the terms described here are not often used, but I left it here because there are some obscure places in the code and documentation where you might run into the terms 'Labelled' and 'Unlabelled'.
In CMS, a 'label' is the attribute name used in the process to refer to an object. When the process load function or the process extend function is used to attach objects to the process, the label is the same as the Python variable name assigned to refer to the object (and because they are usually the same, that name is often called the label as well). In Python configurations, every object attached to the process has a label, so the terms 'Labelled' and 'Unlabelled' are really misnomers with a historical origin. Before 2008, CMS did not use the Python language for configurations, and in the language used before Python some objects attached to the process really did not have labels. These types of objects were called 'Unlabelled', and even now there remain some subtle differences in how they are handled internally (for backward compatibility reasons that are now ancient). Currently in a Python configuration, an 'Unlabelled' object is one that has a certain label, and in some cases the object is required to always have that label. Here are some examples:
- A source must have the label 'source'
- A service must have a label matching its C++ type name
- An ESProducer or ESSource is considered 'Unlabelled' if and only if it has the same label as its type name, although it is allowed to have an arbitrary label
Most other types are 'Labelable'. The example in the previous section shows a 'Labelled' object. Here is an example of an 'Unlabelled' object.
process.source = cms.Source("PoolSource",
    # parameter declarations
    ...
)
The label of a 'Labelled' object can have an arbitrary value, except that it must be unique within the scope of the process.
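The rules in this section can be summarised in a few lines of plain Python. This is only a sketch of the rules as stated on this page, not the code cmsRun uses. The function names and the string-based type argument are invented for the illustration, and the uniqueness requirement is not modelled.

```python
def label_is_allowed(python_type, cpp_class, label):
    """Apply the labelling rules described above to one object."""
    if python_type == "Source":
        return label == "source"      # a source must be labelled 'source'
    if python_type == "Service":
        return label == cpp_class     # a service label must match its C++ type name
    return True                       # other types may have an arbitrary label


def is_unlabelled(python_type, cpp_class, label):
    """Return True if the object counts as 'Unlabelled' in the historical sense."""
    if python_type in ("Source", "Service"):
        return True
    if python_type in ("ESProducer", "ESSource"):
        return label == cpp_class     # only when the label equals the type name
    return False


print(label_is_allowed("Source", "PoolSource", "source"))
print(label_is_allowed("Service", "RandomNumberGeneratorService", "rng"))
print(is_unlabelled("ESSource", "SomeClass", "SomeClass"))
print(is_unlabelled("ESSource", "SomeClass", "somename"))
```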
The Process Object
The top-level item of a configuration program is a Process object. Each configuration program must have a Process object assigned to a variable named process. [NOTE: the program may have other Process objects assigned to different variable names, but those Process objects will not be used to configure the job except in the special case of SubProcesses.] This process variable refers to a Process object which aggregates all the configuration information for the cmsRun executable. The Process object must be named by passing the name as the first argument to the Process class's constructor. This name gets carried along with the output data and is used as a part of the branch name to distinguish between otherwise similar objects in a given event.
process = cms.Process("NAME")
Official production processes use a standard set of process names.
User processes are free to select their own process names. Common choices
include TEST and USER. A history of processes which have added data
to the Events, Runs, and LuminosityBlocks is recorded in the output files. New
processes must have names different from all the names in the process history.
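The uniqueness rule can be pictured with a small plain-Python check. The function name and the example history below are invented for illustration. cmsRun performs its own check, so this sketch only makes the rule concrete.

```python
def check_process_name(new_name, process_history):
    """Raise ValueError if new_name already appears in the process history."""
    if new_name in process_history:
        raise ValueError(
            "process name %r is already in the process history %r"
            % (new_name, process_history)
        )
    return new_name


history = ["SIM", "RECO"]                    # hypothetical history of an input file
print(check_process_name("TEST", history))   # accepted
try:
    check_process_name("RECO", history)      # rejected: RECO was already used
except ValueError as err:
    print("rejected:", err)
```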
Attribute Declarations for the Process Object
Inside the process object there must be exactly one object assigned that has Python type Source and is used for data input. There may be zero or more objects for each of many other Python types. In the official production configurations there can be hundreds or even thousands of objects attached to the process. Your job is configured by your choice of objects to construct and attach to the process, and by the configuration of each object. (This may be done via "import" statements or calls to the load function, instead of or in addition to object construction.) Some of the Python types that may be used to create these objects are listed below:
- Source object - Defines the data input source for the cmsRun executable; i.e., where the data comes from. There must be exactly one and it must be assigned the label source. Configured with a C++ type name and parameter set for each object.
- EDProducer, EDFilter, EDAnalyzer or OutputModule objects - The four Python types are collectively referred to as 'Modules'. Each of the four Python types corresponds to a specific C++ base class. Configured with a C++ type name and parameter set for each object. The C++ type must inherit from the corresponding C++ base class.
- EDAlias objects - Allows mapping of an EDProducer from its given label to one or more different module labels. Also allows multiple EDProducers to share the same label. This is useful for making multiple jobs with different configurations appear to make the same Event products.
- Service object - Invoke services. They must be assigned to a label which matches their C++ class name. Configured with a C++ type name and parameter set for each object.
- PSet objects - Define parameter sets.
- VPSet objects - Define vectors of parameter sets.
- SecSource objects - Define a secondary data input source. These are used only with "mixing modules" and are typically included as a parameter in the parameter set of a mixing module instead of being attached directly to the process. Configured with a C++ type name and parameter set for each object.
- ESSource objects - Grab information from an external input source, e.g., the calibration database, an xml file, etc. Configured with a C++ type name and parameter set for each object.
- ESProducer objects - Describe environmental quantities, e.g., ambient temperature, magnetic field, and so on. Configured with a C++ type name and parameter set for each object.
- Sequence definitions - Define ordered groups of modules, giving each group a name that may be used in other sequences, paths, or endpaths.
- Task definitions (in releases that support Tasks) - Define groups of EDProducers and EDFilters to be run in unscheduled mode. Define which EDProducers, ESSources, and Services are enabled.
- Path definitions - Specify groups of modules to execute in the order given.
- EndPath definitions - Define groups of modules to run, in the order given, after all other named paths have been run.
- Schedule definitions - Define the group of paths and end paths to run. There can be 0 or 1 of these and it must be labelled schedule.
CMSSW classes and modules
"Module" is a generic name for "workers" in cmsRun that are objects instantiated from a C++ class that inherits from a base class with one of the following 4 names: EDProducer, EDFilter, EDAnalyzer and OutputModule. There have been thousands of such C++ classes defined in CMSSW. Modules are described in more detail here: Modular Architecture. There are multiple ways to browse the C++ code of these CMSSW classes online.
Parameters
Modules and some other C++ objects which are created by cmsRun can be configured by parameters in the configuration. These parameters are initially passed to the constructor of the corresponding Python object which stores them. Then the framework creates a C++ ParameterSet object from them and passes that to the C++ constructor of the object. The following table shows the different types of parameters that can be used when configuring a module in the Python code. Each such object can contain any or all of the following types of named parameters, in any number or combination. Parameters of type PSet can be nested inside the top level set of parameters and other PSet parameters and this nesting will result in nested ParameterSet objects in C++ execution.
NOTE: the examples assume one has done
import FWCore.ParameterSet.Config as cms
so we can use the short name cms rather than the verbose name FWCore.ParameterSet.Config when referring to the Python objects.
| Python Type | C++ Type | Example |
| bool | bool | b = cms.bool(False) |
| int32 | int | i = cms.int32(-234) |
| uint32 | unsigned | i = cms.uint32(2112) |
| vint32 | std::vector<int> | v = cms.vint32( 1, -3, 5 ) |
| vuint32 | std::vector<unsigned> | v = cms.vuint32( 0, 1, 0 ) |
| int64 | long long | i = cms.int64(-234) |
| uint64 | unsigned long long | i = cms.uint64(2112) |
| vint64 | std::vector<long long> | v = cms.vint64( 1, -3, 5 ) |
| vuint64 | std::vector<unsigned long long> | v = cms.vuint64( 0, 1, 0 ) |
| string | std::string | s = cms.string("spaces are allowed"); s = cms.string('single quotes allowed') |
| vstring | std::vector<std::string> | v = cms.vstring( 'thing one', "thing two") |
| double | double | d = cms.double(-3.43e-34) |
| vdouble | std::vector<double> | v = cms.vdouble(1.2, 3, 4.5e-100, float("-inf")) |
| FileInPath | edm::FileInPath | particleFile = cms.FileInPath("SimGeneral/HepPDTESSource/data/particle.tbl") |
| InputTag | edm::InputTag | inputTag = cms.InputTag("simrec","jets") |
| VInputTag | std::vector<edm::InputTag> | jetTags = cms.VInputTag( cms.InputTag("simrec","jets"), cms.InputTag("cone5CMS.CaloJets")) |
| ESInputTag | edm::ESInputTag | inputTag = cms.ESInputTag("hbconditions") |
| VESInputTag | std::vector<edm::ESInputTag> | inputTags = cms.VESInputTag(cms.ESInputTag("hbconditions")) |
| EventID | edm::EventID | e = cms.EventID(1,1,1) |
| VEventID | std::vector<edm::EventID> | events = cms.VEventID(cms.EventID(1,1,1),cms.EventID(2,3,4)) |
| EventRange | edm::EventRange | r = cms.EventRange(1,1,1,5,3,999) (0 corresponds to MAX); r = cms.EventRange("1:1:1-5:3:999") ('min' and 'max' are allowed) |
| VEventRange | std::vector<edm::EventRange> | ranges = cms.VEventRange( cms.EventRange(1,1,1,5,0,0), cms.EventRange(8,1,1,9,1,10)) |
| LuminosityBlockID | edm::LuminosityBlockID | l = cms.LuminosityBlockID(1,1) |
| VLuminosityBlockID | std::vector<edm::LuminosityBlockID> | lumis = cms.VLuminosityBlockID(cms.LuminosityBlockID(1,1),cms.LuminosityBlockID(2,3)) |
| LuminosityBlockRange | edm::LuminosityBlockRange | r = cms.LuminosityBlockRange(1,1,5,0) (0 corresponds to MAX); r = cms.LuminosityBlockRange("1:1-5:max") ('min' and 'max' are allowed) |
| VLuminosityBlockRange | std::vector<edm::LuminosityBlockRange> | ranges = cms.VLuminosityBlockRange( cms.LuminosityBlockRange(1,1,5,0), cms.LuminosityBlockRange(8,1,9,1)) |
| PSet | edm::ParameterSet | see below |
| VPSet | std::vector<edm::ParameterSet> | see below |
Note that empty vectors are legal; e.g., c = cms.vint32( ) creates an empty integer vector named "c".
The system keeps track of what parameters are used to create each data item in the Event and saves this information in the output files. This can be used later to help understand how the data was made. However, sometimes a parameter will have no effect on the final objects created, e.g., the parameter just sets how much debugging information should be printed to the log. Such parameters are declared 'untracked' as shown below and their values are not saved.
To see how parameters are defined, here is a sample:
import FWCore.ParameterSet.Config as cms
source = cms.Source("FlatRandomEGunSource",
    # Here we define a nested parameter set (PSet) to be used by the source,
    # and give it the name PGunParameters:
    PGunParameters = cms.PSet(
        # you can request more than one particle, e.g.:
        # PartID = cms.vint32(211,11) # but we just request one this time:
        PartID = cms.vint32(211),
        MinEta = cms.double(-3.5),
        MaxEta = cms.double(3.5),
        MinPhi = cms.double(-3.14159265358979323846), # in radians
        MaxPhi = cms.double(3.14159265358979323846),
        MinE = cms.double(9.99),
        MaxE = cms.double(10.01)
    ),
    Verbosity = cms.untracked.int32(0) # set to 1 (or greater) for printouts
)
For modules with many parameters, it can be painful to have to write every single one of them in a job's configuration program. To avoid this, the intended design is that there be exactly one file, called a "configuration fragment include", whose file name ends with _cfi.py, corresponding to each module in CMS. Then in places where the module is used, the cfi file is imported instead of redefining all the possible parameters. After importing the cfi, the parameters will all have default values and changes can be made to the defaults. The module can be copied or cloned. This is discussed more in the section below about imports and the sections following that one.
cfi files will be automatically generated for modules that have implemented the fillDescriptions function in their C++ code. You can find the cfi files that were manually implemented in the python directory of the package where the module is defined. The automatically generated cfi files are placed in subdirectories of the cfipython directory of the top level working directory. cfi files do not need to be manually written for modules where the autogeneration occurs.
Parameters without defaults
Parameter types can be specified without declaring a particular default value using the modifiers required, optional, and obsolete. A sample could be
import FWCore.ParameterSet.Config as cms
value = cms.PSet( min = cms.required.int32,
                  max = cms.optional.uint32,
                  verbosity = cms.optional.untracked.string )
Where
- required: if the parameter has not been set, an exception will be thrown.
- optional: if the parameter is missing, it will not be passed to the C++ code.
- obsolete: the parameter (whether set or not) will never be passed to the C++ code.
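The three modifiers can be pictured with the following plain-Python sketch. It does not use the CMSSW classes. The function parameters_passed_to_cpp, the declaration dictionary, and the parameter name oldFlag are all assumptions made for the illustration.

```python
def parameters_passed_to_cpp(declarations, values):
    """Return the parameters that would reach the C++ code.

    declarations maps a parameter name to 'required', 'optional' or 'obsolete'.
    values maps a parameter name to the value set in the configuration.
    """
    passed = {}
    for name, modifier in declarations.items():
        if modifier == "obsolete":
            continue                      # never passed, whether set or not
        if name in values:
            passed[name] = values[name]
        elif modifier == "required":
            raise ValueError("required parameter %r has not been set" % name)
        # an optional parameter that is missing is simply left out
    return passed


declarations = {"min": "required", "max": "optional", "oldFlag": "obsolete"}
print(parameters_passed_to_cpp(declarations, {"min": 5, "oldFlag": True}))
```

Leaving out min in the call above would raise the ValueError, which mirrors the exception described for required parameters.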
For PSet parameter types, the special PSetTemplate keyword can be used to specify not only that a PSet is the type but also what are the allowed, or default, parameters within that PSet. The values can be specified by assigning the parameter to a python dictionary containing all or some of the parameters which were declared.
import FWCore.ParameterSet.Config as cms
value = cms.PSet( constraints = cms.optional.PSetTemplate(
                      min = cms.required.int32,
                      max = cms.optional.uint32),
                  values = cms.required.PSetTemplate(
                      x = cms.required.double,
                      y = cms.optional.double)
)
value.constraints = dict(min = 5, max = 10)
value.values = dict(x=7)
Additionally, a PSet which allows any string to be used for its parameter names can specify that via the use of the allowAnyLabel_ option.
import FWCore.ParameterSet.Config as cms
value = cms.PSet( allowAnyLabel_ = cms.required.int32)
value.foo = 5
value.bar = 10
Processing Component Objects
There are four types of dynamically loadable processing components (in addition to the source type which provides the event to be processed). These are known as "modules". As mentioned above,
- "Module" is a generic name for 4 types of "workers" (C++ base classes): EDProducer, EDFilter, EDAnalyzer, and OutputModule.
- Zero or more labelled module blocks can be assigned to a Process object.
- The C++ classes that go here would be subclasses of any of the worker base classes.
The component types are:
- Producer - Based on the EDProducer class; creates new data to be placed in the Event
- Filter - Based on the EDFilter class; decides if processing should continue on a path for an Event
- Analyzer - Based on the EDAnalyzer class; studies properties of the Event
- OutputModule - Stores the data from the Event
The label given to a module may be used elsewhere in the configuration (e.g., in sequence, path and endpath specifications).
For example:
process.filter = cms.EDFilter("PythiaFilter",
    MinMuonPt = cms.untracked.double(20.)
)
process.out = cms.OutputModule("PoolOutputModule",
    fileName = cms.untracked.string("mcpool.root"),
    SelectEvents = cms.untracked.PSet(
        SelectEvents = cms.vstring("p")
    )
)
process.p = cms.Path(process.filter)
process.e = cms.EndPath(process.out)
Service Objects
A service is a facility that performs a well-defined task that is globally accessible and that does not affect physics results. Any number of services can be attached to a Process object, although only one for each C++ type.
This example shows the form of the service object:
process.RandomNumberGeneratorService = cms.Service("RandomNumberGeneratorService",
    externalLHEProducer = cms.PSet(
        initialSeed = cms.untracked.uint32(234567),
        engineName = cms.untracked.string('HepJamesRandom')
    )
)
RandomNumberGeneratorService is the name of the C++ class of the service the program will use and also must be the label. The specific parameters required are determined by the class used.
An alternative way to define a service and attach it to the process follows.
It has the advantage that it sets the label automatically for you. This only
works for services.
process.add_( cms.Service("RandomNumberGeneratorService",
        externalLHEProducer = cms.PSet(
            initialSeed = cms.untracked.uint32(234567),
            engineName = cms.untracked.string('HepJamesRandom')
        )
    )
)
Parameter Set (PSet) Objects
Any number of parameter set ("PSet") objects can be attached to the Process object. They can also be nested inside a module as a parameter of the module or other type of object. A PSet can be nested inside another PSet. Parameter sets are used to define a list of parameters. A PSet attached to the Process can be shared by multiple modules, i.e. can be used to configure multiple modules. The PSet concept allows a single point of maintenance for such a set of parameters.
The PSet object is of the form:
somename = cms.PSet(
    # parameter declarations here, e.g.,
    s = cms.string("thing zero"),
    v = cms.vstring( 'thing one', "thing two")
)
Any of the parameter declarations listed under Declaring Parameters, above, may appear in a PSet object, as may other PSet objects.
To import an external PSet into a module, put the PSet in the constructor of whatever you're constructing, after its name, but before any named variables:
myBlock = cms.PSet( a = cms.int32(1) )
myModule = cms.EDAnalyzer("MyModule",
    myBlock,
    b = cms.int32(2)
)
Copies of the parameters from the PSet will be inserted into the module. These copies can then be modified, without affecting the original PSet.
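The copy semantics can be mimicked in plain Python with dictionaries. This sketch is an analogy only: it does not use cms.PSet or cms.EDAnalyzer, and the helper make_module and the "@class" key are invented for the example.

```python
import copy


def make_module(cpp_class, *blocks, **parameters):
    """Build a dictionary of parameters, copying each block into it first."""
    module = {"@class": cpp_class}
    for block in blocks:
        module.update(copy.deepcopy(block))   # copies, not references
    module.update(parameters)
    return module


myBlock = {"a": 1, "names": ["x"]}
myModule = make_module("MyModule", myBlock, b=2)

myModule["a"] = 10                 # change the copy held by the module
myModule["names"].append("y")      # nested values are independent too

print(myBlock)
print(myModule)
```

The original block is unchanged after the module's copies are modified, which is the behaviour described above for a PSet passed into a module constructor.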
Parameter Set Vector (VPSet) Objects
A parameter set vector, VPSet, is a vector (a list) of PSets. There may be any number of uniquely labelled vectors of parameter sets in the process object, and such a vector may be referred to, by name, from places where a ParameterSet is needed. There may be any number of vector entries. Each vector entry must be comma-separated from the next. The VPSet object is of the form:
somename = cms.VPSet(
    cms.PSet(
        ... # parameter declarations for vector entry 0
    ),
    cms.PSet(
        ... # parameter declarations for vector entry 1
    ),
    cms.PSet(
        ...
    ),
    ...
    cms.PSet(
        ... # parameter declarations for vector entry n
    )
)
Note that it is also possible to use a VPSet which creates a std::vector<edm::ParameterSet> which is empty; this is done by:
somename = cms.VPSet()
EventRange, LuminosityBlockRange, VEventRange, VLuminosityBlockRange objects
These types are used to describe ranges of luminosity sections and events within runs. Both types follow the same syntax, which is easiest to explain by looking at the text representation: cms.EventRange("1:1-5:6"), which is the range beginning with event (or lumi) number 1 of run 1 and continuing to event number 6 in run 5. You can also represent this as cms.EventRange(1, 1, 5, 6). The text form also takes a pair of wildcard-like options, so cms.EventRange("1:min-5:max") represents event number 1 of run 1 through the last event of run 5. This is represented as cms.EventRange(1, 1, 5, 0) as well (0 is invalid as a run, event, or lumi section, so we use it as a wildcard).
You can also supply vectors of range objects in VEventRange or VLuminosityBlockRange objects that look like this: cms.VEventRange("1:2-3:4", "5:MIN-7:MAX").
The vector forms of these types are used to specify lumis or events to process or skip in the PoolSource configuration for CMSSW releases > 3_0_0.
In release 3_9_0 and later, an EventRange may optionally specify luminosity block numbers, in this representation: cms.EventRange("1:1:1-5:4:6"). If this form is used, events are ordered by run number, luminosity block number, and event number, in that order of significance. In contrast, if luminosity block numbers are not used, as in cms.EventRange("1:1-5:6"), luminosity block numbers are ignored in determining the range.
If a given EventRange specifies a luminosity block number, it must specify it for both ends of the range. However, a given VEventRange may include both types of EventRange.
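To make the text representation concrete, here is a small stand-alone parser written for this page. It is not the CMSSW implementation, and the function name parse_event_range and the tuples it returns are assumptions. Following the description above, 'min' is mapped to 1 and 'max' to the wildcard 0. The comparison is case-insensitive because the VEventRange example writes MIN and MAX in capitals.

```python
def _number(text):
    """Convert one field of a range string to an integer."""
    lowered = text.strip().lower()
    if lowered == "min":
        return 1
    if lowered == "max":
        return 0          # 0 acts as the wildcard for "last"
    return int(lowered)


def parse_event_range(text):
    """Parse 'run:event-run:event' or 'run:lumi:event-run:lumi:event'.

    Returns a pair of tuples (begin, end). Both ends must use the same form.
    """
    begin_text, end_text = text.split("-")
    begin = tuple(_number(field) for field in begin_text.split(":"))
    end = tuple(_number(field) for field in end_text.split(":"))
    if len(begin) != len(end) or len(begin) not in (2, 3):
        raise ValueError("badly formed range: %r" % text)
    return begin, end


print(parse_event_range("1:1-5:6"))
print(parse_event_range("1:min-5:max"))
print(parse_event_range("1:1:1-5:4:6"))
```

Python compares tuples element by element, which matches ordering by run number, then luminosity block number, then event number. A real comparison would need special handling for the wildcard value 0.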
Secondary input source (secsource) objects
This is a special feature, and is intended for use only by "mixing modules". Unless you are writing or configuring a mixing module it is unlikely you will need this. Mixing, used in Monte Carlo generation and simulation, refers to adding a secondary source of events to the simulated hard scatter event. A mixing module is one that reads in both types of generated data and simulates the detector response accordingly.
Here is an example of how this is actually used:
mix = cms.EDProducer("MixingModule",
    ...
    input = cms.SecSource("EmbeddedRootSource",
        ...
        fileNames = cms.untracked.vstring(
            '/store/relval/2008/4/9/RelVal-RelValMinBias-1207754630/0002/00233C31-5806-DD11-9DDC-001617DBD5B2.root',
            ...
Event Setup Source (ESSource) Objects
The purpose of an ESSource is described here: ESSource.
Any number of labelled or unlabelled EventSetup objects (ESSources) may be attached to the process block.
You can use an unlabelled ESSource unless you plan to include more than one instance of an ESSource.
The unlabelled object (label and C++ class name are the same) is of the form:
SomeClass = cms.ESSource("SomeClass",
    ... # parameter declarations here
)
The labelled object is of the form:
somename = cms.ESSource("SomeClass",
    ... # parameter declarations here
)
In both cases, SomeClass is the name of the C++ class of the ESSource the program will use.
Event Setup Producer (ESProducer) Objects
The purpose of an ESProducer is described here: ESProducer.
Any number of uniquely labelled or unlabelled EventSetup producer objects (ESProducers) may be attached to a Process object. This is non-event data. You can use an unlabelled ESProducer unless you plan to include more than one instance of that same ESProducer.
The unlabelled object (label and C++ class name are the same) is of the form:
SomeClass = cms.ESProducer("SomeClass",
    ... # parameter declarations here
)
The labelled object is of the form:
somename = cms.ESProducer("SomeClass",
    ... # parameter declarations here
)
In both cases, SomeClass is the name of the C++ class of the ESProducer the program will use. The specific parameters required are determined by the specific class used.
Module sequences
Module sequences are used to define an ordered group of modules, giving the group a name that may be used in other sequences, paths, or endpaths in the configuration. Any number of uniquely named sequence definitions may be attached to a Process object. Sequencing is a good organization tool if you have lots of modules to load, or if you want to easily switch in and out groups of modules from the execution path.
The sequence definition is of the form:
somename = cms.Sequence(m1 + m2 + s1)
There can be zero or more operands in the expression that is passed as the
argument of the constructor. Usually the operands have the type of a
module (EDAnalyzer, EDFilter, EDProducer, OutputModule) or Sequence,
but there are other possibilities.
The operator can be "+" or "*" and these two operators have exactly the
same meaning and behavior in this context. The operators imply the operand
on the left is run before the operand on the right. This expression is used to
build an ordered sequence of modules and the modules are run in that order.
EDFilters can be used to stop execution along the sequence.
There are more details describing this path expression syntax here: path syntax.
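The following toy model, written in plain Python, shows how an expression built with "+" and "*" can produce an ordered list of modules and how a filter can stop execution. The classes ToyModule and ToySequence and the function run are invented for this illustration and are much simpler than the real cms classes.

```python
class _Ordered(object):
    """Mixin giving '+' and '*' the same meaning: left operand runs first."""

    def __add__(self, other):
        return ToySequence(self, other)

    __mul__ = __add__


class ToyModule(_Ordered):
    def __init__(self, label, passes=True):
        self.label = label
        self.passes = passes      # a filter returning False stops the sequence

    def modules(self):
        return [self]


class ToySequence(_Ordered):
    def __init__(self, *operands):
        self.operands = operands

    def modules(self):
        # Flatten nested sequences, preserving left-to-right order
        return [m for operand in self.operands for m in operand.modules()]


def run(sequence):
    """Run modules in order; stop after the first module that does not pass."""
    executed = []
    for module in sequence.modules():
        executed.append(module.label)
        if not module.passes:
            break
    return executed


a, b, c = ToyModule("a"), ToyModule("b"), ToyModule("c")
veto = ToyModule("veto", passes=False)

print(run(ToySequence(a + b) * c))
print(run(a + veto + b))
```

In this toy model, Python's operator precedence groups a + b * c as a + (b * c), but the flattened order is still a, b, c because both operators keep the left operand first.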
Here is an example taken from the WorkBookSimDigi topic:
trDigi = cms.Sequence(pixdigi + stripdigi)
calDigi = cms.Sequence(ecaldigi + hcaldigi)
muonDigi = cms.Sequence(muoncscdigi + muondtdigi)
doDigi = cms.Sequence(trDigi + calDigi + muonDigi)
p1 = cms.Path(VtxSmeared * SimG4Object * mix * doDigi)
In releases that support Tasks, the following constructor allows one to associate Tasks to a Sequence.
somename = cms.Sequence(m1 + m2 + s1, task1, task2)
where there are zero or more additional arguments of type Task that either follow or replace
the expression containing the ordered sequence of modules to run.
Alternately the Sequence class has an associate function,
somename.associate(task1, task2)
where the associate function can take 0 or more arguments of type Task.
With multithreading, the performance of cmsRun is better
when EDProducers are put on Tasks and the Tasks associated to
Sequences than when EDProducers are placed in the group of ordered
modules in a Sequence. Except in rare cases, this is recommended. This is
also recommended for EDFilters that produce event data and whose
filter results are ignored.
Task Objects
If you want unscheduled execution, you have to put your producers in a Task and
associate the Task with a Sequence, Path, EndPath, or Schedule. Task is a feature
which was added in release 9_1_0. In releases before this, the
cms.Task python class does not exist and cannot be used. It replaces the
old way of configuring a module to run unscheduled. The old way will
no longer work in release 9_1_0 and later releases.
A Task is defined with the following syntax:
taskName = cms.Task(A, B)
The constructor arguments are comma separated and must be of type EDProducer,
EDFilter, Task, ESProducer, ESSource, or Service. The constructor
can be called with 0 or more arguments. One can add to a Task using the following
syntax.
taskName.add(C, D)
The add function can take zero or more arguments with the same types as
allowed by the Task constructor.
After being created Tasks can be attached as attributes to the process,
either directly or when the python module containing them is added with
the process load function. Zero or more Task objects can be added as
an attribute of the process. The only restriction on the process attribute
names of Tasks is that they be unique within a process.
A Task can be associated with a
Sequence,
Path,
EndPath, or
Schedule.
See the sections about those types for the details about the proper syntax
to associate a Task to those types. One can associate Task objects by
including them as arguments of the constructor or by calling an explicit
associate function.
Effect of Tasks on cmsRun
First, one should be aware that placing EDProducers and EDFilters on
a Task affects cmsRun in a different way than placing ESProducers,
ESSources, and Services on a Task.
Behavior when an EDProducer or EDFilter is on a Task:
To be very brief and ignore some important details, one can explain
the behavior with the following sentence:
A module on a Task is available to be run in unscheduled mode.
The details follow.
For the above to be true, the Task must be either:
- Associated with the Schedule
- Associated with a Path or EndPath on the Schedule.
- Associated with a Path or EndPath and there is no Schedule.
where the association with the Schedule, Path or EndPath could be direct
or indirect through one or more Sequences or Tasks.
If an EDProducer or EDFilter is contained on the ordered sequence of modules
of at least one Path or EndPath and that Path or EndPath is on the Schedule
or there is no Schedule, then the module is not run unscheduled and is run as
part of the Path or EndPath. The fact that the module is on a Task is just ignored
in that case.
The order modules are added to a Task does not have any effect on
the behavior of cmsRun, nor does it matter which objects the Task is associated
to or the order of association. It only matters that there exists at least one Task
associated to at least one such object.
Unscheduled mode is described on this TWIKI page:
SWGuideUnscheduledExecution.
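The rules above for EDProducers and EDFilters can be summarized as a small decision function. This is a plain-Python sketch of the logic only, not code from cmsRun. The function name run_mode and its two set arguments are hypothetical, and it assumes that the sets have already been worked out from the configuration (which Paths and EndPaths will run, and which Tasks are associated with them or with the Schedule, directly or indirectly).

```python
def run_mode(label, path_modules, active_task_modules):
    """Classify one EDProducer or EDFilter label (illustrative only).

    path_modules: labels found in the ordered module list of at least one
        Path or EndPath that will actually run.
    active_task_modules: labels on Tasks associated, directly or indirectly,
        with the Schedule or with a Path or EndPath that will run.
    """
    if label in path_modules:
        # Being on a Task as well is simply ignored in this case
        return "scheduled"
    if label in active_task_modules:
        # The module is available to be run in unscheduled mode
        return "unscheduled"
    # Neither on a running Path/EndPath nor on an associated Task
    return "not defined"


path_modules = {"filterA", "producerB"}
active_task_modules = {"producerB", "producerC"}

for label in ("filterA", "producerB", "producerC", "producerD"):
    print(label, run_mode(label, path_modules, active_task_modules))
```

Note that producerB is on both a Path and a Task, so it is treated as scheduled. The order in which labels were added to either set plays no role, which mirrors the statement that the order of adding modules to a Task has no effect.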
Behavior when an ESProducer, ESSource, or Service is on a Task:
Note: The initial focus of migrating to use Task has been on EDProducers and
EDFilters. At this time (August 2017), I am not aware of any configurations in CMSSW
using Tasks for ESProducers, ESSources, or Services other than Core unit
tests. This may or may not change in the future ...
There are 3 cases.
Case 1. The ESProducer, ESSource or Service is enabled (constructed in the
C++ part of cmsRun and available for use) if it is on a Task that is either:
- Associated with the Schedule
- Associated with a Path or EndPath on the Schedule.
- Associated with a Path or EndPath and there is no Schedule.
where the association with the Schedule, Path or EndPath could be direct
or indirect through one or more Sequences or Tasks.
Case 2. The ESProducer, ESSource or Service is enabled if it is not
in a Task which is an attribute of the process either directly or indirectly
through one or more sub-Task's.
Case 3. If an ESProducer, ESSource, or Service does not satisfy
either case 1 or case 2, then it is disabled.
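The three cases can be written down as a short truth function. This is a plain-Python sketch of the rule as described here, not cmsRun code. The function name and the two boolean arguments are hypothetical, and the caller is assumed to have determined them from the configuration.

```python
def es_or_service_enabled(on_active_task, on_process_task):
    """Decide whether an ESProducer, ESSource or Service is enabled.

    on_active_task: True if the object is on a Task associated, directly or
        indirectly, with the Schedule or with a Path or EndPath that will run.
    on_process_task: True if the object is contained, directly or through
        sub-Tasks, in any Task that is an attribute of the process.
    """
    if on_active_task:
        return True    # Case 1
    if not on_process_task:
        return True    # Case 2: not managed by any Task at all
    return False       # Case 3: on a Task, but that Task is not associated


print(es_or_service_enabled(True, True))    # Case 1
print(es_or_service_enabled(False, False))  # Case 2
print(es_or_service_enabled(False, True))   # Case 3
```

The sketch makes the migration property visible: an object that has never been placed on a Task always falls under Case 2 and stays enabled.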
One thing to note about this is that when the initial implementation
of Task is merged there will be no Tasks that contain an ESProducer,
ESSource or Service so none of them will be affected (everything will
fall under Case 2). This will make it easier to incrementally migrate
to using Tasks to manage ESProducer's, ESSource's, and Service's
one at a time.
Similarities Between Tasks and Sequences
In many ways, Tasks have been implemented to behave like
Sequences. For example, many of the functions in the Task and Sequence
python classes are named and behave similarly (e.g. replace, dumpPython).
If one uses a visitor on a Sequence, Path, or EndPath, it will
also visit the associated Tasks and one can visit a Task in a
way similar to visiting a Sequence. When a module attribute
of the process is replaced, the side effects on the Tasks and
Sequences are similar. There are other similarities. The design
intent was that except where necessary, the behavior of Tasks
and Sequences should be the same.
There is another similarity between Sequences and Tasks. In the early
stages of a cmsRun job, the python configuration is imported and many
manipulations can occur in python. But when that is done, there is a point
where the python data structures are converted into C++ classes and
passed into the C++ part of cmsRun. During this conversion, both
Tasks and Sequences are eliminated entirely. There is no C++ Task
type. EDProducers and EDFilters that are not run in the ordered sequence
of a Path or EndPath and not run in unscheduled mode are simply not
defined at all in the C++ ParameterSet. Similarly, Sequences are
expanded away and do not exist in the C++ ParameterSet.
Task Design Comments
There is a python function with the name convertToUnscheduled.
When run on a configuration, this function first resolves any
SequencePlaceholder objects. Then it removes EDProducers
from Paths and EndPaths and places them on a Task that is
associated with the Schedule. It does the same thing for
EDFilters whose filter result is ignored.
At the time Task was designed, production processes relied
heavily on the convertToUnscheduled function to run processes
in unscheduled mode. One of the major motivations for the
introduction of the Task class was to make it easier to gradually
migrate away from convertToUnscheduled by using Tasks.
One goal is to eventually eliminate the need for that function entirely.
(Before the implementation of Task, unscheduled modules were
configured in a different way and the conversion function was also
different. It was difficult to combine configuration fragments
which used unscheduled mode with configuration fragments
that did not and this was making it difficult to incrementally
migrate to configurations explicitly using unscheduled mode.)
In the pull request that included the initial implementation of
Task, all Tasks outside the Core code were associated with
the top level Schedule. This was simply an expedient to move
the development forward faster and also because the code
was implemented by someone who is not an expert in the PAT
code (which was almost the only area whose configurations
were explicitly written to run in unscheduled mode prior to that
pull request). This was not intended to be a pattern to follow.
The intent of the design was that experts will identify which
EDProducers are needed with which sequences and associate
Tasks with the specific Sequences where they are needed.
This will improve the performance and reduce memory usage
of cmsRun by allowing the C++ part of it to only construct
EDProducers (and other types) that are needed instead of
constructing everything defined in the configuration.
EDAlias
EDAlias allows mapping of an
EDProducer
from its given label (the label the Python process object uses to refer to it) to one or more different module labels which can be used in getByLabel or consumes function calls. It also allows multiple
EDProducers
to share the same label. This is useful for making multiple jobs with different configurations appear to make the same Event products.
The label assigned to the EDAlias is the new module label that can be used to lookup the data item. The EDAlias parameters accepted by an EDAlias are of the form
<old module label> = cms.VPSet( cms.PSet( type = cms.string(<friendly class name>),
[fromProductInstance = cms.string(<old product instance name>)],
[toProductInstance = cms.string(<new product instance name>)] ))
Where
-
<old module label>
is the EDProducer's label from which the data originally derived.
-
<friendly class name>
specifies exactly what class type is to be obtained from the original EDProducer. The name is not the C++ class name and is instead the name used when specifying what data should be stored in the OutputModule. This name also corresponds to the first part of the TBranch name. One can also find the friendly class names of collections available in a job by adding the EventContentAnalyzer
to the job.
- Starting from CMSSW_11_2_0_pre6 the
type
field can be given a wildcard '*'
that matches all data product types that also match fromProductInstance
. An EDAlias is allowed to have many <old module label>
VPSets with wildcards in the type
field as long as the matched data products have different (type, instance label) pairs.
-
<old product instance name>
specifies the product instance name used by the EDProducer when storing the data product. By default EDProducer's use an empty string. If fromProductInstance
is not specified, the value of <old product instance name>
defaults to '*'
and therefore matches all data products of type <friendly class name>
produced by the Producer with module label <old module label>
.
-
<new product instance name>
specifies a product instance name different from the original one used by the EDProducer. If toProductInstance
is not specified, <new product instance name>
defaults to the value '*'
which means the new name should match the old name.
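The defaults described above can be illustrated with a small matching function. This is a plain-Python sketch that models one PSet of an EDAlias as an ordinary dict. It is not the framework's implementation, and the function name aliased_instance is hypothetical. It only encodes the rules stated here: a wildcard type, a fromProductInstance that defaults to '*', and a toProductInstance that defaults to '*' (meaning "keep the old name").

```python
def aliased_instance(entry, product_type, product_instance):
    """Return the instance name seen under the alias label, or None.

    entry: dict with key 'type' and optional keys 'fromProductInstance'
        and 'toProductInstance', modelling one PSet of an EDAlias.
    product_type, product_instance: friendly class name and instance name
        of a product made by the original EDProducer.
    """
    if entry["type"] not in ("*", product_type):
        return None                       # wrong type
    wanted = entry.get("fromProductInstance", "*")
    if wanted not in ("*", product_instance):
        return None                       # wrong instance name
    target = entry.get("toProductInstance", "*")
    # '*' means the new instance name is the same as the old one
    return product_instance if target == "*" else target


plain = {"type": "Bars"}
renamed = {"type": "Bars", "fromProductInstance": "", "toProductInstance": "refined"}

print(repr(aliased_instance(plain, "Bars", "")))
print(repr(aliased_instance(renamed, "Bars", "")))
print(repr(aliased_instance(plain, "Foos", "")))
```

With the plain entry the product keeps its empty instance name, with the renamed entry it appears under the instance name 'refined', and a product of a different type is not aliased at all.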
Here is an example where we want to be able to get the data product
Bars
from module label
otherbar
but using the label
bar
bar = cms.EDAlias( otherbar = cms.VPSet( cms.PSet(type=cms.string('Bars') ) ) )
Say the module
otherbar
makes two data products of different types
Bars
and
BarExtras
. Then if we want to be able to get both by the label
bar
bar = cms.EDAlias( otherbar = cms.VPSet( cms.PSet(type=cms.string('Bars') ),
cms.PSet(type=cms.string('BarExtras') ) ) )
If you want to make data coming from two different modules,
foo
and
fee
, appear to come from one module,
bar
, where
foo
makes data of type
Foos
and
fee
makes data of type
Fees
bar = cms.EDAlias( foo = cms.VPSet( cms.PSet(type=cms.string('Foos') ) ),
fee = cms.VPSet( cms.PSet(type=cms.string('Fees') ) ) )
If you want to make data coming from two different modules,
foo
and
fee
, appear to come from one module,
bar
, where both modules create the same type,
Bars
. If both modules use the default product instance name then you will need to assign at least one of them to a different instance name.
bar = cms.EDAlias( foo = cms.VPSet( cms.PSet(type=cms.string('Bars') ) ),
fee = cms.VPSet( cms.PSet(type=cms.string('Bars'),
fromProductInstance = cms.string(''),
toProductInstance = cms.string('refined') ) ) )
If you want two data products, of types
Fees
and
Foos
, coming from one module,
bar
, to appear to come from two different modules, then you can use two EDAlias objects, both of which refer to the same EDProducer.
foo = cms.EDAlias(bar = cms.VPSet( cms.PSet(type=cms.string('Foos'))))
fee = cms.EDAlias(bar = cms.VPSet( cms.PSet(type=cms.string('Fees'))))
It is also possible to alias all data products of an EDProducer (since CMSSW_11_2_0_pre6) with the syntax below. This feature is likely useful only in conjunction with
SwitchProducer.
foo = cms.EDAlias(bar = cms.EDAlias.allProducts())
Statements
Processing and trigger path (path) statements
Processing and trigger paths are declared via
Path
objects. Any number of uniquely labelled
Path
definitions may be attached to a
Process
object.
The
Path
definition is of the form:
somename = cms.Path(m1 + m2 + s1)
There can be zero or more operands in the expression that is passed as the
argument of the constructor. Usually the operands have the type of a
module (EDAnalyzer, EDFilter, EDProducer) or Sequence,
but there are other possibilities. Note that it is illegal for an OutputModule
to be placed on a Path.
The operator can be "+" or "*" and these two operators have exactly the
same meaning and behavior in this context. The operators imply the operand
on the left is run before the operand on the right. This expression is used to
build an ordered sequence of modules and the modules are run in that order.
EDFilters can be used to stop execution along the sequence.
There are more details describing this path expression syntax here:
path syntax.
The following are equivalent:
process.mypath = cms.Path (process.m1*process.m2*process.s1*process.m3)
process.mypath = cms.Path (process.m1+process.m2+process.s1+process.m3)
process.mypath = cms.Path (
process.m1*
process.m2*
process.s1*
process.m3)
process.mypath = cms.Path(process.m1*process.m2)
process.mypath *= process.s1*process.m3
In releases that support
Tasks, the following constructor allows one
to associate Tasks to a Path.
somename = cms.Path(m1 + m2, task1, task2)
where there are zero or more additional arguments of type Task that either follow or replace
the expression containing the ordered sequence of modules to run.
Alternately the Path class has an associate function,
somename.associate(task1, task2)
where the associate function can take 0 or more arguments of type Task.
With multithreading, the performance of cmsRun is better
when EDProducers are put on Tasks and the Tasks associated to
Paths than when EDProducers are placed in the group of ordered
modules in a Path. Except in rare cases, this is recommended. This is
also recommended for EDFilters that produce event data and whose
filter results are ignored.
Paths, EndPaths, and Sequences are very similar. The functions to
construct them and associate Tasks are identical. What are the
differences? Sequences are used only as building blocks and are used
to build Paths and EndPaths. Paths have meaning in the output
of cmsRun. There is a data object which is automatically produced
called TriggerResults that stores the results of the Paths and those
results can be used by OutputModules to select events. There are
more details about this here:
Processing and trigger paths
EndPath statements
An
EndPath
object is used to define an ordered group of modules which are to run after all
Paths
have been run. Any number of uniquely labelled
EndPath
definitions may be attached to a
Process
object.
EndPaths
are used mostly for
OutputModules and EDAnalyzers (e.g. for
PoolOutputModules
).
The
EndPath
definition is of the form:
somename = cms.EndPath(m1 + m2 + s1)
There can be zero or more operands in the expression that is passed as the
argument of the constructor. Usually the operands have the type of a
module (EDAnalyzer, OutputModule) or Sequence,
but there are other possibilities. In rare cases EDFilters or EDProducers are
allowed to be on EndPaths, but this is greatly discouraged.
The operator can be "+" or "*" and these two operators have exactly the
same meaning and behavior in this context. The operators imply the operand
on the left is run before the operand on the right. This expression is used to
build an ordered sequence of modules and the modules are run in that order.
There are more details describing this path expression syntax here:
path syntax.
In releases that support
Tasks, the following constructor allows one
to associate Tasks to an EndPath.
somename = cms.EndPath(m1 + m2, task1, task2)
where there are zero or more additional arguments of type Task that either follow or replace
the expression containing the ordered sequence of modules to run.
Alternately the EndPath class has an associate function,
somename.associate(task1, task2)
where the associate function can take 0 or more arguments of type Task.
Schedule statements
To define the Paths and EndPaths to be run, you can define a Schedule.
If you don't define a Schedule, all Paths and EndPaths that are attributes
of the process will be run.
To specify a Schedule, pass any number of Path and EndPath objects to
the Schedule constructor separated by commas.
process.schedule = cms.Schedule(process.generation_step,process.out_step)
In releases that support
Tasks, the following constructor allows one
to associate Tasks to the Schedule.
process.schedule = cms.Schedule(process.path1, process.path2, tasks=[process.task1,process.task2])
There is an additional argument which must be a single python keyword argument
with the keyword name 'tasks' as shown above. The value of the keyword argument
should be a list of objects of type Task. Instead of a list it will also accept
any iterable container of objects of type Task or just a single object of
type Task. Alternately, the Schedule class has an associate function
process.schedule.associate(process.task1, process.task2)
where the associate function can take 0 or more arguments of type Task.
The import
statement
The standard Python
import statement is used to inject the objects from another file into the current file, therefore it can be used to include shared-use configuration fragments. The
import statements may be "nested", meaning that a configuration document may import a fragment which itself imports another fragment. There is no limit (other than that caused by memory exhaustion) on the number of levels of inclusion allowed. Here is an example:
import FWCore.Modules.printContent_cfi
In the above example,
FWCore
is the subsystem,
Modules
is the package, and
printContent_cfi.py
is the filename. Note that in the import statement the suffix
.py
of the filename is left out and directories are separated with a "." instead of "/". This is standard Python. In CMSSW, the environment is configured so that Python will look for imported python files in two places. For example if the release is
CMSSW_8_1_0
, it will look for
CMSSW_8_1_0/src/FWCore/Modules/python/printContent_cfi.py
Note that the directory named
python
is not included in the import statement. If and only if it fails to find the file there and if for example the current architecture is
slc6_amd64_gcc530
, then it will also look for
CMSSW_8_1_0/cfipython/slc6_amd64_gcc530/FWCore/Modules/printContent_cfi.py
Automatically generated
cfi
files (discussed below) will be found in subdirectories of the
cfipython
directory.
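The mapping from a dotted import name to the two files that are tried can be sketched as follows. This is plain Python for illustration only: the function candidate_paths is hypothetical, it does not perform any import, and it assumes a name with exactly three components (subsystem, package, file).

```python
import posixpath


def candidate_paths(module_name, release_dir, architecture):
    """Return the two files that would be tried, in search order."""
    # Assumes exactly three components: subsystem, package and file name
    subsystem, package, filename = module_name.split(".")
    source_file = posixpath.join(
        release_dir, "src", subsystem, package, "python", filename + ".py")
    generated_file = posixpath.join(
        release_dir, "cfipython", architecture, subsystem, package,
        filename + ".py")
    return [source_file, generated_file]


for path in candidate_paths("FWCore.Modules.printContent_cfi",
                            "CMSSW_8_1_0", "slc6_amd64_gcc530"):
    print(path)
```

The first path contains the python directory that is left out of the import statement. The second is only relevant when the first does not exist.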
Use of
import does not guarantee that the objects defined in the python configuration file will actually be used to configure the job. In order to be used to configure a job, the objects must be attached to the
Process
object which is referenced by the
process
variable. The attachment can either be done for each object individually by assigning them to attributes of
process
, e.g.,
import Subsystem.Package.somemodule
process.somename = Subsystem.Package.somemodule.somename
or all the variables of a python module can be attached to a process by using the
extend()
member function
import Subsystem.Package.somemodule
process.extend(Subsystem.Package.somemodule)
When using 'extend' all the variable names used in the imported module are assigned as the 'labels' for the process' attributes. Because the label assigned to a module comes from the process attribute's name assigned to the object and not from the original variable to which the object was assigned, the individual object assignment method allows one to 'relabel' a module to be different from the imported file
import Subsystem.Package.somemodule
process.someothername = Subsystem.Package.somemodule.somename
There are several alternative syntaxes for the import statement depending on how you wish to access the variables defined:
#must use full name to get to a variable
import Subsystem.package.filename
v = Subsystem.package.filename.variable
#must use short name to get to a variable
import Subsystem.package.filename as shortname
v = shortname.variable
#variable is added directly to this file
from Subsystem.package.filename import variable
p = variable
#all variables are added directly to this file
#discouraged by some because it might bring in too much
#but this is common in CMS configurations
from Subsystem.package.filename import *
p = variable
# Important convenience method to insert every cms object in the module into a process
# In the following example, the module is declared in the file
# Configuration/StandardSequences/python/Analysis_cff.py
# Same as import followed by process.extend
process.load("Configuration.StandardSequences.Analysis_cff")
An important detail to add is that "import *" and the process load and extend functions
do not import or attach variables that start with underscore "_". If you do not want a
variable to be imported or attached to the process, start its name with
an underscore; it is then skipped unless the name is explicitly used.
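The underscore rule for "import *" is standard Python and can be checked without CMSSW. The sketch below builds a throwaway module in memory. The module name fake_fragment_cff and its two variables are hypothetical. It shows only the "import *" half of the statement above; the matching behavior of the process load and extend functions is part of CMSSW and is not exercised here.

```python
import sys
import types

# Throwaway module standing in for a cff fragment (hypothetical names)
fragment = types.ModuleType("fake_fragment_cff")
fragment.visibleSetting = 1
fragment._helperSetting = 2
sys.modules["fake_fragment_cff"] = fragment

# "import *" skips names that start with an underscore
namespace = {}
exec("from fake_fragment_cff import *", namespace)
print("visibleSetting" in namespace)   # True
print("_helperSetting" in namespace)   # False

# Naming the variable explicitly still works
explicit = {}
exec("from fake_fragment_cff import _helperSetting", explicit)
print(explicit["_helperSetting"])      # 2
```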
Blocking imports
Occasionally, a cfi.py or cff.py file may wish to declare itself "unimportable" except by certain files. As of CMSSW 3_1_2, this is possible by including the following statements at the start of the imported file:
import FWCore.ParameterSet.Config as cms
cms.checkImportPermission(allowedPatterns = ['Module','Module/Submodule'])
where the list can include any number of modules and/or submodules which are allowed to import the file. A user's top-level _cfg.py is allowed, by default, to import such a file. This behavior can be changed by passing minLevel = 1 as an additional argument to checkImportPermission().
Including Standard Module definitions
Every configuration has a top level file called the
cfg
file. By convention its filename ends with the extension
_cfg.py
. This file may define the entire configuration or it may include other configuration files by importing them.
Configuration File Initializer files (
cfi
files) are used to initialize modules with all of their default parameter settings. Each module is supposed to have one, with the module label the first part of its filename and the extension
_cfi.py
as the rest of the filename. This file is intended for use in end-users'
cfg
files; the user can then reset only the parameters that require non-default values.
cfi
files may also be imported into
cff
files.
Configuration File Fragment files (
cff
files) are like
cfi
files, but are not associated with modules one-for-one. They are used to contain (or import) pieces of a configuration which may be either larger than
cfi
files (e.g. import multiple
cfi
files and define Sequences and other objects) or smaller than
cfi
files (e.g. contain just one or a few parameter specifications or a PSet). These files may also be included in
cfg
files. The file extension is
_cff.py
.
For Component developers
The developer of a module should make sure that a
cfi
file is available which can be imported. There are two ways to do this. The recommended method is to add a function to the C++ code of the module called fillDescriptions. After this has been created, when
scram build
is run a
cfi
file will be automatically generated. There are several advantages to creating the
cfi
file this way. First, the
cfi
is automatically kept consistent with the C++ code. There is no need to remember to change the
cfi
file and less chance for an error to create an inconsistency. The fillDescriptions function is used to validate configurations. This validation can report mistakes, for example if a parameter name was spelled incorrectly.
There is also an executable that will provide information about the possible parameters of modules which contain a fillDescriptions function, for example
edmPluginHelp -p EventContentAnalyzer
or for brief output
edmPluginHelp -b -p EventContentAnalyzer
. This is described in detail here:
SWGuideConfigurationValidationAndHelp.
Alternatively, one can manually create a
cfi
file and put it in the
python
subdirectory of a package. This is the same directory where
cff
files are located.
cff
files must be manually created. (Note: You do not need to manually create a
cfi
file if you implemented the fillDescriptions function and one is automatically generated. If both are created, the manually created one will be used when an import occurs.)
See above for additional information about
importing files and file locations.
cfi
files should be named
<moduleLabel>_cfi.py
Developers of ESSources and other components which have no module labels should
use the full class name as the first part of the
cfi
file name.
For example, the
FWCore/Modules
package has a file named:
FWCore/Modules/python/printContent_cfi.py
which contains:
import FWCore.ParameterSet.Config as cms
#print what data items are available in the Event
printContent = cms.EDAnalyzer("EventContentAnalyzer",
#should we print data? (sets to 'true' if verboseForModuleLabels has entries)
verbose = cms.untracked.bool(False),
#how much to indent when printing verbosely
verboseIndentation = cms.untracked.string(' '),
#string used at the beginning of all output of this module
indentation = cms.untracked.string('++'),
#data from which modules to print (all if empty)
verboseForModuleLabels = cms.untracked.vstring(),
# which data from which module should we get without printing
getDataForModuleLabels = cms.untracked.vstring(),
#should we get data? (sets to 'true' if getDataForModuleLabels has entries)
getData = cms.untracked.bool(False)
)
For Component Users
Below is an example of a top level configuration that does not use a
cfi
.
This is an actual working example that you might find useful. It will print
out information about the content of a file named "test.root" (you can change
the filename if you would like). This is a nice way to write a very small
test configuration. It is particularly nice if you planned to manually edit the
parameters of the printContent module. But this would be a horrible
approach that would be extremely difficult to develop or maintain
for complex configurations like the ones used in event generation,
simulation, or reconstruction.
import FWCore.ParameterSet.Config as cms
process = cms.Process("TEST")
process.source = cms.Source("PoolSource",
fileNames = cms.untracked.vstring(
'file:test.root'
)
)
process.maxEvents = cms.untracked.PSet(
input = cms.untracked.int32(1)
)
process.printContent = cms.EDAnalyzer('EventContentAnalyzer',
indentation = cms.untracked.string('++'),
verbose = cms.untracked.bool(False),
verboseIndentation = cms.untracked.string(' '),
verboseForModuleLabels = cms.untracked.vstring(),
getData = cms.untracked.bool(False),
getDataForModuleLabels = cms.untracked.vstring(),
listContent = cms.untracked.bool(True)
)
process.path = cms.Path(process.printContent)
Below is an equivalent
cfg
file that imports a
cfi
file. Notice that
the parameters do not need to be repeated in the top level configuration.
The default values from the
cfi
file are used.
import FWCore.ParameterSet.Config as cms
from FWCore.Modules.printContent_cfi import printContent
process = cms.Process("TEST")
process.source = cms.Source("PoolSource",
fileNames = cms.untracked.vstring(
'file:test.root'
)
)
process.maxEvents = cms.untracked.PSet(
input = cms.untracked.int32(1)
)
process.printContent = printContent
process.path = cms.Path(process.printContent)
Modifying Parameters
There are many ways to modify parameters, and which one is best depends on the situation.
In this section, we discuss some of the situations and describe each.
First consider the very simple test configuration presented at
the end of the preceding section. For example, if you wanted to change
the text string used for indentation from '++' to '**' you could do this:
...
from FWCore.Modules.printContent_cfi import printContent
...
process.printContent = printContent
process.printContent.indentation = '**'
...
You do not need to specify the type of the parameter again because
that was already done in the
cfi
file.
Replacement of more than one parameter is allowed. For example,
from Configuration.Generator.SingleElectronPt10_pythia8_cfi import generator
generator.PGunParameters.MinPt = 5.0
generator.PGunParameters.MaxEta = 50.0
generator.PGunParameters.ParticleID = ( 211, 11 )
Multiple values in an object with multiple parameters can be modified via the use of a python `dict`:
from Configuration.Generator.SingleElectronPt10_pythia8_cfi import generator
generator.PGunParameters = dict(MinPt = 5.0,
MaxEta = 50.0,
ParticleID = ( 211, 11 ) )
If there are additional parameters in the object which are not in the `dict` those parameters are not modified.
Warning: If you wish to modify a vector and set it equal to a single-element array, then you must insert a comma
after that element:
generator.PGunParameters.ParticleID = ( 211, )
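The reason for the trailing comma is ordinary Python syntax: parentheses around a single value are just grouping, and only the comma makes a tuple. A quick check in plain Python, with no CMSSW needed:

```python
not_a_tuple = (211)     # parentheses only group: this is the integer 211
one_element = (211,)    # the trailing comma makes a one-element tuple

print(type(not_a_tuple).__name__)   # int
print(type(one_element).__name__)   # tuple
print(len(one_element))             # 1
```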
You may want to replace an entire
ParameterSet
:
generator.PGunParameters = cms.PSet(
MaxPt = cms.double(100.01),
MinPt = cms.double(9.99),
ParticleID = cms.vint32(11),
AddAntiParticle = cms.bool(True),
MaxEta = cms.double(2.5),
MaxPhi = cms.double(3.14159265359),
MinEta = cms.double(-2.5),
MinPhi = cms.double(-3.14159265359) ## in radians
)
All vector style parameters behave like normal Python lists and therefore allow
append
,
extend
and other list manipulations.
The syntax is as follows:
a.composers.append("Beethoven")
a.painters.extend( ("Picasso", "da Vinci"))
Cloning is strongly recommended when you are developing a
cff
file that
might be used as part of a large complicated configuration. While creating
the clone, one or more parameters can be modified.
The standard syntax for
cloning is
from aPackage import oldName
newName = oldName.clone(changedParameter = 42)
or
from aPackage import oldName as _oldName
newName = _oldName.clone(changedParameter = 42)
The second form is better if the symbol oldName is not needed and
this occurs in a fragment that might be imported with the process
load function or a
from aModule import *
statement. Symbols starting
with an underscore are not imported in these cases.
Starting from
CMSSW_12_0_0_pre2
the parameters to be changed can be passed in via a ParameterSet
from aPackage import oldName as _oldName
pset = cms.PSet(changedParameter = cms.int32(42))
newName = _oldName.clone(pset)
Giving two or more ParameterSet object arguments to
clone()
is allowed as long as the ParameterSets have distinct parameters, similar to EDModule constructors.
Cloning is important because the module object brought in by
an import statement is always the same object. If the same file
is imported in different
cff
files, the module object referenced
is the same object. So if one
cff
file changes a parameter value,
then all the modules referenced in all the
cff
files are affected.
This is very bad. But when a clone is made, the module objects
and parameters inside them are distinct objects. When a parameter
value is changed it only affects one instance of the module.
In large complicated configurations, this is very important.
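The difference between sharing an imported object and cloning it can be demonstrated in plain Python. In this sketch the class FakeModule is a hypothetical stand-in for a configured module. It is not an FWCore class, and its clone method only imitates the behavior described above by making a deep copy and then applying the changes.

```python
import copy


class FakeModule:
    """Hypothetical stand-in for a configured module (not an FWCore class)."""

    def __init__(self, **parameters):
        self.parameters = dict(parameters)

    def clone(self, **changes):
        # Copy everything first, then change only the copy
        new_module = copy.deepcopy(self)
        new_module.parameters.update(changes)
        return new_module


# Plays the role of the single object defined in an imported cfi file
shared = FakeModule(threshold=10)

# Two fragments importing the same name get the very same object
seen_by_fragment_a = shared
seen_by_fragment_b = shared
seen_by_fragment_a.parameters["threshold"] = 99
print(seen_by_fragment_b.parameters["threshold"])   # 99: fragment b is affected

# A clone is a distinct object, so changing it leaves the original alone
private_copy = shared.clone(threshold=5)
print(shared.parameters["threshold"])               # 99
print(private_copy.parameters["threshold"])         # 5
```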
There are different operating scenarios. There are many parameters
that must have different values in different operating scenarios.
These kinds of parameter modifications are handled by "Eras"
which are described here:
SWGuideCmsDriverEras. "Eras"
should be used where appropriate instead of other parameter
modification schemes. Before the "Eras" system was created,
customization functions were used. These were Python functions
that made large scale modifications to configurations to
accommodate different situations. Some still exist and are
used, but for official production configurations there is an
effort to replace these with "Eras" or some other mechanism.
The customization functions tended to interfere with each other
when there was more than one, and also made the configuration
hard to understand and maintain.
Modifying Sequences and Paths
Sequences and paths have a "replace()" command, which lets you replace all occurrences of a module or a sequence within a sequence or path with something else.
mySequence.replace(oldModule, newModule)
myPath.replace(oldSequence, newSequence)
myPath.replace(oldModule, newSequence)
If you wish to replace a module or a sequence in a process, and
have the change propagated through to all sequences,
paths, and endpaths which use that object, you should
use
process.globalReplace("label", newObject) # note the quotes!!
Warning: When replacing sequences or tasks you should use this with care if there are multiple calls to this or other similar functions that modify configurations. Some of these modifications can cause sequences and tasks to be expanded such that the sequence or task no longer exists (although its contents are preserved in the object that contained the sequence or task). Trying to replace one of these expanded sequences or tasks will not work.
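The difference between a local replace and a process-wide replace can be pictured with a toy model in plain Python. This is a sketch under strong assumptions: sequences and paths are represented as plain lists of labels, replace_in is a hypothetical helper, and nothing here models the expansion problem described in the warning.

```python
def replace_in(sequence, old, new):
    """Return a copy of a list of labels with every `old` swapped for `new`."""
    return [new if item == old else item for item in sequence]


# Hypothetical containers keyed by label; the real objects are sequences and paths.
containers = {
    "s1": ["a", "b"],
    "p1": ["a", "b", "a"],
    "p2": ["c"],
}

# Local replace: only the container it is called on changes.
local = dict(containers)
local["s1"] = replace_in(local["s1"], "a", "x")

# Global replace: every container that mentions the label changes.
everywhere = {name: replace_in(seq, "a", "x") for name, seq in containers.items()}

print(local)
print(everywhere)
```

In the local case p1 still refers to "a"; in the global case no container does.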
Eras
This was mentioned in passing above, but it is sufficiently important that I wanted it to appear in the table of contents. The documentation for eras is here:
SWGuideCmsDriverEras. This describes the primary method of changing parameter values for different time periods of the experiment.
Passing Command Line Arguments Through cmsRun
Starting in CMSSW_3_0_0, you can pass
command-line arguments through to the Python system.
For example, if you start your job via this command
> cmsRun -m grid my_cfg.py myOpt myValue=True
Then, from anywhere in the Python configuration
system, you could do:
import sys
print(sys.argv)
and see
['cmsRun', '-m', 'grid', 'my_cfg.py', 'myOpt', 'myValue=True']
Arguments beginning with '-' or '--' are handled by the C++ options parser in cmsRun, and other options which come after the Python file are used in Python. These additional options can begin with '-' or '--' if an unnamed '--' argument directly follows the python file name.
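As a rough illustration of that rule, the arguments meant for Python can be picked out of the argument list by hand. The helper split_argv below is hypothetical (it is not part of cmsRun or FWCore) and assumes that the configuration file is the first argument ending in .py.

```python
def split_argv(argv):
    """Split argv into (cmsRun part, user part) at the first entry ending in '.py'."""
    for index, arg in enumerate(argv):
        if arg.endswith(".py"):
            user = argv[index + 1:]
            # An unnamed '--' directly after the file name is only a separator.
            if user and user[0] == "--":
                user = user[1:]
            return argv[:index + 1], user
    # No configuration file found: treat everything as belonging to cmsRun.
    return list(argv), []


cms_part, user_part = split_argv(
    ["cmsRun", "-m", "grid", "my_cfg.py", "myOpt", "myValue=True"]
)
print(cms_part)
print(user_part)
```

In a configuration file one would pass sys.argv instead of the literal list used here.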
Additionally, you can use the python module
argparse
or the
FWCore.ParameterSet.VarParsing
module to handle the parsing of the command line arguments for you (see the following sections for documentation).
argparse
The standard python parser
argparse
can be loaded by doing
import argparse
To understand how to use it, please consult the python standard documentation.
To pass arguments unknown to
cmsRun
through
cmsRun
and to python, you need to separate any
cmsRun
parameters from your personal parameters by using the unnamed '--' argument directly after the script name. These additional parameters can be handled by
argparse
by removing the separating '--' and having
argparse
handle only the arguments it knows. The latter is needed since the arguments known to
cmsRun
are still in
sys.argv
parser = argparse.ArgumentParser(prog=sys.argv[0],...)
...
argv = sys.argv[:]
if '--' in argv:
    argv.remove("--")
args, unknown = parser.parse_known_args(argv)
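A complete version of this pattern might look as follows. It is a sketch only: the function name parse_user_args and the option --myValue are made up for the illustration, and a literal argument list stands in for sys.argv so that the snippet runs outside of cmsRun. The '--' has to be removed because argparse would otherwise treat everything after it as positional arguments.

```python
import argparse


def parse_user_args(argv):
    """Parse only the options declared here; everything else is returned untouched."""
    parser = argparse.ArgumentParser(prog=argv[0])
    # '--myValue' is a made-up option used only for this illustration.
    parser.add_argument("--myValue", type=int, default=0)
    args_in = list(argv)
    if "--" in args_in:
        args_in.remove("--")  # drop the separator so argparse keeps reading options
    return parser.parse_known_args(args_in)


# In a configuration file one would pass sys.argv; a literal list is used here.
args, unknown = parse_user_args(["cmsRun", "my_cfg.py", "--", "--myValue", "7"])
print(args.myValue)
print(unknown)
```

The entries that belong to cmsRun end up in the second return value and can simply be ignored.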
VarParsing
Example
Here's a quick example of how to use the
VarParsing
Module
import FWCore.ParameterSet.Config as cms
import FWCore.ParameterSet.VarParsing as VarParsing
# set up process
process = cms.Process("StarterKit")
# setup 'analysis' options
options = VarParsing.VarParsing ('analysis')
# setup any defaults you want
options.outputFile = '/uscms/home/cplager/nobackup/outputFiles/try_3.root'
options.inputFiles= 'file1.root', 'file2.root'
options.maxEvents = -1 # -1 means all events
# get and parse the command line arguments
options.parseArguments()
# Use the options
process.source = cms.Source ("PoolSource",
fileNames = cms.untracked.vstring (options.inputFiles),
debugVerbosity = cms.untracked.uint32(200),
debugFlag = cms.untracked.bool(True),
)
...
process.maxEvents = cms.untracked.PSet(
input = cms.untracked.int32 (options.maxEvents)
)
# talk to output module
process.out = cms.OutputModule("PoolOutputModule",
process.patEventSelection,
process.patEventContent,
verbose = cms.untracked.bool(False),
fileName = cms.untracked.string (options.outputFile)
)
Here's how I might call it:
unix> cmsRun Reco2Pat1_withVarParsingExample_cfg.py print \
inputFiles=one.root inputFiles=two.root inputFiles_load=Zjets_2_2_3.list > & output&
where
inputFiles=one.root inputFiles=two.root
tells it to append
one.root
and
two.root
to the list of files to use as input.
inputFiles_load=Zjets_2_2_3.list
tells it to look at the file
Zjets_2_2_3.list
and append all files listed to the list of files to use as input as well.
print
tells it to print out the current values of all user access variables (very useful for log files)
Reco2Pat1_withVarParsing_example_cfg.py.txt is a complete example configuration script.
VarParsing
Documentation
The idea behind the
VarParsing
package is simple: make it easy to set up variables that can be changed from the command line. When hooking up a variable, you need to tell the module certain information about the object:
- name of object
- default value of object
- whether the object is a single value (VarParsing.VarParsing.multiplicity.singleton) or a list (VarParsing.VarParsing.multiplicity.list)
- whether the object is a string (VarParsing.VarParsing.varType.string), integer (VarParsing.VarParsing.varType.int), or float (VarParsing.VarParsing.varType.float)
  - use integer for boolean flags (1 = True, 0 = False)
- information string on object (shown when user types 'help' or 'print' from command line)
Note: Instead of typing
VarParsing.VarParsing.
, you can simply type
options.
where
options
is the name of the
VarParsing
object you created.
Here's an example of hooking up an integer called
someInt
:
options.register ('someInt',
-1, # default value
VarParsing.VarParsing.multiplicity.singleton, # singleton or list
VarParsing.VarParsing.varType.int, # string, int, or float
"Number of events to process (-1 for all)")
If, when you create a
VarParsing
instance, you pass in the string
'analysis'
, you will get the following objects by default:
Analysis Options:
| Option | Purpose |
| maxEvents | Number of events to process (singleton integer) |
| inputFiles | List of files to process as input (list string) |
| secondaryFiles | List of secondary files (if needed; list string) |
| outputFile | Name of output file (singleton string) |
| secondaryOutput | Name of secondary output (if needed; singleton string) |
To set a different default value, assign to the option before parsing the arguments:
options.someInt = 4
If you have a list variable, you can load default values as follows:
options.someList = 'one.txt', 'two.txt'
options.someList = 'three.txt'
options.loadFromFile ('someList', 'nameOfFiles.list')
After this,
options.someList
will contain
'one.txt'
,
'two.txt'
,
'three.txt'
, and any files listed in
'nameOfFiles.list'
.
Note that the format of
'nameOfFiles.list'
text file is the same as the format that DBS will give to you. Comments (or lines you want to temporarily exclude) starting with the hash character (
#
) can be embedded in the text file and will be ignored.
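The comment handling can be pictured with a small stand-alone reader. This is a sketch, not the code used by VarParsing: read_file_list is a hypothetical helper, and it assumes one entry per line, which may be simpler than the format actually accepted.

```python
def read_file_list(text):
    """Return the entries of a file list, skipping blank lines and '#' comment lines."""
    entries = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        entries.append(line)
    return entries


example = """\
# files for one sample
first.root
#second.root   (temporarily excluded)

third.root
"""
print(read_file_list(example))
```

Commenting out a line is therefore enough to drop a file from a job without deleting it from the list.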
Command Line Options
To assign a variable defined above from the command line, you simply use
variable=value
syntax.
If the variable (object) is a list, you have several options:
- You can repeat
variable=value
many times to load different values into the list
- You can type
variable_clear
to clear the variable
list of all entries
- You can type
variable_load=nameOfFiles.list
to load all files listed in nameOfFiles.list
into the variable
list
Finally, there are two other commands you can put on the command line:
- print will print the current values of all of the settable variables
- help will do the same as print, but also display possible commands and will then exit.
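The behavior of list variables described above (repeated assignments append, variable_clear empties the list) can be sketched in plain Python. The function apply_tokens is a hypothetical illustration and not how VarParsing is implemented; it leaves out variable_load, print, and help.

```python
def apply_tokens(tokens, options):
    """Apply 'name=value' and 'name_clear' tokens to a dict of list-valued options."""
    for token in tokens:
        if "=" in token:
            name, value = token.split("=", 1)
            options[name].append(value)  # repeated assignments append to the list
        elif token.endswith("_clear"):
            options[token[:-len("_clear")]].clear()
        else:
            raise ValueError("unrecognized token: " + token)
    return options


options = {"inputFiles": ["default.root"]}
apply_tokens(
    ["inputFiles_clear", "inputFiles=one.root", "inputFiles=two.root"], options
)
print(options["inputFiles"])
```

Without the leading inputFiles_clear token, the two files would be appended after the default entry instead of replacing it.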
IMPORTANT: The configuration script
MUST end with
.py
. If it does not,
VarParsing
will not know where to start looking at the commands and will assume that everything is for
cmsRun
.
VarParsing
Tags
VarParsing
has the ability to append
tags to the name of the output file given the values of the command line options. This can be very useful, for example, when running different reconstruction algorithms so that 1) it is obvious which file goes with which options and 2) so you don't accidentally overwrite files you want to compare with each other.
options.setupTags (tag = 'someBool',
                   ifCond = 'someBool > 0')
options.setupTags (tag = 'someInt%d',
                   ifCond = 'someInt > 0',
                   tagArg = 'someInt')
The first case sets up a tag if the user variable
someBool
is greater than 0. The second case will set up a tag if
someInt
is greater than 0, and then it will embed the value into the tag (
'%d'
).
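The two cases can be mimicked in plain Python to show how a condition and a format string combine into a tag. This is a sketch only: build_tags is a hypothetical helper, the conditions are given as functions rather than strings, and joining the tags with an underscore is an assumption, since how VarParsing attaches tags to the file name is not specified here.

```python
def build_tags(values, rules):
    """Collect the tags whose condition holds, filling in a value where requested."""
    parts = []
    for template, condition, arg_name in rules:
        if condition(values):
            # Embed the value only when an argument name was given.
            parts.append(template % values[arg_name] if arg_name else template)
    return "_".join(parts)  # the separator is an assumption of this sketch


rules = [
    ("someBool", lambda v: v["someBool"] > 0, None),
    ("someInt%d", lambda v: v["someInt"] > 0, "someInt"),
]

print(build_tags({"someBool": 1, "someInt": 5}, rules))
print(build_tags({"someBool": 0, "someInt": 5}, rules))
```

Options left at values that fail their condition contribute nothing, so runs with different settings get different tags.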
An annotated example
Here we present an annotated sample configuration program that demonstrates many of the features found in a typical configuration program. The details of the file and the configuration objects are explained in the previous sections.
# Python is very picky about leading spaces on a line
# Comments (obviously!) are introduced by a sharp ('#'), and continue
# to the end of line.
#Bring in the CMS configuration python class and function definitions
import FWCore.ParameterSet.Config as cms
process = cms.Process("TEST")
# Each process must have a source attribute.
# As the first argument of the constructor, one must specify the
# name of the class to be instantiated. This name is the same name
# that the plug-in was associated with, generally done in the
# class's implementation file. This is usually the name of the
# C++ class, stripped of any namespace specification.
process.source = cms.Source("PoolSource",
fileNames = cms.untracked.vstring('file:somefilename.root')
)
process.a = cms.EDProducer("AProducer",
a = cms.int32(32),
b = cms.vdouble( 1.1, 2.2 ),
c = cms.vstring( ), # empty vectors are accepted
d = cms.vstring( 'boo', "yah" )
)
# Configuration file fragments may be included by naming
# the file to be included...
# The 'import' statement below shows the syntax.
import SomeSubsystem.SomePackage.partial_configuration_cff as _partial
process.b = cms.EDProducer("BProducer",
a = cms.untracked.int32(14),
b = cms.string('sillyness ensues'),
c = cms.PSet(
a = cms.string('nested')
)
).extend(_partial)
process.y = cms.OutputModule("PoolOutputModule",
fileName = cms.untracked.string('myfile_y.root')
)
process.z = cms.OutputModule("PoolOutputModule",
fileName = cms.untracked.string('myfile_z.root')
)
process.s1 = cms.Sequence( process.a+process.b )
process.s2 = cms.Sequence( process.b )
process.s3 = cms.Sequence( process.a )
# It is not an error for two sequences (here, s3 and s4) to be identical.
process.s4 = cms.Sequence( process.a )
process.p1 = cms.Path(process.a+process.b)
process.p2 = cms.Path(process.s1 * (process.s3+process.s2) )
process.ep = cms.EndPath( process.y + process.z )
Historical Notes
- Prior to the year 2008, CMS did not use the Python language for its configurations. CMS had developed its own language for its configuration files. We have tried to completely remove that old language from the code repository and documentation, but one still runs across it from time to time. Most often it is found in old documentation that needs to be updated or deleted. There are significant similarities between the old configuration language and the current python configuration files, but the old language is formatted differently and will fail if you try to run it.
- Two operators are allowed to be used when building the module sequences in paths, endpaths, and sequences: '+' and '*'. There is currently no difference in their behavior. A very long time ago, these operators were used to express dependences between modules and those dependences had to be consistent on all paths. And there were error checks and exceptions thrown if those dependences were not consistent on all paths. But this scheme for checking errors and consistency was too difficult to maintain in the configurations and was removed a long time ago. The fact that two operators can still be used in these expressions is now just a historical remnant that we keep for backward compatibility reasons.
Review status
| Reviewer/Editor and Date (copy from screen) | Comments |
| -- ChrisDJones - 01 May 2007 | translate config language page to python equivalent |
| -- JennyWilliams - 08 Aug 2007 | moved page from workbook into SWGuide and put more basic introduction into the workbook |
| -- ChristopherJones - 19 Aug 2008 | removed .data. from all import statements to match present day usage |
| -- CharlesPlager - 09 Jan 2009 | Added information on VarParsing.py and command line arguments to configuration scripts |
| -- WilliamTanenbaum - 27 Sep 2010 | Allow specification of luminosity block numbers in EventID and EventRange |
| -- StefanoBelforte - 17 Sep 2014 | fix reference to SWGuideEDMPathsAndTriggerBits to use CMSPublic twiki |
| -- DavidDagenhart - 21 Dec 2016 | Added parts about Task, Reviewed all the rest (except VarParsing parts), Many Updates and Fixes |
Responsible:
ChrisDJones, Liz Sexton-Kennedy
Last reviewed by:
DavidDagenhart - 21 Dec 2016