Description of the cmsRun Python Configuration Syntax


Goals of this page

This page describes how to use the Python language to create configuration files.

For an introductory discussion of configuration files, see the WorkBook page WorkBookConfigFileIntro. (It covers some topics not covered here, so it might be a good idea to read or skim that before or after reading this page.)

Introduction

A configuration document, written using the Python language, is used to configure the cmsRun executable. A Python configuration program specifies which modules, inputs, outputs and services are to be loaded during execution, how to configure these modules and services, and in what order to execute them. It is a file which can be self-contained, or can read in any number of external Python configuration fragments using Python's standard import statement. Here is an illustration of the structure of a configuration file:

# Import CMS python class definitions such as Process, Source, and EDProducer
import FWCore.ParameterSet.Config as cms

# Import contents of a file
import Foo.Bar.somefile_cff

# Set up a process, named RECO in this case
process = cms.Process("RECO")

# Configure the object that reads the input file
process.source = cms.Source("PoolSource", 
    fileNames = cms.untracked.vstring("test.root")
)

# Configure an object that produces a new data object
process.tracker = cms.EDProducer("TrackFinderProducer")

# Configure the object that writes an output file
process.out = cms.OutputModule("PoolOutputModule",
    fileName = cms.untracked.string("test2.root")
)

# Add the contents of Foo.Bar.somefile_cff to the process
# Note that more commonly in CMS, we call process.load("Foo.Bar.somefile_cff")
# which both performs the import and calls extend.
process.extend(Foo.Bar.somefile_cff)

# Configure a path and endpath to run the producer and output modules
process.p = cms.Path(process.tracker)
process.ep = cms.EndPath(process.out)

There must be exactly one cms.Process with the name process in a top level configuration file.

Python Quick Tips

SWGuidePythonTips

Python has a few features that take some getting used to.

Python program flow is controlled by the indentation of the statements. For example, the content of a function definition is indented more than the first line that declares the function name. Python knows the function content has ended when the indentation goes back to what it was. Other program blocks like "if" conditional blocks are handled similarly, so there's no need for 'endif' statements or curly braces to mark the end of a block of statements.

The way arguments are passed into a function is different in Python and C++. The official "Python Tutorial" says arguments are passed using "call by value (where the value is always an object reference, not the value of the object)", and a footnote adds: "Actually, call by object reference would be a better description, since if a mutable object is passed, the caller will see any changes the callee makes to it (items inserted into a list)."

Another difference between C++ and Python is that when you assign an object to a variable using the '=' sign, Python doesn't make a copy of the object. In general, assume that Python shares references unless you explicitly make a copy, so when you modify an object, you modify it everywhere it's used.

Here is an example that may clarify the last three points.

x = []
def f(x):
    x = [1]
f(x)
print(x)
def g(x):
    y = x
    y.append(1)
g(x)
print(x)
If you copy the above to a file and run it with Python, the first print statement prints an empty list and the second prints a list that contains a 1. What happens? First, an empty list is created, and a symbol 'x' that exists outside the functions is created and set to refer (or point) to that list. When function f is called, a new symbol 'x' is created that refers to the same list. Then a new list is created and the symbol 'x' in the function is set to refer to the new list. It no longer points at the original list, which is unchanged; the symbol 'x' outside the function still points at it. Then function g is called and another symbol 'x' is created, which is set to refer to the original list. A symbol 'y' is created and set to refer to the same list 'x' refers to. Then, using y, a method of the original list is called that modifies it. After the function returns, that change is visible. If you understand this little example, you will understand some important things about how Python works.
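
The remark above about explicitly making a copy can be illustrated with the standard library's copy module. This is only a small sketch; the dictionary and file names are made up for the illustration and are not CMS objects.

```python
import copy

# A plain Python structure standing in for some shared configuration data
original = {"fileNames": ["test.root"]}

alias = original                        # a second name for the same object
independent = copy.deepcopy(original)   # a separate object with copied contents

alias["fileNames"].append("more.root")          # also visible through 'original'
independent["fileNames"].append("other.root")   # not visible through 'original'

print(original["fileNames"])
print(independent["fileNames"])
```

The first list shows the change made through the alias; the change made to the deep copy stays in the copy.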

Objects are built using constructor syntax. It's easy to forget the commas.

You can validate the syntax of your configuration using python

python <your file>

Python allows you to explore your configuration from the Python command prompt. This is described here.
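
As a rough sketch of what such exploration can look like, the lines below build a tiny process and then inspect it. They need a CMSSW environment so that FWCore can be imported; the process name TEST and the file name are placeholders. The same inspection lines can be typed at the prompt after starting Python with "python -i" on your own configuration file.

```python
import FWCore.ParameterSet.Config as cms

# A minimal process to explore (names and values are placeholders)
process = cms.Process("TEST")
process.source = cms.Source("PoolSource",
    fileNames = cms.untracked.vstring("test.root")
)

# Look at one parameter of one object
print(process.source.fileNames)

# Print the whole configuration as Python text
print(process.dumpPython())
```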

Python and SCRAM

To make objects in your python files visible to other python files, the other files need to import your python files. For the import to succeed your python files first need to be put in your package's python directory, or a subdirectory of python. Then, you need to run scram build once in your python area, to initialize the symbolic links. You only need to build once. If later you change your python file, you won't need to re-build.

Many types of objects (e.g. EDProducers) can have a function called fillDescriptions defined in their C++ class. This function specifies which parameters are allowed to exist in the module's configuration, along with other things like parameter default values. If this function has been defined for a C++ class, it can trigger automatic generation of the cfi file for modules of that class. The automatic generation is part of the build that executes when the command scram build is given. The automatically generated cfi files are stored in the cfipython subdirectory of a CMSSW working area instead of the src subdirectory. See SWGuideConfigurationValidationAndHelp for more details.

Objects

A Python configuration program is structured as a hierarchy of objects, nested or adjacent with respect to each other. Each object specifies a recognizable component or feature of the cmsRun program. Each object is given a name which is used to refer to it in the Python code. In CMS we often call those names "labels". Each object has a python type. Not all, but many types can be configured by a parameter set and store that parameter set. And frequently, a particular object will correspond to one C++ class that is related but not the same as the python type. Commonly, an object definition looks like this:

aLabel = pythonType("C++ClassName",
    aParameterName = anotherPythonType(constructor arguments)
    # more parameter definitions
    ...
)

Below is an example where the label is VtxSmeared, the python type is cms.EDProducer, and the C++ class name is VertexGenerator. The example also shows that comments may be added within a block's scope using "#" where the comment continues to the end of the line. It is clearest to define each parameter on a separate line in your configuration file.

VtxSmeared = cms.EDProducer("VertexGenerator",
    # Setting parameters:  <-- First comment
    MeanX = cms.double(0.),
    MeanY = cms.double(0.),
    MeanZ = cms.double(0.),
    # Second comment
    SigmaX = cms.double(0.015),
    SigmaY = cms.double(0.015),
    SigmaZ = cms.double(53.0)  # in mm (as in COBRA/OSCAR) <-- Third comment
)

Labelled vs Unlabelled Objects

I considered deleting this section from the documentation because the terms described here are not often used, but I left it here because there are some obscure places in the code and documentation where you might run into the terms 'Labelled' and 'Unlabelled'.

In CMS, a 'label' is the attribute name used in the process to refer to an object. When the process load function or the process extend function is used to attach objects to the process, the label is the same as the Python variable name assigned to refer to the object (and because they are usually the same, that name is often called the label as well). In Python configurations, every object attached to the process has a label. So the terms 'Labelled' and 'Unlabelled' are really misnomers that have a historical origin. Before 2008, CMS did not use the Python language for configurations, and in the language we used before Python, some objects attached to the process really did not have labels. These types of objects were called 'Unlabelled', and even now there remain some subtle differences in how these are handled internally (for backward compatibility reasons that are now ancient). Currently in a Python configuration, an 'Unlabelled' object is one that has a certain label, and in some cases the object is required to always have that label. Here are some examples:

  • A source must have the label 'source'
  • A service must have a label matching its C++ type name
  • An ESProducer or ESSource is considered 'Unlabelled' if and only if it has the same label as its type name, although it is allowed to have an arbitrary label

Most other types are 'Labelled'. The example in the previous section shows a 'Labelled' object. Here is an example of an 'Unlabelled' object.

process.source = cms.Source("PoolSource",
    # parameter declarations
    ...
) 

The label of a 'Labelled' object can have an arbitrary value, except that it must be unique within the scope of the process.
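
The labelling rules listed above can be written down as a small plain-Python check. This is only an illustrative sketch: the function is_unlabelled is hypothetical, is not part of CMSSW, and takes the Python type name as a simple string.

```python
def is_unlabelled(python_type, label, cpp_type):
    """Return True if the object counts as 'Unlabelled' under the rules above.

    Raises ValueError when a label that is required by the rules is violated.
    """
    if python_type == "Source":
        if label != "source":
            raise ValueError("a Source must have the label 'source'")
        return True
    if python_type == "Service":
        if label != cpp_type:
            raise ValueError("a Service label must match its C++ type name")
        return True
    if python_type in ("ESProducer", "ESSource"):
        # Arbitrary labels are allowed; only a matching label is 'Unlabelled'
        return label == cpp_type
    # Everything else is treated as 'Labelled'
    return False

print(is_unlabelled("Source", "source", "PoolSource"))
print(is_unlabelled("ESSource", "somename", "SomeClass"))
```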

The Process Object

The top-level item of a configuration program is a Process object. Each configuration program must have a Process object assigned to a variable named process. [NOTE: the program may have other Process objects assigned to different variable names, but those Process objects will not be used to configure the job except in the special case of SubProcess's]. This process variable refers to a Process object which aggregates all the configuration information for the cmsRun executable. The Process object must be named by passing the name as the first argument to the Process class' constructor. This name gets carried along with the output data and is used as a part of the branch name to distinguish between otherwise similar objects in a given event.

process = cms.Process("NAME")

Official production processes use a standard set of process names, which include but are not limited to the following:

  • SIM
  • HLT
  • RECO
  • PAT

User processes are free to select their own process names. Common choices include TEST and USER. A history of processes which have added data to the Events, Runs, and LuminosityBlocks is recorded in the output files. New processes must have names different from all the names in the process history.
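
The rule that a new process name must differ from every name already in the process history can be pictured with a short plain-Python sketch. The helper append_process_name is hypothetical and only models the rule; the framework performs its own check when a job runs.

```python
def append_process_name(process_history, new_name):
    """Return the history extended by new_name, or raise if the name is already used."""
    if new_name in process_history:
        raise ValueError(
            "process name '%s' is already in the process history" % new_name
        )
    return list(process_history) + [new_name]

# History of a file that went through three production steps
history = ["SIM", "HLT", "RECO"]
print(append_process_name(history, "USER"))
```

With this history, choosing the name RECO again would raise an error, while USER or TEST would be accepted.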

Attribute Declarations for the Process Object

Inside the process object there must be exactly one object assigned that has Python type Source and is used for data input. There may be zero or more objects for each of many other Python types. In the official production configurations there can be hundreds or even thousands of objects attached to the process. Your job is configured by your choice of objects to construct and attach to the process, and by the configuration of each object. (This may be done via "import" statements or calls to the load function, instead of or in addition to object construction.) Some of the Python types that may be used to create these objects are listed below:

Source object
Defines the data input source for the cmsRun executable; i.e., where the data comes from. There must be exactly one and it must be assigned the label source. Configured with a C++ type name and parameter set for each object.
EDProducer, EDFilter, EDAnalyzer or OutputModule objects
The four Python types are collectively referred to as 'Modules'. Each of the four Python types corresponds to a specific C++ base class. Configured with a C++ type name and parameter set for each object. The C++ type must inherit from the corresponding C++ base class.
EDAlias objects
Allows mapping of an EDProducer from its given label to one or more different module labels. Also allows multiple EDProducers to share the same label. This is useful for making multiple jobs with different configurations appear to make the same Event products.
Service object
Invoke services. They must be assigned to a label which matches their C++ class name. Configured with a C++ type name and parameter set for each object.
PSet objects
Define parameter sets.
VPSet objects
Define vectors of parameter sets.
SecSource objects
Define secondary data input source, used only with "mixing modules" and are typically included as a parameter in the parameter set of a mixing module instead of being attached directly to the process. Configured with a C++ type name and parameter set for each object.
ESSource objects
Grab information from an external input source, e.g., the calibration database, an xml file, etc. Configured with a C++ type name and parameter set for each object.
ESProducer objects
Describe environmental quantities, e.g., ambient temperature, magnetic field, and so on. Configured with a C++ type name and parameter set for each object.
Sequence definitions
Define ordered groups of modules, giving each group a name that may be used in other sequences, paths, or endpaths.
Task definitions (in releases that support Tasks)
Define groups of EDProducers and EDFilters to be run in unscheduled mode. Define which EDProducers, ESSources, and Services are enabled.
Path definitions
Specify groups of modules to execute in the order given.
EndPath definitions
Define groups of modules to run, in the order given, after all other named paths have been run.
Schedule definitions
Define the group of paths and end paths to run. There can be 0 or 1 of these, and it must be labelled schedule.

CMSSW classes and modules

"Module" is a generic name for "workers" in cmsRun that are objects instantiated from a C++ class that inherits from a base class with one of the following 4 names: EDProducer, EDFilter, EDAnalyzer and OutputModule. There have been thousands of such C++ classes defined in CMSSW. Modules are described in more detail here: Modular Architecture. There are multiple ways to browse the C++ code of these CMSSW classes available online. Here are a some of them:

Parameters

Modules and some other C++ objects which are created by cmsRun can be configured by parameters in the configuration. These parameters are initially passed to the constructor of the corresponding Python object which stores them. Then the framework creates a C++ ParameterSet object from them and passes that to the C++ constructor of the object. The following table shows the different types of parameters that can be used when configuring a module in the Python code. Each such object can contain any or all of the following types of named parameters, in any number or combination. Parameters of type PSet can be nested inside the top level set of parameters and other PSet parameters and this nesting will result in nested ParameterSet objects in C++ execution.

NOTE: the examples assume one has done

import FWCore.ParameterSet.Config as cms
so we can use the short name cms rather than the verbose name FWCore.ParameterSet.Config when referring to the Python objects.

Python Type            C++ Type                                 Example
bool                   bool                                     b = cms.bool(False)
int32                  int                                      i = cms.int32(-234)
uint32                 unsigned                                 i = cms.uint32(2112)
vint32                 std::vector<int>                         v = cms.vint32( 1, -3, 5 )
vuint32                std::vector<unsigned>                    v = cms.vuint32( 0, 1, 0 )
int64                  long long                                i = cms.int64(-234)
uint64                 unsigned long long                       i = cms.uint64(2112)
vint64                 std::vector<long long>                   v = cms.vint64( 1, -3, 5 )
vuint64                std::vector<unsigned long long>          v = cms.vuint64( 0, 1, 0 )
string                 std::string                              s = cms.string("spaces are allowed")
                                                                s = cms.string('single quotes allowed')
vstring                std::vector<std::string>                 v = cms.vstring( 'thing one', "thing two")
double                 double                                   d = cms.double(-3.43e-34)
vdouble                std::vector<double>                      v = cms.vdouble(1.2, 3, 4.5e-100, -inf)
FileInPath             edm::FileInPath                          particleFile = cms.FileInPath("SimGeneral/HepPDTESSource/data/particle.tbl")
InputTag               edm::InputTag                            inputTag = cms.InputTag("simrec","jets")
VInputTag              std::vector<edm::InputTag>               jetTags = cms.VInputTag( cms.InputTag("simrec","jets"), cms.InputTag("cone5CMS.CaloJets"))
ESInputTag             edm::ESInputTag                          inputTag = cms.ESInputTag("hbconditions")
VESInputTag            std::vector<edm::ESInputTag>             inputTag = cms.VESInputTag(cms.ESInputTag("hbconditions"))
EventID                edm::EventID                             e = cms.EventID(1,1,1)
VEventID               std::vector<edm::EventID>                events = cms.VEventID(cms.EventID(1,1,1),cms.EventID(2,3,4))
EventRange             edm::EventRange                          r = cms.EventRange(1,1,1,5,3,999) (0 corresponds to MAX)
                                                                r = cms.EventRange("1:1:1-5:3:999") ('min' and 'max' are allowed)
VEventRange            std::vector<edm::EventRange>             ranges = cms.VEventRange( cms.EventRange(1,1,1,5,0,0), cms.EventRange(8,1,1,9,1,10))
LuminosityBlockID      edm::LuminosityBlockID                   l = cms.LuminosityBlockID(1,1)
VLuminosityBlockID     std::vector<edm::LuminosityBlockID>      lumis = cms.VLuminosityBlockID(cms.LuminosityBlockID(1,1),cms.LuminosityBlockID(2,3))
LuminosityBlockRange   edm::LuminosityBlockRange                r = cms.LuminosityBlockRange(1,1,5,0) (0 corresponds to MAX)
                                                                r = cms.LuminosityBlockRange("1:1-5:max") ('min' and 'max' are allowed)
VLuminosityBlockRange  std::vector<edm::LuminosityBlockRange>   ranges = cms.VLuminosityBlockRange( cms.LuminosityBlockRange(1,1,5,0), cms.LuminosityBlockRange(8,1,9,1))
PSet                   edm::ParameterSet                        see below
VPSet                  std::vector<edm::ParameterSet>           see below

Note that empty vectors are legal; e.g.,

    c = cms.vint32( )
creates an empty integer vector named "c".

The system keeps track of what parameters are used to create each data item in the Event and saves this information in the output files. This can be used later to help understand how the data was made. However, sometimes a parameter will have no effect on the final objects created, e.g., the parameter just sets how much debugging information should be printed to the log. Such parameters are declared 'untracked' as shown below and their values are not saved.

To see how parameters are defined, here is a sample:

import FWCore.ParameterSet.Config as cms
source = cms.Source("FlatRandomEGunSource",
    # Here we define a nested parameter set (PSet) to be used by the source,
    # and give it the name PGunParameters:
    PGunParameters = cms.PSet(
        # you can request more than one particle, e.g.:
        # PartID = cms.vint32(211,11)   # but we just request one this time:
        PartID = cms.vint32(211),
        MinEta = cms.double(-3.5),
        MaxEta = cms.double(3.5),
        MinPhi = cms.double(-3.14159265358979323846), # in radians
        MaxPhi = cms.double(3.14159265358979323846),
        MinE   = cms.double(9.99),
        MaxE   = cms.double(10.01)
    ),
    Verbosity = cms.untracked.int32(0) # set to 1 (or greater) for printouts
)

For modules with many parameters, it can be painful to have to write every single one of them in a job's configuration program. To avoid this, the intended design is that each module in CMS has exactly one corresponding file, called a "configuration fragment include", whose file name ends with _cfi.py. Then in places where the module is used, the cfi file is imported instead of redefining all the possible parameters. After importing the cfi, the parameters will all have default values and changes can be made to the defaults. The module can be copied or cloned. This is discussed more in the section below about imports and the sections following that one.

cfi files will be automatically generated for modules that have implemented the fillDescriptions function in their C++ code. You can find the cfi files that were manually implemented in the python directory of the package where the module is defined. The automatically generated cfi files are placed in subdirectories of the cfipython directory of the top level working directory. cfi files do not need to be manually written for modules where the autogeneration occurs.
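
The idea of starting from defaults and then changing or cloning a module might look roughly like the sketch below. It needs a CMSSW environment. The module is defined inline as a stand-in for what a cfi file would normally provide, and the name VtxSmearedWide and the changed values are assumptions made only for this illustration.

```python
import FWCore.ParameterSet.Config as cms

# Stand-in for the defaults that a cfi file would normally define
VtxSmeared = cms.EDProducer("VertexGenerator",
    MeanX = cms.double(0.),
    SigmaX = cms.double(0.015)
)

# clone() makes an independent copy and can override parameters on the way
VtxSmearedWide = VtxSmeared.clone(SigmaX = 0.030)

# An existing parameter can also be changed in place on the original
VtxSmeared.MeanX = 0.1
```

Cloning is usually the safer choice when the same defaults are used in more than one place, because in-place changes are seen by everything that shares the object.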

Parameters without defaults

Parameter types can be specified without declaring a particular default value by using the modifiers required, optional, and obsolete. A sample could be
import FWCore.ParameterSet.Config as cms

value = cms.PSet( min = cms.required.int32,
                  max = cms.optional.uint32,
                  verbosity = cms.optional.untracked.string )
Where
  • required : if the parameter has not been set, an exception will be thrown.
  • optional : if the parameter is missing, it will not be passed to the C++ code.
  • obsolete : the parameter (whether set or not) will never be passed to the C++ code.
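
The three rules above can be modelled in a few lines of plain Python. This is a toy model only, not how the framework implements the modifiers: the function parameters_passed_to_cpp and the parameter name oldCut are hypothetical.

```python
REQUIRED, OPTIONAL, OBSOLETE = "required", "optional", "obsolete"

def parameters_passed_to_cpp(declared, supplied):
    """Return the parameters that would reach the C++ code under the rules above."""
    passed = {}
    for name, modifier in declared.items():
        if modifier == OBSOLETE:
            continue                      # never passed, whether set or not
        if name in supplied:
            passed[name] = supplied[name]
        elif modifier == REQUIRED:
            raise ValueError("required parameter '%s' was not set" % name)
        # an optional parameter that is missing is simply left out
    return passed

declared = {"min": REQUIRED, "max": OPTIONAL, "oldCut": OBSOLETE}
result = parameters_passed_to_cpp(declared, {"min": 1, "oldCut": 7})
print(result)
```

Here only min reaches the C++ side: max was optional and not set, and oldCut is obsolete.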

Processing Component Objects

There are four types of dynamically loadable processing components (in addition to the source type which provides the event to be processed). These are known as "modules". As mentioned above,

  • "Module" is a generic name for 4 types of "workers" (C++ base classes): EDProducer, EDFilter, EDAnalyzer, and OutputModule.
  • Zero or more labelled module blocks can be assigned to a Process object.
  • The C++ classes that go here would be subclasses of any of the worker base classes.

The component types are:

Producer
Based on the EDProducer class; creates new data to be placed in the Event
Filter
Based on the EDFilter class; decides if processing should continue on a path for an Event
Analyzer
Based on the EDAnalyzer class; studies properties of the Event
OutputModule
Stores the data from the Event

The label given to a module may be used elsewhere in the configuration, (e.g., in sequence, path and endpath specifications).

For example:

process.filter = cms.EDFilter("PythiaFilter",
    MinMuonPt = cms.untracked.double(20.)
)
process.out = cms.OutputModule("PoolOutputModule", 
    fileName = cms.untracked.string("mcpool.root"),
    SelectEvents = cms.untracked.PSet( 
        SelectEvents = cms.vstring("p")
    )
)
process.p = cms.Path(process.filter)
process.e = cms.EndPath(process.out)

Service Objects

A service is a facility that performs a well-defined task that is globally accessible and that does not affect physics results. Any number of services can be attached to a Process object, although only one for each C++ type.

This example shows the form of the service object:

process.RandomNumberGeneratorService = cms.Service("RandomNumberGeneratorService",
    externalLHEProducer = cms.PSet(
        initialSeed = cms.untracked.uint32(234567),
        engineName = cms.untracked.string('HepJamesRandom')
    )
)

RandomNumberGeneratorService is the name of the C++ class of the service the program will use and also must be the label. The specific parameters required are determined by the class used.

An alternative way to define a service and attach it to the process follows. It has the advantage that it sets the label automatically for you. This only works for services.

process.add_( cms.Service("RandomNumberGeneratorService",
        externalLHEProducer = cms.PSet(
            initialSeed = cms.untracked.uint32(234567),
            engineName = cms.untracked.string('HepJamesRandom')
        )
    )
)

Parameter Set (PSet) Objects

Any number of parameter sets ("PSet") objects can be attached to the Process object. They can also be nested inside a module as a parameter of the module or other type of object. A PSet can be nested inside another PSet. Parameter sets are used to define a list of parameters. A PSet attached to the Process can be shared by multiple modules, i.e. can be used to configure multiple modules. The PSet concept allows a single point of maintenance for such a set of parameters.

The PSet object is of the form:

somename = cms.PSet(
    # parameter declarations here, e.g.,
    s = cms.string("thing zero"),
    v = cms.vstring( 'thing one', "thing two")
)

Any of the parameter declarations listed under Parameters, above, may appear in a PSet object, as may other PSet objects.

To import an external PSet into a module, put the PSet in the constructor of whatever you're constructing, after its name, but before any named variables:

myBlock = cms.PSet( a = cms.int32(1) )

myModule = cms.EDAnalyzer("MyModule",
  myBlock,
  b = cms.int32(2)
)

Copies of the parameters from the PSet will be inserted into the module. These copies can then be modified, without affecting the original PSet.
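
A short sketch of that copy behaviour, reusing myBlock and myModule from the example above (CMSSW environment required). The new value 5 is arbitrary.

```python
import FWCore.ParameterSet.Config as cms

myBlock = cms.PSet( a = cms.int32(1) )

myModule = cms.EDAnalyzer("MyModule",
    myBlock,
    b = cms.int32(2)
)

# Change the copy that now lives inside the module
myModule.a = 5

# Compare the module's copy with the original PSet
print(myModule.a.value())
print(myBlock.a.value())
```

According to the description above, the second print should still show the value the PSet was created with.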

Parameter Set Vector (VPSet) Objects

A parameter set vector, VPSet, is a vector (a list) of PSets. There may be any number of uniquely labelled vectors of parameter sets in the process object, and such a vector may be referred to, by name, from places where a ParameterSet is needed. There may be any number of vector entries. Each vector entry must be comma-separated from the next. The VPSet object is of the form:
somename = cms.VPSet(
    cms.PSet(
        ... # parameter declarations for vector entry 0
    ),
    cms.PSet(
        ... # parameter declarations for vector entry 1
    ) ,
    cms.PSet(
        ... 
    ) ,
    ...
    cms.PSet(
        ... # parameter declarations for vector entry _n_
    ) 
)

Note that it is also possible to use a VPSet which creates a std::vector<edm::ParameterSet> which is empty; this is done by:

somename = cms.VPSet()


EventRange, LuminosityBlockRange, VEventRange, VLuminosityBlockRange objects

These types are used to describe ranges of luminosity sections and events within runs. Both types follow the same syntax, which is easiest to explain by looking at the text representation: cms.EventRange("1:1-5:6") is the range beginning with event (or lumi) number 1 of run 1 and continuing to event number 6 in run 5. You can also represent this as cms.EventRange(1, 1, 5, 6). The text form also takes a pair of wild-card like options, so cms.EventRange("1:min-5:max") represents the event number 1 of run 1 through the last event of run 5. This is represented as cms.EventRange(1, 1, 5, 0) as well (0 is invalid as a run, event, or lumi section, so we use it as a wildcard).

You can also supply vectors of range objects in VEventRange or VLuminosityBlockRange objects that look like this: cms.VEventRange("1:2-3:4", "5:MIN-7:MAX").

The vector forms of these types are used to specify lumis or events to process or skip in the PoolSource configuration for CMSSW releases > 3_0_0.

In release 3_9_0 and later, an EventRange may optionally specify luminosity block numbers, in this representation: cms.EventRange("1:1:1-5:4:6"). If this form is used, events are ordered by run number, luminosity block number, and event number, in that order of significance. In contrast, if luminosity block numbers are not used, as in cms.EventRange("1:1-5:6"), luminosity block numbers are ignored in determining the range. If a given EventRange specifies a luminosity block number, it must specify it for both ends of the range. However, a given VEventRange may include both types of EventRange.
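
The two orderings can be sketched in plain Python using tuple comparison. This is a simplified model, not the framework's implementation: the helper names are hypothetical, and the 'min'/'max' keywords and the 0 wildcard are deliberately not handled.

```python
def parse_event_id(text):
    """Parse "run:event" or "run:lumi:event" into a tuple of ints."""
    return tuple(int(field) for field in text.split(":"))

def parse_event_range(text):
    """Parse a range such as "1:1-5:6" or "1:1:1-5:4:6" into (low, high)."""
    low_text, high_text = text.split("-")
    low, high = parse_event_id(low_text), parse_event_id(high_text)
    if len(low) != len(high):
        raise ValueError("both ends of the range must use the same form")
    return low, high

def in_range(run, lumi, event, event_range):
    """Check whether an event falls inside the range."""
    low, high = event_range
    # With lumis: order by (run, lumi, event). Without: lumi is ignored.
    key = (run, lumi, event) if len(low) == 3 else (run, event)
    return low <= key <= high

with_lumi = parse_event_range("1:1:1-5:4:6")
without_lumi = parse_event_range("1:1-5:6")

# Run 5, lumi 5, event 1: outside when lumis count, inside when they are ignored
print(in_range(5, 5, 1, with_lumi))
print(in_range(5, 5, 1, without_lumi))
```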

Secondary input source (secsource) objects

This is a special feature, and is intended for use only by "mixing modules". Unless you are writing or configuring a mixing module it is unlikely you will need this. Mixing, used in Monte Carlo generation and simulation, refers to adding a secondary source of events to the simulated hard scatter event. A mixing module is one that reads in both types of generated data and simulates the detector response accordingly.

Here is an example of how this is actually used:

mix = cms.EDProducer("MixingModule",
    ...
    input = cms.SecSource("EmbeddedRootSource",
        ...
        fileNames = cms.untracked.vstring(
            '/store/relval/2008/4/9/RelVal-RelValMinBias-1207754630/0002/00233C31-5806-DD11-9DDC-001617DBD5B2.root', 
            ...

Event Setup Source (ESSource) Objects

The purpose of an ESSource is described here: ESSource.

Any number of labelled or unlabelled EventSetup objects (ESSources) may be attached to the process block. You can use an unlabelled ESSource unless you plan to include more than one instance of an ESSource.

The unlabelled object (label and C++ class name are the same) is of the form:

SomeClass = cms.ESSource("SomeClass",
    ... # parameter declarations here
)

The labelled object is of the form:

somename = cms.ESSource("SomeClass",
    ... # parameter declarations here
)

In both cases, SomeClass is the name of the C++ class of the ESSource the program will use.

Event Setup Producer (ESProducer) Objects

The purpose of an ESProducer is described here: ESProducer.

Any number of uniquely labelled or unlabelled EventSetup producer objects (ESProducers) may be attached to a Process object. This is non-event data. You can use an unlabelled ESProducer unless you plan to include more than one instance of that same ESProducer.

The unlabelled object (label and C++ class name are the same) is of the form:

SomeClass = cms.ESProducer("SomeClass",
  ... # parameter declarations here
)

The labelled object is of the form:

somename = cms.ESProducer("SomeClass",
  ... # parameter declarations here
)

In both cases, SomeClass is the name of the C++ class of the ESProducer the program will use. The specific parameters required are determined by the specific class used.

Module sequences

Module sequences are used to define an ordered group of modules, giving the group a name that may be used in other sequences, paths, or endpaths in the configuration. Any number of uniquely named sequence definitions may be attached to a Process object. Sequencing is a good organization tool if you have lots of modules to load, or if you want to easily switch in and out groups of modules from the execution path.

The sequence definition is of the form:

somename = cms.Sequence(m1 + m2 + s1)
There can be zero or more operands in the expression that is passed as the argument of the constructor. Usually the operands have the type of a module (EDAnalyzer, EDFilter, EDProducer, OutputModule) or Sequence, but there are other possibilities. The operator can be "+" or "*" and these two operators have exactly the same meaning and behavior in this context. The operators imply the operand on the left is run before the operand on the right. This expression is used to build an ordered sequence of modules and the modules are run in that order. EDFilters can be used to stop execution along the sequence. There are more details describing this path expression syntax here: path syntax.

Here is an example taken from the WorkBookSimDigi topic:

trDigi = cms.Sequence(pixdigi + stripdigi)
calDigi = cms.Sequence(ecaldigi + hcaldigi)
muonDigi = cms.Sequence(muoncscdigi + muondtdigi)
doDigi = cms.Sequence(trDigi + calDigi + muonDigi)
p1 = cms.Path(VtxSmeared * SimG4Object * mix * doDigi)

In releases that support Tasks, the following constructor allows one to associate Tasks to a Sequence.

somename = cms.Sequence(m1 + m2 + s1, task1, task2)
where there are zero or more additional arguments of type Task that either follow or replace the expression containing the ordered sequence of modules to run. Alternately the Sequence class has an associate function,
somename.associate(task1, task2)
where the associate function can take 0 or more arguments of type Task.

With multithreading, the performance of cmsRun is better when EDProducers are put on Tasks and the Tasks associated to Sequences than when EDProducers are placed in the group of ordered modules in a Sequence. Except in rare cases, this is recommended. This is also recommended for EDFilters that produce event data and whose filter results are ignored.

Task Objects

If you want unscheduled execution, you have to put your producers in a Task and associate the Task with a Sequence, Path, EndPath, or Schedule. Task is a feature which was added in release 9_1_0. In releases before this, the cms.Task python class does not exist and cannot be used. It replaces the old way of configuring a module to run unscheduled. The old way will no longer work in release 9_1_0 and later releases.

A Task is defined with the following syntax:

taskName = cms.Task(A, B)
The constructor arguments are comma separated and must be of type EDProducer, EDFilter, Task, ESProducer, ESSource, or Service. The constructor can be called with 0 or more arguments. One can add to a Task using the following syntax.
taskName.add(C, D)
The add function can take zero or more arguments with the same types as allowed by the Task constructor.

After being created Tasks can be attached as attributes to the process, either directly or when the python module containing them is added with the process load function. Zero or more Task objects can be added as an attribute of the process. The only restriction on the process attribute names of Tasks is that they be unique within a process.

A Task can be associated with a Sequence, Path, EndPath, or Schedule. See the sections about those types for the details about the proper syntax to associate a Task to those types. One can associate Task objects by including them as arguments of the constructor or by calling an explicit associate function.

Effect of Tasks on cmsRun

First, one should be aware that placing EDProducers and EDFilters on a Task affects cmsRun in a different way than placing ESProducers, ESSources, and Services on a Task.

Behavior when an EDProducer or EDFilter is on a Task:

To be very brief and ignore some important details, one can explain the behavior with the following sentence: A module on a Task is available to be run in unscheduled mode. The details follow.

For the above to be true, the Task must be either:

  • Associated with the Schedule
  • Associated with a Path or EndPath on the Schedule.
  • Associated with a Path or EndPath and there is no Schedule.

where the association with the Schedule, Path or EndPath could be direct or indirect through one or more Sequences or Tasks.

If an EDProducer or EDFilter is contained on the ordered sequence of modules of at least one Path or EndPath and that Path or EndPath is on the Schedule or there is no Schedule, then the module is not run unscheduled and is run as part of the Path or EndPath. The fact that the module is on a Task is just ignored in that case.

The order modules are added to a Task does not have any effect on the behavior of cmsRun, nor does it matter which objects the Task is associated to or the order of association. It only matters that there exists at least one Task associated to at least one such object.
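
The rules above can be condensed into a small decision function. The following is only a toy model in plain Python, not CMSSW code: the function name run_mode, the dictionary layout, and the result strings are all invented for illustration, and the Task contents are assumed to be already flattened through any Sequences and sub-Tasks.

```python
def run_mode(label, paths, schedule=None, schedule_tasks=()):
    """Classify how an EDProducer or EDFilter would be run (toy model).

    paths:          Path/EndPath name -> (ordered module labels,
                    labels on Tasks associated with that Path/EndPath)
    schedule:       names of Paths/EndPaths on the Schedule, or None
                    when no Schedule is defined
    schedule_tasks: labels on Tasks associated with the Schedule itself
    """
    # Without a Schedule, every Path and EndPath of the process is active.
    active = list(paths) if schedule is None else list(schedule)

    # Being in the ordered sequence of an active Path wins; the Task is ignored.
    if any(label in paths[name][0] for name in active):
        return "path"

    # Otherwise one associated Task is enough; order and count do not matter.
    on_task = label in schedule_tasks or any(
        label in paths[name][1] for name in active
    )
    return "unscheduled" if on_task else "dropped"


paths = {
    "p1": (["filterA"], {"prodB"}),
    "p2": (["prodC"], {"prodD"}),
}
print(run_mode("prodB", paths))                   # unscheduled
print(run_mode("prodD", paths, schedule=["p1"]))  # dropped
```

In the second call prodD is only on a Task associated with p2, and p2 is not on the Schedule, so the module is not available at all.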

Unscheduled mode is described on this TWIKI page: SWGuideUnscheduledExecution.

Behavior when an ESProducer, ESSource, or Service is on a Task:

Note: The initial focus of migrating to use Task has been on EDProducers and EDFilters. At this time (August 2017), I am not aware of any configurations in CMSSW using Tasks for ESProducers, ESSources, or Services other than Core unit tests. This may or may not change in the future ...

There are 3 cases.

Case 1. The ESProducer, ESSource or Service is enabled (constructed in the C++ part of cmsRun and available for use) if it is on a Task that is either:

  • Associated with the Schedule
  • Associated with a Path or EndPath on the Schedule.
  • Associated with a Path or EndPath and there is no Schedule.

where the association with the Schedule, Path or EndPath could be direct or indirect through one or more Sequences or Tasks.

Case 2. The ESProducer, ESSource or Service is enabled if it is not in a Task which is an attribute of the process either directly or indirectly through one or more sub-Tasks.

Case 3. If an ESProducer, ESSource, or Service does not satisfy either case 1 or case 2, then it is disabled.
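
The three cases can be written as a short predicate. This is a plain Python sketch of the rules as stated, not the framework's implementation; the function name es_enabled and its arguments are hypothetical, and each Task is assumed to be given with its sub-Tasks already flattened.

```python
def es_enabled(label, process_tasks, associated):
    """Decide whether an ESProducer, ESSource, or Service is enabled (toy model).

    process_tasks: Task name -> set of labels it contains
    associated:    names of Tasks associated with the Schedule or with an
                   active Path/EndPath, directly or indirectly
    """
    containing = {name for name, labels in process_tasks.items() if label in labels}

    if containing & set(associated):
        return True   # Case 1: on at least one associated Task
    if not containing:
        return True   # Case 2: not on any Task of the process
    return False      # Case 3: only on Tasks that are not associated


tasks = {"t1": {"esA"}, "t2": {"esB"}}
print(es_enabled("esA", tasks, ["t1"]))  # True  (Case 1)
print(es_enabled("esC", tasks, ["t1"]))  # True  (Case 2)
print(es_enabled("esB", tasks, ["t1"]))  # False (Case 3)
```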

One thing to note about this is that when the initial implementation of Task is merged there will be no Tasks that contain an ESProducer, ESSource or Service so none of them will be affected (everything will fall under Case 2). This will make it easier to incrementally migrate to using Tasks to manage ESProducers, ESSources, and Services one at a time.

Similarities Between Tasks and Sequences

In many ways, Tasks have been implemented to behave like Sequences. For example, many of the functions in the Task and Sequence python classes are named and behave similarly (e.g. replace, dumpPython). If one uses a visitor on a Sequence, Path, or EndPath, it will also visit the associated Tasks and one can visit a Task in a way similar to visiting a Sequence. When a module attribute of the process is replaced, the side effects on the Tasks and Sequences are similar. There are other similarities. The design intent was that except where necessary, the behavior of Tasks and Sequences should be the same.

There is another similarity between Sequences and Tasks. In the early stages of a cmsRun job, the python configuration is imported and many manipulations can occur in python. But when that is done, there is a point where the python data structures are converted into C++ classes and passed into the C++ part of cmsRun. During this conversion, both Tasks and Sequences are eliminated entirely. There is no C++ Task type. EDProducers and EDFilters that are not run in the ordered sequence of a Path or EndPath and not run in unscheduled mode are simply not defined at all in the C++ ParameterSet. Similarly, Sequences are expanded away and do not exist in the C++ ParameterSet.

Task Design Comments

There is a python function with the name convertToUnscheduled. When run on a configuration, this function first resolves any SequencePlaceholder objects. Then it removes EDProducers from Paths and EndPaths and places them on a Task that is associated with the Schedule. It does the same thing for EDFilters whose filter result is ignored.

At the time Task was designed, production processes relied heavily on the convertToUnscheduled function to run processes in unscheduled mode. One of the major motivations for the introduction of the Task class was to make it easier to gradually migrate away from convertToUnscheduled by using Tasks. One goal is to eventually eliminate the need for that function entirely. (Before the implementation of Task, unscheduled modules were configured in a different way and the conversion function was also different. It was difficult to combine configuration fragments which used unscheduled mode with configuration fragments that did not and this was making it difficult to incrementally migrate to configurations explicitly using unscheduled mode.)

In the pull request that included the initial implementation of Task, all Tasks outside the Core code were associated with the top level Schedule. This was simply an expedient to move the development forward faster and also because the code was implemented by someone who is not an expert in the PAT code (which was almost the only area whose configurations were explicitly written to run in unscheduled mode prior to that pull request). This was not intended to be a pattern to follow. The intent of the design was that experts will identify which EDProducers are needed with which sequences and associate Tasks with the specific Sequences where they are needed. This will improve the performance and reduce memory usage of cmsRun by allowing the C++ part of it to only construct EDProducers (and other types) that are needed instead of constructing everything defined in the configuration.

EDAlias

EDAlias allows mapping of an EDProducer from its given label (the label the Python process object uses to refer to it) to one or more different module labels which can be used in getByLabel or consumes function calls. It also allows multiple EDProducers to share the same label. This is useful for making multiple jobs with different configurations appear to make the same Event products.

The label assigned to the EDAlias is the new module label that can be used to look up the data item. The parameters accepted by an EDAlias are of the form

   <old module label> = cms.VPSet( cms.PSet( type = cms.string(<friendly class name>),
                                            [fromProductInstance = cms.string(<old product instance name>)],
                                            [toProductInstance = cms.string(<new product instance name>)] ))
Where
  • <old module label> is the EDProducer's label from which the data originally derived.
  • <friendly class name> specifies exactly what class type is to be obtained from the original EDProducer. The name is not the C++ class name and is instead the name used when specifying what data should be stored in the OutputModule. This name also corresponds to the first part of the TBranch name. One can also find the friendly class names of collections available in a job by adding the EventContentAnalyzer to the job.
  • <old product instance name> specifies the product instance name used by the EDProducer when storing the data product. By default EDProducers use an empty string. If fromProductInstance is not specified, the value of <old product instance name> defaults to '*' and therefore matches all data products of type <friendly class name> produced by the EDProducer with module label <old module label>.
  • <new product instance name> specifies a product instance name different from the original one used by the EDProducer. If toProductInstance is not specified, <new product instance name> defaults to the value '*' which means the new name should match the old name.
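
The defaulting rules for fromProductInstance and toProductInstance can be sketched in plain Python. This is only one reading of the rules listed above, not the framework's matching code: the function alias_products is hypothetical, and plain dicts stand in for the cms.PSet entries.

```python
def alias_products(products, entries):
    """Return the (type, instance) pairs visible under the alias label (toy model).

    products: (friendly class name, product instance name) pairs made by
              the original EDProducer
    entries:  dicts standing in for the PSets of the VPSet; 'type' is
              required, the two instance keys are optional
    """
    aliased = []
    for entry in entries:
        from_instance = entry.get("fromProductInstance", "*")  # '*' matches all
        to_instance = entry.get("toProductInstance", "*")      # '*' keeps old name
        for friendly_type, instance in products:
            if friendly_type != entry["type"]:
                continue
            if from_instance != "*" and instance != from_instance:
                continue
            new_instance = instance if to_instance == "*" else to_instance
            aliased.append((friendly_type, new_instance))
    return aliased


made_by_otherbar = [("Bars", ""), ("Bars", "extra"), ("BarExtras", "")]

# Only 'type' given: every Bars product is aliased under its old instance name.
print(alias_products(made_by_otherbar, [{"type": "Bars"}]))

# Select the default instance and rename it.
print(alias_products(
    made_by_otherbar,
    [{"type": "Bars", "fromProductInstance": "", "toProductInstance": "refined"}],
))
```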

Here is an example where we want to be able to get the data product Bars from module label otherbar but using the label bar:

bar = cms.EDAlias( otherbar = cms.VPSet( cms.PSet(type=cms.string('Bars') ) ) )

Say the module otherbar makes two data products of different types, Bars and BarExtras. To be able to get both by the label bar:

bar = cms.EDAlias( otherbar = cms.VPSet( cms.PSet(type=cms.string('Bars') ),
                                         cms.PSet(type=cms.string('BarExtras') ) ) )

If you want to make data coming from two different modules, foo and fee, appear to come from one module, bar, where foo makes data of type Foos and fee makes data of type Fees:

bar = cms.EDAlias( foo = cms.VPSet( cms.PSet(type=cms.string('Foos') ) ),
                   fee = cms.VPSet( cms.PSet(type=cms.string('Fees') ) ) )

If you want to make data coming from two different modules, foo and fee, appear to come from one module, bar, where both modules create the same type, Bars, and both modules use the default product instance name, then you will need to assign at least one of them to a different instance name:

bar = cms.EDAlias( foo = cms.VPSet( cms.PSet(type=cms.string('Bars') ) ),
                   fee = cms.VPSet( cms.PSet(type=cms.string('Bars'), 
                                             fromProductInstance = cms.string(''),
                                             toProductInstance = cms.string('refined') ) ) )

If you want data products of type Fees and Foos, coming from one module, bar, to appear to come from two different modules, then you can use two EDAliases, both of which refer to the same EDProducer:

foo = cms.EDAlias(bar = cms.VPSet( cms.PSet(type=cms.string('Foos'))))
fee = cms.EDAlias(bar = cms.VPSet( cms.PSet(type=cms.string('Fees'))))

Statements

Processing and trigger path (path) statements

Processing and trigger paths are declared via Path objects. Any number of uniquely labelled Path definitions may be attached to a Process object.

The Path definition is of the form:

somename = cms.Path(m1 + m2 + s1)
There can be zero or more operands in the expression that is passed as the argument of the constructor. Usually the operands have the type of a module (EDAnalyzer, EDFilter, EDProducer) or Sequence, but there are other possibilities. Note that it is illegal for an OutputModule to be placed on a Path. The operator can be "+" or "*" and these two operators have exactly the same meaning and behavior in this context. The operators imply the operand on the left is run before the operand on the right. This expression is used to build an ordered sequence of modules and the modules are run in that order. EDFilters can be used to stop execution along the sequence. There are more details describing this path expression syntax here: path syntax.

The following are equivalent:

process.mypath = cms.Path (process.m1*process.m2*process.s1*process.m3)
process.mypath = cms.Path (process.m1+process.m2+process.s1+process.m3)
process.mypath = cms.Path (
    process.m1*
    process.m2*
    process.s1*
    process.m3)
process.mypath = cms.Path(process.m1*process.m2)
process.mypath *= process.s1*process.m3
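
Why the "+" and "*" forms give the same result can be illustrated with a toy class in plain Python. This is not how CMSSW implements Path expressions; the class Step is invented for illustration and only shows two operators sharing one implementation that keeps the left operand ahead of the right one.

```python
class Step:
    """Toy stand-in for a module or Sequence; not a CMSSW class."""

    def __init__(self, *labels):
        self.labels = list(labels)

    def _then(self, other):
        # Left operand runs before right operand: concatenate in order.
        return Step(*self.labels, *other.labels)

    # Both operators share one implementation, so they behave identically.
    __add__ = _then
    __mul__ = _then


m1, m2, s1, m3 = Step("m1"), Step("m2"), Step("s1"), Step("m3")

with_star = m1 * m2 * s1 * m3
with_plus = m1 + m2 + s1 + m3

# In-place form: "*=" falls back to __mul__ when __imul__ is not defined.
built_up = m1 * m2
built_up *= s1 * m3

print(with_star.labels == with_plus.labels == built_up.labels)
```

In this toy, joining is associative, so the higher precedence of "*" over "+" in Python does not change the resulting order when the two operators are mixed.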

In releases that support Tasks, the following constructor allows one to associate Tasks to a Path.

somename = cms.Path(m1 + m2, task1, task2)
where there are zero or more additional arguments of type Task that either follow or replace the expression containing the ordered sequence of modules to run. Alternately the Path class has an associate function,
somename.associate(task1, task2)
where the associate function can take 0 or more arguments of type Task.

With multithreading, the performance of cmsRun is better when EDProducers are put on Tasks and the Tasks associated to Paths than when EDProducers are placed in the group of ordered modules in a Path. Except in rare cases, this is recommended. This is also recommended for EDFilters that produce event data and whose filter results are ignored.

Paths, EndPaths, and Sequences are very similar. The functions to construct them and associate Tasks are identical. What are the differences? Sequences are used only as building blocks and are used to build Paths and EndPaths. Paths have meaning in the output of cmsRun. There is a data object which is automatically produced called TriggerResults that stores the results of the Paths and those results can be used by OutputModules to select events. There are more details about this here: Processing and trigger paths

EndPath statements

An EndPath object is used to define an ordered group of modules which are to run after all Paths have been run. Any number of uniquely labelled EndPath definitions may be attached to a Process object. EndPaths are used mostly for OutputModules and EDAnalyzers (e.g. for PoolOutputModules).

The EndPath definition is of the form:

somename = cms.EndPath(m1 + m2 + s1)
There can be zero or more operands in the expression that is passed as the argument of the constructor. Usually the operands have the type of a module (EDAnalyzer, OutputModule) or Sequence, but there are other possibilities. In rare cases EDFilters or EDProducers are allowed to be on EndPaths, but this is greatly discouraged. The operator can be "+" or "*" and these two operators have exactly the same meaning and behavior in this context. The operators imply the operand on the left is run before the operand on the right. This expression is used to build an ordered sequence of modules and the modules are run in that order. There are more details describing this path expression syntax here: path syntax.

In releases that support Tasks, the following constructor allows one to associate Tasks to an EndPath.

somename = cms.EndPath(m1 + m2, task1, task2)
where there are zero or more additional arguments of type Task that either follow or replace the expression containing the ordered sequence of modules to run. Alternately the EndPath class has an associate function,
somename.associate(task1, task2)
where the associate function can take 0 or more arguments of type Task.

Schedule statements

To define the Paths and EndPaths to be run, you can define a Schedule. If you don't define a Schedule, all Paths and EndPaths that are attributes of the process will be run.

To specify a Schedule, pass any number of Path and EndPath objects to the Schedule constructor separated by commas.

process.schedule = cms.Schedule(process.generation_step,process.out_step)

In releases that support Tasks, the following constructor allows one to associate Tasks to the Schedule.

process.schedule = cms.Schedule(process.path1, process.path2, tasks=[process.task1,process.task2])
There is an additional argument which must be a single python keyword argument with the keyword name 'tasks' as shown above. The value of the keyword argument should be a list of objects of type Task. Instead of a list it will also accept any iterable container of objects of type Task or just a single object of type Task. Alternately, the Schedule class has an associate function
process.schedule.associate(process.task1, process.task2)
where the associate function can take 0 or more arguments of type Task.

The import statement

The standard Python import statement is used to inject the objects from another file into the current file, therefore it can be used to include shared-use configuration fragments. The import statements may be "nested", meaning that a configuration document may import a fragment which itself imports another fragment. There is no limit (other than that caused by memory exhaustion) on the number of levels of inclusion allowed. Here is an example:

import FWCore.Modules.printContent_cfi
In the above example, FWCore is the subsystem, Modules is the package, and printContent_cfi.py is the filename. Note that in the import statement the suffix .py of the filename is left out and directories are separated with a "." instead of "/". This is standard Python. In CMSSW, the environment is configured so that Python will look for imported python files in two places. For example if the release is CMSSW_8_1_0, it will look for
CMSSW_8_1_0/src/FWCore/Modules/python/printContent_cfi.py
Note that the directory named python is not included in the import statement. If and only if it fails to find the file there and if for example the current architecture is slc6_amd64_gcc530, then it will also look for
CMSSW_8_1_0/cfipython/slc6_amd64_gcc530/FWCore/Modules/printContent_cfi.py
Automatically generated cfi files (discussed below) will be found in subdirectories of the cfipython directory.
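
The mapping from a dotted import name to the two file locations can be sketched as follows. This plain Python helper only builds the two path strings described above; it is a hypothetical illustration (the function candidate_files does not exist in CMSSW), and the real lookup is done by Python's import machinery using the search path set up by the CMSSW environment.

```python
import posixpath


def candidate_files(module, release_base, arch):
    """Return the two locations searched for a Subsystem.Package.file import."""
    subsystem, package, *rest = module.split(".")

    # First choice: the package's own python directory in the source tree.
    in_src = posixpath.join(
        release_base, "src", subsystem, package, "python", *rest
    ) + ".py"

    # Fallback: automatically generated files under cfipython/<architecture>.
    in_cfipython = posixpath.join(
        release_base, "cfipython", arch, subsystem, package, *rest
    ) + ".py"

    return [in_src, in_cfipython]


for path in candidate_files(
    "FWCore.Modules.printContent_cfi", "CMSSW_8_1_0", "slc6_amd64_gcc530"
):
    print(path)
```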

Use of import does not guarantee that the objects defined in the python configuration file will actually be used to configure the job. In order to be used to configure a job, the objects must be attached to the Process object which is referenced by the process variable. The attachment can either be done for each object individually by assigning them to attributes of process, e.g.,

import Subsystem.Package.somemodule
process.somename = Subsystem.Package.somemodule.somename
or all the variables of a python module can be attached to a process by using the extend() member function
import Subsystem.Package.somemodule
process.extend(Subsystem.Package.somemodule)
When using 'extend' all the variable names used in the imported module are assigned as the 'labels' for the process' attributes. Because the label assigned to a module comes from the process attribute's name assigned to the object and not from the original variable to which the object was assigned, the individual object assignment method allows one to 'relabel' a module to be different from the imported file
import Subsystem.Package.somemodule
process.someothername = Subsystem.Package.somemodule.somename

There are several alternative syntaxes for the import statement depending on how you wish to access the variables defined:

#must use full name to get to a variable
import Subsystem.package.filename
v = Subsystem.package.filename.variable

#must use short name to get to a variable
import Subsystem.package.filename as shortname
v = shortname.variable

#variable is added directly to this file
from Subsystem.package.filename import variable
p = variable

#all variables are added directly to this file
#deprecated by some because it might bring in too much
#but this is common in CMS configurations
from Subsystem.package.filename import *
p = variable

# Important convenience method to insert every cms object in the module into a process
# In the following example, the module is declared in the file 
#     Configuration/StandardSequences/python/Analysis_cff.py
# Same as import followed by process.extend
process.load("Configuration.StandardSequences.Analysis_cff")

An important detail to add is that "import *" and the process load and extend functions do not import or attach variables that start with underscore "_". If you do not want a variable to be imported or attached to the process, then you can start its name with an underscore and prevent that unless the name is explicitly used.
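
The underscore rule for "import *" is standard Python behavior and can be checked without CMSSW. The sketch below builds a throwaway module in memory (the name toy_fragment_cff and its two variables are made up) and shows that only the name without a leading underscore is brought in. It demonstrates the Python import statement only; the process load and extend functions are not exercised here.

```python
import sys
import types

# Build a stand-in for a configuration fragment and register it so that
# the import statement can find it.
fragment = types.ModuleType("toy_fragment_cff")
exec("_template = 'hidden helper'\nvisible = 'public object'", fragment.__dict__)
sys.modules["toy_fragment_cff"] = fragment

# "import *" skips names that start with an underscore.
namespace = {}
exec("from toy_fragment_cff import *", namespace)
imported = sorted(name for name in namespace if name != "__builtins__")
print(imported)

# Naming the variable explicitly still works.
exec("from toy_fragment_cff import _template", namespace)
print(namespace["_template"])
```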

Blocking imports

Occasionally, a cfi.py or cff.py file may wish to declare itself "unimportable" except by certain files. As of CMSSW 3_1_2, this is possible by including the following statements at the start of the imported file:

import FWCore.ParameterSet.Config as cms

cms.checkImportPermission(allowedPatterns = ['Module','Module/Submodule'])

where the list can include any number of modules and/or submodules which are allowed to import the file. A user's top-level _cfg.py is allowed, by default, to import such a file. This behavior can be changed by setting the value of minLevel = 1 as another parameter to checkImportPermission()

Including Standard Module definitions

Every configuration has a top level file called the cfg file. By convention its filename ends with the extension _cfg.py. This file may define the entire configuration or it may include other configuration files by importing them.

Configuration File Initializer files (cfi files) are used to initialize modules with all of their default parameter settings. Each module is supposed to have one, with the module label the first part of its filename and the extension _cfi.py as the rest of the filename. This file is intended for use in end-users' cfg files; the user can then reset only the parameters that require non-default values. cfi files may also be imported into cff files.

Configuration File Fragment files (cff files) are like cfi files, but are not associated with modules one-for-one. They are used to contain (or import) pieces of a configuration which may be either larger than cfi files (e.g. import multiple cfi files and define Sequences and other objects) or smaller than cfi files (e.g. contain just one or a few parameter specifications or a PSet). These files may also be included in cfg files. The file extension is _cff.py.

For Component developers

The developer of a module should make sure that a cfi file is available which can be imported. There are two ways to do this. The recommended method is to add a function to the C++ code of the module called fillDescriptions. After this has been created, when scram build is run a cfi file will be automatically generated. There are several advantages to creating the cfi file this way. First, the cfi is automatically kept consistent with the C++ code. There is no need to remember to change the cfi file and less chance for an error to create an inconsistency. The fillDescriptions function is used to validate configurations. This validation can report mistakes, for example if a parameter name was spelled incorrectly. There is also an executable that will provide information about the possible parameters of modules which contain a fillDescriptions function, for example edmPluginHelp -p EventContentAnalyzer or for brief output edmPluginHelp -b -p EventContentAnalyzer. This is described in detail here: SWGuideConfigurationValidationAndHelp.

Alternatively, one can manually create a cfi file and put it in the python subdirectory of a package. This is the same directory where cff files are located. cff files must be manually created. (Note: You do not need to manually create a cfi file if you implemented the fillDescriptions function and one is automatically generated. If both are created, the manually created one will be used when an import occurs.)

See above for additional information about importing files and file locations.

cfi files should be named

<moduleLabel>_cfi.py
Developers of ESSources and other components which have no module labels should use the full class name as the first part of the cfi file name.

For example, the FWCore/Modules package has a file named:

FWCore/Modules/python/printContent_cfi.py
which contains:
import FWCore.ParameterSet.Config as cms

#print what data items are available in the Event
printContent = cms.EDAnalyzer("EventContentAnalyzer",
    #should we print data? (sets to 'true' if verboseForModuleLabels has entries)
    verbose = cms.untracked.bool(False),
    #how much to indent when printing verbosely
    verboseIndentation = cms.untracked.string('  '),
    #string used at the beginning of all output of this module
    indentation = cms.untracked.string('++'),
    #data from which modules to print (all if empty)
    verboseForModuleLabels = cms.untracked.vstring(),
    # which data from which module should we get without printing
    getDataForModuleLabels = cms.untracked.vstring(),
    #should we get data? (sets to 'true' if getDataForModuleLabels has entries)
    getData = cms.untracked.bool(False)
)

For Component Users

Below is an example of a top level configuration that does not use a cfi. This is an actual working example that you might find useful. It will print out information about the content of a file named "test.root" (you can change the filename if you would like). This is a nice way to write a very small test configuration. It is particularly nice if you planned to manually edit the parameters of the printContent module. But this would be a horrible approach that would be extremely difficult to develop or maintain for complex configurations like the ones used in event generation, simulation, or reconstruction.

import FWCore.ParameterSet.Config as cms
process = cms.Process("TEST")
process.source = cms.Source("PoolSource",
    fileNames = cms.untracked.vstring(
        'file:test.root'
    )
)
process.maxEvents = cms.untracked.PSet(
    input = cms.untracked.int32(1)
)
process.printContent = cms.EDAnalyzer('EventContentAnalyzer',
  indentation = cms.untracked.string('++'),
  verbose = cms.untracked.bool(False),
  verboseIndentation = cms.untracked.string('  '),
  verboseForModuleLabels = cms.untracked.vstring(),
  getData = cms.untracked.bool(False),
  getDataForModuleLabels = cms.untracked.vstring(),
  listContent = cms.untracked.bool(True)
)
process.path = cms.Path(process.printContent)
Below is an equivalent cfg file that imports a cfi file. Notice that the parameters do not need to be repeated in the top level configuration. The default values from the cfi file are used.
import FWCore.ParameterSet.Config as cms
from FWCore.Modules.printContent_cfi import printContent
process = cms.Process("TEST")
process.source = cms.Source("PoolSource",
    fileNames = cms.untracked.vstring(
        'file:test.root'
    )
)
process.maxEvents = cms.untracked.PSet(
    input = cms.untracked.int32(1)
)
process.printContent = printContent
process.path = cms.Path(process.printContent)

Modifying Parameters

There are many ways to modify parameters, and which is best depends on the situation. In this section, we discuss some of the situations and describe each.

First consider the very simple test configuration presented at the end of the preceding section. For example, if you wanted to change the text string used for indentation from '++' to '**' you could do this:

...
from FWCore.Modules.printContent_cfi import printContent
...
process.printContent = printContent
process.printContent.indentation = '**'
...

You do not need to specify the type of the parameter again because that was already done in the cfi file.

Replacement of more than one parameter is allowed. For example,

from Configuration.Generator.SingleElectronPt10_pythia8_cfi import generator
generator.PGunParameters.MinPt = 5.0
generator.PGunParameters.MaxEta = 50.0
generator.PGunParameters.ParticleID = ( 211, 11 )

Multiple values in an object with multiple parameters can be modified via the use of a python `dict`

from Configuration.Generator.SingleElectronPt10_pythia8_cfi import generator
generator.PGunParameters = dict(MinPt = 5.0, 
                                MaxEta = 50.0, 
                                ParticleID = ( 211, 11 ) )
If there are additional parameters in the object which are not in the `dict` those parameters are not modified.

Warning: If you wish to modify a vector and set it equal to a single-element array, then you must insert a comma after that element:

generator.PGunParameters.ParticleID = ( 211, )
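
The reason for the trailing comma is plain Python syntax: parentheses alone only group an expression, while the comma is what creates a tuple. A short check that needs no CMSSW installation:

```python
without_comma = (211)   # just the integer 211 in parentheses
with_comma = (211,)     # a tuple holding one element

print(type(without_comma).__name__)
print(type(with_comma).__name__, len(with_comma))
```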

You may want to replace an entire ParameterSet:

generator.PGunParameters = cms.PSet(
    MaxPt = cms.double(100.01),
    MinPt = cms.double(9.99),
    ParticleID = cms.vint32(11),
    AddAntiParticle = cms.bool(True),
    MaxEta = cms.double(2.5),
    MaxPhi = cms.double(3.14159265359),
    MinEta = cms.double(-2.5),
    MinPhi = cms.double(-3.14159265359) ## in radians
)

All vector style parameters behave like normal Python lists and therefore allow append, extend and other list manipulations.

The syntax is as follows:

a.composers.append("Beethoven")
a.painters.extend( ("Picasso", "da Vinci"))

Cloning is strongly recommended when you are developing a cff file that might be used as part of a large complicated configuration. While creating the clone, one or more parameters can be modified.

The standard syntax for cloning is

from aPackage import oldName 
newName = oldName.clone (changedParameter = 42)

or

from aPackage import oldName as _oldName 
newName = _oldName.clone (changedParameter = 42)

The second form is better if the symbol oldName is not needed and this occurs in a fragment that might be imported with the process load function or a "from aModule import *" statement. Symbols starting with an underscore are not imported in these cases.

Cloning is important because the module object brought in by an import statement is always the same object. If the same file is imported in different cff files, the module object referenced is the same object. So if one cff file changes a parameter value, then all the modules referenced in all the cff files are affected. This is very bad. But when a clone is made, the module objects and parameters inside them are distinct objects. When a parameter value is changed it only affects one instance of the module. In large complicated configurations, this is very important.
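
The difference between sharing an imported object and cloning it can be shown with a stand-in class in plain Python. ToyModule and its clone method are invented for this sketch and use copy.deepcopy; they only mimic the behavior described above and are not the CMSSW implementation of clone.

```python
import copy


class ToyModule:
    """Stand-in for a configured module; not the CMSSW implementation."""

    def __init__(self, **params):
        self.params = dict(params)

    def clone(self, **changes):
        # Deep-copy first so the new object shares nothing with the original.
        duplicate = copy.deepcopy(self)
        duplicate.params.update(changes)
        return duplicate


# Two fragments that import the same object hold the same reference.
imported_in_a = ToyModule(threshold=10)
imported_in_b = imported_in_a
imported_in_a.params["threshold"] = 99            # fragment A edits in place
leaked_value = imported_in_b.params["threshold"]  # fragment B sees the edit
print(leaked_value)

# With a clone, the change stays local to the new object.
original = ToyModule(threshold=10)
tuned = original.clone(threshold=42)
print(original.params["threshold"], tuned.params["threshold"])
```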

There are different operating scenarios. There are many parameters that must have different values in different operating scenarios. These kinds of parameter modifications are handled by "Eras" which are described here: SWGuideCmsDriverEras. "Eras" should be used where appropriate instead of other parameter modification schemes. Before the "Eras" system was created, customization functions were used. These were Python functions that made large scale modifications to configurations to accommodate different situations. Some still exist and are used, but for official production configurations there is an effort to replace these with "Eras" or some other mechanism. The customization functions tended to interfere with each other when there was more than one, and also made the configuration hard to understand and maintain.

Modifying Sequences and Paths

Sequences and paths have a "replace()" command, which lets you replace all occurrences of a module or a sequence within a sequence or path with something else.

mySequence.replace(oldModule, newModule)
myPath.replace(oldSequence, newSequence)
myPath.replace(oldModule, newSequence)

If you wish to replace a module or a sequence in a process, and have the change propagated through to all sequences, paths, and endpaths which use that object, you should use

process.globalReplace("label", newObject)  # note the quotes!!

Warning: When replacing sequences or tasks you should use this with care if there are multiple calls to this or other similar functions that modify configurations. Some of these modifications can cause sequences and tasks to be expanded such that the sequence or task no longer exists (although its contents are preserved in the object that contained the sequence or task). Trying to replace one of these expanded sequences or tasks will not work.

Passing Command Line Arguments Through cmsRun

Starting in CMSSW_3_0_0, you can pass command-line arguments through to the Python system. For example, if you start your job via this command:

> cmsRun -m grid my_cfg.py myOpt myValue=True

Then, from anywhere in the Python configuration system, you could do:

import sys
print(sys.argv)

and see

['cmsRun', '-m', 'grid', 'my_cfg.py', 'myOpt', 'myValue=True']

Arguments beginning with '-' or '--' are handled by the C++ options parser in cmsRun, and other options which come after the Python file are used in Python.
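
As a rough illustration of this split, the following sketch separates an argument list like the one above into the part meant for cmsRun and the part left for the Python configuration. The helper split_args is hypothetical (it is not part of cmsRun or CMSSW), and it assumes the configuration file is the first argument ending in .py.

```python
def split_args(argv):
    """Split argv into (cmsRun part, config file, Python-side arguments).

    Hypothetical helper for illustration; assumes the configuration file is
    the first argument whose name ends in '.py'.
    """
    for i, arg in enumerate(argv):
        if arg.endswith(".py"):
            return argv[:i], arg, argv[i + 1:]
    # No configuration file found: nothing is left for the Python side.
    return argv, None, []

argv = ['cmsRun', '-m', 'grid', 'my_cfg.py', 'myOpt', 'myValue=True']
cmsrun_part, config, python_args = split_args(argv)
print(cmsrun_part)   # ['cmsRun', '-m', 'grid']
print(config)        # my_cfg.py
print(python_args)   # ['myOpt', 'myValue=True']
```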

Additionally, you can use the FWCore.ParameterSet.VarParsing module to handle the parsing of the command line arguments for you (see the next section for documentation).

VarParsing Example

Here's a quick example of how to use the VarParsing module:

import FWCore.ParameterSet.Config as cms
import FWCore.ParameterSet.VarParsing as VarParsing

# set up process
process = cms.Process("StarterKit")

# setup 'analysis'  options
options = VarParsing.VarParsing ('analysis')

# setup any defaults you want
options.outputFile = '/uscms/home/cplager/nobackup/outputFiles/try_3.root'
options.inputFiles= 'file1.root', 'file2.root'
options.maxEvents = -1 # -1 means all events

# get and parse the command line arguments
options.parseArguments()

# Use the options

process.source = cms.Source ("PoolSource",
                             fileNames      = cms.untracked.vstring (options.inputFiles),
                             debugVerbosity = cms.untracked.uint32(200),
                             debugFlag      = cms.untracked.bool(True),
                             )
...

process.maxEvents = cms.untracked.PSet(
    input = cms.untracked.int32 (options.maxEvents)
)

# talk to output module
process.out = cms.OutputModule("PoolOutputModule",
                               process.patEventSelection,
                               process.patEventContent,
                               verbose = cms.untracked.bool(False),
                               fileName = cms.untracked.string (options.outputFile)
                               )

Here's how I might call it:

unix> cmsRun Reco2Pat1_withVarParsingExample_cfg.py print \
inputFiles=one.root inputFiles=two.root inputFiles_load=Zjets_2_2_3.list >& output &

Here, inputFiles=one.root inputFiles=two.root appends one.root and two.root to the list of input files. inputFiles_load=Zjets_2_2_3.list reads the file Zjets_2_2_3.list and appends every file listed there to the list of input files as well. print prints the current values of all user-accessible variables (very useful for log files).

Reco2Pat1_withVarParsing_example_cfg.py.txt is a complete example configuration script.

VarParsing Documentation

The idea behind the VarParsing package is simple: make it easy to set up variables that can be changed from the command line. When hooking up a variable, you need to tell the module certain information about the object:

  • name of object
  • default value of object
  • whether the object is a single value (VarParsing.VarParsing.multiplicity.singleton) or a list (VarParsing.VarParsing.multiplicity.list)
  • whether the object is a string (VarParsing.VarParsing.varType.string), integer (VarParsing.VarParsing.varType.int), or float (VarParsing.VarParsing.varType.float)
    • use integer for boolean flags (1 = True, 0 = False)
  • information string on object (shown when user types 'help' or 'print' from command line)

Note: Instead of typing VarParsing.VarParsing., you can simply type options. where options is the name of the VarParsing object you created.

Here's an example of hooking up an integer called someInt:

options.register ('someInt',
                  -1, # default value
                  VarParsing.VarParsing.multiplicity.singleton, # singleton or list
                  VarParsing.VarParsing.varType.int,          # string, int, or float
                  "Number of events to process (-1 for all)")

If you pass in the string 'analysis' when you create a VarParsing instance, you will get the following objects by default:

Analysis Options:

Option            Purpose
maxEvents         Number of events to process (singleton integer)
inputFiles        List of files to process as input (list string)
secondaryFiles    List of secondary files (if needed; list string)
outputFile        Name of output file (singleton string)
secondaryOutput   Name of secondary output (if needed; singleton string)

To set another default value, you would write:

options.someInt = 4

If you have a list variable, you can load default values as follows:

options.someList = 'one.txt', 'two.txt'
options.someList = 'three.txt'
options.loadFromFile ('someList', 'nameOfFiles.list')

After this, options.someList will contain 'one.txt', 'two.txt', 'three.txt', and any files listed in 'nameOfFiles.list'.

Note that the format of 'nameOfFiles.list' text file is the same as the format that DBS will give to you. Comments (or lines you want to temporarily exclude) starting with the hash character (#) can be embedded in the text file and will be ignored.
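
To make the file format concrete, here is a minimal sketch of reading such a list in plain Python. The function read_file_list is hypothetical and only mimics the behaviour described above (one entry per line, lines starting with # skipped). Skipping blank lines is an additional assumption, and this is not the VarParsing implementation.

```python
def read_file_list(lines):
    """Return the entries of a file list, skipping comments.

    Hypothetical helper: one entry per line, lines starting with '#' are
    ignored. Skipping blank lines is an assumption made here.
    """
    entries = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        entries.append(line)
    return entries

example = """\
# files for the first test
one.root
#two.root   (temporarily excluded)

three.root
"""
print(read_file_list(example.splitlines()))   # ['one.root', 'three.root']
```

Because the function accepts any iterable of lines, an open file object can be passed to it directly.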

Command Line Options

To assign a variable defined above from the command line, you simply use variable=value syntax.

If the variable (object) is a list, you have several options:

  • You can repeat variable=value many times to load different values into the list
  • You can type variable_clear to clear the variable list of all entries
  • You can type variable_load=nameOfFiles.list to load all files listed in nameOfFiles.list into the variable list

Finally, there are two other commands you can put on the command line:

  • print will print the current values of all of the settable variables
  • help will do the same as print, but also display possible commands and will then exit.
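
The assignment rules for singletons and lists can be sketched in a few lines of plain Python. The function apply_arguments and its dictionaries are hypothetical and are not how VarParsing is implemented. Values are kept as strings, and the print, help and variable_load commands are deliberately left out.

```python
def apply_arguments(args, singles, lists):
    """Apply 'name=value' style arguments to option dictionaries.

    Hypothetical sketch, not the VarParsing implementation. 'singles' maps
    singleton names to values, 'lists' maps list names to Python lists.
    """
    for arg in args:
        # 'name_clear' empties a list variable.
        if arg.endswith("_clear") and arg[:-len("_clear")] in lists:
            lists[arg[:-len("_clear")]].clear()
            continue
        name, sep, value = arg.partition("=")
        if not sep:
            raise ValueError("cannot interpret argument: " + arg)
        if name in lists:
            lists[name].append(value)      # repeated use appends
        elif name in singles:
            singles[name] = value          # a later value overwrites
        else:
            raise ValueError("unknown option: " + name)

singles = {"outputFile": "out.root"}
lists = {"inputFiles": ["default.root"]}
apply_arguments(
    ["inputFiles_clear", "inputFiles=one.root", "inputFiles=two.root",
     "outputFile=new.root"],
    singles, lists)
print(lists["inputFiles"])    # ['one.root', 'two.root']
print(singles["outputFile"])  # new.root
```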

IMPORTANT: The configuration script MUST end with .py. If it does not, VarParsing will not know where to start looking at the commands and will assume that everything is for cmsRun.

VarParsing Tags

VarParsing has the ability to append tags to the name of the output file given the values of the command line options. This can be very useful, for example, when running different reconstruction algorithms, so that 1) it is obvious which file goes with which options and 2) you don't accidentally overwrite files you want to compare with each other.

options.setupTags (tag = 'someBool',
                   ifCond = 'someBool > 0')
options.setupTags (tag = 'someInt%d',
                   ifCond = 'someInt > 0',
                   tagArg = 'someInt')

The first case sets up a tag if the user variable someBool is greater than 0. The second case will set up a tag if someInt is greater than 0, and then it will embed the value into the tag ('%d').
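
The idea of conditional tags can be sketched in plain Python as follows. The function tagged_name is hypothetical and is not the VarParsing implementation. In particular, joining tags with an underscore and placing them before the file extension are assumptions made for this example.

```python
import os

def tagged_name(base, values, tags):
    """Append tags to a file name when their condition holds.

    Hypothetical sketch. Each entry of 'tags' is (tag, condition, tag_arg);
    'condition' is a Python expression evaluated with the option values as
    variables. Joining tags with '_' before the extension is an assumption.
    """
    stem, ext = os.path.splitext(base)
    for tag, condition, tag_arg in tags:
        # eval is acceptable here only because the expressions are our own.
        if eval(condition, {"__builtins__": {}}, dict(values)):
            if tag_arg is not None:
                tag = tag % values[tag_arg]   # embed the value, e.g. '%d'
            stem += "_" + tag
    return stem + ext

tags = [
    ("someBool", "someBool > 0", None),
    ("someInt%d", "someInt > 0", "someInt"),
]
print(tagged_name("out.root", {"someBool": 1, "someInt": 3}, tags))
# out_someBool_someInt3.root
print(tagged_name("out.root", {"someBool": 0, "someInt": 0}, tags))
# out.root
```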

An annotated example

Here we present an annotated sample configuration program that demonstrates many of the features found in a typical configuration program. The details of the file and the configuration objects are explained in the previous sections.

# Python is very picky about leading spaces on a line

# Comments (obviously!) are introduced by a sharp ('#'), and continue
# to the end of line.

#Bring in the CMS configuration python class and function definitions
import FWCore.ParameterSet.Config as cms

process = cms.Process("TEST")

# Each process must have a source attribute.
# As the first argument of the constructor, one must specify the
# name of the class to be instantiated. This name is the same name
# that the plug-in was associated with, generally done in the
# class's implementation file. This is usually the name of the
# C++ class, stripped of any namespace specification.

process.source = cms.Source("PoolSource",
    fileNames = cms.untracked.vstring('file:somefilename.root')
)

process.a = cms.EDProducer("AProducer",
    a = cms.int32(32),
    b = cms.vdouble( 1.1, 2.2 ),
    c = cms.vstring( ),            # empty vectors are accepted
    d = cms.vstring( 'boo', "yah" )
)

# Configuration file fragments may be included by naming 
# the file to be included...  
# The 'import' statement below shows the syntax.

import SomeSubsystem.SomePackage.partial_configuration_cff as _partial
process.b = cms.EDProducer("BProducer",
    a = cms.untracked.int32(14),
    b = cms.string('sillyness ensues'),
    c = cms.PSet(
        a = cms.string('nested')
    )
).extend(_partial)

process.y = cms.OutputModule("PoolOutputModule",
    fileName = cms.untracked.string('myfile_y.root')
)

process.z = cms.OutputModule("PoolOutputModule",
    fileName = cms.untracked.string('myfile_z.root')
)

process.s1 = cms.Sequence( process.a+process.b )
process.s2 =  cms.Sequence( process.b )
process.s3 =  cms.Sequence( process.a )
# It is not an error for two sequences (here, s3 and s4) to be identical.
process.s4 =  cms.Sequence( process.a )

process.p1 = cms.Path(process.a+process.b)
process.p2 = cms.Path(process.s1 * (process.s3+process.s2) )
  
process.ep = cms.EndPath( process.y + process.z )

Historical Notes

  • Prior to the year 2008, CMS did not use the Python language for its configurations. CMS had developed its own language for its configuration files. We have tried to completely remove that old language from the code repository and documentation, but one still runs across it from time to time. Most often it is found in old documentation that needs to be updated or deleted. There are significant similarities between the old configuration language and the current Python configuration files, but the old language is formatted differently and will fail if you try to run it.

  • Two operators are allowed to be used when building the module sequences in paths, endpaths, and sequences: '+' and '*'. There is currently no difference in their behavior. A very long time ago, these operators were used to express dependences between modules and those dependences had to be consistent on all paths. And there were error checks and exceptions thrown if those dependences were not consistent on all paths. But this scheme for checking errors and consistency was too difficult to maintain in the configurations and was removed a long time ago. The fact that two operators can still be used in these expressions is now just a historical remnant that we keep for backward compatibility reasons.

Review status

Reviewer/Editor and Date (copy from screen) Comments
-- ChrisDJones - 01 May 2007 translate config language page to python equivalent
-- JennyWilliams - 08 Aug 2007 moved page from workbook into SWGuide and put more basic introduction into the workbook
-- ChristopherJones - 19 Aug 2008 removed .data. from all import statements to match present day usage
-- CharlesPlager - 09 Jan 2009 Added information on VarParsing.py and command line arguments to configuration scripts
-- WilliamTanenbaum - 27 Sep 2010 Allow specification of luminosity block numbers in EventID and EventRange
-- StefanoBelforte - 17 Sep 2014 fix reference to SWGuideEDMPathsAndTriggerBits to use CMSPublic twiki
-- DavidDagenhart - 21 Dec 2016 Added parts about Task, Reviewed all the rest (except VarParsing parts), Many Updates and Fixes

Responsible: ChrisDJones, Liz Sexton-Kennedy
Last reviewed by: DavidDagenhart - 21 Dec 2016

Topic attachments
Attachment: Reco2Pat1_withVarParsing_example_cfg.py.txt (r1, 3.1 K, 2009-01-09 20:06, CharlesPlager): Example configuration file using VarParsing module
Topic revision: r86 - 2019-06-20 - ChrisDJones