Quick Tips for the Python Configuration System
Don't hesitate to add your own tips!
A note on using these "tricks" with CRAB
Using Python as the configuration language lets you use programming-language constructs to manipulate the process object, which is just a large static variable that configures cmsRun. You can therefore use some of these tricks with tools like CRAB. With CRAB, however, the configuration file sent to the worker node is only the static object that results from running the Python; no executable statements remain. In other words, if you do something like this:
import os
# os.environ is a mapping, not a function: use .get() or [] to read it
if os.environ.get('VAR1'):
    process.someVariable = os.environ['VAR2']
else:
    process.someVariable = os.environ['VAR3']
the value of someVariable will depend on the environment variables VAR1, VAR2 and VAR3 on your computer, not on the machine where your job actually runs.
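A minimal sketch of this behaviour in plain Python, with no CMSSW needed. The helper name pick_value and the two dictionaries are hypothetical stand-ins for os.environ on the submitting machine and on the worker node; the point is only that the branch is evaluated once, when the configuration is built.

```python
import os

def pick_value(environ=os.environ):
    """Return the value that would be frozen into the static configuration."""
    # .get() returns None for an unset variable instead of raising KeyError.
    if environ.get('VAR1'):
        return environ.get('VAR2')
    return environ.get('VAR3')

# Hypothetical environments: the machine where the config is expanded,
# and the worker node where the job later runs.
submit_host = {'VAR1': '1', 'VAR2': 'local_value', 'VAR3': 'fallback'}
worker_node = {'VAR3': 'worker_value'}

# The choice is made once, on the submitting machine...
frozen = pick_value(submit_host)
# ...and the worker node's environment never enters into it.
print(frozen)
```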
Browsing the python configuration
Are you getting lost in the Python configuration files? Use this wonderful tool: SWGuideConfigBrowser
Validating python configuration files
To check a configuration file for errors, run python on it directly. This is much faster than starting cmsRun:
python myscript_cfg.py
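If only a syntax check is wanted, without executing the file or importing anything from CMSSW, the built-in compile() function can be used. This is a sketch: the helper name syntax_error_in is hypothetical, and a clean result only means that Python can parse the file, not that the configuration itself is valid.

```python
def syntax_error_in(path):
    """Return the SyntaxError found in the file at `path`, or None if it parses."""
    with open(path) as handle:
        source = handle.read()
    try:
        # Parse and byte-compile the source without executing it.
        compile(source, path, 'exec')
    except SyntaxError as err:
        return err
    return None
```

For example, syntax_error_in('myscript_cfg.py') returns None for a file that parses cleanly.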
Creating the name of the output root file automatically
The following piece of code generates the name of the output file from the name of the input file:
inFile = 'aod.root'
process.source = cms.Source("PoolSource",
    fileNames = cms.untracked.vstring('file:%s' % inFile)
)
outFile = "output_%s" % (inFile)
# more parameters can be added.
# if myParam is an integer:
# outFile = "output_%d_%s" % (myParam, inFile)
process.out = cms.OutputModule(
    "PoolOutputModule",
    fileName = cms.untracked.string(outFile)
)
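When the input is given with a directory or a file: prefix, the simple formatting above would carry that path into the output name as well. A small sketch of one way to handle this; the helper name make_output_name and its prefix argument are hypothetical.

```python
import os

def make_output_name(in_file, prefix="output"):
    """Build an output file name from the base name of the input file."""
    # Drop an optional 'file:' scheme, then any leading directories.
    if in_file.startswith("file:"):
        in_file = in_file[len("file:"):]
    return "%s_%s" % (prefix, os.path.basename(in_file))

print(make_output_name("file:/data/run1/aod.root"))
```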
Expanding the whole configuration to a single python file
Use the following script:
#! /usr/bin/env python
from optparse import OptionParser
import sys
import os
import imp
parser = OptionParser()
parser.usage = "%prog <file> : expand this python configuration"
(options, args) = parser.parse_args()
# exactly one argument is expected: the configuration file
if len(args) != 1:
    parser.print_help()
    sys.exit(1)
filename = args[0]
# load the configuration file as a module named "pycfg"
handle = open(filename, 'r')
cfo = imp.load_source("pycfg", filename, handle)
cmsProcess = cfo.process
handle.close()
# print the fully expanded configuration
print(cmsProcess.dumpPython())
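The imp module used above was removed in Python 3.12. As a sketch of an equivalent loader for newer Python versions, importlib can load a file as a module. The function name load_config is hypothetical, and the file name is assumed to end in .py.

```python
import importlib.util

def load_config(filename, module_name="pycfg"):
    """Load the Python file `filename` as a module and return it."""
    spec = importlib.util.spec_from_file_location(module_name, filename)
    module = importlib.util.module_from_spec(spec)
    # Execute the file's top-level code inside the new module object.
    spec.loader.exec_module(module)
    return module
```

With this helper, the expansion step would read print(load_config(filename).process.dumpPython()).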
Running on more than 255 files
The easiest way around the limit described below is to use the new Command Line Arguments Through cmsRun, using VarParsing.
Python functions have a limit of 255 arguments (the limit was removed in Python 3.7), so very long vstrings cannot be created in one step. This comes up most often when people want long lists of files. The solution is simple. Just extend a vstring as many times as needed:
readFiles = cms.untracked.vstring()
readFiles.extend(['file1.root', 'file2.root', ..... 'file255.root'])
readFiles.extend(['file256.root', .......])
process.source = cms.Source('PoolSource', fileNames = readFiles, ... )
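When the file names are already in a Python list, the repeated extend calls can be generated in a loop. The sketch below is plain Python: the helper name chunks is hypothetical, and an ordinary list stands in for cms.untracked.vstring() so that the example runs without CMSSW.

```python
def chunks(items, size=255):
    """Yield successive slices of `items`, each with at most `size` entries."""
    for start in range(0, len(items), size):
        yield items[start:start + size]

# Hypothetical list of 600 input files.
file_names = ['file%d.root' % i for i in range(1, 601)]

# Stand-in for cms.untracked.vstring(); a plain list has the same extend().
read_files = []
for chunk in chunks(file_names):
    read_files.extend(chunk)
```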
A more compact alternative is to pass the list of arguments in a tuple, and tell python to expand it in place:
process.source = cms.Source('PoolSource',
    fileNames = cms.untracked.vstring( *(
        'file1.root',
        'file2.root',
        ...
        'file255.root',
        'file256.root',
        ...
    ) )
)
You can also initialize a Python list or vstring from a text file using FWCore.Utilities.FileUtils and FWCore.Python (tag V00-00-00):
import FWCore.Utilities.FileUtils as FileUtils
mylist = FileUtils.loadListFromFile ('fileWithInfoYouWant.txt')
mylist.extend ( FileUtils.loadListFromFile ('moreInfoIwant.txt') )
readFiles = cms.untracked.vstring( *mylist)
The format of fileWithInfoYouWant.txt should be one file per line, compatible with the format that DBS gives you. Comments (or lines you want to temporarily exclude) starting with the hash character (#) can be embedded in the text file and will be ignored. Finally, note that there is no 255 limit on the number of files you can have in a single text file.
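To make the expected file format concrete, here is a sketch of a parser for such a list. It is not the FileUtils implementation: the function name parse_file_list and the example paths are hypothetical, and it assumes that only whole lines starting with # are comments.

```python
def parse_file_list(lines):
    """Return the non-empty, non-comment entries from an iterable of lines."""
    entries = []
    for line in lines:
        entry = line.strip()
        # Skip blank lines and lines commented out with '#'.
        if not entry or entry.startswith('#'):
            continue
        entries.append(entry)
    return entries

# Hypothetical file content, one entry per line.
example = [
    "# files from the first run\n",
    "/store/data/file1.root\n",
    "#/store/data/file2.root\n",
    "\n",
    "/store/data/file3.root\n",
]
print(parse_file_list(example))
```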
Running cmsRun from the script itself
At the beginning of the config file, add:
import os
At the end of the config file, after the process is fully constructed, add several lines like this:
outFile = open("tmpConfig.py","w")
outFile.write("import FWCore.ParameterSet.Config as cms\n")
outFile.write(process.dumpPython())
outFile.close()
os.system("cmsRun tmpConfig.py")
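A variant of the same idea, sketched with a with block so that the file is closed before cmsRun starts, and with subprocess.call so that the exit code is returned to the caller. The function names write_config and run_cms_run are hypothetical; dumped_text stands for the string returned by process.dumpPython().

```python
import subprocess

def write_config(dumped_text, path="tmpConfig.py"):
    """Write the expanded configuration to `path` and return the path."""
    with open(path, "w") as out_file:
        out_file.write("import FWCore.ParameterSet.Config as cms\n")
        out_file.write(dumped_text)
    return path

def run_cms_run(path):
    """Run cmsRun on `path` and return its exit code (cmsRun must be on PATH)."""
    return subprocess.call(["cmsRun", path])
```

At the end of a configuration file this would be used as run_cms_run(write_config(process.dumpPython())).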
Defining variables
It is possible to define variables to ease the maintenance of the configuration files.
For example, instead of:
process.allLayer1Taus.tauSource = cms.InputTag( "pfTaus" )
process.tauMatch.src = cms.InputTag( "pfTaus" )
process.tauGenJetMatch.src = cms.InputTag( "pfTaus" )
You can do:
taus = "pfTaus"
process.allLayer1Taus.tauSource = cms.InputTag( taus )
process.tauMatch.src = cms.InputTag( taus )
process.tauGenJetMatch.src = cms.InputTag( taus )
Reset the random seeds every time
You can use a helper function to set the random seeds to different values every time you run cmsRun (destroying reproducibility, of course). Any time after you have included the generator sequence, use these lines:
from IOMC.RandomEngine.RandomServiceHelper import RandomNumberServiceHelper
randSvc = RandomNumberServiceHelper(process.RandomNumberGeneratorService)
randSvc.populate()
Use a JSON file of good lumi sections to configure CMSSW
Usually the JSON files of lumi sections are used as inputs into CRAB. But if you want to run interactively on the same lumi sections, you can use this little trick:
For CMSSW 5.0 and higher:
import FWCore.PythonUtilities.LumiList as LumiList
import FWCore.ParameterSet.Types as CfgTypes
process.source.lumisToProcess = CfgTypes.untracked(CfgTypes.VLuminosityBlockRange())
JSONfile = 'Cert_190456-195947_8TeV_PromptReco_Collisions12_JSON_v2.txt'
myLumis = LumiList.LumiList(filename = JSONfile).getCMSSWString().split(',')
process.source.lumisToProcess.extend(myLumis)
For CMSSW 3.8.X and higher, but lower than 5.X.Y (after checking out PhysicsTools/PythonAnalysis):
import PhysicsTools.PythonAnalysis.LumiList as LumiList
import FWCore.ParameterSet.Types as CfgTypes
myLumis = LumiList.LumiList(filename = 'goodList.json').getCMSSWString().split(',')
process.source.lumisToProcess = CfgTypes.untracked(CfgTypes.VLuminosityBlockRange())
process.source.lumisToProcess.extend(myLumis)
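For orientation, here is a sketch in plain Python of what the conversion does. It assumes the usual layout of the good-lumi JSON files (each run number mapped to a list of [first, last] lumi ranges) and the run:lumi-run:lumi form for each range string. The function name lumi_ranges and the example content are hypothetical, and this is not the LumiList implementation.

```python
import json

def lumi_ranges(json_text):
    """Turn a good-lumi JSON document into 'run:first-run:last' strings."""
    compact = json.loads(json_text)
    ranges = []
    # Order runs numerically, then the ranges within each run.
    for run in sorted(compact, key=int):
        for first, last in sorted(compact[run]):
            ranges.append("%s:%d-%s:%d" % (run, first, run, last))
    return ranges

# Hypothetical content of a JSON file with two runs.
example = '{"190645": [[10, 110]], "190456": [[1, 5], [8, 20]]}'
print(lumi_ranges(example))
```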
Checking to see where a module lives
There can sometimes be confusion as to whether or not Python is
picking up modules from your local project area or from the CMSSW
release. An easy way to check is to look when running python
interactively. For example:
python
>>> import PhysicsTools.PythonAnalysis.LumiList as LumiList
>>> LumiList.__file__
'/uscms/home/cplager/work/cmssw/CMSSW_3_8_0_pre8/python/PhysicsTools/PythonAnalysis/LumiList.py'
shows that in this case, I am picking up PhysicsTools.PythonAnalysis.LumiList from my local project area.
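The same check can be wrapped in a small function. This sketch uses the standard-library json module as its example, since it is available everywhere; the function name module_location is hypothetical.

```python
import importlib

def module_location(name):
    """Import the module called `name` and return the file it was loaded from."""
    module = importlib.import_module(name)
    # Built-in modules have no __file__, so fall back to None.
    return getattr(module, '__file__', None)

print(module_location('json'))
```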
Tracking modifications
For a utility to track modifications as they are made, see SWGuideConfigHistory.
Links to the python documentation
The Python documentation is good and worth a look.