Quick Tips for the Python Configuration System


Tip: Don't hesitate to add your own tips!

A note on using these "tricks" with CRAB

Using python as the configuration language lets you use programming language constructs to manipulate the process object, which is just a large static variable that configures cmsRun. You can therefore use some of these tricks with tools like CRAB. Note, however, that the configuration file CRAB sends to the worker node is only the static object that results from running the python; no executable statements remain. In other words, if you do something like this:

import os

# os.environ is a mapping, not a function: use .get() or [] to read it
if os.environ.get('VAR1'):
    process.someVariable = os.environ['VAR2']
else:
    process.someVariable = os.environ['VAR3']

the value of someVariable will depend on the environment variables VAR1, VAR2, and VAR3 on your computer, not on the machine where your job actually runs.
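
The following is a minimal sketch of that behaviour in plain Python 3, without CMSSW. The function build_config and the dictionary it returns are hypothetical stand-ins for a configuration file and its process object; the values assigned to VAR2 are invented for illustration.

```python
import os


def build_config():
    """Stand-in for a config file: reads the environment once, at build time."""
    return {"someVariable": os.environ.get("VAR2", "default")}


# The environment as it is on the machine where the python is run.
os.environ["VAR2"] = "submit-host-value"
config = build_config()

# A later change (for example a different value on a worker node)
# does not affect the object that has already been built.
os.environ["VAR2"] = "worker-node-value"

print(config["someVariable"])  # prints: submit-host-value
```

The already-built object keeps the value it had when the python ran, which is the situation a job is in once only the static configuration has been shipped.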

Browsing the python configuration

Getting lost in the python configuration files? Use this wonderful tool: SWGuideConfigBrowser

Validating python configuration files

To check a configuration file for python errors, run python on it directly. This is much faster than running cmsRun:

   python myscript_cfg.py

Creating the name of the output root file automatically

The following piece of code generates the name of the output file from the name of the input file:

inFile = 'aod.root'
process.source = cms.Source("PoolSource",
    fileNames = cms.untracked.vstring('file:%s' % inFile)
)

outFile = "output_%s" % (inFile)

# more parameters can be added. 
# if myParam is an integer:
# outFile = "output_%d_%s" % (myParam, inFile)

process.out = cms.OutputModule(
    "PoolOutputModule",
    fileName = cms.untracked.string(outFile)
)
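
If the input file name contains a directory or a "file:" prefix, the simple string formatting above would put that path inside the output name. A possible variant, sketched here in plain Python, strips the directory first. The helper make_output_name and the example paths are hypothetical and not part of CMSSW.

```python
import os


def make_output_name(in_file, prefix="output_", tag=None):
    """Build an output file name from the base name of the input file."""
    base = os.path.basename(in_file)  # drops any directory or 'file:/...' part
    if tag is not None:
        return "%s%s_%s" % (prefix, tag, base)
    return prefix + base


print(make_output_name("/data/store/aod.root"))  # prints: output_aod.root
print(make_output_name("aod.root", tag=42))      # prints: output_42_aod.root
```

The result can then be passed to cms.untracked.string in the same way as outFile above.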

Expanding the whole configuration to a single python file

Use the following script:

#! /usr/bin/env python

from optparse import OptionParser
import sys
import os 
import imp

parser = OptionParser()
parser.usage = "%prog <file> : expand this python configuration"

(options,args) = parser.parse_args()

if len(args)!=1:
    parser.print_help()
    sys.exit(1)

filename = args[0]
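handle = open(filename, 'r')
# Note: the imp module was removed in Python 3.12, so this script needs an older Python.
cfo = imp.load_source("pycfg", filename, handle)
cmsProcess = cfo.process
handle.close()

# The parenthesized form works in both Python 2 and Python 3.
print(cmsProcess.dumpPython())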
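
On a recent Python 3, where imp is no longer available, the same loading step can be done with importlib. The sketch below only illustrates that loading mechanism: the helper load_attribute is hypothetical, and the demonstration file defines a plain dictionary named process instead of a real cms.Process, so that it runs without CMSSW.

```python
import importlib.util
import os
import tempfile


def load_attribute(path, attribute="process", module_name="pycfg"):
    """Execute the python file at 'path' and return one of its attributes."""
    spec = importlib.util.spec_from_file_location(module_name, path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return getattr(module, attribute)


# Demonstration with a throwaway file standing in for a configuration file.
with tempfile.TemporaryDirectory() as tmp:
    cfg_path = os.path.join(tmp, "demo_cfg.py")
    with open(cfg_path, "w") as handle:
        handle.write("process = {'name': 'DEMO'}\n")
    loaded = load_attribute(cfg_path)

print(loaded)  # prints: {'name': 'DEMO'}
```

With a real configuration file, the returned object would be the process, and its dumpPython() method could be called as in the script above.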

Running on more than 255 files

The easiest way around the limit described below is to use VarParsing; see Command Line Arguments Through cmsRun.

In Python versions before 3.7, a function call is limited to 255 arguments, so very long vstrings cannot be created in one step. This comes up most often when people want long lists of files. The solution is simple: extend a vstring as many times as needed.

readFiles = cms.untracked.vstring()
readFiles.extend(['file1.root', 'file2.root', ..... 'file255.root'])
readFiles.extend(['file256.root', .......])

process.source = cms.Source('PoolSource', fileNames = readFiles, ... )

A more compact alternative is to pass the list of arguments in a tuple, and tell python to expand it in place:

process.source = cms.Source('PoolSource', 
    fileNames = cms.untracked.vstring( *(
        'file1.root', 
        'file2.root', 
        ...
        'file255.root',
        'file256.root', 
        ...
    ) )
)

You can also initialize a Python list or vstring from a text file, using FWCore.Utilities.FileUtils (as in the example below) and FWCore.Python (tag V00-00-00):

import FWCore.Utilities.FileUtils as FileUtils
mylist = FileUtils.loadListFromFile ('fileWithInfoYouWant.txt') 
mylist.extend ( FileUtils.loadListFromFile ('moreInfoIwant.txt') )
readFiles = cms.untracked.vstring( *mylist)

The file fileWithInfoYouWant.txt should contain one file name per line, in a format compatible with what DBS gives you. Comments (or lines you want to exclude temporarily) that start with the hash character (#) can be embedded in the text file and will be ignored. Finally, note that there is no '255' limit on the number of files you can list in a single text file.
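
To make the expected file format concrete, here is a rough sketch in plain Python 3 of a reader that behaves as described: one entry per line, with blank lines and lines starting with # skipped. It is not the FileUtils implementation; the function load_list_from_file and the file names are hypothetical.

```python
import os
import tempfile


def load_list_from_file(path):
    """Return the non-empty, non-comment lines of a text file as a list."""
    entries = []
    with open(path) as handle:
        for line in handle:
            text = line.strip()
            if not text or text.startswith("#"):
                continue  # skip blank lines and commented-out entries
            entries.append(text)
    return entries


# Demonstration with a throwaway list file.
with tempfile.TemporaryDirectory() as tmp:
    list_path = os.path.join(tmp, "files.txt")
    with open(list_path, "w") as handle:
        handle.write("file1.root\n# file2.root is temporarily excluded\nfile3.root\n")
    names = load_list_from_file(list_path)

print(names)  # prints: ['file1.root', 'file3.root']
```

The resulting list can be expanded into a vstring with the * syntax, as in the FileUtils example above.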

Running cmsRun from the script itself

At the beginning of the config file, add:

import os 

At the end of the config file, after the process is fully constructed, add several lines like this:

outFile = open("tmpConfig.py","w")
outFile.write("import FWCore.ParameterSet.Config as cms\n")
outFile.write(process.dumpPython())
outFile.close()
os.system("cmsRun tmpConfig.py")
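
A possible variant of the same idea uses a with statement, so the file is closed before the command starts, and subprocess, so the exit status is available. This is only a sketch: the helper write_and_run is hypothetical, and the demonstration runs the Python interpreter on a trivial script instead of cmsRun so that it works without CMSSW.

```python
import os
import subprocess
import sys
import tempfile


def write_and_run(config_text, path, command):
    """Write config_text to path, then run 'command path' and return its exit status."""
    with open(path, "w") as out_file:
        out_file.write(config_text)
    # The file is closed (and flushed) here, before the command is started.
    return subprocess.call(list(command) + [path])


# Demonstration: the python interpreter stands in for cmsRun.
with tempfile.TemporaryDirectory() as tmp:
    script = os.path.join(tmp, "tmpConfig.py")
    status = write_and_run("x = 1 + 1\n", script, [sys.executable])

print(status)  # prints: 0
```

In a configuration file, config_text would be the text written in the lines above and command would be ["cmsRun"]; a non-zero status indicates that the command failed.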

Defining variables

It is possible to define variables to ease the maintenance of the configuration files. For example, instead of:

process.allLayer1Taus.tauSource = cms.InputTag( "pfTaus" )
process.tauMatch.src = cms.InputTag(  "pfTaus" )
process.tauGenJetMatch.src = cms.InputTag( "pfTaus" )

you can do:

taus = "pfTaus"

process.allLayer1Taus.tauSource = cms.InputTag( taus )
process.tauMatch.src = cms.InputTag( taus )
process.tauGenJetMatch.src = cms.InputTag( taus )

Reset the random seeds every time

You can use a helper function to set the random seeds to different values every time you run cmsRun (destroying reproducibility, of course). Any time after you have included the generator sequence, use these lines:

from IOMC.RandomEngine.RandomServiceHelper import RandomNumberServiceHelper
randSvc = RandomNumberServiceHelper(process.RandomNumberGeneratorService)
randSvc.populate()

Use a JSON file of good lumi sections to configure CMSSW

Usually the JSON files of lumi sections are used as inputs into CRAB. But if you want to run interactively on the same lumi sections, you can use this little trick:

For CMSSW 5.0 and higher:

import FWCore.PythonUtilities.LumiList as LumiList
import FWCore.ParameterSet.Types as CfgTypes
process.source.lumisToProcess = CfgTypes.untracked(CfgTypes.VLuminosityBlockRange())
JSONfile = 'Cert_190456-195947_8TeV_PromptReco_Collisions12_JSON_v2.txt'
myLumis = LumiList.LumiList(filename = JSONfile).getCMSSWString().split(',')
process.source.lumisToProcess.extend(myLumis)

For CMSSW 3.8.X and higher, but lower than 5.X.Y (after checking out PhysicsTools/PythonAnalysis):

import PhysicsTools.PythonAnalysis.LumiList as LumiList
import FWCore.ParameterSet.Types as CfgTypes
myLumis = LumiList.LumiList(filename = 'goodList.json').getCMSSWString().split(',')
process.source.lumisToProcess = CfgTypes.untracked(CfgTypes.VLuminosityBlockRange())
process.source.lumisToProcess.extend(myLumis)
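
For orientation, the sketch below shows in plain Python the kind of conversion these recipes rely on. It rests on two assumptions that are not stated on this page: that the JSON file maps each run number to a list of [first, last] lumi section pairs, and that each range is written as "run:first-run:last". The helper lumi_ranges_from_json is hypothetical and is not the LumiList implementation; the lumi ranges in the example are invented.

```python
import json


def lumi_ranges_from_json(text):
    """Turn {"run": [[first, last], ...]} into a list of 'run:first-run:last' strings."""
    data = json.loads(text)
    ranges = []
    for run in sorted(data, key=int):
        for first, last in data[run]:
            ranges.append("%s:%d-%s:%d" % (run, first, run, last))
    return ranges


# Invented example content, in the assumed format.
example = '{"190456": [[1, 50], [60, 70]], "190459": [[1, 5]]}'
print(lumi_ranges_from_json(example))
# prints: ['190456:1-190456:50', '190456:60-190456:70', '190459:1-190459:5']
```

Under these assumptions, a list of this form is what the split(',') call in the recipes produces before it is passed to lumisToProcess.extend.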

Checking to see where a module lives

There can sometimes be confusion as to whether or not Python is picking up modules from your local project area or from the CMSSW release. An easy way to check is to look when running python interactively. For example:

python
>>> import PhysicsTools.PythonAnalysis.LumiList as LumiList
>>> LumiList.__file__
'/uscms/home/cplager/work/cmssw/CMSSW_3_8_0_pre8/python/PhysicsTools/PythonAnalysis/LumiList.py'

shows that in this case, I am picking up PhysicsTools.PythonAnalysis.LumiList from my local project area.
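
The same check can be wrapped in a small function, sketched here for Python 3. The helper module_location is hypothetical; the standard library module json is used in the demonstration only so that the example runs without CMSSW.

```python
import importlib
import os


def module_location(name):
    """Return the absolute path of the file a module is loaded from, or None."""
    module = importlib.import_module(name)
    path = getattr(module, "__file__", None)  # some built-in modules have no file
    if path is None:
        return None
    return os.path.abspath(path)


print(module_location("json"))
```

Comparing the printed path with your local project area and with the release area shows which copy python is using.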

Tracking modifications

For a utility to track modifications as they are made, see SWGuideConfigHistory.

Links to the python documentation

The python documentation is good and worth a look.

Reviewer/Editor and Date (copy from screen) Comments
-- ColinBernet - 01 Oct 2008
-- CharlesPlager - 09 Jan 2009 Updated 255 file limit section
-- CharlesPlager - 16 Jan 2009 Updated tag and added new recipe for CMSSW < 2.2.3 for 255 file limit section
Topic revision: r22 - 2014-01-26 - PhatSrimanobhas
 