How to Submit SiD Simulation and Reconstruction Jobs Using ILCDirac
Interfaces for all the SiD simulation and reconstruction software packages are available in ILCDIRAC and allow jobs to be defined and submitted easily using Python.
This page includes Python examples for each available step. It also explains how to use a flexible job submission script that submits common job configurations directly from the command line and lets you get started immediately.
Python Examples
The following statements are required to create an ILCDirac job.
# sys is used by the return value checks in the step examples
import sys
# import ILCDIRAC classes
from ILCDIRAC.Interfaces.API.DiracILC import DiracILC
from ILCDIRAC.Interfaces.API.ILCJob import ILCJob
# create an instance of ILCDIRAC and define a repository file to store the submitted job IDs
dirac = DiracILC ( True , "myRepositoryFile.txt" )
# create a new job
job = ILCJob ( )
# add any step to be executed here
job.setSystemConfig ( "x86_64-slc5-gcc43-opt" ) # needs to correspond to a config defined in the dirac configuration
job.setName ( "myJob" ) # just an identifier
# the following statements are optional
job.setOutputSandbox ( [ "*.log" , "*.mac", "*.xml", "*.lcsim" ] ) # being able to retrieve the steering files is useful for debugging
job.setInputSandbox ( [ "file1", "file2", ... ] ) # required if the job needs input
job.setOutputData ( [ "outputFile1", "outputFile2", ... ], "CERN-SRM", "/my/storage/path" ) # other possible storage elements are RAL-SRM, IN2P3-SRM, etc.
job.setCPUTime( cpuLimit ) # maximum is 300000 seconds
job.setJobGroup( "myGroup" ) # helps finding jobs that belong together
job.setBannedSites( [ "Site1", "Site2", ... ] )
job.setDestination( "LCG.CERN.ch" )
# submit the job
dirac.submit ( job )
In addition any one of the following steps can be added before the submit command to define what is being executed in the job.
If several steps are defined, the output of one step is automatically used as the input for the following step (if no input is defined).
Only certain combinations are supported, though. LCSim can follow an SLIC or SlicPandora step and SlicPandora can follow an LCSim step.
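The combination rules above can be checked before any step is added to a job. The following is a minimal sketch in plain Python, not part of ILCDIRAC: the step labels and the function name `check_step_chain` are assumptions made for illustration, and the overlay step is not modelled.

```python
# Allowed predecessor steps, taken from the rules stated above.
# The step names are plain labels chosen for this sketch.
ALLOWED_PREDECESSORS = {
    "LCSim": {"SLIC", "SlicPandora"},
    "SlicPandora": {"LCSim"},
}


def check_step_chain(steps):
    """Return (True, "") if every step may follow its predecessor, else (False, reason)."""
    for previous, current in zip(steps, steps[1:]):
        if previous not in ALLOWED_PREDECESSORS.get(current, set()):
            return False, "%s cannot follow %s" % (current, previous)
    return True, ""


if __name__ == "__main__":
    print(check_step_chain(["SLIC", "LCSim", "SlicPandora", "LCSim"]))
    print(check_step_chain(["SLIC", "SlicPandora"]))
```

A chain such as SLIC, LCSim, SlicPandora, LCSim passes this check, while SLIC followed directly by SlicPandora is rejected.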
res = job.setSLIC ( appVersion = "v2r9p8" , # has to be available in the dirac configuration
detectorModel = "clic_sid_cdr" , # dirac will retrieve the detector geometry automatically from www.lcsim.org
macFile = "mySlicMacro.mac" , # steering file
inputGenfile = "LFN:/my/files/input.stdhep", # optional input file, if an LFN is given it is automatically added to the input sandbox
nbOfEvents = 100 ,
startFrom = 0 , # only set if you want to skip events from the input file
outputFile = "slicOutput.slcio" # don't forget to add it also to the outputdata!
)
# checking the return value is not required, but considered good practice
if not res['OK']:
    print(res['Message'])
    sys.exit(2)
Overlay background
This step will download the required background files and overlay them in the next LCSim step present.
res = job.addOverlay( detector = "SID", # need to set which background samples to use
energy = "3tev", # need to set which background samples to use
BXOverlay = 60, # number of bunch crossings per signal event
NbGGtoHadInts = 3.2, # number of events per bunch crossing
NSigEventsPerJob = 100 # number of signal events
)
# checking the return value is not required, but considered good practice
if not res['OK']:
    print(res['Message'])
    sys.exit(2)
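To get a feeling for how many background events the overlay parameters imply, the numbers from the example can be multiplied out. This is only a back-of-the-envelope sketch: it assumes that NbGGtoHadInts is a mean value per bunch crossing, so the results are averages, and the function name is hypothetical.

```python
def expected_overlay_events(bx_overlay, gg_to_had_per_bx, n_signal_events):
    """Return the mean number of background events per signal event and per job."""
    per_signal_event = bx_overlay * gg_to_had_per_bx
    per_job = per_signal_event * n_signal_events
    return per_signal_event, per_job


if __name__ == "__main__":
    # Values from the addOverlay example above.
    per_event, per_job = expected_overlay_events(60, 3.2, 100)
    print("background events per signal event: about %.0f" % per_event)
    print("background events per job: about %.0f" % per_job)
```

With 60 bunch crossings and 3.2 events per bunch crossing this gives roughly 192 background events per signal event, which is useful when estimating how much background data a job has to download.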
LCSim with background overlay
res = job.setLCSIM ( appVersion = "CLIC_CDR" , # has to be available in the dirac configuration
xmlfile = "myLcsim.xml" , # steering file
aliasproperties = "myAlias.properties" , # optional
evtstoprocess = 100, # optional, default is -1
inputslcio = [ "input1.slcio", "LFN:/my/files/input2.slcio", ... ], # can be a list of files, LFNs will be added to the input sandbox automatically
outputFile = "lcsimOutput.slcio" # don't forget to add it also to the outputdata!
)
# checking the return value is not required, but considered good practice
if not res['OK']:
    print(res['Message'])
    sys.exit(2)
res = job.setSLICPandora ( appVersion = "CLIC_CDR" , # has to be available in the dirac configuration
detectorgeo = "clic_sid_cdr" , # pandora.xml or detector name, dirac will then retrieve the detector geometry automatically from www.lcsim.org
inputslcio = [ "input1.slcio", "LFN:/my/files/input2.slcio", ... ], # can be a list of files, LFNs will be added to the input sandbox automatically
pandorasettings = "myPandoraSettings.xml" , # optional, if none given a default settings file will be taken
nbevts = 100 , # optional, default is -1
outputFile = "slicPandoraOutput.slcio" # don't forget to add it also to the outputdata!
)
# checking the return value is not required, but considered good practice
if not res['OK']:
    print(res['Message'])
    sys.exit(2)
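Since the same return value check follows every step, it can be wrapped in a small helper. The sketch below is not part of ILCDIRAC; the name `check_result` is hypothetical and the dictionaries in the demo are stand-ins that only mimic the `OK` and `Message` keys used in the checks above.

```python
import sys


def check_result(res, exit_code=2):
    """Exit with exit_code if a result dictionary reports failure, else return it."""
    if not res['OK']:
        print(res['Message'])
        sys.exit(exit_code)
    return res


if __name__ == "__main__":
    # Stand-in dictionaries that mimic the structure used in the checks above.
    check_result({'OK': True})
    try:
        check_result({'OK': False, 'Message': 'example failure'})
    except SystemExit as exc:
        print("would exit with code %s" % exc.code)
```

In a submission script this could be used as `check_result( job.setLCSIM( ... ) )`, which keeps each step definition to a single statement.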
Job Submission Scripts
Some flexible command line scripts are shipped together with ILCDIRAC:
$ILCDIRAC/Interfaces/API/Examples/SIDChain
You can also check them out standalone from the repository:
svn co svn+ssh://svn.cern.ch/reps/dirac/ILCDIRAC/trunk/ILCDIRAC/Interfaces/API/Examples/SIDChain
LCSim Job
The script lcsimJob.py executes a single lcsim step with a user provided steering file.
It allows running directly on the full output of a certain production ID, as well as on a provided list of LFNs.
Since DIRAC does not know which output to expect and to upload, the script takes care of this.
In order to do that the steering file is parsed and placeholder strings are replaced by the correct file name for every job.
Thus, your steering xml should have the following strings instead of file names, where output files are produced:
- __outputSlcio__
- __outputAida__
- __outputRoot__
- __outputDat__
- __outputTxt__
All of these will be replaced by the output file name, which is based on the job title and the input file name, plus the respective file ending. If your script creates multiple files of the same type or a different file type you can simply extend the list of replacement strings that are looked for.
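The replacement described above can be pictured as a simple string substitution. The following sketch is not the code used in lcsimJob.py: the file endings, the `jobTitle_inputName` naming scheme and the function name `fill_placeholders` are assumptions made for illustration.

```python
import os

# Placeholder strings from the list above, mapped to file endings.
# The endings are assumptions made for this sketch.
PLACEHOLDERS = {
    "__outputSlcio__": ".slcio",
    "__outputAida__": ".aida",
    "__outputRoot__": ".root",
    "__outputDat__": ".dat",
    "__outputTxt__": ".txt",
}


def fill_placeholders(steering_text, job_title, input_file):
    """Replace each placeholder and return (new_text, list_of_output_files)."""
    input_stem = os.path.splitext(os.path.basename(input_file))[0]
    base_name = "%s_%s" % (job_title, input_stem)  # hypothetical naming scheme
    outputs = []
    for placeholder, ending in sorted(PLACEHOLDERS.items()):
        if placeholder in steering_text:
            output_name = base_name + ending
            steering_text = steering_text.replace(placeholder, output_name)
            outputs.append(output_name)
    return steering_text, outputs


if __name__ == "__main__":
    text = "<out>__outputSlcio__</out><hist>__outputAida__</hist>"
    print(fill_placeholders(text, "myJob", "LFN:/my/files/input2.slcio"))
```

The returned list of output files is what would then be passed on as output data, so only the files that the steering file actually produces are uploaded. Supporting an additional file type amounts to adding one more entry to the dictionary.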
If the job requires user code not available in the lcsim build you have to provide it in a jar file (see the tutorial here). The easiest way is to place the jar file on a web accessible space and set the url directly in the lcsim steering file using jarURL instead of jar. This way lcsim will get the file automatically at runtime. You can also add it to the input sandbox by appending -J <myCode.jar> to the submission command; the steering file should then define the jar file in the jar field.
Examples
The simplest way to submit a job is
python lcsimJob.py -p <prodID> -l <myLcsimSteering.xml>
In this case all DST files from the given production will be processed. The output file path (default: detectorName/eventType/jobTitle) will be determined from the production meta data and the job name will be based on the steering file name.
Instead of using a production ID to define the input data, one can also provide directly a list of LFNs.
In this case the LFN list has to be passed as a separate python script that just contains a list of strings (the LFNs) called lfnlist, i.e.:
lfnlist = [ "LFN:/my/file1.slcio", ... ]
This convention is identical to the one created by the script dirac-repo-create-lfnlist.
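Such a file list is ordinary Python, so it can be written and read back with the standard library. The sketch below only illustrates the convention; how lcsimJob.py actually loads the file is not stated here, and the helper names `write_lfn_list` and `read_lfn_list` are hypothetical.

```python
import os
import runpy
import tempfile


def write_lfn_list(path, lfns):
    """Write a Python file that defines a single list called lfnlist."""
    with open(path, "w") as handle:
        handle.write("lfnlist = %r\n" % (list(lfns),))


def read_lfn_list(path):
    """Execute the file and return the list bound to the name lfnlist."""
    namespace = runpy.run_path(path)
    return list(namespace["lfnlist"])


if __name__ == "__main__":
    example = os.path.join(tempfile.mkdtemp(), "myFileList.py")
    write_lfn_list(example, ["LFN:/my/file1.slcio", "LFN:/my/file2.slcio"])
    print(read_lfn_list(example))
```

Because the file is executed when it is read, only file lists from a trusted source should be loaded this way.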
When using such a file list as input, the minimal example looks like this:
python lcsimJob.py -i <myFileList.py> -l <myLcsimSteering.xml> -e <myEventType>
In addition to the file list, an event type also has to be given, since no meta data is available to extract it automatically. Alternatively, one can specify the full output file path directly:
python lcsimJob.py -i <myFileList.py> -l <myLcsimSteering.xml> -o <my/output/path>
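The way the output path is put together in the examples above can be summarised in a few lines. This is a sketch under stated assumptions, not the script's own code: the function names are hypothetical, and deriving the job title by dropping the extension of the steering file name is a guess at what "based on the steering file name" means.

```python
import os


def default_job_title(steering_file):
    """Derive a job title from the steering file name (assumed: name without extension)."""
    return os.path.splitext(os.path.basename(steering_file))[0]


def output_path(detector, event_type, job_title, explicit_path=None):
    """Return explicit_path if given, else detectorName/eventType/jobTitle."""
    if explicit_path:
        return explicit_path
    return "/".join([detector, event_type, job_title])


if __name__ == "__main__":
    title = default_job_title("myLcsimSteering.xml")
    print(output_path("clic_sid_cdr", "myEventType", title))
    print(output_path("clic_sid_cdr", "myEventType", title, "my/output/path"))
```

The first call corresponds to passing an event type with -e, the second to passing a full path with -o, which takes precedence over the default layout.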
Command Line Parameters
Parameter (short) | Parameter (long) | Takes Argument | Description |
a | alias | yes | Sets the alias.properties file to use, necessary if your desired detector model is not in the lcsim repository. |
A | agent | no | Submits the job in agent mode for debugging. The job is executed on the local machine but is also registered with the job monitoring system. |
b | banlist | yes | Sets the file with the list of banned sites. Default is bannedSites.py shipped with the script. |
D | detector | yes | Sets the detector name. Only used for determining the output path, since lcsim gets the geometry files automatically. Default is clic_sid_cdr. |
e | eventtype | yes | Sets the name of the event type used for the output path. Overrides the one obtained from the production meta data. |
f | files | yes | Sets the maximum number of files to process, e.g. 5 to process only the first five files. |
h | help | no | Shows this list of parameters. |
i | input | yes | Sets the file with the list of LFNs to process. See above for details. |
J | jar | yes | Sets which jar file should be added to the input sandbox. |
l | lcsimxml | yes | Sets the xml steering file to be used. |
L | lcsim | yes | Sets the lcsim version to be used. Has to be available in DIRAC. Default is CLIC_CDR. |
M | merge | yes | Sets the number of input files processed in each job. |
n | events | yes | Sets the number of events to process per job. Default is all events in each input file. |
o | outputpath | yes | Sets the path where the output data will be stored directly, instead of using detectorName/eventType/jobTitle. |
O | override | no | By default, jobs which would produce a file that already exists in the file catalog will be skipped. If this option is set the old file will be deleted and the job will be submitted. |
p | prodid | yes | Sets the production ID to be processed. |
R | recfiles | no | If running over a production, the REC files are used instead of the DST files. |
S | storageelement | yes | Sets the storage element where the output data will be stored. Default is CERN-SRM. |
t | time | yes | Sets the maximum cpu time limit in seconds. Default is 100000 and maximum is 300000. |
T | title | yes | Sets the job title, used in the output path and in the job monitoring. Default is the steering file name. |
v | verbose | no | Switches off the output that is usually printed. |
y | strategy | yes | Sets the tracking strategy file used in the job. Only required for special jobs. |
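The interplay of the files, merge and time parameters can be illustrated with a short sketch. It is not taken from lcsimJob.py: the function names are hypothetical, one file per job is assumed as the default for merge, and capping the CPU time at the maximum (rather than rejecting larger values) is also an assumption.

```python
MAX_CPU_TIME = 300000      # maximum stated in the table
DEFAULT_CPU_TIME = 100000  # default stated in the table


def group_input_files(lfns, max_files=None, files_per_job=1):
    """Keep the first max_files LFNs and split them into per-job groups."""
    selected = list(lfns) if max_files is None else list(lfns)[:max_files]
    return [selected[i:i + files_per_job]
            for i in range(0, len(selected), files_per_job)]


def cpu_time_limit(requested=None):
    """Return the requested CPU time, using the default if none is given and capping at the maximum."""
    if requested is None:
        return DEFAULT_CPU_TIME
    return min(requested, MAX_CPU_TIME)


if __name__ == "__main__":
    lfns = ["LFN:/my/file%d.slcio" % i for i in range(1, 8)]
    for group in group_input_files(lfns, max_files=5, files_per_job=2):
        print(group)
    print(cpu_time_limit(500000))
```

With seven input files, a file limit of five and two files per job, this sketch yields three jobs, the last of which processes a single file.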