ILC specific jobs
This section shows the preferred way of creating jobs in ILCDIRAC: the PYTHON API. To use it, some basic knowledge of object oriented (OO) programming is needed, as PYTHON is an OO language. The examples shown below use the API described in http://lcd-data.web.cern.ch/lcd-data/doc/ilcdiracdoc/.
The philosophy of the ILCDIRAC interface is the following: users should care about what they want to run, not how. That's why we provide interfaces to all ILC applications that prepare the runtime environments and execute them. Users need not care about environment variables, directories to look in, etc. To run an application in ILCDIRAC, only very few parameters are mandatory, essentially those that would be passed on the command line to call a pre-installed application (example: the steering file for Marlin). Moreover, input files do not need to be edited by the user to match what the job is going to run on: this is changed automatically (by default) in the steering files. The idea is that users test their run locally, then pass the steering file directly to the job definition without any modification.
Job types and basic job definition
In ILCDIRAC, there are 2 main job types: User jobs and Production jobs. Only the User jobs are covered here. Let's start with a generic example that doesn't do anything:
from DIRAC.Core.Base import Script
Script.parseCommandLine()
from ILCDIRAC.Interfaces.API.DiracILC import DiracILC
dirac = DiracILC(True,"some_job_repository.rep")
from ILCDIRAC.Interfaces.API.NewInterface.UserJob import UserJob
job = UserJob()
job.setName("MyJobName")
job.setJobGroup("Agroup")
job.setCPUTime(86400)
job.setInputSandbox(["file1","file2"])
job.setOutputSandbox(["fileout1","fileout2", "*.out", "*.log"])
job.setOutputData(["somefile1","somefile2"],"some/path","CERN-SRM")
print job.submit(dirac)
If submitted as is, this job will fail because the referenced files do not exist.
Let's go through the individual lines. The first 2:
from DIRAC.Core.Base import Script
Script.parseCommandLine()
are mandatory to make the DIRAC environment known to the script (this is oversimplifying things, but enough). For example, this is how the different services that are used get their server addresses set. It also initializes many internal DIRAC utilities that ILCDIRAC makes use of (the logger functionality, for example). These lines need to be called before the other DIRAC imports to avoid race conditions.
The following 2 lines
from ILCDIRAC.Interfaces.API.DiracILC import DiracILC
dirac = DiracILC(True,"some_job_repository.rep")
take care of importing and creating a DiracILC object. This class is the "job receiver" class, as the last line
job.submit(dirac)
indicates. It's needed as it makes sure that all the specified inputs (if any) are actually available. It also checks that all requested software (more on that later) is available. Several additional utilities are provided by the inheritance of DiracILC from the main Dirac class (see the API doc); they are not discussed here. Finally, it provides the interface to the Job Repository. This is a text file, in this case some_job_repository.rep, with a special structure: it holds the jobIDs of the submitted jobs and their properties (have a look once you have submitted a job). It's very important to keep this file safe as it is used to retrieve the job outputs easily (more on that later). A good way to use this functionality is to have a different file name per activity.
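For instance, a hedged sketch of one repository file per activity, and of later fetching the sandboxes of finished jobs through the repository; the retrieveRepositorySandboxes method comes from the base Dirac class, so treat the exact signature as an assumption and check the API documentation:
from DIRAC.Core.Base import Script
Script.parseCommandLine()
from ILCDIRAC.Interfaces.API.DiracILC import DiracILC
# one repository file per activity, e.g. all Mokka tests recorded together
dirac = DiracILC(True, "mokka_tests.rep")
# ... define and submit jobs as shown below ...
# later, retrieve the output sandboxes of the jobs recorded in the repository
dirac.retrieveRepositorySandboxes(requestedStates=['Done'], destinationDirectory='output')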
The next few lines
from ILCDIRAC.Interfaces.API.NewInterface.UserJob import UserJob
job = UserJob()
job.setName("MyJobName")
job.setJobGroup("Agroup")
job.setInputData(['datafile1','datafile2'])
job.setInputSandbox(["file1","file2"])
job.setOutputSandbox(["fileout1","fileout2", "*.out", "*.log"])
job.setOutputData(["somefile1","somefile2"],"some/path","CERN-SRM")
are needed for an actual job definition. Again, in this example, the job doesn't do anything, and will likely fail if submitted, as the specified inputs are not available (more on that later).
The process of job definition starts with
from ILCDIRAC.Interfaces.API.NewInterface.UserJob import UserJob
job = UserJob()
where you instruct ILCDIRAC what type of job to use. This has few visible consequences for the user, but makes important choices for the internal functionality.

Users should ALWAYS use a UserJob unless explicitly asked or recommended otherwise.
The next few lines show examples of the UserJob class methods. The first 2, setName and setJobGroup, can be used for monitoring: the name is displayed on the monitoring page and the JobGroup can be used to select only the jobs belonging to a given group.
The next one, setInputData, is used, as its name suggests, to define input data. Input data means files that can be on a tape backend and can rely on staging (recall from tape) to be available. Files produced by the production system should be accessed using this method. For the Mokka, SLIC, Marlin, LCSim, SLICPandora, and LCIOSplit applications there is a link between setInputData and the application: if setInputFile was not used, any hepevt, stdhep, or slcio files in the input data will be used as input files for the respective application. For other applications there is no such link, so you'll need to specify the file both here and in setInputFile. The file name to give is an LFN (Logical File Name, described in the File Catalog section of this page), like /ilc/prod/clic/[...]/something.slcio.
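For illustration, a minimal sketch (the LFN below is a made-up placeholder, not a real production file):
# input data is given as a bare LFN, without the "LFN:" prefix used for sandboxes
job.setInputData(["/ilc/prod/clic/SOME_PATH/something.slcio"])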
The next 2, setInputSandbox and setOutputSandbox, indicate to the job what files should be shipped to the job working area, and what files should be brought back when retrieving the job outputs. Any file in the setInputSandbox must either reside locally (absolute or relative path accepted) or be stored on the GRID and specified with its LFN, in which case it must be prefixed with LFN:. For example, if there is a file /ilc/user/s/someone/file, the file to specify in the setInputSandbox should be LFN:/ilc/user/s/someone/file. As can be seen in the example, python lists can be used, but not lists of lists.

Make sure your sandboxes do not contain nested lists, as the system cannot correct them.

It is a good idea to put large (or often used) input files on the GRID and specify them using the LFN: notation: this makes job submission much faster and keeps the ILCDIRAC servers happy (no huge amount of data to carry around). How to do this is described in the section Specifying your libraries.
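A hedged sketch mixing local and GRID-resident sandbox files (both paths are placeholders):
# a local steering file plus a large tarball already stored on the GRID
job.setInputSandbox(["my_steering.xml",
                     "LFN:/ilc/user/s/someone/lib.tar.gz"])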
The setOutputSandbox follows the same style. There are several important things to note: if a file is missing in the output, DIRAC will mark the job as failed, but the sandbox will be filled with whatever is available among the requested items. If the sandbox is bigger than 10MB, the files will be packed together in a tar ball and shipped to the GRID. Note the "*.log" and "*.out" entries in the output sandbox: these include the application log files in the output sandbox, making debugging much simpler.

In ILCDIRAC version v16r7p0, it seems that retrieving the output sandbox of a job does not retrieve those files from the GRID. Retrieving job outputs is explained later on this page.
The last part of this section is
job.setOutputData(["somefile1","somefile2"],"some/path","CERN-SRM")
This is used when a user wants to keep some files on the GRID for further processing or for easy sharing with others. It implies that the specified files somefile1 and somefile2 will be uploaded to the GRID under the automatically built LFN /ilc/user/u/username/some/path/. Here u and username indicate the DIRAC user name of the user that created the job (u is the initial). The some/path part of the LFN comes from the second argument in the method call above. Finally, the storage element to store the files on can be specified by passing its logical name as the last argument. In this example, we use CERN-SRM, which is the default storage for ILCDIRAC. See the Storage Elements section for a list of valid SEs.

If the output files already exist in the File Catalog, the jobs will fail because overwriting files is not possible. Be sure to clean up before submitting the jobs.
Finally, the line
print job.submit(dirac)
submits the job, after applying the checking procedure defined internally. The print statement allows seeing the job ID on screen. You can check that this job ID is in the some_job_repository.rep file.
This ends the general presentation of the jobs. More information can be found in the API documentation. The next sections will show how the different applications can be configured.
ILC application-job framework
For maximum flexibility in the ILCDIRAC interface, the application-support framework was carefully designed. It's based on the following assumptions:
- An application has several generic properties:
  - It has a name and possibly a version
  - It will be steered with some configuration file
  - It will process a given number of events (for HEP)
  - It will maybe produce some log file
  - It will produce some output file
  - It may process some input data
  - It may have an energy dependency (for HEP)
- And also some application-specific ones:
  - An application is a block of information processing
  - It should be easy to "plug" an application into a workflow
  - A Job is only one way to run an application
  - An application should be "standalone"
Those requirements were implemented in the ILCDIRAC Application framework. The corresponding PYTHON implementation is best described in the API documentation. For completeness, we show here python code that makes use of this framework, although it's just a non-functional example. All ILC applications described below can use the same methods.
from ILCDIRAC.Interfaces.API.NewInterface.Application import Application
ap = Application()
ap.setName("MyName") #application name
ap.setVersion("1") #version
ap.setLogFile("MylogFile") #log file name (stdout of the application)
ap.setInputFile("MyInput") #input file name, can be a list,
#can contain elements with LFN:
ap.setOutputFile("MyOutput") #output file name
ap.setSteeringFile("SomeSteeringFile")#steering file (configuration), can have LFN:
ap.setNumberOfEvents(10) #Obviously...
ap.setEnergy(3000) #Energy
res = job.append(ap) #Add this application to the job
if not res['OK']: #Catch if there is an error
    print res['Message'] #Print the error message
    #do something, like quit
Here, job is an instance of the UserJob class described before. It is possible to stack applications one after the other:
ap1 = Application()
ap2 = Application()
res = job.append(ap1)
if not res['OK']:
    print res['Message']
    #do something, like quit
res = job.append(ap2)
if not res['OK']:
    print res['Message']
    #do something, like quit
It is also possible to chain applications: the second one can get its output from the first one:
ap1 = Application()
ap1.setOutputFile("MyOutput") # Needed when chaining
ap2 = Application()
ap2.getInputFromApp(ap1) #This is where the magic happens
res = job.append(ap1)
if not res['OK']:
    print res['Message']
    #do something, like quit
res = job.append(ap2)
if not res['OK']:
    print res['Message']
    #do something, like quit
This last property does not make sense for all applications, in particular for the user application, as DIRAC cannot "guess" what a random binary will produce. If the output file is specified in the application steering file AND in the setOutputFile field, then it will behave as expected.
The next sections describe the application-specific methods (setters). Everything that is described in this section applies to the following ones, unless otherwise indicated.
Generic application
A generic application is any executable that is not part of the ILC software chain (but can use some ILC software) and not ROOT (as ROOT has its own wrapper). It may be used to run user code.
A generic application is defined as follows:
from ILCDIRAC.Interfaces.API.NewInterface.Applications import GenericApplication
ga = GenericApplication()
Check the API documentation for the full details.
The 3 methods that are specific to this application are the following (a combined sketch follows the list):
- The setter of the script to run:
ga.setScript("somescript_or_binary")
This can be any executable script (shell or python). Check that the file mode (chmod +x) is correct before submission. It can also be an LFN:
- The arguments to pass to the script:
ga.setArguments("some command line arguments")
The arguments are passed verbatim to the script.
- A dependency on an ILCDIRAC supported application:
ga.setDependency({"ROOT":"5.34"})
This makes sure the dependency (here ROOT 5.34) is installed prior to running the script. Any ILCDIRAC supported application can be set here; the full list can be obtained by running dirac-ilc-show-software.
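Putting these together, a minimal, hedged sketch of a generic application job (the script name, arguments, and dependency choice are placeholders; job is a UserJob instance as defined above):
from ILCDIRAC.Interfaces.API.NewInterface.Applications import GenericApplication
ga = GenericApplication()
ga.setScript("myanalysis.py")        # placeholder: your own executable script
ga.setArguments("input.slcio 100")   # passed verbatim to the script
ga.setDependency({"ROOT":"5.34"})    # make sure ROOT 5.34 is installed first
res = job.append(ga)
if not res['OK']:
    print res['Message']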
Generation
The generation of events in ILCDIRAC is done using WHIZARD 1.95 (for the moment) for most events, and with PYTHIA directly for ttbar, WW, and ZZ events. The support for WHIZARD 2 is being developed, so if you need events produced with WHIZARD 2, you should request help from ilcdirac-support@cern.ch.
Whizard 1.95
This generator is used for the CLIC CDR and ILC DBD studies. It has several limitations not discussed here, the main one being that it requires a dedicated binary containing the process one wants to study. As having an exhaustive list of processes is not possible, only a limited set (still more than 1000) is available. If you need to use this application, it's a good idea to contact ilcdirac-support@cern.ch to ask whether a given process is available. If it is not, someone may (or may not) add it to WHIZARD and make it available. When WHIZARD 2 is supported by ILCDIRAC, the process can be obtained directly as WHIZARD will be run "automatically".
The standard way to create a WHIZARD application is by using
from ILCDIRAC.Interfaces.API.NewInterface.Applications import Whizard
from ILCDIRAC.Interfaces.API.DiracILC import DiracILC
d = DiracILC()
wh= Whizard(d.getProcessList())
wh.setModel("sm")
pdict = {}
pdict['process_input']={}
pdict['process_input']['process_id'] = "qq"
pdict['process_input']['sqrts'] = 1400
pdict['simulation_input'] = {}
pdict['simulation_input']['n_events'] = 10000
pdict['beam_input_1'] = {}
pdict['beam_input_1']['particle_name']='e1'
pdict['beam_input_1']['polarization'] = "0.0 0.0"
pdict['beam_input_1']['USER_spectrum_on'] = 'T'
pdict['beam_input_1']['USER_spectrum_mode'] = 19
pdict['beam_input_1']['ISR_on'] = 'T'
pdict['beam_input_2'] = {}
pdict['beam_input_2']['particle_name']='E1'
pdict['beam_input_2']['polarization'] = "0.0 0.0"
pdict['beam_input_2']['USER_spectrum_on'] = 'T'
pdict['beam_input_2']['ISR_on'] = 'T'
pdict['beam_input_2']['USER_spectrum_mode'] = -19
wh.setFullParameterDict(pdict)
wh.setOutputFile("toto.stdhep")
res = j.append(wh)
if not res['OK']:
    print res['Message']
    exit(1)
As usual, a detailed description of the class is available on the API documentation page. I will only briefly discuss the instance definition, and how it works.
from ILCDIRAC.Interfaces.API.NewInterface.Applications import Whizard
from ILCDIRAC.Interfaces.API.DiracILC import DiracILC
d = DiracILC()
wh= Whizard(d.getProcessList())
As can be seen in the code snippet above, it is necessary to have a DiracILC instance to create a WHIZARD instance. The reason is that the list of available processes is mandatory for WHIZARD to be configured. In particular, the process list contains all the processes as well as the version of WHIZARD in which they are defined. The d.getProcessList() call takes care of downloading the file locally. You will find the processlist.whiz file in the local directory.
The next thing is wh.setModel("sm"). All the available models are defined in the Configuration Service under /Operations/Defaults/Models. In this example, the SM is used, so nothing particular happens. When using SUSY for instance, this is used to define which LesHouches file to use. It is always possible to override the LesHouches file by placing one in the InputSandbox and naming it LesHouches.msugra_1.in. No, it's not necessarily an msugra model; that's just a name, and it has to be like that to be picked up.
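For example, a hedged sketch of shipping a custom LesHouches file (only the sandbox file name LesHouches.msugra_1.in is fixed; job is the UserJob instance):
# ship a custom LesHouches file with the job; it must carry exactly this name
job.setInputSandbox(["LesHouches.msugra_1.in"])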
After that, there is
pdict = {}
...
wh.setFullParameterDict(pdict)
which allows setting the whizard parameters. The structure of the pdict is the same as that of the whizard.in file: there are sections like pdict['process_input'] that correspond to the process_input section of the whizard.in. All the sections are available this way, with the notable difference of the beam_input sections, which are named explicitly beam_input_1 and beam_input_2 for clarity. All the possible parameters described on the whizard documentation page can be set.
The rest of the possibilities are described on the API page.
Pythia
The PYTHIA application is rarely used as it's not generic at all: it can only produce ttbar, WW, and ZZ events. When needing such events, I usually run it locally and upload the files. The issue with this application is the lack of flexibility: everything is hard coded in the FORTRAN code. It should be considered an expert's application and will not be documented here. Nevertheless, for the advanced user, the API is documented at the usual location.
StdHepCut and StdhepCutJava
There are 2 equivalent versions of this application: one in C++ and one in JAVA. The latter is needed because the former does unnecessary things due to limitations of the C++ stdhep library. In practice, the JAVA version should be preferred. The interface being identical for both applications, the example uses the JAVA one.
from ILCDIRAC.Interfaces.API.NewInterface.Applications import StdhepCutJava
cut = StdhepCutJava()
cut.setVersion("1.0")
cut.setNumberOfEvents(10000)
cut.setSteeringFile('cuts_quarks_1400.txt')
cut.setMaxNbEvts(10)
cut.setSelectionEfficiency(0.5)
res = j.append(cut)
The setNumberOfEvents value is used in conjunction with setSelectionEfficiency and setMaxNbEvts to make sure that enough events are provided as input. setMaxNbEvts is the maximum number of events to write to the output file.
This application uses the code available at http://svnweb.cern.ch/guest/lcgentools/trunk/stdhepcut_java. There is a README file in the svn repository that explains how to use the application, and by looking at the code you may be able to develop your own cut code. If you want to get your code integrated in the trunk, you should get in touch with either ilcdirac-support@cern.ch or any LC common generator WG member.
The following only applies to the JAVA version, as the C++ one does not allow it: you can use your own cut classes. They must be stored under org.lcsim.stdhepcut.cuts. The corresponding jar file must be stored in a lib directory that you can then add to your sandbox. The StdhepCutJava application takes the lib directory and adds it to the CLASSPATH.
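A minimal, hedged sketch of shipping such a jar (the jar name is a placeholder; the lib directory name is the fixed part, and job is the UserJob instance):
# mycuts.jar contains classes under org.lcsim.stdhepcut.cuts and lives in ./lib
job.setInputSandbox(["lib"])   # the application adds lib/ to the CLASSPATH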
Simulation
The design of these applications could be a bit better structured, as they are all independent instead of inheriting from a common SimApplication class. That's just an implementation detail, transparent to the user.
Mokka
This application was the first to be added to ILCDIRAC. Its interface is described like any other in the API documentation. The example below shows how it can be used:
from ILCDIRAC.Interfaces.API.NewInterface.Applications import Mokka
mo = Mokka()
mo.setInputFile("some_file.stdhep")
mo.setVersion("0706P08")
mo.setSteeringFile("clic_ild_cdr.steer")
mo.setOutputFile("totosim.slcio")
res = j.append(mo)
That's the simplest way to call the application.
There are more possibilities that are worth showing:
mo.setDetectorModel('Something')
allows defining another detector model, as the default is CLIC_ILD_CDR.
mo.setDbSlice("something.sql")
allows defining your own MySQL database dump. It can be a local file or an LFN. The specified detector model must exist in the dump, of course.
mo.setMacFile("mymacro.mac")
allows specifying your own macro file. This is needed when you want to use a ParticleGun, for example. It can be a local file or an LFN. Whatever is in there is used unchanged. If you don't specify a macro file, the application creates one using the input file, the startFrom value, and the number of events specified. This means that if you provide your own macro file and want to use the startFrom functionality or change the number of events, you need to set them yourself in your macro file.
mo.setStartFrom(10)
allows defining the starting event in the input file. A combined sketch is shown below.
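Putting the optional setters together, a hedged sketch (the detector model, dump, and macro names are placeholders):
from ILCDIRAC.Interfaces.API.NewInterface.Applications import Mokka
mo = Mokka()
mo.setVersion("0706P08")
mo.setSteeringFile("clic_ild_cdr.steer")
mo.setInputFile("some_file.stdhep")
mo.setDetectorModel("MyDetector")   # placeholder detector model name
mo.setDbSlice("mydump.sql")         # local file or LFN; must contain MyDetector
mo.setMacFile("mymacro.mac")        # used as-is: set startFrom/nbevts inside it
mo.setOutputFile("mysim.slcio")
res = j.append(mo)
if not res['OK']:
    print res['Message']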
SLIC
The interface of SLIC is similar to that of Mokka, and is described as usual in the API documentation.
It is invoked with
from ILCDIRAC.Interfaces.API.NewInterface.Applications import SLIC
slic = SLIC()
slic.setVersion('v2r9p8')
slic.setInputFile("some_file.stdhep")
slic.setSteeringFile('MyMacro.mac')
slic.setDetectorModel('clic_sid_cdr')
slic.setOutputFile("Something.slcio")
res = j.append(slic)
Similarly to Mokka, it has a setStartFrom(N) method that allows skipping the first N events in the input file.
The detector model handling is worth presenting: the value is the base name of a zip file, in this case clic_sid_cdr.zip. The zip file can live locally, as an LFN, or as a standard detector description on the lcsim.org web portal. It must be a zip file, and the detector it describes must be named like the zip file without the .zip extension.
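For instance, a hedged sketch with a private detector description (mydetector.zip is a placeholder; it is assumed here that a local zip can be shipped via the input sandbox):
# mydetector.zip sits in the current directory and describes a detector
# named "mydetector"; ship it with the job and point SLIC at it
job.setInputSandbox(["mydetector.zip"])
slic.setDetectorModel("mydetector")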
There is a script written by C. Grefe that takes care of running the SID reconstruction chain. J. McCormick wrote some documentation on the confluence page.
Reconstruction
Marlin
This runs the reconstruction in the context of the ILD detector concept. The API is, as usual, described in the API documentation.
from ILCDIRAC.Interfaces.API.NewInterface.Applications import Marlin
ma = Marlin()
ma.setVersion('ILCSoft-01-17-08')
ma.setInputFile("something.slcio")
#job.setInputData(["something.slcio"]) #add this line if the file is on a storage element with tape backend (e.g., CERN)
ma.setSteeringFile("SomeSteeringFile.xml")
ma.setGearFile("SomeGear.gear")
##optionally, better use the job.setOutputData(...) function
ma.setOutputRecFile("MyRECfile.slcio")
ma.setOutputDstFile("myDSTfile.slcio")
res = j.append(ma)
Check the API documentation for 2 extra methods that can be useful.
If you want to run with your own libs (LD libs and/or MARLIN_DLL), the lib directory MUST have the following structure (a sketch follows the list):
- The LD libs must go under lib/lddlib/. It is recommended to put the versioned libraries here as well, i.e., both libUser.so and libUser.so.5.7
- The Processors MUST be under lib/marlin_dll/
- Any Marlin DLL must end in .so (not .so.xyz)
This comes from the fact that Marlin is sensitive to the difference between a Processor and a library.
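A hedged sketch of the expected layout and of shipping it with the job (the library names are placeholders; it is assumed the lib directory goes into the input sandbox):
# expected layout (library names are placeholders):
#   lib/lddlib/libUser.so              plain LD library
#   lib/lddlib/libUser.so.5.7          versioned copy
#   lib/marlin_dll/libMyProcessor.so   Marlin processor, must end in .so
job.setInputSandbox(["lib"])   # assumed: ship the whole lib directory with the job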
LCSIM
LCSIM is now used mostly to run the tracking and create the final files for the SID detector concept. The PFA is run as another application, SLICPandora, described later. The example below is specific to such a use case (the most likely use case anyway):
from ILCDIRAC.Interfaces.API.NewInterface.Applications import LCSIM
lcsim_prepandora = LCSIM()
lcsim_prepandora.setVersion('CLIC_CDR')
lcsim_prepandora.setSteeringFile("clic_cdr_prePandora.lcsim")
lcsim_prepandora.setTrackingStrategy("defaultStrategies_clic_sid_cdr.xml")
lcsim_prepandora.setOutputFile("prePandora.slcio")
lcsim_prepandora.willRunSLICPandora()
res = j.append(lcsim_prepandora)
There are a few other options that are described in the API documentation.
There is a script written by C. Grefe that takes care of running the SID reconstruction chain. J. McCormick wrote some documentation on the confluence page.
SLICPandora
This application (described in more detail in the API documentation) is usually used in conjunction with LCSIM, and the example below shows such a use case.
from ILCDIRAC.Interfaces.API.NewInterface.Applications import SLICPandora
slicpandora = SLICPandora()
slicpandora.setVersion('CLIC_CDR')
slicpandora.setDetectorModel('clic_sid_cdr')
slicpandora.setPandoraSettings("PandoraSettingsSlic.xml")
slicpandora.getInputFromApp(lcsim_prepandora)
slicpandora.setOutputFile('pandora.slcio')
res = j.append(slicpandora)
It is of course possible to use it standalone.
One noticeable aspect is the detector model handling. It's identical to that of SLIC: the detector model must be e.g. detectormodel, described in a detectormodel.zip living locally, as an LFN, or on the lcsim.org portal.
There is a script written by C. Grefe that takes care of running the SID reconstruction chain. J. McCormick wrote some documentation on the confluence page.
Adding the Overlay
The production system is non-deterministic in the sense that it does not know a priori the files it will run on, so the overlay files have to be determined at the last minute. This is particularly needed because in the CLIC case there are many interactions per bunch crossing, so the number of events to fetch is very large and can vary from job to job. For that, there is another application to add before Marlin or LCSIM, called OverlayInput. This application does not have an OutputFile, so don't try to use ma.getInputFromApp(ov) as it would fail.
It requires a few parameters like the number of interactions per bunch crossing, the detector model, etc. Those are described in the API documentation, and the example below shows how to configure it.
from ILCDIRAC.Interfaces.API.NewInterface.Applications import OverlayInput
overlay = OverlayInput()
overlay.setMachine("clic_cdr")
overlay.setEnergy(3000.0)
overlay.setBXOverlay(60)
overlay.setGGToHadInt(3.2)##When running at 3TeV
overlay.setDetectorModel("CLIC_ILD_CDR")
overlay.setNbSigEvtsPerJob(10)
res = j.append(overlay)
In any case, this is how it works:
- Given a signal input file (its number of events, to be exact), the interaction probability, and the number of bunch crossings to overlay, it computes the number of background events to fetch.
- It gets from the configuration the ProdID that is suitable for that background sample given the machine, the energy, the detector model, and the number of events per background file. You can see an example of this under /Operations/Defaults/Overlay/ilc_dbd/energy/detectormodel/bkgtype/ in the Configuration Service.
- It fetches the list of files from the catalog with that ProdID and other tags (detector model, energy, machine, etc.).
- It gets the appropriate number of files given the number of events per background file. The files are chosen randomly.
- It stores those under a special directory.
- When Marlin/LCSIM starts, the code checks that the steering XML references a bgoverlay, overlaytiming, or org.lcsim.util.OverlayDriver section and then sets the files accordingly.
A sketch of placing OverlayInput before the reconstruction application is shown below.
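A hedged sketch of the ordering only (it is assumed overlay and ma are the OverlayInput and Marlin instances defined earlier, and that the Marlin steering XML contains an overlaytiming section):
# OverlayInput must be appended BEFORE the reconstruction application;
# do NOT use ma.getInputFromApp(overlay), as OverlayInput has no output file
res = j.append(overlay)
if not res['OK']:
    print res['Message']
res = j.append(ma)
if not res['OK']:
    print res['Message']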
ROOT applications
Other applications
GetSRMFile
Example of how to use GetSRMFile:
from ILCDIRAC.Interfaces.API.NewInterface.Applications import GetSRMFile
srm = GetSRMFile()
fdict={"file" : "srm://dcache-se-desy.desy.de/pnfs/desy.de/calice/generated/2011-10-04/file885bf7d6-f911-4169-8124-097379c42c0f", "site" : "DESY-SRM"}
srm.setFiles(fdict)
srm.setDebug(True)
res = job.append(srm)
[...]
ma = Marlin()
#either this or the line after can be used
#ma.setInputFile("file885bf7d6-f911-4169-8124-097379c42c0f")
ma.getInputFromApp(srm)
job.append(ma)
[...]
When using GetSRMFile with Mokka, make sure that you let DIRAC create the macro file so that the proper path to the input file can be generated.
SLCIOSplit
SLCIOConcatenate
StdHepSplit
CheckCollections
--
AndreSailer - 2014-12-08