17 July 2014 - Alpha release. Fresh out of the oven.
07 Jan 2015 - Improved alpha release. API for raw execution of commands available to the users.
04 Feb 2015 - Released production version with CRAB v3.3.13
Introduction
The CRABAPI library is a Python wrapper on top of the CRAB client. It lets people interact with the server through the same interface used by the official client, and is intended for those who would like to automate the submission of tasks with CRAB. With the CRABAPI library you will never have to call the client commands and parse their outputs again!
There are two ways of using the library. One is a small Python function, called crabCommand, that runs the client commands with the same arguments they accept on the command line. With it, backward compatibility is not guaranteed and error handling is left to the user. The other is an interface that adds some intelligence to the client commands: among other things, it manipulates the output of the server and guarantees backward compatibility. The latter is not available yet.
The crabCommand API
This is a thin API that allows easy access to the CRAB client commands. Users who have experience with the CRAB client command line should be able to understand quickly how to use the crabCommand API.
In order to use it you should make sure to do (replace sh with csh as appropriate for your shell):
and make sure that your first import in your python script is
import CRABClient
This triggers the addition to $PYTHONPATH of dependencies like WMCore and DBS, which are not in the default CMSSW path.
Brief description
The signature of the crabCommand API is the following:
returndict crabCommand(command, *args, **kwargs)
Input arguments:
command: Takes the name of the CRAB command to be executed (e.g. 'submit', 'status', 'report', etc).
args: Positional arguments to be passed as such to the command.
kwargs: Keyword arguments are considered as (long-name) options to be passed to the command. The API takes care of adding a double hyphen ('--') in front of each keyword argument so that the command interprets them as options.
Returns:
The return value of the invoked CRAB command, which should always be a Python dictionary.
The positional arguments args are supported for compatibility with the client command line, where the CRAB configuration file name and the CRAB project directory name can be passed as the first positional argument instead of using the --config and --dir options respectively. The submit command is the only one that accepts additional positional arguments; they are used for overwriting CRAB configuration parameters. With the crabCommand API, however, CRAB configuration parameters can be overwritten before calling the API, so positional arguments should not be needed and for simplicity we recommend using keyword arguments only. The recommended syntax for passing options to the command is then of the form crabCommand(command, option1 = value1, option2 = value2).
Options that do not take a value on the command line should be given the boolean value True in the API. If the boolean value is False, the corresponding keyword argument is ignored by the API (i.e. the argument is not passed as an option to the command). The API then runs the equivalent of the command line in which every remaining keyword argument appears as a '--' option, i.e. crab command --option1 value1 for a call crabCommand(command, option1 = value1).
Note: Remember that the --config option of the submit command accepts both the name of a CRAB configuration file and a Configuration object itself.
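The rules above for turning keyword arguments into options can be sketched in a few lines. This is only an illustration: kwargs_to_options is a hypothetical helper, not part of CRABAPI, and the option names someflag and otherflag are placeholders rather than real CRAB options.

```python
def kwargs_to_options(**kwargs):
    """Hypothetical helper (not part of CRABAPI): mimic the documented rules
    for turning keyword arguments into long command-line options."""
    options = []
    for name in sorted(kwargs):           # sorted only to make the output deterministic
        value = kwargs[name]
        if value is True:
            options.append('--' + name)   # option that takes no value
        elif value is False:
            continue                      # ignored: not passed to the command
        else:
            options.extend(['--' + name, str(value)])
    return options

# 'dir' is the option used in this page; the two flags are placeholders.
opts = kwargs_to_options(dir='crab_projects/crab_runB', someflag=True, otherflag=False)
print(' '.join(opts))
```

The False-valued keyword disappears entirely, which makes it convenient to build the keyword dictionary once and toggle flags with booleans.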
Brief usage examples
Example of submitting a task from the python shell
Here is a quick example of how to submit a task from the Python shell. We use the CMSSW parameter-set configuration file pset_tutorial_analysis.py and the CRAB configuration file crabConfig_tutorial_MC_analysis.py (these files are available in the introductory CRAB tutorial).
>>> import CRABClient
>>> from CRABAPI.RawCommand import crabCommand
>>> res = crabCommand('submit', config = 'crabConfig_tutorial_MC_analysis.py')
Will use CRAB configuration file crabConfig_tutorial_MC_analysis.py
Importing CMSSW configuration pset_tutorial_analysis.py
Finished importing CMSSW configuration pset_tutorial_analysis.py
Sending the request to the server
Success: Your task has been delivered to the CRAB3 server.
Task name: 150122_113932_crab3test-1:mmascher_crab_tutorial_MC_analysis_test1
Please use 'crab status' to check how the submission process proceeds.
Log file is /afs/cern.ch/work/m/mmasher/wf/multicrab_crab3/crab_projects/crab_tutorial_MC_analysis_test1/crab.log
>>> print res
{'uniquerequestname': '150122_113932_crab3test-1:mmascher_crab_tutorial_MC_analysis_test1', 'requestname': 'crab_tutorial_MC_analysis_test1'}
Now we use the status command to check the status of a task:
Note: Every command returns a Python dictionary that gives the result of the command and the server output. This is the raw information returned by the server and may sometimes contain more information than one actually needs.
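Because the returned dictionary may hold more than one needs, a script will usually keep only a few keys. A minimal sketch, using the dictionary printed in the submit example above as sample data; the helper name pick is hypothetical, and a real script would obtain res from crabCommand instead of writing it out by hand.

```python
def pick(result, keys):
    """Hypothetical helper: keep only the requested keys that are present."""
    return dict((key, result[key]) for key in keys if key in result)

# Sample data copied from the submit example above.
res = {'uniquerequestname': '150122_113932_crab3test-1:mmascher_crab_tutorial_MC_analysis_test1',
       'requestname': 'crab_tutorial_MC_analysis_test1'}

# 'jobList' is absent from a submit result, so it is silently skipped.
wanted = pick(res, ['requestname', 'jobList'])
print(wanted)
```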
Example submitting multiple tasks using a python script
Suppose one wants to send the same configuration file as in the above example multiple times changing only the input dataset. One can accomplish this by adding the following lines to the crabConfig_tutorial_MC_analysis.py file and executing it via python crabConfig_tutorial_MC_analysis.py instead of crab submit crabConfig_tutorial_MC_analysis.py:
import CRABClient

[The usual crab configuration comes here]

if __name__ == '__main__':
    from CRABAPI.RawCommand import crabCommand
    for dataset in ['/DoubleMuParked/Run2012B-22Jan2013-v1/AOD', '/DoubleMuParked/Run2012C-22Jan2013-v1/AOD']:
        config.Data.inputDataset = dataset
        config.General.requestName = dataset.split('/')[2]
        crabCommand('submit', config = config)
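To see which request names the loop above produces, the dataset.split('/')[2] expression can be tried on its own, without CRAB. The variable names below are chosen for this illustration only.

```python
datasets = ['/DoubleMuParked/Run2012B-22Jan2013-v1/AOD',
            '/DoubleMuParked/Run2012C-22Jan2013-v1/AOD']

# '/A/B/C'.split('/') gives ['', 'A', 'B', 'C'], so index 2 is the middle field.
request_names = [dataset.split('/')[2] for dataset in datasets]
print(request_names)
```

Each task therefore gets a distinct request name as long as the middle field of the dataset path differs between datasets.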
crab status API description and example
Example of data returned by calling crabCommand("status")
This status combines the task's status on CRABServer with its status on the grid. For example, after task submission, it will show NEW or QUEUED while CRABServer handles the new task. (These are some of the statuses on the CRABServer side, which are also reported separately in the dbStatus dict key.) If the task submission to the grid succeeds, the status will become SUBMITTED. (Immediately after submission, the status may show UNKNOWN during the task bootstrapping process on the grid.) The status will continue to show SUBMITTED until the task on the grid finishes. After this point, the status will become either COMPLETED, which means that all of the jobs finished successfully, or FAILED, meaning that at least some of the jobs failed. (The task states on the grid are also reported separately in the dagStatus dict key.) This combined status value is currently not reported in the CLI; it exists in the dictionary to improve backward compatibility with the old crab status API implementation. The CLI instead reports the dbStatus and dagStatus values separately.
dbStatus
Status of the task on CRABServer. This indicates CRAB's progress in handling the task before and during the submission to the grid. For example, it will show NEW or QUEUED for a new task, and SUBMITTED if the submission to the grid succeeds.
dagStatus
Status of the task on the grid. It will show SUBMITTED until all jobs finish running. Afterwards, it will either show COMPLETED or FAILED, depending on whether all jobs completed successfully or some of them failed.
taskFailureMsg
This will contain a message about a problem that the CRABServer encountered during the handling of the task. For example, if the user's provided lumimask doesn't correctly match the input data and no jobs can be generated by CRABServer, this key will contain a message similar to "The CRAB3 server backend could not submit any job to the Grid scheduler: Splitting task 170529_102639:erupeika_crab_test on dataset /A/B/C with FileBased method does not generate any job'".
statusFailureMsg
This will contain a message about the reason why the crabCommand("status") cannot provide all of the usual information about a task. One such reason could be "Waiting for the Grid scheduler to report back the status of your task" which means some information about the task like the state of each job is unavailable at the current time.
The crabCommand("status") will not throw any exception if it has some useful information to return. For example, even though it may fail getting information from the grid scheduler for various reasons, information about a task from the CRABServer task database is still available. It will throw an HTTPException if something goes wrong with the request to CRABServer itself (task not found for example).
Basic crabCommand("status") usage example
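import CRABClient
from httplib import HTTPException
from CRABAPI.RawCommand import crabCommand

st = {}
try:
    st = crabCommand("status")
except HTTPException as ex:
    # The request to CRABServer itself failed (e.g. task not found).
    print("Problem with status encountered: %s" % ex)
    raise

# Per-job information is only present once the grid scheduler has reported back.
if st.get("jobList"):
    print(st.get("jobList"))
else:
    print("Status incomplete")
# Report why the status is incomplete, if a reason was provided.
if st.get("statusFailureMsg"):
    print("Found reason for error: %s" % st.get("statusFailureMsg"))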
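The dictionary keys described above can be condensed into a one-line summary for logging. The sketch below is an assumption-laden illustration: summarize_status is a hypothetical helper, it reads only the keys named in this section, and the sample dictionary is made up by hand rather than returned by a real crabCommand("status") call.

```python
def summarize_status(st):
    """Hypothetical helper: build a one-line summary from a status dictionary.
    Only keys described in this section are read; missing keys are tolerated."""
    parts = ["dbStatus=%s" % st.get("dbStatus", "n/a"),
             "dagStatus=%s" % st.get("dagStatus", "n/a")]
    for key in ("taskFailureMsg", "statusFailureMsg"):
        if st.get(key):                      # skip absent or empty messages
            parts.append("%s=%s" % (key, st[key]))
    return ", ".join(parts)

# Hand-written sample: grid information not yet available.
sample = {"dbStatus": "SUBMITTED",
          "statusFailureMsg": "Waiting for the Grid scheduler to report back the status of your task"}
summary = summarize_status(sample)
print(summary)
```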
Multicrab using the crabCommand API
Using the crabCommand API and a few lines of code it is possible to implement the so-called multicrab functionality in CRAB3. Here is an example.
First let's assume we want to submit two identical tasks on two different datasets (like the example above), but let's also change other parameters in the CRAB configuration file.
What we will do is to call crabCommand('submit') twice (once for each task) passing the Configuration object as argument and of course changing the configuration parameters accordingly before each call. For example, one can add the following lines at the end of the CRAB configuration file:
import CRABClient

[The usual crab configuration comes here]

if __name__ == '__main__':

    from CRABAPI.RawCommand import crabCommand
    from CRABClient.ClientExceptions import ClientException
    from httplib import HTTPException

    # We want to put all the CRAB project directories from the tasks we submit here into one common directory.
    # That's why we need to set this parameter (here or above in the configuration file, it does not matter, we will not overwrite it).
    config.General.workArea = 'crab_projects'

    def submit(config):
        try:
            crabCommand('submit', config = config)
        except HTTPException as hte:
            print "Failed submitting task: %s" % (hte.headers)
        except ClientException as cle:
            print "Failed submitting task: %s" % (cle)

    #############################################################################################
    ## From now on that's what users should modify: this is the a-la-CRAB2 configuration part. ##
    #############################################################################################

    config.General.requestName = 'runB'
    config.Data.inputDataset = '/DoubleMuParked/Run2012B-22Jan2013-v1/AOD'
    config.Data.unitsPerJob = 2
    config.Data.totalUnits = 4
    submit(config)

    config.General.requestName = 'runC'
    config.Data.inputDataset = '/DoubleMuParked/Run2012C-22Jan2013-v1/AOD'
    config.Data.unitsPerJob = 3
    config.Data.totalUnits = 8
    submit(config)

    # etc ...
Then execute the configuration file:
python crabConfig.py
Importing CMSSW configuration pset_tutorial_analysis.py
Finished importing CMSSW configuration pset_tutorial_analysis.py
Sending the request to the server
Success: Your task has been delivered to the CRAB3 server.
Task name: 150112_153111_crab3test-1:mmascher_crab_runB
Please use 'crab status' to check how the submission process proceeds.
Log file is /afs/cern.ch/work/m/mmascher/wf/multicrab_crab3/crab_projects/crab_runB/crab.log
Importing CMSSW configuration pset_tutorial_analysis.py
Finished importing CMSSW configuration pset_tutorial_analysis.py
Sending the request to the server
Success: Your task has been delivered to the CRAB3 server.
Task name: 150112_153116_crab3test-4:mmascher_crab_runC
Please use 'crab status' to check how the submission process proceeds.
Log file is /afs/cern.ch/work/m/mmascher/wf/multicrab_crab3/crab_projects/crab_runC/crab.log
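When the number of tasks grows, the repeated parameter blocks can be replaced by a table and a loop. The following is a sketch under stated assumptions: Section and StandInConfig are hypothetical stand-ins so that the snippet runs without CRAB, and the submit call is only indicated in a comment; in a real configuration file one would use the actual config object and the submit function defined above.

```python
class Section(object):
    """Stand-in for one section of a CRAB configuration (hypothetical)."""

class StandInConfig(object):
    """Stand-in for the Configuration object, with only the two sections used here."""
    def __init__(self):
        self.General = Section()
        self.Data = Section()

# One row per task: (requestName, inputDataset, unitsPerJob, totalUnits),
# with the values taken from the example above.
TASKS = [
    ('runB', '/DoubleMuParked/Run2012B-22Jan2013-v1/AOD', 2, 4),
    ('runC', '/DoubleMuParked/Run2012C-22Jan2013-v1/AOD', 3, 8),
]

def configure(config, task):
    """Copy one row of TASKS into the configuration before submission."""
    name, dataset, units_per_job, total_units = task
    config.General.requestName = name
    config.Data.inputDataset = dataset
    config.Data.unitsPerJob = units_per_job
    config.Data.totalUnits = total_units

config = StandInConfig()
prepared = []
for task in TASKS:
    configure(config, task)
    # A real script would call submit(config) here; this sketch only records the settings.
    prepared.append((config.General.requestName, config.Data.unitsPerJob, config.Data.totalUnits))
print(prepared)
```

Every parameter that differs between tasks is set on each iteration, so no value from a previous task can leak into the next one.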
Warning: If you use (and change) the CRAB configuration parameter JobType.pyCfgParams when submitting multiple tasks, please first check this FAQ.
You can create a file called multicrab and make it executable with chmod 744 multicrab:
#!/usr/bin/env python
"""
This is a small script that does the equivalent of multicrab.
"""

import os
from optparse import OptionParser

import CRABClient
from CRABAPI.RawCommand import crabCommand
from CRABClient.ClientExceptions import ClientException
from httplib import HTTPException


def getOptions():
    """
    Parse and return the arguments provided by the user.
    """
    usage = ("Usage: %prog --crabCmd CMD [--workArea WAD --crabCmdOpts OPTS]"
             "\nThe multicrab command executes 'crab CMD OPTS' for each project directory contained in WAD"
             "\nUse multicrab -h for help")

    parser = OptionParser(usage=usage)

    parser.add_option('-c', '--crabCmd',
                      dest = 'crabCmd',
                      default = '',
                      help = "The crab command you want to execute for each task in WAD",
                      metavar = 'CMD')

    parser.add_option('-w', '--workArea',
                      dest = 'workArea',
                      default = '',
                      help = "work area directory (only if CMD != 'submit')",
                      metavar = 'WAD')

    parser.add_option('-o', '--crabCmdOpts',
                      dest = 'crabCmdOpts',
                      default = '',
                      help = "options for crab command CMD",
                      metavar = 'OPTS')

    (options, arguments) = parser.parse_args()

    if arguments:
        parser.error("Found positional argument(s): %s." % (arguments))
    if not options.crabCmd:
        parser.error("(-c CMD, --crabCmd=CMD) option not provided.")
    if options.crabCmd != 'submit':
        if not options.workArea:
            parser.error("(-w WAD, --workArea=WAD) option not provided.")
        if not os.path.isdir(options.workArea):
            parser.error("'%s' is not a valid directory." % (options.workArea))

    return options


def main():
    """
    Main
    """
    options = getOptions()

    # If you want crabCommand to be quiet:
    #from CRABClient.UserUtilities import setConsoleLogLevel
    #from CRABClient.ClientUtilities import LOGLEVEL_MUTE
    #setConsoleLogLevel(LOGLEVEL_MUTE)
    # With this function you can change the console log level at any time.

    # To retrieve the current crabCommand console log level:
    #from CRABClient.UserUtilities import getConsoleLogLevel
    #crabConsoleLogLevel = getConsoleLogLevel()

    # If you want to retrieve the CRAB loggers:
    #from CRABClient.UserUtilities import getLoggers
    #crabLoggers = getLoggers()

    # Execute the command with its arguments for each directory inside the work area.
    for dir in os.listdir(options.workArea):
        projDir = os.path.join(options.workArea, dir)
        if not os.path.isdir(projDir):
            continue
        # Execute the crab command.
        msg = "Executing (the equivalent of): crab %s --dir %s %s" % (options.crabCmd, projDir, options.crabCmdOpts)
        print "-"*len(msg)
        print msg
        print "-"*len(msg)
        try:
            crabCommand(options.crabCmd, dir = projDir, *options.crabCmdOpts.split())
        except HTTPException as hte:
            print "Failed executing command %s for task %s: %s" % (options.crabCmd, projDir, hte.headers)
        except ClientException as cle:
            print "Failed executing command %s for task %s: %s" % (options.crabCmd, projDir, cle)


if __name__ == '__main__':
    main()
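The directory loop in main() can be tried in isolation, without CRAB, to check which entries of a work area it would visit. In this sketch project_dirs is a hypothetical helper and the work area is a throw-away temporary directory; sorting is added only to make the result deterministic and is not something the script above does.

```python
import os
import shutil
import tempfile

def project_dirs(work_area):
    """Return the sub-directories of work_area, skipping plain files, in sorted order."""
    found = []
    for name in sorted(os.listdir(work_area)):
        path = os.path.join(work_area, name)
        if os.path.isdir(path):
            found.append(path)
    return found

# Build a throw-away work area with two project directories and one stray file.
work_area = tempfile.mkdtemp()
try:
    os.mkdir(os.path.join(work_area, 'crab_runB'))
    os.mkdir(os.path.join(work_area, 'crab_runC'))
    open(os.path.join(work_area, 'notes.txt'), 'w').close()
    names = [os.path.basename(p) for p in project_dirs(work_area)]
finally:
    shutil.rmtree(work_area)   # clean up the temporary work area
print(names)
```

Note that any sub-directory counts as a project directory here, as in the script above; neither checks that the directory was actually created by CRAB.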
With the crabCommand API, all error handling is up to the user. That is because the API executes the CRAB commands directly, without going through the crab script, so the main exception handling present in crab is skipped. The best way of understanding an error is to run the same command in the terminal. For example, if one does not change the Data.outLFNDirBase parameter in crabConfig_tutorial_MC_analysis.py when executing the example above, one will get the following stack trace:
>>> import CRABClient
>>> from CRABAPI.RawCommand import crabCommand
>>> res = crabCommand('submit', config = 'crabConfig_tutorial_MC_analysis.py')
Will use configuration file crabConfig_tutorial_MC_analysis.py
Importing CMSSW configuration pset_tutorial_analysis.py
Finished importing CMSSW configuration pset_tutorial_analysis.py
Sending the request to the server
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/afs/cern.ch/user/m/mmascher/repos/CRABClient/src/python/CRABAPI/RawCommand.py", line 22, in crabCommand
return execRaw(command, arguments)
File "/afs/cern.ch/user/m/mmascher/repos/CRABClient/src/python/CRABAPI/RawCommand.py", line 42, in execRaw
res = cmdobj()
File "/afs/cern.ch/user/m/mmascher/repos/CRABClient/src/python/CRABClient/Commands/submit.py", line 135, in __call__
dictresult, status, reason = server.put( self.uri, data = configreq_encoded)
File "/cvmfs/cms.cern.ch/crab3/slc6_amd64_gcc481/cms/crabclient/3.3.13.rc2/lib/python2.6/site-packages/RESTInteractions.py", line 75, in put
return self.makeRequest(uri = uri, data = data, verb = 'PUT')
File "/cvmfs/cms.cern.ch/crab3/slc6_amd64_gcc481/cms/crabclient/3.3.13.rc2/lib/python2.6/site-packages/RESTInteractions.py", line 113, in makeRequest
capath=caCertPath)#, verbose=True)# for debug
File "/afs/cern.ch/user/m/mmascher/repos/WMCore/src/python/WMCore/Services/pycurl_manager.py", line 164, in request
raise exc
httplib.HTTPException
>>>
Of course this does not tell much about the error, but if one executes
crab submit crabConfig_tutorial_MC_analysis.py
one would get
Will use configuration file crabConfig_tutorial_MC_analysis.py
Importing CMSSW configuration pset_tutorial_analysis.py
Finished importing CMSSW configuration pset_tutorial_analysis.py
Sending the request to the server
Error contacting the server.
Server answered with: Invalid input parameter
Reason is: Incorrect 'Data.outLFNDirBase' parameter
Log file is /afs/cern.ch/user/m/mmascher/wf/multicrab_crab3/crab_projects/crab_tutorial_MC_analysis_test1/crab.log
There are basically two classes of exceptions one should worry about: ClientException, which is raised by the client, and HTTPException, which is raised when the server answers with an error. One can catch them in the code and take the appropriate actions. More information about server errors is available in the HTTP headers of the server answer:
>>> from httplib import HTTPException
>>> import CRABClient
>>> from CRABAPI.RawCommand import crabCommand
>>> try:
...     res = crabCommand('submit', config = 'crabConfig_tutorial_MC_analysis.py')
... except HTTPException as hte:
...     print hte.headers
...
Will use configuration file crabConfig_tutorial_MC_analysis.py
Importing CMSSW configuration pset_tutorial_analysis.py
Finished importing CMSSW configuration pset_tutorial_analysis.py
Sending the request to the server
{'Content-Length': '714', 'X-Error-Http': '400', 'X-Rest-Time': '565996.885 us', 'Server': 'CherryPy/3.2.2', 'Connection': 'close', 'X-Error-Detail': 'Invalid input parameter', 'CMS-Server-Time': 'D=570010 t=1421926577776648', 'X-Rest-Status': '302', 'Date': 'Thu, 22 Jan 2015 11:36:17 GMT', 'Content-Type': 'text/html;charset=utf-8', 'X-Error-Info': "Incorrect 'lfn' parameter", 'X-Error-Id': '531e2d062351f01d051e86d87e8c0a21'}
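The error-related headers shown above can be condensed into a short message for the user. A minimal sketch: describe_server_error is a hypothetical helper that takes a plain dictionary such as hte.headers, and the sample below is a hand-copied subset of the headers printed above.

```python
def describe_server_error(headers):
    """Hypothetical helper: condense the error-related HTTP headers into one line."""
    code = headers.get('X-Error-Http', 'unknown')
    detail = headers.get('X-Error-Detail', 'no detail')
    info = headers.get('X-Error-Info', 'no further information')
    return "HTTP %s: %s (%s)" % (code, detail, info)

# Subset of the headers printed above.
headers = {'X-Error-Http': '400',
           'X-Error-Detail': 'Invalid input parameter',
           'X-Error-Info': "Incorrect 'lfn' parameter"}
message = describe_server_error(headers)
print(message)
```

In an except HTTPException block one could pass hte.headers to such a helper, assuming the exception carries the headers attribute as in the example above.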