
CRAB client library API


Version

  • 17 July 2014 - Alpha release. Fresh out of the oven.
  • 07 Jan 2015 - Improved alpha release. API for raw execution of commands available to the users.
  • 04 Feb 2015 - Released production version with CRAB v3.3.13

Introduction

The CRABAPI library is a python wrapper on top of the client. It lets people interact with the server through the same CRAB client interface used by the official client. It is intended for people who would like to automate the submission of tasks with CRAB. With the CRABAPI library you will never have to call the client commands and parse their outputs again!

There are two ways of using the library. One is a small python function, called crabCommand, that runs the client commands with the same arguments they accept on the command line. With it, backward compatibility is not guaranteed and error handling is left to the users. The other is an interface that adds some intelligence to the client commands: among other things, it manipulates the output of the server and guarantees backward compatibility. The latter is not available yet.

The crabCommand API

This is a thin API that allows easy access to the CRAB client commands. Users who have experience with the CRAB client command line should be able to understand quickly how to use the crabCommand API.

In order to use it you should first set up the environment (replace sh with csh as appropriate for your shell):

cmsenv
source /cvmfs/cms.cern.ch/common/crab-setup.sh

and make sure that your first import in your python script is

import CRABClient

This triggers the addition to $PYTHONPATH of dependencies like WMCore and DBS, which are not in the default CMSSW path.

Brief description

The signature of the crabCommand API is the following:

returndict crabCommand(command, *args, **kwargs)

Input arguments:

  • command: Takes the name of the CRAB command to be executed (e.g. 'submit', 'status', 'report', etc).
  • args: Positional arguments to be passed as such to the command.
  • kwargs: Keyword arguments are considered as (long-name) options to be passed to the command. The API takes care of adding a double hyphen ('--') in front of each keyword argument so that the command interprets them as options.

Returns:

  • The return value of the invoked CRAB command, which should always be a python dictionary.

The positional arguments args are supported for compatibility with the client command line, where the CRAB configuration file name and the CRAB project directory name can be passed as the first positional argument instead of using the --config and --dir options respectively. The submit command is the only one that accepts additional positional arguments; they are used for overwriting CRAB configuration parameters. However, when using the crabCommand API, CRAB configuration parameters can be overwritten before calling the API. In the end, the positional arguments args should not be needed, and for simplicity we recommend using only the keyword arguments kwargs. The recommended syntax for passing options to the command is then the following:

crabCommand('<command>', <option1>=<value1>, <option2>=<value2>, <option3>=True, ...)

Options that do not take a value in the command line should be specified giving the boolean value True in the API. If the boolean value is False, the corresponding keyword argument will be ignored by the API (i.e. the argument will not be passed as an option to the command). The API will then run the equivalent of:

crab <command> --<option1> <value1> --<option2> <value2> --<option3> ...

Note: Remember that the --config option of the submit command accepts both the name of a CRAB configuration file and a Configuration object itself.
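
The mapping from keyword arguments to command-line options described above can be pictured with the short sketch below. It is not the CRABAPI source: the helper name kwargs_to_options is hypothetical and only mimics the documented behaviour (a double hyphen is added, True becomes a bare flag, False is dropped). Keys are sorted only to make the printed result predictable.

```python
def kwargs_to_options(**kwargs):
    """Illustrative only: mimic how keyword arguments become long options.

    Hypothetical helper, not part of CRABAPI.
    """
    options = []
    # Sort the keys so that the result is deterministic in this example.
    for name in sorted(kwargs):
        value = kwargs[name]
        if value is True:
            # Options that take no value are passed as a bare flag.
            options.append('--' + name)
        elif value is False:
            # A False value means the option is not passed at all.
            continue
        else:
            # Values are turned into strings here only so they can be printed.
            options.extend(['--' + name, str(value)])
    return options


opts = kwargs_to_options(dir='crab_projects/crab_runB', long=True, sort='site')
print(' '.join(['crab', 'status'] + opts))
```

Calling the helper with long=False instead would simply leave --long out of the list.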

Brief usage examples

Example of submitting a task from the python shell

Here is a quick example of how to submit a task from the python shell. We use the CMSSW parameter-set configuration file pset_tutorial_analysis.py and the CRAB configuration file crabConfig_tutorial_MC_analysis.py (both files are available in the introductory CRAB tutorial).

>>> import CRABClient
>>> from CRABAPI.RawCommand import crabCommand
>>> res = crabCommand('submit', config = 'crabConfig_tutorial_MC_analysis.py')
Will use CRAB configuration file crabConfig_tutorial_MC_analysis.py
Importing CMSSW configuration pset_tutorial_analysis.py
Finished importing CMSSW configuration pset_tutorial_analysis.py
Sending the request to the server
Success: Your task has been delivered to the CRAB3 server.
Task name: 150122_113932_crab3test-1:mmascher_crab_tutorial_MC_analysis_test1
Please use 'crab status' to check how the submission process proceeds.
Log file is /afs/cern.ch/work/m/mmasher/wf/multicrab_crab3/crab_projects/crab_tutorial_MC_analysis_test1/crab.log
>>> print res
{'uniquerequestname': '150122_113932_crab3test-1:mmascher_crab_tutorial_MC_analysis_test1', 'requestname': 'crab_tutorial_MC_analysis_test1'}

Now we use the status command to check the status of a task:

>>> res = crabCommand('status', dir = 'crab_projects/crab_tutorial_MC_analysis_test1')
CRAB project directory:         /afs/cern.ch/work/m/mmasher/wf/multicrab_crab3/crab_projects/crab_tutorial_MC_analysis_test1
Task name:                      150122_113932_crab3test-1:mmascher_crab_tutorial_MC_analysis_test1
Grid scheduler:                 crab3test-1@submit-5.t2.ucsd.edu
Task status:                    SUBMITTED
Dashboard monitoring URL:       http://dashb-cms-job.cern.ch/dashboard/templates/task-analysis/#user=mmascher&refresh=0&table=Jobs&p=1&records=25&activemenu=2&status=&site=&tid=150122_113932_crab3test-1%3Ammascher_crab_tutorial_MC_analysis_test1

Jobs status:                    idle          100.0% (18/18)

>>> print res['jobList']
[['idle', 11], ['idle', 10], ['idle', 13], ['idle', 12], ['idle', 15], ['idle', 14], ['idle', 17], ['idle', 16], ['idle', 18], ['idle', 1], ['idle', 3], ['idle', 2], ['idle', 5], ['idle', 4], ['idle', 7], ['idle', 6], ['idle', 9], ['idle', 8]]

Note: Every command returns a python dictionary that gives the result of the command and the server output. This is the raw information returned by the server, and it can sometimes contain more information than one actually needs.
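
Since the returned dictionary is raw server output, a little post-processing is often useful. As a minimal sketch, assuming jobList keeps the [state, job id] layout printed above, the jobs can be counted per state with the standard library. The function name count_states is ours, and the sample list is a shortened copy of the output above.

```python
from collections import Counter

# A shortened copy of the 'jobList' value printed above: [state, job id] pairs.
job_list = [['idle', 11], ['idle', 10], ['idle', 13], ['idle', 12]]


def count_states(job_list):
    """Return a mapping from job state to the number of jobs in that state."""
    return Counter(state for state, _job_id in job_list)


per_state = count_states(job_list)
for state, njobs in sorted(per_state.items()):
    print("%s: %d" % (state, njobs))
```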

Example submitting multiple tasks using a python script

Suppose one wants to send the same configuration file as in the above example multiple times changing only the input dataset. One can accomplish this by adding the following lines to the crabConfig_tutorial_MC_analysis.py file and executing it via python crabConfig_tutorial_MC_analysis.py instead of crab submit crabConfig_tutorial_MC_analysis.py:


import CRABClient

[The usual crab configuration comes here]

if __name__ == '__main__':

    from CRABAPI.RawCommand import crabCommand

    for dataset in ['/DoubleMuParked/Run2012B-22Jan2013-v1/AOD', '/DoubleMuParked/Run2012C-22Jan2013-v1/AOD']:
        config.Data.inputDataset = dataset
        config.General.requestName = dataset.split('/')[2]
        crabCommand('submit', config = config)
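
To see which request names the loop above produces, the name derivation can be tried on its own, without CRAB. This is only an illustration: request_name is a hypothetical helper wrapping the same split expression used in the loop.

```python
datasets = [
    '/DoubleMuParked/Run2012B-22Jan2013-v1/AOD',
    '/DoubleMuParked/Run2012C-22Jan2013-v1/AOD',
]


def request_name(dataset):
    """Return the middle part of a '/primary/processed/tier' dataset name."""
    # '/A/B/C'.split('/') gives ['', 'A', 'B', 'C'], so index 2 is 'B'.
    return dataset.split('/')[2]


names = [request_name(d) for d in datasets]
print(names)

# Two datasets sharing the same middle part would get the same request name,
# so it is worth checking for duplicates before submitting.
if len(set(names)) != len(names):
    print("Warning: duplicate request names")
```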

crab status API description and example

Example of data returned by calling crabCommand("status")

{
    'username': 'erupeika',
    'submissionTime': 1495796779,
    'collector': 'cmsgwms-collector-global.cern.ch:9620,cmssrv221.fnal.gov:9620',
    'jobList': [
        ['finished', '1'],
        ['finished', '3'],
        ['finished', '2'],
        ...
        ['finished', '18']
    ],
    'proxiedWebDir': 'http://vocms0121.cern.ch/mon/cms1425/170526_110618:erupeika_crab_client_status_test9',
    'publication': {
        'done': 18
    },
    'jobsPerStatus': {
        'finished': 18
    },
    'publicationFailures': {},
    'taskFailureMsg': None,
    'ASOURL': 'https://cmsweb.cern.ch/crabserver/prod',
    'status': 'COMPLETED',
    'jobs': {
        '1': {
            'Retries': 0,
            'WallDurations': [9.0, 300.0],
            'StartTimes': [1495797033.0],
            'SubmitTimes': [1495796824.0],
            'JobIds': ['6071308.0'],
            'EndTimes': [1495797333.0],
            'Restarts': 0,
            'RecordedSite': True,
            'State': 'finished',
            'ResidentSetSize': [2776],
            'TotalUserCpuTimeHistory': [117],
            'SiteHistory': ['Unknown', 'T1_UK_RAL'],
            'TotalSysCpuTimeHistory': [12]
        },
        ...
    },
    'inputDataset': '/GenericTTbar/HC-CMSSW_5_3_1_START53_V5-v1/GEN-SIM-RECO',
    'schedd': 'crab3@vocms0121.cern.ch',
    'dbStatus': 'SUBMITTED',
    'taskWarningMsg': '[]',
    'outdatasets': "['/GenericTTbar/erupeika-CRAB3_tutorial_May2015_USER_analysis-37773c17ce2994cf16892d5f04945e41/USER']",
    'statusFailureMsg': '',
    'dagStatus': 'COMPLETED',
    'command': 'SUBMIT',
    'userWebDirURL': 'http://vocms0121.cern.ch/mon/cms1425/170526_110618:erupeika_crab_client_status_test9',
    'publicationEnabled': True
}
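
In the example above, taskWarningMsg and outdatasets are strings that contain a python list literal rather than real lists. A minimal sketch for turning them into lists, assuming these fields always have the form shown in the example, is given below. The helper name parse_list_field is ours, and the dictionary is an excerpt of the one above.

```python
import ast

# Excerpt of the dictionary shown above (values copied from the example).
status = {
    'taskWarningMsg': '[]',
    'outdatasets': "['/GenericTTbar/erupeika-CRAB3_tutorial_May2015_USER_analysis-37773c17ce2994cf16892d5f04945e41/USER']",
    'jobsPerStatus': {'finished': 18},
}


def parse_list_field(value):
    """Turn a string such as "['a', 'b']" into a real list; pass lists through."""
    if isinstance(value, list):
        return value
    # literal_eval only evaluates python literals, never arbitrary code.
    return ast.literal_eval(value)


out_datasets = parse_list_field(status['outdatasets'])
warnings = parse_list_field(status['taskWarningMsg'])
total_jobs = sum(status['jobsPerStatus'].values())

print("Output datasets: %s" % out_datasets)
print("Number of warnings: %d" % len(warnings))
print("Total number of jobs: %d" % total_jobs)
```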

Description of some of the less obvious keys in the return dictionary

  • status: Combines the task's status on CRABServer with its status on the grid. For example, after task submission it will show NEW or QUEUED while CRABServer handles the new task. (These are some of the statuses on the CRABServer side, which are also reported separately in the dbStatus dict key.) If the task submission to the grid succeeds, the status will become SUBMITTED. (Immediately after submission, the status may show UNKNOWN during the task bootstrapping process on the grid.) The status will continue to show SUBMITTED until the task on the grid finishes. After this point, the status will become either COMPLETED, which means that all of the jobs finished successfully, or FAILED, meaning that at least some of the jobs failed. (The task states on the grid are also reported separately in the dagStatus dict key.) This combined status value is currently not reported in the CLI; it exists to improve backwards compatibility with the old crab status implementation API. Instead, the CLI reports the dbStatus and dagStatus values separately.
  • dbStatus: Status of the task on CRABServer. This indicates CRAB's progress in handling the task before and during the submission to the grid. For example, it will show NEW or QUEUED for a new task, and SUBMITTED if the submission to the grid succeeds.
  • dagStatus: Status of the task on the grid. It will show SUBMITTED until all jobs finish running. Afterwards, it will show either COMPLETED or FAILED, depending on whether all jobs completed successfully or some of them failed.
  • taskFailureMsg: Contains a message about a problem that the CRABServer encountered during the handling of the task. For example, if the user's provided lumimask doesn't correctly match the input data and no jobs can be generated by CRABServer, this key will contain a message similar to "The CRAB3 server backend could not submit any job to the Grid scheduler: Splitting task 170529_102639:erupeika_crab_test on dataset /A/B/C with FileBased method does not generate any job".
  • statusFailureMsg: Contains a message about the reason why crabCommand("status") cannot provide all of the usual information about a task. One such reason could be "Waiting for the Grid scheduler to report back the status of your task", which means that some information about the task, like the state of each job, is unavailable at the current time.

crabCommand("status") will not throw any exception as long as it has some useful information to return. For example, even if it fails to get information from the grid scheduler for some reason, the information about the task from the CRABServer task database is still available. It will throw an HTTPException if something goes wrong with the request to CRABServer itself (task not found, for example).

Basic crabCommand("status") usage example

import CRABClient
from httplib import HTTPException
from CRABAPI.RawCommand import crabCommand

st = {}
try:
    st = crabCommand("status")
except HTTPException as ex:
    print("Problem with status encountered: %s" % ex)
    raise

if st.get("jobList"):
    print(st.get("jobList"))
else:
    print("Status incomplete")
    if st.get("statusFailureMsg"):
        print("Found reason for error: %s" % st.get("statusFailureMsg"))
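
Building on the key descriptions above, a script usually wants a single verdict for a task. The sketch below is one possible way to derive it from the returned dictionary. The function name task_outcome and the verdict labels are ours, and only the status values named in this section are handled; any other value is reported as 'other'.

```python
# Status values named in the key descriptions that mean "not finished yet".
IN_PROGRESS = ('NEW', 'QUEUED', 'SUBMITTED', 'UNKNOWN')


def task_outcome(st):
    """Classify a status dictionary returned by crabCommand("status")."""
    if st.get('statusFailureMsg'):
        # Part of the usual information could not be retrieved.
        return 'incomplete'
    state = st.get('status')
    if state == 'COMPLETED':
        return 'done'
    if state == 'FAILED':
        return 'failed'
    if state in IN_PROGRESS:
        return 'in progress'
    # A value not covered in this section: leave the decision to the caller.
    return 'other'


# Values copied from the example dictionary shown earlier.
example = {'status': 'COMPLETED', 'statusFailureMsg': '',
           'dbStatus': 'SUBMITTED', 'dagStatus': 'COMPLETED'}
print(task_outcome(example))
```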

Multicrab using the crabCommand API

Using the crabCommand API and a few lines of code it is possible to implement the so-called multicrab functionality in CRAB3. Here is an example.

First let's assume we want to submit two identical tasks on two different datasets (like the example above), but let's also change other parameters in the CRAB configuration file. What we will do is to call crabCommand('submit') twice (once for each task) passing the Configuration object as argument and of course changing the configuration parameters accordingly before each call. For example, one can add the following lines at the end of the CRAB configuration file:

import CRABClient

[The usual crab configuration comes here]

if __name__ == '__main__':

    from CRABAPI.RawCommand import crabCommand
    from CRABClient.ClientExceptions import ClientException
    from httplib import HTTPException

    # We want to put all the CRAB project directories from the tasks we submit here into one common directory.
    # That's why we need to set this parameter (here or above in the configuration file, it does not matter, we will not overwrite it).
    config.General.workArea = 'crab_projects'

    def submit(config):
        try:
            crabCommand('submit', config = config)
        except HTTPException as hte:
            print("Failed submitting task: %s" % (hte.headers))
        except ClientException as cle:
            print("Failed submitting task: %s" % (cle))

    #############################################################################################
    ## From now on that's what users should modify: this is the a-la-CRAB2 configuration part. ##
    #############################################################################################

    config.General.requestName = 'runB'
    config.Data.inputDataset = '/DoubleMuParked/Run2012B-22Jan2013-v1/AOD'
    config.Data.unitsPerJob = 2
    config.Data.totalUnits = 4
    submit(config)

    config.General.requestName = 'runC'
    config.Data.inputDataset = '/DoubleMuParked/Run2012C-22Jan2013-v1/AOD'
    config.Data.unitsPerJob = 3
    config.Data.totalUnits = 8
    submit(config)

    # etc ...

Then execute the configuration file:

python crabConfig.py

Importing CMSSW configuration pset_tutorial_analysis.py
Finished importing CMSSW configuration pset_tutorial_analysis.py
Sending the request to the server
Success: Your task has been delivered to the CRAB3 server.
Task name: 150112_153111_crab3test-1:mmascher_crab_runB
Please use 'crab status' to check how the submission process proceeds.
Log file is /afs/cern.ch/work/m/mmascher/wf/multicrab_crab3/crab_projects/crab_runB/crab.log

Importing CMSSW configuration pset_tutorial_analysis.py
Finished importing CMSSW configuration pset_tutorial_analysis.py
Sending the request to the server
Success: Your task has been delivered to the CRAB3 server.
Task name: 150112_153116_crab3test-4:mmascher_crab_runC
Please use 'crab status' to check how the submission process proceeds.
Log file is /afs/cern.ch/work/m/mmascher/wf/multicrab_crab3/crab_projects/crab_runC/crab.log

Warning: If you use (and change) the CRAB configuration parameter JobType.pyCfgParams when submitting multiple tasks, please check this FAQ first.

To run other CRAB commands on all the tasks in a work area, you can create a file called multicrab with the following content and make it executable with chmod 744 multicrab:

#!/usr/bin/env python
"""
This is a small script that does the equivalent of multicrab.
"""
import os
from optparse import OptionParser

import CRABClient
from CRABAPI.RawCommand import crabCommand
from CRABClient.ClientExceptions import ClientException
from httplib import HTTPException


def getOptions():
    """
    Parse and return the arguments provided by the user.
    """
    usage = ("Usage: %prog --crabCmd CMD [--workArea WAD --crabCmdOpts OPTS]"
             "\nThe multicrab command executes 'crab CMD OPTS' for each project directory contained in WAD"
             "\nUse multicrab -h for help")

    parser = OptionParser(usage=usage)

    parser.add_option('-c', '--crabCmd',
                      dest = 'crabCmd',
                      default = '',
                      help = "The crab command you want to execute for each task in WAD",
                      metavar = 'CMD')

    parser.add_option('-w', '--workArea',
                      dest = 'workArea',
                      default = '',
                      help = "work area directory (only if CMD != 'submit')",
                      metavar = 'WAD')

    parser.add_option('-o', '--crabCmdOpts',
                      dest = 'crabCmdOpts',
                      default = '',
                      help = "options for crab command CMD",
                      metavar = 'OPTS')

    (options, arguments) = parser.parse_args()

    if arguments:
        parser.error("Found positional argument(s): %s." % (arguments))
    if not options.crabCmd:
        parser.error("(-c CMD, --crabCmd=CMD) option not provided.")
    if options.crabCmd != 'submit':
        if not options.workArea:
            parser.error("(-w WAD, --workArea=WAD) option not provided.")
        if not os.path.isdir(options.workArea):
            parser.error("'%s' is not a valid directory." % (options.workArea))

    return options


def main():
    """
    Main
    """
    options = getOptions()

    # If you want crabCommand to be quiet:
    #from CRABClient.UserUtilities import setConsoleLogLevel
    #from CRABClient.ClientUtilities import LOGLEVEL_MUTE
    #setConsoleLogLevel(LOGLEVEL_MUTE)
    # With this function you can change the console log level at any time.

    # To retrieve the current crabCommand console log level:
    #from CRABClient.UserUtilities import getConsoleLogLevel
    #crabConsoleLogLevel = getConsoleLogLevel()

    # If you want to retrieve the CRAB loggers:
    #from CRABClient.UserUtilities import getLoggers
    #crabLoggers = getLoggers()

    # Execute the command with its arguments for each directory inside the work area.
    for dir in os.listdir(options.workArea):
        projDir = os.path.join(options.workArea, dir)
        if not os.path.isdir(projDir):
            continue
        # Execute the crab command.
        msg = "Executing (the equivalent of): crab %s --dir %s %s" % (options.crabCmd, projDir, options.crabCmdOpts)
        print("-"*len(msg))
        print(msg)
        print("-"*len(msg))
        try:
            crabCommand(options.crabCmd, dir = projDir, *options.crabCmdOpts.split())
        except HTTPException as hte:
            print("Failed executing command %s for task %s: %s" % (options.crabCmd, projDir, hte.headers))
        except ClientException as cle:
            print("Failed executing command %s for task %s: %s" % (options.crabCmd, projDir, cle))


if __name__ == '__main__':
    main()

and use it to execute commands a-la-crab2:

./multicrab -c status -w crab_projects/

Executing (the equivalent of): crab status --dir crab_projects/crab_runB
CRAB project directory:         /afs/cern.ch/work/m/mmasher/wf/multicrab_crab3/crab_projects/crab_runB
Task name:                      150122_153747_crab3test-5:mmascher_crab_runB
Grid scheduler:                 crab3test-5@vocms0114.cern.ch
Task status:                    SUBMITTED
Dashboard monitoring URL:       http://dashb-cms-job.cern.ch/dashboard/templates/task-analysis/#user=mmascher&refresh=0&table=Jobs&p=1&records=25&activemenu=2&status=&site=&tid=150122_153747_crab3test-5%3Ammascher_crab_runB

Jobs status:                    running        50.0% (1/2)
                                transferring   50.0% (1/2)

Executing (the equivalent of): crab status --dir crab_projects/crab_runC
CRAB project directory:         /afs/cern.ch/work/m/mmasher/wf/multicrab_crab3/crab_projects/crab_runC
Task name:                      150122_153758_crab3test-5:mmascher_crab_runC
Grid scheduler:                 crab3test-5@vocms0114.cern.ch
Task status:                    SUBMITTED
Dashboard monitoring URL:       http://dashb-cms-job.cern.ch/dashboard/templates/task-analysis/#user=mmascher&refresh=0&table=Jobs&p=1&records=25&activemenu=2&status=&site=&tid=150122_153758_crab3test-5%3Ammascher_crab_runC

Jobs status:                    running        66.7% (2/3)
                                transferring   33.3% (1/3)

Other examples of usage of the multicrab script:

./multicrab -c status -w crab_projects/ -o "--long --sort=site"
./multicrab -c report -w crab_projects/ -o "--dbs=yes"
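
The multicrab script forwards the string given with -o by splitting it on whitespace. The snippet below shows what that produces for the first example, and why an option value containing a space would need a different splitter such as shlex.split. The --example option is purely hypothetical and is not a CRAB option.

```python
import shlex

opts = "--long --sort=site"
# What the multicrab script above forwards as positional arguments.
forwarded = opts.split()
print(forwarded)

# Hypothetical option whose value contains a space ('--example' is not a CRAB option).
quoted = "--long --example='two words'"
naive = quoted.split()        # breaks the quoted value into two pieces
careful = shlex.split(quoted)  # honours shell-style quoting
print(naive)
print(careful)
```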

Debugging errors with crabCommand

With the crabCommand API, all error handling is up to the user. This is because the API executes the CRAB commands directly, without going through the crab script, so the main exception handling present in crab is skipped. The best way of dealing with an error is to execute the same command in the terminal. For example, if one does not change the Data.outLFNDirBase parameter in crabConfig_tutorial_MC_analysis.py when executing the example above, one will get the following stack trace:

>>> import CRABClient
>>> from CRABAPI.RawCommand import crabCommand
>>> res = crabCommand('submit', config = 'crabConfig_tutorial_MC_analysis.py')
Will use configuration file crabConfig_tutorial_MC_analysis.py
Importing CMSSW configuration pset_tutorial_analysis.py
Finished importing CMSSW configuration pset_tutorial_analysis.py
Sending the request to the server
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/afs/cern.ch/user/m/mmascher/repos/CRABClient/src/python/CRABAPI/RawCommand.py", line 22, in crabCommand
    return execRaw(command, arguments)
  File "/afs/cern.ch/user/m/mmascher/repos/CRABClient/src/python/CRABAPI/RawCommand.py", line 42, in execRaw
    res = cmdobj()
  File "/afs/cern.ch/user/m/mmascher/repos/CRABClient/src/python/CRABClient/Commands/submit.py", line 135, in __call__
    dictresult, status, reason = server.put( self.uri, data = configreq_encoded)
  File "/cvmfs/cms.cern.ch/crab3/slc6_amd64_gcc481/cms/crabclient/3.3.13.rc2/lib/python2.6/site-packages/RESTInteractions.py", line 75, in put
    return self.makeRequest(uri = uri, data = data, verb = 'PUT')
  File "/cvmfs/cms.cern.ch/crab3/slc6_amd64_gcc481/cms/crabclient/3.3.13.rc2/lib/python2.6/site-packages/RESTInteractions.py", line 113, in makeRequest
    capath=caCertPath)#, verbose=True)# for debug
  File "/afs/cern.ch/user/m/mmascher/repos/WMCore/src/python/WMCore/Services/pycurl_manager.py", line 164, in request
    raise exc
httplib.HTTPException
>>>

Of course this does not tell much about the error, but if one executes

crab submit crabConfig_tutorial_MC_analysis.py

one would get

Will use configuration file crabConfig_tutorial_MC_analysis.py
Importing CMSSW configuration pset_tutorial_analysis.py
Finished importing CMSSW configuration pset_tutorial_analysis.py
Sending the request to the server
Error contacting the server.
Server answered with: Invalid input parameter
Reason is: Incorrect 'Data.outLFNDirBase' parameter
Log file is /afs/cern.ch/user/m/mmascher/wf/multicrab_crab3/crab_projects/crab_tutorial_MC_analysis_test1/crab.log

There are basically two classes of exceptions one should worry about: ClientException, which is raised by the client, and HTTPException, which is raised when the server answers with an error. One can catch them in the code and take the appropriate actions. More information about server errors is available in the HTTP headers of the server answer:

>>> from httplib import HTTPException
>>> import CRABClient
>>> from CRABAPI.RawCommand import crabCommand
>>> try:
...     res = crabCommand('submit', config = 'crabConfig_tutorial_MC_analysis.py')
... except HTTPException as hte:
...     print(hte.headers)
...
Will use configuration file crabConfig_tutorial_MC_analysis.py
Importing CMSSW configuration pset_tutorial_analysis.py
Finished importing CMSSW configuration pset_tutorial_analysis.py
Sending the request to the server
{'Content-Length': '714', 'X-Error-Http': '400', 'X-Rest-Time': '565996.885 us', 'Server': 'CherryPy/3.2.2', 'Connection': 'close', 'X-Error-Detail': 'Invalid input parameter', 'CMS-Server-Time': 'D=570010 t=1421926577776648', 'X-Rest-Status': '302', 'Date': 'Thu, 22 Jan 2015 11:36:17 GMT', 'Content-Type': 'text/html;charset=utf-8', 'X-Error-Info': "Incorrect 'lfn' parameter", 'X-Error-Id': '531e2d062351f01d051e86d87e8c0a21'}
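
The headers printed above can be condensed into a short message. The sketch below assumes that the headers behave like the plain dictionary shown in the output and that the X-Error-* keys are present as in this example. The function name describe_http_error is ours, and the sample dictionary is an excerpt of the headers above.

```python
def describe_http_error(headers):
    """Build a one-line message from the error headers of a server answer."""
    # Fall back to placeholders in case a header is missing.
    code = headers.get('X-Error-Http', 'unknown')
    detail = headers.get('X-Error-Detail', 'no detail')
    info = headers.get('X-Error-Info', 'no further information')
    return "HTTP %s: %s (%s)" % (code, detail, info)


# Excerpt of the headers printed above.
headers = {
    'X-Error-Http': '400',
    'X-Error-Detail': 'Invalid input parameter',
    'X-Error-Info': "Incorrect 'lfn' parameter",
}
message = describe_http_error(headers)
print(message)
```

In a real script one would call describe_http_error(hte.headers) inside the except HTTPException block.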
Topic revision: r35 - 2020-05-14 - StefanoBelforte
 