MultiCRAB usage

This document describes how to use MultiCRAB, a CRAB extension to submit the same job to multiple datasets in one go.

Prerequisites

  • CRAB version: newer than 2_3_2
  • Since multicrab allows you to submit a large number of tasks, each with possibly many jobs, it is a good idea to use CrabOperations. See the relevant documentation.
Hint: it is enough to add

use_server=1

in the [CRAB] section of crab.cfg, or

CRAB.use_server=1

in the [COMMON] section of multicrab.cfg.

General

The use case for multicrab is analysis code that you want to run on several datasets, typically some signals plus some backgrounds (for MC studies), or on different streams, configurations, or runs for real data taking. You want to run exactly the same code, and the crab.cfg files differ only in a few keys: certainly datasetpath, but possibly others too, such as total_number_of_events if you want to run on all of the signal but only a fraction of the background. Without multicrab, you would have to create a set of crab.cfg files, one for each dataset you want to access, and submit several instances of CRAB, saving the output to different locations.

Multicrab is meant to automate this procedure: you create one crab.cfg to be used as a template for all tasks, then you create an additional configuration file, multicrab.cfg, where you define what changes between the tasks, and then you drive all the tasks with a single command each time: multicrab -create, then -submit, -status, -get, etc.

Multicrab simply reads the template, modifies it for each task according to your multicrab.cfg, and then starts N crab sessions, one for each dataset you are accessing. It takes care of changing the name of the working directory, the remote StoragePath, and so on. If you want to, you can also access any of the tasks via crab in the usual way, just by specifying -c ui_working_dir.

Configuration

In addition to the usual crab.cfg, there is a new configuration file called multicrab.cfg. The syntax is very similar to that of crab.cfg, namely

[SECTION]
<crab.cfg Section>.Key=Value

Please note that it is mandatory to explicitly put the crab.cfg section name in front of the key, separated by a ".".

The role of multicrab.cfg is to apply modifications to the template crab.cfg, some of which are common to all tasks and some of which are task-specific. So there are two kinds of sections:

  • [COMMON], which applies to all tasks and is fully equivalent to modifying the template crab.cfg directly
  • [DATASET]: there can be an arbitrary number of these sections, one for each dataset you want to run on. The names are free (except COMMON and MULTICRAB), and they determine the names of the output directories, as described below. Warning: if you stage out your output to castor, where you need to create the output directory by hand with the proper permissions, you also need to create all the subdirectories, one for each section, with the proper ACL.
WARNING: parameters in the [CRAB] section of your crab.cfg template can't be changed by multicrab.cfg.
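
The rule that every key must carry its crab.cfg section as a prefix is easy to get wrong, so it can help to check a multicrab.cfg before running multicrab -create. The following is only a minimal sketch of such a check, written with Python's standard configparser; the function name check_multicrab is hypothetical and this is not part of multicrab itself.

```python
import configparser


def check_multicrab(text):
    """Return a list of problems found in the text of a multicrab.cfg.

    Hypothetical helper: it only checks that keys outside [MULTICRAB]
    have the form <crab.cfg section>.key.
    """
    # interpolation=None keeps any '%' in values literal
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep the case of keys such as CMSSW.datasetpath
    parser.read_string(text)

    problems = []
    for section in parser.sections():
        if section == "MULTICRAB":
            # [MULTICRAB] holds multicrab's own keys, which are not dotted
            continue
        for key in parser[section]:
            prefix, dot, name = key.partition(".")
            if not (prefix and dot and name):
                problems.append(
                    "[%s] %s: expected <crab.cfg section>.key" % (section, key)
                )
    return problems


if __name__ == "__main__":
    sample = "[COMMON]\nUSER.ui_working_dir = /tmp/x\n[Wmunu]\nnumber_of_jobs = 5\n"
    for problem in check_multicrab(sample):
        print(problem)
```

An empty list means that every key in [COMMON] and in the dataset sections has the dotted form; the check says nothing about whether the section or key actually exists in crab.cfg.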

Here is an example of multicrab.cfg

# section for multicrab: now has just the template crab.cfg, but more
# keys might appear in the future
[MULTICRAB]
cfg=crab.cfg

# Section [COMMON] is common to all datasets.
# General idea: you define all the parameters in the template (crab.cfg),
# but you might want to change the template values for all datasets.
# The general syntax is that you first put the crab.cfg [SECTION] and
# then the crab.cfg [key], with a "." in between, exactly as you would do
# to pass keys to CRAB via the command line.

[COMMON]

# This determines the directory where the CRAB log files and CMSSW output files will go.
# It will be USER.ui_working_dir/section_name/
# where section_name is the corresponding  section "[xyz]" that you give below.
USER.ui_working_dir = /scratch/myname/craboutdir

# This determines both the location in dcache and the published name in DBS. 
# The former will be of the form /input_dataset_name/USER.publish_data_name/.../
# The latter will be of the form /input_dataset_name/myname-USER.publish_data_name-.../USER
USER.publish_data_name = aGoodName

# Below we have a section for each dataset you want to access (or, more precisely,
# for any task you want to create).
# The name of the section will be used as USER.ui_working_dir, so the
# stuff for this dataset will be found in the Wmunu/ directory.
# Any name is allowed (except MULTICRAB and COMMON) and any number of
# sections can be added.
# The syntax for the parameters is the one described before,
# SECTION.key=value
# and any parameter can be changed. Otherwise, the template value will be
# used.
[Wmunu]
CMSSW.datasetpath=/Zmumu/CSA08_CSA08_S156_v1/GEN-SIM-RECO
CMSSW.total_number_of_events=10
CMSSW.number_of_jobs = 5

[Zmunu]
CMSSW.datasetpath=/Zmumu/CSA08_CSA08_S156_v1/GEN-SIM-RECO
CMSSW.total_number_of_events=-1
CMSSW.number_of_jobs = 5

Running

Use multicrab as you would use crab. For example:

multicrab -create

multicrab -submit

multicrab -status

multicrab -get

etc. All crab commands are supported.

As said before, you might want to create all the tasks but then submit just one of them. In this case, you would do the following (using the above multicrab.cfg):

multicrab -create

crab -submit -c Wmunu

crab -status -c Wmunu

etc...
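
If you want to act on a few selected tasks rather than on one or on all of them, the per-task crab call can be wrapped in a small script. The sketch below is only an illustration: the helper names crab_commands and run_on_tasks are hypothetical, and the only command form it relies on is crab -<action> -c <task directory> as shown above.

```python
import subprocess


def crab_commands(action, task_dirs):
    """Build one 'crab -<action> -c <task_dir>' command per task directory."""
    return [["crab", "-" + action, "-c", task_dir] for task_dir in task_dirs]


def run_on_tasks(action, task_dirs, dry_run=True):
    """Print the commands (dry run) or execute them one after another."""
    for argv in crab_commands(action, task_dirs):
        if dry_run:
            print(" ".join(argv))
        else:
            subprocess.call(argv)  # requires crab to be available in PATH


if __name__ == "__main__":
    # Dry run: only shows what would be executed for the two example tasks.
    run_on_tasks("status", ["Wmunu", "Zmunu"])
```

The dry run is the default so that nothing is submitted by accident; pass dry_run=False only in an environment where crab is set up.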

-- StefanoLacaprara - 02 Dec 2008

Topic revision: r10 - 2013-03-19 - MaksatHaytmyradov
 