
Running CMSSW code on the Grid using CRAB


Recipe for the tutorial

For this tutorial we will refer to:

  • CMSSW_2_0_7

we will use already-prepared CMSSW analysis code to analyze the /MinBias/CSA08_CSA08_S43_v1/GEN-SIM-RECO sample, which replicates a realistic analysis scenario.

  • CRAB_2_2_2

using the central installation available at CERN

The examples are written for the csh shell family.

If you want to use sh, replace csh with sh.

Setup local Environment and prepare user analysis code

In order to submit jobs to the Grid, you must have access to an LCG User Interface (LCG UI). It allows you to access WLCG-affiliated resources in a fully transparent way. LXPLUS users can get an LCG UI via AFS by:

source /afs/

Install a CMSSW project in a directory of your choice. In this case we create a "Tutorial" directory:

mkdir Tutorial
cd Tutorial
scramv1 project CMSSW CMSSW_2_0_7
cd CMSSW_2_0_7/src/
eval `scramv1 runtime -csh`

Get a real user analysis code from CVS and build it:

cvs co -r V00-02-02 QCDAnalysis/UEAnalysis
cd QCDAnalysis/UEAnalysis/src
scramv1 b

CRAB setup

Setup on lxplus:

In order to set up and use CRAB from any directory, source the script crab.(c)sh located in /afs/, which always points to the latest version of CRAB. After sourcing the script, CRAB can be used from any directory (typically your CMSSW working directory).

source /afs/

Locate the dataset and prepare CRAB submission

In order to run our analysis over a whole dataset, we first have to find the dataset name and then put it in the CRAB configuration file.

Data selection

To select the data you want to access, use the DBS Data Discovery web page, where the available datasets are listed (see links on the CRAB home page). For this tutorial we will use:

/MinBias/CSA08_CSA08_S43_v1/GEN-SIM-RECO

CRAB configuration

Modify the CRAB configuration file crab.cfg according to your needs: a fully documented template is available at $CRABDIR/python/crab.cfg . For guidance, see the list and description of configuration parameters. For this tutorial, the only relevant sections of the file are [CRAB], [CMSSW] and [USER] .

The configuration file should be located in the same directory as the CMSSW parameter set to be used by CRAB. Please change directory to:

cd ../test/

and save the CRAB configuration file crab.cfg with the following content:
[CRAB]
jobtype = cmssw
scheduler = glite
#server_name = cnaf

[CMSSW]
datasetpath = /MinBias/CSA08_CSA08_S43_v1/GEN-SIM-RECO
pset = ueAnalysisRootFileChainOnlyReco.cfg
total_number_of_events = 100
number_of_jobs = 10

[USER]
return_data = 1
#ui_working_dir =
#thresholdLevel = 100
#eMail = your_email_address

[EDG]
rb = CERN
proxy_server =

Run Crab

Once your crab.cfg is ready and the whole underlying environment is set up, you can start running CRAB. CRAB provides command-line help, which can be useful the first time. You can get it via:
crab -h
In particular, there is a *HOW TO RUN CRAB FOR THE IMPATIENT USER* section where the basic commands are listed.

Job Creation

The job creation checks the availability of the selected dataset and prepares all the jobs for submission, according to the job splitting specified in crab.cfg.
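As a rough illustration of the splitting arithmetic, here is a hypothetical Python sketch (not CRAB's actual code) of how total_number_of_events could be divided among number_of_jobs:

```python
# Hypothetical sketch of event-based job splitting; NOT CRAB's real algorithm.
# Given total_number_of_events and number_of_jobs from crab.cfg,
# distribute the events as evenly as possible across the jobs.

def split_events(total_events, n_jobs):
    """Return a list of per-job event counts summing to total_events."""
    base, extra = divmod(total_events, n_jobs)
    # the first `extra` jobs get one event more than the rest
    return [base + 1 if i < extra else base for i in range(n_jobs)]

counts = split_events(100, 10)  # the values used in this tutorial's crab.cfg
print(counts)                   # 10 events per job
```

With the tutorial's values (100 events, 10 jobs), each job processes 10 events.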

The creation process creates a CRAB project directory (its default name begins with crab_0_) in the current working directory.

CRAB allows the user to choose the project name, so that it can be used later to distinguish multiple CRAB projects in the same directory.
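For example, to give the project an explicit name you could set ui_working_dir in the [USER] section of crab.cfg (the directory name here is just illustrative):

```
[USER]
ui_working_dir = TutorialProject
```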

crab -create

which should produce screen output summarizing the created jobs.

Job Submission

With the submission command it is possible to specify a comma-separated combination of jobs and job ranges (e.g. 1,2,3-4); the default is all. To submit all jobs of the most recently created project with the default name, it is enough to execute the following command:

crab -submit 
to submit a specific project:
crab -submit -c  <dir name>
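The job-range syntax accepted by the submission command can be illustrated with a small hypothetical parser (this is not CRAB source code, just a sketch of the notation):

```python
# Hypothetical helper illustrating the job-range syntax accepted by
# `crab -submit` (e.g. "1,2,3-4"); this is NOT CRAB's own implementation.

def parse_job_ranges(spec):
    """Expand a spec like '1,2,3-4' into a sorted list of job numbers."""
    jobs = set()
    for part in spec.split(","):
        if "-" in part:
            lo, hi = part.split("-")
            jobs.update(range(int(lo), int(hi) + 1))
        else:
            jobs.add(int(part))
    return sorted(jobs)

print(parse_job_ranges("1,2,3-4"))  # [1, 2, 3, 4]
```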

which should produce screen output reporting the submission result for each job.

Job Status Check

Check the status of the jobs in the latest CRAB project with the following command:
crab -status 
to check a specific project:
crab -status -c  <dir name>

which should produce screen output listing the current status of each job.

Job Output Retrieval

For the jobs which are in the "Done" state it is possible to retrieve the output files. The following command retrieves the output of all "Done" jobs of the last created CRAB project:
crab -getoutput 
to get the output of a specific project:
crab -getoutput -c  <dir name>

The job results will be copied into the res subdirectory of your CRAB project directory.


Using the server

In parallel with the execution of the first 10 jobs, we can use the CRAB server mode by adding the following to the [CRAB] section of crab.cfg:

server_name = cnaf

We also want an e-mail notification when our jobs are done, so we don't have to keep checking the status. Put these two lines in the [USER] section:

thresholdLevel          = 100
eMail                   =
You can replace 100 with a lower "percent done" value to receive the e-mail earlier.

Then we can repeat the creation, submission, status check and output retrieval steps described above under Run Crab.

Publish your result in DBS

Modify the crab.cfg

You have to modify the following entries:
copy_data = 1
storage_element = "the storage element name"
storage_path = "the storage element path up to ..../user"

publish_data_name = "data name to publish"  (e.g. myprocessingCMSSW_2_0_4)
dbs_url_for_publication = "your local DBS URL"  (e.g.

Examples for the LNL or Pisa storage elements:

storage_element =
storage_path = /srm/managerv1?SFN=/pnfs/
storage_element =
storage_path = /srm/managerv1?SFN=/pnfs/

Use the -publish option

You need first to:

  • create and submit all jobs
  • retrieve all the outputs
then you can issue:
   crab -publish
It will look for all the FrameworkJobReports ( /res/crab_fjr_*.xml ) produced by each job and extract from them the information to publish (e.g. number of events, LFNs, ...).
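To illustrate the kind of extraction -publish performs, here is a hedged Python sketch that parses a deliberately simplified, made-up FJR-like XML snippet; the real crab_fjr_*.xml layout is richer and may differ:

```python
# Sketch of extracting publication info from a FrameworkJobReport.
# The XML layout below is a simplified ASSUMPTION for illustration only;
# real crab_fjr_*.xml files produced by CMSSW/CRAB contain much more.
import xml.etree.ElementTree as ET

sample_fjr = """
<FrameworkJobReport>
  <File>
    <LFN>/store/user/test/output_1.root</LFN>
    <TotalEvents>10</TotalEvents>
  </File>
</FrameworkJobReport>
"""

def extract_file_info(fjr_xml):
    """Return (lfn, n_events) pairs for each output file in the report."""
    root = ET.fromstring(fjr_xml)
    info = []
    for f in root.findall("File"):
        lfn = f.findtext("LFN")
        events = int(f.findtext("TotalEvents", "0"))
        info.append((lfn, events))
    return info

print(extract_file_info(sample_fjr))
```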

Check the result of data publication

Note that:
  • CRAB does not publish files with 0 events or files with copy problems
  • Only the publication of a single file per job is supported: if a job produces multiple output files, only the first one is published

To check whether your data have been published, you can use the script located in the python directory of CRAB:

./ --DBSURL=<dbs_url_for_publication> --datasetPath=<name_of_your_dataset>
where <dbs_url_for_publication> is the dbs_url you have written in the crab.cfg file, and <name_of_your_dataset> is the name of the dataset published by CRAB, in the form <primarydataset>/<publish_data_name>/USER.

Publication output example

[lxplus240] ~/scratch0/TEST_RELEASE/TEST_2_0_4 > crab -publish
crab. crab (version 2.0.4) running on Tue Dec 11 18:28:00 2007

crab. Working options:
  scheduler           glite
  job type            CMSSW
  working directory   /afs/

crab. <dbs_url_for_publication> =
crab. --->>> Start dataset publication
crab. PrimaryDataset = pythia_2_0_4
crab. ProcessedDataset = pythia_2_0_4
crab. <User Dataset Name> = /pythia_2_0_4/pythia_2_0_4/USER
crab. --->>> End dataset publication
crab. --->>> Start files publication
crab. file = /afs/
crab. Blocks = ['/pythia_2_0_4/pythia_2_0_4/USER#fe335055-3841-41ab-b887-cb3b289ce8b0']
crab. file = /afs/
crab. Blocks = ['/pythia_2_0_4/pythia_2_0_4/USER#fe335055-3841-41ab-b887-cb3b289ce8b0']
crab. file = /afs/
crab. Blocks = ['/pythia_2_0_4/pythia_2_0_4/USER#fe335055-3841-41ab-b887-cb3b289ce8b0']
crab. file = /afs/
crab. Blocks = ['/pythia_2_0_4/pythia_2_0_4/USER#fe335055-3841-41ab-b887-cb3b289ce8b0']
crab. file = /afs/
crab. Blocks = ['/pythia_2_0_4/pythia_2_0_4/USER#fe335055-3841-41ab-b887-cb3b289ce8b0']
crab. file = /afs/
crab. Blocks = ['/pythia_2_0_4/pythia_2_0_4/USER#fe335055-3841-41ab-b887-cb3b289ce8b0']
crab. file = /afs/
crab. Blocks = ['/pythia_2_0_4/pythia_2_0_4/USER#fe335055-3841-41ab-b887-cb3b289ce8b0']
crab. file = /afs/
crab. Blocks = ['/pythia_2_0_4/pythia_2_0_4/USER#fe335055-3841-41ab-b887-cb3b289ce8b0']
crab. --->>> End files publication
crab. --->>> To check data publication please use: --DBSURL=<dbs_url_for_publication> --datasetPath=/<User Dataset Name>
crab. Log-file is /afs/

Writing out CMSSW ROOT files with CRAB


There is a limit of 50 MB on the size of the output sandbox, imposed by Grid policy. If your output exceeds this limit it will be truncated and you will lose it. To avoid this problem it is recommended to transfer the output directly to a Storage Element (SE). There is also a purge policy for sandboxes older than 7 days: files are removed from the WMS after this time.

Topic revision: r8 - 2008-06-12 - MattiaCinquilli