Running CMSSW code on the Grid using CRAB

Introduction

CRAB is a Python program intended to simplify the creation and submission of CMS analysis jobs in a grid environment. It runs CMSSW applications, including user-specific analysis code.

CRAB is documented more fully at the CRAB home page. Be sure to read the README file (version-specific link available under How To...).

In order to submit jobs to the grid, you must be on an LCG User Interface (LCG UI). It allows you to access WLCG-affiliated resources in a fully transparent way. LXPLUS users can set up an LCG UI by sourcing the file /afs/cern.ch/cms/LCG/LCG-2/UI/cms_ui_env.csh.

The current instructions are based on CRAB version 2_2_2, which has both Standalone and Server support.

CRAB Usage Overview

Here we list the steps, then provide details in the following sections:

  • Prepare, compile and test your code in an interactive session.
  • Source crab.csh or crab.sh from the location where CRAB has been installed (this sets $CRABDIR; see Setup CRAB below)
  • Select the data you want to analyze
  • Modify crab.cfg according to your needs (this is the CRAB configuration file)
  • Run CRAB (read on!)

To illustrate how to run CRAB, we've copied and adapted some command information from HOW TO RUN CRAB FOR THE IMPATIENT USER. Once your crab.cfg is ready, run this sequence of commands (comments are inline, shown for running two jobs):

crab -create             # Create all jobs. No submission!
crab -submit 2 -continue [ui_working_dir]
                         # Submit 2 jobs, the ones already created (-continue)
crab -status             # Check the status of all jobs
... coffee ...
crab -status             # Check the status again
crab -getoutput          # Get back the output of all jobs

For further information, run crab -h.
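
The command sequence above can also be assembled programmatically, for example to review or log it before running anything. The following is a minimal sketch and not part of CRAB: the helper name crab_workflow is hypothetical, and it uses only the options shown above (-create, -submit with a job count and -continue, -status, -getoutput). It builds the command lines without executing them.

```python
def crab_workflow(n_jobs=None, working_dir=None):
    """Return the CRAB command lines for one create/submit/status/getoutput cycle.

    Hypothetical helper: it only builds argument lists and runs nothing.
    """
    submit = ["crab", "-submit"]
    if n_jobs is not None:
        # Number of jobs to submit, as in "crab -submit 2"
        submit.append(str(n_jobs))
    if working_dir is not None:
        # Continue with an already created project directory
        submit += ["-continue", working_dir]
    return [
        ["crab", "-create"],
        submit,
        ["crab", "-status"],
        ["crab", "-getoutput"],
    ]


if __name__ == "__main__":
    # "my_project" is a placeholder for your ui_working_dir
    for argv in crab_workflow(n_jobs=2, working_dir="my_project"):
        print(" ".join(argv))
```

Once the environment is set up, each list could be passed to subprocess.run; the status step would normally be repeated until the jobs are done.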

Recipe for the tutorial

For this tutorial we will refer to:

  • CMSSW_1_6_0

We will use already prepared CMSSW analysis code to analyze a Higgs->ZZ->4mu sample, which replicates a real analysis scenario.

  • CRAB_2_2_2

using the central installation available at CERN

The examples are written for the csh shell family. If you use sh instead, replace csh with sh.

Setup local Environment and prepare user analysis code

In order to submit jobs to the Grid, you must have access to an LCG User Interface (LCG UI). It allows you to access WLCG-affiliated resources in a fully transparent way. LXPLUS users can get an LCG UI via AFS by:

source /afs/cern.ch/cms/LCG/LCG-2/UI/cms_ui_env.csh

Install CMSSW project in a directory of your choice:

mkdir Tutorial
cd Tutorial
scramv1 p CMSSW CMSSW_1_6_0
cd CMSSW_1_6_0/src
eval `scramv1 runtime -csh`

Get and compile the example user analysis code:

wget  http://cmsdoc.cern.ch/cms/ccs/wm/www/Crab/Demo.tgz 
tar zxvf Demo.tgz
scramv1 b

Install CRAB

Most users (particularly those on LXPLUS) do not need to install CRAB. They only need to set it up.

CRAB is intended to be installed in a private area for use by a single person, or in a common area for use by all system users. A public installation is available on CERN's LXPLUS. At CERN on LXPLUS, users may access CRAB at (shown for arbitrary version X_Y_Z):

/afs/cern.ch/cms/ccs/wm/scripts/Crab/CRAB_X_Y_Z

To find out the latest release, check the CRAB web page or the relevant HyperNews forum.

The installation is needed only once and, even if it's in a private area, one installation is enough for a site, provided permissions on filesystem are properly set. On the other hand, new versions are produced rather often, so you may find yourself reinstalling frequently!

The installation is very simple. You need only to download a tar-ball and run a script. First change to the directory under which you want the CRAB directory to go. Make sure there is at least 9MB of free space.

To create a CRAB installation area and install the package, run the following command (shown for CRAB version 2_2_2):

tar -xvzf /afs/cern.ch/cms/ccs/wm/scripts/Crab/CRAB_2_2_2.tgz

Change (cd) to the new directory (which will be referred to as $CRABDIR), and run the configure script:

./configure
This creates the crab.sh(csh) files.

CRAB setup

Setup on lxplus:

To set up and use CRAB from any directory, source the script crab.(c)sh located in /afs/cern.ch/cms/ccs/wm/scripts/Crab/, which always points to the latest version of CRAB. After sourcing the script, you can use CRAB from any directory (typically your CMSSW working directory).

source /afs/cern.ch/cms/ccs/wm/scripts/Crab/crab.csh

Data selection

To select the data you want to access, use the DBS Data Discovery web page, where the available datasets are listed (see the links on the CRAB home page). For this tutorial we'll use:

/RelValHiggs-ZZ-4L/CMSSW_1_6_0-RelVal-1188844800/GEN-SIM-DIGI-RECO

CRAB configuration

Modify the CRAB configuration file crab.cfg according to your needs: a fully documented template is available at $CRABDIR/python/crab.cfg. For guidance, see the list and description of configuration parameters. For this tutorial, the only relevant sections of the file are [CRAB], [CMSSW] and [USER].

The configuration file should be located at the same location as the CMSSW parameter-set to be used by CRAB. Please change directory to :

cd  Demo/MyTrackAnalyzer/test/ 
and save the crab configuration file:
crab.cfg
with the following content:

[CRAB]
jobtype = cmssw
scheduler = glite
server_name = cnaf

[CMSSW]
datasetpath = /RelValHiggs-ZZ-4L/CMSSW_1_6_0-RelVal-1188844800/GEN-SIM-DIGI-RECO
pset = higgs.cfg
total_number_of_events = 100
number_of_jobs = 10
output_file = histograms.root

[USER]
return_data = 1

#ui_working_dir = 

#thresholdLevel = 100
#eMail = your_email_address 

[EDG]
rb = CERN 
proxy_server = myproxy.cern.ch 
virtual_organization = cms
retry_count = 0
lcg_catalog_type = lfc
lfc_host = lfc-cms-test.cern.ch
lfc_home = /grid/cms
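
Because crab.cfg uses an INI-like layout, its job-splitting parameters can be sanity-checked before job creation. The sketch below reads an abbreviated copy of the configuration above with Python's standard configparser module and derives the number of events per job. It is only an illustration under assumptions: the function name events_per_job is hypothetical, the source does not say how CRAB itself parses the file or divides the events, and an even split rounded up is assumed here.

```python
import configparser

# Abbreviated copy of the tutorial configuration, embedded for illustration
CRAB_CFG = """\
[CRAB]
jobtype = cmssw
scheduler = glite

[CMSSW]
datasetpath = /RelValHiggs-ZZ-4L/CMSSW_1_6_0-RelVal-1188844800/GEN-SIM-DIGI-RECO
pset = higgs.cfg
total_number_of_events = 100
number_of_jobs = 10
output_file = histograms.root

[USER]
return_data = 1
"""


def events_per_job(cfg_text):
    """Return events per job, assuming an even split rounded up (assumption)."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(cfg_text)
    cmssw = parser["CMSSW"]
    total = cmssw.getint("total_number_of_events")
    jobs = cmssw.getint("number_of_jobs")
    if jobs <= 0:
        raise ValueError("number_of_jobs must be positive")
    # Ceiling division so that no requested event is left out
    return -(-total // jobs)


if __name__ == "__main__":
    print(events_per_job(CRAB_CFG))
```

With the values used in this tutorial (100 events, 10 jobs), the sketch yields 10 events per job.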

Run Crab

Once your crab.cfg is ready and the whole underlying environment is set up, you can start running CRAB. CRAB provides command-line help, which can be useful for first-time users. You can get it via:
crab -h
In particular, there is a HOW TO RUN CRAB FOR THE IMPATIENT USER section where the basic commands are listed.

Job Creation

The job creation step checks the availability of the selected dataset and prepares all the jobs for submission according to the job splitting specified in crab.cfg.

The creation process creates a CRAB project directory (default: crab_0__

CRAB allows the user to choose the project name, so that it can be used later to distinguish multiple CRAB projects in the same directory.

crab -create

The command reports the outcome of the job creation on screen.


Job Submission

With the submission command you can specify a combination of jobs and job ranges separated by commas (e.g. 1,2,3-4); the default is all jobs. To submit all jobs of the last created project with the default name, it is enough to execute the following command:

crab -submit 
To submit a specific project:
crab -submit -c  <dir name>

The command reports the outcome of the submission on screen.
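
As an aside on the job-range syntax accepted by the submission command (e.g. 1,2,3-4), the sketch below shows how such a specification expands into individual job numbers. This is an illustration of the syntax only, not CRAB's own parser; the function name parse_job_ranges is hypothetical.

```python
def parse_job_ranges(spec):
    """Expand a spec such as "1,2,3-4" into a sorted list of job numbers.

    Illustrative only: CRAB's real parsing may differ in edge cases.
    """
    jobs = set()
    for part in spec.split(","):
        part = part.strip()
        if "-" in part:
            # A range "first-last" includes both end points
            first_text, last_text = part.split("-", 1)
            first, last = int(first_text), int(last_text)
            if first > last:
                raise ValueError("invalid job range: %r" % part)
            jobs.update(range(first, last + 1))
        else:
            jobs.add(int(part))
    return sorted(jobs)


if __name__ == "__main__":
    print(parse_job_ranges("1,2,3-4"))
```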


Job Status Check

Check the status of the jobs in the latest CRAB project with the following command:
crab -status 
To check a specific project:
crab -status -c  <dir name>

The command prints the status of the jobs on screen.


Job Output Retrieval

For the jobs which are in the "done" state it's possible to retrieve the output. The following command retrieves the output of all jobs of the last created CRAB project:
crab -getoutput 
To get the output of a specific project:
crab -getoutput -c  <dir name>

The job results are copied into the res subdirectory of your CRAB project, and the location is reported in a message on screen.


Final plot

Each of the 10 jobs produces a histogram output file; these files can be combined in the res directory using ROOT's hadd:

hadd histograms.root histograms_*.root
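
A merge like this silently combines whatever files are present, so it can help to confirm that every job has delivered its file. The sketch below lists the job numbers whose output is missing. It rests on assumptions not stated in this tutorial: that the per-job files are named histograms_<job number>.root and that jobs are numbered from 1. The function name missing_job_outputs is hypothetical.

```python
import re


def missing_job_outputs(filenames, n_jobs, stem="histograms"):
    """Return the job numbers (1..n_jobs) with no <stem>_<N>.root file.

    Assumes per-job outputs are named <stem>_<job number>.root (assumption).
    """
    pattern = re.compile(r"^%s_(\d+)\.root$" % re.escape(stem))
    found = set()
    for name in filenames:
        match = pattern.match(name)
        if match:
            found.add(int(match.group(1)))
    return [job for job in range(1, n_jobs + 1) if job not in found]


if __name__ == "__main__":
    # Example listing with the output of job 2 missing
    listing = ["histograms_1.root", "histograms_3.root"]
    print(missing_job_outputs(listing, 3))
```

In practice the listing could come from os.listdir on the res directory of the project.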

The final histograms.root opened in ROOT contains the final plot:

mzz->Draw();

Exercises

In parallel to the execution of the first 10 jobs, you can try the new Grid submission modes of CRAB by replacing

scheduler = edg

with

scheduler = glite

to use the gLite RB submission mode, or with

scheduler = glitecoll

to use the gLite RB bulk submission mode, then repeating the creation, submission, status check and getoutput steps described above.

To submit through the CRAB Server, set:

server_name = cnaf

Writing out CMSSW ROOT files with CRAB

IMPORTANT NOTE:

There is a limit of 50MB on the size of the output sandbox, imposed by grid policy. If your output exceeds this limit, it will be truncated and you will lose it. To avoid this problem, it is recommended to transfer the output directly to a Storage Element (SE). There is also a purge policy for sandboxes older than 7 days: files are removed from the RBs after this time.
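
To judge whether a job's output is likely to fit, the sizes of the expected output files can be compared with the 50MB limit using a test run. The following is a rough sketch under assumptions: the helper names are hypothetical, the limit is taken as 50 decimal megabytes (the exact definition is not given here), and any other files that travel in the sandbox are ignored.

```python
import os

# Assumption: "50MB" interpreted as 50 * 10**6 bytes
SANDBOX_LIMIT_BYTES = 50 * 1000 * 1000


def fits_in_sandbox(sizes_in_bytes, limit=SANDBOX_LIMIT_BYTES):
    """Return True if the summed sizes do not exceed the sandbox limit."""
    return sum(sizes_in_bytes) <= limit


def files_fit_in_sandbox(paths, limit=SANDBOX_LIMIT_BYTES):
    """Same check, using the on-disk size of each file in paths."""
    return fits_in_sandbox((os.path.getsize(path) for path in paths), limit)


if __name__ == "__main__":
    # Two outputs of 30 MB and 25 MB together exceed the limit
    print(fits_in_sandbox([30 * 1000 * 1000, 25 * 1000 * 1000]))
```

If the check fails for a local test job, copying the output to a Storage Element is the safer choice.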

In the following sections we'll show how to configure crab to copy the output to a SE.

New CMSSW parameter-set

To write out a CMSSW ROOT file in this example, please create a new CMSSW parameter-set named

higgs2.cfg

with following content:

process A = {
  untracked PSet maxEvents = {untracked int32 input = 100}
  source = PoolSource {
        untracked vstring fileNames = {
                "/store/mc/2006/12/22/mc-physval-120-HToZZToMuMuMuMu-mH150-LowLumi/0018/0E3C16F1-9A9A-DB11-92D1-003048769D5F.root"
        }
        untracked uint32 skipEvents = 0
  }

  module higgs = MyTrackAnalyzer {
        untracked InputTag tracks = ctfWithMaterialTracks
        untracked string OutputFileName = "histograms.root"
  }

  module out = PoolOutputModule {
    untracked string fileName = "output.root"
  }

  path p = {
    higgs
  }

  endpath e = {
    out
  }

}

Prepare Castor area for storage element interaction

For CRAB to be able to write into your Castor user directory:

/castor/cern.ch/user/<u>/<username>

we have to create a destination directory and change the file permissions:

rfmkdir /castor/cern.ch/user/<u>/<username>/tutorial 
rfchmod +775 /castor/cern.ch/user/<u>/<username>/tutorial

replacing <u> with the first letter of your username and <username> with your username.
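
The substitution rule for the Castor path can be written out as a small function. This is only a sketch of the naming rule described above; the function name castor_user_dir and the example username are hypothetical.

```python
def castor_user_dir(username, subdir="tutorial"):
    """Build /castor/cern.ch/user/<u>/<username>/<subdir>.

    <u> is the first letter of the username, as described in the text.
    """
    if not username:
        raise ValueError("username must not be empty")
    return "/castor/cern.ch/user/%s/%s/%s" % (username[0], username, subdir)


if __name__ == "__main__":
    # "jsmith" is a made-up username used for illustration
    print(castor_user_dir("jsmith"))
```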

Prepare new crab.cfg

Now the CMSSW parameter-set produces an output file (output.root), which you can declare in output_file and ask CRAB to copy to the Storage Element (Castor). Please modify crab.cfg as in the following example:

[CRAB]
jobtype = cmssw
scheduler = edg

[CMSSW]
datasetpath = /tt4j_mT_70-alpgen/CMSSW_1_6_0-PreCSA07-HLT-A2/GEN-SIM-DIGI-RECO
pset = higgs2.cfg
total_number_of_events = 100
number_of_jobs = 10
output_file = output.root

[USER]
return_data = 0
copy_data = 1
storage_element = srm.cern.ch
storage_path = /srm/managerv1?SFN=/castor/cern.ch/user/u/username/subdir

#ui_working_dir = 

[EDG]
rb = CERN
proxy_server = myproxy.cern.ch
virtual_organization = cms
retry_count = 0
lcg_catalog_type = lfc
lfc_host = lfc-cms-test.cern.ch
lfc_home = /grid/cms
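
A quick consistency check of the [USER] section can catch a configuration that would leave the output nowhere to go. The sketch below uses Python's standard configparser module; the function name check_user_section is hypothetical, and the rules it applies (at least one of return_data and copy_data enabled, and both storage parameters set when copy_data = 1) are assumptions drawn from this example rather than a statement of what CRAB itself enforces.

```python
import configparser


def check_user_section(cfg_text):
    """Return a list of problems found in the [USER] section (empty if none)."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(cfg_text)
    user = parser["USER"]
    problems = []
    return_data = user.getint("return_data", fallback=0)
    copy_data = user.getint("copy_data", fallback=0)
    if not return_data and not copy_data:
        problems.append("neither return_data nor copy_data is enabled")
    if copy_data:
        # Assumed requirement: a destination must be given when copying
        for key in ("storage_element", "storage_path"):
            if not user.get(key, fallback="").strip():
                problems.append("copy_data = 1 but %s is not set" % key)
    return problems


if __name__ == "__main__":
    example = (
        "[USER]\n"
        "return_data = 0\n"
        "copy_data = 1\n"
        "storage_element = srm.cern.ch\n"
        "storage_path = /srm/managerv1?SFN=/castor/cern.ch/user/u/username/subdir\n"
    )
    print(check_user_section(example))
```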

How to publish user data in a local DBS

Starting with version 2_1_1, CRAB allows users to publish their output data to a local DBS instance. Please read How to publish user data in a private DBS to understand how to use this new functionality.

Topic revision: r2 - 2008-06-11 - MattiaCinquilli
 