5.6.1 Running CMSSW code on the Grid using CRAB3

This is a special twiki created for the online tutorial to be held on October 17th 2014. This twiki has less content than the offline tutorial twiki, and the sections are reordered so as to make the online tutorial more dynamic (i.e. to avoid scrolling up and down the twiki during the online tutorial session). This twiki will not be maintained and may be erased soon after October 17th 2014.

Before starting

Text background color convention

In this tutorial page, the following text background color convention is used:

GREY: For commands.
GREEN: For the example output of executed commands (approximately what the user should see in his/her terminal).
PINK: For CMSSW parameter-set configuration files.
BLUE: For CRAB configuration files.
YELLOW: For any other type of file.

Syntax conventions

In this tutorial page, the following syntax conventions are used:

  • Whenever text is enclosed within <>, one should replace the text by what it describes and remove those signs. For example, <username> should be replaced by your username, without the < and > signs.
  • In a CRAB command, text enclosed within [] refers to optional specifications in the command.
  • Text preceded by a # sign in a configuration file represents a comment and doesn't affect execution.

Prerequisites to run the tutorial

I will assume that everybody has followed the prerequisites instructions and is all set to run CRAB3.

About software versions, datasets and analysis code

For this tutorial we will use:

  • CMS software version 7.0.5 (i.e. CMSSW_7_0_5), which was built with slc6_amd64_gcc481 architecture. The following page contains a list of all available CMSSW production releases for different scram architectures: https://cmssdt.cern.ch/SDT/cgi-bin/ReleasesXML.
  • Already prepared CMSSW parameter-set configuration files.
  • The following CMS datasets:
    MC /GenericTTbar/HC-CMSSW_5_3_1_START53_V5-v1/GEN-SIM-RECO
    Data /SingleMu/Run2012B-13Jul2012-v1/AOD
  • The central installation of CRAB3 available in CVMFS, which as of mid-June 2014 corresponds to version 3.3.7 (i.e. CRAB_3_3_7).

Log in

We will use LXPLUS:

ssh -Y <username>@lxplus.cern.ch

Using the lxplus alias logs us in to an SLC6 machine.

Shell

The shell commands in this tutorial correspond to the Bourne-again shell (bash). If you use a TC shell (tcsh):

  • Replace file extensions .sh by .csh.
  • Replace export <variable-name>=<variable-value> by setenv <variable-name> <variable-value>.

You can check which shell you are using by executing:

echo $0

which in my case shows

bash

If yours shows that you are using tcsh and you would like to work with bash, then do:

bash
export SHELL=/bin/bash

Setup the environment

In order to have the correct environment setup, one must always source the environment files in the following order:

  1. Grid environment (for every new shell) (only if your site doesn't load the environment for you).
  2. CMSSW installation (only once).
  3. CMS environment (for every new shell).
  4. CRAB environment (for every new shell). We will do this after running CMSSW locally, because the CRAB3 environment interferes with the CMS environment.

Grid environment

In order to submit jobs to the Grid, one must have access to a Grid UI, which will allow access to WLCG-affiliated resources in a fully transparent way. Some sites provide the grid environment to all shells by default. If the following command returns without an error, then you don't need to source a Grid UI manually:

which grid-proxy-info

If instead you receive a message similar to the following:

/usr/bin/which: no grid-proxy-info in (/bin:/sbin:/usr/bin:/usr/sbin)

then you will need to source either the LCG Grid UI or the OSG Grid UI (most sites only have one or the other installed).

LXPLUS users can get the LCG Grid UI by sourcing the following (SLC6 already has the UI by default):

source /afs/cern.ch/cms/LCG/LCG-2/UI/cms_ui_env.sh

CMS software installation

Install CMSSW in a directory of your choice (LXPLUS users can get a work area under /afs/cern.ch/work/<first-letter-of-username>/<username>/ if they request it; otherwise they can just use their home directory /afs/cern.ch/user/<first-letter-of-username>/<username>/). We suggest creating a subdirectory (e.g. called CRAB3-tutorial) and installing CMSSW there.

Before installing CMSSW, one has to check whether the scram architecture is the one needed (in our case slc6_amd64_gcc481), and if not, change it accordingly. The scram architecture is specified in the environment variable SCRAM_ARCH. Thus, one has to check this variable:

echo $SCRAM_ARCH

which in LXPLUS6 will most probably show:

slc6_amd64_gcc472

Let's set it to our desired scram architecture:

export SCRAM_ARCH=slc6_amd64_gcc481

Only after setting the appropriate scram architecture, install CMSSW:

cd /afs/cern.ch/<work|user>/<first-letter-of-username>/<username>/
mkdir CRAB3-tutorial
cd CRAB3-tutorial
cmsrel CMSSW_7_0_5

CMS environment

Setup the CMS environment:

cd /afs/cern.ch/<work|user>/<first-letter-of-username>/<username>/CRAB3-tutorial/CMSSW_7_0_5/src/ 
cmsenv

The cmsenv command will automatically set the scram architecture to be the one corresponding to the installed CMSSW release.

Get a CMS VO proxy

CRAB makes use of LCG resources on behalf of the user. Since access to LCG resources is of course restricted to authorized entities, the user has to prove that he/she is authorized. The proof is relatively easy; it just consists of showing that he/she is a member of an LCG trusted organization, in our case VO CMS. This is achieved by presenting a proxy issued by VO CMS. (In general, a proxy is a certificate issued by a trusted organization that proves that the requester is known by this organization.) CRAB will then present the user's proxy for all operations that require identification.

Proxies are not issued at registration time and for the whole membership period. Instead, the user has to explicitly request a proxy and it will be valid for a limited time (12 hours by default). When requesting a proxy, the user has to present an identification; for VO CMS, the identification is the user's Grid certificate (and of course the Grid certificate has to be the same one as originally presented when registering to VO CMS). The command to request a proxy to VO CMS is voms-proxy-init --voms cms. This command will look for the user's Grid certificate in the .globus subdirectory of the user's home directory. If the Grid certificate is not in this standard location, the user can specify the location via the --cert and --key options. The user can also request a longer validity using the --valid option. For example, to request a proxy valid for seven days, execute:

voms-proxy-init --voms cms --valid 168:00

which in my case gives me the following screen output:

Enter GRID pass phrase for this identity:
Contacting lcg-voms.cern.ch:15002 [/DC=ch/DC=cern/OU=computers/CN=lcg-voms.cern.ch] "cms"...
Remote VOMS server contacted succesfully.


Created proxy in /tmp/x509up_u57506.

Your proxy is valid until Tue Jun 17 11:38:40 CEST 2014

The proxy is saved in the /tmp/ directory of the current machine, in a file named x509up_u<user-id> (where user-id can be obtained by executing the command id -u). Proxies are not specific to a login session, but using another machine requires creating another proxy there or copying the existing one over.
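The default proxy path can also be reconstructed programmatically; this is just a small illustration of the naming scheme (not needed for the tutorial):

import os
print '/tmp/x509up_u%d' % os.getuid()   # e.g. /tmp/x509up_u57506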

To get more information about the proxy, execute:

voms-proxy-info --all

which in my case gives me the following screen output:

subject   : /DC=ch/DC=cern/OU=Organic Units/OU=Users/CN=atanasi/CN=710186/CN=Andres Jorge Tanasijczuk/CN=proxy
issuer    : /DC=ch/DC=cern/OU=Organic Units/OU=Users/CN=atanasi/CN=710186/CN=Andres Jorge Tanasijczuk
identity  : /DC=ch/DC=cern/OU=Organic Units/OU=Users/CN=atanasi/CN=710186/CN=Andres Jorge Tanasijczuk
type      : full legacy globus proxy
strength  : 1024
path      : /tmp/x509up_u57506
timeleft  : 167:59:05
key usage : Digital Signature, Key Encipherment
=== VO cms extension information ===
VO        : cms
subject   : /DC=ch/DC=cern/OU=Organic Units/OU=Users/CN=atanasi/CN=710186/CN=Andres Jorge Tanasijczuk
issuer    : /DC=ch/DC=cern/OU=computers/CN=lcg-voms.cern.ch
attribute : /cms/Role=NULL/Capability=NULL
attribute : /cms/uscms/Role=NULL/Capability=NULL
timeleft  : 167:59:04
uri       : lcg-voms.cern.ch:15002

For more details about the voms-proxy-* commands used above, add the --help option to get a help menu.

CMSSW parameter-set configuration file

It is beyond the scope of this tutorial to explain how to build an analysis with CMSSW. The interested reader should refer to the corresponding chapters in the CMS Offline WorkBook. For us it is enough to have some simple predefined examples of CMSSW parameter-set configuration files. We are only interested in distinguishing the following two cases: 1) using CMSSW code to run an analysis (whatever it is) on an existing dataset, and 2) using CMSSW code to generate MC events. Throughout the tutorial, we will refer to the first task as "doing an analysis" and to the second as "doing MC generation". We provide below corresponding examples of a CMSSW parameter-set configuration file. The default name expected by CRAB for the CMSSW parameter-set configuration file is pset.py, but of course one can give it any name (always keeping the .py filename extension and not adding dots in the base filename), as long as one specifies that name in the CRAB configuration file.

CMSSW configuration file examples

Since the question of what kind of analysis to do on the input dataset is not important for this tutorial, we will do something very simple: slim an already existing dataset. The input dataset can be an official CMS dataset (either MC or Data) or a dataset produced by a user (e.g. yourself). In this tutorial, we will show how to do both things. We will also show how to run CMSSW with CRAB to generate MC events. We provide below the CMSSW parameter-set configuration files for both of these tasks.

1) CMSSW configuration file to process an existing dataset

This section shows the CMSSW parameter-set configuration file that we will use in this tutorial when running over an existing dataset (either MC or Data, either CMS official or user-produced). We call it pset_tutorial_analysis.py.

The pset_tutorial_analysis.py CMSSW configuration file:
import FWCore.ParameterSet.Config as cms

process = cms.Process('NoSplit')

process.source = cms.Source("PoolSource", fileNames = cms.untracked.vstring())
process.maxEvents = cms.untracked.PSet(input = cms.untracked.int32(10))
process.options = cms.untracked.PSet(wantSummary = cms.untracked.bool(True))
process.output = cms.OutputModule("PoolOutputModule",
    outputCommands = cms.untracked.vstring("drop *", "keep recoTracks_*_*_*"),
    fileName = cms.untracked.string('output.root'),
)
process.out = cms.EndPath(process.output)

This analysis code will produce an output ROOT file called output.root containing only the recoTracks_*_*_* branches for a maximum of 10 events in the input dataset.

Note: The input dataset is not yet specified in this file. When running the analysis with CRAB, the input dataset is specified in the CRAB configuration file.

Note: The maxEvents parameter is there to allow quick interactive testing; it is removed by CRAB in the submitted jobs.

2) CMSSW configuration file to generate MC events

In this section we provide an example of a CMSSW parameter-set configuration file to generate minimum bias events with the Pythia MC generator. We call it pset_tutorial_MC_generation.py. Using CRAB to generate MC events requires some special settings in the CRAB configuration file, as we will show in section Running CRAB to generate MC events.

The pset_tutorial_MC_generation.py CMSSW configuration file:
# Auto generated configuration file
# using: 
# Revision: 1.19 
# Source: /local/reps/CMSSW/CMSSW/Configuration/Applications/python/ConfigBuilder.py,v 
# with command line options: MinBias_8TeV_cfi --conditions auto:startup -s GEN,SIM --datatier GEN-SIM -n 10 
# --relval 9000,300 --eventcontent RAWSIM --io MinBias.io --python MinBias.py --no_exec --fileout minbias.root

import FWCore.ParameterSet.Config as cms

process = cms.Process('SIM')

# Import of standard configurations
process.load('Configuration.StandardSequences.Services_cff')
process.load('SimGeneral.HepPDTESSource.pythiapdt_cfi')
process.load('FWCore.MessageService.MessageLogger_cfi')
process.load('Configuration.EventContent.EventContent_cff')
process.load('SimGeneral.MixingModule.mixNoPU_cfi')
process.load('Configuration.StandardSequences.GeometryRecoDB_cff')
process.load('Configuration.Geometry.GeometrySimDB_cff')
process.load('Configuration.StandardSequences.MagneticField_38T_cff')
process.load('Configuration.StandardSequences.Generator_cff')
process.load('IOMC.EventVertexGenerators.VtxSmearedRealistic8TeVCollision_cfi')
process.load('GeneratorInterface.Core.genFilterSummary_cff')
process.load('Configuration.StandardSequences.SimIdeal_cff')
process.load('Configuration.StandardSequences.EndOfProcess_cff')
process.load('Configuration.StandardSequences.FrontierConditions_GlobalTag_cff')

process.maxEvents = cms.untracked.PSet(
    input = cms.untracked.int32(10)
)

# Input source
process.source = cms.Source("EmptySource")

process.options = cms.untracked.PSet(

)

# Production Info
process.configurationMetadata = cms.untracked.PSet(
    version = cms.untracked.string('$Revision: 1.7 $'),
    annotation = cms.untracked.string('MinBias_8TeV_cfi nevts:10'),
    name = cms.untracked.string('Applications')
)

# Output definition
process.RAWSIMoutput = cms.OutputModule("PoolOutputModule",
    splitLevel = cms.untracked.int32(0),
    eventAutoFlushCompressedSize = cms.untracked.int32(5242880),
    outputCommands = process.RAWSIMEventContent.outputCommands,
    fileName = cms.untracked.string('minbias.root'),
    dataset = cms.untracked.PSet(
        filterName = cms.untracked.string(''),
        dataTier = cms.untracked.string('GEN-SIM')
    ),
    SelectEvents = cms.untracked.PSet(
        SelectEvents = cms.vstring('generation_step')
    )
)

# Additional output definition

# Other statements
process.genstepfilter.triggerConditions=cms.vstring("generation_step")
from Configuration.AlCa.GlobalTag import GlobalTag
process.GlobalTag = GlobalTag(process.GlobalTag, 'auto:startup', '')

process.generator = cms.EDFilter("Pythia6GeneratorFilter",
    pythiaPylistVerbosity = cms.untracked.int32(0),
    filterEfficiency = cms.untracked.double(1.0),
    pythiaHepMCVerbosity = cms.untracked.bool(False),
    comEnergy = cms.double(8000.0),
    maxEventsToPrint = cms.untracked.int32(0),
    PythiaParameters = cms.PSet(
        pythiaUESettings = cms.vstring('MSTU(21)=1     ! Check on possible errors during program execution', 
            'MSTJ(22)=2     ! Decay those unstable particles', 
            'PARJ(71)=10 .  ! for which ctau  10 mm', 
            'MSTP(33)=0     ! no K factors in hard cross sections', 
            'MSTP(2)=1      ! which order running alphaS', 
            'MSTP(51)=10042 ! structure function chosen (external PDF CTEQ6L1)', 
            'MSTP(52)=2     ! work with LHAPDF', 
            'PARP(82)=1.921 ! pt cutoff for multiparton interactions', 
            'PARP(89)=1800. ! sqrts for which PARP82 is set', 
            'PARP(90)=0.227 ! Multiple interactions: rescaling power', 
            'MSTP(95)=6     ! CR (color reconnection parameters)', 
            'PARP(77)=1.016 ! CR', 
            'PARP(78)=0.538 ! CR', 
            'PARP(80)=0.1   ! Prob. colored parton from BBR', 
            'PARP(83)=0.356 ! Multiple interactions: matter distribution parameter', 
            'PARP(84)=0.651 ! Multiple interactions: matter distribution parameter', 
            'PARP(62)=1.025 ! ISR cutoff', 
            'MSTP(91)=1     ! Gaussian primordial kT', 
            'PARP(93)=10.0  ! primordial kT-max', 
            'MSTP(81)=21    ! multiple parton interactions 1 is Pythia default', 
            'MSTP(82)=4     ! Defines the multi-parton model'),
        processParameters = cms.vstring('MSEL=0         ! User defined processes', 
            'MSUB(11)=1     ! Min bias process', 
            'MSUB(12)=1     ! Min bias process', 
            'MSUB(13)=1     ! Min bias process', 
            'MSUB(28)=1     ! Min bias process', 
            'MSUB(53)=1     ! Min bias process', 
            'MSUB(68)=1     ! Min bias process', 
            'MSUB(92)=1     ! Min bias process, single diffractive', 
            'MSUB(93)=1     ! Min bias process, single diffractive', 
            'MSUB(94)=1     ! Min bias process, double diffractive', 
            'MSUB(95)=1     ! Min bias process'),
        parameterSets = cms.vstring('pythiaUESettings', 
            'processParameters')
    )
)

# Path and EndPath definitions
process.generation_step = cms.Path(process.pgen)
process.simulation_step = cms.Path(process.psim)
process.genfiltersummary_step = cms.EndPath(process.genFilterSummary)
process.endjob_step = cms.EndPath(process.endOfProcess)
process.RAWSIMoutput_step = cms.EndPath(process.RAWSIMoutput)

# Schedule definition
process.schedule = cms.Schedule(
    process.generation_step,
    process.genfiltersummary_step,
    process.simulation_step,
    process.endjob_step,
    process.RAWSIMoutput_step
)

# Filter all path with the production filter sequence
for path in process.paths:
    getattr(process,path)._seq = process.generator * getattr(process,path)._seq

This MC generation code will produce an output ROOT file called minbias.root with the content of a GEN-SIM data tier for 10 generated events.

Note: The maxEvents parameter is there to allow quick interactive testing; it is removed by CRAB in the submitted jobs (and its functionality is replaced by a corresponding CRAB parameter called totalUnits).

Input dataset

In order to run an analysis over a given dataset, one has to find out the corresponding dataset name and put it in the CRAB configuration file. In general, either someone will tell you which dataset to use or you will use the DAS web interface to find available datasets. To learn how to use DAS, the reader can refer to the CMS Offline WorkBook - Chapter 5.4 and/or read the more complete documentation linked from the DAS web interface. For this tutorial, we will use the datasets pointed out above. Below are screenshots of corresponding DAS query outputs for these datasets, where one can see that:

  • The datasets are in status "VALID" (otherwise we wouldn't be able to run on them).
  • The /GenericTTbar/HC-CMSSW_5_3_1_START53_V5-v1/GEN-SIM-RECO MC dataset has 1 block, 177 files and 300K events.
  • The /SingleMu/Run2012B-13Jul2012-v1/AOD dataset has 14 blocks, 4294 files, 60285 luminosity sections and ~59.5M events.

(Screenshot: DAS query for dataset /GenericTTbar/HC-CMSSW_5_3_1_START53_V5-v1/GEN-SIM-RECO.)

(Screenshot: DAS query for dataset /SingleMu/Run2012B-13Jul2012-v1/AOD.)

(Screenshot: DAS query for dataset /SingleMu/Run2012B-13Jul2012-v1/AOD, filtering the number of lumis.)

Note: Dataset availability at sites changes with time. If you are trying to follow this tutorial after the date it was given, please check that the datasets are still available. If they are not, you will need to choose other ones.

Note: The number of events shown in DAS when doing a simple query like the ones in the screenshots above includes events in INVALID files. On the other hand, CRAB will only analyze VALID files. The list of VALID files in the dataset is shown when clicking on the "Files" link (which is the same as doing the query file dataset=<dataset-name>). To obtain the number of events in VALID files, one can do the following query: file dataset=<dataset-name> | sum(file.nevents).

Running CMSSW code locally

Before submitting jobs to the Grid, it is good practice to run the CMSSW code locally over a few events, to rule out problems not related to CRAB.

To run analysis CMSSW code, one needs to specify an input dataset file directly in the CMSSW parameter-set configuration file (specifying as well how to access/open the file). One could either copy one file of the remote input dataset to a local machine or, more conveniently, open the file remotely. In both cases the recommended tool is the Xrootd service (please refer to the CMS Offline WorkBook - Chapter 5.13 to learn the basics of how to use Xrootd). We choose to open a file remotely. In any case, one first has to find out the LFN of such a file. We used DAS to find the files in the dataset. The screenshot below shows the DAS web interface with the query we did for the MC dataset we are interested in, and the beginning of the result we got, listing one of the many files contained in the dataset.

(Screenshot: DAS query for files in dataset /GenericTTbar/HC-CMSSW_5_3_1_START53_V5-v1/GEN-SIM-RECO.)

Now that we know the LFN for one file in the dataset, we proceed as follows. In the CMSSW parameter-set configuration file, replace

process.source = cms.Source("PoolSource", fileNames = cms.untracked.vstring())

by

process.source = cms.Source("PoolSource", fileNames = cms.untracked.vstring('root://cms-xrd-global.cern.ch//store/mc/HC/GenericTTbar/GEN-SIM-RECO/CMSSW_5_3_1_START53_V5-v1/0010/00CE4E7C-DAAD-E111-BA36-0025B32034EA.root'))

Notice that we added the string root://cms-xrd-global.cern.ch/ before the LFN. The first part, root:, specifies that the file should be opened via the XRootD protocol (with ROOT), // is a separator, and the string cms-xrd-global.cern.ch specifies to use the Xrootd service with a particular "redirector" (in this case the global one).
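Equivalently, one could keep the redirector and the LFN as separate variables in the parameter-set file and concatenate them. This is just an alternative way of writing the same line (the variable names are illustrative):

redirector = 'root://cms-xrd-global.cern.ch/'
lfn = '/store/mc/HC/GenericTTbar/GEN-SIM-RECO/CMSSW_5_3_1_START53_V5-v1/0010/00CE4E7C-DAAD-E111-BA36-0025B32034EA.root'
process.source = cms.Source("PoolSource", fileNames = cms.untracked.vstring(redirector + lfn))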

Note: When running CMSSW code locally with cmsRun, we suggest doing it in a separate (fresh) shell, where the user sets up the Grid and CMS environments but skips the CRAB setup. This avoids the CRAB environment interfering with the CMSSW environment.

Now we run the analysis locally:

cmsRun pset_tutorial_analysis.py

On the screen we get an output that looks like this:

cmsRun screen output for pset_tutorial_analysis.py:
10-Jun-2014 13:16:48 CEST  Initiating request to open file root://cms-xrd-global.cern.ch//store/mc/HC/GenericTTbar/GEN-SIM-RECO/CMSSW_5_3_1_START53_V5-v1/0010/00CE4E7C-DAAD-E111-BA36-0025B32034EA.root
10-Jun-2014 13:17:05 CEST  Successfully opened file root://cms-xrd-global.cern.ch//store/mc/HC/GenericTTbar/GEN-SIM-RECO/CMSSW_5_3_1_START53_V5-v1/0010/00CE4E7C-DAAD-E111-BA36-0025B32034EA.root
Begin processing the 1st record. Run 1, Event 74951, LumiSection 668165 at 10-Jun-2014 13:18:23.526 CEST
Begin processing the 2nd record. Run 1, Event 74952, LumiSection 668165 at 10-Jun-2014 13:18:23.548 CEST
Begin processing the 3rd record. Run 1, Event 74953, LumiSection 668165 at 10-Jun-2014 13:18:23.558 CEST
Begin processing the 4th record. Run 1, Event 74954, LumiSection 668165 at 10-Jun-2014 13:18:23.571 CEST
Begin processing the 5th record. Run 1, Event 74955, LumiSection 668165 at 10-Jun-2014 13:18:23.581 CEST
Begin processing the 6th record. Run 1, Event 74956, LumiSection 668165 at 10-Jun-2014 13:18:23.590 CEST
Begin processing the 7th record. Run 1, Event 74957, LumiSection 668165 at 10-Jun-2014 13:18:23.598 CEST
Begin processing the 8th record. Run 1, Event 74958, LumiSection 668165 at 10-Jun-2014 13:18:23.612 CEST
Begin processing the 9th record. Run 1, Event 74959, LumiSection 668165 at 10-Jun-2014 13:18:23.625 CEST
Begin processing the 10th record. Run 1, Event 74960, LumiSection 668165 at 10-Jun-2014 13:18:23.694 CEST
10-Jun-2014 13:18:23 CEST  Closed file root://cms-xrd-global.cern.ch//store/mc/HC/GenericTTbar/GEN-SIM-RECO/CMSSW_5_3_1_START53_V5-v1/0010/00CE4E7C-DAAD-E111-BA36-0025B32034EA.root

TrigReport ---------- Event  Summary ------------
TrigReport Events total = 10 passed = 10 failed = 0

TrigReport ---------- Path   Summary ------------
TrigReport  Trig Bit#        Run     Passed     Failed      Error Name

TrigReport -------End-Path   Summary ------------
TrigReport  Trig Bit#        Run     Passed     Failed      Error Name
TrigReport     0    0         10         10          0          0 out

TrigReport ------ Modules in End-Path: out ------------
TrigReport  Trig Bit#    Visited     Passed     Failed      Error Name
TrigReport     0    0         10         10          0          0 output

TrigReport ---------- Module Summary ------------
TrigReport    Visited        Run     Passed     Failed      Error Name
TrigReport         10         10         10          0          0 output

TimeReport ---------- Event  Summary ---[sec]----
TimeReport CPU/event = 0.000000 Real/event = 0.000000

TimeReport ---------- Path   Summary ---[sec]----
TimeReport             per event          per path-run 
TimeReport        CPU       Real        CPU       Real Name
TimeReport        CPU       Real        CPU       Real Name
TimeReport             per event          per path-run 

TimeReport -------End-Path   Summary ---[sec]----
TimeReport             per event       per endpath-run 
TimeReport        CPU       Real        CPU       Real Name
TimeReport   0.016897   0.017100   0.016897   0.017100 out
TimeReport        CPU       Real        CPU       Real Name
TimeReport             per event       per endpath-run 
TimeReport        CPU       Real        CPU       Real Name
TimeReport             per event      per module-visit 

TimeReport ------ Modules in End-Path: out ---[sec]----
TimeReport             per event      per module-visit 
TimeReport        CPU       Real        CPU       Real Name
TimeReport   0.016897   0.000000   0.016897   0.000000 output
TimeReport        CPU       Real        CPU       Real Name
TimeReport             per event      per module-visit 

TimeReport ---------- Module Summary ---[sec]----
TimeReport             per event        per module-run      per module-visit 
TimeReport        CPU       Real        CPU       Real        CPU       Real Name
TimeReport   0.016897   0.017100   0.016897   0.017100   0.016897   0.017100 output
TimeReport        CPU       Real        CPU       Real        CPU       Real Name
TimeReport             per event        per module-run      per module-visit 

T---Report end!


=============================================

MessageLogger Summary

 type     category        sev    module        subroutine        count    total
 ---- -------------------- -- ---------------- ----------------  -----    -----
    1 fileAction           -s file_close                             1        1
    2 fileAction           -s file_open                              2        2

 type    category    Examples: run/evt        run/evt          run/evt
 ---- -------------------- ---------------- ---------------- ----------------
    1 fileAction           PostEndRun                        
    2 fileAction           pre-events       pre-events       

Severity    # Occurrences   Total Occurrences
--------    -------------   -----------------
System                  3                   3

One can also check that an output.root file was created in the running directory. Using a TBrowser to inspect its content, it should be as shown in the screenshot below:

(Screenshot TBrowser_output.root_content.png: content of output.root as seen in a TBrowser.)
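If a graphical TBrowser is not convenient (e.g. over a slow connection), a quick non-graphical check of the file content can be done with PyROOT. This is only a sketch (it assumes the standard EDM tree name Events), not part of the tutorial files:

import ROOT
f = ROOT.TFile.Open('output.root')
f.ls()                                 # list the top-level objects, including the Events tree
print f.Get('Events').GetEntries()     # should print 10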

In the case of MC generation, there is no need to specify an input dataset, so running the CMSSW code locally requires no additional care, except remembering to set the maxEvents parameter to some small number. In our case, we already took care of that, so we just run the pset_tutorial_MC_generation.py code as it is:

cmsRun pset_tutorial_MC_generation.py

The screen output we get looks like this:

cmsRun screen output for pset_tutorial_MC_generation.py:
     MSTU(12)       changed from              0 to          12345
Set Driver verbosity to -2
 New QGSP_FTFP_BERT physics list, replaces LEP with FTF/P for p/n/pi (/K?)  Thresholds: 
    1) between BERT  and FTF/P over the interval 6 to 8 GeV. 
    2) between FTF/P and QGS/P over the interval 12 to 25 GeV. 
  -- quasiElastic was asked to be 1
     Changed to 1 for QGS  and to 0 (must be false) for FTF
1****************** PYINIT: initialization of PYTHIA routines *****************
 ==== PYTHIA WILL USE LHAPDF ====
 *************************************
 *       LHAPDF Version 5.8.5        *
 *   Configured for the following:   *
 *             All PDFs              *
 *          LOW MEMORY option        *
 *    Maximum  1 concurrent set(s)   *
 *************************************

 >>>>>> PDF description: <<<<<<
 CTEQ6L1 - LO with LO alpha_s                                    
 Reference:                                                      
 J. Pumplin, D.R. Stump, J. Huston, H.L. Lai, P. Nadolsky,       
 W.K. Tung                                                       
 hep-ph/0201195                                                  
 >>>>>>                   <<<<<<

 Parametrization: CTEQ6           

 ==============================================
 PDFset name /cvmfs/cms.cern.ch/slc6_amd64_gcc481/external/lhapdf/5.8.5-cms/share/lhapdf/PDFs
 with          1 members
 ====  initialized. ===========================
 Strong coupling at Mz for PDF is:  0.12978

 ==============================================================================
 I                                                                            I
 I              PYTHIA will be initialized for a p on p collider              I
 I                  at   8000.000 GeV center-of-mass energy                   I
 I                                                                            I
 ==============================================================================

 ******** PYMAXI: summary of differential cross-section maximum search ********

           ==========================================================
           I                                      I                 I
           I  ISUB  Subprocess name               I  Maximum value  I
           I                                      I                 I
           ==========================================================
           I                                      I                 I
           I   92   Single diffractive (XB)       I    6.9028D+00   I
           I   93   Single diffractive (AX)       I    6.9028D+00   I
           I   94   Double  diffractive           I    9.4556D+00   I
           I   95   Low-pT scattering             I    4.9586D+01   I
           I   96   Semihard QCD 2 -> 2           I    8.0723D+03   I
           I                                      I                 I
           ==========================================================

 ****** PYMULT: initialization of multiple interactions for MSTP(82) = 4 ******
        pT0 = 2.70 GeV gives sigma(parton-parton) = 4.56D+02 mb: accepted

 ****** PYMIGN: initialization of multiple interactions for MSTP(82) = 4 ******
        pT0 = 2.70 GeV gives sigma(parton-parton) = 1.80D+02 mb: accepted

 ********************** PYINIT: initialization completed **********************
Begin processing the 1st record. Run 1, Event 1, LumiSection 1 at 10-Jun-2014 14:12:11.754 CEST
Begin processing the 2nd record. Run 1, Event 2, LumiSection 1 at 10-Jun-2014 14:12:11.800 CEST
Begin processing the 3rd record. Run 1, Event 3, LumiSection 1 at 10-Jun-2014 14:12:37.699 CEST
G4Fragment::CalculateExcitationEnergy(): WARNING 
Fragment: A =  26, Z =  12, U = -1.588e+00 MeV  IsStable= 1
          P = (-9.813e+01,-9.655e+01,-2.467e+02) MeV   E = 2.420e+04 MeV

Begin processing the 4th record. Run 1, Event 4, LumiSection 1 at 10-Jun-2014 14:13:26.456 CEST
Begin processing the 5th record. Run 1, Event 5, LumiSection 1 at 10-Jun-2014 14:13:39.602 CEST
Begin processing the 6th record. Run 1, Event 6, LumiSection 1 at 10-Jun-2014 14:13:55.129 CEST
Begin processing the 7th record. Run 1, Event 7, LumiSection 1 at 10-Jun-2014 14:14:30.936 CEST
Begin processing the 8th record. Run 1, Event 8, LumiSection 1 at 10-Jun-2014 14:14:54.942 CEST
Begin processing the 9th record. Run 1, Event 9, LumiSection 1 at 10-Jun-2014 14:15:14.771 CEST
Begin processing the 10th record. Run 1, Event 10, LumiSection 1 at 10-Jun-2014 14:15:23.371 CEST
1********* PYSTAT:  Statistics on Number of Events and Cross-sections *********

 ==============================================================================
 I                                  I                            I            I
 I            Subprocess            I      Number of points      I    Sigma   I
 I                                  I                            I            I
 I----------------------------------I----------------------------I    (mb)    I
 I                                  I                            I            I
 I N:o Type                         I    Generated         Tried I            I
 I                                  I                            I            I
 ==============================================================================
 I                                  I                            I            I
 I   0 All included subprocesses    I           10           346 I  5.904D+01 I
 I  11 f + f' -> f + f' (QCD)       I            1             0 I  5.510D+00 I
 I  12 f + fbar -> f' + fbar'       I            0             0 I  0.000D+00 I
 I  13 f + fbar -> g + g            I            0             0 I  0.000D+00 I
 I  28 f + g -> f + g               I            2             0 I  1.102D+01 I
 I  53 g + g -> f + fbar            I            0             0 I  0.000D+00 I
 I  68 g + g -> g + g               I            6             0 I  3.306D+01 I
 I  92 Single diffractive (XB)      I            0             0 I  0.000D+00 I
 I  93 Single diffractive (AX)      I            0             0 I  0.000D+00 I
 I  94 Double  diffractive          I            1             1 I  9.456D+00 I
 I  95 Low-pT scattering            I            0             9 I  0.000D+00 I
 I                                  I                            I            I
 ==============================================================================

 ********* Total number of errors, excluding junctions =        0 *************
 ********* Total number of errors, including junctions =        0 *************
 ********* Total number of warnings =                           0 *************
 ********* Fraction of events that fail fragmentation cuts =  0.00000 *********


=============================================

MessageLogger Summary

Severity    # Occurrences   Total Occurrences
--------    -------------   -----------------

A minbias.root file should have been created in the running directory. Using a TBrowser to inspect its content, it should look like the screenshot below:

(Screenshot TBrowser_minbias.root_content.png: content of minbias.root as seen in a TBrowser.)

Setup the CRAB environment

At most sites, one can set up CRAB3 by sourcing:

source /cvmfs/cms.cern.ch/crab3/crab.sh

This script always points to the latest version of CRAB3. After sourcing this script, it is possible to use CRAB from any directory.

CRAB configuration file

For convenience, we suggest placing the CRAB configuration file in the same directory as the CMSSW parameter-set file to be used by CRAB. The default name expected by CRAB for the CRAB configuration file is crabConfig.py, but of course one can give it any name (always keeping the .py filename extension and not adding dots in the base filename), as long as one specifies the name when required (e.g. when issuing the CRAB submission command).

In CRAB3 the configuration file is written in Python. It consists of a Configuration object imported from the WMCore library:

from WMCore.Configuration import Configuration
config = Configuration()

Once the Configuration object is created, it is possible to add new sections into it with corresponding parameters. This is done using the following syntax:

config.section_("<section-name>")
config.<section-name>.<parameter-name> = <parameter-value>
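For example, the following lines (an illustrative fragment, not a complete CRAB configuration) create the General section and set one of its parameters:

config.section_("General")
config.General.requestName = 'tutorial_MC_analysis_test1'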

CRAB configuration sections

The table below lists the sections currently available in the CRAB configuration.

Section Description
General In this section, the user specifies generic parameters about the request (e.g. request name).
JobType This section contains all the parameters of the user job type and related configurables (e.g. CMSSW parameter-set configuration file, additional input files, etc.).
Data This section contains all the parameters related to the data to be analyzed, including the splitting parameters.
Site Grid site parameters are defined in this section, including the stage-out information (e.g. stage-out destination site, white/black lists, etc.).
User This section is dedicated to all the information relative to the user (e.g. voms information).

CRAB configuration file examples

There are three different general use cases of CRAB configuration files that we want to show in this tutorial: 1) running an analysis on MC, 2) running an analysis on Data, and 3) generating MC events. In the following, we give an example of a basic configuration file for each of these cases, and we will use them later to run the tutorial. But keep in mind that while running the tutorial, we may want to change the configuration files a bit.

1) CRAB configuration file to run on MC

Here we give an example CRAB configuration file for running the pset_tutorial_analysis.py analysis on the MC dataset we have chosen. We name it crabConfig_tutorial_MC_analysis.py.

from WMCore.Configuration import Configuration
config = Configuration()

config.section_("General")
config.General.requestName = 'tutorial_MC_analysis_test1'
config.General.workArea = 'crab_projects'

config.section_("JobType")
config.JobType.pluginName = 'Analysis'
config.JobType.psetName = 'pset_tutorial_analysis.py'

config.section_("Data")
config.Data.inputDataset = '/GenericTTbar/HC-CMSSW_5_3_1_START53_V5-v1/GEN-SIM-RECO'
config.Data.dbsUrl = 'global'
config.Data.splitting = 'FileBased'
config.Data.unitsPerJob = 10
config.Data.publication = True
config.Data.publishDbsUrl = 'phys03'
config.Data.publishDataName = 'CRAB3_tutorial_MC_analysis_test1'

config.section_("Site")
config.Site.storageSite = <site where the user has permission to write>

Note: We have left the parameter Site.storageSite unspecified on purpose, because it depends on where one has permissions to write.

Note: We are publishing the output ROOT files in DBS even though this is a dummy tutorial example, mainly because we want to show later how to run a task reading a user-produced dataset.

2) CRAB configuration file to run on Data

Here we give the same example CRAB configuration file as above, but set up for running on the Data dataset we have chosen. We name it crabConfig_tutorial_Data_analysis.py.

from WMCore.Configuration import Configuration
config = Configuration()

config.section_("General")
config.General.requestName = 'tutorial_Data_analysis_test5'
config.General.workArea = 'crab_projects'

config.section_("JobType")
config.JobType.pluginName = 'Analysis'
config.JobType.psetName = 'pset_tutorial_analysis.py'

config.section_("Data")
config.Data.inputDataset = '/SingleMu/Run2012B-13Jul2012-v1/AOD'
config.Data.dbsUrl = 'global'
config.Data.splitting = 'LumiBased'
config.Data.unitsPerJob = 20 # 200
config.Data.lumiMask = 'https://cms-service-dqm.web.cern.ch/cms-service-dqm/CAF/certification/Collisions12/8TeV/Prompt/Cert_190456-208686_8TeV_PromptReco_Collisions12_JSON.txt'
#config.Data.lumiMask = 'Cert_190456-208686_8TeV_PromptReco_Collisions12_JSON.txt' # if you downloaded the file in the working directory
config.Data.runRange = '193093-193999' # '193093-194075'
config.Data.publication = True
config.Data.publishDbsUrl = 'phys03'
config.Data.publishDataName = 'CRAB3_tutorial_Data_analysis_test5'

config.section_("Site")
config.Site.storageSite = <site where the user has permission to write>

Note: We have left the parameter Site.storageSite unspecified on purpose, because it depends on where one has permissions to write.
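For reference, the lumi-mask file pointed to by Data.lumiMask is a plain JSON file mapping run numbers to lists of [first, last] luminosity-section ranges. An illustrative fragment (with made-up numbers, not taken from the actual certification file) looks like this:

{"193093": [[1, 33], [35, 100]], "193999": [[1, 50]]}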

3) CRAB configuration file to generate MC events

Finally, here is an example CRAB configuration file to run the pset_tutorial_MC_generation.py MC event generation code. We name it crabConfig_tutorial_MC_generation.py.

from WMCore.Configuration import Configuration
config = Configuration()

config.section_("General")
config.General.requestName = 'tutorial_MC_generation_test2'
config.General.workArea = 'crab_projects'

config.section_("JobType")
config.JobType.pluginName = 'PrivateMC'
config.JobType.psetName = 'pset_tutorial_MC_generation.py'

config.section_("Data")
config.Data.primaryDataset = 'MinBias'
config.Data.splitting = 'EventBased'
config.Data.unitsPerJob = 10
NJOBS = 10
config.Data.totalUnits = config.Data.unitsPerJob * NJOBS
config.Data.publishDbsUrl = 'phys03'
config.Data.publishDataName = 'CRAB3_tutorial_MC_generation_test2'

config.section_("Site")
config.Site.storageSite = <site where the user has permission to write>

Note: We have left the parameter Site.storageSite unspecified on purpose, because it depends on where one has permissions to write.

CRAB configuration parameters

The table below provides a list of all the available CRAB configuration parameters (organized by section), including a short description. Mandatory parameters are marked with two stars (**); other important parameters are marked with one star (*).

Parameter Type Description
Section General    
requestName (*) string A name the user gives to his/her request/task. In particular, it is used by CRAB to create a project directory (named crab_<requestName>) where files (logs, a copy of the CRAB configuration file, etc.) corresponding to this particular task will be stored. Defaults to <time-stamp>, where the time stamp is of the form <YYYYMMDD>_<hhmmss> and corresponds to the submission time.
workArea (*) string The area (full or relative path) in which to create the CRAB project directory. If the area doesn't exist, CRAB will try to create it using the mkdir command (without the -p option). Defaults to the current working directory.
Section JobType    
pluginName (**) string Specifies if this task is running an analysis ('Analysis') on an existing dataset, or is running MC event generation ('PrivateMC').
psetName (*) string The name of the CMSSW parameter-set configuration file that should be run via cmsRun. Defaults to 'pset.py'.
Section Data    
inputDataset (**) string When running an analysis over a dataset registered in DBS, this parameter specifies the name of the dataset. The dataset can be an official CMS dataset or a dataset produced by a user.
primaryDataset (**) string When running MC generation, this parameter specifies the primary dataset name that should be used in the LFN of the output files and for publication (see section Data handling in CRAB below).
dbsUrl (*) string The URL of the DBS reader instance where the input dataset is published. The URL is of the form 'https://cmsweb.cern.ch/dbs/prod/<instance>/DBSReader', where instance can be global, phys01, phys02, phys03 or caf. The default is the global instance. The aliases global, phys01, phys02, phys03 and caf01 in place of the whole URLs are also supported (and indeed recommended to avoid typos).
splitting (**) string Mode to use to split the task in jobs. When JobType.pluginName = 'Analysis', the splitting mode can either be 'FileBased' or 'LumiBased' (for Data, the recommended mode is 'LumiBased'). When JobType.pluginName = 'PrivateMC', the splitting mode can only be 'EventBased'.
unitsPerJob (**) integer Suggests (but does not impose) how many units (i.e. files, luminosity sections or events, depending on the splitting mode) to include in each job.
totalUnits (**) integer When JobType.pluginName = 'PrivateMC', this parameter tells how many events to generate in total. It is not possible to use this parameter when JobType.pluginName = 'Analysis'.
lumiMask (*) string A lumi-mask to apply to the input dataset before analysis. Can either be a URL address or the name of a JSON file on disk. Only possible to use when Data.splitting = 'LumiBased'. Defaults to an empty string (no lumi-section filter).
runRange (*) string The runs and/or run ranges to process (e.g. '193093-193999,198050,199564'). Runs in Data.lumiMask are filtered according to Data.runRange. Defaults to an empty string (no run filter).
publication (*) boolean Whether or not to publish the output ROOT files in DBS. Default is True. See also the note on "Publication" below. Currently it is not possible to set it to False.
publishDbsUrl (*) string The URL of the DBS writer instance where to publish. The URL is of the form 'https://cmsweb.cern.ch/dbs/prod/<instance>/DBSWriter', where instance can so far only be phys03, and therefore it is set as the default, so the user doesn't have to specify this parameter. The alias phys03 in place of the whole URL is also supported.
publishDataName (*) string A custom string used both in the LFN of the output files (even if Data.publication = False) and in the publication dataset name (if Data.publication = True) (see section Data handling in CRAB below).
Section Site    
storageSite (**) string Site where the output files should be permanently copied to. See the note on "Storage site" below.

Storage site
In CRAB3, by default the output ROOT files of a task are transferred first to a temporary storage element in the site where the job runs, and later from there to a permanent storage element in a destination site. The transfer to the permanent storage element is done asynchronously by a service called AsyncStageOut (ASO). The destination site must be specified in the Site.storageSite parameter in the form 'Tx_yy_zzzzz' (e.g. 'T2_IT_Bari', 'T2_US_Nebraska', etc.). The official names of CMS sites can be found in the SiteDB - Sites web page. The user MUST have write permission in the storage site.
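As an illustration of the table above, a skeleton CRAB configuration containing only the mandatory (**) parameters of an analysis task could look like the following sketch (the values are placeholders; in practice one will usually also set at least General.requestName and JobType.psetName):

from WMCore.Configuration import Configuration
config = Configuration()

config.section_("JobType")
config.JobType.pluginName = 'Analysis'

config.section_("Data")
config.Data.inputDataset = '/GenericTTbar/HC-CMSSW_5_3_1_START53_V5-v1/GEN-SIM-RECO'
config.Data.splitting = 'FileBased'
config.Data.unitsPerJob = 10

config.section_("Site")
config.Site.storageSite = 'T2_XX_Yyyy'  # placeholder: a site where the user has permission to write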

CRAB commands

In this section, we provide a list of the currently available CRAB commands with a short explanation of each. We will see how to use the commands as we go along in the tutorial.

Command Description
submit Submit a task.
status Report the states of jobs in a task (and more).
resubmit Resubmit the failed jobs in a task.
report Get a task final report with the number of analyzed files, events and luminosity sections.
kill Kill all jobs in a task.
getoutput Retrieve the output ROOT files from a task.
getlog Retrieve the log files from a task.
uploadlog Upload the crab log file to the CRAB cache on the server.
checkwrite Check write permission into a site.
purge Clean up the user's directories on the schedds and in the CRAB cache. Not implemented yet.

To run a CRAB command, one has to type:

crab <command>

One can also get a list of the available commands by invoking the crab help menu:

crab -h

The screen output is something similar to this:

Usage: crab [options] COMMAND [command-options] [args]

Options:
  --version    show program's version number and exit
  -h, --help   show this help message and exit
  -q, --quiet  don't print any messages to stdout
  -d, --debug  print extra messages to stdout

Valid commands are: 
  checkwrite (chk)
  getlog (log)
  getoutput (output) (out)
  kill
  report (rep)
  resubmit
  status (st)
  submit (sub)
  uploadlog (uplog)
To get single command help run:
  crab command --help|-h

For more information on how to run CRAB-3 please follow this link: 
 https://twiki.cern.ch/twiki/bin/view/CMS/RunningCRAB3

Individual commands also provide a help menu showing the options available for the command. To see the help menu for a specific command, just add the -h option:

crab <command> -h

The checkwrite command

The CRAB command checkwrite can be used by a user to check if he/she has write permission to /store/user/<HN-username>/ in a given site. Here is an example of how to check write permission in T2_ES_CIEMAT:

crab checkwrite --site=T2_ES_CIEMAT

This command tries to copy a dummy file into /store/user/<HN-username>/ in the specified site using the lcg-cp command. If lcg-cp succeeds, the user can, in principle (read the note below), write to the site. If the checkwrite command fails, it either means that the user is not allowed to write into the site, or simply that the lcg-cp command is not supported by the site (unfortunately CRAB doesn't yet provide other means of checking write permission). An example of the second case happens to me when I try to check T2_US_Nebraska, where I do have permission to write. The checkwrite command gives me the following output:

Attempting to write on site: T2_US_Nebraska 
Executing the command: lcg-cp --connect-timeout 180 /afs/cern.ch/work/a/atanasi/CRAB3-tutorial/CMSSW_7_0_5/src/crab3chkwrite.tmp srm://dcache07.unl.edu:8443/srm/v2/server?SFN=/mnt/hadoop/user/uscms01/pnfs/unl.edu/data4/cms/store/user/atanasi/crab3chkwrite.tmp
Please wait
Error : Error in lcg-cp 
Stdout: 

Stderr: 
[GFAL][get_se_types_and_endpoints][] [BDII][g1_sd_get_se_types_and_endpoints]: No available information
lcg_cp: Invalid argument

Error: Unable to write on site T2_US_Nebraska

Note: There are many sites that have not yet implemented a write-permission policy, but this doesn't mean that users are free to use storage space on those sites. In terms of the checkwrite command, a user may see that the command succeeds for a given site even if he/she never requested write permission. However, if the user starts to write into such a site, he/she may be banned. Thus, users should ALWAYS ask for write permission before attempting to write into a site.

Running CMSSW analysis with CRAB on MC

In this section, we show how to run an analysis on MC data. We use the CRAB configuration file crabConfig_tutorial_MC_analysis.py previously defined in section CRAB configuration file to run on MC and the CMSSW parameter-set configuration file defined in section CMSSW configuration file to process an existing dataset.

Task submission

To submit a task, execute the following CRAB command:

crab submit [-c <CRAB-configuration-file>]

where the specification of the CRAB configuration file is only necessary if it is different from ./crabConfig.py.

In our case, we run:

crab submit -c crabConfig_tutorial_MC_analysis.py

and should get an output similar to this:

Sending the request to the server
Your task has been delivered to the CRAB3 server.
Please use 'crab status' to check how the submission process proceed
Log file is /afs/cern.ch/work/a/atanasi/CRAB3-tutorial/CMSSW_7_0_5/src/crab_projects/crab_tutorial_MC_analysis_test1/crab.log

Job creation
Users familiar with the previous version of CRAB (i.e. CRAB2) might wonder "Shouldn't we create the jobs before submitting?". The answer is NO. In CRAB3 the creation of jobs is done by the CRAB server when the CRAB client submits the task request.

Sanity check
CRAB performs a sanity check (e.g. it checks the availability of the selected dataset, the correctness of input parameters, etc.) before even creating the jobs. In case of detecting an error, the submission will not proceed and the user should get a self-explanatory message.

Task name
CRAB defines the name of the task at submission time using the following information: submission date and time, the name of the scheduler machine that received the task request, the username and the request name specified in the CRAB configuration file. The task name has the form <YYMMDD>_<hhmmss>_<schedd>:<username>_crab_<request-name>.

CRAB project directory

The submission command creates a CRAB project directory for the corresponding task, where the CRAB and CMSSW configurations are cached for later usage, avoiding interference with other projects. The CRAB project directory is named crab_<request-name>, where request-name is as specified by the parameter General.requestName in the CRAB configuration file. The CRAB project directory is created inside the directory specified by the parameter General.workArea (which defaults to the current working directory). Thus, using the parameter General.requestName, the user can choose the project name, so that it can later be distinguished from other CRAB projects in the same working area.

The CRAB project directory contains:

  • A crab.log file, containing log information from the CRAB commands that were executed using this project directory.
  • A .requestcache file with cached information of the task request and CRAB configuration.
  • A directory called inputs, containing:
    • A python file named CMSSW_cfg.py with the process object (in pickled string format) from the CMSSW code.
    • A zipped tarball of the input sandbox with the user's code, consisting of:
      • The CMSSW_cfg.py file.
      • Any additional private files as specified in the CRAB configuration parameter JobType.inputFiles.
      • If they exist, the directories lib, module and src/data from the CMSSW area.
  • A directory called results, where the files retrieved via crab getlog, crab getoutput and crab report are put.

Note: The CRAB project directory is created even if the submission fails. Indeed, if something goes wrong with the submission, or if the submission is simply interrupted, and the user wishes to execute the submission command again without changing the request name, then he/she should first remove the project directory created by the failed submission. The crab submit command will not overwrite a project directory that already exists; instead it will abort the submission and print a message like this:

Working area '<CRAB-project-directory>' already exists 
Please change the requestName in the config file

Note: Many times the terminology "task name" is also used to refer to the project directory name. In this twiki, we tried to make sure the two terminologies are not mixed.

Specifying the task in CRAB commands

Most of the CRAB commands (all except submit and checkwrite) refer to a task and therefore require the corresponding task name as an input. To pass the task name, one actually runs the command giving the CRAB project directory name as the argument of the --task/-t option. Relative and full paths can be used. CRAB will extract the task name from the .requestcache file present in the CRAB project directory. Thus, a CRAB command that requires a task name is always run by adding the CRAB project directory name:

crab <command> -t <CRAB-project-directory>

Task status

To check the status of a task, execute the following CRAB command:

crab status -t <CRAB-project-directory>

In our case, we run:

crab status -t crab_projects/crab_tutorial_MC_analysis_test1

The crab status command will produce an output containing the task name, the status of the task as a whole, the details of how many jobs are in which state (submitted, running, transferring, finished, cooloff, etc.) and the location of the CRAB log (crab.log) file. It will also print the URLs of two web pages that one can use to monitor the jobs. In summary, it should look something like this:

Task name:                      140610_133415_crab3test-3:atanasi_crab_tutorial_MC_analysis_test1
Task status:                    SUBMITTED
Glidemon monitoring URL:        http://glidemon.web.cern.ch/glidemon/jobs.php?taskname=140610_133415_crab3test-3%3Aatanasi_crab_tutorial_MC_analysis_test1
Dashboard monitoring URL:       http://dashb-cms-job.cern.ch/dashboard/templates/task-analysis/#user=atanasi&refresh=0&table=Jobs&p=1&records=25&activemenu=2&status=&site=&tid=140610_133415_crab3test-3%3Aatanasi_crab_tutorial_MC_analysis_test1
Details:                        cooloff         5.6% ( 1/18)
                                idle           16.7% ( 3/18)
                                running        55.6% (10/18)
                                transferring   22.2% ( 4/18)

Publication status:             idle           22.2% ( 4/18)
                                unsubmitted    77.8% (14/18)
Output datasets:                /GenericTTbar/atanasi-CRAB3_tutorial_MC_analysis_test1-37773c17ce2994cf16892d5f04945e41/USER
Output dataset url:             https://cmsweb.cern.ch/das/request?input=%2FGenericTTbar%2Fatanasi-CRAB3_tutorial_MC_analysis_test1-37773c17ce2994cf16892d5f04945e41%2FUSER&instance=prod%2Fphys03
Log file is /afs/cern.ch/work/a/atanasi/CRAB3-tutorial/CMSSW_7_0_5/src/crab_projects/crab_tutorial_MC_analysis_test1/crab.log

One can also get a more detailed status report (showing the state of each job, the job number, the site where the job ran, etc.) by adding the -l (long) option to the crab status command. For our task, we run:

crab status -t crab_projects/crab_tutorial_MC_analysis_test1 -l

and we get:

Task name:                      140610_133415_crab3test-3:atanasi_crab_tutorial_MC_analysis_test1
Task status:                    SUBMITTED
Glidemon monitoring URL:        http://glidemon.web.cern.ch/glidemon/jobs.php?taskname=140610_133415_crab3test-3%3Aatanasi_crab_tutorial_MC_analysis_test1
Dashboard monitoring URL:       http://dashb-cms-job.cern.ch/dashboard/templates/task-analysis/#user=atanasi&refresh=0&table=Jobs&p=1&records=25&activemenu=2&status=&site=&tid=140610_133415_crab3test-3%3Aatanasi_crab_tutorial_MC_analysis_test1
Details:                        running        77.8% (14/18)
                                transferring   22.2% ( 4/18)

Publication status:             idle           22.2% ( 4/18)
                                unsubmitted    77.8% (14/18)
Output datasets:                /GenericTTbar/atanasi-CRAB3_tutorial_MC_analysis_test1-37773c17ce2994cf16892d5f04945e41/USER
Output dataset url:             https://cmsweb.cern.ch/das/request?input=%2FGenericTTbar%2Fatanasi-CRAB3_tutorial_MC_analysis_test1-37773c17ce2994cf16892d5f04945e41%2FUSER&instance=prod%2Fphys03

Extended Job Status Table

 Job State        Most Recent Site        Runtime   Mem (MB)      CPU %    Retries      Waste
   1 transferring T2_US_Florida           0:03:13         42         32          0    0:00:00
   2 transferring T2_US_Florida           0:04:01          2         37          0    0:00:00
   3 transferring T2_BE_UCL               0:05:11        450         25          0    0:00:00
   4 transferring T2_BE_UCL               0:05:03         42         27          0    0:00:00
   5 running      T2_US_Wisconsin         0:10:10        496         14          0    0:00:00
   6 running      T2_US_Wisconsin         0:10:11        487         24          0    0:00:00
   7 running      T2_US_Wisconsin         0:10:27        446         11          0    0:00:00
   8 running      T2_US_Wisconsin         0:05:17        493         12          0    0:00:00
   9 running      T2_US_Wisconsin         0:09:28        450          5          0    0:00:00
  10 running      T2_US_Wisconsin         0:10:11        475         15          0    0:00:00
  11 running      T2_US_Wisconsin         0:10:17        487         19          0    0:00:00
  12 running      T2_US_Wisconsin         0:05:10        499         17          0    0:00:00
  13 running      T2_US_Wisconsin         0:10:10        472         20          0    0:00:00
  14 running      T2_US_Wisconsin         0:10:11        430         12          0    0:00:00
  15 running      T2_US_Wisconsin         0:10:11        437         15          0    0:00:00
  16 running      T2_US_Wisconsin         0:05:20        441         30          0    0:00:00
  17 running      T2_US_Wisconsin         0:10:11        481         14          0    0:00:00
  18 running      T2_US_Wisconsin         0:10:10        464         15          0    0:00:00

Summary:
 * Memory: 42MB min, 499MB max, 394MB ave
 * Runtime: 0:03:13 min, 0:10:27 max, 0:08:02 ave
 * CPU eff: 5% min, 37% max, 17% ave
 * Waste: 0:00:00 (0% of total)

Log file is /afs/cern.ch/work/a/atanasi/CRAB3-tutorial/CMSSW_7_0_5/src/crab_projects/crab_tutorial_MC_analysis_test1/crab.log

Note: The task has 18 jobs, as expected, since the input dataset has 177 files and we have set the splitting mode in the CRAB configuration file to 'FileBased' with 10 units (in this case, files) per job.

Job states
The job state idle means that the job has been submitted but is not yet running. The job state cooloff means that the server has not yet submitted the job for the first time, or that the job is waiting for automatic resubmission after a recoverable error. For a complete list and explanation of task and job states, please refer to Task and Node States in CRAB3-HTCondor.

Note: If a job fails and the failure reason is considered "recoverable", CRAB will automatically resubmit the job. Thus, it may well happen that one first sees that all jobs are either in running or transferring state, and later one sees jobs in idle or cooloff state. These are jobs that have been automatically resubmitted. The number of automatic resubmissions is shown under the column "Retries".

Once all jobs are done, it may happen that some jobs have actually failed:

Task name:                      140610_133415_crab3test-3:atanasi_crab_tutorial_MC_analysis_test1
Task status:                    FAILED
Glidemon monitoring URL:        http://glidemon.web.cern.ch/glidemon/jobs.php?taskname=140610_133415_crab3test-3%3Aatanasi_crab_tutorial_MC_analysis_test1
Dashboard monitoring URL:       http://dashb-cms-job.cern.ch/dashboard/templates/task-analysis/#user=atanasi&refresh=0&table=Jobs&p=1&records=25&activemenu=2&status=&site=&tid=140610_133415_crab3test-3%3Aatanasi_crab_tutorial_MC_analysis_test1
Details:                        failed          5.6% ( 1/18)
                                finished       94.4% (17/18)

Publication status:             idle           83.3% (15/18)
                                finished       11.1% ( 2/18)
                                unsubmitted     5.6% ( 1/18)
Output datasets:                /GenericTTbar/atanasi-CRAB3_tutorial_MC_analysis_test1-37773c17ce2994cf16892d5f04945e41/USER
Output dataset url:             https://cmsweb.cern.ch/das/request?input=%2FGenericTTbar%2Fatanasi-CRAB3_tutorial_MC_analysis_test1-37773c17ce2994cf16892d5f04945e41%2FUSER&instance=prod%2Fphys03
Log file is /afs/cern.ch/work/a/atanasi/CRAB3-tutorial/CMSSW_7_0_5/src/crab_projects/crab_tutorial_MC_analysis_test1/crab.log

CRAB allows the user to manually resubmit a task (see Task resubmission), which will actually resubmit only the failed jobs in the task.

Eventually, all jobs will succeed and the crab status output should be something like this:

Task name:                      140610_133415_crab3test-3:atanasi_crab_tutorial_MC_analysis_test1
Task status:                    COMPLETED
Glidemon monitoring URL:        http://glidemon.web.cern.ch/glidemon/jobs.php?taskname=140610_133415_crab3test-3%3Aatanasi_crab_tutorial_MC_analysis_test1
Dashboard monitoring URL:       http://dashb-cms-job.cern.ch/dashboard/templates/task-analysis/#user=atanasi&refresh=0&table=Jobs&p=1&records=25&activemenu=2&status=&site=&tid=140610_133415_crab3test-3%3Aatanasi_crab_tutorial_MC_analysis_test1
Details:                        finished      100.0% (18/18)

Publication status:             idle           88.9% (16/18)
                                finished       11.1% ( 2/18)
Output datasets:                /GenericTTbar/atanasi-CRAB3_tutorial_MC_analysis_test1-37773c17ce2994cf16892d5f04945e41/USER
Output dataset url:             https://cmsweb.cern.ch/das/request?input=%2FGenericTTbar%2Fatanasi-CRAB3_tutorial_MC_analysis_test1-37773c17ce2994cf16892d5f04945e41%2FUSER&instance=prod%2Fphys03
Log file is /afs/cern.ch/work/a/atanasi/CRAB3-tutorial/CMSSW_7_0_5/src/crab_projects/crab_tutorial_MC_analysis_test1/crab.log

Task monitoring

There are two independent web services that one can use to monitor CRAB jobs:

  • Dashboard: This service is the one officially maintained by CMS. The job states as reported by this service are not necessarily consistent with the report provided by the crab status command. This is because Dashboard doesn't poll for information, but relies on services sending information to it, and this information might be sent only at certain stages or could even sometimes be lost in transmission over the network.
  • GlideMon: This service is not yet officially maintained by CMS, but works as well as Dashboard. Contrary to Dashboard, GlideMon polls information from other services, so in principle it might update faster than Dashboard and be more consistent with the report provided by the crab status command.

Note: Given the differences in the way Dashboard and GlideMon gather information, one should not expect the two monitoring services to report exactly the same thing at any given moment. Of course, once a task has finished, the reports should be essentially the same (apart from differences in terminology).

Note: The crab status screen output will display the links to the monitoring pages for the task in question, even if the task is still unknown to the monitoring services. The user will get a corresponding error when trying to access the links; he/she should just wait a bit until the task becomes known to the services.

Task resubmission

CRAB allows the user to resubmit a task, which will actually resubmit only the failed jobs in the task. The resubmission command is as follows:

crab resubmit [-t] <CRAB-project-directory>

Using the option -i one can specify a selected list of jobs or ranges of jobs (using the format <jobidA>,<jobidB>-<jobidC>,<jobidD>-<jobidE>,<jobidF>,etc):

crab resubmit <CRAB-project-directory> [-i <comma-separated-list-of-jobs-and/or-job-ranges>]

In the list of jobs, one provides the job number as specified by the crab status -l command under the column "Job".
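
For example, to resubmit only jobs 1, 3 and 7 through 9 of our MC analysis task (the job numbers here are purely illustrative), one would run:

crab resubmit crab_projects/crab_tutorial_MC_analysis_test1 -i 1,3,7-9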

After resubmission, one should check the status of the task again. For a big task, one should expect to have to run a few manual resubmissions before all jobs finish successfully.

Running CMSSW analysis with CRAB on Data

In this section, we do a similar exercise as in Running CMSSW analysis with CRAB on MC, but now running on real data. We make use of the CRAB configuration file crabConfig_tutorial_Data_analysis.py defined in section CRAB configuration file to run on Data and the CMSSW parameter-set configuration file defined in section CMSSW configuration file to slim an existing dataset. When running on real data, one will most likely want to use a lumi-mask file to select only the good-quality runs and luminosity sections in the input dataset. We show how to do this with CRAB.

Using a lumi-mask

When a lumi-mask is specified in the CRAB configuration file, CRAB will run the analysis only on the specified subset of runs and luminosity sections.

Note: Remember that using a lumi-mask is only possible if the task splitting mode is set to be based on luminosity sections, i.e. if Data.splitting = 'LumiBased'.

In this tutorial, we use the lumi-mask file Cert_190456-208686_8TeV_PromptReco_Collisions12_JSON.txt, available at the following page from the CMS Data Quality web service: https://cms-service-dqm.web.cern.ch/cms-service-dqm/CAF/certification/Collisions12/8TeV/Prompt/. This lumi-mask file contains good luminosity sections for all CMS runs (between 190645 and 208686) of the 8 TeV LHC run:

{"190645": [[10, 110]], "190704": [[1, 3]], "190705": [[1, 5], [7, 65], [81, 336], [338, 350], [353, 383]],
...
"208686": [[73, 79], [82, 181], [183, 224], [227, 243], [246, 311], [313, 463]]}

To use a lumi-mask, one has to set the CRAB configuration parameter Data.lumiMask either to the URL from which the lumi-mask file can be loaded, or to the location on disk (full or relative path) of a locally downloaded copy of the file. To download the file, one can use the wget command:

wget --no-check-certificate https://cms-service-dqm.web.cern.ch/cms-service-dqm/CAF/certification/Collisions12/8TeV/Prompt/Cert_190456-208686_8TeV_PromptReco_Collisions12_JSON.txt

We decided to provide the URL, so we have this line in the CRAB configuration file:

config.Data.lumiMask = 'https://cms-service-dqm.web.cern.ch/cms-service-dqm/CAF/certification/Collisions12/8TeV/Prompt/Cert_190456-208686_8TeV_PromptReco_Collisions12_JSON.txt'

Had we decided to download the file and use the local copy instead, we would have set in the CRAB configuration file the path (absolute or relative) to where the file is located:

config.Data.lumiMask = '<path-to-lumi-mask-file>/Cert_190456-208686_8TeV_PromptReco_Collisions12_JSON.txt'

Selecting a run range

A user may be interested in running only over specific runs of the dataset, even within a lumi-mask. This is useful, for example, to run a small task to try things out before submitting the full task. Here we select a short run range, because for the purposes of this tutorial we want to avoid big tasks. The easiest way to select a run range is to use the CRAB configuration parameter Data.runRange. We choose the run range 193093-193999, so we put in our CRAB configuration file:

config.Data.runRange = '193093-193999'

Another way is to manipulate the lumi-mask file directly to create a new one containing only the runs of interest, as sketched below.
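
As an illustration (this is not part of the CRAB tools), a minimal Python snippet that creates such a reduced lumi-mask file from the downloaded certification file could look like the following; the input file name and the run range are the ones used in this tutorial, while the output file name is arbitrary:

import json

run_min, run_max = 193093, 193999

# Read the full certification (lumi-mask) file downloaded with wget.
with open('Cert_190456-208686_8TeV_PromptReco_Collisions12_JSON.txt') as f:
    full_mask = json.load(f)

# Keep only the runs inside the chosen run range (the dictionary keys are run numbers as strings).
filtered_mask = {}
for run, lumi_ranges in full_mask.items():
    if run_min <= int(run) <= run_max:
        filtered_mask[run] = lumi_ranges

# Write the reduced lumi-mask; Data.lumiMask in the CRAB configuration can then point to this file.
with open('my_lumi_mask_193093-193999.json', 'w') as f:
    json.dump(filtered_mask, f)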

Task submission

We submit our task as usual:

crab submit -c crabConfig_tutorial_Data_analysis.py

Sending the request to the server
Your task has been delivered to the CRAB3 server.
Please use 'crab status' to check how the submission process proceed
Log file is /afs/cern.ch/work/a/atanasi/CRAB3-tutorial/CMSSW_7_0_5/src/crab_projects/crab_tutorial_Data_analysis_test5/crab.log

Task status

After submission, we check the task status to see how many jobs are in the task and if nothing has unexpectedly failed:

crab status crab_projects/crab_tutorial_Data_analysis_test5

Task name:                      140611_151859_crab3test-3:atanasi_crab_tutorial_Data_analysis_test5
Task status:                    SUBMITTED
Glidemon monitoring URL:        http://glidemon.web.cern.ch/glidemon/jobs.php?taskname=140611_151859_crab3test-3%3Aatanasi_crab_tutorial_Data_analysis_test5
Dashboard monitoring URL:       http://dashb-cms-job.cern.ch/dashboard/templates/task-analysis/#user=atanasi&refresh=0&table=Jobs&p=1&records=25&activemenu=2&status=&site=&tid=140611_151859_crab3test-3%3Aatanasi_crab_tutorial_Data_analysis_test5
Details:                        idle           47.1% ( 8/17)
                                running        52.9% ( 9/17)

No publication information available yet
Log file is /afs/cern.ch/work/a/atanasi/CRAB3-tutorial/CMSSW_7_0_5/src/crab_projects/crab_tutorial_Data_analysis_test5/crab.log

Running CRAB to generate MC events

Let us finally briefly mention the case in which the user wants to generate MC events from scratch as opposed to analyze an existing dataset. An example CMSSW parameter-set configuration file is given in section CMSSW configuration file to generate MC events, and a corresponding CRAB configuration file is given in section CRAB configuration file to generate MC events. The first important parameter change to notice in the CRAB configuration file is:

config.JobType.pluginName = 'PrivateMC'

This instructs CRAB to perform some additional configuration-parameter validations on top of those done when running an analysis.

The next parameter change is:

config.Data.primaryDataset = 'MinBias'

When running MC generation, there is obviously no input dataset involved, so we do not specify Data.inputDataset. But we still need to specify in the parameter Data.primaryDataset the primary dataset name of the sample we are generating. CRAB uses the primary dataset name as one layer of the directory tree where the output dataset is stored and in the publication dataset name (see the section Data handling in CRAB below). In principle, one could set this parameter to anything, but it is better to use the primary dataset name that CMS uses for the type of events being generated (MinBias is the primary dataset name used by CMS for datasets containing minimum bias events).

Finally, the only job splitting mode accepted when generating MC events is 'EventBased'. As opposed to the case of an analysis, one now can (actually must) specify the total number of events to generate; with the values below, CRAB will create 100/10 = 10 jobs of 10 events each:

config.Data.splitting = 'EventBased'
config.Data.unitsPerJob = 10
config.Data.totalUnits = 100

Task submission

We submit our task as usual:

crab submit -c crabConfig_tutorial_MC_generation.py

Sending the request to the server
Your task has been delivered to the CRAB3 server.
Please use 'crab status' to check how the submission process proceed
Log file is /afs/cern.ch/work/a/atanasi/CRAB3-tutorial/CMSSW_7_0_5/src/crab_projects/crab_tutorial_MC_generation_test2/crab.log

Task status

After submission, we check the status of the task to see if the jobs are starting to run:

crab status crab_projects/crab_tutorial_MC_generation_test2

Task name:                      140612_093900_crab3test-3:atanasi_crab_tutorial_MC_generation_test2
Task status:                    QUEUED

No publication information available yet

No jobs created yet!
Log file is /afs/cern.ch/work/a/atanasi/CRAB3-tutorial/CMSSW_7_0_5/src/crab_projects/crab_tutorial_MC_generation_test2/crab.log

The task is still queued somewhere in the CRAB3 system, meaning that the Task Worker has not yet submitted it to HTCondor. The task should not stay queued for long, so let's check the status again:

Task name:                      140612_093900_crab3test-3:atanasi_crab_tutorial_MC_generation_test2
Task status:                    SUBMITTED
Glidemon monitoring URL:        http://glidemon.web.cern.ch/glidemon/jobs.php?taskname=140612_093900_crab3test-3%3Aatanasi_crab_tutorial_MC_generation_test2
Dashboard monitoring URL:       http://dashb-cms-job.cern.ch/dashboard/templates/task-analysis/#user=atanasi&refresh=0&table=Jobs&p=1&records=25&activemenu=2&status=&site=&tid=140612_093900_crab3test-3%3Aatanasi_crab_tutorial_MC_generation_test2
Details:                        idle           50.0% ( 5/10)
                                unsubmitted    50.0% ( 5/10)

No publication information available yet
Log file is /afs/cern.ch/work/a/atanasi/CRAB3-tutorial/CMSSW_7_0_5/src/crab_projects/crab_tutorial_MC_generation_test2/crab.log

Since these are short jobs, after roughly 15 minutes one should already see some outputs being transferred to the permanent storage element:

Task name:                      140612_093900_crab3test-3:atanasi_crab_tutorial_MC_generation_test2
Task status:                    SUBMITTED
Glidemon monitoring URL:        http://glidemon.web.cern.ch/glidemon/jobs.php?taskname=140612_093900_crab3test-3%3Aatanasi_crab_tutorial_MC_generation_test2
Dashboard monitoring URL:       http://dashb-cms-job.cern.ch/dashboard/templates/task-analysis/#user=atanasi&refresh=0&table=Jobs&p=1&records=25&activemenu=2&status=&site=&tid=140612_093900_crab3test-3%3Aatanasi_crab_tutorial_MC_generation_test2
Details:                        finished       20.0% ( 2/10)
                                running        60.0% ( 6/10)
                                transferring   20.0% ( 2/10)

Publication status:             idle           30.0% (3/10)
                                finished       10.0% (1/10)
                                unsubmitted    60.0% (6/10)
Log file is /afs/cern.ch/work/a/atanasi/CRAB3-tutorial/CMSSW_7_0_5/src/crab_projects/crab_tutorial_MC_generation_test2/crab.log

Checking once more after all jobs have finished, we see that one of them actually failed:

Task name:                      140612_093900_crab3test-3:atanasi_crab_tutorial_MC_generation_test2
Task status:                    FAILED
Glidemon monitoring URL:        http://glidemon.web.cern.ch/glidemon/jobs.php?taskname=140612_093900_crab3test-3%3Aatanasi_crab_tutorial_MC_generation_test2
Dashboard monitoring URL:       http://dashb-cms-job.cern.ch/dashboard/templates/task-analysis/#user=atanasi&refresh=0&table=Jobs&p=1&records=25&activemenu=2&status=&site=&tid=140612_093900_crab3test-3%3Aatanasi_crab_tutorial_MC_generation_test2
Details:                        failed         10.0% ( 1/10)
                                finished       90.0% ( 9/10)

Publication status:             idle           80.0% (8/10)
                                finished       10.0% (1/10)
                                unsubmitted    10.0% (1/10)
Log file is /afs/cern.ch/work/a/atanasi/CRAB3-tutorial/CMSSW_7_0_5/src/crab_projects/crab_tutorial_MC_generation_test2/crab.log

Good, we got one failed job! This is a good occasion to show the crab resubmit command in action.

Task resubmission

In the previous task status output we saw that we got one failed job. To resubmit all failed jobs we execute:

crab resubmit crab_projects/crab_tutorial_MC_generation_test2

Resubmit request successfully sent
Log file is /afs/cern.ch/work/a/atanasi/CRAB3-tutorial/CMSSW_7_0_5/src/crab_projects/crab_tutorial_MC_generation_test2/crab.log

Checking the status again after some time it will hopefully show that all jobs have now finished successfully:

Task name:                      140612_093900_crab3test-3:atanasi_crab_tutorial_MC_generation_test2
Task status:                    COMPLETED
Glidemon monitoring URL:        http://glidemon.web.cern.ch/glidemon/jobs.php?taskname=140612_093900_crab3test-3%3Aatanasi_crab_tutorial_MC_generation_test2
Dashboard monitoring URL:       http://dashb-cms-job.cern.ch/dashboard/templates/task-analysis/#user=atanasi&refresh=0&table=Jobs&p=1&records=25&activemenu=2&status=&site=&tid=140612_093900_crab3test-3%3Aatanasi_crab_tutorial_MC_generation_test2
Details:                        finished      100.0% (10/10)

Publication status:             idle            0.0% ( 0/10)
                                finished      100.0% (10/10)
Log file is /afs/cern.ch/work/a/atanasi/CRAB3-tutorial/CMSSW_7_0_5/src/crab_projects/crab_tutorial_MC_generation_test2/crab.log

More CRAB configuration parameters

Parameter Type Description
Section General    
transferOutput (*) boolean Whether to transfer the output to the storage site or leave it at the runtime site. (Not transferring the output might be useful, for example, to avoid filling up the storage area with useless files when the user is just doing some tests.) See also the note on "Publication" below. Defaults to True. Currently it cannot be set to False.
saveLogs (*) boolean Whether or not to copy the cmsRun stdout / stderr to the storage site. If set to False, the last 1 MB of each job's log is still available through the monitoring in the job log files, and the full logs can be retrieved from the runtime site with crab getlog. Defaults to False. Currently it cannot be set to True.
Section JobType    
pyCfgParams list of strings List of parameters to pass to the CMSSW parameter-set configuration file, as explained here. For example, if set to ['myOption','param1=value1','param2=value2'], then the jobs will execute cmsRun JobType.psetName myOption param1=value1 param2=value2.
inputFiles list of strings List of private input files needed by the jobs.
outputFiles list of strings List of output files that need to be collected, besides those already specified in the output modules or TFileService of the CMSSW parameter-set configuration file.
allowNonProductionCMSSW boolean Set to True to allow using a CMSSW release that is not a production release. Defaults to False.
maxmemory integer Maximum amount of memory (in MB) a job is allowed to use.
maxjobruntime integer The maximum runtime per job, in hours. Jobs running longer than this amount of time will be removed.
numcores integer Number of requested cores per job. Defaults to 1.
priority integer Task priority among the user's own tasks. Higher priority tasks will be processed before lower priority. Two tasks of equal priority will have their jobs start in an undefined order. The first five jobs in a task are given a priority boost of 10. Defaults to 10.
externalPluginFile string Name of a plug-in provided by the user to be run instead of the standard CRAB plug-ins Analysis or PrivateMC. Cannot be specified together with pluginName; it is one or the other. Not supported yet.
Section Data    
outlfn (*) string The first part of the LFN of the output files. See section CRAB output files naming convention below. Accepted values are: /store/user/<HN-username>/[<subdirA/subdirB/etc>] (the trailing / after the username can not be omitted if a subdir is not given) and /store/group/<group-name>[/<subdirA/subdirB/etc>]. Defaults to /store/user/<HN-username>/.
ignoreLocality boolean Set to True to allow jobs to run at any site, regardless of whether the dataset is located at that site or not. Remote file access is done using Xrootd. The parameters Site.whitelist and Site.blacklist are still respected. This parameter is useful to allow jobs to run on other sites when for example a dataset is available on only one or a few sites which are very busy with jobs. Defaults to False.
userInputFile string When running an analysis over private input files, one should create a text file with the names (giving the full path) of the input files, and specify in this parameter the name of the text file. Not supported yet.
Section Site    
whitelist list of strings A user-specified list of sites where the jobs can run. For example: ['T2_CH_CERN','T2_IT_Bari',...]. Jobs will not be assigned to a site that is not in the white list.
blacklist list of strings A user-specified list of sites where the jobs should not run. Useful to avoid jobs to run on a site where the user knows they will fail (e.g. because of temporary problems with the site).
Section User    
voGroup string The VO group that should be used with the proxy and under which the task should be submitted.
voRole string The VO role that should be used with the proxy and under which the task should be submitted.

Data handling in CRAB

All successful jobs in a task produce output ROOT files, which are eventually copied (staged out) to the CMS site permanent storage element specified in the Site.storageSite parameter of the CRAB configuration file. The jobs (and the task itself) also produce log files, which are by default not copied to the permanent storage element, but kept in the temporary storage element of the running site. The user can force the stage-out of the log files to the permanent storage element by setting General.saveLogs = True in the CRAB configuration file. In either case, log files are easily accessible via the monitoring web pages. The user can also disable the stage-out of the output ROOT files by means of the General.transferOutput parameter, but in that case publication will not be possible. If the output ROOT files were successfully transferred to the permanent storage element, CRAB will automatically (and by default) publish the output dataset in DBS. This is done by the same service (ASO) that does the stage-out. If the user wants to disable the publication, he/she can set Data.publication = False in the CRAB configuration file.
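
For illustration, the CRAB configuration lines corresponding to the default behaviour described above would be (these are the same parameters mentioned in this section and listed in the parameter table above):

config.General.transferOutput = True   # stage out the output ROOT files to the permanent storage element (default)
config.General.saveLogs = False        # do not stage out the cmsRun log files (default)
config.Data.publication = True         # publish the output dataset in DBS (default)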

In the subsections below, the convention followed for the naming of the output ROOT files and for publication are explained. For a more complete explanation on how data are handled in CRAB3, see Data Handling in CRAB3.

CRAB output files naming convention

When using CRAB, the LFN of the permanently stored output ROOT files are of the following form:

  • For files stored in the user's storage space:

/store/user/<HN-username>[/<subdir-tree>]/<primary-dataset>/<publication-name>/<time-stamp>/<counter>/<file-name>

For example: /store/user/atanasi/GenericTTbar/CRAB3-tutorial/140503_173849/0000/output_1.root.

  • For files stored in a group's storage space:

/store/group/<group-name>[/<subdir-tree>]/<primary-dataset>/<publication-name>/<time-stamp>/<counter>/<file-name>

Both LFNs can be summarized in the following general form:

<lfn-prefix>/<primary-dataset>/<publication-name>/<time-stamp>/<counter>/<file-name>

The naming of the temporary files is the same, except that /store/user/<HN-username> is replaced by /store/temp/user/<HN-username>.<some-hash> and /store/group by /store/temp/group, respectively. The table below explains each of the fields appearing in this generic LFN form, and a short illustrative sketch is given after the table.

Field Description
HN-username The CMS Hypernews name of the user who submitted the task.
group-name The name of a CMS e-group to which the user who submitted the task is subscribed and has write permission.
lfn-prefix The first part of the LFN, as specified by the user in the Data.outlfn parameter in the CRAB configuration file.
primary-dataset The primary dataset name as specified in the Data.primaryDataset parameter in the CRAB configuration file (when running MC-generation) or as extracted from the Data.inputDataset parameter (when running an analysis). For example, the primary dataset name of /GenericTTbar/HC-CMSSW_5_3_1_START53_V5-v1/GEN-SIM-RECO is GenericTTbar.
publication-name The user-specified portion of the publication name of the dataset, as specified in the Data.publishDataName parameter in the CRAB configuration file. Defaults to <time-stamp>_crab_<General.requestName>. This name is used for the output files even if Data.publication = False.
time-stamp A timestamp, based on when the task was submitted. A task submitted at 17:38:49 on 27 April 2014 would result in a timestamp of 140427_173849. The timestamp is used to prevent multiple otherwise-identical user tasks from overwriting each others' files.
counter A four-digit counter, used to prevent more than 1000 output files residing in the same directory. Outputs 1-1000 are kept in directory 0000; outputs 1001-2000 are kept in directory 0001; etc.
file-name The filename specified in the user's CMSSW parameter-set configuration file, with the job counter added in. If the parameter-set specifies an output file named output.root, the output file from job N will be output_N.root.
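
To make the naming convention more concrete, here is a purely illustrative Python sketch that assembles an LFN from the fields above and applies the counter rule; all field values below are hypothetical examples, not something extracted from a real task:

# Hypothetical field values, following the general LFN form above.
lfn_prefix = '/store/user/atanasi'                     # Data.outlfn
primary_dataset = 'GenericTTbar'                       # from Data.inputDataset or Data.primaryDataset
publication_name = 'CRAB3_tutorial_MC_analysis_test1'  # Data.publishDataName
time_stamp = '140610_133415'                           # set at task submission time
job_number = 1001

counter = '%04d' % ((job_number - 1) // 1000)          # jobs 1-1000 -> '0000', 1001-2000 -> '0001', etc.
file_name = 'output_%d.root' % job_number              # 'output.root' from the pset, with the job number added

lfn = '/'.join([lfn_prefix, primary_dataset, publication_name, time_stamp, counter, file_name])
print(lfn)
# /store/user/atanasi/GenericTTbar/CRAB3_tutorial_MC_analysis_test1/140610_133415/0001/output_1001.root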

DBS publication naming convention

When publishing a user-produced dataset, the dataset name in DBS is of the following form:

/<primary-dataset>/<HN-username>-<publication-name>-<pset-hash>/USER

The fields primary-dataset, HN-username and publication-name are the same as used for the naming of the output files (see section CRAB output files naming convention above). The remaining field, pset-hash, is a hash produced from the CMSSW code used by the cmsRun job. The hash guarantees that every different CMSSW code has a distinct output dataset name in DBS, even if publication-name is not changed. It also allows keeping the same publication-name when re-doing a dataset after a modification in the CMSSW code (e.g. a modification to fix a bug in the previously produced dataset). Notice also that, by keeping the same publication-name, a user can add new files to an already published dataset by running a new task, as long as the CMSSW code used to produce the new files is the same as the one used to produce the original files. This is the case in which a user wants to extend a dataset as data become available over the course of an LHC run.

Running CMSSW analysis with CRAB on MC (continuation)

Task status

Hopefully now all our jobs have finished:

crab status -t crab_projects/crab_tutorial_MC_analysis_test1

Task name:                      140610_133415_crab3test-3:atanasi_crab_tutorial_MC_analysis_test1
Task status:                    COMPLETED
Glidemon monitoring URL:        http://glidemon.web.cern.ch/glidemon/jobs.php?taskname=140610_133415_crab3test-3%3Aatanasi_crab_tutorial_MC_analysis_test1
Dashboard monitoring URL:       http://dashb-cms-job.cern.ch/dashboard/templates/task-analysis/#user=atanasi&refresh=0&table=Jobs&p=1&records=25&activemenu=2&status=&site=&tid=140610_133415_crab3test-3%3Aatanasi_crab_tutorial_MC_analysis_test1
Details:                        finished      100.0% (18/18)

Publication status:             idle           88.9% (16/18)
                                finished       11.1% ( 2/18)
Output datasets:                /GenericTTbar/atanasi-CRAB3_tutorial_MC_analysis_test1-37773c17ce2994cf16892d5f04945e41/USER
Output dataset url:             https://cmsweb.cern.ch/das/request?input=%2FGenericTTbar%2Fatanasi-CRAB3_tutorial_MC_analysis_test1-37773c17ce2994cf16892d5f04945e41%2FUSER&instance=prod%2Fphys03
Log file is /afs/cern.ch/work/a/atanasi/CRAB3-tutorial/CMSSW_7_0_5/src/crab_projects/crab_tutorial_MC_analysis_test1/crab.log

Task report

One can obtain a short report about a task, containing the total number of events and files processed by completed jobs; a summary file of the runs and luminosity sections processed by completed jobs is also written into the task's results subdirectory. To get the report, execute the following CRAB command:

crab report [-t] <CRAB-project-directory>

In our case, we run:

crab report crab_projects/crab_tutorial_MC_analysis_test1

If all jobs have completed, it should produce an output like this:

177 files have been read
300000 events have been read
Analyzed lumi written to /afs/cern.ch/work/a/atanasi/CRAB3-tutorial/CMSSW_7_0_5/src/crab_projects/crab_tutorial_MC_analysis_test1/results/lumiSummary.json
Log file is /afs/cern.ch/work/a/atanasi/CRAB3-tutorial/CMSSW_7_0_5/src/crab_projects/crab_tutorial_MC_analysis_test1/crab.log

And the lumiSummary.json file looks something like this:

{"1": [[666666, 672665]]}

This is a sequence of luminosity section ranges processed for run 1 (this dataset has only one run, which was given run number 1).
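
As a small aside (not a CRAB command), the file is plain JSON, mapping each run number to a list of [first, last] luminosity-section ranges, so it can be inspected with a few lines of Python; for instance, a minimal sketch to count the processed luminosity sections:

import json

with open('crab_projects/crab_tutorial_MC_analysis_test1/results/lumiSummary.json') as f:
    lumi_summary = json.load(f)

# Each value is a list of inclusive [first, last] luminosity-section ranges.
n_lumis = 0
for run, lumi_ranges in lumi_summary.items():
    for first, last in lumi_ranges:
        n_lumis += last - first + 1

print('%d luminosity sections processed in %d run(s)' % (n_lumis, len(lumi_summary)))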

Task log files retrieval

Unfortunately this command is broken today. It should become functional again next Tuesday.

Both of the monitoring web interfaces, Dashboard and GlideMon, provide access to log files. Among these log files, two per job are the most relevant from a CRAB point of view: the one known as the "job" log (job_out.<job-number>.<job-retry-count>.txt) and the one known as the "postjob" log (postjob.<job-number>.<job-retry-count>.txt). These two files contain, respectively, log information from the running job itself, including the CRAB and cmsRun logs, and from the post-processing (essentially the stage-out part). These files are located in the user's home directory on the scheduler machine that submitted the jobs to the Grid. To avoid filling up the scheduler machines with information that is not relevant for CRAB developers and support crew, the cmsRun part of the job log is restricted to the last 1 MB. On the other hand, the full cmsRun stdout and stderr log files are available in the temporary storage element where the job ran, and in the permanent storage element if the parameter General.saveLogs was set to True in the CRAB configuration file. The user can retrieve the full cmsRun logs using the following CRAB command:

crab getlog [-t] <CRAB-project-directory> [-i <comma-separated-list-of-jobs-and/or-job-ranges>]

For example, suppose we want to retrieve the log files for jobs 1 and 3 in the task. We execute:

crab getlog crab_projects/crab_tutorial_MC_analysis_test1 -i 1,3

The screen output is something like this:

Setting the destination to /afs/cern.ch/work/a/atanasi/CRAB3-tutorial/CMSSW_7_0_5/src/crab_projects/crab_tutorial_MC_analysis_test1/results 
Retrieving 2 files
Placing file 'cmsRun_3.log.tar.gz' in retrieval queue 
Placing file 'cmsRun_1.log.tar.gz' in retrieval queue 
Please wait
Retrieving cmsRun_3.log.tar.gz 
Retrieving cmsRun_1.log.tar.gz 
Success in retrieving cmsRun_3.log.tar.gz 
Success in retrieving cmsRun_1.log.tar.gz 
All files successfully retrieve 
Log file is /afs/cern.ch/work/a/atanasi/CRAB3-tutorial/CMSSW_7_0_5/src/crab_projects/crab_tutorial_MC_analysis_test1/crab.log

The log files (assembled in zipped tarballs) are copied into the task's results subdirectory. To unzip and extract the log files, one can use the command tar -zxvf. In our case, we do:

cd crab_projects/crab_tutorial_MC_analysis_test1/results
tar -zxvf cmsRun_1.log.tar.gz

The screen output shows that cmsRun_1.log.tar.gz contains three files: the cmsRun stdout and stderr log files and a framework job report file (which contains information about the content of the output ROOT file, plus some job performance figures).

cmsRun-stdout-1.log
cmsRun-stderr-1.log
FrameworkJobReport-1.xml

Finally, one can use one's favourite editor to inspect the files.

Note: Opening the crab.log file, one can see messages like the following, showing that the lcg-cp command was used to copy the remote files to our local area:

DEBUG 2014-06-10 17:19:27,727:   Executing lcg-cp --connect-timeout 20 --verbose -b -D srmv2 --sendreceive-timeout 1800  --srm-timeout 60  srm://ingrid-se02.cism.ucl.ac.be:8444/srm/managerv2?SFN=/storage/data/cms/store/temp/user/atanasi.49f14a3a459fb095e75e64b6cc9e824bb642b6e1/GenericTTbar/CRAB3_tutorial_MC_analysis_test1/140610_133415/0000/log/cmsRun_3.log.tar.gz file:///afs/cern.ch/work/a/atanasi/CRAB3-tutorial/CMSSW_7_0_5/src/crab_projects/crab_tutorial_MC_analysis_test1/results/cmsRun_3.log.tar.gz

Task output retrieval

Unfortunately this command is broken today. It should become functional again next Tuesday.

In case one wants to retrieve some output ROOT files of a task, one can do so with the following CRAB command:

crab getoutput [-t] <CRAB-project-directory> [-i <comma-separated-list-of-jobs-and/or-job-ranges>]

The files are copied into the corresponding task's results subdirectory.

For our task, running:

crab getoutput crab_projects/crab_tutorial_MC_analysis_test1 -i 11

retrieves the output ROOT file from job number 11 in the task (i.e. it retrieves the file output_11.root). This command produces a screen output like this:

Setting the destination to /afs/cern.ch/work/a/atanasi/CRAB3-tutorial/CMSSW_7_0_5/src/crab_projects/crab_tutorial_MC_analysis_test1/results 
Retrieving 1 files
Placing file 'output_11.root' in retrieval queue 
Please wait
Retrieving output_11.root 
Success in retrieving output_11.root 
All files successfully retrieve 
Log file is /afs/cern.ch/work/a/atanasi/CRAB3-tutorial/CMSSW_7_0_5/src/crab_projects/crab_tutorial_MC_analysis_test1/crab.log

We can open the file and check for example that it contains only the branches we have specified.

[Screenshot: output_11.root content.]

Output dataset publication

If publication was not disabled in the CRAB configuration file, CRAB will automatically publish the task output dataset in DBS. However, this may take some time even after the jobs have completed. The publication timing logic is as follows: the first files available (whatever their number) are published immediately; after that, ASO waits for 100 files (per user) or a maximum of 8 hours. We plan to also trigger publication at task completion time. One can check the publication status using the crab status command. For our task, we run:

crab status crab_projects/crab_tutorial_MC_analysis_test1

and we get:

Task name:                      140610_133415_crab3test-3:atanasi_crab_tutorial_MC_analysis_test1
Task status:                    COMPLETED
Glidemon monitoring URL:        http://glidemon.web.cern.ch/glidemon/jobs.php?taskname=140610_133415_crab3test-3%3Aatanasi_crab_tutorial_MC_analysis_test1
Dashboard monitoring URL:       http://dashb-cms-job.cern.ch/dashboard/templates/task-analysis/#user=atanasi&refresh=0&table=Jobs&p=1&records=25&activemenu=2&status=&site=&tid=140610_133415_crab3test-3%3Aatanasi_crab_tutorial_MC_analysis_test1
Details:                        finished      100.0% (18/18)

Publication status:             idle           88.9% (16/18)
                                finished       11.1% ( 2/18)
Output datasets:                /GenericTTbar/atanasi-CRAB3_tutorial_MC_analysis_test1-37773c17ce2994cf16892d5f04945e41/USER
Output dataset url:             https://cmsweb.cern.ch/das/request?input=%2FGenericTTbar%2Fatanasi-CRAB3_tutorial_MC_analysis_test1-37773c17ce2994cf16892d5f04945e41%2FUSER&instance=prod%2Fphys03
Log file is /afs/cern.ch/work/a/atanasi/CRAB3-tutorial/CMSSW_7_0_5/src/crab_projects/crab_tutorial_MC_analysis_test1/crab.log

The publication state idle means that the publication request for the corresponding job output has not yet been processed by ASO. Once ASO starts to process the request, the publication state becomes running, and finally it becomes either finished, if the publication succeeded, or failed, if it didn't. If the publication state is idle, a user in a hurry can force the publication of the already completed jobs (and transferred outputs) by issuing the following CRAB command:

NOT YET IMPLEMENTED

crab publish -t <CRAB-project-directory>

Since this feature is not implemented yet, we have no other option than to just wait until we get the following crab status output:

Task name:                      140610_133415_crab3test-3:atanasi_crab_tutorial_MC_analysis_test1
Task status:                    COMPLETED
Glidemon monitoring URL:        http://glidemon.web.cern.ch/glidemon/jobs.php?taskname=140610_133415_crab3test-3%3Aatanasi_crab_tutorial_MC_analysis_test1
Dashboard monitoring URL:       http://dashb-cms-job.cern.ch/dashboard/templates/task-analysis/#user=atanasi&refresh=0&table=Jobs&p=1&records=25&activemenu=2&status=&site=&tid=140610_133415_crab3test-3%3Aatanasi_crab_tutorial_MC_analysis_test1
Details:                        finished      100.0% (18/18)

Publication status:             idle            0.0% ( 0/18)
                                finished      100.0% (18/18)
Output datasets:                /GenericTTbar/atanasi-CRAB3_tutorial_MC_analysis_test1-37773c17ce2994cf16892d5f04945e41/USER
Output dataset url:             https://cmsweb.cern.ch/das/request?input=%2FGenericTTbar%2Fatanasi-CRAB3_tutorial_MC_analysis_test1-37773c17ce2994cf16892d5f04945e41%2FUSER&instance=prod%2Fphys03
Log file is /afs/cern.ch/work/a/atanasi/CRAB3-tutorial/CMSSW_7_0_5/src/crab_projects/crab_tutorial_MC_analysis_test1/crab.log

The publication name of our output dataset is /GenericTTbar/atanasi-CRAB3_tutorial_MC_analysis_test1-37773c17ce2994cf16892d5f04945e41/USER (yours will contain your username and maybe another hash).

One can get more details about the published dataset using the DAS web interface. The URL pointing to our dataset in DAS is also given in the crab status screen output. One can use this direct link or query DAS using the publication dataset name as shown in the screenshot below. Notice that we are searching in DBS instance phys03.

[Screenshot: DAS query for our dataset.]

We can also get the list of files in our dataset by clicking on "Files", and check that the file naming is as expected (i.e. as explained in section CRAB output files naming convention).

[Screenshot: DAS query for the files in our dataset.]

Running on the published dataset

Let's assume we want to run another analysis (e.g. just another slimming) over the output dataset that we produced (and published) in our previous example. We will again use the CRAB configuration file crabConfig_tutorial_MC_analysis.py defined in section CRAB configuration file to run on MC, but we have to change the following:

  • The request name.
  • The dataset name.
  • The DBS reader URL (since we have published our dataset in the phys03 instance, while the default instance for reading is global).

To keep things tidy, we first copy the CRAB configuration file to a new name:

cp crabConfig_tutorial_MC_analysis.py crabConfig_tutorial_MCUSER_analysis.py

Now, in crabConfig_tutorial_MCUSER_analysis.py, we change:

config.General.requestName = 'tutorial_MC_analysis_test1'

config.Data.inputDataset = '/GenericTTbar/HC-CMSSW_5_3_1_START53_V5-v1/GEN-SIM-RECO'
config.Data.dbsUrl = 'global'

to:

config.General.requestName = 'tutorial_MCUSER_analysis_test1'

config.Data.inputDataset = '/GenericTTbar/atanasi-CRAB3_tutorial_MC_analysis_test1-37773c17ce2994cf16892d5f04945e41/USER'
config.Data.dbsUrl = 'phys03'

If you want, you can change the pset_tutorial_analysis.py CMSSW parameter-set configuration file to do some further slimming. E.g. you can change this line:

process.output = cms.OutputModule("PoolOutputModule",
    outputCommands = cms.untracked.vstring("drop *", "keep recoTracks_*_*_*"),
    fileName = cms.untracked.string('output.root'),
)

to:

process.output = cms.OutputModule("PoolOutputModule",
    outputCommands = cms.untracked.vstring("drop *", "keep recoTracks_globalMuons_*_*"),
    fileName = cms.untracked.string('output.root'),
)

Now we can run CRAB just as before:

  1. Submit the task.
  2. Check the task status (there should be only 2 jobs).
  3. Get a task report.
  4. If necessary, retrieve log files (we will skip this).
  5. When jobs are done, retrieve the output.

We show the output we get for each of these steps:

crab submit -c crabConfig_tutorial_MCUSER_analysis.py

Sending the request to the server
Your task has been delivered to the CRAB3 server.
Please use 'crab status' to check how the submission process proceed
Log file is /afs/cern.ch/work/a/atanasi/CRAB3-tutorial/CMSSW_7_0_5/src/crab_projects/crab_tutorial_MCUSER_analysis_test1/crab.log

crab status crab_projects/crab_tutorial_MCUSER_analysis_test1

Task name:                      140611_080528_crab3test-3:atanasi_crab_tutorial_MCUSER_analysis_test1
Task status:                    SUBMITTED
Glidemon monitoring URL:        http://glidemon.web.cern.ch/glidemon/jobs.php?taskname=140611_080528_crab3test-3%3Aatanasi_crab_tutorial_MCUSER_analysis_test1
Dashboard monitoring URL:       http://dashb-cms-job.cern.ch/dashboard/templates/task-analysis/#user=atanasi&refresh=0&table=Jobs&p=1&records=25&activemenu=2&status=&site=&tid=140611_080528_crab3test-3%3Aatanasi_crab_tutorial_MCUSER_analysis_test1
Details:                        running       100.0% (2/2)

No publication information available yet
Log file is /afs/cern.ch/work/a/atanasi/CRAB3-tutorial/CMSSW_7_0_5/src/crab_projects/crab_tutorial_MCUSER_analysis_test1/crab.log

And after some time:

Task name:                      140611_080528_crab3test-3:atanasi_crab_tutorial_MCUSER_analysis_test1
Task status:                    COMPLETED
Glidemon monitoring URL:        http://glidemon.web.cern.ch/glidemon/jobs.php?taskname=140611_080528_crab3test-3%3Aatanasi_crab_tutorial_MCUSER_analysis_test1
Dashboard monitoring URL:       http://dashb-cms-job.cern.ch/dashboard/templates/task-analysis/#user=atanasi&refresh=0&table=Jobs&p=1&records=25&activemenu=2&status=&site=&tid=140611_080528_crab3test-3%3Aatanasi_crab_tutorial_MCUSER_analysis_test1
Details:                        finished      100.0% (2/2)

Publication status:             idle           50.0% (1/2)
                                finished       50.0% (1/2)
Log file is /afs/cern.ch/work/a/atanasi/CRAB3-tutorial/CMSSW_7_0_5/src/crab_projects/crab_tutorial_MCUSER_analysis_test1/crab.log

crab report crab_projects/crab_tutorial_MCUSER_analysis_test1

18 files have been read
300000 events have been read
Analyzed lumi written to /afs/cern.ch/work/a/atanasi/CRAB3-tutorial/CMSSW_7_0_5/src/crab_projects/crab_tutorial_MCUSER_analysis_test1/results/lumiSummary.json
Log file is /afs/cern.ch/work/a/atanasi/CRAB3-tutorial/CMSSW_7_0_5/src/crab_projects/crab_tutorial_MCUSER_analysis_test1/crab.log

crab getoutput crab_projects/crab_tutorial_MCUSER_analysis_test1

Setting the destination to /afs/cern.ch/work/a/atanasi/CRAB3-tutorial/CMSSW_7_0_5/src/crab_projects/crab_tutorial_MCUSER_analysis_test1/results 
Retrieving 2 files
Placing file 'output_1.root' in retrieval queue 
Placing file 'output_2.root' in retrieval queue 
Please wait
Retrieving output_1.root 
Retrieving output_2.root 
Success in retrieving output_1.root 
Success in retrieving output_2.root 
All files successfully retrieve 
Log file is /afs/cern.ch/work/a/atanasi/CRAB3-tutorial/CMSSW_7_0_5/src/crab_projects/crab_tutorial_MCUSER_analysis_test1/crab.log

Opening one of the files, we can check that it contains only the branches we have specified.

[Screenshot: content of output_1.root as seen in a TBrowser.]

Running CMSSW analysis with CRAB on Data (continuation)

Task status

These jobs should finish very fast. So if we do:

crab status -t crab_projects/crab_tutorial_Data_analysis_test5

at this point of the tutorial we should get:

Task name:                      140611_151859_crab3test-3:atanasi_crab_tutorial_Data_analysis_test5
Task status:                    COMPLETED
Glidemon monitoring URL:        http://glidemon.web.cern.ch/glidemon/jobs.php?taskname=140611_151859_crab3test-3%3Aatanasi_crab_tutorial_Data_analysis_test5
Dashboard monitoring URL:       http://dashb-cms-job.cern.ch/dashboard/templates/task-analysis/#user=atanasi&refresh=0&table=Jobs&p=1&records=25&activemenu=2&status=&site=&tid=140611_151859_crab3test-3%3Aatanasi_crab_tutorial_Data_analysis_test5
Details:                        finished      100.0% (17/17)

Publication status:             idle           94.1% (16/17)
                                finished        5.9% ( 1/17)
Log file is /afs/cern.ch/work/a/atanasi/CRAB3-tutorial/CMSSW_7_0_5/src/crab_projects/crab_tutorial_Data_analysis_test5/crab.log

Task report

Getting the task report is particularly relevant when running on real data, because of the files we get containing the analyzed and the non-analyzed luminosity sections. Remember that the report includes only successfully completed jobs.

crab report crab_projects/crab_tutorial_Data_analysis_test5

83 files have been read
295042 events have been read
Analyzed lumi written to /afs/cern.ch/work/a/atanasi/CRAB3-tutorial/CMSSW_7_0_5/src/crab_projects/crab_tutorial_Data_analysis_test5/results/lumiSummary.json
Not Analyzed lumi written to /afs/cern.ch/work/a/atanasi/CRAB3-tutorial/CMSSW_7_0_5/src/crab_projects/crab_tutorial_Data_analysis_test5/results/missingLumiSummary.json
Log file is /afs/cern.ch/work/a/atanasi/CRAB3-tutorial/CMSSW_7_0_5/src/crab_projects/crab_tutorial_Data_analysis_test5/crab.log

The content of our lumiSummary.json file is:

{"193834": [[1, 35]], "193835": [[1, 20], [22, 26]], "193836": [[1, 2]], "193999": [[1, 50]], "193998": [[66, 113], [115, 278]]}

And the content of our missingLumiSummary.json file is:

{"193093": [[1, 33]], "193124": [[1, 52]], "193123": [[1, 27]], 
"193575": [[48, 173], [176, 349], [351, 394], [397, 415], [417, 658], [660, 752]], 
"193541": [[77, 101], [103, 413], [416, 575], [578, 619]], 
"193621": [[60, 570], [573, 769], [772, 976], [979, 1053], [1056, 1137], [1139, 1193], [1195, 1371], [1373, 1654]], 
"193557": [[1, 84]], "193556": [[41, 83]], "193207": [[54, 182]], 
"193336": [[1, 264], [267, 492], [495, 684], [687, 729], [732, 951]], "193334": [[29, 172]]}

The missingLumiSummary.json file is simply the difference between the input lumi-mask file and the lumiSummary.json file. This means that the missingLumiSummary.json file is not necessarily empty even if all jobs have completed successfully, simply because the input dataset and the input lumi-mask file may not span the same overall range of runs and luminosity sections. For example, take the extreme (dummy) case in which an input dataset from the 2012 LHC run is analyzed using a lumi-mask file from the 2011 LHC run; this results in a missingLumiSummary.json file that is identical to the input lumi-mask file, no matter whether the jobs succeeded or failed. In our example above, we have a non-empty missingLumiSummary.json file, but the runs in it are all runs that are not in the input dataset.
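
To make this relation concrete, here is a minimal, purely illustrative Python sketch (not the tool CRAB uses internally) that computes the difference between a lumi-mask file and a lumiSummary.json file, following the statement above literally (in particular, it ignores any run-range selection); the file names are the ones used in this example:

import json

def expand(mask):
    # Turn {run: [[first, last], ...]} into {run: set of luminosity sections}.
    expanded = {}
    for run, lumi_ranges in mask.items():
        lumis = set()
        for first, last in lumi_ranges:
            lumis.update(range(first, last + 1))
        expanded[run] = lumis
    return expanded

def compress(expanded):
    # Turn {run: set of luminosity sections} back into {run: [[first, last], ...]}.
    mask = {}
    for run, lumis in expanded.items():
        ranges = []
        for lumi in sorted(lumis):
            if ranges and lumi == ranges[-1][1] + 1:
                ranges[-1][1] = lumi
            else:
                ranges.append([lumi, lumi])
        if ranges:
            mask[run] = ranges
    return mask

with open('Cert_190456-208686_8TeV_PromptReco_Collisions12_JSON.txt') as f:
    input_mask = expand(json.load(f))
with open('crab_projects/crab_tutorial_Data_analysis_test5/results/lumiSummary.json') as f:
    analyzed = expand(json.load(f))

# The "missing" luminosity sections are those in the input mask that were not analyzed.
missing = {}
for run, lumis in input_mask.items():
    missing[run] = lumis - analyzed.get(run, set())

print(json.dumps(compress(missing)))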

Re-analyzing the missing luminosity sections

If (and only if) some jobs have failed, and one would like to submit a new task (as opposed to resubmitting the failed jobs) analyzing only the luminosity sections that were not analyzed by the original task, then one can use the missingLumiSummary.json file as the lumi-mask for the new task. Moreover, by keeping the same Data.publishDataName as in the original task, the new outputs will be published in the same dataset as the original task. Thus, in principle one would only change the request name and the lumi-mask file in the CRAB configuration file. For our task, this would mean changing:

config.General.requestName = 'tutorial_Data_analysis_test5'

config.Data.lumiMask = 'https://cms-service-dqm.web.cern.ch/cms-service-dqm/CAF/certification/Collisions12/8TeV/Prompt/Cert_190456-208686_8TeV_PromptReco_Collisions12_JSON.txt'

to, for example:

config.General.requestName = 'tutorial_Data_analysis_test5_missingLumis'

config.Data.lumiMask = 'crab_projects/crab_tutorial_Data_analysis_test5/results/missingLumiSummary.json'

One could also change the number of luminosity sections to analyze per job (Data.unitsPerJob); e.g. one could decrease it so as to have shorter jobs.

Once these changes are done in the CRAB configuration file, one would just submit the new task.

Running CRAB to generate MC events (continuation)

Task report

After all jobs in the task have finished, one can get the final task report. For MC generation, the report shows how many events were generated and returns the lumiSummary.json file with the run and luminosity section numbers assigned to the events. For our task, running:

crab report crab_projects/crab_tutorial_MC_generation_test2

produces a screen output like this:

0 files have been read
100 events have been read
Analyzed lumi written to /afs/cern.ch/work/a/atanasi/CRAB3-tutorial/CMSSW_7_0_5/src/crab_projects/crab_tutorial_MC_generation_test2/results/lumiSummary.json
Log file is /afs/cern.ch/work/a/atanasi/CRAB3-tutorial/CMSSW_7_0_5/src/crab_projects/crab_tutorial_MC_generation_test2/crab.log

The content of the lumiSummary.json file is:

{"1": [[1, 10]]}

That is, the same run number ("1") was used for all 10 jobs, and the luminosity section number is just the job number.

Output dataset publication

As already mentioned several times on this page, the publication of the task output dataset in DBS is performed automatically by CRAB (more specifically, by the ASO service), as long as one didn't set Data.publication = False or General.transferOutput = False in the CRAB configuration file (which we didn't). It was also mentioned that it may take some time for ASO to complete the publication. Fortunately, as shown by the last crab status output, all the files in our output dataset have already been published.

One can look at the details of the published dataset using the DAS web interface. We can query DAS without specifying the (unknown to us) pset-hash, since in any case there are not many possible outcomes to the query. Alternatively, we can find the full name of the published dataset by searching in one of the postjob log files linked, for example, from the GlideMon monitoring page of the task. Let's do that. If we open the postjob log of, say, job number 1, and search for the (last occurrence of the) string outdatasetname, we find the following:

2014-06-12 12:31:51,623:DEBUG:PostJob Uploading output file to /crabserver/prod/filemetadata: 
[('outlfn', '/store/user/atanasi/MinBias/CRAB3_tutorial_MC_generation_test2/140612_093900/0000/minbias_1.root'), 
('outdatasetname', u'/MinBias/atanasi-CRAB3_tutorial_MC_generation_test2-68f1597d7759bb49c2434d0a49a7eac6/USER'),
...

So, our publication dataset name is /MinBias/atanasi-CRAB3_tutorial_MC_generation_test2-68f1597d7759bb49c2434d0a49a7eac6/USER (it of course contains my username; your dataset will contain yours). Now we query DAS with our dataset name (remember to search in DBS instance phys03) and look, for example, at the files contained in the dataset.

[Screenshot: DAS query for the files in our dataset.]

Hopefully this was instructive enough.

Killing a CRAB task

Jobs in a task can be killed using the crab kill command:

crab kill crab_<request-name> [-i <comma-separated-list-of-jobs-and/or-job-ranges>]

If the list of jobs is omitted, then all jobs in the task are killed. This command stops all running jobs, removes idle jobs from the queue, cancels all output file transfers (ongoing transfers will still complete) and cancels the publication (already transferred outputs will still be published). Here is an example of a task that we submitted and killed. We used the crabConfig_tutorial_MC_generation.py file, changing the request name to:

config.General.requestName = 'tutorial_MC_generation_kill'

We submitted the task:

crab submit -c crabConfig_tutorial_MC_generation.py

Sending the request to the server
Your task has been delivered to the CRAB3 server.
Please use 'crab status' to check how the submission process proceed
Log file is /afs/cern.ch/work/a/atanasi/CRAB3-tutorial/CMSSW_7_0_5/src/crab_projects/crab_tutorial_MC_generation_kill/crab.log

We checked the task status after a few minutes:

crab status -t crab_projects/crab_tutorial_MC_generation_kill

Task name:               140512_141737_crab3test-3:atanasi_crab_tutorial_MC_generation_kill
Task status:             SUBMITTED
Glidemon monitoring URL:          http://glidemon.web.cern.ch/glidemon/jobs.php?taskname=140512_141737_crab3test-3%3Aatanasi_crab_tutorial_MC_generation_kill
Dashboard monitoring URL:          .....
Details:                 running       100.0% (10/10)
Log file is /afs/cern.ch/work/a/atanasi/CRAB3-tutorial/CMSSW_7_0_5/src/crab_projects/crab_tutorial_MC_generation_kill/crab.log

And we killed the task:

crab kill -t crab_projects/crab_tutorial_MC_generation_kill

Kill request successfully sent
Log file is /afs/cern.ch/work/a/atanasi/CRAB3-tutorial/CMSSW_7_0_5/src/crab_projects/crab_tutorial_MC_generation_kill/crab.log

After some more minutes we checked the task status again:

crab status -t crab_projects/crab_tutorial_MC_generation_kill

And we got:

Task name:               140512_141737_crab3test-3:atanasi_crab_tutorial_MC_generation_kill
Task status:             KILLED
Glidemon monitoring URL:          http://glidemon.web.cern.ch/glidemon/jobs.php?taskname=140512_141737_crab3test-3%3Aatanasi_crab_tutorial_MC_generation_kill
Dashboard monitoring URL:          .....
Details:                 running       100.0% (10/10)
Log file is /afs/cern.ch/work/a/atanasi/CRAB3-tutorial/CMSSW_7_0_5/src/crab_projects/crab_tutorial_MC_generation_kill/crab.log

The task status became KILLED. The jobs are still shown as in the running state; this is a known issue that should be fixed.

Below are GlideMon and Dashboard screenshots after the kill.

(Screenshot: GlideMon for task 140512_141737_crab3test-3:atanasi_crab_tutorial_MC_generation_kill after killing the task.)

(Screenshot: Dashboard for task 140512_141737_crab3test-3:atanasi_crab_tutorial_MC_generation_kill after killing the task.)

In GlideMon the task status became FAILED, while in Dashboard there is no concept of task status, only job states. Jobs in GlideMon are shown as Removed and in Dashboard as failed (soon we will change this to show them as canceled).

By the way, sending another crab kill for this task will return an error, because the task is already in KILLED state:

crab kill -t crab_projects/crab_tutorial_MC_generation_kill

Error contacting the server.
Server answered with: Execution error
Reason is: You cannot kill a task if it is in the KILLED state
Log file is /afs/cern.ch/work/a/atanasi/CRAB3-tutorial/CMSSW_7_0_5/src/crab_projects/crab_tutorial_MC_generation_kill/crab.log

Note about Dashboard

In the Dashboard screenshot, we can see one job still shown as in the running state, with the "Finished" column showing the default value of 1970-01-01T00:00:00 and the "Wall Time" column showing 00:00:00. This is most probably a case in which the packet carrying the finished time and the final job state (in this case "finished" should be interpreted as "killed") was lost on its way to the Dashboard. Since the packet is not sent again, Dashboard will keep showing this job as running for some time. Checking again after 48 hours, we see that the job switched to the unknown state (this happens after 24 hours).

(Screenshot: Dashboard for task 140512_141737_crab3test-3:atanasi_crab_tutorial_MC_generation_kill more than 24 hours after killing the task.)

Anyhow, the important message for the user is the following: information in Dashboard is updated by UDP packets sent to it by the various services; a UDP packet can sometimes be lost, in which case the information shown in Dashboard is not updated; a later packet might fix the situation. The user should not panic, but should cross-check with the information shown in GlideMon or by the crab status command, and in principle trust those over Dashboard.

Appendix A

Interference between CRAB commands

Every time a CRAB command that refers to a particular task is executed, the CRAB project directory name (with its full path) to which the command refers is saved in a file named .crab3, located by default in the user's home directory. Caching the project directory name allows the user to not have to specify it explicitly in consecutive CRAB commands: if the user doesn't specify a CRAB project directory name, the cached one is used (this is true for all commands, except for crab kill). This is a nice feature to save some typing, but it should be used with care. For example, if CRAB commands are being executed by a script in this short form, and while the script is running the user executes another CRAB command for a different task, then this other project directory name will be cached, with obvious consequences for the script. To avoid this kind of problem, we prefer to teach the user to always specify the project directory name in the CRAB commands. Alternatively, the user can set the location of the .crab3 file by means of the environment variable CRAB3_CACHE_FILE:

export CRAB3_CACHE_FILE=<full-path-to-the-directory-where-to-save-the-.crab3-file>

Since environment variables are specific to the shell session, this feature allows the user to have two different shell sessions with different locations for the .crab3 file (by setting the corresponding CRAB3_CACHE_FILE variables to different directories) and to execute in each shell CRAB commands (in the short form) referring to two different tasks, without mixing up the .crab3 files.

Note: The CRAB3_CACHE_FILE environment variable can only be used to set the location of the .crab3 cache file; the name of the file cannot be changed.
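
As an illustration, here is a minimal sketch of the two-shell-session usage described above (the directory placeholders and the configuration file names are just examples; replace them with your own):

# In shell session 1, working on task A
export CRAB3_CACHE_FILE=<directory-for-task-A>   # the .crab3 file for task A will live here
crab submit -c crabConfig_taskA.py               # hypothetical configuration file name
crab status                                      # uses the project directory cached in <directory-for-task-A>/.crab3

# In shell session 2, working on task B
export CRAB3_CACHE_FILE=<directory-for-task-B>
crab submit -c crabConfig_taskB.py
crab status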

Appendix B

Doing lumi-mask arithmetics

There is a tool written in python called LumiList.py (available in the WMCore library; it is the same code as cmssw/FWCore/PythonUtilities/python/LumiList.py) that can be used to do lumi-mask arithmetics. The arithmetics can even be done inside the CRAB configuration file (that's the advantage of having the configuration file written in python). Below are some examples.

Example 1: A run range selection can be achieved by subtracting from the original lumi-mask file the complement of the run range of interest. (We will soon add a selectRuns() method to LumiList.py, which will make this example simpler.)

from WMCore.DataStructs.LumiList import LumiList

originalLumiList = LumiList(filename='my_original_lumi_mask.json')
subtractLumiList = LumiList(filename='my_original_lumi_mask.json')
subtractLumiList.removeRuns([x for x in range(193093,193999+1)])
newLumiList = originalLumiList - subtractLumiList
newLumiList.writeJSON('my_lumi_mask.json')

config.Data.lumiMask = 'my_lumi_mask.json'

Example 2: Use a new lumi-mask file that is the intersection of two other lumi-mask files.

from WMCore.DataStructs.LumiList import LumiList

originalLumiList1 = LumiList(filename='my_original_lumi_mask_1.json')
originalLumiList2 = LumiList(filename='my_original_lumi_mask_2.json')
newLumiList = originalLumiList1 & originalLumiList2
newLumiList.writeJSON('my_lumi_mask.json')

config.Data.lumiMask = 'my_lumi_mask.json'

Example 3: Use a new lumi-mask file that is the union of two other lumi-mask files.

from WMCore.DataStructs.LumiList import LumiList

originalLumiList1 = LumiList(filename='my_original_lumi_mask_1.json')
originalLumiList2 = LumiList(filename='my_original_lumi_mask_2.json')
newLumiList = originalLumiList1 | originalLumiList2
newLumiList.writeJSON('my_lumi_mask.json')

config.Data.lumiMask = 'my_lumi_mask.json'
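
Example 4: Use a new lumi-mask file that is the difference of two other lumi-mask files, i.e. remove from the first lumi-mask the lumi sections that are also present in the second one. This is a sketch along the lines of the examples above (the input file names are placeholders), using the same subtraction operator already used in Example 1.

from WMCore.DataStructs.LumiList import LumiList

originalLumiList1 = LumiList(filename='my_original_lumi_mask_1.json')
originalLumiList2 = LumiList(filename='my_original_lumi_mask_2.json')
## Keep only the lumi sections of the first lumi-mask that are not in the second one.
newLumiList = originalLumiList1 - originalLumiList2
newLumiList.writeJSON('my_lumi_mask.json')

config.Data.lumiMask = 'my_lumi_mask.json'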

Note: In the future, we may allow the Data.lumiMask parameter to accept a LumiList object directly; this would improve versatility and remove the need to write the new JSON file.

Appendix C

User quota in the CRAB scheduler machines

This section needs to be revisited

Each user has a home directory with 100GB of disk space in each of the scheduler machines (schedd for short) assigned to CRAB3 for submitting jobs to GlideInWMS. In this space, all the log files for the user's tasks are saved (except for the cmsRun log files, which are saved in the storage site). As a rough guide, a task with 100 jobs uses on average 50MB in log files (this number depends a lot on the number of resubmissions, since each resubmission produces its own log files). If a user reaches his/her quota, he/she will not be able to submit more jobs via that schedd (he/she may still be able to submit via another schedd, but since the user cannot choose the schedd to which to submit -the choice is made by the CRAB server-, he/she would have to keep retrying the submission until the task lands on a schedd with non-exhausted quota). To avoid this, log files are removed automatically 15 days after their last modification. If a user reaches 50% of his/her quota in a given schedd, an automatic e-mail similar to the one shown below is sent to him/her.

E-mail template:
Subject: WARNING: Reaching your quota

Dear analysis user <username>,

You are using <X>% of your disk quota on the server <schedd-name>. The moment you reach the disk quota of <Y>GB, you will be unable to
run jobs and will experience problems recovering outputs. In order to avoid that, you have to clean up your directory at the server. 
Here are the instructions to do so:
 https://twiki.cern.ch/twiki/bin/view/CMSPublic/SWGuideCrabFaq#How_to_clean_up_your_directory_i
Here it is a more detailed description of the issue:
 https://twiki.cern.ch/twiki/bin/view/CMSPublic/SWGuideCrabFaq#Disk_space_for_output_files
If you have any questions, please contact hn-cms-computing-tools(AT)cern.ch
 Regards,
CRAB support

This e-mail has a link (https://twiki.cern.ch/twiki/bin/view/CMSPublic/SWGuideCrabFaq#How_to_clean_up_your_directory_i) to the instructions on how to clean up space in the user's home directory in a schedd. We plan to implement a CRAB command (crab purge) to do the clean-up.

Appendix D

Tips and tricks

crab submit -w

The crab submit command has an option -w/--wait, which makes CRAB repeatedly check the status of the task after submission to the server, until the task has either been successfully submitted by the server to the grid submission infrastructure (task status SUBMITTED or UNKNOWN), the submission to the grid submission infrastructure has failed (task status FAILED), or a maximum of 15 minutes has elapsed. For example, if we had added the -w option to the crab submit command in our example Running CMSSW analysis with CRAB on MC - Task submission:

crab submit -c crabConfig_tutorial_MC_analysis.py -w

the screen output would have looked something like this (assuming the submission was successful):

Sending the request to the server
Your task has been delivered to the CRAB3 server.
Waiting for task to be processed
Checking task status
Task status:NEW
Please wait...
Task status:QUEUED
Please wait...
Task status:UNKNOWN
Your task has been processed and your jobs have been submitted successfully
Log file is /afs/cern.ch/work/a/atanasi/CRAB3-tutorial/CMSSW_7_0_5/src/crab_projects/crab_tutorial_MC_analysis_test1/crab.log

At each Please wait... occurrence, the CRAB client waits 30 seconds before doing another crab status. The -w/--wait option is thus useful to free the user from having to keep executing crab status -t <CRAB-project-directory> until he/she sees that the task has either been successfully submitted or the submission has failed.

Dynamically changing the request name in the CRAB configuration file

The fact that the CRAB configuration file is written in python gives the user a lot of flexibility to define the CRAB configuration parameter values dynamically. Below we give an example where we added some lines of code to the CRAB configuration file in order to define the parameter General.requestName as tasknumber<index>, where the index is a non-negative integer that follows a sequence and helps organize a set of tasks that otherwise have the same name. The code starts by setting the index variable to 0; it then enters a loop where it checks whether the CRAB project directory corresponding to the current index already exists: if it does, it increments the index by 1; if it doesn't, it stops the loop.

from WMCore.Configuration import Configuration
config = Configuration()

config.section_("General")
config.General.workArea = 'crab_projects'
## --- Auxiliary code starts here --- ##
## We want to define config.General.requestName = 'tasknumber<index>'
import os
index = 0
base_request_name = 'tasknumber'
while os.path.isdir("%s/crab_%s%s" % (config.General.workArea, base_request_name, index)):
    index += 1
## --- Auxiliary code ends here --- ##
config.General.requestName = base_request_name + str(index)
...
...

-- AndresTanasijczuk - 03 Jul 2014
