This tutorial makes use of acronyms, some of which might be trivial and some of which may be unfamiliar to the reader. Here is a list of all acronyms used, with their definitions.
In this tutorial page, the following text background color convention is used:
GREY: For commands.
GREEN: For the output example of the executed commands (nearly what the user should see in his/her terminal).
PINK: For CMSSW parameter-set configuration files.
YELLOW: For any other type of file.
Syntax conventions
In this tutorial page, the following syntax conventions are used:
Whenever we enclose text within <>, one should replace the text by what it describes and remove those signs. For example, <username> should be replaced by your username, without the < and > signs.
In a CRAB command, text enclosed within [] refers to optional specifications in a command.
Text preceded by a # sign in a configuration file is a comment and does not affect execution.
Prerequisites to run the tutorial
The following prerequisites are necessary to run the examples in this tutorial:
Make sure your certificate is correctly mapped to your primary CERN account. The twiki page Username for CRAB may be helpful. To check the CERN username extracted from your authentication credential, CRAB provides the command crab checkusername. To be able to execute the command, one has to first set up the environment and have a valid proxy (see sections Setup the environment and Get a CMS VO proxy).
To have write permission in a T2/T3 user/group storage space. By default CRAB will try to store the output (and log) files in /store/user/<username>/ at the site specified by the user (the user can change the destination directory using the Data.outLFNDirBase parameter in the CRAB configuration file). It is the user's responsibility to make sure he/she has write permission at the specified site. If your university hosts a Tier 2 or Tier 3, you should have write permission in its storage space; if you don't, ask the local site support team for the permission. If your university does not host a Tier 2 or Tier 3, ask your supervisor/conveners/colleagues where you might have permission to write or who/where you can ask for permission. CRAB has a command, namely crab checkwrite, that may help users check whether they have write permission at a given site. To be able to execute the command, one has to first set up the environment (see section Setup the environment).
To have access to LXPLUS machines at CERN running SLC7 (the lxplus login alias points to SLC7 machines; for SLC6 one can explicitly use the lxplus6 alias) or to an SL6 UI.
About software versions, datasets and analysis code
For this tutorial we will use:
CMS software version 10.6.18 (i.e. CMSSW_10_6_18), which was built with slc7_amd64_gcc700 architecture. The following page contains a list of all available CMSSW production releases for different scram architectures: https://cmssdt.cern.ch/SDT/cgi-bin/ReleasesXML.
The central installation of CRAB3 available in CMSSW
Log in
For LXPLUS users:
ssh -Y <username>@lxplus.cern.ch
Using the lxplus alias directs you to an SLC7 machine. For FNAL LPC users:
ssh -Y <username>@cmslpc-sl7.fnal.gov
Using the cmslpc-sl7 alias directs you to an SLC7 machine.
Shell
The shell commands in this tutorial correspond to the Bourne-again shell (bash). If you use the TC shell (tcsh):
Replace file extensions .sh by .csh.
Replace export <variable-name>=<variable-value> by setenv <variable-name> <variable-value>.
You can check what is the shell you are using by executing:
echo $0
which in my case shows:
bash
If yours shows that you are using tcsh and you would like to work with bash, then do:
bash
export SHELL=/bin/bash
Setup the environment
In order to have the correct environment setup, the environment files must always be sourced in the following order:
Grid environment (for every new shell) (only if your site doesn't load the environment for you).
CMSSW installation (only once).
CMSSW environment (for every new shell).
Grid environment
In order to submit jobs to the Grid, one must have access to a Grid UI, which will allow access to WLCG-affiliated resources in a fully transparent way. Some sites provide the grid environment to all shells by default. If the following command returns without an error, then you don't need to source a Grid UI manually:
which grid-proxy-info
If otherwise you receive a message similar to the following:
/usr/bin/which: no grid-proxy-info in (/bin:/sbin:/usr/bin:/usr/sbin)
then you will need to source either the LCG Grid UI or the OSG Grid UI (most sites only have one or the other installed).
LXPLUS users on SLC7 already have the Grid UI available by default; on other systems, source the LCG Grid UI setup script provided by your site.
CMSSW installation
Install CMSSW in a directory of your choice (we will use the home area). Before installing CMSSW, one has to check whether the scram architecture is the one needed (in our case slc7_amd64_gcc700) and, if not, change it accordingly. The scram architecture is specified in the environment variable SCRAM_ARCH. Thus, one has to check this variable:
echo $SCRAM_ARCH
which on an SLC7 LXPLUS or LPC machine probably shows:
slc7_amd64_gcc700
If instead you see a blank line or the following:
SCRAM_ARCH: Undefined variable.
you need to setup the defaults with:
source /cvmfs/cms.cern.ch/cmsset_default.sh
Let's set SCRAM_ARCH to our desired scram architecture:
export SCRAM_ARCH=slc7_amd64_gcc700
Only after setting the appropriate scram architecture, install CMSSW:
cd ~
mkdir CRAB3-tutorial
cd CRAB3-tutorial
cmsrel CMSSW_10_6_18
CMSSW environment
Setup the CMSSW environment:
cd ~/CRAB3-tutorial/CMSSW_10_6_18/src/
cmsenv
The cmsenv command will automatically set the scram architecture to be the one corresponding to the installed CMSSW release.
CRAB environment
CRAB is bundled with CMSSW, so once you have loaded the CMSSW environment with cmsenv, you can verify the CRAB installation with:
which crab
/cvmfs/cms.cern.ch/common/crab
or
crab --version
CRAB client v3.3.2005
To know more
The CMSCrabClient twiki page has more information about the CRAB Client: setup, available variants, API usage, debugging and contributing to development.
Get a CMS VO proxy
CRAB makes use of LCG resources on behalf of the user. Since access to LCG resources is of course restricted to authorized entities, the user has to prove that he/she is authorized. The proof is relatively easy; it just consists of showing that he/she is a member of an LCG trusted organization, in our case VO CMS. This is achieved by presenting a proxy issued by VO CMS. (In general, a proxy is a credential issued by a trusted organization that proves that the requester is known by this organization.) CRAB will then present the user's proxy for all operations that require identification.
Proxies are not issued at registration time and for the whole membership period. Instead, the user has to explicitly request a proxy and it will be valid for a limited time (12 hours by default). When requesting a proxy, the user has to present an identification; for VO CMS, the identification is the user's Grid certificate (and of course the Grid certificate has to be the same one as originally presented when registering to VO CMS). The command to request a proxy to VO CMS is voms-proxy-init --voms cms. This command will look for the user's Grid certificate in the .globus subdirectory of the user's home directory. If the Grid certificate is not in this standard location, the user can specify the location via the --cert and --key options. The user can also request a longer validity using the --valid option. For example, to request a proxy valid for seven days, execute:
voms-proxy-init --voms cms --valid 168:00
Enter GRID pass phrase for this identity:
Contacting voms2.cern.ch:15002 [/DC=ch/DC=cern/OU=computers/CN=voms2.cern.ch] "cms"...
Remote VOMS server contacted succesfully.
Created proxy in /tmp/x509up_u<user-id>.
Your proxy is valid until <some date-time 168 hours in the future>
The proxy is saved in the /tmp/ directory of the current machine, in a file named x509up_u<user-id> (where user-id can be obtained by executing the command id -u). Proxies are not specific to a login session; thus using another machine requires to create another proxy or copy it over.
To get more information about the proxy, execute:
voms-proxy-info --all
subject : /DC=ch/DC=cern/OU=Organic Units/OU=Users/CN=atanasi/CN=710186/CN=Andres Jorge Tanasijczuk/CN=proxy
issuer : /DC=ch/DC=cern/OU=Organic Units/OU=Users/CN=atanasi/CN=710186/CN=Andres Jorge Tanasijczuk
identity : /DC=ch/DC=cern/OU=Organic Units/OU=Users/CN=atanasi/CN=710186/CN=Andres Jorge Tanasijczuk
type : full legacy globus proxy
strength : 1024
path : /tmp/x509up_u57506
timeleft : 167:58:35
key usage : Digital Signature, Key Encipherment
=== VO cms extension information ===
VO : cms
subject : /DC=ch/DC=cern/OU=Organic Units/OU=Users/CN=atanasi/CN=710186/CN=Andres Jorge Tanasijczuk
issuer : /DC=ch/DC=cern/OU=computers/CN=voms2.cern.ch
attribute : /cms/Role=NULL/Capability=NULL
attribute : /cms/uscms/Role=NULL/Capability=NULL
timeleft : 167:58:35
uri : voms2.cern.ch:15002
For more details about the voms-proxy-* commands used above, add the --help option to get a help menu.
CMSSW parameter-set configuration file
It is out of the scope of this tutorial to explain how to build an analysis using CMSSW. The interested reader should refer to the corresponding chapters in the CMS Offline WorkBook. For us it is enough to have some simple predefined examples of CMSSW parameter-set configuration files. We are only interested in distinguishing the following two cases: 1) using CMSSW code to run an analysis (whatever it is) on an existing dataset, and 2) using CMSSW code to generate MC events. Throughout the tutorial, we will refer to the first case as "doing an analysis" and to the second case as "doing MC generation". We provide below corresponding examples of a CMSSW parameter-set configuration file. The expected default name for the CMSSW parameter-set configuration file is pset.py, but of course one can give it any name (always keeping the .py filename extension and not adding other dots in the filename), as long as one specifies the name in the CRAB configuration file.
CMSSW configuration file examples
Since the question of what kind of analysis to do on the input dataset is not important for this tutorial, we will do something very simple: slim an already existing dataset. The input dataset can be an official CMS dataset (either MC or Data) or a dataset produced by a user. In this tutorial, we will show how to do both things. We will also show how to run CMSSW with CRAB to generate MC events. We provide below the CMSSW parameter-set configuration files for both of these cases.
1) CMSSW configuration file to process an existing dataset
This section shows the CMSSW parameter-set configuration file that we will use in this tutorial when running over an existing dataset (either MC or Data, either CMS official or user-produced). We call it pset_tutorial_analysis.py.
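A minimal sketch of what such a configuration could look like is given below (the process, module and path names are illustrative choices, picked to be consistent with the cmsRun output shown later in this page):
import FWCore.ParameterSet.Config as cms

process = cms.Process('Slim')

process.load('FWCore.MessageService.MessageLogger_cfi')

# Process at most 10 events; useful for quick interactive tests
# (CRAB removes this restriction in the submitted jobs).
process.maxEvents = cms.untracked.PSet(input = cms.untracked.int32(10))

# No input files are listed here: when running with CRAB, the input dataset
# is specified in the CRAB configuration file.
process.source = cms.Source('PoolSource', fileNames = cms.untracked.vstring())

# Keep only the recoTracks branches in the output file output.root.
process.output = cms.OutputModule('PoolOutputModule',
    outputCommands = cms.untracked.vstring('drop *', 'keep recoTracks_*_*_*'),
    fileName = cms.untracked.string('output.root')
)

process.out = cms.EndPath(process.output)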
This analysis code will produce an EDM output file called output.root containing only the recoTracks_*_*_* branches for a maximum of 10 events in the input dataset.
Note: The input dataset is not yet specified in this file. When running the analysis with CRAB, the input dataset is specified in the CRAB configuration file.
Note: The maxEvents parameter is there to allow quick interactive testing; it is removed by CRAB in the submitted jobs. A restriction on the number of events to analyze can be set via the CRAB configuration parameter Data.totalUnits (strictly speaking the limit is not on number of events, but on number of files or luminosity sections).
2) CMSSW configuration file to generate MC events
In this section we provide an example of a CMSSW parameter-set configuration file to generate minimum bias events with the Pythia MC generator. We call it pset_tutorial_MC_generation.py. Using CRAB to generate MC events requires some special settings in the CRAB configuration file, as we will show in section Running CRAB to generate MC events.
# Auto generated configuration file
# using:
# Revision: 1.19
# Source: /local/reps/CMSSW/CMSSW/Configuration/Applications/python/ConfigBuilder.py,v
# with command line options: MinBias_8TeV_cfi --conditions auto:startup -s GEN,SIM --datatier GEN-SIM -n 10
# --relval 9000,300 --eventcontent RAWSIM --io MinBias.io --python MinBias.py --no_exec --fileout minbias.root
import FWCore.ParameterSet.Config as cms
process = cms.Process('SIM')
# Import of standard configurations
process.load('Configuration.StandardSequences.Services_cff')
process.load('SimGeneral.HepPDTESSource.pythiapdt_cfi')
process.load('FWCore.MessageService.MessageLogger_cfi')
process.load('Configuration.EventContent.EventContent_cff')
process.load('SimGeneral.MixingModule.mixNoPU_cfi')
process.load('Configuration.StandardSequences.GeometryRecoDB_cff')
process.load('Configuration.Geometry.GeometrySimDB_cff')
process.load('Configuration.StandardSequences.MagneticField_38T_cff')
process.load('Configuration.StandardSequences.Generator_cff')
process.load('IOMC.EventVertexGenerators.VtxSmearedRealistic8TeVCollision_cfi')
process.load('GeneratorInterface.Core.genFilterSummary_cff')
process.load('Configuration.StandardSequences.SimIdeal_cff')
process.load('Configuration.StandardSequences.EndOfProcess_cff')
process.load('Configuration.StandardSequences.FrontierConditions_GlobalTag_cff')
process.maxEvents = cms.untracked.PSet(
    input = cms.untracked.int32(10)
)

# Input source
process.source = cms.Source("EmptySource")

process.options = cms.untracked.PSet(
)

# Production Info
process.configurationMetadata = cms.untracked.PSet(
    version = cms.untracked.string('$Revision: 1.19 $'),
    annotation = cms.untracked.string('MinBias_8TeV_cfi nevts:10'),
    name = cms.untracked.string('Applications')
)

# Output definition
process.RAWSIMoutput = cms.OutputModule("PoolOutputModule",
    splitLevel = cms.untracked.int32(0),
    eventAutoFlushCompressedSize = cms.untracked.int32(5242880),
    outputCommands = process.RAWSIMEventContent.outputCommands,
    fileName = cms.untracked.string('minbias.root'),
    dataset = cms.untracked.PSet(
        filterName = cms.untracked.string(''),
        dataTier = cms.untracked.string('GEN-SIM')
    ),
    SelectEvents = cms.untracked.PSet(
        SelectEvents = cms.vstring('generation_step')
    )
)
# Additional output definition
# Other statements
process.genstepfilter.triggerConditions=cms.vstring("generation_step")
from Configuration.AlCa.GlobalTag import GlobalTag
process.GlobalTag = GlobalTag(process.GlobalTag, 'auto:startup', '')
process.generator = cms.EDFilter("Pythia6GeneratorFilter",
    pythiaPylistVerbosity = cms.untracked.int32(0),
    filterEfficiency = cms.untracked.double(1.0),
    pythiaHepMCVerbosity = cms.untracked.bool(False),
    comEnergy = cms.double(8000.0),
    maxEventsToPrint = cms.untracked.int32(0),
    PythiaParameters = cms.PSet(
        pythiaUESettings = cms.vstring('MSTU(21)=1 ! Check on possible errors during program execution',
            'MSTJ(22)=2 ! Decay those unstable particles',
            'PARJ(71)=10 . ! for which ctau 10 mm',
            'MSTP(33)=0 ! no K factors in hard cross sections',
            'MSTP(2)=1 ! which order running alphaS',
            'MSTP(51)=10042 ! structure function chosen (external PDF CTEQ6L1)',
            'MSTP(52)=2 ! work with LHAPDF',
            'PARP(82)=1.921 ! pt cutoff for multiparton interactions',
            'PARP(89)=1800. ! sqrts for which PARP82 is set',
            'PARP(90)=0.227 ! Multiple interactions: rescaling power',
            'MSTP(95)=6 ! CR (color reconnection parameters)',
            'PARP(77)=1.016 ! CR',
            'PARP(78)=0.538 ! CR',
            'PARP(80)=0.1 ! Prob. colored parton from BBR',
            'PARP(83)=0.356 ! Multiple interactions: matter distribution parameter',
            'PARP(84)=0.651 ! Multiple interactions: matter distribution parameter',
            'PARP(62)=1.025 ! ISR cutoff',
            'MSTP(91)=1 ! Gaussian primordial kT',
            'PARP(93)=10.0 ! primordial kT-max',
            'MSTP(81)=21 ! multiple parton interactions 1 is Pythia default',
            'MSTP(82)=4 ! Defines the multi-parton model'),
        processParameters = cms.vstring('MSEL=0 ! User defined processes',
            'MSUB(11)=1 ! Min bias process',
            'MSUB(12)=1 ! Min bias process',
            'MSUB(13)=1 ! Min bias process',
            'MSUB(28)=1 ! Min bias process',
            'MSUB(53)=1 ! Min bias process',
            'MSUB(68)=1 ! Min bias process',
            'MSUB(92)=1 ! Min bias process, single diffractive',
            'MSUB(93)=1 ! Min bias process, single diffractive',
            'MSUB(94)=1 ! Min bias process, double diffractive',
            'MSUB(95)=1 ! Min bias process'),
        parameterSets = cms.vstring('pythiaUESettings',
            'processParameters')
    )
)
# Path and EndPath definitions
process.generation_step = cms.Path(process.pgen)
process.simulation_step = cms.Path(process.psim)
process.genfiltersummary_step = cms.EndPath(process.genFilterSummary)
process.endjob_step = cms.EndPath(process.endOfProcess)
process.RAWSIMoutput_step = cms.EndPath(process.RAWSIMoutput)
# Schedule definition
process.schedule = cms.Schedule(
    process.generation_step,
    process.genfiltersummary_step,
    process.simulation_step,
    process.endjob_step,
    process.RAWSIMoutput_step
)

# Filter all path with the production filter sequence
for path in process.paths:
    getattr(process,path)._seq = process.generator * getattr(process,path)._seq
This MC generation code will produce an EDM output file called minbias.root with the content of a GEN-SIM data tier for 10 generated events.
Note: The maxEvents parameter is there to allow quick interactive testing; it is removed by CRAB in the submitted jobs and its functionality is replaced by a corresponding CRAB parameter called Data.totalUnits.
Input dataset
In order to run an analysis over a given dataset, one has to find out the corresponding dataset name and put it in the CRAB configuration file.
In general, either someone will tell you which dataset to use or you will use the DAS web interface to find available datasets. To learn how to use DAS, the reader can refer to the CMS Offline WorkBook - Chapter 5.4 and/or read the more complete documentation linked from the DAS web interface. For this tutorial, we will use the datasets pointed out above. Below are screenshots of corresponding DAS query outputs for these datasets, where one can see that:
The /WJetsToLNu_1J_TuneCP5_13TeV-amcatnloFXFX-pythia8/RunIISummer20UL17RECO-106X_mc2017_realistic_v6-v1/AODSIM MC dataset has 1 block, 177 files and 300K events.
The /DoubleMuon/Run2016C-21Feb2020_UL2016_HIPM-v1/AOD dataset has 14 blocks, 4294 files, 60285 luminosity sections and ~59.5M events.
Screenshot: DAS query for dataset /WJetsToLNu_1J_TuneCP5_13TeV-amcatnloFXFX-pythia8/RunIISummer20UL17RECO-106X_mc2017_realistic_v6-v1/AODSIM.
Screenshot: DAS query for dataset /DoubleMuon/Run2016C-21Feb2020_UL2016_HIPM-v1/AOD.
Note: Datasets availability at sites changes with time. If you are trying to follow this tutorial after the date it was given, please check that the datasets are still available.
Note: The number of events shown in DAS when doing a simple query like the ones in the screenshots above includes events in INVALID files. On the other hand, CRAB will only analyze VALID files. The list of VALID files in the dataset is shown when clicking on the link "Files" (which is the same as doing the query file dataset=<dataset-name>). To obtain the number of events in VALID files, one can do the following query: file dataset=<dataset-name> | sum(file.nevents).
Running CMSSW code locally
Before submitting jobs to the Grid, it is a good practice to run the CMSSW code locally over a few events to discard problems not related with CRAB.
To run an analysis CMSSW code, one needs to specify an input dataset file directly in the CMSSW parameter-set configuration file (specifying as well how to access/open the file). One could either copy one file of the remote input dataset to a local machine, or, more conveniently, open the file remotely. For both things the recommended tool is the Xrootd service (please refer to the CMS Offline WorkBook - Chapter 5.13 to learn the basics about how to use Xrootd). We will choose to open a file remotely. In any case, one first has to find out the LFN of such a file. We used DAS to find the files in the dataset. The screenshot below shows the DAS web interface with the query we did for the MC dataset we are interested in and the beginning of the result we got from the query with one of the many files contained in the dataset.
Screenshot: DAS query for files in dataset /WJetsToLNu_1J_TuneCP5_13TeV-amcatnloFXFX-pythia8/RunIISummer20UL17RECO-106X_mc2017_realistic_v6-v1/AODSIM.
Now that we know the LFN of one file in the dataset, we proceed as follows. In the CMSSW parameter-set configuration file pset_tutorial_analysis.py, point the input source to this file, opened remotely via Xrootd, as sketched below.
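Assuming the slimming sketch shown earlier, the change amounts to listing the file in the PoolSource (the LFN below is the one appearing in the example output that follows):
process.source = cms.Source('PoolSource',
    fileNames = cms.untracked.vstring(
        'root://cms-xrd-global.cern.ch///store/mc/HC/GenericTTbar/GEN-SIM-RECO/CMSSW_5_3_1_START53_V5-v1/0010/00CE4E7C-DAAD-E111-BA36-0025B32034EA.root'
    )
)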
Notice that we added the string root://cms-xrd-global.cern.ch// before the LFN. The first part, root:, specifies that the file should be opened with ROOT, // is a separator and the string cms-xrd-global.cern.ch specifies to use the Xrootd service with a particular "redirector".
Note: When running CMSSW code locally, we suggest doing it in a separate (fresh) shell, where the user sets up the Grid and CMSSW environments but skips the CRAB setup. This avoids the CRAB environment interfering with the CMSSW environment.
Now we run the analysis locally:
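cmsRun pset_tutorial_analysis.py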
04-May-2015 16:42:20 CEST Initiating request to open file root://cms-xrd-global.cern.ch///store/mc/HC/GenericTTbar/GEN-SIM-RECO/CMSSW_5_3_1_START53_V5-v1/0010/00CE4E7C-DAAD-E111-BA36-0025B32034EA.root
%MSG-w XrdAdaptor: file_open 04-May-2015 16:42:31 CEST pre-events
Data is served from ihep.ac.cn instead of original site CERN-PROD
%MSG
04-May-2015 16:42:49 CEST Successfully opened file root://cms-xrd-global.cern.ch///store/mc/HC/GenericTTbar/GEN-SIM-RECO/CMSSW_5_3_1_START53_V5-v1/0010/00CE4E7C-DAAD-E111-BA36-0025B32034EA.root
[2015-05-04 16:42:49.950141 +0200][Error ][XRootD ] [cmsdbs.ihep.ac.cn:1094] Handling error while processing kXR_open (file: /store/mc/HC/GenericTTbar/GEN-SIM-RECO/CMSSW_5_3_1_START53_V5-v1/0010/00CE4E7C-DAAD-E111-BA36-0025B32034EA.root?tried=seadmin.ihep.ac.cn, mode: 0660, flags: kXR_open_read kXR_async kXR_retstat ): [ERROR] Error response.
Begin processing the 1st record. Run 1, Event 74951, LumiSection 668165 at 04-May-2015 16:43:12.731 CEST
Begin processing the 2nd record. Run 1, Event 74952, LumiSection 668165 at 04-May-2015 16:43:12.745 CEST
Begin processing the 3rd record. Run 1, Event 74953, LumiSection 668165 at 04-May-2015 16:43:12.752 CEST
Begin processing the 4th record. Run 1, Event 74954, LumiSection 668165 at 04-May-2015 16:43:12.762 CEST
Begin processing the 5th record. Run 1, Event 74955, LumiSection 668165 at 04-May-2015 16:43:12.770 CEST
Begin processing the 6th record. Run 1, Event 74956, LumiSection 668165 at 04-May-2015 16:43:12.777 CEST
Begin processing the 7th record. Run 1, Event 74957, LumiSection 668165 at 04-May-2015 16:43:12.784 CEST
Begin processing the 8th record. Run 1, Event 74958, LumiSection 668165 at 04-May-2015 16:43:12.793 CEST
Begin processing the 9th record. Run 1, Event 74959, LumiSection 668165 at 04-May-2015 16:43:12.802 CEST
Begin processing the 10th record. Run 1, Event 74960, LumiSection 668165 at 04-May-2015 16:43:12.808 CEST
04-May-2015 16:43:12 CEST Closed file root://cms-xrd-global.cern.ch///store/mc/HC/GenericTTbar/GEN-SIM-RECO/CMSSW_5_3_1_START53_V5-v1/0010/00CE4E7C-DAAD-E111-BA36-0025B32034EA.root
TrigReport ---------- Event Summary ------------
TrigReport Events total = 10 passed = 10 failed = 0
TrigReport ---------- Path Summary ------------
TrigReport Trig Bit# Executed Passed Failed Error Name
TrigReport -------End-Path Summary ------------
TrigReport Trig Bit# Executed Passed Failed Error Name
TrigReport 0 0 10 10 0 0 out
TrigReport ------ Modules in End-Path: out ------------
TrigReport Trig Bit# Visited Passed Failed Error Name
TrigReport 0 0 10 10 0 0 output
TrigReport ---------- Module Summary ------------
TrigReport Visited Executed Passed Failed Error Name
TrigReport 10 10 10 0 0 output
TimeReport ---------- Event Summary ---[sec]----
TimeReport event loop CPU/event = 0.098742
TimeReport event loop Real/event = 1.006391
TimeReport sum Streams Real/event = 0.825495
TimeReport efficiency CPU/Real/thread = 0.098115
TimeReport ---------- Path Summary ---[Real sec]----
TimeReport per event per exec Name
TimeReport per event per exec Name
TimeReport -------End-Path Summary ---[Real sec]----
TimeReport per event per exec Name
TimeReport 0.008074 0.008074 out
TimeReport per event per exec Name
TimeReport ------ Modules in End-Path: out ---[Real sec]----
TimeReport per event per visit Name
TimeReport 0.003582 0.003582 output
TimeReport per event per visit Name
TimeReport ---------- Module Summary ---[Real sec]----
TimeReport per event per exec per visit Name
TimeReport 0.003582 0.003582 0.003582 output
TimeReport per event per exec per visit Name
T---Report end!
=============================================
MessageLogger Summary
type category sev module subroutine count total
---- -------------------- -- ---------------- ---------------- ----- -----
1 XrdAdaptor -w file_open 1 1
2 fileAction -s file_close 1 1
3 fileAction -s file_open 2 2
type category Examples: run/evt run/evt run/evt
---- -------------------- ---------------- ---------------- ----------------
1 XrdAdaptor pre-events
2 fileAction PostEndRun
3 fileAction pre-events pre-events
Severity # Occurrences Total Occurrences
-------- ------------- -----------------
Warning 1 1
System 3 3
A file output.root should have been created in the running directory. Using a TBrowser to inspect its content, it should be as shown in the screenshot below:
In the case of MC generation there is no need to specify an input dataset and so running the CMSSW code locally requires no additional care, except to remember to set the maxEvents parameter to some small number. In our case we already took care of that and so we just run the pset_tutorial_MC_generation.py code as it is:
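cmsRun pset_tutorial_MC_generation.py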
MSTU(12) changed from 0 to 12345
1****************** PYINIT: initialization of PYTHIA routines *****************
==== LHAPDF6 USING PYTHIA-TYPE LHAGLUE INTERFACE ====
LHAPDF 6.1.5 loading /cvmfs/cms.cern.ch/slc7_amd64_gcc700/external/lhapdf/6.1.5-eccfad/share/LHAPDF/cteq6l1/cteq6l1_0000.dat
cteq6l1 PDF set, member #0, version 4; LHAPDF ID = 10042
==============================================================================
I I
I PYTHIA will be initialized for a p on p collider I
I at 8000.000 GeV center-of-mass energy I
I I
==============================================================================
******** PYMAXI: summary of differential cross-section maximum search ********
==========================================================
I I I
I ISUB Subprocess name I Maximum value I
I I I
==========================================================
I I I
I 92 Single diffractive (XB) I 6.9028D+00 I
I 93 Single diffractive (AX) I 6.9028D+00 I
I 94 Double diffractive I 9.4556D+00 I
I 95 Low-pT scattering I 4.9586D+01 I
I 96 Semihard QCD 2 -> 2 I 8.0683D+03 I
I I I
==========================================================
****** PYMULT: initialization of multiple interactions for MSTP(82) = 4 ******
pT0 = 2.70 GeV gives sigma(parton-parton) = 4.44D+02 mb: accepted
****** PYMIGN: initialization of multiple interactions for MSTP(82) = 4 ******
pT0 = 2.70 GeV gives sigma(parton-parton) = 1.84D+02 mb: accepted
********************** PYINIT: initialization completed **********************
Begin processing the 1st record. Run 1, Event 1, LumiSection 1 at 04-May-2015 16:48:59.474 CEST
Begin processing the 2nd record. Run 1, Event 2, LumiSection 1 at 04-May-2015 16:49:08.317 CEST
Begin processing the 3rd record. Run 1, Event 3, LumiSection 1 at 04-May-2015 16:49:09.541 CEST
Begin processing the 4th record. Run 1, Event 4, LumiSection 1 at 04-May-2015 16:49:30.805 CEST
Begin processing the 5th record. Run 1, Event 5, LumiSection 1 at 04-May-2015 16:49:55.305 CEST
Begin processing the 6th record. Run 1, Event 6, LumiSection 1 at 04-May-2015 16:49:57.733 CEST
Begin processing the 7th record. Run 1, Event 7, LumiSection 1 at 04-May-2015 16:49:58.405 CEST
Begin processing the 8th record. Run 1, Event 8, LumiSection 1 at 04-May-2015 16:50:03.842 CEST
Begin processing the 9th record. Run 1, Event 9, LumiSection 1 at 04-May-2015 16:50:09.901 CEST
Begin processing the 10th record. Run 1, Event 10, LumiSection 1 at 04-May-2015 16:50:26.191 CEST
1********* PYSTAT: Statistics on Number of Events and Cross-sections *********
==============================================================================
I I I I
I Subprocess I Number of points I Sigma I
I I I I
I----------------------------------I----------------------------I (mb) I
I I I I
I N:o Type I Generated Tried I I
I I I I
==============================================================================
I I I I
I 0 All included subprocesses I 10 274 I 6.594D+01 I
I 11 f + f' -> f + f' (QCD) I 2 0 I 1.417D+01 I
I 12 f + fbar -> f' + fbar' I 0 0 I 0.000D+00 I
I 13 f + fbar -> g + g I 0 0 I 0.000D+00 I
I 28 f + g -> f + g I 0 0 I 0.000D+00 I
I 53 g + g -> f + fbar I 1 0 I 7.084D+00 I
I 68 g + g -> g + g I 4 0 I 2.833D+01 I
I 92 Single diffractive (XB) I 2 2 I 6.903D+00 I
I 93 Single diffractive (AX) I 0 0 I 0.000D+00 I
I 94 Double diffractive I 1 1 I 9.456D+00 I
I 95 Low-pT scattering I 0 7 I 0.000D+00 I
I I I I
==============================================================================
********* Total number of errors, excluding junctions = 0 *************
********* Total number of errors, including junctions = 0 *************
********* Total number of warnings = 0 *************
********* Fraction of events that fail fragmentation cuts = 0.00000 *********
1********* PYSTAT: Statistics on Number of Events and Cross-sections *********
==============================================================================
I I I I
I Subprocess I Number of points I Sigma I
I I I I
I----------------------------------I----------------------------I (mb) I
I I I I
I N:o Type I Generated Tried I I
I I I I
==============================================================================
I I I I
I 0 All included subprocesses I 10 274 I 6.594D+01 I
I 11 f + f' -> f + f' (QCD) I 2 0 I 1.417D+01 I
I 12 f + fbar -> f' + fbar' I 0 0 I 0.000D+00 I
I 13 f + fbar -> g + g I 0 0 I 0.000D+00 I
I 28 f + g -> f + g I 0 0 I 0.000D+00 I
I 53 g + g -> f + fbar I 1 0 I 7.084D+00 I
I 68 g + g -> g + g I 4 0 I 2.833D+01 I
I 92 Single diffractive (XB) I 2 2 I 6.903D+00 I
I 93 Single diffractive (AX) I 0 0 I 0.000D+00 I
I 94 Double diffractive I 1 1 I 9.456D+00 I
I 95 Low-pT scattering I 0 7 I 0.000D+00 I
I I I I
==============================================================================
********* Total number of errors, excluding junctions = 0 *************
********* Total number of errors, including junctions = 0 *************
********* Total number of warnings = 0 *************
********* Fraction of events that fail fragmentation cuts = 0.00000 *********
------------------------------------
GenXsecAnalyzer:
------------------------------------
Before Filtrer: total cross section = 6.594e+10 +- 0.000e+00 pb
Filter efficiency (taking into account weights)= (10) / (10) = 1.000e+00 +- 0.000e+00
Filter efficiency (event-level)= (10) / (10) = 1.000e+00 +- 0.000e+00
After filter: final cross section = 6.594e+10 +- 0.000e+00 pb
=============================================
MessageLogger Summary
type category sev module subroutine count total
---- -------------------- -- ---------------- ---------------- ----- -----
1 GenXSecAnalyzer -w GenXSecAnalyzer: 5 5
type category Examples: run/evt run/evt run/evt
---- -------------------- ---------------- ---------------- ----------------
1 GenXSecAnalyzer PostEndRun PostEndRun PostEndRun
Severity # Occurrences Total Occurrences
-------- ------------- -----------------
Warning 5 5
A file minbias.root should have been created in the running directory. Using a TBrowser to inspect its content, it should look like the screenshot below:
The documentation on how to write a CRAB configuration file, with detailed explanation about the available configuration parameters, is in the CRAB configuration file page.
CRAB configuration file examples
There are three different general use cases of CRAB configuration files that we want to show in this tutorial: 1) running an analysis on MC, 2) running an analysis on Data, and 3) generating MC events. In the following we give an example of a basic configuration file for each of these cases and we will use them later to run the tutorial. But keep in mind that while running the tutorial we may want to change the configuration files a bit.
For simplicity, we omit from these examples the initial lines, which are the same in all cases (see CRAB configuration file):
import CRABClient
from CRABClient.UserUtilities import config
1) CRAB configuration file to run on MC
Here we give an example CRAB configuration file for running the pset_tutorial_analysis.py analysis on the MC dataset we have chosen. We name it crabConfig_tutorial_MC_analysis.py.
config = config()
config.General.requestName = 'tutorial_Aug2021_MC_analysis'
config.General.workArea = 'crab_projects'
config.General.transferOutputs = True
config.JobType.pluginName = 'Analysis'
config.JobType.psetName = 'pset_tutorial_analysis.py'
config.Data.inputDataset = '/WJetsToLNu_1J_TuneCP5_13TeV-amcatnloFXFX-pythia8/RunIISummer20UL17RECO-106X_mc2017_realistic_v6-v1/AODSIM'
config.Data.inputDBS = 'global'
config.Data.splitting = 'FileBased'
config.Data.unitsPerJob = 10
config.Data.publication = True
config.Data.outputDatasetTag = 'CRAB3_tutorial_Aug2021_MC_analysis'
config.Site.storageSite = <site where the user has permission to write>
Note: We have left the parameter Site.storageSite unspecified on purpose, because it depends on where one has permissions to write. See Prerequisites to run the tutorial.
2) CRAB configuration file to run on Data
Here we give the same example CRAB configuration file as above, but set up for running on the Data dataset we have chosen. We name it crabConfig_tutorial_Data_analysis.py.
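Since the file is analogous to the MC one, a sketch of what it could look like is given below (the splitting mode, number of units per job, output dataset tag and exact lumi-mask URL are illustrative assumptions; the lumi-mask and run range are discussed in section Using a lumi-mask):
config = config()
config.General.requestName = 'tutorial_Aug2021_Data_analysis'
config.General.workArea = 'crab_projects'
config.General.transferOutputs = True
config.JobType.pluginName = 'Analysis'
config.JobType.psetName = 'pset_tutorial_analysis.py'
config.Data.inputDataset = '/DoubleMuon/Run2016C-21Feb2020_UL2016_HIPM-v1/AOD'
config.Data.inputDBS = 'global'
config.Data.splitting = 'LumiBased'
config.Data.unitsPerJob = 20
config.Data.lumiMask = 'https://cms-service-dqmdc.web.cern.ch/CAF/certification/Collisions16/13TeV/Cert_271036-275783_13TeV_PromptReco_Collisions16_JSON.txt'
config.Data.runRange = '275776-275782'
config.Data.publication = True
config.Data.outputDatasetTag = 'CRAB3_tutorial_Aug2021_Data_analysis'
config.Site.storageSite = <site where the user has permission to write>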
Note: We have left the parameter Site.storageSite unspecified on purpose, because it depends on where one has permissions to write. See Prerequisites to run the tutorial.
3) CRAB configuration file to generate MC events
Finally, here is an example CRAB configuration file to run the pset_tutorial_MC_generation.py MC event generation code. We name it crabConfig_tutorial_MC_generation.py.
config = config()
config.General.requestName = 'tutorial_Aug2021_MC_generation'
config.General.workArea = 'crab_projects'
config.General.transferOutputs = True
config.JobType.pluginName = 'PrivateMC'
config.JobType.psetName = 'pset_tutorial_MC_generation.py'
config.Data.outputPrimaryDataset = 'MinBias'
config.Data.splitting = 'EventBased'
config.Data.unitsPerJob = 10
NJOBS = 10 # This is not a configuration parameter, but an auxiliary variable that we use in the next line.
config.Data.totalUnits = config.Data.unitsPerJob * NJOBS
config.Data.publication = True
config.Data.outputDatasetTag = 'CRAB3_tutorial_Aug2021_MC_generation'
config.Site.storageSite = <site where the user has permission to write>
Note: We have left the parameter Site.storageSite unspecified on purpose, because it depends on where one has permissions to write. See Prerequisites to run the tutorial.
CRAB commands
The documentation of the CRAB commands, including some usage examples, is in the CRAB commands page.
Running CMSSW analysis with CRAB on MC
In this section, we intend to show how to run an analysis on MC data. We use the CRAB configuration file crabConfig_tutorial_MC_analysis.py previously defined in section CRAB configuration file to run on MC and the CMSSW parameter-set configuration file defined in section CMSSW configuration file to process an existing dataset. We will go through the basic steps described in the following subsections.
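Task submission
To submit a task, execute the following CRAB command:
crab submit [--config/-c <CRAB-configuration-file>]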
where the specification of the CRAB configuration file is only necessary if it is different than ./crabConfig.py.
In our case, we run:
crab submit -c crabConfig_tutorial_MC_analysis.py
and should get an output similar to this:
Will use CRAB configuration file crabConfig_tutorial_MC_analysis.py
Sending the request to the server
Success: Your task has been delivered to the CRAB3 server.
Task name: 150506_134232:atanasi_crab_tutorial_Aug2021_MC_analysis
Please use 'crab status' to check how the submission process proceeds.
Log file is /afs/cern.ch/user/a/atanasi/CRAB3-tutorial/CMSSW_10_6_18/src/crab_projects/crab_tutorial_Aug2021_MC_analysis/crab.log
CRAB performs a sanity check before even creating the jobs (e.g. it checks the availability of the selected dataset, the correctness of input parameters, etc.). If an error is detected, the submission will not proceed and the user should get a self-explanatory message.
Task status
To check the status of a task, execute the following CRAB command:
crab status --dir/-d <CRAB-project-directory>
In our case, we run:
crab status -d crab_projects/crab_tutorial_Aug2021_MC_analysis
The crab status command will produce an output containing the CRAB project directory, the task name, the status of the task as a whole, the details of how many jobs are in which state (submitted, running, transferring, finished, cooloff, etc.) and the location of the CRAB log (crab.log) file. It will also print the URLs of two web pages that one can use to monitor the jobs. In summary, it should look something like this:
CRAB project directory: /afs/cern.ch/user/a/atanasi/CRAB3-tutorial/CMSSW_10_6_18/src/crab_projects/crab_tutorial_Aug2021_MC_analysis
Task name: 150506_134232:atanasi_crab_tutorial_Aug2021_MC_analysis
Task status: SUBMITTED
Dashboard monitoring URL: http://dashb-cms-job.cern.ch/dashboard/templates/task-analysis/#user=crab&refresh=0&table=Jobs&p=1&records=25&activemenu=2&status=&site=&tid=150506_134232%3Aatanasi_crab_tutorial_Aug2021_MC_analysis
Details: idle 44.4% ( 8/18)
running 55.6% (10/18)
No publication information available yet
Log file is /afs/cern.ch/user/a/atanasi/CRAB3-tutorial/CMSSW_10_6_18/src/crab_projects/crab_tutorial_Aug2021_MC_analysis/crab.log
One can also get a more detailed status report (showing the state of each job, the job number, the site where the job ran, etc.), by adding the --long option to the crab status command. For our task:
crab status -d crab_projects/crab_tutorial_Aug2021_MC_analysis --long
Note: Notice that the task has 18 jobs, which is expected, since the input dataset has 177 files and we have set in the CRAB configuration file the splitting mode as 'FileBased' and with 10 units (in this case units means files) per job.
Job states: The job state idle means that the job has been submitted, but is not yet running. On the other hand, the job state cooloff means that the server has not yet submitted the job for the first time or that the job is waiting for resubmission. For a complete list and explanation of task and job states, please refer to Task and Node States in CRAB3-HTCondor.
Note: If a job fails and the failure reason is considered "recoverable", CRAB will automatically resubmit the job. Thus, it may well happen that one first sees that all jobs are either in running or transferring state, and later one sees jobs in idle or cooloff state. These are jobs that have been automatically resubmitted. The number of resubmissions is shown under the column "Retries".
Once all jobs are done, it may happen that some jobs have actually failed. CRAB allows the user to manually resubmit a task (see Task resubmission), which will actually resubmit only the failed jobs in the task. Eventually, all jobs will succeed and the crab status output should report all jobs as finished.
There are two independent web services that one can use to monitor CRAB jobs:
Dashboard: The job states reported by this service are not necessarily consistent with the report provided by the crab status command. This is because Dashboard doesn't poll for information, but relies on services sending information to it, and this information might be sent only at certain stages or could even sometimes be lost while transmitting over the network.
Note: The crab status output will display the link to the dashboard monitoring page for the task in question even if the task is still unknown to dashboard. The user will get a corresponding error when trying to access the link; he/she should just wait a bit until the task becomes known to the service.
Task resubmission
CRAB allows the user to resubmit a task, which will actually resubmit only the failed jobs in the task. The resubmission command is as follows:
crab resubmit --dir/-d <CRAB-project-directory>
Using the option --jobids one can specify a selected list of jobs or ranges of jobs (using the format <jobidA>,<jobidB>-<jobidC>, etc.):
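For example (the job IDs below are illustrative):
crab resubmit --jobids 5,10-15 -d <CRAB-project-directory>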
In the list of jobs, one has to provide the job number as specified by the crab status --long command under the column "Job".
After resubmission, one should check again the status of the task. For a big task, it is expected that one would have to run a few resubmissions by hand until all jobs finish successfully.
Task report
One can obtain a short report about a task, containing the total number of events and files processed and written by completed jobs, plus a summary file of the runs and luminosity sections processed by completed jobs written into the results subdirectory of the CRAB project directory. To get the report, execute the following CRAB command:
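crab report --dir/-d <CRAB-project-directory>
In our case, we run:
crab report -d crab_projects/crab_tutorial_Aug2021_MC_analysis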
If all jobs have completed, it should produce an output like this:
177 files have been processed
300000 events have been read
300000 events have been written
Analyzed luminosity sections written to /afs/cern.ch/user/a/atanasi/CRAB3-tutorial/CMSSW_10_6_18/src/crab_projects/crab_tutorial_Aug2021_MC_analysis/results/lumiSummary.json
Log file is /afs/cern.ch/user/a/atanasi/CRAB3-tutorial/CMSSW_10_6_18/src/crab_projects/crab_tutorial_Aug2021_MC_analysis/crab.log
And the lumiSummary.json file looks something like this:
{"1": [[666666, 672665]]}
This is a sequence of luminosity section ranges processed for each run. (MC datasets typically have only one run, which is given run number 1, and the luminosity sections are contiguous.)
Task log files retrieval
The dashboard monitoring web interface provides links to log files. Of these log files, there are essentially two (per job) that are most relevant (from a CRAB point of view): the one known as the "job" log (job_out.<job-number>.<job-retry-count>.txt) and the one known as the "postjob" log (postjob.<job-number>.<job-retry-count>.txt). These two files contain, respectively, log information from the running job itself, including the CRAB and cmsRun logs, and from the post-processing (essentially the stage-out part). These files are located in the user's home directory on the scheduler machine that submitted the jobs to the Grid. To avoid filling up the scheduler machines with information that is not relevant for CRAB developers and support crew, the cmsRun part of the job log is restricted to the first 1000 and last 3000 lines. On the other hand, the full cmsRun stdout and stderr log files are available in the storage if General.transferLogs = True was set in the CRAB configuration file when the task was submitted. In that case the full cmsRun logs can be retrieved using the following CRAB command:
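crab getlog --dir/-d <CRAB-project-directory> [--jobids <list-of-job-ids>]
In our case, to retrieve the logs of jobs 1 and 5 (the job IDs are chosen here to match the example output shown below), one would run:
crab getlog -d crab_projects/crab_tutorial_Aug2021_MC_analysis --jobids 1,5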
Setting the destination to /afs/cern.ch/user/a/atanasi/CRAB3-tutorial/CMSSW_10_6_18/src/crab_projects/crab_tutorial_Aug2021_MC_analysis/results
Retrieving 2 files
Will use `gfal-copy` command for file transfers
Placing file 'cmsRun_1.log.tar.gz' in retrieval queue
Retrieving cmsRun_1.log.tar.gz
Placing file 'cmsRun_5.log.tar.gz' in retrieval queue
Please wait
Retrieving cmsRun_5.log.tar.gz
Success: Success in retrieving cmsRun_5.log.tar.gz
Success: Success in retrieving cmsRun_1.log.tar.gz
Success: All files successfully retrieved
Log file is /afs/cern.ch/user/a/atanasi/CRAB3-tutorial/CMSSW_10_6_18/src/crab_projects/crab_tutorial_Aug2021_MC_analysis/crab.log
The log files (assembled in zipped tarballs) are copied into the results subdirectory of the corresponding CRAB project directory. To unzip and extract the log files, one can use the command tar -zxvf. In our case, we do:
cd crab_projects/crab_tutorial_Aug2021_MC_analysis/results
tar -zxvf cmsRun_1.log.tar.gz
The screen output shows that cmsRun_1.log.tar.gz has three files in it: the cmsRun stdout and stderr log files and the framework job report file produced by cmsRun.
Note: To copy the remote files to the local area, CRAB uses the gfal-copy command (or lcg-cp if gfal-copy is not available). For example: env -i gfal-copy -v -T 1800 -t 60 <PFN-of-source-file> <PFN-of-destination-file>
Task output retrieval
In case one wants to retrieve some output ROOT files of a task, one can do so with the following CRAB command:
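crab getoutput --dir/-d <CRAB-project-directory> [--jobids <list-of-job-ids>]
For example, in our case the command
crab getoutput -d crab_projects/crab_tutorial_Aug2021_MC_analysis --jobids 11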
retrieves the output ROOT file from job number 11 in the task (i.e. it retrieves the file output_11.root). This command produces a screen output like this:
Setting the destination to /afs/cern.ch/user/a/atanasi/CRAB3-tutorial/CMSSW_10_6_18/src/crab_projects/crab_tutorial_Aug2021_MC_analysis/results
Retrieving 1 files
Will use `gfal-copy` command for file transfers
Placing file 'output_11.root' in retrieval queue
Please wait
Retrieving output_11.root
Success: Success in retrieving output_11.root
Success: All files successfully retrieved
Log file is /afs/cern.ch/user/a/atanasi/CRAB3-tutorial/CMSSW_10_6_18/src/crab_projects/crab_tutorial_Aug2021_MC_analysis/crab.log
We can open the file and check for example that it contains only the branches we have specified.
Output dataset publication
If publication was not disabled in the CRAB configuration file, CRAB will automatically publish the task output dataset in DBS (in the phys03 local instance). The publication timing logic is as follows: the first files available (whatever their number) are published immediately; then ASO waits until it accumulates another 100 files (per user), or until the task is in state COMPLETED (i.e. all jobs finished successfully, and therefore all files are published), or for a maximum of 8 hours (in which case only the successfully finished jobs are published). The publication status is shown in the crab status output:
The publication state idle means that the publication request for the corresponding job output has not yet been processed by ASO. Once ASO starts to process the request, the publication state becomes running, and finally it becomes either finished if the publication succeeded or failed if it didn't:
The publication name of our output dataset is /GenericTTbar/atanasi-CRAB3_tutorial_Aug2021_MC_analysis-37773c17ce2994cf16892d5f04945e41/USER (yours will contain your username and maybe another hash).
One can get more details about the published dataset using the DAS web interface. The URL pointing to the dataset in DAS is also given in the crab status output. One can use this direct link or query DAS using the publication dataset name as shown in the screenshot below. Notice that we are searching in DBS instance phys03.
Note: Remember, publication can not be performed if either the General.transferOutputs or the Data.publication parameter was originally disabled in the CRAB configuration file used at submission time.
Running on the published dataset
Lets assume we want to run another analysis (e.g. just another slimming) over the output dataset that we have produced (and published) in our previous example. We will use again the CRAB configuration file crabConfig_tutorial_MC_analysis.py defined in section CRAB configuration file to run on MC, but we have to do the following minimal changes:
choose another request name;
point the input dataset to our previous output dataset;
change the input DBS URL to phys03;
change the publication dataset name (even if we would turn the publication off, the publication dataset name is still used in the LFN path of the output files).
To keep things organized, we first copy the CRAB configuration file with a new name:
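cp crabConfig_tutorial_MC_analysis.py crabConfig_tutorial_MCUSER_analysis.py
In crabConfig_tutorial_MCUSER_analysis.py we then change, for example (the request name and output dataset tag shown here are illustrative choices; the input dataset is the one published above, so yours will contain your own username and hash):
config.General.requestName = 'tutorial_Aug2021_MCUSER_analysis'
config.Data.inputDataset = '/GenericTTbar/atanasi-CRAB3_tutorial_Aug2021_MC_analysis-37773c17ce2994cf16892d5f04945e41/USER'
config.Data.inputDBS = 'phys03'
config.Data.outputDatasetTag = 'CRAB3_tutorial_Aug2021_MCUSER_analysis'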
Note for users who wish to process data that were staged out to T3_US_FNALLPC: The LPC only allows local jobs to run (not Grid jobs), so you need to allow your jobs to run elsewhere (for example any site in the US) and access the data via Xrootd. In this case you need to use the following settings:
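A sketch of such settings (the site whitelist shown is illustrative):
config.Data.ignoreLocality = True
config.Site.whitelist = ['T2_US_*']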
If you want, you can change the pset_tutorial_analysis.py CMSSW parameter-set configuration file to do some further slimming, e.g. by tightening the keep statement in the output commands, as sketched below.
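For example (assuming the slimming sketch shown earlier, and an illustrative choice of track collection):
    outputCommands = cms.untracked.vstring('drop *', 'keep recoTracks_globalMuons_*_*'),
Once the changes are done, submit the task:
crab submit -c crabConfig_tutorial_MCUSER_analysis.py
The submission output should look like this: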
Will use CRAB configuration file crabConfig_tutorial_MCUSER_analysis.py
Sending the request to the server
Success: Your task has been delivered to the CRAB3 server.
Task name: 150506_191539:atanasi_crab_tutorial_Aug2021_MCUSER_analysis
Please use 'crab status' to check how the submission process proceeds.
Log file is /afs/cern.ch/user/a/atanasi/CRAB3-tutorial/CMSSW_10_6_18/src/crab_projects/crab_tutorial_Aug2021_MCUSER_analysis/crab.log
crab status -d crab_projects/crab_tutorial_Aug2021_MCUSER_analysis
CRAB project directory: /afs/cern.ch/user/a/atanasi/CRAB3-tutorial/CMSSW_10_6_18/src/crab_projects/crab_tutorial_Aug2021_MCUSER_analysis
Task name: 150506_191539:atanasi_crab_tutorial_Aug2021_MCUSER_analysis
Task status: SUBMITTED
Dashboard monitoring URL: http://dashb-cms-job.cern.ch/dashboard/templates/task-analysis/#user=crab&refresh=0&table=Jobs&p=1&records=25&activemenu=2&status=&site=&tid=150506_191539%3Aatanasi_crab_tutorial_Aug2021_MCUSER_analysis
Details: running 100.0% (2/2)
No publication information available yet
Log file is /afs/cern.ch/user/a/atanasi/CRAB3-tutorial/CMSSW_10_6_18/src/crab_projects/crab_tutorial_Aug2021_MCUSER_analysis/crab.log
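Once all jobs have finished, we can get the task report:
crab report -d crab_projects/crab_tutorial_Aug2021_MCUSER_analysis
which produces an output like this: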
18 files have been read
300000 events have been read
300000 events have been written
Analyzed lumi written to /afs/cern.ch/user/a/atanasi/CRAB3-tutorial/CMSSW_10_6_18/src/crab_projects/crab_tutorial_Aug2021_MCUSER_analysis/results/lumiSummary.json
Log file is /afs/cern.ch/user/a/atanasi/CRAB3-tutorial/CMSSW_10_6_18/src/crab_projects/crab_tutorial_Aug2021_MCUSER_analysis/crab.log
Running CMSSW analysis with CRAB on Data
In this section, we do a similar exercise as in Running CMSSW analysis with CRAB on MC, but now running on real data. We make use of the CRAB configuration file crabConfig_tutorial_Data_analysis.py defined in section CRAB configuration file to run on Data and the CMSSW parameter-set configuration file defined in section CMSSW configuration file to process an existing dataset. When running on real data, one will most likely want to use a lumi-mask file to select only the good-quality runs and luminosity sections in the input dataset. We show how to do this with CRAB.
Using a lumi-mask
When a lumi-mask is specified in the CRAB configuration file, CRAB will run the analysis only on the specified subset of runs and luminosity sections.
In this tutorial, we use the lumi-mask file Cert_271036-275783_13TeV_PromptReco_Collisions16_JSON.txt, available at the following page from the CMS Data Quality web service: https://cms-service-dqmdc.web.cern.ch/CAF/certification/Collisions16/13TeV/. This lumi-mask file contains the good luminosity sections for the CMS runs between 271036 and 275783 of the 13 TeV LHC run.
To use a lumi-mask, one has to set the CRAB configuration parameter Data.lumiMask either to the URL from which the lumi-mask file can be loaded, or to the location on disk (full or relative path) of a locally downloaded copy of the file. To download the file, one can use the wget command:
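For example (assuming the file sits directly under the certification page linked above):
wget https://cms-service-dqmdc.web.cern.ch/CAF/certification/Collisions16/13TeV/Cert_271036-275783_13TeV_PromptReco_Collisions16_JSON.txt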
Had we decided to download the file and use that local copy, we would have set in the CRAB configuration file the path (absolute or relative) to where the file is located:
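config.Data.lumiMask = 'Cert_271036-275783_13TeV_PromptReco_Collisions16_JSON.txt'
(assuming here that the file was downloaded into the directory from which crab submit is executed).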
A user may eventually be interested in running over specific runs of the dataset, even within a lumi-mask. CRAB provides the configuration parameter Data.runRange for that. We choose a run range 275776-275782, so we put in our CRAB configuration file:
config.Data.runRange = '275776-275782'
Another way would be to directly manipulate the lumi-mask file to create another one with only the runs of interest (see Doing lumi mask arithmetics).
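Task submission
We submit the task as before:
crab submit -c crabConfig_tutorial_Data_analysis.py
and should get an output similar to this: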
Will use CRAB configuration file crabConfig_tutorial_Data_analysis.py
Sending the request to the server
Success: Your task has been delivered to the CRAB3 server.
Task name: 150506_195732:atanasi_crab_tutorial_Aug2021_Data_analysis
Please use 'crab status' to check how the submission process proceeds.
Log file is /afs/cern.ch/user/a/atanasi/CRAB3-tutorial/CMSSW_10_6_18/src/crab_projects/crab_tutorial_Aug2021_Data_analysis/crab.log
Task status
After submission, we check the task status to see how many jobs are in the task and if nothing has unexpectedly failed.
crab status -d crab_projects/crab_tutorial_Aug2021_Data_analysis
CRAB project directory: /afs/cern.ch/user/a/atanasi/CRAB3-tutorial/CMSSW_10_6_18/src/crab_projects/crab_tutorial_Aug2021_Data_analysis
Task name: 150506_195732:atanasi_crab_tutorial_Aug2021_Data_analysis
Task status: SUBMITTED
Dashboard monitoring URL: http://dashb-cms-job.cern.ch/dashboard/templates/task-analysis/#user=crab&refresh=0&table=Jobs&p=1&records=25&activemenu=2&status=&site=&tid=150506_195732%3Aatanasi_crab_tutorial_Aug2021_Data_analysis
Details: idle 47.1% ( 8/17)
running 52.9% ( 9/17)
No publication information available yet
Log file is /afs/cern.ch/user/a/atanasi/CRAB3-tutorial/CMSSW_10_6_18/src/crab_projects/crab_tutorial_Aug2021_Data_analysis/crab.log
Task report
Getting the task report is especially relevant when running on real data, because of the files we get containing the analyzed and the non-analyzed luminosity sections. Remember that the report includes only jobs that completed successfully.
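In our case, we run:
crab report -d crab_projects/crab_tutorial_Aug2021_Data_analysis
which produces an output like this: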
83 files have been processed
295042 events have been read
295042 events have been written
Analyzed luminosity sections written to /afs/cern.ch/user/a/atanasi/CRAB3-tutorial/CMSSW_10_6_18/src/crab_projects/crab_tutorial_Aug2021_Data_analysis/results/lumiSummary.json
Warning: Not analyzed luminosity sections written to /afs/cern.ch/user/a/atanasi/CRAB3-tutorial/CMSSW_10_6_18/src/crab_projects/crab_tutorial_Aug2021_Data_analysis/results/missingLumiSummary.json
Log file is /afs/cern.ch/user/a/atanasi/CRAB3-tutorial/CMSSW_10_6_18/src/crab_projects/crab_tutorial_Aug2021_Data_analysis/crab.log
The missingLumiSummary.json file is just the difference between the input lumi-mask file and the lumiSummary.json file. This means that the missingLumiSummary.json file is not necessarily empty even if all jobs have completed successfully, simply because the input dataset and the input lumi-mask file may not span the same overall range of runs and luminosity sections. For example, take the extreme (dummy) case in which an input dataset from the 2012 LHC run is analyzed using a lumi-mask file from the 2011 LHC run; this will result in a missingLumiSummary.json file that is identical to the input lumi-mask file, no matter whether the jobs succeeded or failed. In our example above, we have a non-empty missingLumiSummary.json file, but the runs in there are all runs that are not in the input dataset.
Re-analyzing the missing luminosity sections
If (and only if) some jobs have failed, and one would like to submit a new task (as opposed to resubmitting the failed jobs) analyzing only the luminosity sections that were not analyzed by the original task, then one can use the missingLumiSummary.json file as the lumi-mask for the new task. Moreover, by keeping the same Data.outputDatasetTag as in the original task, the new outputs will be published in the same dataset as the original task. Thus, one would in principle only change in the CRAB configuration file the request name and the lumi-mask file. For our task, this would mean changing:
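config.General.requestName = 'tutorial_Aug2021_Data_analysis_missingLumis'
config.Data.lumiMask = 'crab_projects/crab_tutorial_Aug2021_Data_analysis/results/missingLumiSummary.json'
(the request name above is an illustrative choice; the lumi-mask path points to the missingLumiSummary.json file produced by the original task).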
One could also change the number of luminosity sections to analyze per job (Data.unitsPerJob); e.g. one could decrease it so as to have shorter jobs.
Once these changes are done in the CRAB configuration file, one would just submit the new task.
Output dataset publication
As we already emphasized, the publication of the output dataset in DBS is done automatically by CRAB. Please refer to the equivalent section in the "Running CMSSW analysis with CRAB on MC" example above to see how to use DAS to look for the details of the published dataset.
Running CRAB to generate MC events
Let us finally briefly mention the case in which the user wants to generate MC events from scratch as opposed to analyze an existing dataset. An example CMSSW parameter-set configuration file is given in section CMSSW configuration file to generate MC events and a corresponding CRAB configuration file is given in section CRAB configuration file to generate MC events. The first important parameter change to notice in the CRAB configuration file is:
config.JobType.pluginName = 'PrivateMC'
This instructs CRAB, for example, not to do input dataset discovery in DBS.
Then we have a new parameter to specify:
config.Data.outputPrimaryDataset = 'MinBias'
When running MC generation, there is obviously no input dataset involved, so we do not specify Data.inputDataset. But we still need to specify in the parameter Data.outputPrimaryDataset the primary dataset name of the sample we are generating. CRAB uses the primary dataset name as one layer of the directory tree where the output dataset is stored and in the publication dataset name. In principle, one could define this parameter to be anything, but it is better to use the appropriate primary dataset name used by CMS for the type of events being generated (MinBias is the primary dataset name used by CMS for datasets containing minimum bias events).
Finally, the job splitting mode when generating MC events only accepts 'EventBased'. One must also specify the total number of events one wants to generate (see the snippet below):
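A minimal sketch of the splitting settings, consistent with the 100-events-in-10-jobs example described next (the exact values used in the tutorial configuration file may differ):
config.Data.splitting = 'EventBased'
config.Data.unitsPerJob = 10
NJOBS = 10  # not a CRAB parameter; auxiliary variable used only on the next line
config.Data.totalUnits = config.Data.unitsPerJob * NJOBS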
where we have defined an auxiliary variable NJOBS to specify how many jobs we want to run and used it to automatically compute the total number of events to generate. We start by generating 100 events in 10 jobs as a test; later we can generate more.
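We then submit the task, presumably with a command of this form (the configuration file name matches the one echoed in the output below):
crab submit -c crabConfig_tutorial_MC_generation.py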
Will use CRAB configuration file crabConfig_tutorial_MC_generation.py
Sending the request to the server
Success: Your task has been delivered to the CRAB3 server.
Task name: 150507_092955:atanasi_crab_tutorial_Aug2021_MC_generation
Please use 'crab status' to check how the submission process proceeds.
Log file is /afs/cern.ch/user/a/atanasi/CRAB3-tutorial/CMSSW_10_6_18/src/crab_projects/crab_tutorial_Aug2021_MC_generation/crab.log
Task status
After submission, we check the status of the task to see if the jobs are starting to run. Here we can use a trick to save some typing: we can just run crab status without the -d option, taking advantage of the fact that CRAB caches the name of the CRAB project directory for which the last command was executed (see The .crab3 cache file).
crab status
CRAB project directory: /afs/cern.ch/user/a/atanasi/CRAB3-tutorial/CMSSW_10_6_18/src/crab_projects/crab_tutorial_Aug2021_MC_generation
Task name: 150507_092955:atanasi_crab_tutorial_Aug2021_MC_generation
Task status: QUEUED
No jobs created yet!
Log file is /afs/cern.ch/user/a/atanasi/CRAB3-tutorial/CMSSW_10_6_18/src/crab_projects/crab_tutorial_Aug2021_MC_generation/crab.log
The task submission is still queued at some point in the CRAB3 system, meaning that the Task Worker has not yet submitted the task to HTCondor. The task should not stay queued for long, so let's check the status again:
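Again we can simply run (relying on the cached project directory):
crab status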
CRAB project directory: /afs/cern.ch/user/a/atanasi/CRAB3-tutorial/CMSSW_10_6_18/src/crab_projects/crab_tutorial_Aug2021_MC_generation
Task name: 150507_092955:atanasi_crab_tutorial_Aug2021_MC_generation
Task status: SUBMITTED
Dashboard monitoring URL: http://dashb-cms-job.cern.ch/dashboard/templates/task-analysis/#user=crab&refresh=0&table=Jobs&p=1&records=25&activemenu=2&status=&site=&tid=150507_092955%3Aatanasi_crab_tutorial_Aug2021_MC_generation
Details: idle 100.0% (10/10)
No publication information available yet
Log file is /afs/cern.ch/user/a/atanasi/CRAB3-tutorial/CMSSW_10_6_18/src/crab_projects/crab_tutorial_Aug2021_MC_generation/crab.log
These are very short jobs, so it will not take long for all 10 jobs to finish. Publication may still take some time.
After all jobs in the task have finished, one can get the final task report. For MC generation, the report only shows how many events were generated. For our task, running:
crab report
produces a screen output like this:
0 files have been processed
0 events have been read
100 events have been written
Log file is /afs/cern.ch/user/a/atanasi/CRAB3-tutorial/CMSSW_10_6_18/src/crab_projects/crab_tutorial_Aug2021_MC_generation/crab.log
Output dataset publication
As already mentioned several times on this page, the publication of the task output dataset in DBS will be performed automatically by CRAB (more specifically, by the ASO service), as long as one didn't set Data.publication = False or General.transferOutputs = False in the CRAB configuration file. It was also mentioned that it may take some time for ASO to complete the publication.
One can look at the details of the published dataset using the DAS web interface (a direct link to DAS with the query for the task output dataset is given in the crab status output). Let's look at the files contained in the dataset:
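The published dataset name has the form /<primary-dataset>/<username>-<output-dataset-tag>-<hash>/USER. An illustrative DAS query to list its files (with <username> and <hash> to be replaced by the actual values) would be:
file dataset=/MinBias/<username>-CRAB3_tutorial_Aug2021_MC_generation-<hash>/USER instance=prod/phys03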
One can see that the files are stored in a directory named /store/user/atanasi/MinBias/CRAB3_tutorial_Aug2021_MC_generation/150507_092955/. The MinBias and CRAB3_tutorial_Aug2021_MC_generation layers are what was specified in the CRAB configuration file in the Data.outputPrimaryDataset and Data.outputDatasetTag parameters respectively. The 150507_092955 in the next layer is a time stamp from when the task was submitted.
Generating more events in the same output dataset
Suppose we want to extend our output dataset of minimum bias events. One can run another similar task, using of course the same CMSSW parameter-set configuration file and changing only the request name in the CRAB configuration file. One can of course also specify a different total number of events to generate, or a different number of events to generate per job. The important thing is that, to have the output files published in the same output dataset, one has to keep the same publication name (and use the same CMSSW parameter-set configuration file). Notice that the output files will again be named minbias_1.root, minbias_2.root, etc., but they will be stored in a different directory, because the task submission time stamp will change.
So we submit a new task using again crabConfig_tutorial_MC_generation.py, but with the following settings:
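A sketch of the changed settings, with the request name inferred from the task name in the submission output below and the output dataset tag kept the same as in the original task (assumed, from the storage path above, to be 'CRAB3_tutorial_Aug2021_MC_generation'):
config.General.requestName = 'tutorial_Aug2021_MC_generation_2'
config.Data.outputDatasetTag = 'CRAB3_tutorial_Aug2021_MC_generation'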
Will use CRAB configuration file crabConfig_tutorial_MC_generation.py
Sending the request to the server
Success: Your task has been delivered to the CRAB3 server.
Task name: 150507_112633:atanasi_crab_tutorial_Aug2021_MC_generation_2
Please use 'crab status' to see if your jobs have been submitted successfully.
Log file is /afs/cern.ch/user/a/atanasi/CRAB3-tutorial/CMSSW_10_6_18/src/crab_projects/crab_tutorial_Aug2021_MC_generation_2/crab.log
crab status
CRAB project directory: /afs/cern.ch/user/a/atanasi/CRAB3-tutorial/CMSSW_10_6_18/src/crab_projects/crab_tutorial_Aug2021_MC_generation_2
Task name: 150507_112633:atanasi_crab_tutorial_Aug2021_MC_generation_2
Task status: SUBMITTED
Dashboard monitoring URL: http://dashb-cms-job.cern.ch/dashboard/templates/task-analysis/#user=crab&refresh=0&table=Jobs&p=1&records=25&activemenu=2&status=&site=&tid=150507_112633%3Aatanasi_crab_tutorial_Aug2021_MC_generation_2
Details: idle 100.0% (20/20)
No publication information available yet
Log file is /afs/cern.ch/user/a/atanasi/CRAB3-tutorial/CMSSW_10_6_18/src/crab_projects/crab_tutorial_Aug2021_MC_generation_2/crab.log
Once the jobs have finished and the outputs have been published, the new files will appear in the same output dataset as those from the original task.
Task killing
One can kill the jobs in a task with the crab kill command. If a list of jobs is omitted, then all jobs in the task are killed. This command stops all running jobs, removes idle jobs from the queue, cancels all file transfers (ongoing transfers will still complete) and cancels the publication (already transferred outputs will still be published). Here is an example of a task that we submitted and killed. We used the crabConfig_tutorial_MC_generation.py file, changing the request name as sketched below:
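A sketch of the change, with the request name inferred from the task name shown in the submission output below:
config.General.requestName = 'tutorial_Aug2021_MC_generation_kill'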
Will use CRAB configuration file crabConfig_tutorial_MC_generation.py
Sending the request to the server
Success: Your task has been delivered to the CRAB3 server.
Task name: 150508_102820:atanasi_crab_tutorial_Aug2021_MC_generation_kill
Please use 'crab status' to check how the submission process proceeds.
Log file is /afs/cern.ch/user/a/atanasi/CRAB3-tutorial/CMSSW_10_6_18/src/crab_projects/crab_tutorial_Aug2021_MC_generation_kill/crab.log
We checked the task status after a few minutes:
crab status -d crab_projects/crab_tutorial_Aug2021_MC_generation_kill
CRAB project directory: /afs/cern.ch/user/a/atanasi/CRAB3-tutorial/CMSSW_10_6_18/src/crab_projects/crab_tutorial_Aug2021_MC_generation_kill
Task name: 150508_102820:atanasi_crab_tutorial_Aug2021_MC_generation_kill
Task status: SUBMITTED
Dashboard monitoring URL: http://dashb-cms-job.cern.ch/dashboard/templates/task-analysis/#user=crab&refresh=0&table=Jobs&p=1&records=25&activemenu=2&status=&site=&tid=150508_102820%3Aatanasi_crab_tutorial_Aug2021_MC_generation_kill
Details: running 100.0% (10/10)
No publication information available yet
Log file is /afs/cern.ch/user/a/atanasi/CRAB3-tutorial/CMSSW_10_6_18/src/crab_projects/crab_tutorial_Aug2021_MC_generation_kill/crab.log
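We then killed the task; the command presumably used (crab kill with the project directory specified via -d) would be:
crab kill -d crab_projects/crab_tutorial_Aug2021_MC_generation_kill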
Kill request successfully sent
Log file is /afs/cern.ch/user/a/atanasi/CRAB3-tutorial/CMSSW_10_6_18/src/crab_projects/crab_tutorial_Aug2021_MC_generation_kill/crab.log
After a few more minutes we checked the task status again:
crab status -d crab_projects/crab_tutorial_Aug2021_MC_generation_kill
And we got:
CRAB project directory: /afs/cern.ch/user/a/atanasi/CRAB3-tutorial/CMSSW_10_6_18/src/crab_projects/crab_tutorial_Aug2021_MC_generation_kill
Task name: 150508_102820:atanasi_crab_tutorial_Aug2021_MC_generation_kill
Task status: KILLED
Dashboard monitoring URL: http://dashb-cms-job.cern.ch/dashboard/templates/task-analysis/#user=crab&refresh=0&table=Jobs&p=1&records=25&activemenu=2&status=&site=&tid=150508_102820%3Aatanasi_crab_tutorial_Aug2021_MC_generation_kill
Details: failed 100.0% (10/10)
No publication information available yet
Error Summary: (use --verboseErrors for details about the errors)
Could not find exit code details for 10 jobs.
Have a look at https://twiki.cern.ch/twiki/bin/viewauth/CMSPublic/JobExitCodes for a description of the exit codes.
Log file is /afs/cern.ch/user/a/atanasi/CRAB3-tutorial/CMSSW_10_6_18/src/crab_projects/crab_tutorial_Aug2021_MC_generation_kill/crab.log
The task status became KILLED. (The status of the jobs will depend on the status they had when the kill signal was sent.)
Note that sending another crab kill command for this task will return an error, because the task is already in the KILLED state:
Error contacting the server.
Server answered with: Execution error
Reason is: You cannot kill a task if it is in the KILLED state
Log file is /afs/cern.ch/user/a/atanasi/CRAB3-tutorial/CMSSW_10_6_18/src/crab_projects/crab_tutorial_Aug2021_MC_generation_kill/crab.log