
CRAB tutorial (advanced)


Exercises

The tips available behind the Show links are given in progressive order of difficulty. If you get stuck on an exercise, try to solve it after showing only the first tip before looking at the second one.

Exercise 1 - dry run

The following work area was prepared for this exercise: /afs/cern.ch/cms/ccs/wm/scripts/Crab/CRAB3/AdvTutorial/Exercise_1. It contains

  • the CMSSW parameter set configuration files PSet.py and PSet.pkl;
  • the CMSSW release area CMSSW_7_2_3_patch1 with additional packages used by the above CMSSW parameter set.
In order to use this already prepared work area with its full environment, execute cmsenv directly inside /afs/cern.ch/cms/ccs/wm/scripts/Crab/CRAB3/AdvTutorial/Exercise_1/CMSSW_7_2_3_patch1/src/. Then you can set up CRAB3 and execute CRAB commands from any other location you wish (for example your home area); you only need to copy the PSet files to that location.

I am trying to analyze /VBF_HToTauTau_M-125_13TeV-powheg-pythia6/Phys14DR-PU20bx25_tsg_PHYS14_25_V1-v2/MINIAODSIM with the following splitting parameters:

config.Data.splitting = 'FileBased'
config.Data.unitsPerJob = 1

and after waiting one day I get the following output from crab status:

Error Summary:

1 jobs failed with exit code 50664:

    1 jobs failed with following error message: (for example, job 1)

        Not retrying job due to wall clock limit (job automatically killed on the worker node)

1.A) Can you tell me what happened to my jobs?

Answer:
By default jobs cannot run more than 24 hours on the Grid worker nodes. They are automatically killed if their runtime (called wall clock time) exceeds this limit. That is what happened to these jobs: they were killed because they reached the wall clock limit.
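
For reference, the CRAB parameter that controls the requested job runtime is config.JobType.maxJobRuntimeMin (it appears again in Exercise 3 below). A minimal sketch, keeping in mind that for this exercise the intended fix is a finer splitting (see 1.D) rather than a longer runtime:

# Sketch only: request a longer wall clock time per job (value in minutes).
# Sites impose their own limits, and asking for more runtime can leave jobs
# queued for longer before they start running.
config.JobType.maxJobRuntimeMin = 8 * 60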

1.B) Prepare a CRAB configuration to analyze this dataset and execute crab submit --dryrun.

Help:

CRAB configuration file:
from CRABClient.UserUtilities import config
config = config()

config.General.requestName = 'CRAB3_Advanced_Tutorial_May2015_Exercise1'

config.JobType.psetName = 'PSet.py'
config.JobType.pluginName = 'Analysis'

config.Data.inputDataset = '/VBF_HToTauTau_M-125_13TeV-powheg-pythia6/Phys14DR-PU20bx25_tsg_PHYS14_25_V1-v2/MINIAODSIM'
config.Data.splitting = 'FileBased'
config.Data.unitsPerJob = 1

config.Site.storageSite = <site where you have permission to write>

1.C) How much time does your code take per event? (Use the CRAB3 estimate from --dryrun.)

Answer:
The estimated time per event is 3 seconds.

1.D) Try to come up with better splitting parameters in the CRAB configuration: your target for this exercise is jobs running for about 8 hours (estimate the number of events per lumi from DAS). Use 'crab submit --dryrun' (and please don't actually submit the task!).

Help:

Average number of events per lumi:
Looking at the first file in the dataset, it has on average ~100 events per lumi. One way to get this number is to use DAS to get the list of lumis in the file [1] and count the number of lumis (for example with python). Then divide the number of events in the file (you can also find it in DAS) by the number of lumis:

$ python
>>> lumis = [[3967, 3969], [3973, 3973], ...]
>>> sum([y-x+1 for x, y in lumis])
431
>>> 42902 / 431
99

[1] https://cmsweb.cern.ch/das/request?input=lumi%20file%3D/store/mc/Phys14DR/VBF_HToTauTau_M-125_13TeV-powheg-pythia6/MINIAODSIM/PU20bx25_tsg_PHYS14_25_V1-v2/00000/147B369C-9F77-E411-B99D-00266CF9B184.root&instance=prod/global&idx=0&limit=10
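
The same calculation as a small self-contained Python sketch (the lumi ranges must be pasted in full from the DAS query [1]; 42902 is the number of events DAS reports for this file):

# Estimate the average number of events per lumi section in one file.
# Paste the complete list of [first, last] lumi ranges obtained from DAS [1].
lumis = [[3967, 3969], [3973, 3973]]  # truncated here: use the full list from DAS
nLumis = sum(last - first + 1 for first, last in lumis)
nEvents = 42902  # number of events in the file, also from DAS
print("lumi sections: %d" % nLumis)
print("events per lumi: ~%d" % (nEvents // nLumis))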
CRAB configuration file:
eventsPerLumi = 100   # average number of events per lumi section (estimated from DAS, see above)
timePerEv = 4         # seconds per event
desiredTime = 8*60*60 # target job runtime: 8 hours, in seconds
# Number of lumi sections per job (assumes config.Data.splitting = 'LumiBased').
config.Data.unitsPerJob = desiredTime / (eventsPerLumi*timePerEv)
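
With these numbers, unitsPerJob evaluates to 28800 / (100*4) = 72, i.e. each job processes 72 lumi sections (roughly 7200 events), which at ~4 seconds per event corresponds to the 8-hour target.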

Exercise 2 - user input files

2.A) Run a task similar to the one in section Running CMSSW analysis with CRAB on Data of the CRAB3 (basic) tutorial, but with each job analyzing exactly one input file and with the task analyzing in total five input files. Also, don't use any lumi-mask or run-range.

Help:

CMSSW parameter-set configuration file:
import FWCore.ParameterSet.Config as cms

process = cms.Process('NoSplit')

process.source = cms.Source("PoolSource", fileNames = cms.untracked.vstring())
process.maxEvents = cms.untracked.PSet(input = cms.untracked.int32(10))
process.options = cms.untracked.PSet(wantSummary = cms.untracked.bool(True))
process.output = cms.OutputModule("PoolOutputModule",
    outputCommands = cms.untracked.vstring("drop *", "keep recoTracks_*_*_*"),
    fileName = cms.untracked.string('output.root'),
)
process.out = cms.EndPath(process.output)
CRAB configuration file:
from CRABClient.UserUtilities import config
config = config()

config.General.requestName = 'CRAB3_Advanced_Tutorial_June2015_Exercise2A'
config.General.transferOutputs = True
config.General.transferLogs = False

config.JobType.pluginName = 'Analysis'
config.JobType.psetName = 'pset_tutorial_analysis.py'

config.Data.inputDataset = '/SingleMu/Run2012B-13Jul2012-v1/AOD'
config.Data.splitting = 'FileBased'
config.Data.unitsPerJob = 1
config.Data.totalUnits = 5
config.Data.publication = True
config.Data.publishDataName = config.General.requestName

config.Site.storageSite = <site where you have permission to write>

2.B) Look up which five input files were analyzed by task A and create a local text file containing the LFNs of these five files. Submit a task to analyze the same input files as task A, but instead of specifying the input dataset, use the local text file you just created. Once task A and task B have finished, check that both have published a similar output dataset. The goal of exercises 2.A and 2.B is to show that it is possible to run CRAB over published files treating them as "user input files" (i.e. using Data.userInputFiles instead of Data.inputDataset), although this is of course not recommended.

Note: When running over user input files, CRAB will not try to find out at which sites the input files are stored. Instead, CRAB will submit the jobs to the least busy sites. If the input files are not stored at those sites, they will be accessed via Xrootd. Since Xrootd access is less efficient than direct access, it is recommended to force CRAB to submit the jobs to the sites where the input files are stored by whitelisting those sites.

Help:

How to know which input files were analyzed by task A:
Look for example in the job log files linked from the monitoring pages. At the very beginning of each job log file, it lists the input files analyzed by the corresponding job.

Answer:
/store/data/Run2012B/SingleMu/AOD/13Jul2012-v1/0000/008DBED0-86D3-E111-AEDF-20CF3019DF17.root
/store/data/Run2012B/SingleMu/AOD/13Jul2012-v1/0000/00F9715A-A1D3-E111-BE6F-E0CB4E29C4D1.root
/store/data/Run2012B/SingleMu/AOD/13Jul2012-v1/0000/00100164-41D4-E111-981B-20CF3027A5AF.root
/store/data/Run2012B/SingleMu/AOD/13Jul2012-v1/0000/0093BEF2-A4D3-E111-A6B9-E0CB4E19F979.root
/store/data/Run2012B/SingleMu/AOD/13Jul2012-v1/0000/0009F1CC-0DD4-E111-974D-20CF305B0584.root
How to know at which sites these input files are stored:
Search for the input dataset /SingleMu/Run2012B-13Jul2012-v1/AOD in DAS.

Answer:
T2_BE_UCL, T2_IT_Legnaro, T2_RU_JINR, T1_US_FNAL, T1_IT_CNAF
CRAB configuration file:
from CRABClient.UserUtilities import config
config = config()

config.General.requestName = 'CRAB3_Advanced_Tutorial_June2015_Exercise2B'
config.General.transferOutputs = True
config.General.transferLogs = False

config.JobType.pluginName = 'Analysis'
config.JobType.psetName = 'pset_tutorial_analysis.py'

config.Data.primaryDataset = 'SingleMu'
config.Data.userInputFiles = open('CRAB3_Advanced_Tutorial_June2015_Exercise2B_inputFiles.txt').readlines()
config.Data.splitting = 'FileBased'
config.Data.unitsPerJob = 1
config.Data.totalUnits = 5
config.Data.publication = True
config.Data.publishDataName = config.General.requestName

config.Site.storageSite = <site where you have permission to write>
config.Site.whitelist = ['T2_BE_UCL', 'T2_IT_Legnaro', 'T2_RU_JINR']

2.C) Run a task as in 2.A, but turning off the publication.

2.D) Run a task over the output files from task C (the output files from task C are not published; so this is a typical case of user input files) using a local text file containing the LFNs of the output files from task C. Whitelist the site where you stored the output files from task C.

Help:

How to get the LFNs of the output files from task C:
Use 'crab getoutput --dump'.

Exercise 3 - recovery task

Assume the following CRAB configuration, which uses the same CMSSW parameter-set configuration as in exercise 2, and produces 213 jobs:

from CRABClient.UserUtilities import config
config = config()

config.General.requestName = 'CRAB3_Advanced_Tutorial_June2015_Exercise3C'
config.General.transferOutputs = True
config.General.transferLogs = False

config.JobType.pluginName = 'Analysis'
config.JobType.psetName = 'pset_tutorial_analysis.py'
# Assume for this exercise that the default job runtime limit is 1 hour.
config.JobType.maxJobRuntimeMin = 60

config.Data.inputDataset = '/SingleMu/Run2012B-13Jul2012-v1/AOD'
config.Data.splitting = 'LumiBased'
config.Data.unitsPerJob = 240
config.Data.lumiMask = 'https://cms-service-dqm.web.cern.ch/cms-service-dqm/CAF/certification/Collisions12/8TeV/Prompt/Cert_190456-208686_8TeV_PromptReco_Collisions12_JSON.txt'
config.Data.publication = True
config.Data.publishDataName =  config.General.requestName

config.Site.storageSite = <site where the user has permission to write>

3.A) Imagine the following situation. After submitting the task you ran crab status and got a message saying: Error during task injection: Your task failed to bootstrap on the Grid scheduler .... What would you do and why?

  1. Submit the task again.
  2. Resubmit the task.
  3. Submit a recovery task.

Answer:
The task has not been submitted to the Grid, so there is nothing to resubmit (or to recover). One has to submit the task again.

3.B) Imagine the following situation. You submitted the task and 50 jobs failed while transferring the output files to the destination storage. You were told that there was a temporary issue with the storage and that it has been fixed now. What would you do and why?

  1. Submit a completely new task from scratch.
  2. Resubmit the failed jobs of the current task.
  3. Submit a recovery task.

Answer:
Since the issue was a temporary problem not related to the jobs (which were otherwise finishing without problems), resubmitting the failed jobs would be the most reasonable choice.

3.C) Imagine the following situation. You submitted the task and 50 jobs were killed on the worker nodes because they reached the default runtime limit. What would you do and why?

  1. Submit a completely new task from scratch using a finer splitting.
  2. Resubmit the failed jobs of the current task requesting a higher job runtime limit.
  3. Submit a recovery task to analyze the failed jobs, but using a finer splitting.

Answer:
In this case the problem was with the jobs themselves. Resubmitting without changing the requested job runtime would not help, as the jobs would most probably fail again. Resubmitting with a higher requested job runtime may cause the jobs to be queued for a long time before they start running. Submitting a new task would be a waste of resources, as ~80% of the task has already completed. The best choice is to submit a recovery task, with a finer splitting, to analyze only the failed jobs.

3.D) In relation to 3.C, describe step by step the process to submit a recovery task in which the jobs are expected to use one quarter of the walltime of the original jobs. How many jobs will the recovery task have (approximately)? Will the output files from the recovery task be stored in the same directory as the output files from the original task? Will they be published in the same output dataset?

Help: You can use the following existing task, 150626_112217:atanasi_crab_CRAB3_Advanced_Tutorial_June2015_Exercise3C, which represents the exact situation described in 3.C. Doing crab remake --task 150626_112217:atanasi_crab_CRAB3_Advanced_Tutorial_June2015_Exercise3C you will create a CRAB project directory for this task, at which point you will be able to execute crab commands referring to this task. You can then submit your proposed recovery task. Will it publish in the same output dataset?
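
A sketch of one possible recovery-task configuration is shown below. It assumes (none of this is specified above) that the lumis left unprocessed by the original task have been saved to a local file, here given the hypothetical name recoveryLumiMask.json (they can be obtained, for example, with crab report on the original task), and that the splitting is made four times finer (240 -> 60 lumis per job) so that jobs use roughly one quarter of the original walltime:

from CRABClient.UserUtilities import config
config = config()

# A recovery task is a new task, so it needs its own request name.
config.General.requestName = 'CRAB3_Advanced_Tutorial_June2015_Exercise3C_recovery'
config.General.transferOutputs = True
config.General.transferLogs = False

config.JobType.pluginName = 'Analysis'
config.JobType.psetName = 'pset_tutorial_analysis.py'
config.JobType.maxJobRuntimeMin = 60

config.Data.inputDataset = '/SingleMu/Run2012B-13Jul2012-v1/AOD'
config.Data.splitting = 'LumiBased'
config.Data.unitsPerJob = 60   # four times finer than the original 240 lumis per job
# Hypothetical file name: a lumi mask covering only the lumis that the original
# task did not process (e.g. produced with 'crab report' on the original task).
config.Data.lumiMask = 'recoveryLumiMask.json'
config.Data.publication = True
# Reusing the publication name of the original task is what makes it possible for
# the recovery output to be published in the same output dataset.
config.Data.publishDataName = 'CRAB3_Advanced_Tutorial_June2015_Exercise3C'

config.Site.storageSite = <site where the user has permission to write>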

Exercise 4 - user script

The following exercises can all be run in one task if you want to save some time.

4.A) Run a task similar to the one in section Running CRAB to generate MC events of the CRAB3 (basic) tutorial, but wrapping cmsRun in a shell script. Don't forget to tell cmsRun to produce the FrameworkJobReport.xml. In the script, print some messages before cmsRun starts and after cmsRun finishes. Where are these messages printed?

Help:
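
Hint on the CRAB side: the parameter involved is config.JobType.scriptExe. A minimal sketch of the relevant configuration line (the script name is just an example):

# Sketch only: tell CRAB to run a user script instead of invoking cmsRun directly.
# 'myWrapper.sh' is an example name for the shell script wrapping cmsRun; inside it,
# cmsRun must be invoked with '-j FrameworkJobReport.xml' so that CRAB still finds
# the framework job report it expects.
config.JobType.scriptExe = 'myWrapper.sh'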

CMSSW parameter-set configuration file:
# Auto generated configuration file
# using: 
# Revision: 1.19 
# Source: /local/reps/CMSSW/CMSSW/Configuration/Applications/python/ConfigBuilder.py,v 
# with command line options: MinBias_8TeV_cfi --conditions auto:startup -s GEN,SIM --datatier GEN-SIM -n 10 
# --relval 9000,300 --eventcontent RAWSIM --io MinBias.io --python MinBias.py --no_exec --fileout minbias.root

import FWCore.ParameterSet.Config as cms

process = cms.Process('SIM')

# Import of standard configurations
process.load('Configuration.StandardSequences.Services_cff')
process.load('SimGeneral.HepPDTESSource.pythiapdt_cfi')
process.load('FWCore.MessageService.MessageLogger_cfi')
process.load('Configuration.EventContent.EventContent_cff')
process.load('SimGeneral.MixingModule.mixNoPU_cfi')
process.load('Configuration.StandardSequences.GeometryRecoDB_cff')
process.load('Configuration.Geometry.GeometrySimDB_cff')
process.load('Configuration.StandardSequences.MagneticField_38T_cff')
process.load('Configuration.StandardSequences.Generator_cff')
process.load('IOMC.EventVertexGenerators.VtxSmearedRealistic8TeVCollision_cfi')
process.load('GeneratorInterface.Core.genFilterSummary_cff')
process.load('Configuration.StandardSequences.SimIdeal_cff')
process.load('Configuration.StandardSequences.EndOfProcess_cff')
process.load('Configuration.StandardSequences.FrontierConditions_GlobalTag_cff')

process.maxEvents = cms.untracked.PSet(
    input = cms.untracked.int32(10)
)

# Input source
process.source = cms.Source("EmptySource")

process.options = cms.untracked.PSet(

)

# Production Info
process.configurationMetadata = cms.untracked.PSet(
    version = cms.untracked.string('$Revision: 1.19 $'),
    annotation = cms.untracked.string('MinBias_8TeV_cfi nevts:10'),
    name = cms.untracked.string('Applications')
)

# Output definition
process.RAWSIMoutput = cms.OutputModule("PoolOutputModule",
    splitLevel = cms.untracked.int32(0),
    eventAutoFlushCompressedSize = cms.untracked.int32(5242880),
    outputCommands = process.RAWSIMEventContent.outputCommands,
    fileName = cms.untracked.string('minbias.root'),
    dataset = cms.untracked.PSet(
        filterName = cms.untracked.string(''),
        dataTier = cms.untracked.string('GEN-SIM')
    ),
    SelectEvents = cms.untracked.PSet(
        SelectEvents = cms.vstring('generation_step')
    )
)

# Additional output definition

# Other statements
process.genstepfilter.triggerConditions=cms.vstring("generation_step")
from Configuration.AlCa.GlobalTag import GlobalTag
process.GlobalTag = GlobalTag(process.GlobalTag, 'auto:startup', '')

process.generator = cms.EDFilter("Pythia6GeneratorFilter",
    pythiaPylistVerbosity = cms.untracked.int32(0),
    filterEfficiency = cms.untracked.double(1.0),
    pythiaHepMCVerbosity = cms.untracked.bool(False),
    comEnergy = cms.double(8000.0),
    maxEventsToPrint = cms.untracked.int32(0),
    PythiaParameters = cms.PSet(
        pythiaUESettings = cms.vstring('MSTU(21)=1     ! Check on possible errors during program execution', 
            'MSTJ(22)=2     ! Decay those unstable particles', 
            'PARJ(71)=10 .  ! for which ctau  10 mm', 
            'MSTP(33)=0     ! no K factors in hard cross sections', 
            'MSTP(2)=1      ! which order running alphaS', 
            'MSTP(51)=10042 ! structure function chosen (external PDF CTEQ6L1)', 
            'MSTP(52)=2     ! work with LHAPDF', 
            'PARP(82)=1.921 ! pt cutoff for multiparton interactions', 
            'PARP(89)=1800. ! sqrts for which PARP82 is set', 
            'PARP(90)=0.227 ! Multiple interactions: rescaling power', 
            'MSTP(95)=6     ! CR (color reconnection parameters)', 
            'PARP(77)=1.016 ! CR', 
            'PARP(78)=0.538 ! CR', 
            'PARP(80)=0.1   ! Prob. colored parton from BBR', 
            'PARP(83)=0.356 ! Mult