5.6.1 Running CMSSW code on the Grid using CRAB


WARNING

  • You should always use the latest production CRAB version
  • This tutorial may be outdated: it was prepared for a live lesson at a specific time, and thus refers to a particular dataset and CMSSW version that may not be available when (and where) you try it.
    • as of June 2013 you should be able to kick-start your CRAB work using CMSSW 5_3_8 and the dataset /RelValProdTTbar/JobRobot-MC_3XY_V24_JobRobot-v1/GEN-SIM-DIGI-RECO as MC data and /SingleMu/Run2012B-13Jul2012-v1/AOD as real data.


Prerequisites to run the tutorial

  • to have a valid Grid certificate
  • to be registered with the CMS Virtual Organization
  • to be registered in SiteDB
  • to have access to lxplus machines or to an SLC5 User Interface

Recipe for the tutorial

For this tutorial we will refer to CMS software:

  • CMSSW_5_3_8

and we will use an already prepared CMSSW analysis configuration (tutorial.py, shown below) to analyze the sample.

We will use the central installation of CRAB available at CERN:

  • CRAB_2_8_5

The example is written for the csh shell family. If you want to use the Bourne shell family, replace csh with sh.


Setup local Environment and prepare user analysis code

In order to submit jobs to the Grid, you must have access to an LCG User Interface (LCG UI). It allows you to access WLCG-affiliated resources in a fully transparent way. LXPLUS users can get an LCG UI via AFS by:

source /afs/cern.ch/cms/LCG/LCG-2/UI/cms_ui_env.csh

Install a CMSSW project in a directory of your choice. In this case we create a "TEST" directory:

mkdir TEST
cd TEST
cmsrel CMSSW_5_3_8
#cmsrel is an alias of scramv1 project CMSSW CMSSW_5_3_8
cd CMSSW_5_3_8/src/ 
cmsenv
#cmsenv is an alias for scramv1 runtime -csh

For this tutorial we are going to use the following CMSSW configuration file, tutorial.py:

import FWCore.ParameterSet.Config as cms
process = cms.Process('Slurp')

process.source = cms.Source("PoolSource", fileNames = cms.untracked.vstring())
process.maxEvents = cms.untracked.PSet( input       = cms.untracked.int32(10) )
process.options   = cms.untracked.PSet( wantSummary = cms.untracked.bool(True) )

process.output = cms.OutputModule("PoolOutputModule",
    outputCommands = cms.untracked.vstring("drop *", "keep recoTracks_*_*_*"),
    fileName = cms.untracked.string('outfile.root'),
)
process.out_step = cms.EndPath(process.output)
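
Before submitting anything to the Grid, it can be useful to check that this configuration runs at all. A minimal local test (a sketch: the input file name below is a placeholder for any ROOT file you can read locally, it is not part of the tutorial):

# Temporarily point the empty PoolSource at a local test file and run
# "cmsRun tutorial.py". Under CRAB the fileNames list is left empty on
# purpose: CRAB fills it with the files of the dataset chosen in crab.cfg.
process.source.fileNames = cms.untracked.vstring('file:/tmp/myTestFile.root')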

CRAB setup

Setup on lxplus:

In order to set up and use CRAB from any directory, source the script crab.(c)sh located in /afs/cern.ch/cms/ccs/wm/scripts/Crab/, which always points to the latest version of CRAB. After sourcing the script, it is possible to use CRAB from any directory (typically your CMSSW working directory).

source /afs/cern.ch/cms/ccs/wm/scripts/Crab/crab.csh

Warning: in order to have the correct environment, the environment files must always be sourced in this order:

  • source of UI env
  • setup of CMSSW software
  • source of CRAB env

Locate the dataset and prepare CRAB submission

In order to run our analysis over a whole dataset, we first have to find the dataset name and then put it in the CRAB configuration file.

Data selection

To select the data you want to access, use the Data Aggregation Service (DAS) web page, where available datasets are listed. For this tutorial we'll use:

/RelValProdTTbar/JobRobot-MC_3XY_V24_JobRobot-v1/GEN-SIM-DIGI-RECO
 (MC data)
  • Beware: dataset availability at sites changes with time; if you are following this tutorial after the date it was given, you may need to use a different dataset

CRAB configuration

Modify the CRAB configuration file crab.cfg according to your needs: a fully documented template is available at $CRABPATH/full_crab.cfg, and a template with only the essential parameters is available at $CRABPATH/crab.cfg. The default name of the configuration file is crab.cfg, but you can rename it as you wish.

Copy one of these files in your local area.

For guidance, see the list and description of configuration parameters in the on-line CRAB manual. For this tutorial, the only relevant sections of the file are [CRAB], [CMSSW] and [USER] .

Configuration parameters

The list of the main parameters you need to specify in your crab.cfg:
  • pset: the CMSSW configuration file name;
  • output_file: the output file name produced by your pset; if the output is defined in a TFileService in the CMSSW pset, the file is handled automatically by CRAB and there is no need to specify it here;
  • datasetpath: the full name of the dataset you want to analyze;
  • Job splitting (see the arithmetic sketch after this list):
    • By event (only for MC data). You need to specify 2 of these parameters: total_number_of_events, number_of_jobs, events_per_job
      • specify total_number_of_events and number_of_jobs: this will assign to each job a number of events equal to total_number_of_events/number_of_jobs
      • specify total_number_of_events and events_per_job: this will assign events_per_job events to each job and will calculate the number of jobs as total_number_of_events/events_per_job;
      • or you can specify number_of_jobs and events_per_job;
    • By lumi (required for real data). You need to specify 2 of these parameters: total_number_of_lumis, lumis_per_job, number_of_jobs
      • because jobs in split-by-lumi mode process entire files rather than partial files, you will often end up with fewer jobs processing more lumis than expected; additionally, a single job cannot analyze files from multiple blocks in DBS, so these parameters are "advice" to CRAB rather than determinative
      • specify lumis_per_job and number_of_jobs: the total number of lumis processed will be number_of_jobs x lumis_per_job
      • or you can specify total_number_of_lumis and number_of_jobs
      • lumi_mask: the name of a JSON file that describes which runs and lumis to process. CRAB will skip luminosity blocks not listed in the file.
  • return_data: 0 or 1; if set to 1, your output files are retrieved to your local working area;
  • copy_data: 0 or 1; if set to 1, your output files are copied to a remote Storage Element;
  • local_stage_out: 0 or 1; if set to 1, your produced output is copied to the close SE in case the copy to the SE specified in your crab.cfg fails;
  • publish_data: 0 or 1; if set to 1, you can publish your produced data to a local DBS;
  • use_server: usage of the CRAB server is now deprecated, so by default this parameter is set to 0;
  • scheduler: the name of the scheduler you want to use;
  • jobtype: the type of the jobs.
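
To make the splitting arithmetic concrete, here is a small illustration in plain Python (a sketch, not a CRAB API; the numbers are invented):

# Illustration of CRAB's by-event splitting arithmetic (not a CRAB API).
def split_by_event(total_number_of_events, number_of_jobs=None, events_per_job=None):
    # Given two of the three by-event parameters, derive the third.
    if number_of_jobs is not None:
        events_per_job = total_number_of_events // number_of_jobs
    elif events_per_job is not None:
        number_of_jobs = total_number_of_events // events_per_job
    return number_of_jobs, events_per_job

print(split_by_event(1000, number_of_jobs=10))   # -> (10, 100)
print(split_by_event(1000, events_per_job=50))   # -> (20, 50)

# By lumi, lumis_per_job and number_of_jobs imply the total; this is only
# "advice" to CRAB, since split-by-lumi jobs process whole files.
number_of_jobs, lumis_per_job = 10, 50
print(number_of_jobs * lumis_per_job)            # -> 500 lumis in total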

Run CRAB on MonteCarlo data copying the output to an SE

Copying the output to an existing Storage Element makes it possible to bypass the output size limit, to publish the data in a local DBS, and then to easily re-run over the published data. In order to have CRAB copy the output to a Storage Element you have to add the following information to the CRAB configuration file:
  • that we want to copy our results, adding copy_data=1 and return_data=0 (it is not allowed to have both set to 1);
  • the official CMS site name where we are going to copy our results (in this session, the Legnaro T2_IT_Legnaro Storage Element); the names of other official sites can be found in SiteDB

CRAB configuration file for MonteCarlo data

You can find more details on this at the corresponding link on the Crab FAQ page.

The CRAB configuration file (default name crab.cfg) should be located in the same directory as the CMSSW parameter-set to be used by CRAB, with the following content:

[CMSSW]
total_number_of_events  = 10
number_of_jobs          = 5
pset                    = tutorial.py
datasetpath             =  /RelValProdTTbar/JobRobot-MC_3XY_V24_JobRobot-v1/GEN-SIM-DIGI-RECO

output_file              = outfile.root

[USER]
return_data             = 0
copy_data               = 1
storage_element        = T2_IT_Legnaro
user_remote_dir         = TutGridSchool

[CRAB]
scheduler = remoteGlidein
jobtype                 = cmssw

Run Crab

Once your crab.cfg is ready and the whole underlying environment is set up, you can start running CRAB. CRAB provides command-line help, which can be useful the first time you use it. You can get it via:
crab -h

Job Creation

The job creation checks the availability of the selected dataset and prepares all the jobs for submission according to the job splitting specified in the crab.cfg.

  • By default the creation process creates a CRAB project directory (default: crab_0_date_time) in the current working directory, where the related crab configuration file is cached for further usage, avoiding interference with other (already created) projects

  • Using the [USER] ui_working_dir parameter in the configuration file, CRAB allows the user to choose the project name, so that it can be used later to distinguish multiple CRAB projects in the same directory.

crab -create  
which by default takes the configuration file called crab.cfg, associated in this tutorial with MC data.

The creation command may ask for proxy/myproxy passwords the first time you use it, and should produce screen output similar to:

 
$ crab -create
crab: Version 2.8.5 running on Wed Feb 20 17:39:32 2013 CET (16:39:32 UTC)
crab. Working options:
scheduler remoteGlidein
job type CMSSW
server OFF
working directory /afs/cern.ch/user/f/fanzago/scratch0/TEST/crab_0_130220_173930/
Enter GRID pass phrase:
Your identity: /DC=ch/DC=cern/OU=Organic Units/OU=Users/CN=fanzago/CN=610896/CN=Federica Fanzago
Creating temporary proxy .................................. Done
Contacting voms.cern.ch:15002 [/DC=ch/DC=cern/OU=computers/CN=voms.cern.ch] "cms" Done
Creating proxy ............................................... Done
Your proxy is valid until Thu Feb 28 17:40:02 2013
verify if user DN is mapped in CERN's SSO
OK. user ready for SiteDB switchover on March 12, 2013
crab: Contacting Data Discovery Services ...
crab: Accessing DBS at: http://cmsdbsprod.cern.ch/cms_dbs_prod_global/servlet/DBSServlet
crab: Requested dataset: /RelValProdTTbar/JobRobot-MC_3XY_V24_JobRobot-v1/GEN-SIM-DIGI-RECO has 300000 events in 1 blocks.
crab: May not create the exact number_of_jobs requested.
crab: 5 job(s) can run on 50 events.
crab: List of jobs and available destination sites:
Block 1: jobs 1-5: sites: T2_HU_Budapest, T2_CH_CSCS, T2_ES_IFCA, T2_FR_CCIN2P3, T2_IT_Bari, T2_RU_SINP, T3_IT_Bologna, T2_KR_KNU, T2_UK_SGrid_Bristol, T2_FR_GRIF_LLR, T2_RU_INR, T2_CN_Beijing, T2_US_MIT, T2_RU_PNPI, T2_TR_METU, T2_UK_London_IC, T2_DE_DESY, T2_TW_Taiwan, T2_US_UCSD, T2_RU_RRC_KI, T2_PL_Warsaw, T2_PT_LIP_Lisbon, T2_US_Caltech, T2_PT_NCG_Lisbon, T2_BR_SPRACE, T2_IT_Rome, T2_US_Purdue, T2_BE_IIHE, T2_IT_Legnaro, T2_ES_CIEMAT, T2_DE_RWTH, T2_RU_JINR, T2_CH_CERN, T2_FR_GRIF_IRFU, T2_UA_KIPT, T2_UK_SGrid_RALPP, T2_PK_NCP, T2_UK_London_Brunel, T2_RU_IHEP, T2_IT_Pisa, T2_IN_TIFR, T2_US_Vanderbilt, T2_US_Florida, T2_RU_ITEP, T2_FR_IPHC, T2_BE_UCL, T2_US_Wisconsin, T2_US_Nebraska, T3_UK_London_RHUL, T2_FI_HIP, T2_EE_Estonia
crab: Checking remote location
crab: Creating 5 jobs, please wait...
crab: Total of 5 jobs created.
Log file is /afs/cern.ch/user/f/fanzago/scratch0/TEST/crab_0_130220_173930/log/crab.log

  • the project directory called crab_0_130220_173930 is created

Job Submission

With the submission command it's possible to specify a combination of jobs and job ranges separated by commas (e.g. 1,2,3-4); the default is all. To submit all jobs of the last created project with the default name, it's enough to execute the following command:

crab -submit 
to submit a specific project:
crab -submit -c  <dir name>

which should produce screen output similar to:

 
$ crab -submit
crab: Version 2.8.5 running on Wed Feb 20 17:42:10 2013 CET (16:42:10 UTC)
crab. Working options:
scheduler remoteGlidein
job type CMSSW
server OFF
working directory /afs/cern.ch/user/f/fanzago/scratch0/TEST/crab_0_130220_173930/
crab: Checking available resources...
crab: Found compatible site(s) for job 1
crab: 1 blocks of jobs will be submitted
crab: remotehost from Avail.List = submit-2.t2.ucsd.edu
crab: contacting remote host submit-2.t2.ucsd.edu
crab: Establishing gsissh ControlPath. Wait 2 sec ...
crab: Establishing gsissh ControlPath. Wait 2 sec ...
crab: Establishing gsissh ControlPath. Wait 2 sec ...
crab: COPY FILES TO REMOTE HOST
crab: SUBMIT TO REMOTE GLIDEIN FRONTEND
Submitting 5 jobs
100% [=================================================================================================]
please wait
crab: Total of 5 jobs submitted.
Log file is /afs/cern.ch/user/f/fanzago/scratch0/TEST/crab_0_130220_173930/log/crab.log

Job Status Check

Check the status of the jobs in the latest CRAB project with the following command:
crab -status 
to check a specific project:
crab -status -c  <dir name>

which should produce screen output similar to:

$ crab -status
crab: Version 2.8.5 running on Wed Feb 20 17:43:04 2013 CET (16:43:04 UTC)
crab. Working options:
scheduler remoteGlidein
job type CMSSW
server OFF
working directory /afs/cern.ch/user/f/fanzago/scratch0/TEST/crab_0_130220_173930/
crab: Checking the status of all jobs: please wait
crab: contacting remote host submit-2.t2.ucsd.edu
crab:
ID    END STATUS            ACTION       ExeExitCode JobExitCode E_HOST
----- --- ----------------- ------------ ----------- ----------- ---------
1     N   Submitted         SubSuccess
2     N   Submitted         SubSuccess
3     N   Submitted         SubSuccess
4     N   Submitted         SubSuccess
5     N   Submitted         SubSuccess
crab: 5 Total Jobs
>>>>>>>>> 5 Jobs Submitted
List of jobs Submitted: 1-5
crab: You can also follow the status of this task on :
CMS Dashboard: http://dashb-cms-job-task.cern.ch/taskmon.html#task=fanzago_crab_0_130220_173930_68zw1c
Your task name is: fanzago_crab_0_130220_173930_68zw1c
Log file is /afs/cern.ch/user/f/fanzago/scratch0/TEST/crab_0_130220_173930/log/crab.log

Job Output Retrieval

For the jobs which are in "Done" status it is possible to retrieve the log files of the jobs (just the log files, because the output files are copied to the Storage Element associated with the T2 specified in the crab.cfg, and in fact return_data is 0). The following command retrieves the log files of all "Done" jobs of the last created CRAB project:
crab -getoutput 
to get the output of a specific project:
crab -getoutput -c  <dir name>

the job results (CMSSW_n.stdout, CMSSW_n.stderr and crab_fjr_n.xml) will be copied into the res subdirectory of your CRAB project:

$ crab -get
crab: Version 2.8.5 running on Wed Feb 20 20:17:02 2013 CET (19:17:02 UTC)
crab. Working options:
scheduler remoteGlidein
job type CMSSW
server OFF
working directory /afs/cern.ch/user/f/fanzago/scratch0/TEST/crab_0_130220_173930/
crab: contacting remote host submit-2.t2.ucsd.edu
crab: RETRIEVE FILE out_files_1.tgz for job #1
crab: RETRIEVE FILE crab_fjr_1.xml for job #1
crab: Results of Jobs # 1 are in /afs/cern.ch/user/f/fanzago/scratch0/TEST/crab_0_130220_173930/res/
crab: contacting remote host submit-2.t2.ucsd.edu
crab: RETRIEVE FILE out_files_2.tgz for job #2
crab: RETRIEVE FILE crab_fjr_2.xml for job #2
crab: RETRIEVE FILE out_files_3.tgz for job #3
crab: RETRIEVE FILE crab_fjr_3.xml for job #3
crab: RETRIEVE FILE out_files_4.tgz for job #4
crab: RETRIEVE FILE crab_fjr_4.xml for job #4
crab: RETRIEVE FILE out_files_5.tgz for job #5
crab: RETRIEVE FILE crab_fjr_5.xml for job #5
crab: Results of Jobs # 2 are in /afs/cern.ch/user/f/fanzago/scratch0/TEST/crab_0_130220_173930/res/
crab: Results of Jobs # 3 are in /afs/cern.ch/user/f/fanzago/scratch0/TEST/crab_0_130220_173930/res/
crab: Results of Jobs # 4 are in /afs/cern.ch/user/f/fanzago/scratch0/TEST/crab_0_130220_173930/res/
crab: Results of Jobs # 5 are in /afs/cern.ch/user/f/fanzago/scratch0/TEST/crab_0_130220_173930/res/
Log file is /afs/cern.ch/user/f/fanzago/scratch0/TEST/crab_0_130220_173930/log/crab.log

The stderr is an empty file, the stdout is the output of the wrapper of your analysis code (the output of the CMSSW.sh script created by CRAB), and crab_fjr_n.xml is the FrameworkJobReport created by your analysis code.

Use the -report option

Print a short report about the task: the total number of events and files processed/requested/available, the dataset path name, a summary of the status of the jobs, and so on. A summary file of the runs and luminosity sections processed is written to res/. In principle -report generates all the info needed for an analysis. Command to execute:

crab -report
Example of execution:

$ crab -report
crab:  Version 2.8.5 running on Thu Feb 21 02:17:06 2013 CET (01:17:06 UTC)

crab. Working options:
       scheduler           remoteGlidein
       job type            CMSSW
       server              OFF
       working directory   /afs/cern.ch/user/f/fanzago/scratch0/TEST/crab_0_130220_173930/

crab:  --------------------
Dataset: /RelValProdTTbar/JobRobot-MC_3XY_V24_JobRobot-v1/GEN-SIM-DIGI-RECO
Remote output :
SE: T2_IT_Legnaro t2-srm-02.lnl.infn.it  srmPath: srm://t2-srm-02.lnl.infn.it:8443/srm/managerv2?SFN=/pnfs/lnl.infn.it/data/cms/store/user/fanzago/TutGridSchool/
Total Events read: 50
Total Files read: 5
Total Jobs : 5
Luminosity section summary file: /afs/cern.ch/user/f/fanzago/scratch0/TEST/crab_0_130220_173930/res/lumiSummary.json
  # Jobs: Retrieved:5

----------------------------

crab:  The summary file inputLumiSummaryOfTask.json about input run and lumi isn't created
crab:  No json file to compare
Log file is /afs/cern.ch/user/f/fanzago/scratch0/TEST/crab_0_130220_173930/log/crab.log

The message "The summary file inputLumiSummaryOfTask.json about input run and lumi isn't created" isn't an error but a message that means input data didn't provide lumi section info, as expected for the MC data.

The full srm path allows you to know where your data has been stored and to perform operations on it by hand. As an example, you can delete the data using the srmrm command and check the content of the remote directory with srmls. In this case the remote directory is:

srm://t2-srm-02.lnl.infn.it:8443/srm/managerv2?SFN=/pnfs/lnl.infn.it/data/cms/store/user/fanzago/TutGridSchool/

Depending on the shell you are using, it may be necessary to escape or quote the "?" in the srm path. Additional srm commands include srmrm, srmrmdir, and srmmv for moving files within an SRM system, and srmcp, which can copy files locally. Note that to copy files locally, srmcp may require the additional flag "-2" to ensure that the version 2 client is used.
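
If you prefer scripting such checks, here is a sketch in Python (assuming the srm client tools are available in the sourced UI environment; the output file name is taken from the copy example later in this page, your suffixes will differ):

# A sketch: inspect and fetch the remote output with the srm tools,
# assuming they are on the PATH of the sourced UI environment.
import subprocess

remote_dir = ("srm://t2-srm-02.lnl.infn.it:8443/srm/managerv2"
              "?SFN=/pnfs/lnl.infn.it/data/cms/store/user/fanzago/TutGridSchool/")

# Passing the URL as a list element avoids shell expansion of the "?".
subprocess.check_call(["srmls", remote_dir])

# Copy one file locally; "-2" selects the version 2 client, as noted above.
subprocess.check_call(["srmcp", "-2",
                       remote_dir + "outfile_1_1_aOu.root",
                       "file:////tmp/outfile_1_1_aOu.root"])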

Here is the content of the luminosity summary file crab_0_130220_173930/res/lumiSummary.json, which maps each run number to a list of inclusive [first, last] lumi ranges:

{"1": [[39, 39]]}

Copy the output from the SE to the local User Interface

The -copyData option can be used only if your output has previously been copied by CRAB to a remote SE. By default -copyData copies your output from the remote SE to the local CRAB working directory (under res). Alternatively, you can copy the output from the remote SE to another SE, specifying either -dest_se= or -dest_endpoint=. If dest_se is used, CRAB finds the correct path where the output can be stored. The command to retrieve the remote output files to your local user interface is:
crab -copyData ## or crab -copyData -c <dir name>
An example of execution:

$ crab -copyData
crab:  Version 2.8.5 running on Thu Feb 21 02:49:18 2013 CET (01:49:18 UTC)

crab. Working options:
        scheduler           remoteGlidein
        job type            CMSSW
        server              OFF
        working directory   /afs/cern.ch/user/f/fanzago/scratch0/TEST/crab_0_130220_173930/

crab:  Copy file locally.
        Output dir: /afs/cern.ch/user/f/fanzago/scratch0/TEST/crab_0_130220_173930/res/
crab:  Starting copy...
directory/afs/cern.ch/user/f/fanzago/scratch0/TEST/crab_0_130220_173930/res/already exists
crab:  Copy success for file: outfile_1_1_aOu.root 
crab:  Copy failed for file: outfile_4_1_Pi9.root 
        Copy failed because : Problem copying outfile_4_1_Pi9.root file'Permission denied!'
crab:  Copy success for file: outfile_2_1_bC1.root 
crab:  Copy success for file: outfile_5_1_yna.root 
crab:  Copy success for file: outfile_3_1_96A.root 
Log file is /afs/cern.ch/user/f/fanzago/scratch0/TEST/crab_0_130220_173930/log/crab.log

Publish your result in DBS

Publishing the produced data to a DBS allows you to re-run over the published data. The instructions to follow are below. You have to add more information to the CRAB configuration file, specifying the name under which to publish the data and the URL of the DBS instance where the output results will be registered:
[USER]
....
publish_data            = 1
publish_data_name       = what_you_want
dbs_url_for_publication = url_local_dbs
....
Warning:
  • all the parameters related to publication have to be added to the configuration file before job creation, even though the publication step is executed after the job outputs have been retrieved.
  • for this tutorial we will publish the data to the test DBS instance https://cmsdbsprod.cern.ch:8443/cms_dbs_ph_analysis_02_writer/servlet/DBSServlet. This instance is only for publication tests, so published data are not guaranteed to be maintained for long, and publication here doesn't require write authorization. If you belong to a PAG group, you have to publish your data to the DBS associated with your group, checking on the DBS access twiki page for the correct DBS URL and the VOMS role you need to be an allowed user.
  • remember to change the ui_working_dir value in the configuration file to create a new project (if you don't use the default CRAB project name), otherwise the creation step will fail with the error message "project already exists, please remove it before creating a new task".

Run Crab publishing your results

You can also run your analysis code publishing the results copied to a remote Storage Element. Below is an example of the CRAB configuration file, consistent with this tutorial:

For MC data (crab.cfg)

[CMSSW]
total_number_of_events  = 50
number_of_jobs          = 10
pset                    = tutorial.py
datasetpath             = /RelValProdTTbar/JobRobot-MC_3XY_V24_JobRobot-v1/GEN-SIM-DIGI-RECO 
output_file              = outfile.root

[USER]
return_data             = 0
copy_data               = 1
storage_element         = T2_IT_Legnaro
publish_data            = 1
publish_data_name       = FedeTutGrid
dbs_url_for_publication = https://cmsdbsprod.cern.ch:8443/cms_dbs_ph_analysis_02_writer/servlet/DBSServlet

[CRAB]
scheduler               = remoteGlidein
jobtype                 = cmssw

And with this crab.cfg you can re-do the complete workflow as described before, plus the publication step:

  • creation
  • submission
  • status progress monitoring
  • output retrieval
  • publish the results

Use the -publish option

After having run the previous workflow up to the retrieval of your jobs, you can publish the output data that have been stored in the Storage Element indicated in the crab.cfg using

   crab -publish
or
   crab -publish -c <dir name>
to publish the outputs of a specific project. It is not necessary that all the jobs are done and retrieved; you can publish your output at different times.

It will look for all the FrameworkJobReports (crab-project-dir/res/crab_fjr_*.xml) produced by each job and will extract from them the information to publish (i.e. number of events, LFN, ...).
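
For illustration, here is a sketch of locating the job reports of a task from Python (the FJR schema itself is not reproduced here, so the sketch only dumps each report's element tags):

# A sketch: find the framework job reports of a CRAB project and list
# the XML tags they contain (where event counts and LFNs are stored).
import glob
import xml.etree.ElementTree as ET

for fjr in sorted(glob.glob("crab_0_130221_030014/res/crab_fjr_*.xml")):
    tags = sorted({elem.tag for elem in ET.parse(fjr).getroot().iter()})
    print(fjr, tags)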

Publication output example

$ crab -publish -c crab_0_130221_030014/
crab:  Version 2.8.5 running on Tue Mar  5 12:04:57 2013 CET (11:04:57 UTC)

crab. Working options:
       scheduler           remoteGlidein
       job type            CMSSW
       server              OFF
       working directory   /afs/cern.ch/user/f/fanzago/scratch0/TEST/crab_0_130221_030014/

crab:  <dbs_url_for_publication> = https://cmsdbsprod.cern.ch:8443/cms_dbs_ph_analysis_02_writer/servlet/DBSServlet
file_list =  ['/afs/cern.ch/user/f/fanzago/scratch0/TEST/crab_0_130221_030014//res//crab_fjr_1.xml', '/afs/cern.ch/user/f/fanzago/scratch0/TEST/crab_0_130221_030014//res//crab_fjr_2.xml', '/afs/cern.ch/user/f/fanzago/scratch0/TEST/crab_0_130221_030014//res//crab_fjr_3.xml', '/afs/cern.ch/user/f/fanzago/scratch0/TEST/crab_0_130221_030014//res//crab_fjr_4.xml', '/afs/cern.ch/user/f/fanzago/scratch0/TEST/crab_0_130221_030014//res//crab_fjr_5.xml']
crab:  --->>> Start dataset publication
crab:  --->>> Importing parent dataset in the dbs: /RelValProdTTbar/JobRobot-MC_3XY_V24_JobRobot-v1/GEN-SIM-DIGI-RECO
crab:  --->>> Importing all parents level
Block /RelValProdTTbar/JobRobot-MC_3XY_V24_JobRobot-v1/GEN-SIM-RAW#c9d3a01e-a3a1-4fde-8104-1c7b024b5ef6 is already at destination
Block /RelValProdTTbar/JobRobot-MC_3XY_V24_JobRobot-v1/GEN-SIM-DIGI-RECO#8f881129-b4fd-4d88-902a-f7ca78a9da8f is already at destination
crab:  --->>> duration of all parents import (sec): 3.43028283119
crab:  Import ok of dataset /RelValProdTTbar/JobRobot-MC_3XY_V24_JobRobot-v1/GEN-SIM-DIGI-RECO
crab:  PrimaryDataset = RelValProdTTbar
crab:  ProcessedDataset = fanzago-FedeTutGrid-c8295e0370df515614ca6812ce2cfe77
crab:  <User Dataset Name> = /RelValProdTTbar/fanzago-FedeTutGrid-c8295e0370df515614ca6812ce2cfe77/USER
debug_verbose:crab::Primary:  {'Type': 'mc', 'Name': 'RelValProdTTbar'}
primary =  {'Type': 'mc', 'Name': 'RelValProdTTbar'}
...
crab:  --->>> End dataset publication
INFO:crab::--->>> End dataset publication
crab:  --->>> Start files publication
INFO:crab::--->>> Start files publication
DEBUG:crab::FJR = /afs/cern.ch/user/f/fanzago/scratch0/TEST/crab_0_130221_030014//res//crab_fjr_1.xml
DEBUG:crab::--->>> LFN of file to publish =  /store/user/fanzago/RelValProdTTbar/FedeTutGrid/c8295e0370df515614ca6812ce2cfe77/outfile_1_1_haS.root
DEBUG:crab::--->>> Inserting file in blocks = ['/RelValProdTTbar/fanzago-FedeTutGrid-c8295e0370df515614ca6812ce2cfe77/USER#c5c0a5bc-aa35-4dcb-ade4-52211e5e8332']
DEBUG:crab::FJR = /afs/cern.ch/user/f/fanzago/scratch0/TEST/crab_0_130221_030014//res//crab_fjr_2.xml
DEBUG:crab::--->>> LFN of file to publish =  /store/user/fanzago/RelValProdTTbar/FedeTutGrid/c8295e0370df515614ca6812ce2cfe77/outfile_2_1_Nw2.root
...
crab:  --->>> End files publication
INFO:crab::--->>> End files publication
crab:  --->>> Check data publication: dataset /RelValProdTTbar/fanzago-FedeTutGrid-c8295e0370df515614ca6812ce2cfe77/USER in DBS url https://cmsdbsprod.cern.ch:8443/cms_dbs_ph_analysis_02_writer/servlet/DBSServlet

INFO:crab::--->>> Check data publication: dataset /RelValProdTTbar/fanzago-FedeTutGrid-c8295e0370df515614ca6812ce2cfe77/USER in DBS url https://cmsdbsprod.cern.ch:8443/cms_dbs_ph_analysis_02_writer/servlet/DBSServlet

=== dataset /RelValProdTTbar/fanzago-FedeTutGrid-c8295e0370df515614ca6812ce2cfe77/USER
=== dataset description =  
===== File block name: /RelValProdTTbar/fanzago-FedeTutGrid-c8295e0370df515614ca6812ce2cfe77/USER#c5c0a5bc-aa35-4dcb-ade4-52211e5e8332
     File block located at:  ['t2-srm-02.lnl.infn.it']
     File block status: 0
     Number of files: 5
     Number of Bytes: 3279142
     Number of Events: 50

total events: 50 in dataset: /RelValProdTTbar/fanzago-FedeTutGrid-c8295e0370df515614ca6812ce2cfe77/USER

crab:  You can obtain more info about files of the dataset using: crab -checkPublication -USER.dataset_to_check=/RelValProdTTbar/fanzago-FedeTutGrid-c8295e0370df515614ca6812ce2cfe77/USER -USER.dbs_url_for_publication=https://cmsdbsprod.cern.ch:8443/cms_dbs_ph_analysis_02_writer/servlet/DBSServlet -debug
INFO:crab::You can obtain more info about files of the dataset using: crab -checkPublication -USER.dataset_to_check=/RelValProdTTbar/fanzago-FedeTutGrid-c8295e0370df515614ca6812ce2cfe77/USER -USER.dbs_url_for_publication=https://cmsdbsprod.cern.ch:8443/cms_dbs_ph_analysis_02_writer/servlet/DBSServlet -debug
Log file is /afs/cern.ch/user/f/fanzago/scratch0/TEST/crab_0_130221_030014//log/crab.log

Warning: some versions of CMSSW switch off CRAB's debug mode, so a lot of duplicated info can be printed to the screen.

Check the result of data publication and analyze your published data

Note that:
  • CRAB by default publishes all files that finished correctly, including files with 0 events
  • CRAB by default imports all dataset parents of your dataset

To check if your data have been published you can use the option:

crab -checkPublication -USER.dataset_to_check=your_dataset_path -USER.dbs_url_for_publication=url_local_dbs -debug
where dbs_url_for_publication is the dbs_url you wrote in the crab.cfg file and your_dataset_path is the name of the dataset published by CRAB, in the form primarydataset/publish_data_name/USER (it is also printed by CRAB on the "User Dataset Name" line when you run the crab -publish command); see the sketch below for how this name is composed.
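
The published dataset name is assembled from pieces you already know, plus the pset hash computed by CRAB; a sketch (values copied from the tutorial output above):

# Illustrative only: how the published dataset path is composed.
primary_dataset   = "RelValProdTTbar"
username          = "fanzago"                               # SiteDB user name
publish_data_name = "FedeTutGrid"                           # from crab.cfg
pset_hash         = "c8295e0370df515614ca6812ce2cfe77"      # computed by CRAB

dataset_to_check = "/%s/%s-%s-%s/USER" % (
    primary_dataset, username, publish_data_name, pset_hash)
print(dataset_to_check)
# /RelValProdTTbar/fanzago-FedeTutGrid-c8295e0370df515614ca6812ce2cfe77/USER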

The output of crab -checkPublication is:

 $ crab -checkPublication -USER.dataset_to_check=/RelValProdTTbar/fanzago-FedeTutGrid-c8295e0370df515614ca6812ce2cfe77/USER -USER.dbs_url_for_publication=https://cmsdbsprod.cern.ch:8443/cms_dbs_ph_analysis_02_writer/servlet/DBSServlet -debug
crab:  /afs/cern.ch/cms/ccs/wm/scripts/Crab/CRAB_2_8_5_patch1/python/crab.py -checkPublication -USER.dataset_to_check=/RelValProdTTbar/fanzago-FedeTutGrid-c8295e0370df515614ca6812ce2cfe77/USER -USER.dbs_url_for_publication=https://cmsdbsprod.cern.ch:8443/cms_dbs_ph_analysis_02_writer/servlet/DBSServlet -debug

crab:  Version 2.8.5 running on Tue Mar  5 12:11:37 2013 CET (11:11:37 UTC)

crab. Working options:
       scheduler           remoteGlidein
       job type            CMSSW
       server              OFF
       working directory   /afs/cern.ch/user/f/fanzago/scratch0/TEST/crab_0_130304_120142/

crab:  Downloading file [http://cmsdoc.cern.ch/cms/LCG/crab/config/] to [/afs/cern.ch/user/f/fanzago/.cms_crab/allowed_releases.conf].
crab:  Service initialised ({'endpoint': 'http://cmsdoc.cern.ch/cms/LCG/crab/config/', 'maxcachereuse': 24.0, 'cachepath': '/afs/cern.ch/user/f/fanzago/.cms_crab', 'basepath': '/cms/LCG/crab/config/', 'method': None, 'timeout': 20, 'requests': {'endpoint': 'http://cmsdoc.cern.ch/cms/LCG/crab/config/', 'timeout': 20, 'cachepath': '/afs/cern.ch/user/f/fanzago/.cms_crab', 'cacheduration': 0.5, 'host': 'cmsdoc.cern.ch', 'accept_type': 'text/html', 'content_type': 'application/x-www-form-urlencoded', 'logger': <logging.Logger object at 0xf928f10>, 'type': 'txt/csv', 'conn': <httplib.HTTPConnection instance at 0xf9334d0>}, 'logger': <logging.Logger object at 0xf928f10>, 'cacheduration': 0.5, 'type': 'txt/csv', 'inputdata': {}}):
        host: cmsdoc.cern.ch, basepath: /cms/LCG/crab/config/ (text/html)
        cache: /afs/cern.ch/user/f/fanzago/.cms_crab (duration 0.5 hours, max reuse 24.0 hours)
crab:  Service initialised ({'endpoint': 'https://cmsweb.cern.ch/sitedb/json/index/', 'maxcachereuse': 24.0, 'cachepath': '/afs/cern.ch/user/f/fanzago/.cms_sitedbcache', 'basepath': '/sitedb/json/index/', 'method': None, 'timeout': 30, 'requests': {'host': 'cmsweb.cern.ch', 'endpoint': 'https://cmsweb.cern.ch/sitedb/json/index/', 'accept_type': 'text/html', 'content_type': 'application/x-www-form-urlencoded', 'logger': <logging.Logger object at 0xf928f10>, 'cachepath': '/afs/cern.ch/user/f/fanzago/.cms_sitedbcache', 'conn': <httplib.HTTPSConnection instance at 0xfa903b0>}, 'logger': <logging.Logger object at 0xf928f10>, 'cacheduration': 0.5, 'inputdata': {}}):
        host: cmsweb.cern.ch, basepath: /sitedb/json/index/ (text/html)
        cache: /afs/cern.ch/user/f/fanzago/.cms_sitedbcache (duration 0.5 hours, max reuse 24.0 hours)
crab:  Service initialised ({'endpoint': 'https://cmsweb.cern.ch/sitedb/json/index/', 'maxcachereuse': 24.0, 'cachepath': '/afs/cern.ch/user/f/fanzago/.cms_sitedbcache', 'basepath': '/sitedb/json/index/', 'method': None, 'timeout': 30, 'requests': {'host': 'cmsweb.cern.ch', 'endpoint': 'https://cmsweb.cern.ch/sitedb/json/index/', 'accept_type': 'text/html', 'content_type': 'application/x-www-form-urlencoded', 'logger': <logging.Logger object at 0xf928f10>, 'cachepath': '/afs/cern.ch/user/f/fanzago/.cms_sitedbcache', 'conn': <httplib.HTTPSConnection instance at 0xfa90440>}, 'logger': <logging.Logger object at 0xf928f10>, 'cacheduration': 0.5, 'inputdata': {}}):
        host: cmsweb.cern.ch, basepath: /sitedb/json/index/ (text/html)
        cache: /afs/cern.ch/user/f/fanzago/.cms_sitedbcache (duration 0.5 hours, max reuse 24.0 hours)
crab:  Input whitelist:
crab:  Input blacklist:
crab:  Converted whitelist:
crab:  Converted blacklist:
crab:  Downloading file [http://cmsdoc.cern.ch/cms/LCG/crab/config/] to [/afs/cern.ch/user/f/fanzago/.cms_crab/myproxy_server.conf].
crab:  Service initialised ({'endpoint': 'http://cmsdoc.cern.ch/cms/LCG/crab/config/', 'maxcachereuse': 24.0, 'cachepath': '/afs/cern.ch/user/f/fanzago/.cms_crab', 'basepath': '/cms/LCG/crab/config/', 'method': None, 'timeout': 20, 'requests': {'endpoint': 'http://cmsdoc.cern.ch/cms/LCG/crab/config/', 'timeout': 20, 'cachepath': '/afs/cern.ch/user/f/fanzago/.cms_crab', 'cacheduration': 0.5, 'host': 'cmsdoc.cern.ch', 'accept_type': 'text/html', 'content_type': 'application/x-www-form-urlencoded', 'logger': <logging.Logger object at 0xf928f10>, 'type': 'txt/csv', 'conn': <httplib.HTTPConnection instance at 0xfa904d0>}, 'logger': <logging.Logger object at 0xf928f10>, 'cacheduration': 0.5, 'type': 'txt/csv', 'inputdata': {}}):
        host: cmsdoc.cern.ch, basepath: /cms/LCG/crab/config/ (text/html)
        cache: /afs/cern.ch/user/f/fanzago/.cms_crab (duration 0.5 hours, max reuse 24.0 hours)
crab:  Downloading file [http://cmsdoc.cern.ch/cms/LCG/crab/config/] to [/afs/cern.ch/user/f/fanzago/.cms_crab/site_black_list.conf].
crab:  Service initialised ({'endpoint': 'http://cmsdoc.cern.ch/cms/LCG/crab/config/', 'maxcachereuse': 24.0, 'cachepath': '/afs/cern.ch/user/f/fanzago/.cms_crab', 'basepath': '/cms/LCG/crab/config/', 'method': None, 'timeout': 20, 'requests': {'endpoint': 'http://cmsdoc.cern.ch/cms/LCG/crab/config/', 'timeout': 20, 'cachepath': '/afs/cern.ch/user/f/fanzago/.cms_crab', 'cacheduration': 0.5, 'host': 'cmsdoc.cern.ch', 'accept_type': 'text/html', 'content_type': 'application/x-www-form-urlencoded', 'logger': <logging.Logger object at 0xf928f10>, 'type': 'txt/csv', 'conn': <httplib.HTTPConnection instance at 0xfa904d0>}, 'logger': <logging.Logger object at 0xf928f10>, 'cacheduration': 0.5, 'type': 'txt/csv', 'inputdata': {}}):
        host: cmsdoc.cern.ch, basepath: /cms/LCG/crab/config/ (text/html)
        cache: /afs/cern.ch/user/f/fanzago/.cms_crab (duration 0.5 hours, max reuse 24.0 hours)
crab:  Enforced black list: <Downloader.Downloader instance at 0xfa90440>
crab:  --->>> Check data publication: dataset /RelValProdTTbar/fanzago-FedeTutGrid-c8295e0370df515614ca6812ce2cfe77/USER in DBS url https://cmsdbsprod.cern.ch:8443/cms_dbs_ph_analysis_02_writer/servlet/DBSServlet

PrimaryDataset =  RelValProdTTbar
ProcessedDataset =  fanzago-FedeTutGrid-c8295e0370df515614ca6812ce2cfe77
DataTier =  USER
datasets matching your requirements=  [{'RunsList': [], 'Name': 'fanzago-FedeTutGrid-c8295e0370df515614ca6812ce2cfe77', 'PathList': ['/RelValProdTTbar/fanzago-FedeTutGrid-c8295e0370df515614ca6812ce2cfe77/USER'], 'LastModifiedBy': '/DC=ch/DC=cern/OU=Organic Units/OU=Users/CN=fanzago/CN=610896/CN=Federica Fanzago', 'AlgoList': [{'ExecutableName': 'cmsRun', 'ApplicationVersion': 'CMSSW_5_3_8', 'ParameterSetID': {'Hash': 'c8295e0370df515614ca6812ce2cfe77'}, 'ApplicationFamily': 'cmsRun'}], 'XtCrossSection': 0.0, 'Status': 'VALID', 'ParentList': [], 'AcquisitionEra': '', 'PhysicsGroup': 'NoGroup', 'Description': '', 'GlobalTag': '', 'PrimaryDataset': {'Name': 'RelValProdTTbar'}, 'TierList': ['USER'], 'CreatedBy': '/DC=ch/DC=cern/OU=Organic Units/OU=Users/CN=fanzago/CN=610896/CN=Federica Fanzago', 'PhysicsGroupConverner': 'NO_CONVENOR', 'CreationDate': '1362481519', 'LastModificationDate': '1362481520'}]
=== dataset /RelValProdTTbar/fanzago-FedeTutGrid-c8295e0370df515614ca6812ce2cfe77/USER
=== dataset description =  
===== File block name: /RelValProdTTbar/fanzago-FedeTutGrid-c8295e0370df515614ca6812ce2cfe77/USER#c5c0a5bc-aa35-4dcb-ade4-52211e5e8332
     File block located at:  ['t2-srm-02.lnl.infn.it']
     File block status: 0
     Number of files: 5
     Number of Bytes: 3279142
     Number of Events: 50
--------- info about files --------
Size    Events          LFN     FileStatus
666747 10 /store/user/fanzago/RelValProdTTbar/FedeTutGrid/c8295e0370df515614ca6812ce2cfe77/outfile_1_1_haS.root
635831 10 /store/user/fanzago/RelValProdTTbar/FedeTutGrid/c8295e0370df515614ca6812ce2cfe77/outfile_2_1_Nw2.root
648594 10 /store/user/fanzago/RelValProdTTbar/FedeTutGrid/c8295e0370df515614ca6812ce2cfe77/outfile_4_2_VKk.root
682364 10 /store/user/fanzago/RelValProdTTbar/FedeTutGrid/c8295e0370df515614ca6812ce2cfe77/outfile_5_1_bi0.root
645606 10 /store/user/fanzago/RelValProdTTbar/FedeTutGrid/c8295e0370df515614ca6812ce2cfe77/outfile_3_1_rWE.root

total events: 50 in dataset: /RelValProdTTbar/fanzago-FedeTutGrid-c8295e0370df515614ca6812ce2cfe77/USER

Log file is /afs/cern.ch/user/f/fanzago/scratch0/TEST/crab_0_130304_120142/log/crab.log

If you want to analyze your published data, you have to modify your crab.cfg specifying the datasetpath name of your dataset and the dbs_url where the data are published:

[CMSSW]
....
datasetpath=your_dataset_path
dbs_url=url_local_dbs
If you find that the data of interest are in the DBS instance https://cmsdbsprod.cern.ch:8443/cms_dbs_ph_analysis_02_writer/servlet/DBSServlet, that URL is the value to use for dbs_url.

The creation output will be something similar to:

$ crab -create
crab:  Version 2.8.5 running on Tue Mar  5 12:19:06 2013 CET (11:19:06 UTC)

crab. Working options:
       scheduler           remoteGlidein
       job type            CMSSW
       server              OFF
       working directory   /afs/cern.ch/user/f/fanzago/scratch0/TEST/crab_0_130305_121906/

verify if user DN is mapped in CERN's SSO
OK. user ready for SiteDB switchover on March 12, 2013
crab:  Contacting Data Discovery Services ...
crab:  Accessing DBS at: https://cmsdbsprod.cern.ch:8443/cms_dbs_ph_analysis_02_writer/servlet/DBSServlet
crab:  Requested dataset: /RelValProdTTbar/fanzago-FedeTutGrid-c8295e0370df515614ca6812ce2cfe77/USER has 50 events in 1 blocks.

crab:  May not create the exact number_of_jobs requested.
crab:  5 job(s) can run on 50 events.

crab:  List of jobs and available destination sites:

Block     1: jobs                  1-5: sites: T2_IT_Legnaro

crab:  Creating 5 jobs, please wait...
crab:  Total of 5 jobs created.

Run CRAB on real data copying the output to an SE

Running CRAB on real data is not very different from running it on MonteCarlo data. The main difference lies in the preparation of the configuration for the CRAB workflow, as shown in the next section.

CRAB configuration file for real data with lumi mask

You can find more details on this at the corresponding link on the Crab FAQ page.

The CRAB configuration file (default name crab.cfg) should be located in the same directory as the CMSSW parameter-set to be used by CRAB. The dataset used is: /SingleMu/Run2012B-13Jul2012-v1/AOD

For real data (crab_lumi.cfg)

[CMSSW]
lumis_per_job           = 50
number_of_jobs          = 10 
pset                    = tutorial.py
datasetpath             = /SingleMu/Run2012B-13Jul2012-v1/AOD
lumi_mask             = Cert_190456-195947_8TeV_PromptReco_Collisions12_JSON_v2.txt
output_file            = outfile.root

[USER]
return_data              = 0
copy_data                = 1
publish_data             = 1
storage_element          = T2_IT_Legnaro
publish_data_name        = FedeTutGridGlide_data
dbs_url_for_publication  = https://cmsdbsprod.cern.ch:8443/cms_dbs_ph_analysis_02_writer/servlet/DBSServlet

[CRAB]
scheduler               = remoteGlidein 
jobtype                 = cmssw

where the lumi_mask file can be downloaded with

wget --no-check-certificate https://cms-service-dqm.web.cern.ch/cms-service-dqm/CAF/certification/Collisions12/8TeV/Prompt/Cert_190456-195947_8TeV_PromptReco_Collisions12_JSON_v2.txt

For the tutorial we are using a subset of runs and lumis (via a lumi-mask JSON file). The lumi_mask file (Cert_190456-195947_8TeV_PromptReco_Collisions12_JSON_v2.txt) contains:

{"190645": [[10, 110]], "190704": [[1, 3]], "190705": [[1, 5], [7, 76], [78, 336], [338, 350], [353, 384]],
...
"195937": [[1, 28], [31, 186], [188, 400]], "195947": [[23, 62], [64, 88]]}

Job Creation

Creating jobs for real data is analogous to the MonteCarlo case. To avoid overwriting the previous runs of this tutorial, it is suggested to use a dedicated configuration file:

crab -create -cfg crab_lumi.cfg  
which takes as configuration file the name specified with the -cfg option, in this case the crab_lumi.cfg associated in this tutorial with real data.

$ crab -create -cfg crab_lumi.cfg
crab:  Version 2.8.5 running on Tue Mar  5 14:47:56 2013 CET (13:47:56 UTC)

crab. Working options:
       scheduler           remoteGlidein
       job type            CMSSW
       server              OFF
       working directory   /afs/cern.ch/user/f/fanzago/scratch0/TEST/crab_0_130305_144756/

verify if user DN is mapped in CERN's SSO
OK. user ready for SiteDB switchover on March 12, 2013

crab:  Contacting Data Discovery Services ...
crab:  Accessing DBS at: http://cmsdbsprod.cern.ch/cms_dbs_prod_global/servlet/DBSServlet
crab:  Requested (A)DS /SingleMu/Run2012B-TOPMuPlusJets-PromptSkim-v1/AOD has 13 block(s).
crab:  Requested number of lumis reached.
crab:  8 jobs created to run on 500 lumis
crab:  Checking remote location
crab:  Creating 8 jobs, please wait...
crab:  Total of 8 jobs created.

Log file is /afs/cern.ch/user/f/fanzago/scratch0/TEST/crab_0_130305_144756/log/crab.log

  • The project directory called crab_0_130305_144756 is created.
  • As explained, the number of created jobs may not match the number of jobs requested in the configuration file (here 8 jobs were created while 10 were requested).

Job Submission

Job submission works as before:

$ crab -submit
crab:  Version 2.8.5 running on Tue Mar  5 14:54:39 2013 CET (13:54:39 UTC)

crab. Working options:
       scheduler           remoteGlidein
       job type            CMSSW
       server              OFF
       working directory   /afs/cern.ch/user/f/fanzago/scratch0/TEST/crab_0_130305_144756/

crab:  Checking available resources...
crab:  Found  compatible site(s) for job 1
crab:  1 blocks of jobs will be submitted
crab:  remotehost from Avail.List = submit-2.t2.ucsd.edu
crab:  contacting remote host submit-2.t2.ucsd.edu
crab:  Establishing gsissh ControlPath. Wait 2 sec ...
crab:  Establishing gsissh ControlPath. Wait 2 sec ...
crab:  COPY FILES TO REMOTE HOST
crab:  SUBMIT TO REMOTE GLIDEIN FRONTEND
Submitting 8 jobs
100% [=================================================================================================================]
please wait
crab:  Total of 8 jobs submitted.
Log file is /afs/cern.ch/user/f/fanzago/scratch0/TEST/crab_0_130305_144756/log/crab.log

Job Status Check

Check the status of the jobs in the latest CRAB project with the following command:
crab -status 
to check a specific project:
crab -status -c  <dir name>

which should produce screen output similar to:

$ crab -status
crab:  Version 2.8.5 running on Tue Mar  5 14:59:36 2013 CET (13:59:36 UTC)

crab. Working options:
       scheduler           remoteGlidein
       job type            CMSSW
       server              OFF
       working directory   /afs/cern.ch/user/f/fanzago/scratch0/TEST/crab_0_130305_144756/

crab:  Checking the status of all jobs: please wait
crab:  contacting remote host submit-2.t2.ucsd.edu
crab:  
ID    END STATUS            ACTION       ExeExitCode JobExitCode E_HOST
----- --- ----------------- ------------  ---------- ----------- ---------
1     N   Running           SubSuccess                           cream02.iihe.ac.be
2     N   Running           SubSuccess                           cream02.iihe.ac.be
3     N   Running           SubSuccess                           cream02.iihe.ac.be
4     N   Running           SubSuccess                           cream02.iihe.ac.be
5     N   Submitted         SubSuccess                           
6     N   Running           SubSuccess                           cream02.iihe.ac.be
7     N   Running           SubSuccess                           cream02.iihe.ac.be
8     N   Running           SubSuccess                           red-gw2.unl.edu

crab:   8 Total Jobs
>>>>>>>>> 1 Jobs Submitted
       List of jobs Submitted: 5
>>>>>>>>> 7 Jobs Running
       List of jobs Running: 1-4,6-8

crab:  You can also follow the status of this task on :
       CMS Dashboard: http://dashb-cms-job-task.cern.ch/taskmon.html#task=fanzago_crab_0_130305_144756_db2r51
       Your task name is: fanzago_crab_0_130305_144756_db2r51

Job Output Retrieval

For the jobs which are in "Done" status it is possible to retrieve the log files of the jobs (just the log files, because the output files are copied to the Storage Element associated with the T2 specified in the crab.cfg, and in fact return_data is 0). The following command retrieves the log files of all "Done" jobs of the last created CRAB project:
crab -getoutput 
to get the output of a specific project:
crab -getoutput -c  <dir name>

the job results will be copied into the res subdirectory of your CRAB project:

$ crab -get
crab:  Version 2.8.5 running on Tue Mar  5 15:15:32 2013 CET (14:15:32 UTC)

crab. Working options:
       scheduler           remoteGlidein
       job type            CMSSW
       server              OFF
       working directory   /afs/cern.ch/user/f/fanzago/scratch0/TEST/crab_0_130305_144756/

crab:  contacting remote host submit-2.t2.ucsd.edu
crab:  RETRIEVE FILE out_files_1.tgz for job #1
crab:  RETRIEVE FILE crab_fjr_1.xml for job #1
crab:  Results of Jobs # 1 are in /afs/cern.ch/user/f/fanzago/scratch0/TEST_RELEASE/TEST_PATC2/TEST_2_8_2/crab_0_130305_144756/res/
...
Log file is /afs/cern.ch/user/f/fanzago/scratch0/TEST/crab_0_130305_144756/log/crab.log

Use the -report option

As for the MonteCarlo data example, it is possible to run the report command:

crab -report -c <dir name>

$ crab -report
crab:  Version 2.8.5 running on Tue Mar  5 15:18:00 2013 CET (14:18:00 UTC)

crab. Working options:
       scheduler           remoteGlidein
       job type            CMSSW
       server              OFF
       working directory   /afs/cern.ch/user/f/fanzago/scratch0/TEST/crab_0_130305_144756/

crab:  --------------------
Dataset: /SingleMu/Run2012B-TOPMuPlusJets-PromptSkim-v1/AOD
Remote output :
SE: T2_IT_Legnaro t2-srm-02.lnl.infn.it  srmPath: srm://t2-srm-02.lnl.infn.it:8443/srm/managerv2?SFN=/pnfs/lnl.infn.it/data/cms/store/user/fanzago/SingleMu/FedeTutGridGlide_data/${PSETHASH}/
Total Events read: 39942
Total Files read: 29
Total Jobs : 8
Luminosity section summary file: /afs/cern.ch/user/f/fanzago/scratch0/TEST_RELEASE/TEST_PATC2/TEST_2_8_2/crab_0_130305_144756/res/lumiSummary.json
  # Jobs: Retrieved:8

----------------------------

crab:  Summary file of input run and lumi to be analize with this task: /afs/cern.ch/user/f/fanzago/scratch0/TEST_RELEASE/TEST_PATC2/TEST_2_8_2/crab_0_130305_144756/res//inputLumiSummaryOfTask.json
crab:  to complete your analysis, you have to analyze the run and lumi reported in the /afs/cern.ch/user/f/fanzago/scratch0/TEST_RELEASE/TEST_PATC2/TEST_2_8_2/crab_0_130305_144756/res//missingLumiSummary.json file

Log file is /afs/cern.ch/user/f/fanzago/scratch0/TEST/crab_0_130305_144756/log/crab.log

The files containing the luminosity info about the task are the following. The original lumi-mask file used in the creation of your task:

$ cat Cert_190456-195947_8TeV_PromptReco_Collisions12_JSON_v2.txt 
{"190645": [[10, 110]], "190704": [[1, 3]], "190705": [[1, 5], [7, 76], [78, 336], [338, 350], [353, 384]], "190738": [[1, 130], [133, 226], [229, 355]], ...
The lumi sections that your created jobs have to analyze:
$ cat crab_0_130609_231016/res/inputLumiSummaryOfTask.json 
{"195947": [[27, 27], [36, 36]], "194108": [[95, 96], [117, 121], [123, 126], [149, 152], [154, 157], [160, 161], [166, 169], [172, 174], [176, 176], [185, 185], [187, 187], [190, 191], [196, 197], [200, 201], [206, 209], [211, 212], [216, 221], [231, 232], [234, 235], [238, 243], [249, 250], [277, 278], [285, 286], [305, 308], [311, 312], [333, 334], [438, 439], [520, 520], [527, 527]], ...
The lumi sections actually analyzed by your correctly terminated jobs:
$ cat crab_0_130609_231016/res/lumiSummary.json
{"194424": [[63, 63], [92, 92], [121, 121], [123, 123], [168, 173], [176, 177], [184, 185], [187, 187], [199, 200], [202, 203], [207, 207], [213, 213], [220, 221], [256, 256], [557, 557], [559, 559], [562, 562], [564, 564], [599, 599], [602, 602], [607, 607], [609, 609], [639, 639], [648, 649], [656, 656], [658, 658], [660, 660]], "194108": [[95, 96], [117, 121], [123, 126], [149, 152], [154, 157], [160, 161], [166, 169], [172, 174], [176, 176], [185, 185], [187, 187], [190, 191], [196, 197], [200, 201], [206, 209], [211, 212], [216, 221], [231, 232], [234, 235], [238, 243], [249, 250], [277, 278], [285, 286], [305, 308], [311, 312], [333, 334], [438, 439], [520, 520], [527, 527]], ...
And the missing lumis (the difference between the lumi mask and lumiSummary), which you can analyze by creating a new task that uses this file as the new lumi mask:
$ cat crab_0_130609_231016/res/missingLumiSummary.json
{"190645": [[10, 110]],
 "190704": [[1, 3]],
 "190705": [[1, 5], [7, 76], [78, 336], [338, 350], [353, 384]],
 "190738": [[1, 130], [133, 226], [229, 355]],
...
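
CRAB writes missingLumiSummary.json for you, but the same difference can also be computed by hand with the LumiList utility shipped with CMSSW (a sketch; run it in a shell where cmsenv has been executed):

# A sketch using CMSSW's LumiList (FWCore.PythonUtilities):
# missing = original mask minus what the retrieved jobs actually processed.
from FWCore.PythonUtilities.LumiList import LumiList

mask = LumiList(filename="Cert_190456-195947_8TeV_PromptReco_Collisions12_JSON_v2.txt")
done = LumiList(filename="crab_0_130609_231016/res/lumiSummary.json")

missing = mask - done
missing.writeJSON("myMissingLumiSummary.json")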

To create a task to analyze the missing lumis, use the missingLumiSummary.json file as the lumi_mask in your crab.cfg:

[CMSSW]
total_number_of_lumis = -1
number_of_jobs          = 10 
pset                    = tutorial.py
datasetpath             = /SingleMu/Run2012B-13Jul2012-v1/AOD
lumi_mask             =  crab_0_130305_144756/res/missingLumiSummary.json  
output_file            = outfile.root

[USER]
return_data              = 0
copy_data                = 1
publish_data             = 1
storage_element          = T2_IT_Legnaro
publish_data_name        = FedeTutGridGlide_data
dbs_url_for_publication  = https://cmsdbsprod.cern.ch:8443/cms_dbs_ph_analysis_02_writer/servlet/DBSServlet

[CRAB]
scheduler               = remoteGlidein 
jobtype                 = cmssw

$ crab -create -cfg crab_missing.cfg
crab:  Version 2.8.5 running on Tue Mar  5 15:22:50 2013 CET (14:22:50 UTC)

crab. Working options:
       scheduler           remoteGlidein
       job type            CMSSW
       server              OFF
       working directory   /afs/cern.ch/user/f/fanzago/scratch0/TEST/crab_0_130305_152250/

verify if user DN is mapped in CERN's SSO
OK. user ready for SiteDB switchover on March 12, 2013


crab:  Contacting Data Discovery Services ...
crab:  Accessing DBS at: http://cmsdbsprod.cern.ch/cms_dbs_prod_global/servlet/DBSServlet
crab:  Requested (A)DS /SingleMu/Run2012B-TOPMuPlusJets-PromptSkim-v1/AOD has 13 block(s).
crab:  Each job will process about 192 lumis.
crab:  9 jobs created to run on 1918 lumis
crab:  Checking remote location
crab:  WARNING: The stageout directory already exists. Be careful not to accidentally mix outputs from different tasks
crab:  Creating 9 jobs, please wait...
crab:  Total of 9 jobs created.
Submit them as usual: the created jobs will analyze all the missing lumis of the original lumi-mask file.

Run Crab retrieving your output (without copying to a Storage Element)

You can also run your analysis code without interacting with a remote Storage Element, retrieving the outputs to your workspace area (under the res dir of the project). Below is an example of the CRAB configuration file, consistent with this tutorial:

[CMSSW]
total_number_of_events  = 100
number_of_jobs          = 10
pset                    = tutorial.py
datasetpath             =  /RelValProdTTbar/JobRobot-MC_3XY_V24_JobRobot-v1/GEN-SIM-DIGI-RECO
output_file              = outfile.root

[USER]
return_data             = 1

[CRAB]
scheduler               = remoteGlidein
jobtype                 = cmssw

And with this crab.cfg in place you can re-do the workflow as described before (apart from the publication step):

  • creation
  • submission
  • status progress monitoring
  • output retrieval (in this step you'll be able to retrieve directly the real output produced by your pset file)

Where to find more on CRAB

Note also that all CMS members using the Grid must subscribe to the Grid Announcements HyperNews forum.

Review status

Reviewer/Editor and Date            Comments
JohnStupak - 4-June-2013            Review, minor revisions, updated real data dataset to an existing dataset
NitishDhingra - 2012-04-07          Complete review, minor changes. Page gives a good idea of doing a physics analysis using CRAB.
MattiaCinquilli - 2010-04-15        Update for tutorial
FedericaFanzago - 18 Feb 2009       Update for tutorial
AndriusJuodagalvis - 2009-08-21     Added an instance of url_local_dbs

Responsible: FedericaFanzago
