If dest_se is used, CRAB finds the correct path where the output can be stored. To copy the remote output files to your local user interface, execute:
crab -copyData
or, for a specific project:
crab -copyData -c <dir name>
An example of execution:
$ crab -copyData
crab: Version 2.9.1 running on Fri Oct 11 17:08:38 2013 CET (15:08:38 UTC)
crab. Working options:
scheduler remoteGlidein
job type CMSSW
server OFF
working directory /afs/cern.ch/user/f/fanzago/scratch0/TUTORIAL/crab_0_131011_153317/
crab: error detecting glite version
crab: error detecting glite version
crab: Copy file locally.
Output dir: /afs/cern.ch/user/f/fanzago/scratch0/TUTORIAL/crab_0_131011_153317/res/
crab: Starting copy...
directory /afs/cern.ch/user/f/fanzago/scratch0/TUTORIAL/crab_0_131011_153317/res/ already exists
crab: Copy success for file: outfile_4_1_Jlr.root
crab: Copy success for file: outfile_3_1_MsR.root
crab: Copy success for file: outfile_1_1_HF3.root
crab: Copy success for file: outfile_2_1_cVA.root
crab: Copy success for file: outfile_5_1_gAw.root
Log file is /afs/cern.ch/user/f/fanzago/scratch0/TUTORIAL/crab_0_131011_153317/log/crab.log
Publish your results in DBS
Publishing the produced data to DBS allows you (and others) to re-run over the data that has been published. The instructions to follow are below; see also the dedicated how-to page.
You have to add two pieces of information to the CRAB configuration file: a flag stating that you want to publish, and the name under which to publish the data.
[USER]
....
publish_data = 1
publish_data_name = what_you_want
....
Warning:
- All publication-related parameters have to be added to the configuration file before job creation, even though the publication step itself is executed after the job output has been retrieved.
- Publication is done in the phys03 instance of DBS3. If you belong to a PAG group, you have to publish your data to the DBS instance associated with your group; check the DBS access twiki page for the correct DBS URL and for the VOMS role you need in order to be an allowed user.
- Remember to change the ui_working_dir value in the configuration file to create a new project (if you don't use the default CRAB project name), otherwise the creation step will fail with the error message "project already exists, please remove it before create new task".
Run Crab publishing your results
You can also run your analysis code and publish the results copied to a remote Storage Element.
Below is an example of a CRAB configuration file consistent with this tutorial:
For MC data (crab.cfg)
[CMSSW]
total_number_of_events = 50
number_of_jobs = 10
pset = tutorial.py
datasetpath = /RelValZMM/CMSSW_5_3_6-START53_V14-v2/GEN-SIM-RECO
output_file = outfile.root
[USER]
return_data = 0
copy_data = 1
storage_element = T2_xx_yyyy
publish_data = 1
publish_data_name = FanzagoTutGrid
[CRAB]
scheduler = remoteGlidein
jobtype = cmssw
With this crab.cfg you can re-do the complete workflow as described before, plus the publication step:
- creation
- submission
- status progress monitoring
- output retrieval
- publication of the results
Use the -publish option
After having run the previous workflow up to the retrieval of your jobs, you can publish the output data that have been stored in the Storage Element indicated in the crab.cfg file using:
crab -publish
or to publish the outputs of a specific project:
crab -publish -c <dir_name>
It is not necessary for all jobs to be done and retrieved; you can publish your output at different times.
The command looks for all the FrameworkJobReport files (crab-project-dir/res/crab_fjr_*.xml) produced by each job and extracts from them the information to publish (e.g. number of events, LFN).
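The FJR scan can be sketched as follows. This is a simplified illustration, not the actual CRAB code: the XML layout used here is a hypothetical, trimmed-down report containing only the two fields the text mentions (LFN and number of events); real FrameworkJobReports contain many more fields.

```python
# Simplified sketch of what `crab -publish` does with the crab_fjr_*.xml
# reports: pull out, for each output file, the LFN and the event count.
# NOTE: this sample XML is a made-up minimal FJR, for illustration only.
import xml.etree.ElementTree as ET

SAMPLE_FJR = """<FrameworkJobReport>
  <File>
    <LFN>/store/user/fanzago/outfile_1_1_HF3.root</LFN>
    <TotalEvents>10</TotalEvents>
  </File>
</FrameworkJobReport>"""

def extract_publish_info(fjr_xml):
    """Return (lfn, n_events) pairs for every output file in one report."""
    root = ET.fromstring(fjr_xml)
    info = []
    for f in root.findall("File"):
        lfn = f.findtext("LFN").strip()
        events = int(f.findtext("TotalEvents"))
        info.append((lfn, events))
    return info

# In a real project one would loop over crab-project-dir/res/crab_fjr_*.xml
files_info = extract_publish_info(SAMPLE_FJR)
total_events = sum(n for _, n in files_info)
print(files_info, total_events)
```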
Publication output example
The output shown below corresponds to an old output using DBS2.
$ crab -publish
crab: Version 2.9.1 running on Mon Oct 14 14:35:56 2013 CET (12:35:56 UTC)
crab. Working options:
scheduler remoteGlidein
job type CMSSW
server OFF
working directory /afs/cern.ch/user/f/fanzago/scratch0/TUTORIAL/crab_0_131014_123645/
crab: <dbs_url_for_publication> = https://cmsdbsprod.cern.ch:8443/cms_dbs_ph_analysis_02_writer/servlet/DBSServlet
file_list = ['/afs/cern.ch/user/f/fanzago/scratch0/TUTORIAL/crab_0_131014_123645/res//crab_fjr_1.xml', '/afs/cern.ch/user/f/fanzago/scratch0/TUTORIAL/crab_0_131014_123645/res//crab_fjr_2.xml', '/afs/cern.ch/user/f/fanzago/scratch0/TUTORIAL/crab_0_131014_123645/res//crab_fjr_3.xml', '/afs/cern.ch/user/f/fanzago/scratch0/TUTORIAL/crab_0_131014_123645/res//crab_fjr_4.xml', '/afs/cern.ch/user/f/fanzago/scratch0/TUTORIAL/crab_0_131014_123645/res//crab_fjr_5.xml', '/afs/cern.ch/user/f/fanzago/scratch0/TUTORIAL/crab_0_131014_123645/res//crab_fjr_6.xml', '/afs/cern.ch/user/f/fanzago/scratch0/TUTORIAL/crab_0_131014_123645/res//crab_fjr_7.xml', '/afs/cern.ch/user/f/fanzago/scratch0/TUTORIAL/crab_0_131014_123645/res//crab_fjr_8.xml', '/afs/cern.ch/user/f/fanzago/scratch0/TUTORIAL/crab_0_131014_123645/res//crab_fjr_9.xml', '/afs/cern.ch/user/f/fanzago/scratch0/TUTORIAL/crab_0_131014_123645/res//crab_fjr_10.xml']
crab: --->>> Start dataset publication
crab: --->>> Importing parent dataset in the dbs: /RelValZMM/CMSSW_5_3_6-START53_V14-v2/GEN-SIM-RECO
crab: --->>> Importing all parents level
-----------------------------------------------------------------------------------
Transferring path /RelValZMM/CMSSW_5_2_1-START52_V4-v1/GEN-SIM
block /RelValZMM/CMSSW_5_2_1-START52_V4-v1/GEN-SIM#24e1effb-0f0c-4557-bb46-3d5ecae691b8
-----------------------------------------------------------------------------------
-----------------------------------------------------------------------------------
Transferring path /RelValZMM/CMSSW_5_3_6-START53_V14-v2/GEN-SIM-DIGI-RAW-HLTDEBUG
block /RelValZMM/CMSSW_5_3_6-START53_V14-v2/GEN-SIM-DIGI-RAW-HLTDEBUG#13e93136-29ed-11e2-9c63-00221959e7c0
-----------------------------------------------------------------------------------
-----------------------------------------------------------------------------------
Transferring path /RelValZMM/CMSSW_5_3_6-START53_V14-v2/GEN-SIM-RECO
block /RelValZMM/CMSSW_5_3_6-START53_V14-v2/GEN-SIM-RECO#43683124-29f6-11e2-9c63-00221959e7c0
-----------------------------------------------------------------------------------
crab: --->>> duration of all parents import (sec): 552.62570405
crab: Import ok of dataset /RelValZMM/CMSSW_5_3_6-START53_V14-v2/GEN-SIM-RECO
crab: PrimaryDataset = RelValZMM
crab: ProcessedDataset = fanzago-FanzagoTutGrid-f30a6bb13f516198b2814e83414acca1
crab: <User Dataset Name> = /RelValZMM/fanzago-FanzagoTutGrid-f30a6bb13f516198b2814e83414acca1/USER
crab: --->>> End dataset publication
crab: --->>> Start files publication
crab: --->>> End files publication
crab: --->>> Check data publication: dataset /RelValZMM/fanzago-FanzagoTutGrid-f30a6bb13f516198b2814e83414acca1/USER in DBS url https://cmsdbsprod.cern.ch:8443/cms_dbs_ph_analysis_02_writer/servlet/DBSServlet
=== dataset /RelValZMM/fanzago-FanzagoTutGrid-f30a6bb13f516198b2814e83414acca1/USER
=== dataset description =
===== File block name: /RelValZMM/fanzago-FanzagoTutGrid-f30a6bb13f516198b2814e83414acca1/USER#787d164e-b485-4a23-b334-a8abde3fe146
File block located at: ['t2-srm-02.lnl.infn.it']
File block status: 0
Number of files: 10
Number of Bytes: 33667525
Number of Events: 50
total events: 50 in dataset: /RelValZMM/fanzago-FanzagoTutGrid-f30a6bb13f516198b2814e83414acca1/USER
Log file is /afs/cern.ch/user/f/fanzago/scratch0/TUTORIAL/crab_0_131014_123645/log/crab.log
Warning: some versions of CMSSW switch off the debug mode of CRAB, so a lot of duplicated information may be printed to the screen.
Analyze your published data
First note that:
- CRAB by default publishes all correctly finished files, including files with 0 events
- CRAB by default imports all the dataset parents of your dataset
You have to modify your crab.cfg file, specifying the datasetpath of your dataset and the dbs_url where the data are published (we will assume the phys03 instance of DBS3):
[CMSSW]
....
datasetpath = your_dataset_path
dbs_url = phys03
The creation output will be something similar to:
$ crab -create
crab: Version 2.9.1 running on Mon Oct 14 15:49:31 2013 CET (13:49:31 UTC)
crab. Working options:
scheduler remoteGlidein
job type CMSSW
server OFF
working directory /afs/cern.ch/user/f/fanzago/scratch0/TUTORIAL/crab_0_131014_154931/
crab: error detecting glite version
crab: error detecting glite version
crab: Contacting Data Discovery Services ...
crab: Accessing DBS at: https://cmsweb.cern.ch/dbs/prod/phys03/DBSReader
crab: Requested dataset: /RelValZMM/fanzago-FanzagoTutGrid-f30a6bb13f516198b2814e83414acca1/USER has 50 events in 1 blocks.
crab: SE black list applied to data location: ['srm-cms.cern.ch', 'srm-cms.gridpp.rl.ac.uk', 'T1_DE', 'T1_ES', 'T1_FR', 'T1_IT', 'T1_RU', 'T1_TW', 'cmsdca2.fnal.gov', 'T3_US_Vanderbilt_EC2']
crab: May not create the exact number_of_jobs requested.
crab: 10 job(s) can run on 50 events.
crab: List of jobs and available destination sites:
Block 1: jobs 1-10: sites: T2_IT_Legnaro
crab: Checking remote location
crab: WARNING: The stageout directory already exists. Be careful not to accidentally mix outputs from different tasks
crab: Creating 10 jobs, please wait...
crab: Total of 10 jobs created.
Log file is /afs/cern.ch/user/f/fanzago/scratch0/TUTORIAL/crab_0_131014_154931/log/crab.log
The jobs will run in the site where your USER data have been stored.
CRAB configuration file for real data with lumi mask
You can find more details on this at the corresponding link on the Crab FAQ page.
The CRAB configuration file (default name crab.cfg) should be located in the same directory as the CMSSW parameter-set to be used by CRAB.
The dataset used is: /SingleMu/Run2012B-13Jul2012-v1/AOD
For real data (crab_lumi.cfg)
[CMSSW]
lumis_per_job = 50
number_of_jobs = 10
pset = tutorial.py
datasetpath = /SingleMu/Run2012B-13Jul2012-v1/AOD
lumi_mask = Cert_190456-208686_8TeV_PromptReco_Collisions12_JSON.txt
output_file = outfile.root
[USER]
return_data = 0
copy_data = 1
publish_data = 1
publish_data_name = FanzagoTutGrid_data
[CRAB]
scheduler = remoteGlidein
jobtype = cmssw
where the lumi_mask file can be downloaded with
wget --no-check-certificate https://cms-service-dqm.web.cern.ch/cms-service-dqm/CAF/certification/Collisions12/8TeV/Prompt/Cert_190456-208686_8TeV_PromptReco_Collisions12_JSON.txt
For the tutorial we are using a subset of runs and luminosity sections (via a lumi mask JSON file). The lumi_mask file (Cert_190456-208686_8TeV_PromptReco_Collisions12_JSON.txt) contains:
{"190645": [[10, 110]], "190704": [[1, 3]], "190705": [[1, 5], [7, 76], [78, 336], [338, 350], [353, 384]],
...
"208551": [[119, 193], [195, 212], [215, 300], [303, 354], [356, 554], [557, 580]], "208686": [[73, 79], [82, 181], [183, 224], [227, 243], [246, 311], [313, 463]]}
Job Creation
Creating jobs for real data is analogous to Monte Carlo data. To avoid overwriting the previous runs of this tutorial, it is suggested to use a dedicated configuration file:
crab -create -cfg crab_lumi.cfg
which takes as configuration file the name specified with the -cfg option, in this case crab_lumi.cfg, the file associated with real data in this tutorial.
$ crab -create -cfg crab_lumi.cfg
crab: Version 2.9.1 running on Mon Oct 14 16:05:18 2013 CET (14:05:18 UTC)
crab. Working options:
scheduler remoteGlidein
job type CMSSW
server OFF
working directory /afs/cern.ch/user/f/fanzago/scratch0/TUTORIAL/crab_0_131014_160518/
crab: error detecting glite version
crab: error detecting glite version
crab: Contacting Data Discovery Services ...
crab: Accessing DBS at: http://cmsdbsprod.cern.ch/cms_dbs_prod_global/servlet/DBSServlet
crab: Requested (A)DS /SingleMu/Run2012B-13Jul2012-v1/AOD has 14 block(s).
crab: SE black list applied to data location: ['srm-cms.cern.ch', 'srm-cms.gridpp.rl.ac.uk', 'T1_DE', 'T1_ES', 'T1_FR', 'T1_IT', 'T1_RU', 'T1_TW', 'cmsdca2.fnal.gov', 'T3_US_Vanderbilt_EC2']
crab: Requested number of lumis reached.
crab: 9 jobs created to run on 500 lumis
crab: Checking remote location
crab: Creating 9 jobs, please wait...
crab: Total of 9 jobs created.
Log file is /afs/cern.ch/user/f/fanzago/scratch0/TUTORIAL/crab_0_131014_160518/log/crab.log
- The project directory called crab_0_131014_160518 is created.
- As explained, the number of created jobs may not match the number of jobs requested in the configuration file (9 created vs. 10 requested).
Job Submission
Job submission is always analogous:
$ crab -submit
crab: Version 2.9.1 running on Mon Oct 14 16:07:59 2013 CET (14:07:59 UTC)
crab. Working options:
scheduler remoteGlidein
job type CMSSW
server OFF
working directory /afs/cern.ch/user/f/fanzago/scratch0/TUTORIAL/crab_0_131014_160518/
crab: error detecting glite version
crab: error detecting glite version
crab: Checking available resources...
crab: Found compatible site(s) for job 1
crab: 1 blocks of jobs will be submitted
crab: remotehost from Avail.List = submit-4.t2.ucsd.edu
crab: contacting remote host submit-4.t2.ucsd.edu
crab: Establishing gsissh ControlPath. Wait 2 sec ...
crab: Establishing gsissh ControlPath. Wait 2 sec ...
crab: COPY FILES TO REMOTE HOST
crab: SUBMIT TO REMOTE GLIDEIN FRONTEND
Submitting 9 jobs
100% [====================================================================================================================================================]
please wait crab: Total of 9 jobs submitted.
Log file is /afs/cern.ch/user/f/fanzago/scratch0/TUTORIAL/crab_0_131014_160518/log/crab.log
Job Status Check
Check the status of the jobs in the latest CRAB project with the following command:
crab -status
to check a specific project:
crab -status -c <dir name>
which should produce a similar screen output like:
[fanzago@lxplus0445 SLC6]$ crab -status
crab: Version 2.9.1 running on Mon Oct 14 16:23:52 2013 CET (14:23:52 UTC)
crab. Working options:
scheduler remoteGlidein
job type CMSSW
server OFF
working directory /afs/cern.ch/user/f/fanzago/scratch0/TUTORIAL/crab_0_131014_160518/
crab: error detecting glite version
crab: error detecting glite version
crab: Checking the status of all jobs: please wait
crab: contacting remote host submit-4.t2.ucsd.edu
crab:
ID END STATUS ACTION ExeExitCode JobExitCode E_HOST
----- --- ----------------- ------------ ---------- ----------- ---------
1 N Running SubSuccess ce208.cern.ch
2 N Submitted SubSuccess
3 N Running SubSuccess cream03.lcg.cscs.ch
4 N Running SubSuccess t2-ce-01.lnl.infn.it
5 N Running SubSuccess cream01.lcg.cscs.ch
6 N Running SubSuccess cream01.lcg.cscs.ch
7 N Running SubSuccess ingrid.cism.ucl.ac.be
8 N Running SubSuccess ingrid.cism.ucl.ac.be
9 N Running SubSuccess ce203.cern.ch
crab: 9 Total Jobs
>>>>>>>>> 1 Jobs Submitted
List of jobs Submitted: 2
>>>>>>>>> 8 Jobs Running
List of jobs Running: 1,3-9
crab: You can also follow the status of this task on :
CMS Dashboard: http://dashb-cms-job-task.cern.ch/taskmon.html#task=fanzago_crab_0_131014_160518_582igd
Your task name is: fanzago_crab_0_131014_160518_582igd
Log file is /afs/cern.ch/user/f/fanzago/scratch0/TUTORIAL/crab_0_131014_160518/log/crab.log
and then ...
$ crab -status
crab: Version 2.9.1 running on Tue Oct 15 10:53:33 2013 CET (08:53:33 UTC)
crab. Working options:
scheduler remoteGlidein
job type CMSSW
server OFF
working directory /afs/cern.ch/user/f/fanzago/scratch0/TEST_RELEASE/TEST_PATC2/TEST_2_8_2/TUTORIAL/TUT_5_3_11/SLC6/crab_0_131014_160518/
crab: error detecting glite version
crab: error detecting glite version
crab: Checking the status of all jobs: please wait
crab: contacting remote host submit-4.t2.ucsd.edu
crab: Establishing gsissh ControlPath. Wait 2 sec ...
crab: Establishing gsissh ControlPath. Wait 2 sec ...
crab:
ID END STATUS ACTION ExeExitCode JobExitCode E_HOST
----- --- ----------------- ------------ ---------- ----------- ---------
1 N Done Terminated 0 0 ce208.cern.ch
2 N Done Terminated 0 60317 cream03.lcg.cscs.ch
3 N Done Terminated 0 60317 cream03.lcg.cscs.ch
4 N Done Terminated 0 0 t2-ce-01.lnl.infn.it
5 N Done Terminated 0 60317 cream01.lcg.cscs.ch
6 N Done Terminated 0 60317 cream01.lcg.cscs.ch
7 N Done Terminated 0 0 ingrid.cism.ucl.ac.be
8 N Done Terminated 0 0 ingrid.cism.ucl.ac.be
9 N Done Terminated 0 0 ce203.cern.ch
crab: ExitCodes Summary
>>>>>>>>> 4 Jobs with Wrapper Exit Code : 60317
List of jobs: 2-3,5-6
See https://twiki.cern.ch/twiki/bin/view/CMS/JobExitCodes for Exit Code meaning
crab: ExitCodes Summary
>>>>>>>>> 5 Jobs with Wrapper Exit Code : 0
List of jobs: 1,4,7-9
See https://twiki.cern.ch/twiki/bin/view/CMS/JobExitCodes for Exit Code meaning
crab: 9 Total Jobs
crab: You can also follow the status of this task on :
CMS Dashboard: http://dashb-cms-job-task.cern.ch/taskmon.html#task=fanzago_crab_0_131014_160518_582igd
Your task name is: fanzago_crab_0_131014_160518_582igd
Log file is /afs/cern.ch/user/f/fanzago/scratch0/TEST_RELEASE/TEST_PATC2/TEST_2_8_2/TUTORIAL/TUT_5_3_11/SLC6/crab_0_131014_160518/log/crab.log
Job Output Retrieval
For the jobs which are in the "Done" status it is possible to retrieve the log files of the jobs (just the log files, because the output files are copied to the Storage Element associated with the T2 specified in the crab.cfg; indeed return_data is 0).
The following command retrieves the log files of all "Done" jobs of the last created CRAB project:
crab -getoutput
or to get the output of a specific project:
crab -getoutput -c <dir name>
The job results will be copied into the res subdirectory of your CRAB project:
$ crab -get
crab: Version 2.9.1 running on Tue Oct 15 10:53:53 2013 CET (08:53:53 UTC)
crab. Working options:
scheduler remoteGlidein
job type CMSSW
server OFF
working directory /afs/cern.ch/user/f/fanzago/scratch0/TUTORIAL/crab_0_131014_160518/
crab: error detecting glite version
crab: error detecting glite version
crab: contacting remote host submit-4.t2.ucsd.edu
crab: Preparing to rsync 2 files
crab: Results of Jobs # 1 are in /afs/cern.ch/user/f/fanzago/scratch0/TEST_RELEASE/TEST_PATC2/TEST_2_8_2/TUTORIAL/TUT_5_3_11/SLC6/crab_0_131014_160518/res/
crab: contacting remote host submit-4.t2.ucsd.edu
crab: Preparing to rsync 16 files
crab: Results of Jobs # 2 are in /afs/cern.ch/user/f/fanzago/scratch0/TUTORIAL/crab_0_131014_160518/res/
crab: Results of Jobs # 3 are in /afs/cern.ch/user/f/fanzago/scratch0/TUTORIAL/crab_0_131014_160518/res/
crab: Results of Jobs # 4 are in /afs/cern.ch/user/f/fanzago/scratch0/TUTORIAL/crab_0_131014_160518/res/
crab: Results of Jobs # 5 are in /afs/cern.ch/user/f/fanzago/scratch0/TUTORIAL/crab_0_131014_160518/res/
crab: Results of Jobs # 6 are in /afs/cern.ch/user/f/fanzago/scratch0/TUTORIAL/crab_0_131014_160518/res/
crab: Results of Jobs # 7 are in /afs/cern.ch/user/f/fanzago/scratch0/TUTORIAL/crab_0_131014_160518/res/
crab: Results of Jobs # 8 are in /afs/cern.ch/user/f/fanzago/scratch0/TUTORIAL/crab_0_131014_160518/res/
crab: Results of Jobs # 9 are in /afs/cern.ch/user/f/fanzago/scratch0/TUTORIAL/crab_0_131014_160518/res/
Log file is /afs/cern.ch/user/f/fanzago/scratch0/TUTORIAL/crab_0_131014_160518/log/crab.log
Use the -report option
As for the MonteCarlo data example, it is possible to run the report command:
crab -report -c <dir name>
The report command returns information about correctly finished jobs, i.e. jobs with JobExitCode = 0 and ExeExitCode = 0:
$ crab -report
crab: Version 2.9.1 running on Tue Oct 15 15:55:10 2013 CET (13:55:10 UTC)
crab. Working options:
scheduler remoteGlidein
job type CMSSW
server OFF
working directory /afs/cern.ch/user/f/fanzago/scratch0/TUTORIAL/crab_0_131014_160518/
crab: error detecting glite version
crab: error detecting glite version
crab: --------------------
Dataset: /SingleMu/Run2012B-13Jul2012-v1/AOD
Remote output :
SE: T2_IT_Legnaro t2-srm-02.lnl.infn.it srmPath: srm://t2-srm-02.lnl.infn.it:8443/srm/managerv2?SFN=/pnfs/lnl.infn.it/data/cms/store/user/fanzago/SingleMu/FanzagoTutGrid_data/${PSETHASH}/
Total Events read: 264540
Total Files read: 21
Total Jobs : 9
Luminosity section summary file: /afs/cern.ch/user/f/fanzago/scratch0/TUTORIAL/crab_0_131014_160518/res/lumiSummary.json
# Jobs: Retrieved:9
----------------------------
crab: Summary file of input run and lumi to be analize with this task: /afs/cern.ch/user/f/fanzago/scratch0/TUTORIAL/crab_0_131014_160518/res/inputLumiSummaryOfTask.json
crab: to complete your analysis, you have to analyze the run and lumi reported in the //afs/cern.ch/user/f/fanzago/scratch0/TUTORIAL/crab_0_131014_160518/missingLumiSummary.json file
Log file is /afs/cern.ch/user/f/fanzago/scratch0/TUTORIAL/crab_0_131014_160518/log/crab.log
The contents of the files containing the luminosity information about the task are described below.
The original lumi mask file, specified in the crab.cfg file and used during the creation of your task:
$ cat Cert_190456-208686_8TeV_PromptReco_Collisions12_JSON.txt
{"190645": [[10, 110]], "190704": [[1, 3]], "190705": [[1, 5], [7, 65], [81, 336], .... "208686": [[73, 79], [82, 181], [183, 224], [227, 243], [246, 311], [313, 463]]}
The lumi sections that your created jobs have to analyze (this information is passed as arguments to your jobs):
$ cat crab_0_131014_160518/res/inputLumiSummaryOfTask.json
{"194305": [[84, 85]], "194108": [[95, 96], [117, 120], [123, 126], [149, 152], [154, 157], [160, 161], [166, 169], [172, 174], [176, 176], [185, 185], [187, 187], [190, 191], [196, 197], [200, 201], [206, 209], [211, 212], [216, 221], [231, 232], [234, 235], [238, 243], [249, 250], [277, 278], [305, 306], [333, 334], [438, 439], [520, 520], [527, 527]], "194120": [[13, 14], [22, 23], [32, 33], [43, 44], [57, 57], [67, 67], [73, 74], [88, 89], [105, 105], [110, 111], [139, 139], [144, 144], [266, 266]], "194224": [[94, 94], [111, 111], [257, 257], [273, 273], [324, 324]], "194896": [[35, 35], [68, 69]], "194424": [[63, 63], [92, 92], [121, 121], [123, 123], [168, 173], [176, 177], [184, 185], [187, 187], [199, 200], [202, 203], [207, 207], [213, 213], [220, 221], [256, 256], [557, 557], [559, 559], [562, 562], [564, 564], [599, 599], [602, 602], [607, 607], [609, 609], [639, 639], [648, 649], [656, 656], [658, 658], [660, 660]], "194631": [[222, 222]], "193998": [[66, 113], [115, 119], [124, 124], [126, 127], [132, 137], [139, 154], [158, 159], [168, 169], [172, 172], [174, 176], [180, 185], [191, 192], [195, 196], [233, 234], [247, 247]], "194027": [[93, 93], [109, 109], [113, 115]], "194778": [[127, 127], [130, 130]], "195947": [[27, 27], [36, 36]], "195099": [[77, 77], [106, 106]], "196200": [[66, 67]], "194711": [[1, 4], [11, 17], [19, 19], [25, 30], [33, 38], [46, 49], [54, 55], [62, 62], [64, 64], [70, 71], [82, 83], [90, 91], [98, 99], [102, 103], [106, 107], [112, 115], [123, 124], [129, 130], [140, 140], [142, 142], [614, 617]], "195552": [[256, 256], [263, 263]], "195013": [[133, 133], [144, 144]], "195868": [[16, 16], [20, 20]], "194912": [[130, 131]], "194699": [[38, 39], [253, 253], [256, 256]], "194050": [[353, 354], [1881, 1881]], "194075": [[82, 82], [101, 101], [103, 103]], "194076": [[3, 6], [9, 9], [16, 17], [20, 21], [29, 30], [33, 34], [46, 47], [58, 59], [84, 87], [93, 94], [100, 101], [106, 107], [130, 131], [143, 143], [154, 155], [228, 
228], [239, 240], [246, 246], [268, 269], [284, 285], [376, 377], [396, 397], [490, 491], [718, 719]], "195970": [[77, 77], [79, 79]], "195919": [[5, 6]], "194644": [[8, 9], [19, 20], [34, 35], [58, 59], [78, 79], [100, 100], [106, 106], [128, 129]], "196250": [[73, 74]], "195164": [[62, 62], [64, 64]], "194199": [[114, 115], [124, 125], [148, 148], [156, 157], [159, 159], [207, 208], [395, 395], [401, 402]], "194480": [[621, 622], [630, 631], [663, 664], [715, 716], [996, 997], [1000, 1001], [1010, 1011], [1020, 1021], [1186, 1187], [1190, 1193]], "196531": [[284, 284], [289, 289]], "195774": [[150, 150], [159, 159]], "196027": [[150, 151]], "193834": [[1, 35]], "193835": [[1, 20], [22, 26]], "193836": [[1, 2]]}
The lumi sections actually analyzed by your correctly terminated jobs:
$ cat crab_0_131014_160518/res/lumiSummary.json
{"195947": [[27, 27], [36, 36]], "194108": [[95, 96], [119, 120], [123, 126], [154, 157], [160, 161], [166, 167], [172, 174], [176, 176], [185, 185], [187, 187], [196, 197], [211, 212], [231, 232], [238, 241], [249, 250], [277, 278], [305, 306], [333, 334], [438, 439], [520, 520], [527, 527]], "193998": [[66, 66], [69, 70], [87, 88], [90, 100], [103, 105], [108, 109], [112, 113], [115, 119], [124, 124], [126, 126], [132, 135], [139, 140], [142, 142], [144, 154], [158, 159], [168, 169], [172, 172], [174, 176], [180, 185], [191, 192], [195, 196], [233, 234]], "194224": [[94, 94], [111, 111], [257, 257]], "194424": [[63, 63], [92, 92], [121, 121], [123, 123], [168, 173], [176, 177], [184, 185], [187, 187], [207, 207], [213, 213], [220, 221], [256, 256], [599, 599], [602, 602], [607, 607], [609, 609], [639, 639], [656, 656]], "194631": [[222, 222]], "196250": [[73, 74]], "194027": [[93, 93], [109, 109], [113, 115]], "194778": [[127, 127], [130, 130]], "195099": [[77, 77], [106, 106]], "194711": [[140, 140], [142, 142]], "195552": [[256, 256], [263, 263]], "195868": [[16, 16], [20, 20]], "194912": [[130, 131]], "194699": [[253, 253], [256, 256]], "195970": [[77, 77], [79, 79]], "194076": [[3, 6], [29, 30], [33, 34], [58, 59], [84, 87], [93, 94], [106, 107], [130, 131], [154, 155], [228, 228], [239, 240], [246, 246], [268, 269], [284, 285], [718, 719]], "194050": [[353, 354], [1881, 1881]], "195919": [[5, 6]], "194644": [[34, 35], [78, 79]], "195164": [[62, 62], [64, 64]], "194199": [[114, 115], [124, 125], [148, 148], [156, 157], [159, 159], [207, 208]], "196531": [[284, 284], [289, 289]], "196027": [[150, 151]], "193834": [[1, 24], [27, 30], [33, 34]], "193835": [[19, 20], [22, 23], [26, 26]], "193836": [[1, 2]]}
And the missing lumis (the difference between the original lumi mask and lumiSummary.json), which you can analyze by creating a new task that uses this file as the new lumi mask:
$ cat crab_0_131014_160518/res/missingLumiSummary.json
{"190645": [[10, 110]],
"190704": [[1, 3]],
"190705": [[1, 5], [7, 65], [81, 336], [338, 350], [353, 383]],
"190738": [[1, 130], [133, 226], [229, 355]],
.....
"208541": [[1, 57], [59, 173], [175, 376], [378, 417]],
"208551": [[119, 193], [195, 212], [215, 300], [303, 354], [356, 554], [557, 580]],
"208686": [[73, 79], [82, 181], [183, 224], [227, 243], [246, 311], [313, 463]]}
To create a task that analyzes the missing lumis of the original lumi mask, use the missingLumiSummary.json file as the new lumi_mask file in your crab.cfg.
As before, you can choose the splitting you want; by using the same publish_data_name, the new outputs will be published in the same dataset as the previous task:
[CMSSW]
lumis_per_job = 50
number_of_jobs = 4
pset = tutorial.py
datasetpath = /SingleMu/Run2012B-13Jul2012-v1/AOD
lumi_mask = crab_0_131014_160518/res/missingLumiSummary.json
output_file = outfile.root
[USER]
return_data = 0
copy_data = 1
publish_data =1
storage_element = T2_xx_yyyy
publish_data_name = FanzagoTutGrid_data
[CRAB]
scheduler = remoteGlidein
jobtype = cmssw
$ crab -create -cfg crab_missing.cfg
crab: Version 2.9.1 running on Tue Oct 15 17:10:16 2013 CET (15:10:16 UTC)
crab. Working options:
scheduler remoteGlidein
job type CMSSW
server OFF
working directory /afs/cern.ch/user/f/fanzago/scratch0/TEST_RELEASE/TEST_PATC2/TEST_2_8_2/TUTORIAL/TUT_5_3_11/SLC6/crab_0_131015_171016/
crab: error detecting glite version
crab: error detecting glite version
crab: Contacting Data Discovery Services ...
crab: Accessing DBS at: http://cmsdbsprod.cern.ch/cms_dbs_prod_global/servlet/DBSServlet
crab: Requested (A)DS /SingleMu/Run2012B-13Jul2012-v1/AOD has 14 block(s).
crab: SE black list applied to data location: ['srm-cms.cern.ch', 'srm-cms.gridpp.rl.ac.uk', 'T1_DE', 'T1_ES', 'T1_FR', 'T1_IT', 'T1_RU', 'T1_TW', 'cmsdca2.fnal.gov', 'T3_US_Vanderbilt_EC2']
crab: Requested number of jobs reached.
crab: 4 jobs created to run on 200 lumis
crab: Checking remote location
crab: WARNING: The stageout directory already exists. Be careful not to accidentally mix outputs from different tasks
crab: Creating 4 jobs, please wait...
crab: Total of 4 jobs created.
Log file is /afs/cern.ch/user/f/fanzago/scratch0/TEST_RELEASE/TEST_PATC2/TEST_2_8_2/TUTORIAL/TUT_5_3_11/SLC6/crab_0_131015_171016/log/crab.log
and submit them as usual. The created jobs will analyze part of the missing lumis of the original lumiMask.json file.
- If you set total_number_of_lumis = -1 instead of lumis_per_job or number_of_jobs, the new task will analyze all the missing lumis.
Run Crab retrieving your output (without copying to a Storage Element)
You can also run your analysis code without interacting with a remote Storage Element, retrieving the outputs to your workspace area (under the res dir of the project).
Below is an example of a CRAB configuration file consistent with this tutorial:
[CMSSW]
total_number_of_events = 100
number_of_jobs = 10
pset = tutorial.py
datasetpath = /RelValZMM/CMSSW_5_3_6-START53_V14-v2/GEN-SIM-RECO
output_file = outfile.root
[USER]
return_data = 1
[CRAB]
scheduler = remoteGlidein
jobtype = cmssw
With this crab.cfg in place you can re-do the workflow as described before (apart from the publication step):
- creation
- submission
- status progress monitoring
- output retrieval (in this step you will retrieve directly the actual output produced by your pset file)
Where to find more on CRAB
Note also that all CMS members using the Grid must subscribe to the Grid Announcements CMS.HyperNews forum.
Review status
Complete Review, Minor Changes. Page gives a good idea of how to do a physics analysis using CRAB.
Responsible: FedericaFanzago