If dest_se is used, CRAB finds the correct path where the output can be stored. The command to execute in order to retrieve the remote output files locally, to your user interface, is:
crab -copyData ## or crab -copyData -c <dir name>
An example of execution:
$ crab -copyData
crab: Version 2.8.5 running on Thu Feb 21 02:49:18 2013 CET (01:49:18 UTC)
crab. Working options:
scheduler remoteGlidein
job type CMSSW
server OFF
working directory /afs/cern.ch/user/f/fanzago/scratch0/TEST/crab_0_130220_173930/
crab: Copy file locally.
Output dir: /afs/cern.ch/user/f/fanzago/scratch0/TEST/crab_0_130220_173930/res/
crab: Starting copy...
directory /afs/cern.ch/user/f/fanzago/scratch0/TEST/crab_0_130220_173930/res/ already exists
crab: Copy success for file: outfile_1_1_aOu.root
crab: Copy failed for file: outfile_4_1_Pi9.root
Copy failed because : Problem copying outfile_4_1_Pi9.root file'Permission denied!'
crab: Copy success for file: outfile_2_1_bC1.root
crab: Copy success for file: outfile_5_1_yna.root
crab: Copy success for file: outfile_3_1_96A.root
Log file is /afs/cern.ch/user/f/fanzago/scratch0/TEST/crab_0_130220_173930/log/crab.log
Publish your result in DBS
Publishing the produced data to a DBS instance allows you (and other users) to re-run over the data that has been published. The instructions to follow are below; a dedicated how-to page covers the procedure in more detail.
You have to add some more information to the CRAB configuration file: the name under which the data will be published and the URL of the DBS instance where the output results are to be registered.
[USER]
....
publish_data = 1
publish_data_name = what_you_want
dbs_url_for_publication = url_local_dbs
....
Warning:
- all the parameters related to publication have to be added to the configuration file before job creation, even though the publication step is executed after the job outputs have been retrieved.
- for this tutorial we will publish the data to the test DBS instance https://cmsdbsprod.cern.ch:8443/cms_dbs_ph_analysis_02_writer/servlet/DBSServlet. This instance is only for publication tests, so published data are not guaranteed to be kept for long, and publication here does not require write authorization. If you belong to a PAG group, you have to publish your data to the DBS instance associated with your group; check the DBS access twiki page for the correct DBS URL and the VOMS role you need in order to be an allowed user.
- remember to change the ui_working_dir value in the configuration file to create a new project (if you don't use the default CRAB project name), otherwise the creation step will fail with the error message "project already exists, please remove it before creating a new task".
Run Crab publishing your results
You can also run your analysis code and publish the results copied to a remote Storage Element.
Below is an example of a CRAB configuration file consistent with this tutorial:
For MC data (crab.cfg)
[CMSSW]
total_number_of_events = 50
number_of_jobs = 10
pset = tutorial.py
datasetpath = /RelValProdTTbar/JobRobot-MC_3XY_V24_JobRobot-v1/GEN-SIM-DIGI-RECO
output_file = outfile.root
[USER]
return_data = 0
copy_data = 1
storage_element = T2_IT_Legnaro
publish_data = 1
publish_data_name = FedeTutGrid
dbs_url_for_publication = https://cmsdbsprod.cern.ch:8443/cms_dbs_ph_analysis_02_writer/servlet/DBSServlet
[CRAB]
scheduler = remoteGlidein
jobtype = cmssw
With this crab.cfg you can re-do the complete workflow as described before, plus the publication step:
- creation
- submission
- status progress monitoring
- output retrieval
- publish the results
Use the -publish option
After having run the previous workflow up to the retrieval of your jobs, you can publish the output data stored in the Storage Element indicated in the crab.cfg using
crab -publish
or
crab -publish -c <dir name>
to publish outputs of a specific project.
It is not necessary that all jobs are done and retrieved; you can publish your output at different times.
The publish command looks for all the FrameworkJobReports ( crab-project-dir/res/crab_fjr_*.xml ) produced by each job
and extracts from them the information to publish (i.e. number of events, LFN, ...).
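As an illustration of this step (not the actual CRAB code), the following minimal Python sketch globs the job reports and pulls out per-file information; the element names "File", "LFN" and "TotalEvents" are assumptions about the FJR XML layout and may differ between CMSSW versions:

import glob
import xml.etree.ElementTree as ET

# Illustrative sketch only: mimic how publication information could be
# extracted from the FrameworkJobReport files of a CRAB project.
# The element names "File", "LFN" and "TotalEvents" are assumptions about
# the FJR XML layout, not taken from the actual CRAB implementation.
def summarize_fjrs(project_dir):
    for fjr in sorted(glob.glob(project_dir + "/res/crab_fjr_*.xml")):
        root = ET.parse(fjr).getroot()
        for f in root.iter("File"):
            lfn = f.findtext("LFN", default="?")
            events = f.findtext("TotalEvents", default="?")
            print("%s: LFN=%s, events=%s" % (fjr, lfn, events))

summarize_fjrs("crab_0_130221_030014")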
Publication output example
$ crab -publish -c crab_0_130221_030014/
crab: Version 2.8.5 running on Tue Mar 5 12:04:57 2013 CET (11:04:57 UTC)
crab. Working options:
scheduler remoteGlidein
job type CMSSW
server OFF
working directory /afs/cern.ch/user/f/fanzago/scratch0/TEST/crab_0_130221_030014/
crab: <dbs_url_for_publication> = https://cmsdbsprod.cern.ch:8443/cms_dbs_ph_analysis_02_writer/servlet/DBSServlet
file_list = ['/afs/cern.ch/user/f/fanzago/scratch0/TEST/crab_0_130221_030014//res//crab_fjr_1.xml', '/afs/cern.ch/user/f/fanzago/scratch0/TEST/crab_0_130221_030014//res//crab_fjr_2.xml', '/afs/cern.ch/user/f/fanzago/scratch0/TEST/crab_0_130221_030014//res//crab_fjr_3.xml', '/afs/cern.ch/user/f/fanzago/scratch0/TEST/crab_0_130221_030014//res//crab_fjr_4.xml', '/afs/cern.ch/user/f/fanzago/scratch0/TEST/crab_0_130221_030014//res//crab_fjr_5.xml']
crab: --->>> Start dataset publication
crab: --->>> Importing parent dataset in the dbs: /RelValProdTTbar/JobRobot-MC_3XY_V24_JobRobot-v1/GEN-SIM-DIGI-RECO
crab: --->>> Importing all parents level
Block /RelValProdTTbar/JobRobot-MC_3XY_V24_JobRobot-v1/GEN-SIM-RAW#c9d3a01e-a3a1-4fde-8104-1c7b024b5ef6 is already at destination
Block /RelValProdTTbar/JobRobot-MC_3XY_V24_JobRobot-v1/GEN-SIM-DIGI-RECO#8f881129-b4fd-4d88-902a-f7ca78a9da8f is already at destination
crab: --->>> duration of all parents import (sec): 3.43028283119
crab: Import ok of dataset /RelValProdTTbar/JobRobot-MC_3XY_V24_JobRobot-v1/GEN-SIM-DIGI-RECO
crab: PrimaryDataset = RelValProdTTbar
crab: ProcessedDataset = fanzago-FedeTutGrid-c8295e0370df515614ca6812ce2cfe77
crab: <User Dataset Name> = /RelValProdTTbar/fanzago-FedeTutGrid-c8295e0370df515614ca6812ce2cfe77/USER
debug_verbose:crab::Primary: {'Type': 'mc', 'Name': 'RelValProdTTbar'}
primary = {'Type': 'mc', 'Name': 'RelValProdTTbar'}
...
crab: --->>> End dataset publication
INFO:crab::--->>> End dataset publication
crab: --->>> Start files publication
INFO:crab::--->>> Start files publication
DEBUG:crab::FJR = /afs/cern.ch/user/f/fanzago/scratch0/TEST/crab_0_130221_030014//res//crab_fjr_1.xml
DEBUG:crab::--->>> LFN of file to publish = /store/user/fanzago/RelValProdTTbar/FedeTutGrid/c8295e0370df515614ca6812ce2cfe77/outfile_1_1_haS.root
DEBUG:crab::--->>> Inserting file in blocks = ['/RelValProdTTbar/fanzago-FedeTutGrid-c8295e0370df515614ca6812ce2cfe77/USER#c5c0a5bc-aa35-4dcb-ade4-52211e5e8332']
DEBUG:crab::FJR = /afs/cern.ch/user/f/fanzago/scratch0/TEST/crab_0_130221_030014//res//crab_fjr_2.xml
DEBUG:crab::--->>> LFN of file to publish = /store/user/fanzago/RelValProdTTbar/FedeTutGrid/c8295e0370df515614ca6812ce2cfe77/outfile_2_1_Nw2.root
...
crab: --->>> End files publication
INFO:crab::--->>> End files publication
crab: --->>> Check data publication: dataset /RelValProdTTbar/fanzago-FedeTutGrid-c8295e0370df515614ca6812ce2cfe77/USER in DBS url https://cmsdbsprod.cern.ch:8443/cms_dbs_ph_analysis_02_writer/servlet/DBSServlet
INFO:crab::--->>> Check data publication: dataset /RelValProdTTbar/fanzago-FedeTutGrid-c8295e0370df515614ca6812ce2cfe77/USER in DBS url https://cmsdbsprod.cern.ch:8443/cms_dbs_ph_analysis_02_writer/servlet/DBSServlet
=== dataset /RelValProdTTbar/fanzago-FedeTutGrid-c8295e0370df515614ca6812ce2cfe77/USER
=== dataset description =
===== File block name: /RelValProdTTbar/fanzago-FedeTutGrid-c8295e0370df515614ca6812ce2cfe77/USER#c5c0a5bc-aa35-4dcb-ade4-52211e5e8332
File block located at: ['t2-srm-02.lnl.infn.it']
File block status: 0
Number of files: 5
Number of Bytes: 3279142
Number of Events: 50
total events: 50 in dataset: /RelValProdTTbar/fanzago-FedeTutGrid-c8295e0370df515614ca6812ce2cfe77/USER
crab: You can obtain more info about files of the dataset using: crab -checkPublication -USER.dataset_to_check=/RelValProdTTbar/fanzago-FedeTutGrid-c8295e0370df515614ca6812ce2cfe77/USER -USER.dbs_url_for_publication=https://cmsdbsprod.cern.ch:8443/cms_dbs_ph_analysis_02_writer/servlet/DBSServlet -debug
INFO:crab::You can obtain more info about files of the dataset using: crab -checkPublication -USER.dataset_to_check=/RelValProdTTbar/fanzago-FedeTutGrid-c8295e0370df515614ca6812ce2cfe77/USER -USER.dbs_url_for_publication=https://cmsdbsprod.cern.ch:8443/cms_dbs_ph_analysis_02_writer/servlet/DBSServlet -debug
Log file is /afs/cern.ch/user/f/fanzago/scratch0/TEST/crab_0_130221_030014//log/crab.log
Warning: some versions of CMSSW switch off the debug mode of CRAB, so a lot of duplicated information can appear on screen.
Check the result of data publication and analyze your published data
Note that:
- by default CRAB publishes all files that finished correctly, including files with 0 events
- by default CRAB imports all the dataset parents of your dataset
To check if your data have been published you can use:
crab -checkPublication -USER.dataset_to_check=your_dataset_path -USER.dbs_url_for_publication=url_local_dbs -debug
where dbs_url_for_publication is the dbs_url you wrote in the crab.cfg file and dataset_to_check is the dataset path published by CRAB, of the form primarydataset/publish_data_name/USER (it is also printed by CRAB on the "User Dataset Name" line when you run the crab -publish command).
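Schematically, the pieces of that dataset path combine as in the following sketch (illustrative only; the hash is computed by CRAB from the parameter set, it is not chosen by the user):

# Illustrative only: how the published dataset path is assembled.
# The hash below is the value printed by CRAB in this tutorial; in general
# CRAB derives it from the parameter set.
primary_dataset = "RelValProdTTbar"             # from the input dataset
username = "fanzago"                            # grid user name
publish_data_name = "FedeTutGrid"               # from crab.cfg
pset_hash = "c8295e0370df515614ca6812ce2cfe77"  # computed by CRAB

processed_dataset = "%s-%s-%s" % (username, publish_data_name, pset_hash)
print("/%s/%s/USER" % (primary_dataset, processed_dataset))
# -> /RelValProdTTbar/fanzago-FedeTutGrid-c8295e0370df515614ca6812ce2cfe77/USER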
The output of the check is:
$ crab -checkPublication -USER.dataset_to_check=/RelValProdTTbar/fanzago-FedeTutGrid-c8295e0370df515614ca6812ce2cfe77/USER -USER.dbs_url_for_publication=https://cmsdbsprod.cern.ch:8443/cms_dbs_ph_analysis_02_writer/servlet/DBSServlet -debug
crab: /afs/cern.ch/cms/ccs/wm/scripts/Crab/CRAB_2_8_5_patch1/python/crab.py -checkPublication -USER.dataset_to_check=/RelValProdTTbar/fanzago-FedeTutGrid-c8295e0370df515614ca6812ce2cfe77/USER -USER.dbs_url_for_publication=https://cmsdbsprod.cern.ch:8443/cms_dbs_ph_analysis_02_writer/servlet/DBSServlet -debug
crab: Version 2.8.5 running on Tue Mar 5 12:11:37 2013 CET (11:11:37 UTC)
crab. Working options:
scheduler remoteGlidein
job type CMSSW
server OFF
working directory /afs/cern.ch/user/f/fanzago/scratch0/TEST/crab_0_130304_120142/
crab: Downloading file [http://cmsdoc.cern.ch/cms/LCG/crab/config/] to [/afs/cern.ch/user/f/fanzago/.cms_crab/allowed_releases.conf].
crab: Service initialised ({'endpoint': 'http://cmsdoc.cern.ch/cms/LCG/crab/config/', 'maxcachereuse': 24.0, 'cachepath': '/afs/cern.ch/user/f/fanzago/.cms_crab', 'basepath': '/cms/LCG/crab/config/', 'method': None, 'timeout': 20, 'requests': {'endpoint': 'http://cmsdoc.cern.ch/cms/LCG/crab/config/', 'timeout': 20, 'cachepath': '/afs/cern.ch/user/f/fanzago/.cms_crab', 'cacheduration': 0.5, 'host': 'cmsdoc.cern.ch', 'accept_type': 'text/html', 'content_type': 'application/x-www-form-urlencoded', 'logger': <logging.Logger object at 0xf928f10>, 'type': 'txt/csv', 'conn': <httplib.HTTPConnection instance at 0xf9334d0>}, 'logger': <logging.Logger object at 0xf928f10>, 'cacheduration': 0.5, 'type': 'txt/csv', 'inputdata': {}}):
host: cmsdoc.cern.ch, basepath: /cms/LCG/crab/config/ (text/html)
cache: /afs/cern.ch/user/f/fanzago/.cms_crab (duration 0.5 hours, max reuse 24.0 hours)
crab: Service initialised ({'endpoint': 'https://cmsweb.cern.ch/sitedb/json/index/', 'maxcachereuse': 24.0, 'cachepath': '/afs/cern.ch/user/f/fanzago/.cms_sitedbcache', 'basepath': '/sitedb/json/index/', 'method': None, 'timeout': 30, 'requests': {'host': 'cmsweb.cern.ch', 'endpoint': 'https://cmsweb.cern.ch/sitedb/json/index/', 'accept_type': 'text/html', 'content_type': 'application/x-www-form-urlencoded', 'logger': <logging.Logger object at 0xf928f10>, 'cachepath': '/afs/cern.ch/user/f/fanzago/.cms_sitedbcache', 'conn': <httplib.HTTPSConnection instance at 0xfa903b0>}, 'logger': <logging.Logger object at 0xf928f10>, 'cacheduration': 0.5, 'inputdata': {}}):
host: cmsweb.cern.ch, basepath: /sitedb/json/index/ (text/html)
cache: /afs/cern.ch/user/f/fanzago/.cms_sitedbcache (duration 0.5 hours, max reuse 24.0 hours)
crab: Service initialised ({'endpoint': 'https://cmsweb.cern.ch/sitedb/json/index/', 'maxcachereuse': 24.0, 'cachepath': '/afs/cern.ch/user/f/fanzago/.cms_sitedbcache', 'basepath': '/sitedb/json/index/', 'method': None, 'timeout': 30, 'requests': {'host': 'cmsweb.cern.ch', 'endpoint': 'https://cmsweb.cern.ch/sitedb/json/index/', 'accept_type': 'text/html', 'content_type': 'application/x-www-form-urlencoded', 'logger': <logging.Logger object at 0xf928f10>, 'cachepath': '/afs/cern.ch/user/f/fanzago/.cms_sitedbcache', 'conn': <httplib.HTTPSConnection instance at 0xfa90440>}, 'logger': <logging.Logger object at 0xf928f10>, 'cacheduration': 0.5, 'inputdata': {}}):
host: cmsweb.cern.ch, basepath: /sitedb/json/index/ (text/html)
cache: /afs/cern.ch/user/f/fanzago/.cms_sitedbcache (duration 0.5 hours, max reuse 24.0 hours)
crab: Input whitelist:
crab: Input blacklist:
crab: Converted whitelist:
crab: Converted blacklist:
crab: Downloading file [http://cmsdoc.cern.ch/cms/LCG/crab/config/] to [/afs/cern.ch/user/f/fanzago/.cms_crab/myproxy_server.conf].
crab: Service initialised ({'endpoint': 'http://cmsdoc.cern.ch/cms/LCG/crab/config/', 'maxcachereuse': 24.0, 'cachepath': '/afs/cern.ch/user/f/fanzago/.cms_crab', 'basepath': '/cms/LCG/crab/config/', 'method': None, 'timeout': 20, 'requests': {'endpoint': 'http://cmsdoc.cern.ch/cms/LCG/crab/config/', 'timeout': 20, 'cachepath': '/afs/cern.ch/user/f/fanzago/.cms_crab', 'cacheduration': 0.5, 'host': 'cmsdoc.cern.ch', 'accept_type': 'text/html', 'content_type': 'application/x-www-form-urlencoded', 'logger': <logging.Logger object at 0xf928f10>, 'type': 'txt/csv', 'conn': <httplib.HTTPConnection instance at 0xfa904d0>}, 'logger': <logging.Logger object at 0xf928f10>, 'cacheduration': 0.5, 'type': 'txt/csv', 'inputdata': {}}):
host: cmsdoc.cern.ch, basepath: /cms/LCG/crab/config/ (text/html)
cache: /afs/cern.ch/user/f/fanzago/.cms_crab (duration 0.5 hours, max reuse 24.0 hours)
crab: Downloading file [http://cmsdoc.cern.ch/cms/LCG/crab/config/] to [/afs/cern.ch/user/f/fanzago/.cms_crab/site_black_list.conf].
crab: Service initialised ({'endpoint': 'http://cmsdoc.cern.ch/cms/LCG/crab/config/', 'maxcachereuse': 24.0, 'cachepath': '/afs/cern.ch/user/f/fanzago/.cms_crab', 'basepath': '/cms/LCG/crab/config/', 'method': None, 'timeout': 20, 'requests': {'endpoint': 'http://cmsdoc.cern.ch/cms/LCG/crab/config/', 'timeout': 20, 'cachepath': '/afs/cern.ch/user/f/fanzago/.cms_crab', 'cacheduration': 0.5, 'host': 'cmsdoc.cern.ch', 'accept_type': 'text/html', 'content_type': 'application/x-www-form-urlencoded', 'logger': <logging.Logger object at 0xf928f10>, 'type': 'txt/csv', 'conn': <httplib.HTTPConnection instance at 0xfa904d0>}, 'logger': <logging.Logger object at 0xf928f10>, 'cacheduration': 0.5, 'type': 'txt/csv', 'inputdata': {}}):
host: cmsdoc.cern.ch, basepath: /cms/LCG/crab/config/ (text/html)
cache: /afs/cern.ch/user/f/fanzago/.cms_crab (duration 0.5 hours, max reuse 24.0 hours)
crab: Enforced black list: <Downloader.Downloader instance at 0xfa90440>
crab: --->>> Check data publication: dataset /RelValProdTTbar/fanzago-FedeTutGrid-c8295e0370df515614ca6812ce2cfe77/USER in DBS url https://cmsdbsprod.cern.ch:8443/cms_dbs_ph_analysis_02_writer/servlet/DBSServlet
PrimaryDataset = RelValProdTTbar
ProcessedDataset = fanzago-FedeTutGrid-c8295e0370df515614ca6812ce2cfe77
DataTier = USER
datasets matching your requirements= [{'RunsList': [], 'Name': 'fanzago-FedeTutGrid-c8295e0370df515614ca6812ce2cfe77', 'PathList': ['/RelValProdTTbar/fanzago-FedeTutGrid-c8295e0370df515614ca6812ce2cfe77/USER'], 'LastModifiedBy': '/DC=ch/DC=cern/OU=Organic Units/OU=Users/CN=fanzago/CN=610896/CN=Federica Fanzago', 'AlgoList': [{'ExecutableName': 'cmsRun', 'ApplicationVersion': 'CMSSW_5_3_8', 'ParameterSetID': {'Hash': 'c8295e0370df515614ca6812ce2cfe77'}, 'ApplicationFamily': 'cmsRun'}], 'XtCrossSection': 0.0, 'Status': 'VALID', 'ParentList': [], 'AcquisitionEra': '', 'PhysicsGroup': 'NoGroup', 'Description': '', 'GlobalTag': '', 'PrimaryDataset': {'Name': 'RelValProdTTbar'}, 'TierList': ['USER'], 'CreatedBy': '/DC=ch/DC=cern/OU=Organic Units/OU=Users/CN=fanzago/CN=610896/CN=Federica Fanzago', 'PhysicsGroupConverner': 'NO_CONVENOR', 'CreationDate': '1362481519', 'LastModificationDate': '1362481520'}]
=== dataset /RelValProdTTbar/fanzago-FedeTutGrid-c8295e0370df515614ca6812ce2cfe77/USER
=== dataset description =
===== File block name: /RelValProdTTbar/fanzago-FedeTutGrid-c8295e0370df515614ca6812ce2cfe77/USER#c5c0a5bc-aa35-4dcb-ade4-52211e5e8332
File block located at: ['t2-srm-02.lnl.infn.it']
File block status: 0
Number of files: 5
Number of Bytes: 3279142
Number of Events: 50
--------- info about files --------
Size Events LFN FileStatus
666747 10 /store/user/fanzago/RelValProdTTbar/FedeTutGrid/c8295e0370df515614ca6812ce2cfe77/outfile_1_1_haS.root
635831 10 /store/user/fanzago/RelValProdTTbar/FedeTutGrid/c8295e0370df515614ca6812ce2cfe77/outfile_2_1_Nw2.root
648594 10 /store/user/fanzago/RelValProdTTbar/FedeTutGrid/c8295e0370df515614ca6812ce2cfe77/outfile_4_2_VKk.root
682364 10 /store/user/fanzago/RelValProdTTbar/FedeTutGrid/c8295e0370df515614ca6812ce2cfe77/outfile_5_1_bi0.root
645606 10 /store/user/fanzago/RelValProdTTbar/FedeTutGrid/c8295e0370df515614ca6812ce2cfe77/outfile_3_1_rWE.root
total events: 50 in dataset: /RelValProdTTbar/fanzago-FedeTutGrid-c8295e0370df515614ca6812ce2cfe77/USER
Log file is /afs/cern.ch/user/f/fanzago/scratch0/TEST/crab_0_130304_120142/log/crab.log
If you want to analyze your published data, you have to modify your crab.cfg, specifying as datasetpath the name of your dataset and as dbs_url the DBS instance where the data are published:
[CMSSW]
....
datasetpath = your_dataset_path
dbs_url = url_local_dbs
If the data of your interest are in the DBS instance https://cmsdbsprod.cern.ch:8443/cms_dbs_ph_analysis_02_writer/servlet/DBSServlet, you can specify
dbs_url = https://cmsdbsprod.cern.ch:8443/cms_dbs_ph_analysis_02_writer/servlet/DBSServlet
The creation output will be something similar to:
$ crab -create
crab: Version 2.8.5 running on Tue Mar 5 12:19:06 2013 CET (11:19:06 UTC)
crab. Working options:
scheduler remoteGlidein
job type CMSSW
server OFF
working directory /afs/cern.ch/user/f/fanzago/scratch0/TEST/crab_0_130305_121906/
verify if user DN is mapped in CERN's SSO
OK. user ready for SiteDB switchover on March 12, 2013
crab: Contacting Data Discovery Services ...
crab: Accessing DBS at: https://cmsdbsprod.cern.ch:8443/cms_dbs_ph_analysis_02_writer/servlet/DBSServlet
crab: Requested dataset: /RelValProdTTbar/fanzago-FedeTutGrid-c8295e0370df515614ca6812ce2cfe77/USER has 50 events in 1 blocks.
crab: May not create the exact number_of_jobs requested.
crab: 5 job(s) can run on 50 events.
crab: List of jobs and available destination sites:
Block 1: jobs 1-5: sites: T2_IT_Legnaro
crab: Creating 5 jobs, please wait...
crab: Total of 5 jobs created.
Run CRAB on real data copying the output to an SE
Running CRAB on real data is not very different from running CRAB on Monte Carlo data. The main difference is in the preparation of the configuration for the CRAB workflow, as shown in the next section.
CRAB configuration file for real data with lumi mask
You can find more details on this at the corresponding link on the Crab FAQ page.
The CRAB configuration file (default name crab.cfg) should be located at the same location as the CMSSW parameter-set to be used by CRAB.
The dataset used is: /SingleMu/Run2012B-13Jul2012-v1/AOD
For real data (crab_lumi.cfg)
[CMSSW]
lumis_per_job = 50
number_of_jobs = 10
pset = tutorial.py
datasetpath = /SingleMu/Run2012B-13Jul2012-v1/AOD
lumi_mask = Cert_190456-195947_8TeV_PromptReco_Collisions12_JSON_v2.txt
output_file = outfile.root
[USER]
return_data = 0
copy_data = 1
publish_data = 1
storage_element = T2_IT_Legnaro
publish_data_name = FedeTutGridGlide_data
dbs_url_for_publication = https://cmsdbsprod.cern.ch:8443/cms_dbs_ph_analysis_02_writer/servlet/DBSServlet
[CRAB]
scheduler = remoteGlidein
jobtype = cmssw
where the lumi_mask file can be downloaded with
wget --no-check-certificate https://cms-service-dqm.web.cern.ch/cms-service-dqm/CAF/certification/Collisions12/8TeV/Prompt/Cert_190456-195947_8TeV_PromptReco_Collisions12_JSON_v2.txt
For the tutorial we are using a subset of runs and lumis (via a lumiMask JSON file). The lumi_mask file (Cert_190456-195947_8TeV_PromptReco_Collisions12_JSON_v2.txt) contains:
{"190645": [[10, 110]], "190704": [[1, 3]], "190705": [[1, 5], [7, 76], [78, 336], [338, 350], [353, 384]],
...
"195937": [[1, 28], [31, 186], [188, 400]], "195947": [[23, 62], [64, 88]]}
Job Creation
Creating jobs for real data is analogous to the Monte Carlo case. To avoid overwriting the previous runs of this tutorial, it is suggested to use a dedicated configuration file:
crab -create -cfg crab_lumi.cfg
which takes as configuration file the name specified with the -cfg option, in this case the crab_lumi.cfg associated in this tutorial with real data.
$ crab -create -cfg crab_lumi.cfg
crab: Version 2.8.5 running on Tue Mar 5 14:47:56 2013 CET (13:47:56 UTC)
crab. Working options:
scheduler remoteGlidein
job type CMSSW
server OFF
working directory /afs/cern.ch/user/f/fanzago/scratch0/TEST/crab_0_130305_144756/
verify if user DN is mapped in CERN's SSO
OK. user ready for SiteDB switchover on March 12, 2013
crab: Contacting Data Discovery Services ...
crab: Accessing DBS at: http://cmsdbsprod.cern.ch/cms_dbs_prod_global/servlet/DBSServlet
crab: Requested (A)DS /SingleMu/Run2012B-TOPMuPlusJets-PromptSkim-v1/AOD has 13 block(s).
crab: Requested number of lumis reached.
crab: 8 jobs created to run on 500 lumis
crab: Checking remote location
crab: Creating 8 jobs, please wait...
crab: Total of 8 jobs created.
Log file is /afs/cern.ch/user/f/fanzago/scratch0/TEST/crab_0_130305_144756/log/crab.log
- The project directory called crab_0_130305_144756 is created.
- As explained, the number of created jobs may not match the number of jobs requested in the configuration file (8 created, while 10 were requested).
Job Submission
Job submission is always analogous:
$ crab -submit
crab: Version 2.8.5 running on Tue Mar 5 14:54:39 2013 CET (13:54:39 UTC)
crab. Working options:
scheduler remoteGlidein
job type CMSSW
server OFF
working directory /afs/cern.ch/user/f/fanzago/scratch0/TEST/crab_0_130305_144756/
crab: Checking available resources...
crab: Found compatible site(s) for job 1
crab: 1 blocks of jobs will be submitted
crab: remotehost from Avail.List = submit-2.t2.ucsd.edu
crab: contacting remote host submit-2.t2.ucsd.edu
crab: Establishing gsissh ControlPath. Wait 2 sec ...
crab: Establishing gsissh ControlPath. Wait 2 sec ...
crab: COPY FILES TO REMOTE HOST
crab: SUBMIT TO REMOTE GLIDEIN FRONTEND
Submitting 8 jobs
100% [=================================================================================================================]
please wait
crab: Total of 8 jobs submitted.
Log file is /afs/cern.ch/user/f/fanzago/scratch0/TEST/crab_0_130305_144756/log/crab.log
Job Status Check
Check the status of the jobs in the latest CRAB project with the following command:
crab -status
to check a specific project:
crab -status -c <dir name>
which should produce screen output similar to:
$ crab -status
crab: Version 2.8.5 running on Tue Mar 5 14:59:36 2013 CET (13:59:36 UTC)
crab. Working options:
scheduler remoteGlidein
job type CMSSW
server OFF
working directory /afs/cern.ch/user/f/fanzago/scratch0/TEST/crab_0_130305_144756/
crab: Checking the status of all jobs: please wait
crab: contacting remote host submit-2.t2.ucsd.edu
crab:
ID END STATUS ACTION ExeExitCode JobExitCode E_HOST
----- --- ----------------- ------------ ---------- ----------- ---------
1 N Running SubSuccess cream02.iihe.ac.be
2 N Running SubSuccess cream02.iihe.ac.be
3 N Running SubSuccess cream02.iihe.ac.be
4 N Running SubSuccess cream02.iihe.ac.be
5 N Submitted SubSuccess
6 N Running SubSuccess cream02.iihe.ac.be
7 N Running SubSuccess cream02.iihe.ac.be
8 N Running SubSuccess red-gw2.unl.edu
crab: 8 Total Jobs
>>>>>>>>> 1 Jobs Submitted
List of jobs Submitted: 5
>>>>>>>>> 7 Jobs Running
List of jobs Running: 1-4,6-8
crab: You can also follow the status of this task on :
CMS Dashboard: http://dashb-cms-job-task.cern.ch/taskmon.html#task=fanzago_crab_0_130305_144756_db2r51
Your task name is: fanzago_crab_0_130305_144756_db2r51
Job Output Retrieval
For the jobs which are in the "Done" status it is possible to retrieve the log files (just the log files, because the output files are copied to the Storage Element associated with the T2 specified in the crab.cfg; in fact return_data is 0).
The following command retrieves the log files of all "Done" jobs of the last created CRAB project:
crab -getoutput
to get the output of a specific project:
crab -getoutput -c <dir name>
The job results will be copied into the res subdirectory of your CRAB project:
$ crab -get
crab: Version 2.8.5 running on Tue Mar 5 15:15:32 2013 CET (14:15:32 UTC)
crab. Working options:
scheduler remoteGlidein
job type CMSSW
server OFF
working directory /afs/cern.ch/user/f/fanzago/scratch0/TEST/crab_0_130305_144756/
crab: contacting remote host submit-2.t2.ucsd.edu
crab: RETRIEVE FILE out_files_1.tgz for job #1
crab: RETRIEVE FILE crab_fjr_1.xml for job #1
crab: Results of Jobs # 1 are in /afs/cern.ch/user/f/fanzago/scratch0/TEST_RELEASE/TEST_PATC2/TEST_2_8_2/crab_0_130305_144756/res/
...
Log file is /afs/cern.ch/user/f/fanzago/scratch0/TEST/crab_0_130305_144756/log/crab.log
Use the -report option
As in the Monte Carlo data example, it is possible to run the report command:
crab -report -c <dir name>
$ crab -report
crab: Version 2.8.5 running on Tue Mar 5 15:18:00 2013 CET (14:18:00 UTC)
crab. Working options:
scheduler remoteGlidein
job type CMSSW
server OFF
working directory /afs/cern.ch/user/f/fanzago/scratch0/TEST/crab_0_130305_144756/
crab: --------------------
Dataset: /SingleMu/Run2012B-TOPMuPlusJets-PromptSkim-v1/AOD
Remote output :
SE: T2_IT_Legnaro t2-srm-02.lnl.infn.it srmPath: srm://t2-srm-02.lnl.infn.it:8443/srm/managerv2?SFN=/pnfs/lnl.infn.it/data/cms/store/user/fanzago/SingleMu/FedeTutGridGlide_data/${PSETHASH}/
Total Events read: 39942
Total Files read: 29
Total Jobs : 8
Luminosity section summary file: /afs/cern.ch/user/f/fanzago/scratch0/TEST_RELEASE/TEST_PATC2/TEST_2_8_2/crab_0_130305_144756/res/lumiSummary.json
# Jobs: Retrieved:8
----------------------------
crab: Summary file of input run and lumi to be analize with this task: /afs/cern.ch/user/f/fanzago/scratch0/TEST_RELEASE/TEST_PATC2/TEST_2_8_2/crab_0_130305_144756/res//inputLumiSummaryOfTask.json
crab: to complete your analysis, you have to analyze the run and lumi reported in the /afs/cern.ch/user/f/fanzago/scratch0/TEST_RELEASE/TEST_PATC2/TEST_2_8_2/crab_0_130305_144756/res//missingLumiSummary.json file
Log file is /afs/cern.ch/user/f/fanzago/scratch0/TEST/crab_0_130305_144756/log/crab.log
The contents of the files containing the luminosity information about the task are the following.
the original lumiMask.json file used in the creation of your task
$ cat Cert_190456-195947_8TeV_PromptReco_Collisions12_JSON_v2.txt
{"190645": [[10, 110]], "190704": [[1, 3]], "190705": [[1, 5], [7, 76], [78, 336], [338, 350], [353, 384]], "190738": [[1, 130], [133, 226], [229, 355]], ...
the lumi sections that your created jobs have to analyze
$ cat crab_0_130609_231016/res/inputLumiSummaryOfTask.json
{"195947": [[27, 27], [36, 36]], "194108": [[95, 96], [117, 121], [123, 126], [149, 152], [154, 157], [160, 161], [166, 169], [172, 174], [176, 176], [185, 185], [187, 187], [190, 191], [196, 197], [200, 201], [206, 209], [211, 212], [216, 221], [231, 232], [234, 235], [238, 243], [249, 250], [277, 278], [285, 286], [305, 308], [311, 312], [333, 334], [438, 439], [520, 520], [527, 527]], ...
the lumi sections actually analyzed by your correctly terminated jobs
$ cat crab_0_130609_231016/res/lumiSummary.json
{"194424": [[63, 63], [92, 92], [121, 121], [123, 123], [168, 173], [176, 177], [184, 185], [187, 187], [199, 200], [202, 203], [207, 207], [213, 213], [220, 221], [256, 256], [557, 557], [559, 559], [562, 562], [564, 564], [599, 599], [602, 602], [607, 607], [609, 609], [639, 639], [648, 649], [656, 656], [658, 658], [660, 660]], "194108": [[95, 96], [117, 121], [123, 126], [149, 152], [154, 157], [160, 161], [166, 169], [172, 174], [176, 176], [185, 185], [187, 187], [190, 191], [196, 197], [200, 201], [206, 209], [211, 212], [216, 221], [231, 232], [234, 235], [238, 243], [249, 250], [277, 278], [285, 286], [305, 308], [311, 312], [333, 334], [438, 439], [520, 520], [527, 527]], ...
and the missing lumis (the difference between the lumiMask and the lumiSummary), which you can analyze by creating a new task that uses this file as the new lumiMask file (a conceptual sketch of the difference follows the listing)
$ cat crab_0_130609_231016/res/missingLumiSummary.json
{"190645": [[10, 110]],
"190704": [[1, 3]],
"190705": [[1, 5], [7, 76], [78, 336], [338, 350], [353, 384]],
"190738": [[1, 130], [133, 226], [229, 355]],
...
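Conceptually, the missing-lumi file is the set difference between the original mask and the lumi sections already analyzed. Here is a rough Python sketch of that difference (illustrative only; CRAB computes the file for you, keeping the compact range format):

import json

# Illustrative sketch: missing lumis = lumis in the original mask that are
# not in lumiSummary.json. Ranges are expanded to (run, lumi) pairs for
# simplicity; the file written by CRAB keeps the [first, last] range format.
def expand(mask):
    return {(run, ls) for run, ranges in mask.items()
            for first, last in ranges
            for ls in range(first, last + 1)}

with open("lumiMask.json") as f:         # the original mask
    wanted = expand(json.load(f))
with open("lumiSummary.json") as f:      # what was actually analyzed
    done = expand(json.load(f))

print("%d lumi sections still to analyze" % len(wanted - done))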
To create a task that analyzes the missing lumis, you can use the missingLumiSummary.json file as the lumi_mask file in your crab.cfg:
[CMSSW]
total_number_of_lumis = -1
number_of_jobs = 10
pset = tutorial.py
datasetpath = /SingleMu/Run2012B-13Jul2012-v1/AOD
lumi_mask = crab_0_130305_144756/res/missingLumiSummary.json
output_file = outfile.root
[USER]
return_data = 0
copy_data = 1
publish_data = 1
storage_element = T2_IT_Legnaro
publish_data_name = FedeTutGridGlide_data
dbs_url_for_publication = https://cmsdbsprod.cern.ch:8443/cms_dbs_ph_analysis_02_writer/servlet/DBSServlet
[CRAB]
scheduler = remoteGlidein
jobtype = cmssw
$ crab -create -cfg crab_missing.cfg
crab: Version 2.8.5 running on Tue Mar 5 15:22:50 2013 CET (14:22:50 UTC)
crab. Working options:
scheduler remoteGlidein
job type CMSSW
server OFF
working directory /afs/cern.ch/user/f/fanzago/scratch0/TEST/crab_0_130305_152250/
verify if user DN is mapped in CERN's SSO
OK. user ready for SiteDB switchover on March 12, 2013
crab: Contacting Data Discovery Services ...
crab: Accessing DBS at: http://cmsdbsprod.cern.ch/cms_dbs_prod_global/servlet/DBSServlet
crab: Requested (A)DS /SingleMu/Run2012B-TOPMuPlusJets-PromptSkim-v1/AOD has 13 block(s).
crab: Each job will process about 192 lumis.
crab: 9 jobs created to run on 1918 lumis
crab: Checking remote location
crab: WARNING: The stageout directory already exists. Be careful not to accidentally mix outputs from different tasks
crab: Creating 9 jobs, please wait...
crab: Total of 9 jobs created.
and submit them as usual. The created jobs will analyze all the missing lumis of the original lumiMask.json file.
Run Crab retrieving your output (without copying to a Storage Element)
You can also run your analysis code without interacting with a remote Storage Element, but retrieving the outputs to your workspace area (under the res dir of the project).
Below is an example of a CRAB configuration file consistent with this tutorial:
[CMSSW]
total_number_of_events = 100
number_of_jobs = 10
pset = tutorial.py
datasetpath = /RelValProdTTbar/JobRobot-MC_3XY_V24_JobRobot-v1/GEN-SIM-DIGI-RECO
output_file = outfile.root
[USER]
return_data = 1
[CRAB]
scheduler = remoteGlidein
jobtype = cmssw
With this crab.cfg in place you can re-do the workflow as described before (apart from the publication step):
- creation
- submission
- status progress monitoring
- output retrieval (in this step you will retrieve directly the real output produced by your pset file)
Where to find more on CRAB
Note also that all CMS members using the Grid must subscribe to the Grid Announcements CMS.HyperNews forum.
Review status
Complete review, minor changes. The page gives a good idea of how to do a physics analysis using CRAB.
Responsible: FedericaFanzago