If dest_se is used, CRAB finds the correct path where the output can be stored. To retrieve the remote output files locally to your user interface, run:
crab -copyData
## or crab -copyData -c <dir name>
An example of execution:
$ crab -copyData
crab: Version 2.9.1 running on Fri Oct 11 17:08:38 2013 CET (15:08:38 UTC)
crab. Working options:
scheduler remoteGlidein
job type CMSSW
server OFF
working directory /afs/cern.ch/user/f/fanzago/scratch0/TUTORIAL/crab_0_131011_153317/
crab: error detecting glite version
crab: error detecting glite version
crab: Copy file locally.
Output dir: /afs/cern.ch/user/f/fanzago/scratch0/TUTORIAL/crab_0_131011_153317/res/
crab: Starting copy...
directory /afs/cern.ch/user/f/fanzago/scratch0/TUTORIAL/crab_0_131011_153317/res/ already exists
crab: Copy success for file: outfile_4_1_Jlr.root
crab: Copy success for file: outfile_3_1_MsR.root
crab: Copy success for file: outfile_1_1_HF3.root
crab: Copy success for file: outfile_2_1_cVA.root
crab: Copy success for file: outfile_5_1_gAw.root
Log file is /afs/cern.ch/user/f/fanzago/scratch0/TUTORIAL/crab_0_131011_153317/log/crab.log
Publish your result in DBS
Publishing the produced data to DBS allows you to re-run over the data that has been published. The instructions to follow are given below (see also the dedicated how-to page).
You have to add more information to the CRAB configuration file, specifying that you want to publish and the name under which the data will be published:
[USER]
....
publish_data = 1
publish_data_name = what_you_want
....
Warning:
- All the parameters related to publication have to be added to the configuration file before job creation, even though the publication step is executed after the job output has been retrieved.
- Publication is done in the phys03 instance of DBS3. If you belong to a PAG group, you have to publish your data to the DBS instance associated with your group, checking on the DBS access twiki page for the correct DBS URL and for which VOMS role you need in order to be an allowed user.
- Remember to change the ui_working_dir value in the configuration file to create a new project (if you don't use the default CRAB project name), otherwise the creation step will fail with the error message "project already exists, please remove it before create new task". A sketch of the relevant [USER] section is shown below.
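For illustration, here is a minimal sketch of the [USER] section of a task configured for publication from the start; the ui_working_dir value and the data name are only placeholders for this example:
[USER]
return_data = 0
copy_data = 1
storage_element = T2_xx_yyyy
publish_data = 1
publish_data_name = what_you_want
## placeholder project directory name, choose your own
ui_working_dir = crab_publish_test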
Run Crab publishing your results
You can also run your analysis code and publish the results copied to a remote Storage Element.
Here below is an example of the CRAB configuration file, consistent with this tutorial:
For MC data (crab.cfg)
[CMSSW]
total_number_of_events = 50
number_of_jobs = 10
pset = tutorial.py
datasetpath = /RelValZMM/CMSSW_5_3_6-START53_V14-v2/GEN-SIM-RECO
output_file = outfile.root
[USER]
return_data = 0
copy_data = 1
storage_element = T2_xx_yyyy
publish_data = 1
publish_data_name = FanzagoTutGrid
[CRAB]
scheduler = remoteGlidein
jobtype = cmssw
And with this crab.cfg you can re-do the complete workflow as described before, plus the publication step (the full command sequence is summarized after this list):
- creation
- submission
- status progress monitoring
- output retrieval
- publish the results
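For reference, a rough summary of the command sequence, each command being described in detail elsewhere in this tutorial (project directory names are whatever CRAB creates for you):
crab -create
crab -submit
crab -status
crab -getoutput
crab -publish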
Use the -publish option
After having run the previous workflow up to the retrieval of your jobs, you can publish the output data that have been stored in the Storage Element indicated in the crab.cfg file, using:
crab -publish
or to publish the outputs of a specific project:
crab -publish -c <dir_name>
It is not necessary for all the jobs to be done and retrieved; you can publish your output at a later time.
The command will look for all the FrameworkJobReport files ( crab-project-dir/res/crab_fjr_*.xml ) produced by each job and will extract from them the information (e.g. number of events, LFN) to publish.
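As a quick sanity check before publishing, you can list the job reports that will be read; here the project directory from the example below is assumed:
ls crab_0_131014_123645/res/crab_fjr_*.xml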
Publication output example
The output shown below corresponds to an old output using DBS2.
$ crab -publish
crab: Version 2.9.1 running on Mon Oct 14 14:35:56 2013 CET (12:35:56 UTC)
crab. Working options:
scheduler remoteGlidein
job type CMSSW
server OFF
working directory /afs/cern.ch/user/f/fanzago/scratch0/TUTORIAL/crab_0_131014_123645/
crab: <dbs_url_for_publication> = https://cmsdbsprod.cern.ch:8443/cms_dbs_ph_analysis_02_writer/servlet/DBSServlet
file_list = ['/afs/cern.ch/user/f/fanzago/scratch0/TUTORIAL/crab_0_131014_123645/res//crab_fjr_1.xml', '/afs/cern.ch/user/f/fanzago/scratch0/TUTORIAL/crab_0_131014_123645/res//crab_fjr_2.xml', '/afs/cern.ch/user/f/fanzago/scratch0/TUTORIAL/crab_0_131014_123645/res//crab_fjr_3.xml', '/afs/cern.ch/user/f/fanzago/scratch0/TUTORIAL/crab_0_131014_123645/res//crab_fjr_4.xml', '/afs/cern.ch/user/f/fanzago/scratch0/TUTORIAL/crab_0_131014_123645/res//crab_fjr_5.xml', '/afs/cern.ch/user/f/fanzago/scratch0/TUTORIAL/crab_0_131014_123645/res//crab_fjr_6.xml', '/afs/cern.ch/user/f/fanzago/scratch0/TUTORIAL/crab_0_131014_123645/res//crab_fjr_7.xml', '/afs/cern.ch/user/f/fanzago/scratch0/TUTORIAL/crab_0_131014_123645/res//crab_fjr_8.xml', '/afs/cern.ch/user/f/fanzago/scratch0/TUTORIAL/crab_0_131014_123645/res//crab_fjr_9.xml', '/afs/cern.ch/user/f/fanzago/scratch0/TUTORIAL/crab_0_131014_123645/res//crab_fjr_10.xml']
crab: --->>> Start dataset publication
crab: --->>> Importing parent dataset in the dbs: /RelValZMM/CMSSW_5_3_6-START53_V14-v2/GEN-SIM-RECO
crab: --->>> Importing all parents level
-----------------------------------------------------------------------------------
Transferring path /RelValZMM/CMSSW_5_2_1-START52_V4-v1/GEN-SIM
block /RelValZMM/CMSSW_5_2_1-START52_V4-v1/GEN-SIM#24e1effb-0f0c-4557-bb46-3d5ecae691b8
-----------------------------------------------------------------------------------
-----------------------------------------------------------------------------------
Transferring path /RelValZMM/CMSSW_5_3_6-START53_V14-v2/GEN-SIM-DIGI-RAW-HLTDEBUG
block /RelValZMM/CMSSW_5_3_6-START53_V14-v2/GEN-SIM-DIGI-RAW-HLTDEBUG#13e93136-29ed-11e2-9c63-00221959e7c0
-----------------------------------------------------------------------------------
-----------------------------------------------------------------------------------
Transferring path /RelValZMM/CMSSW_5_3_6-START53_V14-v2/GEN-SIM-RECO
block /RelValZMM/CMSSW_5_3_6-START53_V14-v2/GEN-SIM-RECO#43683124-29f6-11e2-9c63-00221959e7c0
-----------------------------------------------------------------------------------
crab: --->>> duration of all parents import (sec): 552.62570405
crab: Import ok of dataset /RelValZMM/CMSSW_5_3_6-START53_V14-v2/GEN-SIM-RECO
crab: PrimaryDataset = RelValZMM
crab: ProcessedDataset = fanzago-FanzagoTutGrid-f30a6bb13f516198b2814e83414acca1
crab: <User Dataset Name> = /RelValZMM/fanzago-FanzagoTutGrid-f30a6bb13f516198b2814e83414acca1/USER
crab: --->>> End dataset publication
crab: --->>> Start files publication
crab: --->>> End files publication
crab: --->>> Check data publication: dataset /RelValZMM/fanzago-FanzagoTutGrid-f30a6bb13f516198b2814e83414acca1/USER in DBS url https://cmsdbsprod.cern.ch:8443/cms_dbs_ph_analysis_02_writer/servlet/DBSServlet
=== dataset /RelValZMM/fanzago-FanzagoTutGrid-f30a6bb13f516198b2814e83414acca1/USER
=== dataset description =
===== File block name: /RelValZMM/fanzago-FanzagoTutGrid-f30a6bb13f516198b2814e83414acca1/USER#787d164e-b485-4a23-b334-a8abde3fe146
File block located at: ['t2-srm-02.lnl.infn.it']
File block status: 0
Number of files: 10
Number of Bytes: 33667525
Number of Events: 50
total events: 50 in dataset: /RelValZMM/fanzago-FanzagoTutGrid-f30a6bb13f516198b2814e83414acca1/USER
Log file is /afs/cern.ch/user/f/fanzago/scratch0/TUTORIAL/crab_0_131014_123645/log/crab.log
Warning: some versions of CMSSW switch off the debug mode of CRAB, so a lot of duplicated information can be printed to the screen.
Analyze your published data
First note that:
- CRAB by default publishes all files finished correctly, including files with 0 events
- CRAB by default imports all dataset parents of your dataset
You have to modify your crab.cfg file, specifying the datasetpath of your dataset and the dbs_url where the data are published (we will assume the phys03 instance of DBS3):
[CMSSW]
....
datasetpath = your_dataset_path
dbs_url = phys03
The creation output will be something similar to:
$ crab -create
crab: Version 2.9.1 running on Mon Oct 14 15:49:31 2013 CET (13:49:31 UTC)
crab. Working options:
scheduler remoteGlidein
job type CMSSW
server OFF
working directory /afs/cern.ch/user/f/fanzago/scratch0/TUTORIAL/crab_0_131014_154931/
crab: error detecting glite version
crab: error detecting glite version
crab: Contacting Data Discovery Services ...
crab: Accessing DBS at: https://cmsweb.cern.ch/dbs/prod/phys03/DBSReader
crab: Requested dataset: /RelValZMM/fanzago-FanzagoTutGrid-f30a6bb13f516198b2814e83414acca1/USER has 50 events in 1 blocks.
crab: SE black list applied to data location: ['srm-cms.cern.ch', 'srm-cms.gridpp.rl.ac.uk', 'T1_DE', 'T1_ES', 'T1_FR', 'T1_IT', 'T1_RU', 'T1_TW', 'cmsdca2.fnal.gov', 'T3_US_Vanderbilt_EC2']
crab: May not create the exact number_of_jobs requested.
crab: 10 job(s) can run on 50 events.
crab: List of jobs and available destination sites:
Block 1: jobs 1-10: sites: T2_IT_Legnaro
crab: Checking remote location
crab: WARNING: The stageout directory already exists. Be careful not to accidentally mix outputs from different tasks
crab: Creating 10 jobs, please wait...
crab: Total of 10 jobs created.
Log file is /afs/cern.ch/user/f/fanzago/scratch0/TUTORIAL/crab_0_131014_154931/log/crab.log
The jobs will run in the site where your USER data have been stored.
CRAB configuration file for real data with lumi mask
You can find more details on this at the corresponding link on the Crab FAQ page.
The CRAB configuration file (default name crab.cfg) should be located at the same location as the CMSSW parameter-set to be used by CRAB.
The dataset used is: /SingleMu/Run2012B-13Jul2012-v1/AOD
For real data (crab_lumi.cfg)
[CMSSW]
lumis_per_job = 50
number_of_jobs = 10
pset = tutorial.py
datasetpath = /SingleMu/Run2012B-13Jul2012-v1/AOD
lumi_mask = Cert_190456-208686_8TeV_PromptReco_Collisions12_JSON.txt
output_file = outfile.root
[USER]
return_data = 0
copy_data = 1
storage_element = T2_xx_yyyy
publish_data = 1
publish_data_name = FanzagoTutGrid_data
[CRAB]
scheduler = remoteGlidein
jobtype = cmssw
where the lumi_mask file can be downloaded with
wget --no-check-certificate https://cms-service-dqm.web.cern.ch/cms-service-dqm/CAF/certification/Collisions12/8TeV/Prompt/Cert_190456-208686_8TeV_PromptReco_Collisions12_JSON.txt
For the tutorial we are using only a subset of runs and lumis (via a lumiMask.json file). The lumi_mask file (Cert_190456-208686_8TeV_PromptReco_Collisions12_JSON.txt) contains:
{"190645": [[10, 110]], "190704": [[1, 3]], "190705": [[1, 5], [7, 76], [78, 336], [338, 350], [353, 384]],
...
"208551": [[119, 193], [195, 212], [215, 300], [303, 354], [356, 554], [557, 580]], "208686": [[73, 79], [82, 181], [183, 224], [227, 243], [246, 311], [313, 463]]}
Job Creation
Creating jobs for real data is analogous to Monte Carlo data. To avoid overwriting the previous runs of this tutorial, it is suggested to use a dedicated configuration file:
crab -create -cfg crab_lumi.cfg
which takes as configuration file the name specified with the -cfg option, in this case the crab_lumi.cfg used in this tutorial for real data.
$ crab -create -cfg crab_lumi.cfg
crab: Version 2.9.1 running on Mon Oct 14 16:05:18 2013 CET (14:05:18 UTC)
crab. Working options:
scheduler remoteGlidein
job type CMSSW
server OFF
working directory /afs/cern.ch/user/f/fanzago/scratch0/TUTORIAL/crab_0_131014_160518/
crab: error detecting glite version
crab: error detecting glite version
crab: Contacting Data Discovery Services ...
crab: Accessing DBS at: http://cmsdbsprod.cern.ch/cms_dbs_prod_global/servlet/DBSServlet
crab: Requested (A)DS /SingleMu/Run2012B-13Jul2012-v1/AOD has 14 block(s).
crab: SE black list applied to data location: ['srm-cms.cern.ch', 'srm-cms.gridpp.rl.ac.uk', 'T1_DE', 'T1_ES', 'T1_FR', 'T1_IT', 'T1_RU', 'T1_TW', 'cmsdca2.fnal.gov', 'T3_US_Vanderbilt_EC2']
crab: Requested number of lumis reached.
crab: 9 jobs created to run on 500 lumis
crab: Checking remote location
crab: Creating 9 jobs, please wait...
crab: Total of 9 jobs created.
Log file is /afs/cern.ch/user/f/fanzago/scratch0/TUTORIAL/crab_0_131014_160518/log/crab.log
- The project directory called crab_0_131014_160518 is created.
- As explained, the number of created jobs may not match the number of jobs requested in the configuration file (9 jobs created, 10 requested).
Job Submission
Job submission is always analogous:
$ crab -submit
crab: Version 2.9.1 running on Mon Oct 14 16:07:59 2013 CET (14:07:59 UTC)
crab. Working options:
scheduler remoteGlidein
job type CMSSW
server OFF
working directory /afs/cern.ch/user/f/fanzago/scratch0/TUTORIAL/crab_0_131014_160518/
crab: error detecting glite version
crab: error detecting glite version
crab: Checking available resources...
crab: Found compatible site(s) for job 1
crab: 1 blocks of jobs will be submitted
crab: remotehost from Avail.List = submit-4.t2.ucsd.edu
crab: contacting remote host submit-4.t2.ucsd.edu
crab: Establishing gsissh ControlPath. Wait 2 sec ...
crab: Establishing gsissh ControlPath. Wait 2 sec ...
crab: COPY FILES TO REMOTE HOST
crab: SUBMIT TO REMOTE GLIDEIN FRONTEND
Submitting 9 jobs
100% [====================================================================================================================================================]
please wait crab: Total of 9 jobs submitted.
Log file is /afs/cern.ch/user/f/fanzago/scratch0/TUTORIAL/crab_0_131014_160518/log/crab.log
Job Status Check
Check the status of the jobs in the latest CRAB project with the following command:
crab -status
to check a specific project:
crab -status -c <dir name>
which should produce screen output similar to:
[fanzago@lxplus0445 SLC6]$ crab -status
crab: Version 2.9.1 running on Mon Oct 14 16:23:52 2013 CET (14:23:52 UTC)
crab. Working options:
scheduler remoteGlidein
job type CMSSW
server OFF
working directory /afs/cern.ch/user/f/fanzago/scratch0/TUTORIAL/crab_0_131014_160518/
crab: error detecting glite version
crab: error detecting glite version
crab: Checking the status of all jobs: please wait
crab: contacting remote host submit-4.t2.ucsd.edu
crab:
ID END STATUS ACTION ExeExitCode JobExitCode E_HOST
----- --- ----------------- ------------ ---------- ----------- ---------
1 N Running SubSuccess ce208.cern.ch
2 N Submitted SubSuccess
3 N Running SubSuccess cream03.lcg.cscs.ch
4 N Running SubSuccess t2-ce-01.lnl.infn.it
5 N Running SubSuccess cream01.lcg.cscs.ch
6 N Running SubSuccess cream01.lcg.cscs.ch
7 N Running SubSuccess ingrid.cism.ucl.ac.be
8 N Running SubSuccess ingrid.cism.ucl.ac.be
9 N Running SubSuccess ce203.cern.ch
crab: 9 Total Jobs
>>>>>>>>> 1 Jobs Submitted
List of jobs Submitted: 2
>>>>>>>>> 8 Jobs Running
List of jobs Running: 1,3-9
crab: You can also follow the status of this task on :
CMS Dashboard: http://dashb-cms-job-task.cern.ch/taskmon.html#task=fanzago_crab_0_131014_160518_582igd
Your task name is: fanzago_crab_0_131014_160518_582igd
Log file is /afs/cern.ch/user/f/fanzago/scratch0/TUTORIAL/crab_0_131014_160518/log/crab.log
and then ...
$ crab -status
crab: Version 2.9.1 running on Tue Oct 15 10:53:33 2013 CET (08:53:33 UTC)
crab. Working options:
scheduler remoteGlidein
job type CMSSW
server OFF
working directory /afs/cern.ch/user/f/fanzago/scratch0/TEST_RELEASE/TEST_PATC2/TEST_2_8_2/TUTORIAL/TUT_5_3_11/SLC6/crab_0_131014_160518/
crab: error detecting glite version
crab: error detecting glite version
crab: Checking the status of all jobs: please wait
crab: contacting remote host submit-4.t2.ucsd.edu
crab: Establishing gsissh ControlPath. Wait 2 sec ...
crab: Establishing gsissh ControlPath. Wait 2 sec ...
crab:
ID END STATUS ACTION ExeExitCode JobExitCode E_HOST
----- --- ----------------- ------------ ---------- ----------- ---------
1 N Done Terminated 0 0 ce208.cern.ch
2 N Done Terminated 0 60317 cream03.lcg.cscs.ch
3 N Done Terminated 0 60317 cream03.lcg.cscs.ch
4 N Done Terminated 0 0 t2-ce-01.lnl.infn.it
5 N Done Terminated 0 60317 cream01.lcg.cscs.ch
6 N Done Terminated 0 60317 cream01.lcg.cscs.ch
7 N Done Terminated 0 0 ingrid.cism.ucl.ac.be
8 N Done Terminated 0 0 ingrid.cism.ucl.ac.be
9 N Done Terminated 0 0 ce203.cern.ch
crab: ExitCodes Summary
>>>>>>>>> 4 Jobs with Wrapper Exit Code : 60317
List of jobs: 2-3,5-6
See https://twiki.cern.ch/twiki/bin/view/CMS/JobExitCodes for Exit Code meaning
crab: ExitCodes Summary
>>>>>>>>> 5 Jobs with Wrapper Exit Code : 0
List of jobs: 1,4,7-9
See https://twiki.cern.ch/twiki/bin/view/CMS/JobExitCodes for Exit Code meaning
crab: 9 Total Jobs
crab: You can also follow the status of this task on :
CMS Dashboard: http://dashb-cms-job-task.cern.ch/taskmon.html#task=fanzago_crab_0_131014_160518_582igd
Your task name is: fanzago_crab_0_131014_160518_582igd
Log file is /afs/cern.ch/user/f/fanzago/scratch0/TEST_RELEASE/TEST_PATC2/TEST_2_8_2/TUTORIAL/TUT_5_3_11/SLC6/crab_0_131014_160518/log/crab.log
Job Output Retrieval
For the jobs which are in the "Done" status it is possible to retrieve the log files of the jobs (just the log files, because the output files are copied to the Storage Element associated with the T2 specified in the crab.cfg, since return_data is 0).
The following command retrieves the log files of all "Done" jobs of the last created CRAB project:
crab -getoutput
to get the output of a specific project:
crab -getoutput -c <dir name>
the job results will be copied into the res subdirectory of your CRAB project:
$ crab -get
crab: Version 2.9.1 running on Tue Oct 15 10:53:53 2013 CET (08:53:53 UTC)
crab. Working options:
scheduler remoteGlidein
job type CMSSW
server OFF
working directory /afs/cern.ch/user/f/fanzago/scratch0/TUTORIAL/crab_0_131014_160518/
crab: error detecting glite version
crab: error detecting glite version
crab: contacting remote host submit-4.t2.ucsd.edu
crab: Preparing to rsync 2 files
crab: Results of Jobs # 1 are in /afs/cern.ch/user/f/fanzago/scratch0/TEST_RELEASE/TEST_PATC2/TEST_2_8_2/TUTORIAL/TUT_5_3_11/SLC6/crab_0_131014_160518/res/
crab: contacting remote host submit-4.t2.ucsd.edu
crab: Preparing to rsync 16 files
crab: Results of Jobs # 2 are in /afs/cern.ch/user/f/fanzago/scratch0/TUTORIAL/crab_0_131014_160518/res/
crab: Results of Jobs # 3 are in /afs/cern.ch/user/f/fanzago/scratch0/TUTORIAL/crab_0_131014_160518/res/
crab: Results of Jobs # 4 are in /afs/cern.ch/user/f/fanzago/scratch0/TUTORIAL/crab_0_131014_160518/res/
crab: Results of Jobs # 5 are in /afs/cern.ch/user/f/fanzago/scratch0/TUTORIAL/crab_0_131014_160518/res/
crab: Results of Jobs # 6 are in /afs/cern.ch/user/f/fanzago/scratch0/TUTORIAL/crab_0_131014_160518/res/
crab: Results of Jobs # 7 are in /afs/cern.ch/user/f/fanzago/scratch0/TUTORIAL/crab_0_131014_160518/res/
crab: Results of Jobs # 8 are in /afs/cern.ch/user/f/fanzago/scratch0/TUTORIAL/crab_0_131014_160518/res/
crab: Results of Jobs # 9 are in /afs/cern.ch/user/f/fanzago/scratch0/TUTORIAL/crab_0_131014_160518/res/
Log file is /afs/cern.ch/user/f/fanzago/scratch0/TUTORIAL/crab_0_131014_160518/log/crab.log
Use the -report option
As for the MonteCarlo data example, it is possible to run the report command:
crab -report -c <dir name>
The report command returns information about correctly finished jobs, that is, jobs with JobExitCode = 0 and ExeExitCode = 0:
$ crab -report
crab: Version 2.9.1 running on Tue Oct 15 15:55:10 2013 CET (13:55:10 UTC)
crab. Working options:
scheduler remoteGlidein
job type CMSSW
server OFF
working directory /afs/cern.ch/user/f/fanzago/scratch0/TUTORIAL/crab_0_131014_160518/
crab: error detecting glite version
crab: error detecting glite version
crab: --------------------
Dataset: /SingleMu/Run2012B-13Jul2012-v1/AOD
Remote output :
SE: T2_IT_Legnaro t2-srm-02.lnl.infn.it srmPath: srm://t2-srm-02.lnl.infn.it:8443/srm/managerv2?SFN=/pnfs/lnl.infn.it/data/cms/store/user/fanzago/SingleMu/FanzagoTutGrid_data/${PSETHASH}/
Total Events read: 264540
Total Files read: 21
Total Jobs : 9
Luminosity section summary file: /afs/cern.ch/user/f/fanzago/scratch0/TUTORIAL/crab_0_131014_160518/res/lumiSummary.json
# Jobs: Retrieved:9
----------------------------
crab: Summary file of input run and lumi to be analize with this task: /afs/cern.ch/user/f/fanzago/scratch0/TUTORIAL/crab_0_131014_160518/res/inputLumiSummaryOfTask.json
crab: to complete your analysis, you have to analyze the run and lumi reported in the //afs/cern.ch/user/f/fanzago/scratch0/TUTORIAL/crab_0_131014_160518/missingLumiSummary.json file
Log file is /afs/cern.ch/user/f/fanzago/scratch0/TUTORIAL/crab_0_131014_160518/log/crab.log
The files containing the luminosity information about the task are the following:
the original lumiMask.json file referenced in the crab.cfg file and used during the creation of your task:
$ cat Cert_190456-208686_8TeV_PromptReco_Collisions12_JSON.txt
{"190645": [[10, 110]], "190704": [[1, 3]], "190705": [[1, 5], [7, 65], [81, 336], .... "208686": [[73, 79], [82, 181], [183, 224], [227, 243], [246, 311], [313, 463]]}
the lumi sections that your created jobs have to analyze (this information is passed as arguments to your jobs):
$ cat crab_0_131014_160518/res/inputLumiSummaryOfTask.json
{"194305": [[84, 85]], "194108": [[95, 96], [117, 120], [123, 126], [149, 152], [154, 157], [160, 161], [166, 169], [172, 174], [176, 176], [185, 185], [187, 187], [190, 191], [196, 197], [200, 201], [206, 209], [211, 212], [216, 221], [231, 232], [234, 235], [238, 243], [249, 250], [277, 278], [305, 306], [333, 334], [438, 439], [520, 520], [527, 527]], "194120": [[13, 14], [22, 23], [32, 33], [43, 44], [57, 57], [67, 67], [73, 74], [88, 89], [105, 105], [110, 111], [139, 139], [144, 144], [266, 266]], "194224": [[94, 94], [111, 111], [257, 257], [273, 273], [324, 324]], "194896": [[35, 35], [68, 69]], "194424": [[63, 63], [92, 92], [121, 121], [123, 123], [168, 173], [176, 177], [184, 185], [187, 187], [199, 200], [202, 203], [207, 207], [213, 213], [220, 221], [256, 256], [557, 557], [559, 559], [562, 562], [564, 564], [599, 599], [602, 602], [607, 607], [609, 609], [639, 639], [648, 649], [656, 656], [658, 658], [660, 660]], "194631": [[222, 222]], "193998": [[66, 113], [115, 119], [124, 124], [126, 127], [132, 137], [139, 154], [158, 159], [168, 169], [172, 172], [174, 176], [180, 185], [191, 192], [195, 196], [233, 234], [247, 247]], "194027": [[93, 93], [109, 109], [113, 115]], "194778": [[127, 127], [130, 130]], "195947": [[27, 27], [36, 36]], "195099": [[77, 77], [106, 106]], "196200": [[66, 67]], "194711": [[1, 4], [11, 17], [19, 19], [25, 30], [33, 38], [46, 49], [54, 55], [62, 62], [64, 64], [70, 71], [82, 83], [90, 91], [98, 99], [102, 103], [106, 107], [112, 115], [123, 124], [129, 130], [140, 140], [142, 142], [614, 617]], "195552": [[256, 256], [263, 263]], "195013": [[133, 133], [144, 144]], "195868": [[16, 16], [20, 20]], "194912": [[130, 131]], "194699": [[38, 39], [253, 253], [256, 256]], "194050": [[353, 354], [1881, 1881]], "194075": [[82, 82], [101, 101], [103, 103]], "194076": [[3, 6], [9, 9], [16, 17], [20, 21], [29, 30], [33, 34], [46, 47], [58, 59], [84, 87], [93, 94], [100, 101], [106, 107], [130, 131], [143, 143], [154, 155], [228, 228], [239, 240], [246, 246], [268, 269], [284, 285], [376, 377], [396, 397], [490, 491], [718, 719]], "195970": [[77, 77], [79, 79]], "195919": [[5, 6]], "194644": [[8, 9], [19, 20], [34, 35], [58, 59], [78, 79], [100, 100], [106, 106], [128, 129]], "196250": [[73, 74]], "195164": [[62, 62], [64, 64]], "194199": [[114, 115], [124, 125], [148, 148], [156, 157], [159, 159], [207, 208], [395, 395], [401, 402]], "194480": [[621, 622], [630, 631], [663, 664], [715, 716], [996, 997], [1000, 1001], [1010, 1011], [1020, 1021], [1186, 1187], [1190, 1193]], "196531": [[284, 284], [289, 289]], "195774": [[150, 150], [159, 159]], "196027": [[150, 151]], "193834": [[1, 35]], "193835": [[1, 20], [22, 26]], "193836": [[1, 2]]}
the lumi sections actually analyzed by your correctly terminated jobs:
$ cat crab_0_131014_160518/res/lumiSummary.json
{"195947": [[27, 27], [36, 36]], "194108": [[95, 96], [119, 120], [123, 126], [154, 157], [160, 161], [166, 167], [172, 174], [176, 176], [185, 185], [187, 187], [196, 197], [211, 212], [231, 232], [238, 241], [249, 250], [277, 278], [305, 306], [333, 334], [438, 439], [520, 520], [527, 527]], "193998": [[66, 66], [69, 70], [87, 88], [90, 100], [103, 105], [108, 109], [112, 113], [115, 119], [124, 124], [126, 126], [132, 135], [139, 140], [142, 142], [144, 154], [158, 159], [168, 169], [172, 172], [174, 176], [180, 185], [191, 192], [195, 196], [233, 234]], "194224": [[94, 94], [111, 111], [257, 257]], "194424": [[63, 63], [92, 92], [121, 121], [123, 123], [168, 173], [176, 177], [184, 185], [187, 187], [207, 207], [213, 213], [220, 221], [256, 256], [599, 599], [602, 602], [607, 607], [609, 609], [639, 639], [656, 656]], "194631": [[222, 222]], "196250": [[73, 74]], "194027": [[93, 93], [109, 109], [113, 115]], "194778": [[127, 127], [130, 130]], "195099": [[77, 77], [106, 106]], "194711": [[140, 140], [142, 142]], "195552": [[256, 256], [263, 263]], "195868": [[16, 16], [20, 20]], "194912": [[130, 131]], "194699": [[253, 253], [256, 256]], "195970": [[77, 77], [79, 79]], "194076": [[3, 6], [29, 30], [33, 34], [58, 59], [84, 87], [93, 94], [106, 107], [130, 131], [154, 155], [228, 228], [239, 240], [246, 246], [268, 269], [284, 285], [718, 719]], "194050": [[353, 354], [1881, 1881]], "195919": [[5, 6]], "194644": [[34, 35], [78, 79]], "195164": [[62, 62], [64, 64]], "194199": [[114, 115], [124, 125], [148, 148], [156, 157], [159, 159], [207, 208]], "196531": [[284, 284], [289, 289]], "196027": [[150, 151]], "193834": [[1, 24], [27, 30], [33, 34]], "193835": [[19, 20], [22, 23], [26, 26]], "193836": [[1, 2]]}
and the missing lumis (the difference between the original lumiMask and the lumiSummary) that you can analyze by creating a new task and using this file as the new lumiMask file:
$ cat crab_0_131014_160518/res/missingLumiSummary.json
{"190645": [[10, 110]],
"190704": [[1, 3]],
"190705": [[1, 5], [7, 65], [81, 336], [338, 350], [353, 383]],
"190738": [[1, 130], [133, 226], [229, 355]],
.....
"208541": [[1, 57], [59, 173], [175, 376], [378, 417]],
"208551": [[119, 193], [195, 212], [215, 300], [303, 354], [356, 554], [557, 580]],
"208686": [[73, 79], [82, 181], [183, 224], [227, 243], [246, 311], [313, 463]]}
To create a task to analyze the missing lumis of the original lumiMask, you can use the missingLumiSummary.json file as the new lumiMask.json file in your crab.cfg.
As before, you can decide the splitting you want; using the same publish_data_name, the new outputs will be published in the same dataset as the previous task:
[CMSSW]
lumis_per_job = 50
number_of_jobs = 4
pset = tutorial.py
datasetpath = /SingleMu/Run2012B-13Jul2012-v1/AOD
lumi_mask = crab_0_131014_160518/res/missingLumiSummary.json
output_file = outfile.root
[USER]
return_data = 0
copy_data = 1
publish_data =1
storage_element = T2_xx_yyyy
publish_data_name = FanzagoTutGrid_data
[CRAB]
scheduler = remoteGlidein
jobtype = cmssw
$ crab -create -cfg crab_missing.cfg
crab: Version 2.9.1 running on Tue Oct 15 17:10:16 2013 CET (15:10:16 UTC)
crab. Working options:
scheduler remoteGlidein
job type CMSSW
server OFF
working directory /afs/cern.ch/user/f/fanzago/scratch0/TEST_RELEASE/TEST_PATC2/TEST_2_8_2/TUTORIAL/TUT_5_3_11/SLC6/crab_0_131015_171016/
crab: error detecting glite version
crab: error detecting glite version
crab: Contacting Data Discovery Services ...
crab: Accessing DBS at: http://cmsdbsprod.cern.ch/cms_dbs_prod_global/servlet/DBSServlet
crab: Requested (A)DS /SingleMu/Run2012B-13Jul2012-v1/AOD has 14 block(s).
crab: SE black list applied to data location: ['srm-cms.cern.ch', 'srm-cms.gridpp.rl.ac.uk', 'T1_DE', 'T1_ES', 'T1_FR', 'T1_IT', 'T1_RU', 'T1_TW', 'cmsdca2.fnal.gov', 'T3_US_Vanderbilt_EC2']
crab: Requested number of jobs reached.
crab: 4 jobs created to run on 200 lumis
crab: Checking remote location
crab: WARNING: The stageout directory already exists. Be careful not to accidentally mix outputs from different tasks
crab: Creating 4 jobs, please wait...
crab: Total of 4 jobs created.
Log file is /afs/cern.ch/user/f/fanzago/scratch0/TEST_RELEASE/TEST_PATC2/TEST_2_8_2/TUTORIAL/TUT_5_3_11/SLC6/crab_0_131015_171016/log/crab.log
and submit them as usual. The created jobs will analyze part of the missing lumis of the original lumiMask.json file.
- If you set total_number_of_lumis = -1 instead of lumis_per_job or number_of_jobs, the new task will analyze all the missing lumis (see the sketch below).
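As an illustrative sketch (not a complete configuration), assuming the project directory from this tutorial, the [CMSSW] section of such a task could look like:
[CMSSW]
total_number_of_lumis = -1
pset = tutorial.py
datasetpath = /SingleMu/Run2012B-13Jul2012-v1/AOD
lumi_mask = crab_0_131014_160518/res/missingLumiSummary.json
output_file = outfile.root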
Run Crab retrieving your output (without copying to a Storage Element)
You can also run your analysis code without interacting with a remote Storage Element, but retrieving the outputs to your workspace area (under the res dir of the project).
Here below is an example of the CRAB configuration file, consistent with this tutorial:
[CMSSW]
total_number_of_events = 100
number_of_jobs = 10
pset = tutorial.py
datasetpath = /RelValZMM/CMSSW_5_3_6-START53_V14-v2/GEN-SIM-RECO
output_file = outfile.root
[USER]
return_data = 1
[CRAB]
scheduler = remoteGlidein
jobtype = cmssw
With this crab.cfg in place you can re-do the workflow as described before (apart from the publication step):
- creation
- submission
- status progress monitoring
- output retrieval (in this step you'll be able to retrieve directly the real output produced by your pset file)
Where to find more on CRAB
Note also that all CMS members using the Grid must subscribe to the Grid Announcements CMS.HyperNews forum.
Review status
Complete Review, Minor Changes. Page gives a good idea of doing a physics analysis using CRAB
Responsible: FedericaFanzago
CRAB at CAF/LSF at CERN
Complete:
Detailed Review status
This document describes how to use CRAB at CERN for direct submission to the LSF batch system or the CERN Analysis Facility (CAF). The person responsible for the CAF is Peter Kreuzer.
Useful information about the CAF can be found on the CAF twiki page.
Prerequisites
- Dataset you want to access has to be available at the CAF, so it must be registered in the CAF DBS
- If you run on CAF, you have to be authorized to do so. On this page: https://twiki.cern.ch/twiki/bin/view/CMS/CAF#User_Permissions you can find the sub-groups and the corresponding leaders. If you know your sub-group, you can contact the leader for the authorization.
- CRAB StandAlone (direct submission)
- Jobs have to be submitted from an AFS directory, from a node with LSF access, for example lxplus.
- Since in this case you are effectively using CRAB as a convenience tool to do LSF submission from your shell, you need to set up the environment as usual (a sketch is given after this list).
- Please note that you must be sure to have enough quota on your AFS area. Large output should be put on Castor (see CAF stageout below).
- Even if you decide to send the output to Castor, the stdout/stderr and the Framework Job Report will be returned to your AFS area in any case.
- Submitting through the CRAB server instead removes the requirement to use an AFS directory and a host with LSF access, so you can also submit from your desktop/laptop.
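A minimal sketch of the environment setup, assuming an lxplus login, the CMSSW release used in this tutorial (adjust to yours) and the standard AFS location of the CRAB standalone setup script (adapt the paths to your own installation):
cd CMSSW_5_3_11/src
cmsenv
## the CRAB setup script location may differ; check the CRAB documentation
source /afs/cern.ch/cms/ccs/wm/scripts/Crab/crab.sh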
Running
The workflow is exactly the same as the one you would follow to access data on the Grid (see: CRAB Tutorial). So you set up your CMSSW area, develop your code, test it on a (small) part of a dataset and then configure CRAB to create and submit identical jobs to the CAF to analyze the full dataset.
In the crab.cfg configuration file, you just have to put, under the [CRAB] section:
scheduler = caf
The available CAF queues are:
cmscaf1nh
cmscaf1nd
cmscaf1nw
Running on the CAF, using caf as scheduler instead of lsf, the longest queue (cmscaf1nw) is selected automatically.
If you need to select a different queue, fill the parameter queue under the [CAF] section with either cmscaf1nh or cmscaf1nd (e.g. queue = cmscaf1nh).
If you know that your jobs are short, it is more efficient to use the shorter queues.
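For example, a minimal crab.cfg fragment selecting the one-hour CAF queue could look like the following (only the scheduler and queue settings are the point here):
[CRAB]
scheduler = caf
jobtype = cmssw

[CAF]
queue = cmscaf1nh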
CAF stageout
If you are running jobs at CAF then the required stageout configuration is:
- Stage out into the CAF user area (T2_CH_CERN is the official site name for the CAF):
[USER]
copy_data = 1
storage_element=T2_CH_CERN
user_remote_dir=xxx
the path where data will be stored is /store/caf/user/<username>/<user_remote_dir>
There is no support for staging out to the CAF-T1 from the GRID. The above instructions only apply for jobs running on the CAF itself.
Further details on CRAB and Stage out configurations available at this page.
CAF publication
You need the following in the CRAB configuration file (NOTE: the storage element where the data are copied has to be T2_CH_CERN):
[USER]
copy_data = 1
storage_element=T2_CH_CERN
publish_data=1
publish_data_name = data-name-to-publish (e.g. publish_data_name = JohnSmithTestDataVersion666 )
dbs_url_for_publication = https://cmsdbsprod.cern.ch:8443/cms_dbs_caf_analysis_01_writer/servlet/DBSServlet
The path where data will be stored is /store/caf/user/<username>/<primarydataset>/<publish_data_name>/<PSETHASH>
Review status
Responsible: MarcoCalloni
Last reviewed by:
5.8 Job Monitoring with CMS Dashboard
Complete:
Detailed Review Status
Contents
CMS Dashboard provides a web interface for Job Monitoring
Most of the CMS job submission systems, including CRAB and PA, are instrumented to send monitoring information to the CMS Dashboard. In addition to the reports of the CMS job submission systems, the Dashboard collects information from the Grid monitoring systems. Monitoring data are stored in the central database, and a web interface running on top of it allows CMS users to follow the progress of their jobs.
For the Dashboard monitoring you do not need to setup any environment. The only thing you need is a web browser.
In case you see a problem, please submit a bug in Savannah:
https://savannah.cern.ch/projects/dashboard
or send mail to dashboard-support@cern.ch
How to follow the progress of your tasks
- After submission of your task to the CRAB server you get back the Dashboard task monitoring link, which you can follow in order to get information about the progress of your task.
- If you want to find information about multiple tasks submitted during a particular time range, you can enter the application via the task monitoring entry page:
http://dashb-cms-job-task.cern.ch/dashboard/request.py/taskmonitoring
- Choose your identity in the "Select a User" window and the time window to define the tasks submitted during a given time range. You should see on the screen the list of all your tasks submitted over the time range you have chosen.
As a rule, on the Dashboard UI the user identity is defined by the user's first and family names, but this is not always the case. The user identity is retrieved from the Grid certificate subject and depends on its format.
To check your Dashboard user identifier, go to the Dashboard interactive job monitoring page:
http://dashb-cms-job.cern.ch/dashboard/request.py/jobsummary
Click on the bar with the 'analysis' label and sort by user. Looking at the user names listed next to the bars or in the table below, you will find your Dashboard identifier. On the task monitoring page you see the list of all your tasks with the distribution of jobs by their current status.
You can bookmark the link to the task monitoring application containing information only about your tasks:
http://dashb-cms-job-task.cern.ch/dashboard/request.py/taskmonitoring#action=tasksTable&usergridname=USERNAME
where USERNAME is your Dashboard identifier.
You can also bookmark the link to a particular task:
http://dashb-cms-job-task.cern.ch/dashboard/request.py/taskmonitoring#action=taskJobs&usergridname=USERNAME&taskmonid=TASKMONITORID
where TASKMONITORID is the project name in CRAB.
- The front page shows the overview of all tasks submitted during the selected time range (3 days by default) in table and graphical form.
Every page is reloaded every few minutes providing the most up-to-date information.
- Clicking on the information icon next to the task name gives you the meta-information of a particular task, like the name of the input dataset, the CMSSW version used, and the time stamp when the task was registered in the Dashboard.
- Clicking on the number of jobs corresponding to a given status gives you detailed information about all jobs of the chosen category: Grid job id, identifier of the job inside the task, how many times the job has been resubmitted, the site where the job is processed, and the time stamps of the job processing (UTC).
- In case a job was resubmitted multiple times, clicking on the number in the "Submission Attempts" column allows you to see all resubmissions corresponding to that job inside the task. Here we are referring not to resubmissions triggered by the resource broker, but to resubmissions done by the user.
- The application provides various graphs showing the distribution of jobs by site, time and failure reason, graphs showing the time consumed by the task, and a graph showing the progress of the task in terms of processed events. The plots are either generated by default on the appropriate page or can be selected by clicking on the "Plot Selection" link in the table. Graphs can be zoomed in and out.
- Jobs listed in the "Successful" category are those which completed properly from the application point of view and for which the Dashboard did not get evidence that the job had been aborted by the Grid.
- The jobs listed as "Failed" are those which either failed from the application point of view, or had been aborted by the Grid, or were cancelled by the user/CRAB server. By failure from the application point of view we mean a non-zero status from the CMS application, problems related to saving the output files at the storage element, or problems while sourcing the CMS environment at the site. To discover the reason of failure of a particular job, click on the number of jobs in the "Failed" category; you get the list of all failed jobs with the Grid status and application exit code. Moving the cursor over the status value gives a more detailed reason of failure.
- Please note: for navigation between the task monitoring pages do not use the "back" button of the browser; use the buttons provided on the task monitoring pages.
- On the task page showing the overview of all jobs belonging to a task, next to the task name you see a small Dashboard logo icon. Clicking on it you get a bar plot of the distribution of the task's jobs by a chosen attribute, like site, computing element or resource broker. Sometimes this distribution can help you to spot a problem with a given site, CE or resource broker, and allows you to exclude the problematic service or site when you resubmit your jobs. See more details in the section "Using Dashboard Interactive Interface".
If you see any discrepancy between information in the Dashboard and the output of the Crab status command
Sometimes you may notice a discrepancy between the information in the Dashboard and the output of the crab -status command.
- The Dashboard does not have user credentials and cannot directly query the Grid Logging and Bookkeeping system to get the status of a particular job. It relies on the job status reports sent to the Dashboard either by the jobs themselves or by Grid monitoring systems like RGMA or ICRTM. That is why, if you see inconsistent data in CRAB and the Dashboard related to the Grid status of a given job, you should believe CRAB.
- On the other hand, the Dashboard gets real-time information reported by the jobs running at the worker nodes. So when you see that according to the Dashboard a job has terminated while CRAB still considers it to be running, it means that the job has already finished on the worker node and sent its exit status, while the Grid Logging and Bookkeeping system has not yet updated the job status. If the delay in the update of the crab status for a job which has terminated according to the Dashboard takes too long (more than half an hour), the problem may be related to the Grid services.
- For some sites, e.g. T3_US_FNALLPC, nothing is reported to dashboard until the job is done.
You can follow the CRAB3 Troubleshooting guide for more on how to troubleshoot your job and contact crab support.
Using Dashboard Interactive Interface
One of the purposes of the Dashboard interactive interface is to show the correlation of job failures or processing inefficiencies (pending too long in the queue, for example) with a particular site or a Grid service like a Resource Broker.
When you see that all jobs of a particular task are failing and it is not clear to you whether it is a problem with your code or a problem related to a site misconfiguration, the Dashboard interactive interface can help you find out.
- The first thing you can check is whether jobs of other users are failing at the same site with the same failure code. Go to:
http://dashb-cms-job.cern.ch/dashboard/request.py/jobsummary
By default, the interactive user interface shows all jobs submitted during the selected time window. If you tick the 'terminated' checkbox, you will instead see all jobs which are currently pending or running, or which have terminated between the date selected as the beginning of the time range and now, regardless of when the jobs were submitted. Be aware that all dates in the Dashboard, including the UI, are UTC.
You can sort the jobs by user, site, computing element, resource broker, application or task. Clicking on any bar of the plot allows you to sort the subset of jobs shown in that bar by various attributes. Clicking on any number in the table gives you detailed information about the selected subset of jobs, like processing time stamps, application exit code, Grid job id, etc.
If you sort your jobs by task and then click on a particular task name in the table, the task monitoring page for this task would be opened.
- Trying to understand the reason of the failures
Click on the bar with the 'analysis' label and sort by site. Dark green corresponds to the jobs which finished properly, pink corresponds to the jobs which were properly handled by the Grid but failed from the application point of view, and red corresponds to the jobs aborted by the Grid. Clicking on the number corresponding to the failed or aborted jobs in the table below gives you the list of all failed or aborted jobs with their failure reason.
Looking at the plot provided via the link below, you can see that there are no jobs which succeeded in Taipei:
https://twiki.cern.ch/twiki/pub/CMSPublic/WorkBookMonitoringTutorial/tut0.pdf
Let's sort the Taipei jobs by user:
https://twiki.cern.ch/twiki/pub/CMSPublic/WorkBookMonitoringTutorial/Tut1.pdf
You see that there were several users running their jobs at the site and nobody managed to run their jobs properly.
Clicking on the number of failed jobs in the table, we get the detailed view of the failed jobs with application exit code 8000, which very often indicates data access problems (beware: those images are from some time ago with older CMSSW releases; now a failure to open a file gives exit code 8020). So the failures of the jobs could be related to a site misconfiguration rather than to a problem in the user code. If the problematic site is not the only location of the data required by your task, you can put the site in the black list (ce_black_list=Site_Name) of the CRAB configuration file and resubmit the task. If you feel you have to do this black listing, also contact the CRAB support team; more information is on the CRAB3 Troubleshooting page.
Review Status
Complete review with minor fixes. The page gives a very good illustration of the Dashboard monitoring .
Responsible: JuliaAndreeva
Last reviewed by: DaveEvans 28 Feb 2008
5.9 The Role of the T2 Resources
Complete:
Detailed Review status
Goals of this page:
This page is intended to familiarize you with performing a large scale CMS analysis on the Grid. In particular, you will learn
- the role of the Tier-2s for user analysis,
- the organization of data at the Tier-2 sites,
- how to find and request datasets,
- where to store your job output,
- and how to elevate, delete and deregister a private dataset.
It is important that you also become familiar with running a Grid analysis with CRAB.
Contents
Introduction
The Tier-2 centers in CMS are the only location, besides the specialized analysis facility at CERN, where users are able to obtain guaranteed access to CMS data samples. The Tier-1 centers are used primarily for organized processing and storage. The Tier-2s are specified with data export and network capacity to allow the centers to refresh the data in disk storage regularly for analysis. A nominal Tier-2 will deploy 810 TB of storage for CMS in 2012. The CMS expectation for the global 2012 Tier-2 capacity is 27 PB of usable disk space. In order to manage such a large and highly distributed resource CMS has tried to introduce policy and structure to the Tier-2 storage and processing.
Storage Organisation at a Tier-2
Apart from 30 TB storage space for central services, like MC production, and buffers, the main storage areas of interest for a user are:
- 200 TB central space
Here datasets of major interest for the whole collaboration, like primary skims or the main Monte Carlo samples, are stored. This space is controlled by AnalysisOperations.
- 250 TB (125 TB * 2 groups) space for the detector and physics groups.
Datasets which are of particular interest for the groups associated to a Tier-2 site, like sub-skims or special MC samples.
- In the order of 160 TB (e.g. 40 users * 4 TB) of "Grid home space" for local/national users.
This quota can be extended by additional local/national resources. Mainly the output files from CRAB user analysis jobs will be stored in this area.
- 170 TB local space.
Data samples of interest for the local or national community. The movement and deletion of the data is fully under the responsibility and control of the site.
Sites larger than nominal will provide resources for more central space, three groups, and additional regional space. Sites smaller than nominal may provide resources for only one physics group, or only central space, or if sufficiently small, only for simulated event production.
How to find a dataset?
Once you have identified the physics processes which contribute to the background of your analysis and to your signal, you want to know which datasets you have to run your analysis over. This is usually not obvious from the dataset names alone. As a general tip, you should subscribe to the HyperNews mailing lists of your preferred detector and physics (PAG/POG/DPG) groups and to hn-cms-physics-announcements. Sometimes your group provides this information on the group's information pages and documentation systems such as TWikis or web pages. Ask your colleagues! Once you have identified the names of the relevant datasets, you should check whether they are available for analysis by using DAS (the Data Aggregation System) or, alternatively, PhEDEx (Physics Experiment Data Export).
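As a sketch, assuming the DAS command-line client (das_client.py) is available in your environment, you could check the real-data dataset used earlier in this chapter and where it is hosted with queries like:
das_client.py --query="dataset=/SingleMu/Run2012B-13Jul2012-v1/AOD"
## list the sites hosting the dataset
das_client.py --query="site dataset=/SingleMu/Run2012B-13Jul2012-v1/AOD"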
How to request a replication of a dataset?
The datasets you want to analyse have to be fully present at a Tier-2 (or at your local Tier-3) site. If they are shown to be present only at a Tier-1 center, you can request a PhEDEx transfer to copy the datasets to Tier-2 (and Tier-3) sites. Please consult the people responsible for a Tier-2/3 site operated by your national community if the datasets will be accounted towards local Tier-2/3 space, or the data managers of the physics group you are associated with, to check whether they agree to store the datasets in their Tier-2 group space. After their agreement, please give a reasonable explanation in the PhEDEx request comment field and choose the appropriate group from the corresponding pull-down menu, or use local in case of transfers for the local/national community. With PhEDEx transfers you cannot copy datasets into your personal Grid home space.
Where to store your data output?
Usually your CRAB analysis job produces an amount of output which is too large to be transferred by the Grid sandbox mechanism. Therefore you should direct your job output to your associated Grid user home storage space using the stage-out option in CRAB. Using CERN resources like Castor pools will probably be restricted in the near future, so for the majority of CMS users a Tier-2 site will provide the output capacity. Usually your Grid home space will be at a Tier-2 site which your country operates for CMS; if more than one site is present, ask your country's IT contact persons how they distribute their users internally. In case your institute or lab operates a Tier-3 site which has sufficient capacity to receive CMS analysis output data over the Grid, such a site can also be used; however, CMS support is only on a best-effort basis. Countries without their own CMS Tier-2 centers and with no functional Tier-3 should contact their country representatives, who have to negotiate with other sites to provide storage space for guest users.
Your associated Tier-2 provides you with on the order of 4 TB of space (the exact amount is to be negotiated with your Tier-2), usually only protected at the hardware (e.g. RAID disks) level and without a backup mechanism. If additional local or national resources are available it could be more; for details consult your Tier-2 contact persons.
Presently the Grid storage systems do not provide a quota system, therefore the local Tier-2 support will review the user space utilization regularly. Please be careful not to overfill your home area.
If you register the output of your Crab job in DBS, all CMS users can have access to your data.
How to move a private dataset into offical space and how to delete and deregister a dataset?
In CMS, official datasets and user datasets are differentiated. Whereas official datasets are produced centrally, users are allowed to produce and store their own datasets containing any kind of data at a Tier-2 center. There are no requirements concerning data quality, usefulness or appropriate size to be stored on tape. The data are located in the private user space at the user's home Tier-2 and can be registered in a local-scope bookkeeping in order to use the provided Grid tools to perform a distributed analysis. In principle, this dataset can be analysed by any user of the collaboration, however only at the Tier-2 center hosting the dataset, which naturally has a limited number of job slots. Later it may happen that the dataset created by the user becomes important for many other users or even a whole analysis group. To provide better availability it is reasonable to distribute the dataset to further Tier-2 centers or even to a Tier-1 center for custodial storage on tape. However, the CMS data transfer system can only handle official data registered in the central bookkeeping. Therefore, it is necessary that the user dataset becomes an official dataset fitting all the requirements of CMS. The StoreResults service provides a mechanism to elevate user datasets to the central bookkeeping by doing the following tasks:
- Validation, through authentication and roles, ensures that the data is generally useful.
- Merge the files into a size suitable for tape storage.
- Inject data into the central bookkeeping and data transfer system.
The current system is ad hoc, based around a Savannah request/problem tracker for approvals and on the legacy CMS ProdAgent production framework. For the longer-term future a complete rewrite based on forthcoming new common CMS tools is presently being discussed.
Further information can be found in URL1.
To delete data from the user's home space, the use of Grid commands and knowledge of the physical file names are necessary. Please contact your local Tier-2 data manager and ask for advice and help.
Invalidating private dataset registrations in a local-scope database in order to synchronise with deleted data samples is not a trivial action so far; a user-friendly tool may become available in the future. Until then, please consult the DBS removal instruction pages.
Information sources
CMS computing Technical Design Report
Presentation (for 2009 storage resources)
Review status
Review with minor modifications. Link to monitor the status of different sites added. The page gives a good overview of the Tier2 resources.
Responsible: ThomasKress
Last reviewed by: IanFisk 19 May 2009
5.10 Transferring MC Sample/Data Files
Complete:
Detailed Review status
Goals of this page:
This page is intended to familiarize you with making a PhEDEx subscription and monitoring the progress of transfers to a site. In particular, you will learn:
- what PhEDEx is,
- why you should transfer data to your T2,
- how to make a PhEDEx subscription to a site,
- and how to monitor PhEDEx transfers.
Contents
PhEDEx is the CMS data placement tool. It sits above various grid middleware [SRM (Storage Resource Manager), FTS (File Transfer Service)] to manage large scale transfers between CMS centres. CMS sites run a series of agents which run the requested transfers, verify that transfers have completed correctly and publish what data is available for transfer.
A normal user does not interact with this machinery. They will fill in a web form to make a request for data transfer to a site. Once this is approved the PhEDEx machinery takes over and makes sure that the transfer is complete.
Why do I need to transfer MC sample or Data files?
For general analysis CMS will use the T2 centres, as the T0 and T1 sites will be busy carrying out reconstruction, re-reconstruction, skimming and AOD production. This means MC sample or data files need to be transferred from the T1 centres out to the T2s. It is up to the people working at T2 sites to choose which MC samples or data files go to the site, and to make the appropriate PhEDEx request.
If you run a CRAB job and find that your MC sample or data is not located at a T2 centre, you can request to have it transferred there using PhEDEx.
You do not need to have your MC sample or data at "your" T2 to run analysis; CRAB will run on it at any location. However, you may find it useful to make a copy at your local T2, as this will increase the number of sites at which you can run your analysis.
For PhEDEx requests, if you are working with an analysis group you can choose that group; however, that may mean the files will be deleted in a regular cleanup. Otherwise choose "local" for the User Group.
Copy files from other sites using gfal-copy command
Instructions on how to use the gfal tools to find and copy files from another site's Storage Element are on the CRAB3 FAQ twiki.
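A sketch of a single-file copy with gfal-copy; the SRM endpoint, path and file name here are placeholders to be replaced with the values for the actual site and file (for example as found with the tools mentioned above):
## <se-hostname>, <path-on-se>, <username>, <dir> and the file name are placeholders
gfal-copy "srm://<se-hostname>:8443/srm/managerv2?SFN=/<path-on-se>/store/user/<username>/<dir>/outfile_1_1_xyz.root" "file://$PWD/outfile_1_1_xyz.root"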
Instructions for FNAL-LPC
The copyfiles.py script can be used to copy single files or a directory of files using gfal-copy or xrdcp from another site to T3_US_FNALLPC.
The getSiteInfo.py script can be useful to get the information about a site's endpoint needed to obtain a single file through gfal-copy; it is used by the copyfiles.py script above.
Review status
Substantial modifications due to the deprecation of DBS. Instructions with snapshots for PhEDEx subscription using the DAS interface added.
Responsible: SimonMetson
Last reviewed by: YourName - date
5.10 Data Organization Explained
Complete:
Detailed Review status
Goals of this page:
This page is intended to provide you with an overview of the terms used in Data Management in CMS, giving you an appreciation of how data is organized. It is background information only.
Contents
Dataset Bookkeeping System (DBS): “Which data exist?”
The Dataset Bookkeeping System (DBS) provides the means to define, discover and use CMS event data.
The main features that DBS provides are:
- Data Description: keeps the dataset definition along with attributes characterising the dataset, like the application that produced the data, the type of content resulting from the degree of processing applied to the data (RAW, RECO, etc.), and so on. The DBS also provides information regarding the "provenance" of the data it describes.
- Data Discovery: stores information about (real and simulated) CMS data in a queryable format. The supported queries allow users to discover available data and how they are organized (logically) in terms of packaging units (files and file-blocks).
Answers the question “Which data exist?”
- Easiest way for user to query this information is via the Data Aggregation Service (DAS) as described in Chapter Locating Data Samples
Data Location Service (DLS): "Where is the data?"
The Data Location Service (DLS) provides the means to locate replicas of data in the distributed computing system.
The DLS provides the names of the Storage Elements of the sites hosting the data.
Answers the question “Where is the data?”
The Event Data Model (EDM) in CMSSW is based on simple files.
In the data management you will see two terms used when discussing files:
Logical File Name (LFN)
- This is a site-independent name for a file.
- It doesn't contain either the actual protocol used to read the file or any of the site-specific information about the place where it is located.
- It is preferred that you use this for all production files, as it is then possible for a site to change the specifics of access and location without breaking your config file.
- A production LFN in general begins with /store and looks like this in a cmsRun cfg file:
process.source = cms.Source("PoolSource",
fileNames = cms.untracked.vstring(
'/store/mc/SAM/GenericTTbar/GEN-SIM-RECO/CMSSW_5_3_1_START53_V5-v1/0013/CE4D66EB-5AAE-E111-96D6-003048D37524.root'
)
)
Physical File Name (PFN)
- This is a site-dependent name for a file.
- Local access to a file at a site. (Note that reading files at remote sites specifying protocol in PFN doesn’t work)
- The cmsRun application will automatically convert production LFN's into the appropriate PFN for the site where you are running. So you don't need to know the PFN yourself!!
- If you really want to know the PFN, the algorithm that converts an LFN to a PFN is site dependent and is defined in the so-called TrivialFileCatalog of the site (the TrivialFileCatalogs of the various sites are in CVS under COMP/SITECONF/SiteName/PhEDEx/storage.xml).
The EdmFileUtil utility in your CMSSW environment can be used to get the PFN from a given LFN:
cd work/CMSSW_5_3_5/src/
cmsenv
edmFileUtil -d /store/mc/SAM/GenericTTbar/GEN-SIM-RECO/CMSSW_5_3_1_START53_V5-v1/0013/CE4D66EB-5AAE-E111-96D6-003048D37524.root
will result in:
root://eoscms//eos/cms/store/mc/SAM/GenericTTbar/GEN-SIM-RECO/CMSSW_5_3_1_START53_V5-v1/0013/CE4D66EB-5AAE-E111-96D6-003048D37524.root?svcClass=default
For example accessing data locally at CERN you have the algorithm:
PFN = root://eoscms//eos/cms/ + LFN
and the cmsRun cfg file looks like:
process.source = cms.Source("PoolSource",
fileNames = cms.untracked.vstring(
'root://eoscms//eos/cms/store/mc/SAM/GenericTTbar/GEN-SIM-RECO/CMSSW_5_3_1_START53_V5-v1/0013/CE4D66EB-5AAE-E111-96D6-003048D37524.root?svcClass=default'
)
)
File Blocks
- Files are grouped together into FileBlocks.
- A file block is the minimum quantum of data that is replicated between sites.
- Each given file block may be at one or more sites.
Dataset
- Fileblocks are grouped in datasets.
- A dataset is a set of fileblocks corresponding to a single sample and produced with a single cfg file.
DatasetPath
The DatasetPath is a string that identifies a dataset. It consists of 3 parts:
/Primarydataset/Processeddataset/DataTier
where:
- Primary dataset: name that describes the physics channel
- Processed dataset: name that describe the kind of processing applied
- Data Tier: describes the kind of event information stored from each step in the simulation and reconstruction chain. Examples of data tiers include RAW and RECO, and for MC, GEN, SIM and DIGI. A given dataset may consist of multiple data tiers, e.g., the tier GEN-SIM-DIGI-RECO includes the generation (MC), simulation (Geant), digitization and reconstruction steps (see the example below).
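For example, the Monte Carlo dataset used earlier in this chapter decomposes as follows:
/RelValZMM/CMSSW_5_3_6-START53_V14-v2/GEN-SIM-RECO
Primary dataset: RelValZMM
Processed dataset: CMSSW_5_3_6-START53_V14-v2
Data Tier: GEN-SIM-RECO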
Review status
Complete review. Information regarding deprecation of DBS and migration to DAS has been added. Figures have been added for better understanding.
Last reviewed by: Main.David L Evans - fill in date when done -
Responsible: StefanoBelforte
-- FrankWuerthwein - 06-Dec-2009