---+ 5.6.1 Running CMSSW code on the Grid using !CRAB %COMPLETE5% %BR% [[#ReviewStatus][Detailed Review status]]

---++ WARNING

   * *You should always use the [[https://twiki.cern.ch/twiki/bin/view/CMS/SWGuideCrab#How_to_get_CRAB][latest production CRAB version]]*
   * *This tutorial may be outdated*, since it was prepared for a live lesson at a specific time and thus refers to a particular dataset and CMSSW version that may not be available when you read this (or at the site where you try it).
   * As of June 2013 you should be able to kickstart your CRAB work using CMSSW_5_3_8 and the dataset /RelValProdTTbar/JobRobot-MC_3XY_V24_JobRobot-v1/GEN-SIM-DIGI-RECO as MC data and /SingleMu/Run2012B-13Jul2012-v1/AOD as real data.

Contents: %TOC%

#PreRequisites
---++ Prerequisites to run the tutorial

   * a valid Grid certificate
   * registration to the CMS Virtual Organization
      * *to get the Grid certificate and to register to the CMS VO please follow the [[https://twiki.cern.ch/twiki/bin/view/CMS/SWGuideRunningGridPrerequisites][CRAB howto instructions]]*
   * registration in SiteDB
      * *please follow the instructions at [[https://twiki.cern.ch/twiki/bin/view/CMS/SiteDBForCRAB][siteDB registration for CRAB]]*
   * access to lxplus machines or to an SLC5 User Interface
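A quick way to verify that your certificate and VO registration actually work, before starting the tutorial, is to create a short-lived proxy by hand; a minimal sketch, assuming the UI provides the standard VOMS client tools:
<verbatim class="command">
# create a VOMS proxy with CMS attributes (prompts for your GRID pass phrase)
voms-proxy-init -voms cms
# inspect the proxy: identity, VO attributes and remaining lifetime
voms-proxy-info -all
</verbatim>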
#SetUpEnv1
---++ Recipe for the tutorial

For this tutorial we will refer to !CMS software:
   * *CMSSW_5_3_8*
and we will use an already prepared CMSSW analysis code to analyze the sample:
   * The tutorial will focus on the basic workflow using the dataset _/RelValProdTTbar/JobRobot-MC_3XY_V24_JobRobot-v1/GEN-SIM-DIGI-RECO_ (MC dataset) and _/SingleMu/Run2012B-13Jul2012-v1/AOD_ (real data): [[#RealData][CRAB configuration file for real data with lumi mask]]

We will use the central installation of !CRAB available at CERN:
   * *CRAB_2_8_5*

The example is written for the _csh_ shell family. If you want to use the Bourne Shell, replace _csh_ with _sh_.

*Legend of colors for this tutorial*
<verbatim class="command">
BEIGE background for the commands to execute (cut&paste)
</verbatim>
<verbatim style="font-size: 13px" class="output">
GREEN background for the output sample of the executed commands (nearly what you should see in your terminal)
</verbatim>
%SYNTAX{ syntax="sh" style="width:200"}%
BLUE background for the configuration files (cut&paste)
%ENDSYNTAX%

#SetUpEnv
---++ Setup local Environment and prepare user analysis code

In order to submit jobs to the Grid, you *must* have access to an LCG User Interface (LCG UI). It allows you to access WLCG-affiliated resources in a fully transparent way. LXPLUS users can get an LCG UI via AFS with:
<verbatim class="command">
source /afs/cern.ch/cms/LCG/LCG-2/UI/cms_ui_env.csh
</verbatim>

Install a CMSSW project in a directory of your choice. In this case we create a "TEST" directory:
<verbatim class="command">
mkdir TEST
cd TEST
cmsrel CMSSW_5_3_8    # cmsrel is an alias of scramv1 project CMSSW CMSSW_5_3_8
cd CMSSW_5_3_8/src/
cmsenv                # cmsenv is an alias for scramv1 runtime -csh
</verbatim>

For this tutorial we are going to use the following CMSSW configuration file, tutorial.py:
%SYNTAX{ syntax="python"}%
import FWCore.ParameterSet.Config as cms
process = cms.Process('Slurp')

process.source = cms.Source("PoolSource", fileNames = cms.untracked.vstring())
process.maxEvents = cms.untracked.PSet( input = cms.untracked.int32(10) )
process.options = cms.untracked.PSet( wantSummary = cms.untracked.bool(True) )

process.output = cms.OutputModule("PoolOutputModule",
    outputCommands = cms.untracked.vstring("drop *", "keep recoTracks_*_*_*"),
    fileName = cms.untracked.string('outfile.root'),
)
process.out_step = cms.EndPath(process.output)
%ENDSYNTAX%

#SetUpCRABEnv
---++ !CRAB setup

%BLUE%Setup on lxplus:%ENDCOLOR%
In order to set up and use !CRAB from any directory, source the script =crab.(c)sh= located in =/afs/cern.ch/cms/ccs/wm/scripts/Crab/=, which always points to the latest version of !CRAB. After sourcing the script it is possible to use !CRAB from any directory (typically your CMSSW working directory).
<verbatim class="command">
source /afs/cern.ch/cms/ccs/wm/scripts/Crab/crab.csh
</verbatim>

*Warning*: in order to have the correct environment, the environment files must always be sourced in this order:
   * source of the UI env
   * setup of the CMSSW software
   * source of the !CRAB env

#LocateCfg
---++ Locate the dataset and prepare !CRAB submission

In order to run our analysis over a whole dataset, we first have to find the dataset name and then put it in the !CRAB configuration file.

#SelectData
---+++ Data selection

To select the data you want to access, use the *DAS* web page, where the available datasets are listed: [[https://cmsweb.cern.ch/das/][Data Aggregation Service (DAS)]]. For this tutorial we'll use:
%SYNTAX{ syntax="sh" style="width:200"}%
/RelValProdTTbar/JobRobot-MC_3XY_V24_JobRobot-v1/GEN-SIM-DIGI-RECO (MC data)
%ENDSYNTAX%
   * Beware: dataset availability at sites changes with time; if you are trying to follow this tutorial after the date it was given, you may need to use another dataset.

#SetConfiguration
---++ !CRAB configuration

Modify the !CRAB configuration file =crab.cfg= according to your needs: a fully documented template is available at =$CRABPATH/full_crab.cfg=, and a template with the essential parameters at =$CRABPATH/crab.cfg=. The default name of the configuration file is crab.cfg, but you can rename it as you want. *Copy one of these files into your local area*. For guidance, see the list and description of configuration parameters in the on-line [[http://cmsdoc.cern.ch/cms/ccs/wm/www/Crab/Docs/crab-online-manual.html][CRAB manual]]. For this tutorial, the only relevant sections of the file are =[CRAB]=, =[CMSSW]= and =[USER]=.

#MainConfiguration
---+++ Configuration parameters

The list of the main parameters you need to specify in your crab.cfg (a splitting example is sketched after this list):
   * *pset*: the CMSSW configuration file name;
   * *output_file*: the output file name produced by your pset; if in the !CMSSW pset the output is defined in !TFileService, the file is automatically handled by !CRAB, and there is no need to specify it in this parameter;
   * *datasetpath*: the full name of the dataset you want to analyze;
   * <i><b>Jobs splitting</b></i>:
      * By event: only for MC data. You need to specify 2 of these 3 parameters: *total_number_of_events*, *number_of_jobs*, *events_per_job*:
         * specify _total_number_of_events_ and _number_of_jobs_: this will assign to each job a number of events equal to _total_number_of_events/number_of_jobs_;
         * specify _total_number_of_events_ and _events_per_job_: this will assign to each job _events_per_job_ events and will calculate the number of jobs as _total_number_of_events/events_per_job_;
         * or specify _number_of_jobs_ and _events_per_job_;
      * By lumi: required for real data. You need to specify 2 of these 3 parameters: *total_number_of_lumis*, *lumis_per_job*, *number_of_jobs*:
         * because jobs in split-by-lumi mode process entire files rather than partial files, you will often end up with fewer jobs processing more lumis than expected. Additionally, a single job cannot analyze files from multiple blocks in DBS. So these parameters are "advice" to !CRAB rather than determinative;
         * specify _lumis_per_job_ and _number_of_jobs_: the total number of lumis processed will be _number_of_jobs_ x _lumis_per_job_;
         * or specify _total_number_of_lumis_ and _number_of_jobs_;
   * *lumi_mask*: the filename of a JSON file that describes which runs and lumis to process. !CRAB will skip luminosity blocks not listed in the file;
   * *return_data*: this can be 0 or 1; if set to 1 you will retrieve your output files to your local working area;
   * *copy_data*: this can be 0 or 1; if set to 1 you will copy your output files to a remote Storage Element;
   * *local_stage_out*: this can be 0 or 1; if set to 1 your produced output is copied to the closeSE in case the copy to the SE specified in your crab.cfg fails;
   * *publish_data*: this can be 0 or 1; if set to 1 you can publish your produced data to a local !DBS;
   * *use_server*: the usage of the crab server is deprecated now, so by default this parameter is set to 0;
   * *scheduler*: the name of the scheduler you want to use;
   * *jobtype*: the type of the jobs.
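As an illustration of the two splitting modes, here is a minimal sketch of the relevant =[CMSSW]= fragments; the numbers are arbitrary examples (not the tutorial values) and =myLumiMask.json= is a placeholder filename:
%SYNTAX{ syntax="sh"}%
# splitting by event (MC data): 1000 events split into 20 jobs of 50 events each
[CMSSW]
total_number_of_events = 1000
number_of_jobs = 20

# splitting by lumi (real data): about 10 jobs of 50 lumis each, within the lumi mask
[CMSSW]
lumis_per_job = 50
number_of_jobs = 10
lumi_mask = myLumiMask.json
%ENDSYNTAX%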
#SeCopy
---++ Run CRAB on Monte Carlo data copying the output to an SE

Copying the output to an existing *Storage Element* allows you to bypass the output size limit, to publish the data in a local !DBS and then to easily re-run over the published data. In order to make !CRAB copy the output to a Storage Element you have to add the following information to the !CRAB configuration file:
   * specify that we want to copy our results by adding *copy_data=1* and *return_data=0* (it is not allowed to have both set to 1);
   * add the *official CMS site name* where we are going to copy our results (in this session we use the Legnaro _T2_IT_Legnaro_ !StorageElement as an example); the names of the other official sites can be found in the [[https://cmsweb.cern.ch/sitedb/sitelist/][siteDB]]

---+++ !CRAB configuration file for Monte Carlo data

You can find more details on this at the corresponding link on the [[https://twiki.cern.ch/twiki/bin/view/CMS/SWGuideCrabFaq#How_to_store_output_with_CRAB_2][Crab FAQ page]]. The !CRAB configuration file (default name crab.cfg) should be located in the same directory as the !CMSSW parameter-set to be used by !CRAB, with the following content:
%SYNTAX{ syntax="sh"}%
[CMSSW]
total_number_of_events = 10
number_of_jobs = 5
pset = tutorial.py
datasetpath = /RelValProdTTbar/JobRobot-MC_3XY_V24_JobRobot-v1/GEN-SIM-DIGI-RECO
output_file = outfile.root

[USER]
return_data = 0
copy_data = 1
storage_element = T2_IT_Legnaro
user_remote_dir = TutGridSchool

[CRAB]
scheduler = remoteGlidein
jobtype = cmssw
%ENDSYNTAX%

#SetRunCrab
---+++ Run Crab

Once your =crab.cfg= is ready and the whole underlying environment is set up, you can start running !CRAB. !CRAB supports a command line help which can be useful the first time. You can get it via:
<verbatim class="command">
crab -h
</verbatim>

#JobCreation
---+++ Job Creation

The job creation checks the availability of the selected dataset and prepares *all* the jobs for submission according to the job splitting specified in the crab.cfg:
   * By default the creation process creates a !CRAB project directory (default: crab_0_date_time) in the current working directory, where the related crab configuration file is cached for further usage, avoiding interference with other (already created) projects
   * Using the [USER] _ui_working_dir_ parameter in the configuration file, !CRAB allows the user to choose the project name, so that it can be used later to distinguish multiple !CRAB projects in the same directory (see the sketch after this section).
<verbatim class="command">
crab -create
</verbatim>
This takes by default the configuration file called crab.cfg, associated for this tutorial with MC data.
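For example, to keep several projects apart you could give the configuration file and the project directory explicit names; a minimal sketch, where =crab_MC.cfg= and =crab_MC_test= are hypothetical names chosen for illustration:
%SYNTAX{ syntax="sh"}%
# in crab_MC.cfg
[USER]
ui_working_dir = crab_MC_test
%ENDSYNTAX%
<verbatim class="command">
crab -create -cfg crab_MC.cfg    # creates the project in ./crab_MC_test instead of crab_0_date_time
</verbatim>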
The creation command may ask for proxy/myproxy passwords the first time you use it, and it should produce output similar to:
<verbatim style="font-size: 13px" class="output">
$ crab -create
crab:  Version 2.8.5 running on Wed Feb 20 17:39:32 2013 CET (16:39:32 UTC)
crab. Working options:
    scheduler           remoteGlidein
    job type            CMSSW
    server              OFF
    working directory   /afs/cern.ch/user/f/fanzago/scratch0/TEST/crab_0_130220_173930/
Enter GRID pass phrase:
Your identity: /DC=ch/DC=cern/OU=Organic Units/OU=Users/CN=fanzago/CN=610896/CN=Federica Fanzago
Creating temporary proxy .................................. Done
Contacting  voms.cern.ch:15002 [/DC=ch/DC=cern/OU=computers/CN=voms.cern.ch] "cms" Done
Creating proxy ............................................... Done
Your proxy is valid until Thu Feb 28 17:40:02 2013
verify if user DN is mapped in CERN's SSO
OK. user ready for SiteDB switchover on March 12, 2013
crab:  Contacting Data Discovery Services ...
crab:  Accessing DBS at: http://cmsdbsprod.cern.ch/cms_dbs_prod_global/servlet/DBSServlet
crab:  Requested dataset: /RelValProdTTbar/JobRobot-MC_3XY_V24_JobRobot-v1/GEN-SIM-DIGI-RECO has 300000 events in 1 blocks.
crab:  May not create the exact number_of_jobs requested.
crab:  5 job(s) can run on 50 events.
crab:  List of jobs and available destination sites:

Block     1: jobs 1-5: sites: T2_HU_Budapest, T2_CH_CSCS, T2_ES_IFCA, T2_FR_CCIN2P3, T2_IT_Bari, T2_RU_SINP, T3_IT_Bologna, T2_KR_KNU, T2_UK_SGrid_Bristol, T2_FR_GRIF_LLR, T2_RU_INR, T2_CN_Beijing, T2_US_MIT, T2_RU_PNPI, T2_TR_METU, T2_UK_London_IC, T2_DE_DESY, T2_TW_Taiwan, T2_US_UCSD, T2_RU_RRC_KI, T2_PL_Warsaw, T2_PT_LIP_Lisbon, T2_US_Caltech, T2_PT_NCG_Lisbon, T2_BR_SPRACE, T2_IT_Rome, T2_US_Purdue, T2_BE_IIHE, T2_IT_Legnaro, T2_ES_CIEMAT, T2_DE_RWTH, T2_RU_JINR, T2_CH_CERN, T2_FR_GRIF_IRFU, T2_UA_KIPT, T2_UK_SGrid_RALPP, T2_PK_NCP, T2_UK_London_Brunel, T2_RU_IHEP, T2_IT_Pisa, T2_IN_TIFR, T2_US_Vanderbilt, T2_US_Florida, T2_RU_ITEP, T2_FR_IPHC, T2_BE_UCL, T2_US_Wisconsin, T2_US_Nebraska, T3_UK_London_RHUL, T2_FI_HIP, T2_EE_Estonia
crab:  Checking remote location
crab:  Creating 5 jobs, please wait...
crab:  Total of 5 jobs created.

Log file is /afs/cern.ch/user/f/fanzago/scratch0/TEST/crab_0_130220_173930/log/crab.log
</verbatim>
   * the project directory called crab_0_130220_173930 is created

#JobSubmission
---+++ Job Submission

With the submission command it is possible to specify a combination of jobs and job-ranges separated by commas (e.g. =1,2,3-4=); the default is all. To submit all jobs of the last created project with the default name, it is enough to execute:
<verbatim class="command">
crab -submit
</verbatim>
to submit a specific project:
<verbatim class="command">
crab -submit -c  <dir name>
</verbatim>
which should produce output similar to:
<verbatim class="output" style="font-size: 13px">
$ crab -submit
crab:  Version 2.8.5 running on Wed Feb 20 17:42:10 2013 CET (16:42:10 UTC)
crab. Working options:
    scheduler           remoteGlidein
    job type            CMSSW
    server              OFF
    working directory   /afs/cern.ch/user/f/fanzago/scratch0/TEST/crab_0_130220_173930/
crab:  Checking available resources...
crab:  Found  compatible site(s) for job 1
crab:  1 blocks of jobs will be submitted
crab:  remotehost from Avail.List = submit-2.t2.ucsd.edu
crab:  contacting remote host submit-2.t2.ucsd.edu
crab:  Establishing gsissh ControlPath. Wait 2 sec ...
crab:  Establishing gsissh ControlPath. Wait 2 sec ...
crab:  Establishing gsissh ControlPath. Wait 2 sec ...
crab:  COPY FILES TO REMOTE HOST
crab:  SUBMIT TO REMOTE GLIDEIN FRONTEND
Submitting 5 jobs
100% [=================================================================================================]
                        please wait
crab:  Total of 5 jobs submitted.
Log file is /afs/cern.ch/user/f/fanzago/scratch0/TEST/crab_0_130220_173930/log/crab.log
</verbatim>
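If only part of the task should enter the queue, the range syntax mentioned above applies directly; a minimal sketch (the job numbers are arbitrary):
<verbatim class="command">
crab -submit 1,3-5 -c crab_0_130220_173930    # submit only jobs 1, 3, 4 and 5 of that project
</verbatim>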
#JobStatusCheck
---+++ Job Status Check

Check the status of the jobs in the latest !CRAB project with the following command:
<verbatim class="command">
crab -status
</verbatim>
to check a specific project:
<verbatim class="command">
crab -status -c  <dir name>
</verbatim>
which should produce output similar to:
<verbatim style="font-size: 13px" class="output">
$ crab -status
crab:  Version 2.8.5 running on Wed Feb 20 17:43:04 2013 CET (16:43:04 UTC)
crab. Working options:
    scheduler           remoteGlidein
    job type            CMSSW
    server              OFF
    working directory   /afs/cern.ch/user/f/fanzago/scratch0/TEST/crab_0_130220_173930/
crab:  Checking the status of all jobs: please wait
crab:  contacting remote host submit-2.t2.ucsd.edu
crab:
ID    END STATUS            ACTION       ExeExitCode JobExitCode E_HOST
----- --- ----------------- ------------ ----------- ----------- ---------
1     N   Submitted         SubSuccess
2     N   Submitted         SubSuccess
3     N   Submitted         SubSuccess
4     N   Submitted         SubSuccess
5     N   Submitted         SubSuccess

crab:   5 Total Jobs
>>>>>>>>> 5 Jobs Submitted
 List of jobs Submitted: 1-5

crab:  You can also follow the status of this task on :
    CMS Dashboard: http://dashb-cms-job-task.cern.ch/taskmon.html#task=fanzago_crab_0_130220_173930_68zw1c
    Your task name is: fanzago_crab_0_130220_173930_68zw1c

Log file is /afs/cern.ch/user/f/fanzago/scratch0/TEST/crab_0_130220_173930/log/crab.log
</verbatim>
#JobOutputRetrieval
---+++ Job Output Retrieval

For the jobs which are in the "Done" status it is possible to retrieve the log files of the jobs (just the log files, because the output files are copied to the Storage Element associated with the T2 specified in the crab.cfg; in fact return_data is 0). The following command retrieves the log files of all "Done" jobs of the last created !CRAB project:
<verbatim class="command">
crab -getoutput
</verbatim>
to get the output of a specific project:
<verbatim class="command">
crab -getoutput -c  <dir name>
</verbatim>
the job results (CMSSW_n.stdout, CMSSW_n.stderr and crab_fjr_n.xml) will be copied into the =res= subdirectory of your crab project:
<verbatim style="font-size: 13px" class="output">
$ crab -get
crab:  Version 2.8.5 running on Wed Feb 20 20:17:02 2013 CET (19:17:02 UTC)
crab. Working options:
    scheduler           remoteGlidein
    job type            CMSSW
    server              OFF
    working directory   /afs/cern.ch/user/f/fanzago/scratch0/TEST/crab_0_130220_173930/
crab:  contacting remote host submit-2.t2.ucsd.edu
crab:  RETRIEVE FILE out_files_1.tgz for job #1
crab:  RETRIEVE FILE crab_fjr_1.xml for job #1
crab:  Results of Jobs # 1 are in /afs/cern.ch/user/f/fanzago/scratch0/TEST/crab_0_130220_173930/res/
crab:  contacting remote host submit-2.t2.ucsd.edu
crab:  RETRIEVE FILE out_files_2.tgz for job #2
crab:  RETRIEVE FILE crab_fjr_2.xml for job #2
crab:  RETRIEVE FILE out_files_3.tgz for job #3
crab:  RETRIEVE FILE crab_fjr_3.xml for job #3
crab:  RETRIEVE FILE out_files_4.tgz for job #4
crab:  RETRIEVE FILE crab_fjr_4.xml for job #4
crab:  RETRIEVE FILE out_files_5.tgz for job #5
crab:  RETRIEVE FILE crab_fjr_5.xml for job #5
crab:  Results of Jobs # 2 are in /afs/cern.ch/user/f/fanzago/scratch0/TEST/crab_0_130220_173930/res/
crab:  Results of Jobs # 3 are in /afs/cern.ch/user/f/fanzago/scratch0/TEST/crab_0_130220_173930/res/
crab:  Results of Jobs # 4 are in /afs/cern.ch/user/f/fanzago/scratch0/TEST/crab_0_130220_173930/res/
crab:  Results of Jobs # 5 are in /afs/cern.ch/user/f/fanzago/scratch0/TEST/crab_0_130220_173930/res/
Log file is /afs/cern.ch/user/f/fanzago/scratch0/TEST/crab_0_130220_173930/log/crab.log
</verbatim>

The stderr is an empty file, the stdout is the output of the wrapper of your analysis code (the output of the CMSSW.sh script created by !CRAB) and the crab_fjr.xml is the !FrameworkJobReport created by your analysis code.
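To have a first look at what came back, you can simply inspect the =res= directory; a minimal sketch, using the project and file names from the output above (they will differ in your case):
<verbatim class="command">
ls crab_0_130220_173930/res/                          # list all retrieved files
tar -tzf crab_0_130220_173930/res/out_files_1.tgz     # show what the archive of job 1 contains
</verbatim>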
#CrabReport
---+++ Use the -report option

This prints a short report about the task, namely the total number of events and files processed/requested/available, the name of the dataset path, a summary of the status of the jobs, and so on. A summary file of the runs and luminosity sections processed is written to res/. In principle -report should generate all the info needed for an analysis. Command to execute:
<verbatim class='command'>
crab -report
</verbatim>
Example of execution:
<verbatim style="font-size: 13px" class="output">
$ crab -report
crab:  Version 2.8.5 running on Thu Feb 21 02:17:06 2013 CET (01:17:06 UTC)
crab. Working options:
    scheduler           remoteGlidein
    job type            CMSSW
    server              OFF
    working directory   /afs/cern.ch/user/f/fanzago/scratch0/TEST/crab_0_130220_173930/
crab:  --------------------
Dataset: /RelValProdTTbar/JobRobot-MC_3XY_V24_JobRobot-v1/GEN-SIM-DIGI-RECO
Remote output :
SE: T2_IT_Legnaro t2-srm-02.lnl.infn.it  srmPath: srm://t2-srm-02.lnl.infn.it:8443/srm/managerv2?SFN=/pnfs/lnl.infn.it/data/cms/store/user/fanzago/TutGridSchool/
Total Events read: 50
Total Files read: 5
Total Jobs : 5
Luminosity section summary file: /afs/cern.ch/user/f/fanzago/scratch0/TEST/crab_0_130220_173930/res/lumiSummary.json
# Jobs: Retrieved:5
----------------------------
crab:  The summary file inputLumiSummaryOfTask.json about input run and lumi isn't created
crab:  No json file to compare
Log file is /afs/cern.ch/user/f/fanzago/scratch0/TEST/crab_0_130220_173930/log/crab.log
</verbatim>

The message "The summary file inputLumiSummaryOfTask.json about input run and lumi isn't created" is not an error, but a message meaning that the input data did not provide lumi-section information, as expected for MC data.

The full srm path tells you where your data has been stored and allows you to perform operations on it by hand. As an example, you can delete the data using the *srmrm* command and check the content of the remote directory with *srmls*. In this case the remote directory is:
%SYNTAX{ syntax="sh"}%
srm://t2-srm-02.lnl.infn.it:8443/srm/managerv2?SFN=/pnfs/lnl.infn.it/data/cms/store/user/fanzago/TutGridSchool/
%ENDSYNTAX%
It may be necessary to quote or escape the "?" in the srm path, depending on the shell you are using. Additional srm commands include =srmrm=, =srmrmdir=, =srmmv= for moving files within an srm system, and =srmcp=, which can copy files locally. Note that to copy files locally, =srmcp= may require the additional flag "-2" to ensure that the version 2 client is used.
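Putting these together, here is a minimal sketch of checking and fetching one output file by hand (the path is quoted to protect the "?"; the file name is taken from the -copyData example below, and the exact client options may differ with your installation):
<verbatim class="command">
# list the remote directory written by this task
srmls "srm://t2-srm-02.lnl.infn.it:8443/srm/managerv2?SFN=/pnfs/lnl.infn.it/data/cms/store/user/fanzago/TutGridSchool/"
# copy one output file locally, forcing the version 2 client
srmcp -2 "srm://t2-srm-02.lnl.infn.it:8443/srm/managerv2?SFN=/pnfs/lnl.infn.it/data/cms/store/user/fanzago/TutGridSchool/outfile_1_1_aOu.root" file:////tmp/outfile_1_1_aOu.root
</verbatim>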
Here is the content of the luminosity summary file _/crab_0_130220_173930/res/lumiSummary.json_:
<verbatim style="font-size: 13px" class="output">
{"1": [[39, 39]]}
</verbatim>

#CopyData
---+++ Copy the output from the SE to the local User Interface

This option can be used only if your output has previously been copied by !CRAB to a remote SE. By default -copyData copies your output from the remote SE to the local !CRAB working directory (under res). Otherwise you can copy the output from the remote SE to another one, specifying either -dest_se=<the remote SE official name> or -dest_endpoint=<the complete endpoint of the remote SE>. If dest_se is used, !CRAB finds the correct path where the output can be stored. The command to execute in order to retrieve the remote output files to your local user interface is:
<verbatim class='command'>
crab -copyData  ## or crab -copyData -c  <dir name>
</verbatim>
An example of execution:
<verbatim style="font-size: 13px" class="output">
$ crab -copyData
crab:  Version 2.8.5 running on Thu Feb 21 02:49:18 2013 CET (01:49:18 UTC)
crab. Working options:
    scheduler           remoteGlidein
    job type            CMSSW
    server              OFF
    working directory   /afs/cern.ch/user/f/fanzago/scratch0/TEST/crab_0_130220_173930/
crab:  Copy file locally.
    Output dir: /afs/cern.ch/user/f/fanzago/scratch0/TEST/crab_0_130220_173930/res/
crab:  Starting copy...
directory/afs/cern.ch/user/f/fanzago/scratch0/TEST/crab_0_130220_173930/res/already exists
crab:  Copy success for file: outfile_1_1_aOu.root
crab:  Copy failed for file: outfile_4_1_Pi9.root
    Copy failed because : Problem copying outfile_4_1_Pi9.root file'Permission denied!'
crab:  Copy success for file: outfile_2_1_bC1.root
crab:  Copy success for file: outfile_5_1_yna.root
crab:  Copy success for file: outfile_3_1_96A.root
Log file is /afs/cern.ch/user/f/fanzago/scratch0/TEST/crab_0_130220_173930/log/crab.log
</verbatim>

#CrabPub
---+++ Publish your result in DBS

The publication of the produced data to a !DBS allows you to re-run over the produced data that has been published. The instructions to follow are below; here is the link to the [[https://twiki.cern.ch/twiki/bin/view/CMS/SWGuideCrabForPublication][how to]]. You have to add more information to the !CRAB configuration file, specifying the data name to publish and the !DBS url instance where to register the output results:
%SYNTAX{ syntax="sh"}%
[USER]
....
publish_data = 1
publish_data_name = what_you_want
dbs_url_for_publication = url_local_dbs
....
%ENDSYNTAX%
Warning:
   * all the parameters related to publication have to be added to the configuration file before the creation of the jobs, even if the publication step is executed after the retrieval of the job output;
   * for this tutorial we will publish the data to the test !DBS instance https://cmsdbsprod.cern.ch:8443/cms_dbs_ph_analysis_02_writer/servlet/DBSServlet. This instance is only for publication tests, so the maintenance of published data is not guaranteed for a long time, and publication here does not require writing authorization. If you belong to a !PAG group, you have to publish your data to the !DBS associated with your group, checking at the [[https://twiki.cern.ch/twiki/bin/view/CMS/DBSInstanceAccessList][DBS access twiki page]] the correct !DBS url and which voms role you need to be an allowed user;
   * remember to change the _ui_working_dir_ value in the configuration file to create a new project (if you don't use the default crab project name), otherwise the creation step will fail with the error message "project already exists, please remove it before create new task".

---+++ Run Crab publishing your results

You can also run your analysis code publishing the results copied to a remote Storage Element.
Below is an example of the !CRAB configuration file, coherent with this tutorial:

*For MC data* (crab.cfg)
%SYNTAX{ syntax="sh"}%
[CMSSW]
total_number_of_events = 50
number_of_jobs = 10
pset = tutorial.py
datasetpath = /RelValProdTTbar/JobRobot-MC_3XY_V24_JobRobot-v1/GEN-SIM-DIGI-RECO
output_file = outfile.root

[USER]
return_data = 0
copy_data = 1
storage_element = T2_IT_Legnaro
publish_data = 1
publish_data_name = FedeTutGrid
dbs_url_for_publication = https://cmsdbsprod.cern.ch:8443/cms_dbs_ph_analysis_02_writer/servlet/DBSServlet

[CRAB]
scheduler = remoteGlidein
jobtype = cmssw
%ENDSYNTAX%
With this crab.cfg you can redo the complete workflow as described before, plus the publication step (the command sequence is sketched below):
   * creation
   * submission
   * status progress monitoring
   * output retrieval
   * publish the results
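In compact form, the whole cycle amounts to the commands already introduced above, run one after the other once the previous step has finished:
<verbatim class="command">
crab -create       # create the task from crab.cfg
crab -submit       # submit all created jobs
crab -status       # repeat until the jobs are Done
crab -getoutput    # retrieve log files and framework job reports
crab -publish      # publish the output files registered in the FJRs
</verbatim>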
#PublishData
---+++ Use the -publish option

After the previous workflow is done, up to the retrieval of your jobs, you can publish the output data that have been stored in the Storage Element indicated in the crab.cfg using:
<verbatim class="command">
crab -publish
</verbatim>
or
<verbatim class="command">
crab -publish -c  <dir name>
</verbatim>
to publish the outputs of a specific project. It is not necessary that all jobs are done and retrieved: you can publish your output at different times. The command looks for all the !FrameworkJobReports ( crab-project-dir/res/crab_fjr_*.xml ) produced by each job and extracts from them the information (e.g. number of events, LFN, ...) to publish.

---++++ Publication output example
<verbatim style="font-size: 13px" class="output">
$ crab -publish -c crab_0_130221_030014/
crab:  Version 2.8.5 running on Tue Mar  5 12:04:57 2013 CET (11:04:57 UTC)
crab. Working options:
    scheduler           remoteGlidein
    job type            CMSSW
    server              OFF
    working directory   /afs/cern.ch/user/f/fanzago/scratch0/TEST/crab_0_130221_030014/
crab:  <dbs_url_for_publication> = https://cmsdbsprod.cern.ch:8443/cms_dbs_ph_analysis_02_writer/servlet/DBSServlet
file_list =  ['/afs/cern.ch/user/f/fanzago/scratch0/TEST/crab_0_130221_030014//res//crab_fjr_1.xml', '/afs/cern.ch/user/f/fanzago/scratch0/TEST/crab_0_130221_030014//res//crab_fjr_2.xml', '/afs/cern.ch/user/f/fanzago/scratch0/TEST/crab_0_130221_030014//res//crab_fjr_3.xml', '/afs/cern.ch/user/f/fanzago/scratch0/TEST/crab_0_130221_030014//res//crab_fjr_4.xml', '/afs/cern.ch/user/f/fanzago/scratch0/TEST/crab_0_130221_030014//res//crab_fjr_5.xml']
crab:  --->>> Start dataset publication
crab:  --->>> Importing parent dataset in the dbs: /RelValProdTTbar/JobRobot-MC_3XY_V24_JobRobot-v1/GEN-SIM-DIGI-RECO
crab:  --->>> Importing all parents level
Block /RelValProdTTbar/JobRobot-MC_3XY_V24_JobRobot-v1/GEN-SIM-RAW#c9d3a01e-a3a1-4fde-8104-1c7b024b5ef6 is already at destination
Block /RelValProdTTbar/JobRobot-MC_3XY_V24_JobRobot-v1/GEN-SIM-DIGI-RECO#8f881129-b4fd-4d88-902a-f7ca78a9da8f is already at destination
crab:  --->>> duration of all parents import (sec): 3.43028283119
crab:  Import ok of dataset /RelValProdTTbar/JobRobot-MC_3XY_V24_JobRobot-v1/GEN-SIM-DIGI-RECO
crab:  PrimaryDataset = RelValProdTTbar
crab:  ProcessedDataset = fanzago-FedeTutGrid-c8295e0370df515614ca6812ce2cfe77
crab:  <User Dataset Name> = /RelValProdTTbar/fanzago-FedeTutGrid-c8295e0370df515614ca6812ce2cfe77/USER
debug_verbose:crab::Primary:  {'Type': 'mc', 'Name': 'RelValProdTTbar'}
primary = {'Type': 'mc', 'Name': 'RelValProdTTbar'}
...
crab:  --->>> End dataset publication
INFO:crab::--->>> End dataset publication
crab:  --->>> Start files publication
INFO:crab::--->>> Start files publication
DEBUG:crab::FJR = /afs/cern.ch/user/f/fanzago/scratch0/TEST/crab_0_130221_030014//res//crab_fjr_1.xml
DEBUG:crab::--->>> LFN of file to publish = /store/user/fanzago/RelValProdTTbar/FedeTutGrid/c8295e0370df515614ca6812ce2cfe77/outfile_1_1_haS.root
DEBUG:crab::--->>> Inserting file in blocks = ['/RelValProdTTbar/fanzago-FedeTutGrid-c8295e0370df515614ca6812ce2cfe77/USER#c5c0a5bc-aa35-4dcb-ade4-52211e5e8332']
DEBUG:crab::FJR = /afs/cern.ch/user/f/fanzago/scratch0/TEST/crab_0_130221_030014//res//crab_fjr_2.xml
DEBUG:crab::--->>> LFN of file to publish = /store/user/fanzago/RelValProdTTbar/FedeTutGrid/c8295e0370df515614ca6812ce2cfe77/outfile_2_1_Nw2.root
...
crab:  --->>> End files publication
INFO:crab::--->>> End files publication
crab:  --->>> Check data publication: dataset /RelValProdTTbar/fanzago-FedeTutGrid-c8295e0370df515614ca6812ce2cfe77/USER in DBS url https://cmsdbsprod.cern.ch:8443/cms_dbs_ph_analysis_02_writer/servlet/DBSServlet
INFO:crab::--->>> Check data publication: dataset /RelValProdTTbar/fanzago-FedeTutGrid-c8295e0370df515614ca6812ce2cfe77/USER in DBS url https://cmsdbsprod.cern.ch:8443/cms_dbs_ph_analysis_02_writer/servlet/DBSServlet
=== dataset /RelValProdTTbar/fanzago-FedeTutGrid-c8295e0370df515614ca6812ce2cfe77/USER
=== dataset description =
===== File block name: /RelValProdTTbar/fanzago-FedeTutGrid-c8295e0370df515614ca6812ce2cfe77/USER#c5c0a5bc-aa35-4dcb-ade4-52211e5e8332
      File block located at:  ['t2-srm-02.lnl.infn.it']
      File block status: 0
      Number of files: 5
      Number of Bytes: 3279142
      Number of Events: 50
total events: 50 in dataset: /RelValProdTTbar/fanzago-FedeTutGrid-c8295e0370df515614ca6812ce2cfe77/USER
crab:  You can obtain more info about files of the dataset using: crab -checkPublication -USER.dataset_to_check=/RelValProdTTbar/fanzago-FedeTutGrid-c8295e0370df515614ca6812ce2cfe77/USER -USER.dbs_url_for_publication=https://cmsdbsprod.cern.ch:8443/cms_dbs_ph_analysis_02_writer/servlet/DBSServlet -debug
INFO:crab::You can obtain more info about files of the dataset using: crab -checkPublication -USER.dataset_to_check=/RelValProdTTbar/fanzago-FedeTutGrid-c8295e0370df515614ca6812ce2cfe77/USER -USER.dbs_url_for_publication=https://cmsdbsprod.cern.ch:8443/cms_dbs_ph_analysis_02_writer/servlet/DBSServlet -debug
Log file is /afs/cern.ch/user/f/fanzago/scratch0/TEST/crab_0_130221_030014//log/crab.log
</verbatim>

Warning: some versions of CMSSW switch off the debug mode of crab, so a lot of duplicated info can be reported at screen level.

---++++ Check the result of data publication and analyze your published data

Note that:
   * !CRAB by default publishes all files finished correctly, including files with 0 events
   * !CRAB by default imports all dataset parents of your dataset
To check if your data have been published you can use the option:
<div class="command">
crab -checkPublication -USER.dataset_to_check=your_dataset_path -USER.dbs_url_for_publication=url_local_dbs -debug
</div>
where dbs_url_for_publication is the dbs_url you have written in the crab.cfg file, and your_dataset_path is the name of the dataset published by !CRAB, primarydataset/publish_data_name/USER (it is also printed by !CRAB next to the line "User Dataset Name" when you run the _crab -publish_ command).
The output is: <verbatim style="font-size: 13px" class="output"> $ crab -checkPublication -USER.dataset_to_check=/RelValProdTTbar/fanzago-FedeTutGrid-c8295e0370df515614ca6812ce2cfe77/USER -USER.dbs_url_for_publication=https://cmsdbsprod.cern.ch:8443/cms_dbs_ph_analysis_02_writer/servlet/DBSServlet -debug crab: /afs/cern.ch/cms/ccs/wm/scripts/Crab/CRAB_2_8_5_patch1/python/crab.py -checkPublication -USER.dataset_to_check=/RelValProdTTbar/fanzago-FedeTutGrid-c8295e0370df515614ca6812ce2cfe77/USER -USER.dbs_url_for_publication=https://cmsdbsprod.cern.ch:8443/cms_dbs_ph_analysis_02_writer/servlet/DBSServlet -debug crab: Version 2.8.5 running on Tue Mar 5 12:11:37 2013 CET (11:11:37 UTC) crab. Working options: scheduler remoteGlidein job type CMSSW server OFF working directory /afs/cern.ch/user/f/fanzago/scratch0/TEST/crab_0_130304_120142/ crab: Downloading file [http://cmsdoc.cern.ch/cms/LCG/crab/config/] to [/afs/cern.ch/user/f/fanzago/.cms_crab/allowed_releases.conf]. crab: Service initialised ({'endpoint': 'http://cmsdoc.cern.ch/cms/LCG/crab/config/', 'maxcachereuse': 24.0, 'cachepath': '/afs/cern.ch/user/f/fanzago/.cms_crab', 'basepath': '/cms/LCG/crab/config/', 'method': None, 'timeout': 20, 'requests': {'endpoint': 'http://cmsdoc.cern.ch/cms/LCG/crab/config/', 'timeout': 20, 'cachepath': '/afs/cern.ch/user/f/fanzago/.cms_crab', 'cacheduration': 0.5, 'host': 'cmsdoc.cern.ch', 'accept_type': 'text/html', 'content_type': 'application/x-www-form-urlencoded', 'logger': <logging.Logger object at 0xf928f10>, 'type': 'txt/csv', 'conn': <httplib.HTTPConnection instance at 0xf9334d0>}, 'logger': <logging.Logger object at 0xf928f10>, 'cacheduration': 0.5, 'type': 'txt/csv', 'inputdata': {}}): host: cmsdoc.cern.ch, basepath: /cms/LCG/crab/config/ (text/html) cache: /afs/cern.ch/user/f/fanzago/.cms_crab (duration 0.5 hours, max reuse 24.0 hours) crab: Service initialised ({'endpoint': 'https://cmsweb.cern.ch/sitedb/json/index/', 'maxcachereuse': 24.0, 'cachepath': '/afs/cern.ch/user/f/fanzago/.cms_sitedbcache', 'basepath': '/sitedb/json/index/', 'method': None, 'timeout': 30, 'requests': {'host': 'cmsweb.cern.ch', 'endpoint': 'https://cmsweb.cern.ch/sitedb/json/index/', 'accept_type': 'text/html', 'content_type': 'application/x-www-form-urlencoded', 'logger': <logging.Logger object at 0xf928f10>, 'cachepath': '/afs/cern.ch/user/f/fanzago/.cms_sitedbcache', 'conn': <httplib.HTTPSConnection instance at 0xfa903b0>}, 'logger': <logging.Logger object at 0xf928f10>, 'cacheduration': 0.5, 'inputdata': {}}): host: cmsweb.cern.ch, basepath: /sitedb/json/index/ (text/html) cache: /afs/cern.ch/user/f/fanzago/.cms_sitedbcache (duration 0.5 hours, max reuse 24.0 hours) crab: Service initialised ({'endpoint': 'https://cmsweb.cern.ch/sitedb/json/index/', 'maxcachereuse': 24.0, 'cachepath': '/afs/cern.ch/user/f/fanzago/.cms_sitedbcache', 'basepath': '/sitedb/json/index/', 'method': None, 'timeout': 30, 'requests': {'host': 'cmsweb.cern.ch', 'endpoint': 'https://cmsweb.cern.ch/sitedb/json/index/', 'accept_type': 'text/html', 'content_type': 'application/x-www-form-urlencoded', 'logger': <logging.Logger object at 0xf928f10>, 'cachepath': '/afs/cern.ch/user/f/fanzago/.cms_sitedbcache', 'conn': <httplib.HTTPSConnection instance at 0xfa90440>}, 'logger': <logging.Logger object at 0xf928f10>, 'cacheduration': 0.5, 'inputdata': {}}): host: cmsweb.cern.ch, basepath: /sitedb/json/index/ (text/html) cache: /afs/cern.ch/user/f/fanzago/.cms_sitedbcache (duration 0.5 hours, max reuse 24.0 hours) crab: Input whitelist: crab: Input 
blacklist: crab: Converted whitelist: crab: Converted blacklist: crab: Downloading file [http://cmsdoc.cern.ch/cms/LCG/crab/config/] to [/afs/cern.ch/user/f/fanzago/.cms_crab/myproxy_server.conf]. crab: Service initialised ({'endpoint': 'http://cmsdoc.cern.ch/cms/LCG/crab/config/', 'maxcachereuse': 24.0, 'cachepath': '/afs/cern.ch/user/f/fanzago/.cms_crab', 'basepath': '/cms/LCG/crab/config/', 'method': None, 'timeout': 20, 'requests': {'endpoint': 'http://cmsdoc.cern.ch/cms/LCG/crab/config/', 'timeout': 20, 'cachepath': '/afs/cern.ch/user/f/fanzago/.cms_crab', 'cacheduration': 0.5, 'host': 'cmsdoc.cern.ch', 'accept_type': 'text/html', 'content_type': 'application/x-www-form-urlencoded', 'logger': <logging.Logger object at 0xf928f10>, 'type': 'txt/csv', 'conn': <httplib.HTTPConnection instance at 0xfa904d0>}, 'logger': <logging.Logger object at 0xf928f10>, 'cacheduration': 0.5, 'type': 'txt/csv', 'inputdata': {}}): host: cmsdoc.cern.ch, basepath: /cms/LCG/crab/config/ (text/html) cache: /afs/cern.ch/user/f/fanzago/.cms_crab (duration 0.5 hours, max reuse 24.0 hours) crab: Downloading file [http://cmsdoc.cern.ch/cms/LCG/crab/config/] to [/afs/cern.ch/user/f/fanzago/.cms_crab/site_black_list.conf]. crab: Service initialised ({'endpoint': 'http://cmsdoc.cern.ch/cms/LCG/crab/config/', 'maxcachereuse': 24.0, 'cachepath': '/afs/cern.ch/user/f/fanzago/.cms_crab', 'basepath': '/cms/LCG/crab/config/', 'method': None, 'timeout': 20, 'requests': {'endpoint': 'http://cmsdoc.cern.ch/cms/LCG/crab/config/', 'timeout': 20, 'cachepath': '/afs/cern.ch/user/f/fanzago/.cms_crab', 'cacheduration': 0.5, 'host': 'cmsdoc.cern.ch', 'accept_type': 'text/html', 'content_type': 'application/x-www-form-urlencoded', 'logger': <logging.Logger object at 0xf928f10>, 'type': 'txt/csv', 'conn': <httplib.HTTPConnection instance at 0xfa904d0>}, 'logger': <logging.Logger object at 0xf928f10>, 'cacheduration': 0.5, 'type': 'txt/csv', 'inputdata': {}}): host: cmsdoc.cern.ch, basepath: /cms/LCG/crab/config/ (text/html) cache: /afs/cern.ch/user/f/fanzago/.cms_crab (duration 0.5 hours, max reuse 24.0 hours) crab: Enforced black list: <Downloader.Downloader instance at 0xfa90440> crab: --->>> Check data publication: dataset /RelValProdTTbar/fanzago-FedeTutGrid-c8295e0370df515614ca6812ce2cfe77/USER in DBS url https://cmsdbsprod.cern.ch:8443/cms_dbs_ph_analysis_02_writer/servlet/DBSServlet PrimaryDataset = RelValProdTTbar ProcessedDataset = fanzago-FedeTutGrid-c8295e0370df515614ca6812ce2cfe77 DataTier = USER datasets matching your requirements= [{'RunsList': [], 'Name': 'fanzago-FedeTutGrid-c8295e0370df515614ca6812ce2cfe77', 'PathList': ['/RelValProdTTbar/fanzago-FedeTutGrid-c8295e0370df515614ca6812ce2cfe77/USER'], 'LastModifiedBy': '/DC=ch/DC=cern/OU=Organic Units/OU=Users/CN=fanzago/CN=610896/CN=Federica Fanzago', 'AlgoList': [{'ExecutableName': 'cmsRun', 'ApplicationVersion': 'CMSSW_5_3_8', 'ParameterSetID': {'Hash': 'c8295e0370df515614ca6812ce2cfe77'}, 'ApplicationFamily': 'cmsRun'}], 'XtCrossSection': 0.0, 'Status': 'VALID', 'ParentList': [], 'AcquisitionEra': '', 'PhysicsGroup': 'NoGroup', 'Description': '', 'GlobalTag': '', 'PrimaryDataset': {'Name': 'RelValProdTTbar'}, 'TierList': ['USER'], 'CreatedBy': '/DC=ch/DC=cern/OU=Organic Units/OU=Users/CN=fanzago/CN=610896/CN=Federica Fanzago', 'PhysicsGroupConverner': 'NO_CONVENOR', 'CreationDate': '1362481519', 'LastModificationDate': '1362481520'}] === dataset /RelValProdTTbar/fanzago-FedeTutGrid-c8295e0370df515614ca6812ce2cfe77/USER === dataset description = ===== File block name: 
/RelValProdTTbar/fanzago-FedeTutGrid-c8295e0370df515614ca6812ce2cfe77/USER#c5c0a5bc-aa35-4dcb-ade4-52211e5e8332
      File block located at:  ['t2-srm-02.lnl.infn.it']
      File block status: 0
      Number of files: 5
      Number of Bytes: 3279142
      Number of Events: 50
--------- info about files --------
 Size    Events  LFN                                                                                                     FileStatus
 666747  10      /store/user/fanzago/RelValProdTTbar/FedeTutGrid/c8295e0370df515614ca6812ce2cfe77/outfile_1_1_haS.root
 635831  10      /store/user/fanzago/RelValProdTTbar/FedeTutGrid/c8295e0370df515614ca6812ce2cfe77/outfile_2_1_Nw2.root
 648594  10      /store/user/fanzago/RelValProdTTbar/FedeTutGrid/c8295e0370df515614ca6812ce2cfe77/outfile_4_2_VKk.root
 682364  10      /store/user/fanzago/RelValProdTTbar/FedeTutGrid/c8295e0370df515614ca6812ce2cfe77/outfile_5_1_bi0.root
 645606  10      /store/user/fanzago/RelValProdTTbar/FedeTutGrid/c8295e0370df515614ca6812ce2cfe77/outfile_3_1_rWE.root
total events: 50 in dataset: /RelValProdTTbar/fanzago-FedeTutGrid-c8295e0370df515614ca6812ce2cfe77/USER
Log file is /afs/cern.ch/user/f/fanzago/scratch0/TEST/crab_0_130304_120142/log/crab.log
</verbatim>

If you want to analyze your published data, you have to modify your crab.cfg specifying the datasetpath name of your dataset and the dbs_url where the data are published:
%SYNTAX{ syntax="sh"}%
[CMSSW]
....
datasetpath = your_dataset_path
dbs_url = url_local_dbs
%ENDSYNTAX%
If, for example, the data of your interest are in the DBS instance "https://cmsdbsprod.cern.ch:8443/cms_dbs_ph_analysis_02_writer/servlet/DBSServlet", you can specify:
%SYNTAX{ syntax="sh"}%
dbs_url = https://cmsdbsprod.cern.ch:8443/cms_dbs_ph_analysis_02_writer/servlet/DBSServlet
%ENDSYNTAX%
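Pulling the pieces together, here is a minimal sketch of a crab.cfg for re-running over the dataset published above; the splitting values are arbitrary and =crab_published= is a hypothetical project name chosen for illustration:
%SYNTAX{ syntax="sh"}%
[CMSSW]
total_number_of_events = 50
number_of_jobs = 5
pset = tutorial.py
datasetpath = /RelValProdTTbar/fanzago-FedeTutGrid-c8295e0370df515614ca6812ce2cfe77/USER
dbs_url = https://cmsdbsprod.cern.ch:8443/cms_dbs_ph_analysis_02_writer/servlet/DBSServlet
output_file = outfile.root

[USER]
return_data = 1
copy_data = 0
ui_working_dir = crab_published

[CRAB]
scheduler = remoteGlidein
jobtype = cmssw
%ENDSYNTAX%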
The creation output will be something similar to:
<verbatim style="font-size: 13px" class="output">
$ crab -create
crab:  Version 2.8.5 running on Tue Mar  5 12:19:06 2013 CET (11:19:06 UTC)
crab. Working options:
    scheduler           remoteGlidein
    job type            CMSSW
    server              OFF
    working directory   /afs/cern.ch/user/f/fanzago/scratch0/TEST/crab_0_130305_121906/
verify if user DN is mapped in CERN's SSO
OK. user ready for SiteDB switchover on March 12, 2013
crab:  Contacting Data Discovery Services ...
crab:  Accessing DBS at: https://cmsdbsprod.cern.ch:8443/cms_dbs_ph_analysis_02_writer/servlet/DBSServlet
crab:  Requested dataset: /RelValProdTTbar/fanzago-FedeTutGrid-c8295e0370df515614ca6812ce2cfe77/USER has 50 events in 1 blocks.
crab:  May not create the exact number_of_jobs requested.
crab:  5 job(s) can run on 50 events.
crab:  List of jobs and available destination sites:

Block     1: jobs 1-5: sites: T2_IT_Legnaro
crab:  Creating 5 jobs, please wait...
crab:  Total of 5 jobs created.
</verbatim>

#RealData
---++ Run !CRAB on real data copying the output to an SE

Running !CRAB on real data is not very different from running !CRAB on Monte Carlo data. The main difference is in the preparation of the configuration for the !CRAB workflow, as shown in the next section.

---+++ !CRAB configuration file for real data with lumi mask

You can find more details on this at the corresponding link on the [[https://twiki.cern.ch/twiki/bin/view/CMS/SWGuideCrabFaq#How_to_store_output_with_CRAB_2][Crab FAQ page]]. The !CRAB configuration file (default name crab.cfg) should be located in the same directory as the !CMSSW parameter-set to be used by !CRAB. The dataset used is _/SingleMu/Run2012B-13Jul2012-v1/AOD_.

*For real data* (crab_lumi.cfg)
%SYNTAX{ syntax="sh"}%
[CMSSW]
lumis_per_job = 50
number_of_jobs = 10
pset = tutorial.py
datasetpath = /SingleMu/Run2012B-13Jul2012-v1/AOD
lumi_mask = Cert_190456-195947_8TeV_PromptReco_Collisions12_JSON_v2.txt
output_file = outfile.root

[USER]
return_data = 0
copy_data = 1
publish_data = 1
storage_element = T2_IT_Legnaro
publish_data_name = FedeTutGridGlide_data
dbs_url_for_publication = https://cmsdbsprod.cern.ch:8443/cms_dbs_ph_analysis_02_writer/servlet/DBSServlet

[CRAB]
scheduler = remoteGlidein
jobtype = cmssw
%ENDSYNTAX%
where the lumi_mask file can be downloaded with:
<verbatim class="command">
wget --no-check-certificate https://cms-service-dqm.web.cern.ch/cms-service-dqm/CAF/certification/Collisions12/8TeV/Prompt/Cert_190456-195947_8TeV_PromptReco_Collisions12_JSON_v2.txt
</verbatim>
For the tutorial we are using only a subset of the runs and lumis in this lumi mask. The lumi_mask file (Cert_190456-195947_8TeV_PromptReco_Collisions12_JSON_v2.txt) contains:
%SYNTAX{ syntax="sh"}%
{"190645": [[10, 110]], "190704": [[1, 3]], "190705": [[1, 5], [7, 76], [78, 336], [338, 350], [353, 384]],
...
"195937": [[1, 28], [31, 186], [188, 400]], "195947": [[23, 62], [64, 88]]}
%ENDSYNTAX%

#JobCreation2
---+++ Job Creation

Creating jobs for real data is analogous to the MC case. In order not to overwrite the previous run of this tutorial, it is suggested to use a dedicated cfg:
<verbatim class="command">
crab -create -cfg crab_lumi.cfg
</verbatim>
This takes as configuration file the file name specified with the -cfg option, in this case the crab_lumi.cfg associated for this tutorial with real data.
<verbatim style="font-size: 13px" class="output">
$ crab -create -cfg crab_lumi.cfg
crab:  Version 2.8.5 running on Tue Mar  5 14:47:56 2013 CET (13:47:56 UTC)
crab. Working options:
    scheduler           remoteGlidein
    job type            CMSSW
    server              OFF
    working directory   /afs/cern.ch/user/f/fanzago/scratch0/TEST/crab_0_130305_144756/
verify if user DN is mapped in CERN's SSO
OK. user ready for SiteDB switchover on March 12, 2013
crab:  Contacting Data Discovery Services ...
crab:  Accessing DBS at: http://cmsdbsprod.cern.ch/cms_dbs_prod_global/servlet/DBSServlet
crab:  Requested (A)DS /SingleMu/Run2012B-TOPMuPlusJets-PromptSkim-v1/AOD has 13 block(s).
crab:  Requested number of lumis reached.
crab:  8 jobs created to run on 500 lumis
crab:  Checking remote location
crab:  Creating 8 jobs, please wait...
crab:  Total of 8 jobs created.
Log file is /afs/cern.ch/user/f/fanzago/scratch0/TEST/crab_0_130305_144756/log/crab.log
</verbatim>
   * The project directory called crab_0_130305_144756 is created.
   * As explained, the number of created jobs may not match the number of jobs requested in the configuration file (here 8 jobs were created, but 10 were requested).

#CMSJobSubmission2
---+++ Job Submission

Job submission is always analogous:
<verbatim class="output" style="font-size: 13px">
$ crab -submit
crab:  Version 2.8.5 running on Tue Mar  5 14:54:39 2013 CET (13:54:39 UTC)
crab. Working options:
    scheduler           remoteGlidein
    job type            CMSSW
    server              OFF
    working directory   /afs/cern.ch/user/f/fanzago/scratch0/TEST/crab_0_130305_144756/
crab:  Checking available resources...
The submission should produce output like:
<verbatim class="output" style="font-size: 13px">
$ crab -submit
crab:  Version 2.8.5 running on Tue Mar  5 14:54:39 2013 CET (13:54:39 UTC)
crab. Working options:
  scheduler           remoteGlidein
  job type            CMSSW
  server              OFF
  working directory   /afs/cern.ch/user/f/fanzago/scratch0/TEST/crab_0_130305_144756/
crab: Checking available resources...
crab: Found compatible site(s) for job 1
crab: 1 blocks of jobs will be submitted
crab: remotehost from Avail.List = submit-2.t2.ucsd.edu
crab: contacting remote host submit-2.t2.ucsd.edu
crab: Establishing gsissh ControlPath. Wait 2 sec ...
crab: Establishing gsissh ControlPath. Wait 2 sec ...
crab: COPY FILES TO REMOTE HOST
crab: SUBMIT TO REMOTE GLIDEIN FRONTEND
Submitting 8 jobs
100% [=============================================================================================]
                                                                      please wait
crab: Total of 8 jobs submitted.

Log file is /afs/cern.ch/user/f/fanzago/scratch0/TEST/crab_0_130305_144756/log/crab.log
</verbatim>

#JobStatusCheck2
---+++ Job Status Check
Check the status of the jobs in the latest !CRAB project with the following command:
<verbatim class="command">
crab -status
</verbatim>
to check a specific project:
<verbatim class="command">
crab -status -c <dir name>
</verbatim>
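If you prefer not to re-run the status command by hand, you can poll it from a small script. The sketch below is hypothetical convenience code, not part of !CRAB; it assumes Python 2.7 (for subprocess.check_output) and that the crab command is available in the current environment. It checks the project every five minutes until no jobs are reported as Submitted or Running:
<verbatim class="code">
import subprocess
import time

project = "crab_lumi"  # hypothetical: your project directory (ui_working_dir)

while True:
    # run "crab -status" for the given project and capture its text output
    output = subprocess.check_output(["crab", "-status", "-c", project])
    print output
    # crude check on the plain-text output; refine as needed
    if "Running" not in output and "Submitted" not in output:
        break
    time.sleep(300)  # wait 5 minutes between checks
</verbatim>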
Either command should produce screen output similar to:
<verbatim style="font-size: 13px" class="output">
$ crab -status
crab:  Version 2.8.5 running on Tue Mar  5 14:59:36 2013 CET (13:59:36 UTC)
crab. Working options:
  scheduler           remoteGlidein
  job type            CMSSW
  server              OFF
  working directory   /afs/cern.ch/user/f/fanzago/scratch0/TEST/crab_0_130305_144756/
crab: Checking the status of all jobs: please wait
crab: contacting remote host submit-2.t2.ucsd.edu
crab:
ID    END STATUS            ACTION       ExeExitCode JobExitCode E_HOST
----- --- ----------------- ------------ ----------- ----------- ---------
1     N   Running           SubSuccess                           cream02.iihe.ac.be
2     N   Running           SubSuccess                           cream02.iihe.ac.be
3     N   Running           SubSuccess                           cream02.iihe.ac.be
4     N   Running           SubSuccess                           cream02.iihe.ac.be
5     N   Submitted         SubSuccess
6     N   Running           SubSuccess                           cream02.iihe.ac.be
7     N   Running           SubSuccess                           cream02.iihe.ac.be
8     N   Running           SubSuccess                           red-gw2.unl.edu

crab:   8 Total Jobs
>>>>>>>>> 1 Jobs Submitted
 List of jobs Submitted: 5
>>>>>>>>> 7 Jobs Running
 List of jobs Running: 1-4,6-8

crab: You can also follow the status of this task on :
CMS Dashboard: http://dashb-cms-job-task.cern.ch/taskmon.html#task=fanzago_crab_0_130305_144756_db2r51
Your task name is: fanzago_crab_0_130305_144756_db2r51
</verbatim>

#JobOutputRetrieval2
---+++ Job Output Retrieval
For jobs in "Done" status it is possible to retrieve the log files of the jobs (just the log files, because the output files are copied to the Storage Element associated with the T2 specified in the crab.cfg; in fact return_data is 0). The following command retrieves the log files of all "Done" jobs of the last created !CRAB project:
<verbatim class="command">
crab -getoutput
</verbatim>
to get the output of a specific project:
<verbatim class="command">
crab -getoutput -c <dir name>
</verbatim>
The job results will be copied into the =res= subdirectory of your crab project:
<verbatim style="font-size: 13px" class="output">
$ crab -get
crab:  Version 2.8.5 running on Tue Mar  5 15:15:32 2013 CET (14:15:32 UTC)
crab. Working options:
  scheduler           remoteGlidein
  job type            CMSSW
  server              OFF
  working directory   /afs/cern.ch/user/f/fanzago/scratch0/TEST/crab_0_130305_144756/
crab: contacting remote host submit-2.t2.ucsd.edu
crab: RETRIEVE FILE out_files_1.tgz for job #1
crab: RETRIEVE FILE crab_fjr_1.xml for job #1
crab: Results of Jobs # 1 are in /afs/cern.ch/user/f/fanzago/scratch0/TEST_RELEASE/TEST_PATC2/TEST_2_8_2/crab_0_130305_144756/res/
...
Log file is /afs/cern.ch/user/f/fanzago/scratch0/TEST/crab_0_130305_144756/log/crab.log
</verbatim>
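Each retrieved out_files_N.tgz archive contains the log files of the corresponding job. If you want to peek at its contents without unpacking it by hand, a few lines of Python are enough (a convenience sketch; the path below refers to the example project used above):
<verbatim class="code">
import tarfile

# the archive retrieved for job #1 in the example above
archive = tarfile.open("crab_0_130305_144756/res/out_files_1.tgz")
for member in archive.getmembers():
    print member.name, member.size  # file name and size in bytes
archive.close()
</verbatim>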
#CrabReport2
---+++ Use the -report option
As for the CMS.MonteCarlo data example, it is possible to run the report command:
<verbatim class='command'>
crab -report -c <dir name>
</verbatim>

<verbatim style="font-size: 13px" class="output">
$ crab -report
crab:  Version 2.8.5 running on Tue Mar  5 15:18:00 2013 CET (14:18:00 UTC)
crab. Working options:
  scheduler           remoteGlidein
  job type            CMSSW
  server              OFF
  working directory   /afs/cern.ch/user/f/fanzago/scratch0/TEST/crab_0_130305_144756/
crab: --------------------
Dataset: /SingleMu/Run2012B-TOPMuPlusJets-PromptSkim-v1/AOD
Remote output :
SE: T2_IT_Legnaro t2-srm-02.lnl.infn.it  srmPath: srm://t2-srm-02.lnl.infn.it:8443/srm/managerv2?SFN=/pnfs/lnl.infn.it/data/cms/store/user/fanzago/SingleMu/FedeTutGridGlide_data/${PSETHASH}/
Total Events read: 39942
Total Files read: 29
Total Jobs : 8
Luminosity section summary file: /afs/cern.ch/user/f/fanzago/scratch0/TEST_RELEASE/TEST_PATC2/TEST_2_8_2/crab_0_130305_144756/res/lumiSummary.json
# Jobs: Retrieved:8
----------------------------
crab:  Summary file of input run and lumi to be analize with this task: /afs/cern.ch/user/f/fanzago/scratch0/TEST_RELEASE/TEST_PATC2/TEST_2_8_2/crab_0_130305_144756/res//inputLumiSummaryOfTask.json
crab:  to complete your analysis, you have to analyze the run and lumi reported in the /afs/cern.ch/user/f/fanzago/scratch0/TEST_RELEASE/TEST_PATC2/TEST_2_8_2/crab_0_130305_144756/res//missingLumiSummary.json file
Log file is /afs/cern.ch/user/f/fanzago/scratch0/TEST/crab_0_130305_144756/log/crab.log
</verbatim>

The files containing the luminosity information about the task are the following.

The original lumi mask file used in the creation of your task:
<verbatim style="font-size: 13px" class="output">
$ cat Cert_190456-195947_8TeV_PromptReco_Collisions12_JSON_v2.txt
{"190645": [[10, 110]], "190704": [[1, 3]], "190705": [[1, 5], [7, 76], [78, 336], [338, 350], [353, 384]],
"190738": [[1, 130], [133, 226], [229, 355]],
...
</verbatim>

The lumi sections that your created jobs have to analyze:
<verbatim style="font-size: 13px" class="output">
$ cat crab_0_130609_231016/res/inputLumiSummaryOfTask.json
{"195947": [[27, 27], [36, 36]], "194108": [[95, 96], [117, 121], [123, 126], [149, 152], [154, 157], [160, 161], [166, 169], [172, 174], [176, 176], [185, 185], [187, 187], [190, 191], [196, 197], [200, 201], [206, 209], [211, 212], [216, 221], [231, 232], [234, 235], [238, 243], [249, 250], [277, 278], [285, 286], [305, 308], [311, 312], [333, 334], [438, 439], [520, 520], [527, 527]],
...
</verbatim>

The lumi sections actually analyzed by your correctly terminated jobs:
<verbatim style="font-size: 13px" class="output">
$ cat /afs/cern.ch/user/f/fanzago/scratch0/TEST/crab_0_130305_144756/res/lumiSummary.json
{"194424": [[63, 63], [92, 92], [121, 121], [123, 123], [168, 173], [176, 177], [184, 185], [187, 187], [199, 200], [202, 203], [207, 207], [213, 213], [220, 221], [256, 256], [557, 557], [559, 559], [562, 562], [564, 564], [599, 599], [602, 602], [607, 607], [609, 609], [639, 639], [648, 649], [656, 656], [658, 658], [660, 660]],
"194108": [[95, 96], [117, 121], [123, 126], [149, 152], [154, 157], [160, 161], [166, 169], [172, 174], [176, 176], [185, 185], [187, 187], [190, 191], [196, 197], [200, 201], [206, 209], [211, 212], [216, 221], [231, 232], [234, 235], [238, 243], [249, 250], [277, 278], [285, 286], [305, 308], [311, 312], [333, 334], [438, 439], [520, 520], [527, 527]],
...
</verbatim>
Finally, the missing lumis (the difference between the lumi mask and the lumiSummary), which you can analyze by creating a new task that uses this file as the new lumi mask:
<verbatim style="font-size: 13px" class="output">
$ cat /afs/cern.ch/user/f/fanzago/scratch0/TEST/crab_0_130305_144756/res/missingLumiSummary.json
{"190645": [[10, 110]], "190704": [[1, 3]], "190705": [[1, 5], [7, 76], [78, 336], [338, 350], [353, 384]],
"190738": [[1, 130], [133, 226], [229, 355]],
...
</verbatim>

To create a task to analyze the missing lumis, use the missingLumiSummary.json file as the lumi mask in your crab.cfg:
%SYNTAX{ syntax="sh"}%
[CMSSW]
total_number_of_lumis = -1
number_of_jobs = 10
pset = tutorial.py
datasetpath = /SingleMu/Run2012B-13Jul2012-v1/AOD
lumi_mask = crab_0_130305_144756/res/missingLumiSummary.json
output_file = outfile.root

[USER]
return_data = 0
copy_data = 1
publish_data = 1
storage_element = T2_IT_Legnaro
publish_data_name = FedeTutGridGlide_data
dbs_url_for_publication = https://cmsdbsprod.cern.ch:8443/cms_dbs_ph_analysis_02_writer/servlet/DBSServlet

[CRAB]
scheduler = remoteGlidein
jobtype = cmssw
%ENDSYNTAX%

<verbatim style="font-size: 13px" class="output">
$ crab -create -cfg crab_missing.cfg
crab:  Version 2.8.5 running on Tue Mar  5 15:22:50 2013 CET (14:22:50 UTC)
crab. Working options:
  scheduler           remoteGlidein
  job type            CMSSW
  server              OFF
  working directory   /afs/cern.ch/user/f/fanzago/scratch0/TEST/crab_0_130305_152250/
verify if user DN is mapped in CERN's SSO
OK. user ready for SiteDB switchover on March 12, 2013
crab: Contacting Data Discovery Services ...
crab: Accessing DBS at: http://cmsdbsprod.cern.ch/cms_dbs_prod_global/servlet/DBSServlet
crab: Requested (A)DS /SingleMu/Run2012B-TOPMuPlusJets-PromptSkim-v1/AOD has 13 block(s).
crab: Each job will process about 192 lumis.
crab: 9 jobs created to run on 1918 lumis
crab: Checking remote location
crab: WARNING: The stageout directory already exists. Be careful not to accidentally mix outputs from different tasks
crab: Creating 9 jobs, please wait...
crab: Total of 9 jobs created.
</verbatim>

Then submit them as usual: the created jobs will analyze all the missing lumis of the original lumi mask file.
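You can cross-check this bookkeeping yourself: every lumi section in lumiSummary.json or missingLumiSummary.json should be contained in the original lumi mask. Below is a minimal Python sketch of such a check (the file paths are the hypothetical ones from this example; the helper expands each [first, last] range into individual (run, lumi) pairs):
<verbatim class="code">
import json

def lumi_set(filename):
    # expand a lumi-mask-style JSON file into a set of (run, lumi) pairs
    mask = json.load(open(filename))
    return set((run, lumi)
               for run, ranges in mask.items()
               for first, last in ranges
               for lumi in range(first, last + 1))

analyzed = lumi_set("crab_0_130305_144756/res/lumiSummary.json")
missing  = lumi_set("crab_0_130305_144756/res/missingLumiSummary.json")
mask     = lumi_set("Cert_190456-195947_8TeV_PromptReco_Collisions12_JSON_v2.txt")

print "analyzed:", len(analyzed), "missing:", len(missing)
# the analyzed and missing lumis together should never exceed the mask
print "consistent with mask:", (analyzed | missing) <= mask
</verbatim>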
#JustSe
---++ Run Crab retrieving your output (without copying to a Storage Element)
You can also run your analysis code without interacting with a remote Storage Element, retrieving the outputs directly to your workspace area (under the =res= directory of the project).

Here is an example of the !CRAB configuration file, consistent with this tutorial:
%SYNTAX{ syntax="sh"}%
[CMSSW]
total_number_of_events = 100
number_of_jobs = 10
pset = tutorial.py
datasetpath = /RelValProdTTbar/JobRobot-MC_3XY_V24_JobRobot-v1/GEN-SIM-DIGI-RECO
output_file = outfile.root

[USER]
return_data = 1

[CRAB]
scheduler = remoteGlidein
jobtype = cmssw
%ENDSYNTAX%

With this crab.cfg in place you can redo the workflow as described before (apart from the publication step):
   * creation
   * submission
   * status progress monitoring
   * output retrieval (in this step you will retrieve directly the real output produced by your pset file)

#MoreDoc
---++ Where to find more on !CRAB
   * [[https://twiki.cern.ch/twiki/bin/view/CMSPublic/SWGuideCrab][CRAB Home]]
   * [[SWGuideCrabHowTo][HowTos]]
   * [[https://twiki.cern.ch/twiki/bin/view/CMS/SWGuideCrabFaq][CRAB FAQ]]
   * [[https://twiki.cern.ch/twiki/bin/view/CMS/WorkBookGridJobDiagnosisTemplate][WorkBookGridJobDiagnosisTemplate]]: steps to identify the problems you experience with your grid analysis jobs.
   * [[https://hypernews.cern.ch/HyperNews/CMS/get/crabFeedback.html][CRAB mailing list]] where to send feedback and ask for support in case of job problems (please send us your crab.cfg file and the job stderr, stdout and log files, otherwise we are not able to provide support).

Note also that all CMS members using the Grid must subscribe to the [[https://hypernews.cern.ch/HyperNews/CMS/get/gridAnnounce.html][Grid Announcements CMS.HyperNews forum]].

#ReviewStatus
---++ Review status

| *Reviewer/Editor and Date (copy from screen)* | *Comments* |
| Main.JohnStupak - 4-June-2013 | Review, minor revisions, updated real data dataset to an existing dataset |
| Main.NitishDhingra - 2012-04-07 | See detailed comments below. |
| Main.MattiaCinquilli - 2010-04-15 | Update for tutorial |
| Main.FedericaFanzago - 18 Feb 2009 | Update for tutorial |
| Main.AndriusJuodagalvis - 2009-08-21 | Added an instance of url_local_dbs |

%TWISTY{mode="div" showlink="Detailed comments 07-Apr-2012 " hidelink="Hide " firststart="hide" showimgright="%ICONURLPATH{toggleopen-small}%" hideimgright="%ICONURLPATH{toggleclose-small}%"}%
Complete Review, Minor Changes. Page gives a good idea of doing a physics analysis using CRAB.
%ENDTWISTY%

%RESPONSIBLE% Main.FedericaFanzago %BR%