Crab2 Basic Workflows

THIS TWIKI PAGE REFERS TO CRAB2 AND IS THEREFORE LARGELY OBSOLETE. PLEASE USE SWGuideCrab INSTEAD

Crab Configuration File

This section describes how to edit the crab.cfg file according to the level of use of the tool, from the simplest to the most advanced. All levels, however, cover only the basic functionality of CRAB2.

Level 1: return results locally

This basic workflow will enable you to

  • Analyse your chosen dataset
    • To discover the data available you can use the DBS discovery page.
      • Detailed instructions on how to do this are given here
  • Return the results, e.g. a .root file with histograms, to your User Interface (UI), e.g. lxplus

This default configuration tells CRAB how to submit your job and what to do with the output. You need to adjust it according to your analysis needs.

[CRAB]
jobtype                  = cmssw    
scheduler                = remoteGlidein
use_server              = 0

[CMSSW]
datasetpath              =
pset                     =
output_file              = 
total_number_of_lumis   = -1
lumis_per_job            = 1000

[USER]
thresholdLevel           = An_integer_in_0_100
eMail                    = YourEmailAddress
return_data              = 1

Where:

  • datasetpath - your input dataset, e.g. datasetpath = /TauolaTTbar/Summer08_IDEAL_V9_AODSIM_v1/AODSIM
  • pset - your CMSSW configuration file, e.g. myconfig_cfg.py
  • output_file - the .root file produced by your CMSSW job, e.g. myoutput.root
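
For reference, a minimal filled-in Level 1 crab.cfg using the example values above might look as follows; the thresholdLevel and eMail values are placeholders you should replace with your own:

[CRAB]
jobtype                  = cmssw
scheduler                = remoteGlidein
use_server               = 0

[CMSSW]
datasetpath              = /TauolaTTbar/Summer08_IDEAL_V9_AODSIM_v1/AODSIM
pset                     = myconfig_cfg.py
output_file              = myoutput.root
total_number_of_lumis    = -1
lumis_per_job            = 1000

[USER]
thresholdLevel           = 100
eMail                    = your.name@example.com
return_data              = 1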

Level 2: copy results to a Storage Element

This basic workflow will enable you to

  • Analyse your chosen dataset
  • Store the results, e.g. a .root file with histograms or a large ntuple, on an official remote Storage Element (SE), i.e. one reported in the SiteDB site list
    • Note: the log files (stdout, stderr, framework job report) will still be returned to the UI when you run crab -getoutput.

[CRAB]
jobtype                  = cmssw    
scheduler                = remoteGlidein
use_server              = 0

[CMSSW]
datasetpath              =
pset                     =
total_number_of_lumis   = -1
lumis_per_job            = 1000

[USER]
thresholdLevel           = An_integer_in_0_100
eMail                    = YourEmailAddress
return_data              = 0
copy_data                = 1
storage_element          = 
user_remote_dir = myResults_v1

Where:

  • return_data = 0, instructs CRAB not to return the output_file to the UI
  • copy_data = 1, instructs CRAB to copy the output_file to the SE
  • storage_element, the name of the SE, e.g. T2_IT_Bari
  • user_remote_dir = myResults_v1, instructs CRAB to store the results in the directory myResults_v1, which is relative to /store/user/username/ on the storage element you are sending the results to. It is important that this directory is unique: if the files already exist, CRAB will not be able to overwrite them and job creation will fail.
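
As a sketch, a filled-in Level 2 [USER] section using the T2_IT_Bari example above (thresholdLevel and eMail are again placeholders) would look like this; with these settings the output lands under /store/user/<your_username>/myResults_v1/ at that site:

[USER]
thresholdLevel           = 100
eMail                    = your.name@example.com
return_data              = 0
copy_data                = 1
storage_element          = T2_IT_Bari
user_remote_dir          = myResults_v1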

Level 3: publish copied results in a Storage Element to a DBS instance

This basic workflow will enable you to

  • Analyse your chosen dataset
  • Store the results
  • Publish the results into a local DBS so you can process them with further Grid jobs

You can only publish the results if you configure CRAB as described here before submitting to the Grid. You have to set your crab.cfg in this way:

     [USER]
     copy_data = 1
     storage_element = "official_CMS_site_name"  (e.g. T2_IT_legnaro)
     publish_data=1
     publish_data_name = data_name_to_publish 
     
    • storage_element is the name of the SE reported in SiteDB https://cmsweb.cern.ch/sitedb/sitelist/ . The mapping between the Storage Element name and the CMS site names is reported here. Note that there is one exception: you have to use T2_RU_IHEP_Disk instead of T2_RU_IHEP.
    • publish_data = 1, instructs CRAB that you wish to publish your output_files to DBS.
    • publish_data_name is a descriptive string that will appear in the dataset name when published (e.g. myprocessingCMSSW_1_6_8).
    • in this case the storage_path where the user can store data is automatically discovered by CRAB.

More detailed documentation on publication is provided here.
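
Putting the publication settings together with the Level 2 copy settings, a complete [USER] section for this workflow could look like the sketch below; the site name, publish_data_name and e-mail address are illustrative placeholders only:

[USER]
thresholdLevel           = 100
eMail                    = your.name@example.com
return_data              = 0
copy_data                = 1
storage_element          = T2_IT_Bari
publish_data             = 1
publish_data_name        = myprocessingCMSSW_1_6_8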

Analyse published results

Of course you can:

  • Process results published to a local DBS instance.
  • Either return or store the results

To analyse results that have been published in a local DBS you may use a CRAB configuration identical to any other, with the addition that you must specify the DBS instance to which the data was published. To do this, modify the [CMSSW] section of your CRAB configuration file as indicated here, e.g.

datasetpath              = <primarydataset>/<publish_data_name>/USER
pset                     = <your CMSSW configuration file, e.g. myconfig_cfg.py>
total_number_of_lumis    = -1
lumis_per_job            = 1000
dbs_url                  = http://cmsdbsprod.cern.ch/cms_dbs_ph_analysis_02/servlet/DBSServlet
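
Purely as an illustration, combining this page's own example names according to the pattern above (your actual published dataset name will come from your own publication step), the [CMSSW] section could read:

datasetpath              = /TauolaTTbar/myprocessingCMSSW_1_6_8/USER
pset                     = myconfig_cfg.py
total_number_of_lumis    = -1
lumis_per_job            = 1000
dbs_url                  = http://cmsdbsprod.cern.ch/cms_dbs_ph_analysis_02/servlet/DBSServlet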

Basic Crab Commands

The basic workflow of commands you must use to run your job, monitor it and retrieve the results is shown below:

basic_workflow-1.png



  • "crab -create", create all jobs (no submission!)
  • "crab -submit", submit all jobs above created
    • The above two commands can be done in a single step ("crab -create -submit")
  • "crab -status", check the status of all jobs above submitted
  • "crab -getoutput", retrieve the output of all jobs above submitted and terminated
  • "crab -report", prints a summary of the task progress and creates a lumiSummary.json file to be used e.g. in lumiCalc.py (see the LumiCalc TWiki) to compute the luminosity those output files correspond to
  • "crab -publish", publish the results stored in the SE

-- MarcoCalloni - 27-Jan-2010
