Difference: CRAB3FAQ (1 vs. 108)

Revision 108 - 2019-09-11 - StefanoBelforte

Line: 293 to 293
 

Can I send CRAB output to CERNBOX ?

Added:
>
>
<!--/twistyPlugin twikiMakeVisibleInline-->
 Yes, by doing both the following:
  1. indicating T2_CH_CERNBOX as storage location in CRAB configuration
  2. asking CERNBOX administrators (which are NOT in CMS) to grant proper permission to your DN
Line: 303 to 304
 The T2_CH_CERNBOX site is not listed among CMS sites e.g. in https://cms-cric.cern.ch/cms/site/index/, but a trivial file catalog exists for it and it is known to PhEDEx, i.e. it is a known storage location for CMS in https://cmsweb.cern.ch/phedex/prod/Components::Status (and https://cms-cric.cern.ch/cms/storageunit/detail/T2_CH_CERNBOX/ ). This allows CRAB to use the T2_CH_CERNBOX string to map logical file names of the kind /store/user/somename to gsiftp://eosuserftp.cern.ch/eos/user/s/somename, which is the proper end point for writing to CERNBOX.
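As an illustration of the mapping just described, a minimal sketch (not the actual trivial file catalog code, only the rule spelled out above):

# Illustrative only: LFN -> PFN rule for T2_CH_CERNBOX as described above
def cernbox_pfn(lfn):
    # e.g. /store/user/somename/file.root -> gsiftp://eosuserftp.cern.ch/eos/user/s/somename/file.root
    rest = lfn.replace('/store/user/', '', 1)
    return 'gsiftp://eosuserftp.cern.ch/eos/user/%s/%s' % (rest[0], rest)

print(cernbox_pfn('/store/user/somename/myfile.root'))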

But since CERNBOX is not part of CMS disk, but rather a space which CERN offers to all users, access to it is not controlled by CMS. So in order to be able to write there from a grid node using gsiftp (differently from using e.g. the CERNBOX client or a fuse mount on lxplus), users need to ask for help from CERN, e.g. via the CERN help desk or a SNOW ticket.
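A minimal CRAB configuration sketch for point 1 above (the output path is a placeholder for your own /store/user area; point 2, the permissions, still has to be arranged with CERN):

# Sketch: direct CRAB stageout to CERNBOX
config.Site.storageSite = 'T2_CH_CERNBOX'
config.Data.outLFNDirBase = '/store/user/<username>'   # placeholder, use your own username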

Added:
>
>
<!--/twistyPlugin-->
 

Jobs/Task status

Revision 107 - 2019-09-08 - StefanoBelforte

Line: 996 to 996
 

Miscellanea

Added:
>
>

How can I use CRAB to submit to my local batch system

<!--/twistyPlugin twikiMakeVisibleInline-->
CRAB3 only supports submission to the CMS global pool, an HTCondor pool comprised of glideinWMS pilots running on distributed grid sites. But it is possible to use the CRAB machinery for data discovery, job splitting, and job preparation (i.e. configure a set of jobs to be executed to process a given input dataset) and execute those jobs locally, interactively, or in the user's preferred batch system.

In this case the CRAB Server machinery will not be involved and the following differences must be kept in mind:

  • crab submit will not be used and no crab commands will make sense
  • bookkeeping and resubmissions will be on the user's side
  • there will be no stageout and no publication; it will be up to users to use the local batch system machinery to take care of output retrieval
  • CRAB will not prepare batch-system-specific submission instructions, only one generic script to be executed in each job and a set of files to be sent together with that script so that it can customize itself with the input specifics of each single job.
  • It will be up to the user to find and use whatever feature is available in the local system to pass to each job a different numeric argument so that they execute as job 1...N in the task, rather than N copies of job 1
  • if the running jobs require a grid proxy (e.g. to use xrootd to read from remote sites) it is the user's responsibility to take care of it

The way to do this is via the crab preparelocal command. Please refer to the crab preparelocal help.

There is an example of how to use this to submit one CRAB task on the CERN HTCondor batch system. Those instructions will also work for FNAL LPC HTCondor.
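For illustration only, a sketch of a local driver loop, assuming the directory prepared by crab preparelocal contains a generic per-job script (here hypothetically called run_job.sh; check the crab preparelocal help and the linked example for the real file names) that takes the job number as its argument:

# Hypothetical driver: execute the prepared generic script as jobs 1..N
import subprocess

n_jobs = 10   # illustrative: number of jobs in the task
for job_id in range(1, n_jobs + 1):
    # each job customizes itself from the numeric argument, as explained above
    subprocess.check_call(['sh', 'run_job.sh', str(job_id)])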

<!--/twistyPlugin-->
 

How CRAB finds data in input datasets from DBS

<!--/twistyPlugin twikiMakeVisibleInline-->

Revision 106 - 2019-08-24 - StefanoBelforte

Line: 304 to 304
  But since CERNBOX is not part of CMS disk, but rather a space which CERN offers to all users, access to it is not controlled by CMS. So in order to be able to write there from a grid node using gsiftp (differently from using e.g. the CERNBOX client or a fuse mount on lxplus), users need to ask for help from CERN, e.g. via the CERN help desk or a SNOW ticket.
Changed:
<
<

Jobs status

>
>

Jobs/Task status

 

My jobs are still idle/pending/queued. How can I know why and what can I do?

Line: 312 to 312
 If jobs are pending for more than ~12 hours, there is certainly a problem somewhere. The first thing to do is to identify to which site(s) the jobs were submitted and check the site(s) status in the Site Readiness Monitor page, http://cms-site-readiness.web.cern.ch/cms-site-readiness/SiteReadiness/HTML/SiteReadinessReport.html. For example, the "HammerCloud" row will tell whether analysis jobs are running at the site and their success rate, and the "Maintenance" row will tell whether the site had/has a downtime (clicking on the corresponding date inset in the table will open a new web page where the downtime reason is explained). If everything looks fine with the site(s) status, it may be that the user jobs are not running because they requested more resources (memory per core) than what the site(s) can offer (see What is the maximum memory per job (maxMemoryMB) I can request?).
<!--/twistyPlugin-->
Added:
>
>

crab status says that Task is FAILED w/o any other information

<!--/twistyPlugin twikiMakeVisibleInline-->
This can happen when using Automatic Splitting if all of the probe jobs failed (see CRAB3FAQ#What_is_the_Automatic_splitting for a description of probe jobs). Example:
CRAB project directory:      /afs/cern.ch/work/b/belforte/CRAB3/TC3/dbg/zuolo/crab_TTTo2L2Nu_TuneCP5_13TeV-powheg-pythia8_1
Task name:         190823_123943:dzuolo_crab_TTTo2L2Nu_TuneCP5_13TeV-powheg-pythia8_1
Grid scheduler - Task Worker:   crab3@vocms0107.cern.ch - crab-prod-tw01
Status on the CRAB server:   SUBMITTED
Task URL to use for HELP:   https://cmsweb.cern.ch/crabserver/ui/task/190823_123943%3Adzuolo_crab_TTTo2L2Nu_TuneCP5_13TeV-powheg-pythia8_1
Dashboard monitoring URL:   http://dashb-cms-job.cern.ch/dashboard/templates/task-analysis/#user=dzuolo&refresh=0&table=Jobs&p=1&records=25&activemenu=2&status=&site=&tid=190823_123943%3Adzuolo_crab_TTTo2L2Nu_TuneCP5_13TeV-powheg-pythia8_1
New dashboard monitoring URL:   https://monit-grafana.cern.ch/d/cmsTMDetail/cms-task-monitoring-task-view?orgId=11&var-user=dzuolo&var-task=190823_123943%3Adzuolo_crab_TTTo2L2Nu_TuneCP5_13TeV-powheg-pythia8_1
In case of issues with the new dashboard, please provide feedback to hn-cms-computing-tools@cern.ch
Status on the scheduler:   FAILED

No publication information (publication has been disabled in the CRAB configuration file)
Log file is /afs/cern.ch/work/b/belforte/CRAB3/TC3/dbg/zuolo/crab_TTTo2L2Nu_TuneCP5_13TeV-powheg-pythia8_1/crab.log

In this case you can get more information with crab status --long
<!--/twistyPlugin-->

When using automatic splitting all probe jobs fail with 50664 (time limit exceeded)

<!--/twistyPlugin twikiMakeVisibleInline-->
User configuration parameters have no effect on the time limit for probe jobs. They are always configured to terminate after 15 minutes, but cmsRun can only stop on lumi boundaries. Probe jobs are allowed to run up to 1 hour, but if not even that is sufficient, they will fail as in the example below. In this case you should avoid using Automatic splitting and fall back to lumi-based splitting with a few lumis per job (see the configuration sketch after the table below).

 Job State        Most Recent Site        Runtime   Mem (MB)      CPU %    Retries   Restarts      Waste       Exit Code
 0-1 no output    T1_US_FNAL              1:00:18       1548         97          0          0    0:00:10           50664
 0-2 no output    T1_US_FNAL              1:00:15       1509         97          0          0    0:00:10           50664
 0-3 no output    T1_US_FNAL              1:00:18       1445        100          0          0    0:00:10           50664
 0-4 no output    T1_US_FNAL              1:00:17       1402         98          0          0    0:00:11           50664
 0-5 no output    T2_US_MIT               1:00:16       1656         94          0          0    0:00:10           50664
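A minimal configuration sketch for the fallback suggested above (the number of lumis per job is illustrative and should be tuned to your payload):

# Sketch: replace Automatic splitting with lumi-based splitting
config.Data.splitting = 'LumiBased'
config.Data.unitsPerJob = 10   # a few lumis per job, illustrative value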

<!--/twistyPlugin-->
 

CRAB commands

crab checkusername fails with "Error: Failed to retrieve username from SiteDB."

Revision 105 - 2019-08-09 - StefanoBelforte

Line: 711 to 711
 </>
<!--/twistyPlugin-->
Added:
>
>

Segmentation Fault (exit code 11 or 139)

<!--/twistyPlugin twikiMakeVisibleInline-->
Usually segmentation faults are well reproducible and can be debugged by running locally on the same input files as the CRAB job (e.g. using [[https://twiki.cern.ch/twiki/bin/view/CMSPublic/CRAB3Commands#crab_preparelocal][crab preparelocal]]). Here are some general hints on how to tackle them from https://hypernews.cern.ch/HyperNews/CMS/get/computing-tools/5166/2.html :

A segfault is typically caused by invalid memory access (e.g. reading out of bounds of an array or dereferencing a null or random pointer). A simple step forward is to recompile the offending code with debug symbols, e.g.

  • USER_CXXFLAGS="-g" scram b
and run again. Then the stack trace will show the source file and line number where the segfault occurred. If the cause is not evident, you can add printouts or use gdb.
<!--/twistyPlugin-->
 

[ERROR] Operation expired

<!--/twistyPlugin twikiMakeVisibleInline-->

Revision 104 - 2019-08-07 - StefanoBelforte

Line: 696 to 696
 Exit code 50660 means "Application terminated by wrapper because using too much RAM (RSS)" (as documented here). The amount of RAM that a job can use on a grid node is always limited, and if the memory need keeps increasing as the job runs (a so-called "memory leak") the job will need to be killed. Grid sites used by CMS guarantee at least 2.5 GB of RAM per core, so, allowing for some overhead, the CRAB default is to ask for 2 GB per job. This is usually enough to run full RECO and user jobs should not normally need more. So the user's first action when getting this error is to make sure that the code is not leaking memory nor allocating useless large structures. If more RAM is really needed, it can be requested via the JobType.maxMemoryMB parameter in the CRAB configuration file. Uselessly requesting too much RAM is very likely to result in wasted CPU (we will run fewer jobs than there are CPU cores available in a node, to spread the available RAM in fewer, larger chunks), so you have to be careful: abuse will be monitored and tasks may get killed.
Added:
>
>
Each user is responsible for her/his code and needs to make sure that memory usage is under control. Various tools exist to identify and prevent memory leaks in C++, which are outside the scope of the CRAB documentation. Generally speaking, when investigating memory usage you want to make sure that you run on the same input as a job which resulted in memory problems, as usage can depend on the number, sequence and kind of events processed. Users may also benefit from the [[https://twiki.cern.ch/twiki/bin/view/CMSPublic/CRAB3Commands#crab_preparelocal][crab preparelocal]] command to replay one specific job interactively and monitor memory usage.
 An important exception is the case where the user runs multi-threaded applications, in particular CMSSW. In that case a single job will use multiple cores and not only can, but must, use more than the default 2 GB of RAM. It is up to the user to request the proper amount of memory, e.g. after measuring it by running the code interactively, or by looking up what Production is using in similar workflows. As a generic rule of thumb, (1+1*num_threads) GB may be a good starting point.
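A minimal sketch of the rule of thumb above (the numbers are illustrative; measure your own application):

# Sketch: multi-threaded CMSSW job, memory request following (1 + 1*num_threads) GB
config.JobType.numCores = 4
config.JobType.maxMemoryMB = 5000   # ~ (1 + 4) GB for 4 threads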

<!--/twistyPlugin-->

Revision 103 - 2019-06-23 - LeonardoCristella

Line: 89 to 89
 
<!--/twistyPlugin twikiMakeVisibleInline-->
The Data.splitting parameter has now a default value: 'Automatic'.
With such a setting the task processing is split into three stages:
Changed:
<
<
  1. A "probe" stage, where some probe jobs are submitted to estimate the event throughput of the CMSSW parameter-set configuration provided by the user in the JobType.psetName parameter and possible further arguments. Probe jobs have a job id of the form 0-[1,2,3,...], they can not be resubmitted and the task will fail if none of the probe jobs complete successfully.
>
>
  1. A "probe" stage, where some probe jobs are submitted to estimate the event throughput of the CMSSW parameter-set configuration provided by the user in the JobType.psetName parameter and possible further arguments. Probe jobs have a job id of the form 0-[1,2,3,...], they can not be resubmitted and the task will fail if none of the probe jobs complete successfully. The output files transfer is disabled for probe jobs.
 
  2. A "main" stage, very similar to the conventional stage for other splitting modes, in which a number of main jobs (automatically determined by the probe stage) will process the dataset. These jobs can not be manually resubmitted and have a fixed maximum runtime (specified in the Data.unitsPerJob parameter), after which they gracefully stop processing input data. The remaining data will be processed in the next stage (tail) and their jobs labelled as "rescheduled" in the main stage (in the dashboard they will always appear as "failed").
  3. Some possible "tail" stages. If some main job does not finish successfully ("rescheduled" in the previous stage) or does not completely process the amount of data assigned to it due to the automatically configured maximum job run time, tail jobs are created and submitted in order to fully process the dataset. Tail jobs have a job id of the form n-[1,2,3,...], where n=1,2,... represents the tail stage number. For small tasks, less than 100 jobs, one tail stage is started when all jobs have completed (successfully or failed). For larger tasks, a first tail stage collects all remaining input data from the first 50% of completed jobs, followed by a stage that processes data when 80% of jobs have completed, and finally a stage collecting leftover input data at 100% job completion.
    Failed tail jobs can be manually resubmitted by the users.

Revision 102 - 2019-06-11 - TodorTrendafilovIvanov

Line: 296 to 296
 Yes, by doing both the following:
  1. indicating T2_CH_CERNBOX as storage location in CRAB configuration
  2. asking CERNBOX administrators (which are NOT in CMS) to grant proper permission to your DN
Added:
>
>
  Explanation:

Revision 101 - 2019-02-25 - LeonardoCristella

Line: 12 to 12
 pre.note {background-color: white;}
Changed:
<
<
CRAB Logo
>
>
CRAB Logo
 

CRAB3 Frequently Asked Questions

Line: 17 to 17
 

CRAB3 Frequently Asked Questions

Go to SWGuideCrab
Added:
>
>
 Help Notice: This is a large page, it works best if you search in it for your problem using the browser search function.
By default all answers are collapsed and search only uses the questions text. If you do not find what you need, you can use the buttons below to search inside answers as well:
Changed:
<
<
>
>
 
 
Changed:
<
<
Help if you still have problems, check the CRAB3Troubleshoot guide before asking for support
>
>
Help if you still have problems, check the CRAB3Troubleshoot guide before asking for support
 
Contents:
Line: 34 to 34
 Therefore you should remove credentials from myproxy and then issue the crab command again. To remove stale credentials:
Changed:
<
<
>
>
 grep myproxy-info /crab.log
Changed:
<
<
# example: grep myproxy-info crab_20160308_140433/crab.log
>
>
# example: grep myproxy-info crab_20160308_140433/crab.log
 you will get something like
Changed:
<
<
 command: myproxy-info -l ec95456d3589ed395dc47d3ada8c94c67ee588f1 -s myproxy.cern.ch
>
>
 command: myproxy-info -l ec95456d3589ed395dc47d3ada8c94c67ee588f1 -s myproxy.cern.ch
  then simply issue a myproxy-destroy command with same arguments:
Changed:
<
<
>
>
 # example. In real life replace the long hex string with the one from your crab.log
Changed:
<
<
myproxy-destroy -l ec95456d3589ed395dc47d3ada8c94c67ee588f1 -s myproxy.cern.ch
>
>
myproxy-destroy -l ec95456d3589ed395dc47d3ada8c94c67ee588f1 -s myproxy.cern.ch
 
Changed:
<
<
If things still fail after than, send the following additional info in your request for support, replacing the long hex string with the one that you found in crab.log (ec95456d3589ed395dc47d3ada8c94c67ee588f1 in the above example):
>
>
If things still fail after that, send the following additional info in your request for support, replacing the long hex string with the one that you found in crab.log (ec95456d3589ed395dc47d3ada8c94c67ee588f1 in the above example):
 
  • output of voms-proxy-info -all
Changed:
<
<
  • output of myproxy-info -d -l -s myproxy.cern.ch
>
>
  • output of myproxy-info -d -l <long-hex-string> -s myproxy.cern.ch
 
  • content of you crab.log as an attachment
Added:
>
>
 
<!--/twistyPlugin-->

CRAB setup

Line: 61 to 61
 

Does CRAB setup conflict with CMSSW setup

<!--/twistyPlugin twikiMakeVisibleInline-->
Changed:
<
<
No. CRAB client runs within the CMSSW environment.
>
>
No. CRAB client runs within the CMSSW environment.
 Make sure you always do cmsenv before source /cvmfs/cms.cern.ch/crab3/crab.sh
<!--/twistyPlugin-->
Line: 81 to 80
 
<!--/twistyPlugin twikiMakeVisibleInline-->
CRAB requests by default a maximum memory of 2000 MB. This is the maximum memory per core that all sites guarantee to run. Some sites, but not many, offer a bit more (typically 2500 MB); and some sites even offer 4000 MB for special users. Memory limits offered by each site are accessible in the GlideinWMS VO Factory Monitor page, http://glidein.grid.iu.edu/factory/monitor/ (choose "Current Status of the Factory" and click on a site CE listed under "Entry Name" in the table), but this should not be considered documentation. The best advice we can give is: stick to the default, and if you think you need more, find out first if there are sites (and which ones) which can run jobs in that case. If you need help, you can write to us.
Changed:
<
<
note.gif Note: In case of a multi-threaded job (config.JobType.numCores > 1) most likely the default memory value is not enough. The user share of computing resources accounts for the requested memory per core.
>
>
Note: In case of a multi-threaded job (config.JobType.numCores > 1) most likely the default memory value is not enough. The user share of computing resources accounts for the requested memory per core.
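If, after the considerations above, more than the default is really needed, the request is a single configuration parameter (the value is illustrative):

# Sketch: ask for more than the 2000 MB default, only if sites can actually provide it
config.JobType.maxMemoryMB = 2500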
 
<!--/twistyPlugin-->
Line: 91 to 90
 The Data.splitting parameter has now a default value: 'Automatic'.
With such a setting the task processing is split into three stages:
  1. A "probe" stage, where some probe jobs are submitted to estimate the event throughput of the CMSSW parameter-set configuration provided by the user in the JobType.psetName parameter and possible further arguments. Probe jobs have a job id of the form 0-[1,2,3,...], they can not be resubmitted and the task will fail if none of the probe jobs complete successfully.
Changed:
<
<
  1. A "main" stage, very similar to the conventional stage for other splitting modes, in which a number of main jobs (automatically determined by the probe stage) will process the dataset. These jobs can not be manually resubmitted and have a fixed maximum runtime (specified in the Data.unitsPerJob parameter), after which they gracefully stop processing input data. The remaining data will be processed in the next stage (tail) and their jobs labelled as "rescheduled" in the main stage (in the dashboard they will always appear as "failed").
  2. Some possible "tail" stages. If some main job does not finish successfully ("rescheduled" in the previous stage) or does not completely process the amount of data assigned to it due to the automatically configured maximum job run time, tail jobs are created and submitted in order to fully process the dataset. Tail jobs have a job id of the form n-[1,2,3,...], where n=1,2,... represents the tail stage number. For small tasks, less than 100 jobs, one tail stage is started when all jobs have completed (successfully or failed). For larger tasks, a first tail stage collects all remaining input data from the first 50% of completed jobs, followed by a stage that processes data when 80% of jobs have completed, and finally a stage collecting leftover input data at 100% job completion.
    Failed tail jobs can be manually resubmitted by the users.
>
>
  2. A "main" stage, very similar to the conventional stage for other splitting modes, in which a number of main jobs (automatically determined by the probe stage) will process the dataset. These jobs can not be manually resubmitted and have a fixed maximum runtime (specified in the Data.unitsPerJob parameter), after which they gracefully stop processing input data. The remaining data will be processed in the next stage (tail) and their jobs labelled as "rescheduled" in the main stage (in the dashboard they will always appear as "failed").
  3. Some possible "tail" stages. If some main job does not finish successfully ("rescheduled" in the previous stage) or does not completely process the amount of data assigned to it due to the automatically configured maximum job run time, tail jobs are created and submitted in order to fully process the dataset. Tail jobs have a job id of the form n-[1,2,3,...], where n=1,2,... represents the tail stage number. For small tasks, less than 100 jobs, one tail stage is started when all jobs have completed (successfully or failed). For larger tasks, a first tail stage collects all remaining input data from the first 50% of completed jobs, followed by a stage that processes data when 80% of jobs have completed, and finally a stage collecting leftover input data at 100% job completion.
    Failed tail jobs can be manually resubmitted by the users.
  Once the probe stage is completed, the plain crab status command shows only the main and tail jobs. For the list of all jobs add the --long option. </>
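A minimal sketch of the default splitting described above; for 'Automatic' the Data.unitsPerJob value is the target runtime per main job, in minutes (the number below is illustrative):

# Sketch: Automatic splitting (the default), with a target main-job runtime
config.Data.splitting = 'Automatic'
config.Data.unitsPerJob = 180   # illustrative target runtime in minutes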
<!--/twistyPlugin-->
Line: 143 to 141
 One can use the crab purge command to delete from the CRAB cache the files associated with a given task. Actually, crab purge deletes only user input sandboxes (because there is no API to delete other files), but since they are supposed to be the main space consumers in the CRAB cache, this should be enough. If for some reason the crab purge command does not work, one can alternatively use the REST interface of the crabcache component. Instructions oriented to CRAB3 operators can be found here. Jordan Tucker has written the following script based on these instructions that removes all the input sandboxes from the user CRAB cache area (a valid proxy and the CRAB environment are required):

<!--/twistyPlugin twikiMakeVisibleInline-->
Changed:
<
<
>
>
 #!/usr/bin/env python

import json

Line: 220 to 217
  if '.log' in x: continue
  print 'remove', x
Changed:
<
<
h.fileremove(x)
>
>
h.fileremove(x)
 
<!--/twistyPlugin-->

Note: Once a task has been submitted, one can safely delete the input sandbox from the CRAB cache, as the sandbox is transferred to the worker nodes from the schedulers.

Line: 250 to 247
 With CRAB3 this should not be any different than with CRAB2. CRAB will look up the user's username registered in SiteDB (which is the username of the CERN primary account) using for the query the user's DN (which in turn is extracted from the user's credentials) and will try to stage out to /store/user/<username>/ (by default). If the store user area uses a different username, it's up to the destination site to remap that (via a symbolic link or something similar). The typical case is Fermilab; to request the mapping of the store user area, FNAL users should follow the directions on the usingEOSatLPC web page to open a ServiceNow ticket to get this fixed.

To prevent stage out failures, and in case the user has provided in the Data.outLFN parameter of the CRAB configuration file an LFN directory path of the kind /store/user/[<some-username>/<subdir>*] (i.e. a store path that starts with /store/user/), CRAB will check if some-username matches with the user's username extracted from SiteDB. If it doesn't, it will give an error message and not submit the task. The error message would be something like this:

Changed:
<
<
>
>
 Error contacting the server. Server answered with: Invalid input parameter
Changed:
<
<
Reason is: The parameter Data.outLFN in the CRAB configuration file must start with either '/store/user//' or '/store/group//' (or '/store/local//' if publication is off), wher...
>
>
Reason is: The parameter Data.outLFN in the CRAB configuration file must start with either '/store/user//' or '/store/group//' (or '/store/local//' if publication is off), wher...
 Unfortunately the "Reason is:" message is cut at 200 characters. The message should read:
Changed:
<
<
Reason is: The parameter Data.outLFN in the CRAB configuration file must start with either '/store/user/<username>/' or '/store/group/<groupname>/' (or '/store/local/<something>/' if publication is off), where username is your username as registered in SiteDB (i.e. the username of your CERN primary account).
>
>
Reason is: The parameter Data.outLFN in the CRAB configuration file must start with either '/store/user/<username>/' or '/store/group/<groupname>/' (or '/store/local/<something>/' if publication is off), where username is your username as registered in SiteDB (i.e. the username of your CERN primary account).
  A similar message should be given by crab checkwrite if the user does crab checkwrite --site=<CMS-site-name> --lfn=/store/user/<some-username>. </>
<!--/twistyPlugin-->
Line: 267 to 266
 
<!--/twistyPlugin twikiMakeVisibleInline-->
First of all, does CRAB know at all that the job should produce the output file in question? To check that, open one of the job log files linked from the task monitoring pages. Very close to the top is printed the list of output files that CRAB expects to see once the job finishes (shown below is the case of job number 1 in the task):
Changed:
<
<
>
>
 ==== HTCONDOR JOB SUMMARY at ... START ====
CRAB ID: 1
Execution site: ...
Current hostname: ...
Destination site: ...
Changed:
<
<
Output files: my_output_file.root=my_output_file_1.root
>
>
Output files: my_output_file.root=my_output_file_1.root
  If the output file in question doesn't appear in that list, then CRAB doesn't know about it, and of course it will not be transferred. This doesn't mean that the output file was not produced; it is simply that CRAB has to know beforehand what are the output files that the job produces.

If the output file is produced by either PoolOutputModule or TFileService, CRAB will automatically recognize the name of the output file when the user submits the task and it will add the output file name to the list of expected output files. On the other hand, if the output file is produced by any other module, the user has to specify the output file name in the CRAB configuration parameter JobType.outputFiles in order for CRAB to know about it. Note that this parameter takes a python list, so the right way to specify it is:

Deleted:
<
<
config.JobType.outputFiles = ['my_output_file.root']
 
Added:
>
>
config.JobType.outputFiles = ['my_output_file.root']
 
<!--/twistyPlugin-->

Can I delete a dataset I published in DBS?

Line: 317 to 319
 crab checkusername uses the following sequence of bash commands, which you should try to execute one by one (make sure you have a valid proxy) to check if they return what is expected.

1) It gets the path to the users proxy file with the command

Changed:
<
<
which scram >/dev/null 2>&1 && eval `scram unsetenv -sh`; voms-proxy-info -path
>
>
which scram >/dev/null 2>&1 && eval `scram unsetenv -sh`; voms-proxy-info -path
  which should return something like
Changed:
<
<
/tmp/x509up_u57506
>
>
/tmp/x509up_u57506
  2) It defines the path to the CA certificates directory with the following python command
Changed:
<
<
>
>
 import os
capath = os.environ['X509_CERT_DIR'] if 'X509_CERT_DIR' in os.environ else "/etc/grid-security/certificates"
Changed:
<
<
print capath
>
>
print capath
  which should be equivalent to the following bash command
Changed:
<
<
>
>
 if [ "x$X509_CERT_DIR" != "x" ]; then capath=$X509_CERT_DIR; else capath=/etc/grid-security/certificates; fi
Changed:
<
<
echo $capath
>
>
echo $capath
  and which in lxplus should result in
Changed:
<
<
/etc/grid-security/certificates
>
>
/etc/grid-security/certificates
  3) It uses the proxy file and the capath to query https://cmsweb.cern.ch/sitedb/data/prod/whoami
Changed:
<
<
curl -s --capath <output-from-command-2-above> --cert <output-from-command-1-above> --key <output-from-command-1-above> 'https://cmsweb.cern.ch/sitedb/data/prod/whoami'
>
>
curl -s --capath <output-from-command-2-above> --cert <output-from-command-1-above> --key <output-from-command-1-above> 'https://cmsweb.cern.ch/sitedb/data/prod/whoami'
  which should return something like
Changed:
<
<
>
>
 {"result": [ {"dn": "/DC=ch/DC=cern/OU=Organic Units/OU=Users/CN=atanasi/CN=710186/CN=Andres Jorge Tanasijczuk", "login": "atanasi", "method": "X509Proxy", "roles": {"operator": {"group": ["crab3"], "site": []}}, "name": "Andres Jorge Tanasijczuk"}
Changed:
<
<
]}
>
>
]}
  4) Finally it parses the output from the above query to extract the username from the "login" field (in my case it is atanasi).
Line: 360 to 378
 
<!--/twistyPlugin twikiMakeVisibleInline-->
When doing crab submit the user may get this error message:
Changed:
<
<
>
>
 Error contacting the server. Server answered with: Invalid input parameter
Changed:
<
<
Reason is: User quota limit reached; cannot upload the file
>
>
Reason is: User quota limit reached; cannot upload the file
 Error explanation: The user has reached the limit of 4.88 GB in their CRAB cache area. Read more in this FAQ.
Line: 375 to 395
 
<!--/twistyPlugin twikiMakeVisibleInline-->
Typical error in crab status:
Changed:
<
<
>
>
 Failure message: The CRAB server backend was not able to (re)submit the task, because the Grid scheduler answered with an error. This is probably a temporary glitch. Please try again later. If the error persists send an e-mail to hn-cms-computing-tools@cern.ch.
Error reason: Trapped exception in Dagman.Fork: Unable to edit jobs matching constraint
  File "/data/srv/TaskManager/3.3.1512.rc6/slc6_amd64_gcc481/cms/crabtaskworker/3.3.1512.rc6/lib/python2.6/site-packages/TaskWorker/Actions/DagmanResubmitter.py", line 113, in executeInternal
Changed:
<
<
schedd.edit(rootConst, "HoldKillSig", 'SIGKILL')
>
>
schedd.edit(rootConst, "HoldKillSig", 'SIGKILL')
  As the error message says, this should be a temporary failure. One should just keep trying until it works. But after doing crab resubmit, give it some time to process the resubmission request; it may take a couple of minutes to see the jobs reacting to the resubmission.
<!--/twistyPlugin-->
Line: 387 to 409
 
<!--/twistyPlugin twikiMakeVisibleInline-->
After doing crab submit and crab status the user may get this error message:
Changed:
<
<
>
>
 Task status: UNKNOWN
Changed:
<
<
Error during task injection: Task failed to bootstrap on schedd
>
>
Error during task injection: Task failed to bootstrap on schedd
  Error explanation: The submission of the task to the scheduler machine has failed.
Line: 401 to 426
 
<!--/twistyPlugin twikiMakeVisibleInline-->
When doing crab status the user may get one of these error messages:
Changed:
<
<
>
>
 Error during task injection: <task-name>: Failed to contact Schedd: Failed to fetch ads from schedd.
Added:
>
>

Error during task information retrieval:        <task-name>: Failed to contact Schedd: .
 
Changed:
<
<
Error during task information retrieval: <task-name>: Failed to contact Schedd: . Error explanation: This is a temporary communication error with the scheduler machine (submission node), most probably because the scheduler is overloaded.
>
>
Error explanation: This is a temporary communication error with the scheduler machine (submission node), most probably because the scheduler is overloaded.
  What to do: Try again after a couple of minutes.
<!--/twistyPlugin-->
Line: 435 to 466
 Some more discussion is in this thread: https://hypernews.cern.ch/HyperNews/CMS/get/computing-tools/2928.html

There are a few datasets in DBS which do not satisfy this limit. If someone really needs to process those, the only way is to do one job per file using the userInputFiles feature of CRAB. An annotated example of how to do this in python is below; note that you have to disable DBS publication, indicate splitting by file and provide input file locations. Other configuration parameters can be set as usual:

Changed:
<
<
>
>
  # this will use CRAB client API
  from CRABAPI.RawCommand import crabCommand
Line: 488 to 520
  result = crabCommand('submit', config = config)
Changed:
<
<
print (result)
>
>
print (result)
 
<!--/twistyPlugin-->
Line: 504 to 537
 Those datasets can only be processed if CRAB can ignore the lumi-list information, i.e. using config.Data.splitting = 'FileBased' and avoiding any extra request which would eventually result in the need to use lumi information. This means no run range, no lumi mask, and no secondary dataset (since CRAB would need to use lumi info to match input files from the two datasets). Note that useParent is allowed since in that case CRAB uses parentage information stored in DBS to match input files.

In practice your crabConfig file must have:

Changed:
<
<
>
>
 config.Data.splitting = 'FileBased'
config.Data.runRange = ''
Changed:
<
<
config.Data.lumiMask = ''
>
>
config.Data.lumiMask = ''
 (the parameters with an assigned null value '' can be omitted, but if present must indicate the null string)

and must NOT contain the following parameter

Changed:
<
<
config.Data.secondaryInputDataset
>
>
config.Data.secondaryInputDataset
 
<!--/twistyPlugin-->
Line: 523 to 561
 It is important that you as a user are prepared for this to happen and know how to remain productive in your physics analysis with the least effort. While there is a long tradition of "resubmit them until they work", this is hardly useful any more. And while we can't prevent users from trying we can not guarantee that it will work, nor that resubmitted jobs will succeed.
Changed:
<
<
In the case where the missing data sample is important, the best recommendation we can give to users is to
USE GENERAL/GENERIC RESCUE PROCEDURES, RATHER THEN TRY AT ALL COSTS TO REVIVE A DEAD TASK.
>
>
In the case where the missing data sample is important, the best recommendation we can give to users is to
USE GENERAL/GENERIC RESCUE PROCEDURES, RATHER THAN TRY AT ALL COSTS TO REVIVE A DEAD TASK.
 We will always welcome problem reports and will try to improve when resubmission failures can be due to CRAB internals, but surely you do not want to hold your breath in the meanwhile.

The safest path is therefore:

Changed:
<
<
  1. let running jobs die or complete and dust settle
  2. use crab klill to make sure everything stops
  3. take stock of what's published in DBS at that point and make sure that it matches what's on disk
>
>
  1. let running jobs die or complete and the dust settle
  2. use crab kill to make sure everything stops
  3. take stock of what's published in DBS at that point and make sure that it matches what's on disk
 
    • if your output is not in DBS, you can use crab report, but while DBS information is available forever, crab commands on a specific task may not be available forever
Changed:
<
<
  1. assess whether it is more important to get the last percentage of statistics or go on with other work. Do you really need 100% completion in this task ?
  2. if full statistics is needed, create a recovery task for the missing lumis and run it writing to the same dataset as the original task.
>
>
  4. assess whether it is more important to get the last percentage of statistics or go on with other work. Do you really need 100% completion in this task?
  5. if full statistics is needed, create a recovery task for the missing lumis and run it writing to the same dataset as the original task.
 
Changed:
<
<
Recovery task is an important concept that can be useful in many circumstances. Please find instructions in this FAQ
>
>
Recovery task is an important concept that can be useful in many circumstances. Please find instructions in this FAQ
 At the other extreme there's: forget about this, and resubmit a new task with a new output dataset. In between there is a murky land where many recipes may be more efficient depending on details, but no general simple rule can be given and there's space for individual creativity and/or desperation.
Line: 545 to 584
 
<!--/twistyPlugin twikiMakeVisibleInline-->
When doing crab submit the user may get one of these error messages:
Changed:
<
<
>
>
 Syntax error in CRAB configuration: invalid syntax (<CRAB-configuration-file-name>.py, <line-where-error-occurred>)
Added:
>
>
 
Added:
>
>
 Syntax error in CRAB configuration:
Changed:
<
<
'Configuration' object has no attribute '<attribute-name>' Error explanation: The CRAB configuration file could not be loaded, because there is a syntax error somewhere in it.
>
>
'Configuration' object has no attribute '<attribute-name>'

Error explanation: The CRAB configuration file could not be loaded, because there is a syntax error somewhere in it.

  What to do: Check the CRAB configuration file and fix it. There could be a misspelled parameter or section name, or you could be trying to use a configuration attribute (parameter or section) that was not defined. To get more details on where the error occurred, do:
Changed:
<
<
>
>
 python
Changed:
<
<
import <CRAB-configuration-file-name> #without the '.py'
>
>
import <CRAB-configuration-file-name> #without the '.py'
  which gives:
Changed:
<
<
>
>
 Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<CRAB-configuration-file-name>.py", <line-where-error-occurred>
    <error-python-code>
Changed:
<
<
^
>
>
^
  or
Changed:
<
<
>
>
 Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<CRAB-configuration-file-name>.py", <line-where-error-occurred>, in <module>
    <error-python-code>
Changed:
<
<
AttributeError: 'Configuration' object has no attribute '<attribute-name>'
>
>
AttributeError: 'Configuration' object has no attribute '<attribute-name>'
  For more information about the CRAB configuration file, see CRAB3ConfigurationFile.
<!--/twistyPlugin-->
Line: 595 to 645
 You should inspect the stdout of one job to find the exception message and traceback which may guide you to the solution.

A particular case is when the exception says An exception of category 'DictionaryNotFound' occurred, like in this example:

Changed:
<
<
>
>
 
Begin Fatal Exception 08-Jun-2017 18:18:04 CEST-----------------------
An exception of category 'DictionaryNotFound' occurred while
   [0] Constructing the EventProcessor
Exception Message:
No Dictionary for class: 'edm::Wrapper<edm::DetSetVector >'
Changed:
<
<

End Fatal Exception -------------------------------------------------
>
>

End Fatal Exception -------------------------------------------------
 In this case, most likely the input data have been produced with a CMSSW version not compatible with the one used in the CRAB job. In general, reading data with a release older than the one they were produced with is not supported.

To find out which release was used to produce a given dataset or file, adapt the following examples to your situation:

Changed:
<
<
>
>
 belforte@lxplus045/~> dasgoclient --query "release dataset=/DoubleMuon/Run2016C-18Apr2017-v1/AOD"
["CMSSW_8_0_28"]
belforte@lxplus045/~>
Added:
>
>
 
Added:
>
>
 belforte@lxplus045/~> dasgoclient --query "release file=/store/data/Run2016C/DoubleMuon/AOD/18Apr2017-v1/100001/56D1FA6E-D334-E711-9967-0025905A48B2.root"
["CMSSW_8_0_28"]
Changed:
<
<
belforte@lxplus045/~>
>
>
belforte@lxplus045/~>
 </>
<!--/twistyPlugin-->

Exit code 8028

Line: 630 to 685
  Leaving the AAA error aside for a while, the first thing to contemplate here is to understand why the file could not be loaded from the local storage. Was it because the file is not available at the execution site? And if so, was it supposed to be available? If not, can you force CRAB to submit the jobs to the site(s) where the file is hosted? CRAB should do that automatically if the input file is one from the input dataset specified in the CRAB configuration parameter Data.inputDataset, unless you have set Data.ignoreLocality = True, or except in cases like using a (secondary) pile-up dataset. If yours is the last case, please read Using pile-up in this same twiki.
Changed:
<
<
If you intentionally wanted (and had a good reason) to run jobs reading the input files via AAA, then yes, we have to care about why AAA failed. Since AAA must be able to access any CMS site, the next thing is to discard a transient problem: you can submit your jobs again and see if the error persists. Ultimately, you should write to the Computing Tools HyperNews forum (this is a forum for all kind of issues with CMS computing tools, not only CRAB) following these instructions.
>
>
If you intentionally wanted (and had a good reason) to run jobs reading the input files via AAA, then yes, we have to care about why AAA failed. Since AAA must be able to access any CMS site, the next thing is to discard a transient problem: you can submit your jobs again and see if the error persists. Ultimately, you should write to the Computing Tools HyperNews forum (this is a forum for all kind of issues with CMS computing tools, not only CRAB) following these instructions.
 </>
<!--/twistyPlugin-->
Added:
>
>
 

Exit code 50660

<!--/twistyPlugin twikiMakeVisibleInline-->
Line: 648 to 704
 
<!--/twistyPlugin twikiMakeVisibleInline-->
The most common reasons for this error are:
  1. The user is trying to analyze an input dataset, but he/she has specified in the CRAB configuration file JobType.pluginName = 'PrivateMC' instead of JobType.pluginName = 'Analysis'.
Changed:
<
<
  1. The user is generating MC events, correctly specifying in the CRAB configuration file JobType.pluginName = 'PrivateMC', but in the CMSSW parameter set configuration he/she has specified a source of type PoolSource. The solution is to not specify a PoolSource. Note: This doesn't mean to remove process.source completely, as this attribute must be present. One could set process.source = cms.Source("EmptySource") if no input source is used.
>
>
  2. The user is generating MC events, correctly specifying in the CRAB configuration file JobType.pluginName = 'PrivateMC', but in the CMSSW parameter set configuration he/she has specified a source of type PoolSource. The solution is to not specify a PoolSource. Note: This doesn't mean to remove process.source completely, as this attribute must be present. One could set process.source = cms.Source("EmptySource") if no input source is used.
 
<!--/twistyPlugin-->
Added:
>
>
 

[ERROR] Operation expired

<!--/twistyPlugin twikiMakeVisibleInline-->
Line: 654 to 711
 

[ERROR] Operation expired

<!--/twistyPlugin twikiMakeVisibleInline-->
Changed:
<
<
Some sites configuration can not handle remote access of large files (> 10 GB) and XRootD fails with a message like
>
>
Some sites' configurations cannot handle remote access of large files (> 10 GB) and XRootD fails with a message like
 == CMSSW: [1] Reading branch EventAuxiliary
== CMSSW: [2] Calling XrdFile::readv()
== CMSSW: Additional Info:
Changed:
<
<
[a] Original error: '[ERROR] Operation expired' (errno=0, code=206, source=xrootd.echo.stfc.ac.uk:1094 (site T1_UK_RAL)).

>
>
[a] Original error: '[ERROR] Operation expired' (errno=0, code=206, source=xrootd.echo.stfc.ac.uk:1094 (site T1_UK_RAL)).
 As of winter 2019 this almost only happens for files stored at T1_UK_RAL. If you are in this situation, a way out is to submit a new task using CMSSW ≥ 10_4_0 with the following duplicateCheckMode option in the PSet PoolSource
Changed:
<
<
>
>
 process.source = cms.Source("PoolSource",
    [...]
    duplicateCheckMode = cms.untracked.string("noDuplicateCheck")
Changed:
<
<
)
>
>
)
  When that is not an option and the problem is persistent, you may need to ask for a replica of the data at another site.
Line: 686 to 740
 The general problem is that CMSSW parameter-set configurations don't like to be loaded twice. In that respect, each time the CRAB client loads a CMSSW configuration, it saves it in a local (temporary) cache identifying the loaded module with a key constructed out of the following three pieces: the full path to the module and the python variables sys.path and sys.argv.

A problem arises when the CRAB configuration parameter JobType.pyCfgParams is used. The arguments in JobType.pyCfgParams are added by CRAB to sys.argv, affecting the value of the key that identifies a CMSSW parameter-set in the above mentioned cache. And that's in principle fine, as changing the arguments passed to the CMSSW parameter-set may change the event processor. But when a python process has to do more than one submission (like the case of multicrab for multiple submissions), the CMSSW parameter-set is loaded again every time the JobType.pyCfgParams is changed and this may result in "duplicate process" errors. Below are two examples of these kind of errors:

Changed:
<
<
>
>
 CmsRunFailure
CMSSW error message follows.
Fatal Exception
An exception of category 'Configuration' occurred while
Changed:
<
<
[0] Constructing the EventProcessor [1] Constructing module: class=...... label=......
>
>
   [0] Constructing the EventProcessor
   [1] Constructing module: class=...... label=......
Exception Message:
Duplicate Process The process name ...... was previously used on these products.
Please modify the configuration file to use a distinct process name.
Added:
>
>
 
Added:
>
>
 CmsRunFailure
CMSSW error message follows.
Fatal Exception
Line: 707 to 764
 in vString categories duplication of the string ...... The above are from MessageLogger configuration validation. In most cases, these involve lines that the logger configuration code
Changed:
<
<
would not process, but which the cfg creator obviously meant to have effect.
>
>
would not process, but which the cfg creator obviously meant to have effect.
 One option would be to try to not use JobType.pyCfgParams. But if this is not possible, the more general ad-hoc solution would be to fork the submission into a different python process. For example, if you are doing something like documented in Multicrab using the crabCommand API then we suggest to replace each
Changed:
<
<
submit(config)
>
>
submit(config)
  by
Changed:
<
<
>
>
 from multiprocessing import Process p = Process(target=submit, args=(config,)) p.start()
Changed:
<
<
p.join()
>
>
p.join()
  (Of course, from multiprocessing import Process needs to be executed only once, so put it outside any loop.)
<!--/twistyPlugin-->
Deleted:
<
<

Multiple submission produces different PSetDump.py files

 
Added:
>
>

Multiple submission produces different PSetDump.py files

 
<!--/twistyPlugin twikiMakeVisibleInline-->
If the PSetDump.py file (found in task_directory/inputs) differs for the tasks from a multiple-submission python file, try forking the submission into different python processes, as recommended in the previous FAQ.
<!--/twistyPlugin-->
Line: 747 to 807
 It is impossible to guarantee that a given task will always complete to 100% success in a short amount of time. At the same time it is impossible to make sure that all desired input data is available when the task is submitted. Moreover both good sense and experience show that the larger a task is, the larger is the chance it hits some problem. Large workflows therefore benefit from the possibility to run them sort of iteratively, with a short (hopefully one or two at most) succession of smaller and smaller tasks.
Deleted:
<
<

Recovery task: When

 
Added:
>
>

Recovery task: When

 A partial list of real life events where a recovery task is the user's fastest and simplest way to get work done:
  • Something went wrong in the global infrastructure and some jobs are lost beyond recovery
  • Something went wrong inside CRAB (bugs, hardware...) which can't be fixed by crab resubmit command
Line: 772 to 833
 
  1. submit a new CRAB task B which processes the missing lumis (listB = listIn - listA)

Details are slightly different depending on whether you published your output in DBS or not:

Changed:
<
<
output in DBS
follow the procedure in this FAQ
output not in DBS
follow the procedure in this Workbook example
>
>
output in DBS
follow the procedure in this FAQ
output not in DBS
follow the procedure in this Workbook example
 
<!--/twistyPlugin-->
Line: 783 to 844
 While data taking is progressing, corresponding datasets in DBS and lumi-mask files are growing. Also, data quality is sometimes improved for already existing data, leading to updated lumi-masks which, compared to older lumi-masks, include luminosity sections that were previously filtered out. Both of these situations lead to the common case where one would like to run a task (let's call it task B) over an input dataset partially analyzed already in a previous task (let's call it task A), where task B should skip the data already analyzed in task A.

This can be accomplished with a few lines in the CRAB configuration file, see an annotated example below.

Changed:
<
<
>
>
 from UserUtilities import config, getLumiListInValidFiles
from LumiList import LumiList
Line: 816 to 878
 # and here we process from the input dataset all the lumis listed in the current officialLumiMask file, skipping the ones you already have.
config.Data.lumiMask = 'my_lumi_mask.json'
config.Data.outputDatasetTag = <TaskA-outputDatasetTag>  # add to your existing dataset
Changed:
<
<
...
>
>
...
 IMPORTANT NOTE: in this way you will add any lumi section in the initial dataset that was turned from bad to good in the golden list after you ran Task-A, but if some of those data evolved the other way around (from good to bad), there is no way to remove those from your published datasets.
Line: 858 to 922
 

Using pile-up

<!--/twistyPlugin twikiMakeVisibleInline-->
Changed:
<
<
Important Instructions:
Make sure you run your jobs at the site where the pile-up sample is. Not where the signal is.
>
>
Important Instructions:
Make sure you run your jobs at the site where the pile-up sample is. Not where the signal is.
 This requires you to override the location list that CRAB would extract from the inputDataset.
Changed:
<
<
Rational and details:
The pile-up files have be specified in the CMSSW parameter-set configuration file. There is no way yet to tell in the CRAB configuration file that one wants to use a pile-up dataset as a secondary input dataset. That means that CRAB doesn't know that the CMSSW code will want to access pile-up files; CRAB only knows about the primary input dataset (if any). This means that, assuming there is a primary input dataset, when CRAB does data discovery to figure out to which sites should it submit the jobs, it will only take into account the input dataset specified in the CRAB configuration file (in the Data.inputDataset parameter) and submit the jobs to sites where this dataset is hosted. If there is no primary input dataset, CRAB will submit the jobs to the less busy sites. In any case, if the pile-up files are not hosted in the execution sites, they will be accessed via AAA (Xrootd). But reading the "signal" events directly from the local storage and the pile-up events via AAA is more inefficient than doing the other way around, since for each "signal" event that is read one needs to read in general many (> 20) pile-up events. Therefore, it is highly recommended that the user forces CRAB to submit the jobs to the sites where the pile-up dataset is hosted by whitelisting these sites using the parameter Site.whitelist in the CRAB configuration file. Note that one also needs to set Data.ignoreLocality = True in the CRAB configuration file in case of using a primary input dataset so to avoid CRAB doing data discovery and eventually complain (and fail to submit) that the input dataset is not available in the whitelisted sites. One can use DAS to get the list of sites that host a dataset.
>
>
Rationale and details:
The pile-up files have to be specified in the CMSSW parameter-set configuration file. There is no way yet to tell in the CRAB configuration file that one wants to use a pile-up dataset as a secondary input dataset. That means that CRAB doesn't know that the CMSSW code will want to access pile-up files; CRAB only knows about the primary input dataset (if any). This means that, assuming there is a primary input dataset, when CRAB does data discovery to figure out to which sites it should submit the jobs, it will only take into account the input dataset specified in the CRAB configuration file (in the Data.inputDataset parameter) and submit the jobs to sites where this dataset is hosted. If there is no primary input dataset, CRAB will submit the jobs to the less busy sites. In any case, if the pile-up files are not hosted at the execution sites, they will be accessed via AAA (Xrootd). But reading the "signal" events directly from the local storage and the pile-up events via AAA is more inefficient than doing it the other way around, since for each "signal" event that is read one needs to read in general many (> 20) pile-up events. Therefore, it is highly recommended that the user forces CRAB to submit the jobs to the sites where the pile-up dataset is hosted by whitelisting these sites using the parameter Site.whitelist in the CRAB configuration file. Note that one also needs to set Data.ignoreLocality = True in the CRAB configuration file in case of using a primary input dataset so to avoid CRAB doing data discovery and eventually complaining (and failing to submit) that the input dataset is not available in the whitelisted sites. One can use DAS to get the list of sites that host a dataset.
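A minimal sketch of the recommendation above (the site names are placeholders; whitelist the sites that actually host your pile-up dataset, e.g. as found with DAS):

# Sketch: steer jobs to the pile-up location rather than the signal location
config.Data.ignoreLocality = True
config.Site.whitelist = ['T2_XX_SiteA', 'T2_XX_SiteB']   # placeholder site names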
 
<!--/twistyPlugin-->
Added:
>
>
 

Miscellanea

How CRAB finds data in input datasets from DBS

Line: 904 to 969
 LFNs are names like /store/user/mario/myoutput; note that a directory is also a file name.

For example, for the LFN /store/user/username/myfile.root stored in T2_IT_Pisa you can do the following (make sure you did cmsenv before, so to use a new version of curl), where you can replace the first two lines with the values which are useful to you and simply copy/paste the long curl command:

Changed:
<
<
>
>
 site=T2_IT_Pisa
lfn=/store/user/username/myfile.root
Changed:
<
<
curl -ks "https://cmsweb.cern.ch/phedex/datasvc/perl/prod/lfn2pfn?node=${site}&lfn=${lfn}&protocol=srmv2" | grep PFN | cut -d "'" -f4
>
>
curl -ks "https://cmsweb.cern.ch/phedex/datasvc/perl/prod/lfn2pfn?node=${site}&lfn=${lfn}&protocol=srmv2" | grep PFN | cut -d "'" -f4
  which returns:
Changed:
<
<
srm://stormfe1.pi.infn.it:8444/srm/managerv2?SFN=/cms/store/user/username/myfile.root
>
>
srm://stormfe1.pi.infn.it:8444/srm/managerv2?SFN=/cms/store/user/username/myfile.root
 
<!--
To see full details, you call the PhEDEx API passing the following query data: {'protocol': 'srmv2', 'node': '<CMS-node-name>', 'lfn': '<logical-path-name>'}. For example, for the LFN /store/user/username/myfile.root stored in T2_IT_Pisa, the URL to query the API would be https://cmsweb.cern.ch/phedex/datasvc/xml/prod/lfn2pfn?protocol=srmv2&node=T2_IT_Pisa&lfn=/store/user/username/myfile.root, which would show an output in xml format as this:
Line: 928 to 997
 
-->

Before executing the gfal commands, make sure to have a valid proxy:

Changed:
<
<
voms-proxy-init -voms cms
>
>
voms-proxy-init -voms cms
 Enter GRID pass phrase for this identity:
Contacting voms2.cern.ch:15002 [/DC=ch/DC=cern/OU=computers/CN=voms2.cern.ch] "cms"...
Remote VOMS server contacted succesfully.
Line: 937 to 1010
  Created proxy in /tmp/x509up_u<user-id>.
Changed:
<
<
Your proxy is valid until
>
>
Your proxy is valid until
  The most useful gfal commands and their usage syntax for listing/removing/copying files/directories are in the examples below (it is recommended to unset the environment when executing gfal commands, i.e. to add env -i in front of the commands). See also the man entry for each command (man gfal-ls etc.):

List a (remote) path:

Changed:
<
<
env -i X509_USER_PROXY=/tmp/x509up_u$UID gfal-ls <physical-path-name-to-directory>
>
>
env -i X509_USER_PROXY=/tmp/x509up_u$UID gfal-ls <physical-path-name-to-directory>
  Remove a (remote) file:
Changed:
<
<
env -i X509_USER_PROXY=/tmp/x509up_u$UID gfal-rm <physical-path-name-to-file>
>
>
env -i X509_USER_PROXY=/tmp/x509up_u$UID gfal-rm <physical-path-name-to-file>
  Recursively remove a (remote) directory and all files in it:
Changed:
<
<
env -i X509_USER_PROXY=/tmp/x509up_u$UID gfal-rm -r <physical-path-name-to-directory>
>
>
env -i X509_USER_PROXY=/tmp/x509up_u$UID gfal-rm -r <physical-path-name-to-directory>
  Copy a (remote) file to a directory in the local machine:
Deleted:
<
<
env -i X509_USER_PROXY=/tmp/x509up_u$UID gfal-copy <physical-path-name-to-source-file> file://<absolute-path-to-local-destination-directory>
 
Changed:
<
<
Note: the starts with / therefore there are three consecutive / characters like file:///tmp/somefilename.root
>
>
env -i X509_USER_PROXY=/tmp/x509up_u$UID gfal-copy <physical-path-name-to-source-file> file://<absolute-path-to-local-destination-directory>
Note: the <absolute-path-to-local-destination-directory> starts with / therefore there are three consecutive / characters like file:///tmp/somefilename.root
 
<!--/twistyPlugin-->

Why are my jobs submitted to a site that I had explicitly blacklisted (not whitelisted)?

Line: 965 to 1046
 
<!--/twistyPlugin twikiMakeVisibleInline-->
There is a site overflow mechanism in place, which kicks in after CRAB submission. Sites are divided into regions of good WAN/xrootd connectivity (e.g. US, Italy, Germany, etc.); jobs queued at one site A for too long are then allowed to overflow to a well connected site B which does not host the requested input data but from where the data will be read over xrootd. The rationale is that even if those jobs were to fail because they could not read the data, or because of a problem at site B, they would be automatically resubmitted, so nothing is lost with respect to keeping those jobs idle in the queue waiting for free slots at site A. The site overflow can be turned off via the Debug.extraJDL CRAB configuration parameter :
Changed:
<
<
>
>
 config.section_("Debug")
Changed:
<
<
config.Debug.extraJDL = ['+CMS_ALLOW_OVERFLOW=False']
>
>
config.Debug.extraJDL = ['+CMS_ALLOW_OVERFLOW=False']
  Note: if you change this configuration option for an already-created task (for instance if you noticed a lot of job failures at a particular site and even after blacklisting the jobs keep going back), you can't simply change the option in the configuration and resubmit. You'll have to kill the existing task and make a new task to get the option to be accepted. You can't simply change it during resubmission.
<!--/twistyPlugin-->
Line: 982 to 1064
 There is a tool written in python called LumiList.py (available in the WMCore library; is the same code as cmssw/FWCore/PythonUtilities/python/LumiList.py ) that can be used to do lumi-mask arithmetics. The arithmetics can even be done inside the CRAB configuration file (that's the advantage of having the configuration file written in python). Below are some examples.

Example 1: A run range selection can be achieved by selecting from the original lumi-mask file the run range of interest.

Changed:
<
<
>
>
 from LumiList import LumiList

lumiList = LumiList(filename='my_original_lumi_mask.json')
lumiList.selectRuns([x for x in range(193093,193999+1)])
lumiList.writeJSON('my_lumi_mask.json')

Changed:
<
<
config.Data.lumiMask = 'my_lumi_mask.json'
>
>
config.Data.lumiMask = 'my_lumi_mask.json'
  Example 2: Use a new lumi-mask file that is the intersection of two other lumi-mask files.
Changed:
<
<
>
>
 from LumiList import LumiList

originalLumiList1 = LumiList(filename='my_original_lumi_mask_1.json')

Line: 1000 to 1085
newLumiList = originalLumiList1 & originalLumiList2
newLumiList.writeJSON('my_lumi_mask.json')
Changed:
<
<
config.Data.lumiMask = 'my_lumi_mask.json'
>
>
config.Data.lumiMask = 'my_lumi_mask.json'
  Example 3: Use a new lumi-mask file that is the union of two other lumi-mask files.
Changed:
<
<
>
>
 from LumiList import LumiList

originalLumiList1 = LumiList(filename='my_original_lumi_mask_1.json')

Line: 1011 to 1098
newLumiList = originalLumiList1 | originalLumiList2
newLumiList.writeJSON('my_lumi_mask.json')
Changed:
<
<
config.Data.lumiMask = 'my_lumi_mask.json'
>
>
config.Data.lumiMask = 'my_lumi_mask.json'
  Example 4: Use a new lumi-mask file that is the subtraction of two other lumi-mask files.
Changed:
<
<
>
>
 from LumiList import LumiList

originalLumiList1 = LumiList(filename='my_original_lumi_mask_1.json')

Line: 1022 to 1111
newLumiList = originalLumiList1 - originalLumiList2
newLumiList.writeJSON('my_lumi_mask.json')
Changed:
<
<
config.Data.lumiMask = 'my_lumi_mask.json'
>
>
config.Data.lumiMask = 'my_lumi_mask.json'
 </>
<!--/twistyPlugin-->

User quota in the CRAB scheduler machines

<!--/twistyPlugin twikiMakeVisibleInline-->
Each user has a home directory with 100GB of disk space in each of the scheduler machines (schedd for short) assigned to CRAB3 for submitting jobs to the Grid. Whenever a task is submitted by the CRAB server to a schedd, a task directory is created in this space, containing among other things the CRAB libraries and scripts needed to run the jobs. Log files from Condor/DAGMan and CRAB itself are also placed there. (What is not available in the schedds are the cmsRun log files, except for the snippet available in the CRAB job log file.) As guidance, a task with 100 jobs uses on average 50MB of space, but this number depends a lot on the number of resubmissions, since each resubmission produces its own log files. If a user reaches his/her quota in a given schedd, he/she will not be able to submit more jobs via that schedd (he/she may still be able to submit via another schedd, but since the user can not choose the schedd to which to submit -the choice is done by the CRAB server-, he/she would have to keep trying the submission until the task goes to a schedd with non-exhausted quota). To avoid that, task directories are automatically removed from the schedds 30 days after their last modification. If a user reaches 50% of his/her quota in a given schedd, an automatic e-mail similar to the one shown below is sent to him/her.
Changed:
<
<
>
>
 Subject: WARNING: Reaching your quota

Dear analysis user ,

Line: 1043 to 1133
https://twiki.cern.ch/twiki/bin/view/CMSPublic/SWGuideCrabFaq#Disk_space_for_output_files
If you have any questions, please contact hn-cms-computing-tools(AT)cern.ch
Regards,
Changed:
<
<
CRAB support
>
>
CRAB support
  This e-mail has a link (https://twiki.cern.ch/twiki/bin/view/CMSPublic/SWGuideCrabFaq#How_to_clean_up_your_directory_i) to the instructions on how to clean up space in the user's home directory in a schedd. A user can follow the instructions in that page, or alternatively use the crab purge command.
<!--/twistyPlugin-->
Line: 1054 to 1145
 
<!--/twistyPlugin twikiMakeVisibleInline-->
To overcome the CRAB3 vs CMSSW environment conflicts, you can use the following script available in CVMFS (/cvmfs/cms.cern.ch/crab3/crab-env-bootstrap.sh) without need to source the CRAB3 environment. You could do something like this:
Changed:
<
<
>
>
cmsenv
# DO NOT setup the CRAB3 environment
alias crab='/cvmfs/cms.cern.ch/crab3/crab-env-bootstrap.sh'
crab submit
crab status
...
Changed:
<
<
# check that you can run cmsRun locally
>
>
# check that you can run cmsRun locally
  Details:
Line: 1074 to 1167
 
<!--/twistyPlugin twikiMakeVisibleInline-->
The problematic pset is FWCore/ParameterSet/Mixins.py from CMSSW:
Deleted:
<
<
[line 714] p = tLPTest("MyType",** { "a"+str(x): tLPTestType(x) for x in xrange(0,300) } )
 
Changed:
<
<
This uses dictionary comprehensions, a feature available in python > 2.7. While CMSSW (setup via cmsenv) uses python > 2.7, CRAB (setup via /cvmfs/cms.cern.ch/crab3/crab.sh) still uses python 2.6.8. To overcome this problem, don't setup the CRAB environment and instead use the crab-env-bootstrap.sh script (see this FAQ).
>
>
[line 714] p = tLPTest("MyType",** { "a"+str(x): tLPTestType(x) for x in xrange(0,300) } )

This uses dictionary comprehensions, a feature available only in python 2.7 and later. While CMSSW (set up via cmsenv) uses python 2.7 or later, CRAB (set up via /cvmfs/cms.cern.ch/crab3/crab.sh) still uses python 2.6.8. To overcome this problem, don't set up the CRAB environment and instead use the crab-env-bootstrap.sh script (see this FAQ).

 
<!--/twistyPlugin-->

Revision 1002019-02-25 - LeonardoCristella

Line: 1 to 1
 
META TOPICPARENT name="SWGuideCrab"
<!-- /ActionTrackerPlugin -->
Line: 12 to 12
 pre.note {background-color: white;}
Changed:
<
<
CRAB Logo
>
>
CRAB Logo
 

CRAB3 Frequently Asked Questions

Line: 17 to 17
 

CRAB3 Frequently Asked Questions

Complete: 3 Go to SWGuideCrab
Deleted:
<
<
 Help Notice: This is a large page, it works best if you search in it for your problem using the browser search function.
By default all answers are collapsed and search only uses the questions text. If you do not find what you need, you can use the buttons below to search inside answers as well:
Changed:
<
<
 
>
>
 
Changed:
<
<
Help if you still have problems, check the CRAB3Troubleshoot guide before asking for support
>
>
Help if you still have problems, check the CRAB3Troubleshoot guide before asking for support
 
Contents:
Line: 34 to 34
Therefore you should remove credentials from myproxy and then issue the crab command again. To remove stale credentials:
Changed:
<
<
>
>
 grep myproxy-info /crab.log
Changed:
<
<
# example: grep myproxy-info crab_20160308_140433/crab.log
>
>
# example: grep myproxy-info crab_20160308_140433/crab.log
 you will get something like
Changed:
<
<
 command: myproxy-info -l ec95456d3589ed395dc47d3ada8c94c67ee588f1 -s myproxy.cern.ch
>
>
 command: myproxy-info -l ec95456d3589ed395dc47d3ada8c94c67ee588f1 -s myproxy.cern.ch
  then simply issue a myproxy-destroy command with same arguments:
Changed:
<
<
>
>
 # example. In real life replace the long hex string with the one from your crab.log
Changed:
<
<
myproxy-destroy -l ec95456d3589ed395dc47d3ada8c94c67ee588f1 -s myproxy.cern.ch
>
>
myproxy-destroy -l ec95456d3589ed395dc47d3ada8c94c67ee588f1 -s myproxy.cern.ch
 
Changed:
<
<
If things still fail after than, send the following additional info in your request for support, replacing the long hex string with the one that you found in crab.log (ec95456d3589ed395dc47d3ada8c94c67ee588f1 in the above example):
>
>
If things still fail after that, send the following additional info in your request for support, replacing the long hex string with the one that you found in crab.log (ec95456d3589ed395dc47d3ada8c94c67ee588f1 in the above example):
 
  • output of voms-proxy-info -all
Changed:
<
<
  • output of myproxy-info -d -l <long-hex-string> -s myproxy.cern.ch
>
>
  • output of myproxy-info -d -l -s myproxy.cern.ch
 
  • content of your crab.log as an attachment
Deleted:
<
<
 
<!--/twistyPlugin-->

CRAB setup

Line: 61 to 61
 

Does CRAB setup conflict with CMSSW setup

<!--/twistyPlugin twikiMakeVisibleInline-->
Changed:
<
<
No. CRAB client runs within the CMSSW environment.
>
>
No. CRAB client runs within the CMSSW environment.
Make sure you always do cmsenv before sourcing /cvmfs/cms.cern.ch/crab3/crab.sh
<!--/twistyPlugin-->
Line: 80 to 81
 
<!--/twistyPlugin twikiMakeVisibleInline-->
CRAB requests by default a maximum memory of 2000 MB. This is the maximum memory per core that all sites guarantee they will run. Some sites, but not many, offer a bit more (typically 2500 MB); and some sites even offer 4000 MB for special users. The memory limits offered by each site are accessible in the GlideinWMS VO Factory Monitor page, http://glidein.grid.iu.edu/factory/monitor/ (choose "Current Status of the Factory" and click on a site CE listed under "Entry Name" in the table), but this should not be considered official documentation. The best advice we can give is: stick to the default, and if you think you need more, first find out if there are sites (and which ones) which can run such jobs. If you need help, you can write to us.
Changed:
<
<
note.gif Note: In case of a multi-threaded job (config.JobType.numCores > 1) most likely the default memory value is not enough. The user share of computing resources accounts for the requested memory per core.
>
>
note.gif Note: In case of a multi-threaded job (config.JobType.numCores > 1) most likely the default memory value is not enough. The user share of computing resources accounts for the requested memory per core.
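As an illustration, a configuration fragment requesting more cores together with a larger total memory budget could look like the sketch below; the numbers are purely indicative and, as explained above, you should first check which sites can actually run such jobs.

# Illustrative values only: a multi-threaded job with a total memory request sized to the number of cores.
config.JobType.numCores = 4
config.JobType.maxMemoryMB = 8000   # total memory for the whole job, i.e. roughly 2000 MB per core here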
 
<!--/twistyPlugin-->
Line: 90 to 91
 The Data.splitting parameter has now a default value: 'Automatic'.
With such a setting the task processing is split into three stages:
  1. A "probe" stage, where some probe jobs are submitted to estimate the event throughput of the CMSSW parameter-set configuration provided by the user in the JobType.psetName parameter and possible further arguments. Probe jobs have a job id of the form 0-[1,2,3,...], they can not be resubmitted and the task will fail if none of the probe jobs complete successfully.
Changed:
<
<
  1. A "main" stage, very similar to the conventional stage for other splitting modes, in which a number of main jobs (automatically determined by the probe stage) will process the dataset. These jobs can not be manually resubmitted and have a fixed maximum runtime (specified in the Data.unitsPerJob parameter), after which they gracefully stop processing input data. The remaining data will be processed in the next stage (tail) and their jobs labelled as "rescheduled" in the main stage (in the dashboard they will always appear as "failed").
  2. Some possible "tail" stages. If some main job does not finish successfully ("rescheduled" in the previous stage) or does not completely process the amount of data assigned to it due to the automatically configured maximum job run time, tail jobs are created and submitted in order to fully process the dataset. Tail jobs have a job id of the form n-[1,2,3,...], where n=1,2,... represents the tail stage number. For small tasks, less than 100 jobs, one tail stage is started when all jobs have completed (successfully or failed). For larger tasks, a first tail stage collects all remaining input data from the first 50% of completed jobs, followed by a stage that processes data when 80% of jobs have completed, and finally a stage collecting leftover input data at 100% job completion.
    Failed tail jobs can be manually resubmitted by the users.
>
>
  1. A "main" stage, very similar to the conventional stage for other splitting modes, in which a number of main jobs (automatically determined by the probe stage) will process the dataset. These jobs can not be manually resubmitted and have a fixed maximum runtime (specified in the Data.unitsPerJob parameter), after which they gracefully stop processing input data. The remaining data will be processed in the next stage (tail) and their jobs labelled as "rescheduled" in the main stage (in the dashboard they will always appear as "failed").
  2. Some possible "tail" stages. If some main job does not finish successfully ("rescheduled" in the previous stage) or does not completely process the amount of data assigned to it due to the automatically configured maximum job run time, tail jobs are created and submitted in order to fully process the dataset. Tail jobs have a job id of the form n-[1,2,3,...], where n=1,2,... represents the tail stage number. For small tasks, less than 100 jobs, one tail stage is started when all jobs have completed (successfully or failed). For larger tasks, a first tail stage collects all remaining input data from the first 50% of completed jobs, followed by a stage that processes data when 80% of jobs have completed, and finally a stage collecting leftover input data at 100% job completion.
    Failed tail jobs can be manually resubmitted by the users.
  Once the probe stage is completed, the plain crab status command shows only the main and tail jobs. For the list of all jobs add the --long option. </>
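A minimal configuration sketch for this splitting mode is shown below; Data.unitsPerJob is here understood as the target runtime of each main job in minutes (the value is only an example), and it can also be omitted to let CRAB use its default.

# Minimal sketch of automatic splitting; the runtime target is illustrative.
config.Data.splitting = 'Automatic'
config.Data.unitsPerJob = 360   # target runtime (minutes) of each main job; probe and tail jobs are managed by CRAB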
<!--/twistyPlugin-->
Line: 141 to 143
 One can use the crab purge command to delete from the CRAB cache files associated to a given task. Actually, crab purge deletes only user input sandboxes (because there is no API to delete other files), but since they are supposed to be the main space consumers in the CRAB cache, this should be enough. If for some reason the crab purge command does not work, one can alternatively use the REST interface of the crabcache component. Instructions oriented for CRAB3 operators can be found here. Jordan Tucker has written the following script based on these instructions that removes all the input sandboxes from the user CRAB cache area (a valid proxy and the CRAB environment are required):

Show Hide script
<!--/twistyPlugin twikiMakeVisibleInline-->
Changed:
<
<
>
>
 #!/usr/bin/env python

import json

Line: 217 to 220
  if '.log' in x: continue print 'remove', x
Changed:
<
<
h.fileremove(x)
>
>
h.fileremove(x)
 
<!--/twistyPlugin-->

note.gif Note: Once a task has been submitted, one can safely delete the input sandbox from the CRAB cache, as the sandbox is transferred to the worker nodes from the schedulers.

Line: 247 to 250
With CRAB3 this should not be any different than with CRAB2. CRAB will look up the user's username registered in SiteDB (which is the username of the CERN primary account), using for the query the user's DN (which in turn is extracted from the user's credentials), and will try to stage out to /store/user/<username>/ (by default). If the store user area uses a different username, it's up to the destination site to remap that (via a symbolic link or something similar). The typical case is Fermilab; to request the mapping of the store user area, FNAL users should follow the directions on the usingEOSatLPC web page to open a ServiceNow ticket to get this fixed.

To prevent stage out failures, and in case the user has provided in the Data.outLFN parameter of the CRAB configuration file an LFN directory path of the kind /store/user/[<some-username>/<subdir>*] (i.e. a store path that starts with /store/user/), CRAB will check if some-username matches with the user's username extracted from SiteDB. If it doesn't, it will give an error message and not submit the task. The error message would be something like this:

Changed:
<
<
>
>
 Error contacting the server. Server answered with: Invalid input parameter
Changed:
<
<
Reason is: The parameter Data.outLFN in the CRAB configuration file must start with either '/store/user//' or '/store/group//' (or '/store/local//' if publication is off), wher...
>
>
Reason is: The parameter Data.outLFN in the CRAB configuration file must start with either '/store/user//' or '/store/group//' (or '/store/local//' if publication is off), wher...
Unfortunately the "Reason is:" message is cut at 200 characters. The message should read:
Changed:
<
<
Reason is: The parameter Data.outLFN in the CRAB configuration file must start with either '/store/user/<username>/' or '/store/group/<groupname>/' (or '/store/local/<something>/' if publication is off), where username is your username as registered in SiteDB (i.e. the username of your CERN primary account).
>
>
Reason is: The parameter Data.outLFN in the CRAB configuration file must start with either '/store/user/<username>/' or '/store/group/<groupname>/' (or '/store/local/<something>/' if publication is off), where username is your username as registered in SiteDB (i.e. the username of your CERN primary account).
  A similar message should be given by crab checkwrite if the user does crab checkwrite --site=<CMS-site-name> --lfn=/store/user/<some-username>. </>
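As a hedged illustration, using the parameter name Data.outLFN as it appears in this FAQ and a made-up username, a consistent configuration would look like the fragment below.

# 'jdoe' stands for your username as registered in SiteDB (the username of your CERN primary account).
config.Data.outLFN = '/store/user/jdoe/myanalysis'
config.Site.storageSite = 'T2_XX_SomeSite'   # hypothetical destination storage site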
<!--/twistyPlugin-->
Line: 266 to 267
 
<!--/twistyPlugin twikiMakeVisibleInline-->
First of all, does CRAB know at all that the job should produce the output file in question? To check that, open one of the job log files linked from the task monitoring pages. Very close to the top, the list of output files that CRAB expects to see once the job finishes is printed (shown below is the case of job number 1 in the task):
Changed:
<
<
>
>
==== HTCONDOR JOB SUMMARY at ... START ====
CRAB ID: 1
Execution site: ...
Current hostname: ...
Destination site: ...
Changed:
<
<
Output files: my_output_file.root=my_output_file_1.root
>
>
Output files: my_output_file.root=my_output_file_1.root
  If the output file in question doesn't appear in that list, then CRAB doesn't know about it, and of course it will not be transferred. This doesn't mean that the output file was not produced; it is simply that CRAB has to know beforehand what are the output files that the job produces.

If the output file is produced by either PoolOutputModule or TFileService, CRAB will automatically recognize the name of the output file when the user submits the task and it will add the output file name to the list of expected output files. On the other hand, if the output file is produced by any other module, the user has to specify the output file name in the CRAB configuration parameter JobType.outputFiles in order for CRAB to know about it. Note that this parameter takes a python list, so the right way to specify it is:

Added:
>
>
config.JobType.outputFiles = ['my_output_file.root']
 
Deleted:
<
<
config.JobType.outputFiles = ['my_output_file.root']
 
<!--/twistyPlugin-->

Can I delete a dataset I published in DBS?

Line: 319 to 317
 crab checkusername uses the following sequence of bash commands, which you should try to execute one by one (make sure you have a valid proxy) to check if they return what is expected.

1) It gets the path to the user's proxy file with the command

Changed:
<
<
which scram >/dev/null 2>&1 && eval `scram unsetenv -sh`; voms-proxy-info -path
>
>
which scram >/dev/null 2>&1 && eval `scram unsetenv -sh`; voms-proxy-info -path
  which should return something like
Changed:
<
<
/tmp/x509up_u57506
>
>
/tmp/x509up_u57506
  2) It defines the path to the CA certificates directory with the following python command
Changed:
<
<
>
>
 import os capath = os.environ['X509_CERT_DIR'] if 'X509_CERT_DIR' in os.environ else "/etc/grid-security/certificates"
Changed:
<
<
print capath
>
>
print capath
  which should be equivalent to the following bash command
Changed:
<
<
>
>
 if [ "x$X509_CERT_DIR" = "x" ]; then capath=$X509_CERT_DIR; else capath=/etc/grid-security/certificates; fi
Changed:
<
<
echo $capath
>
>
echo $capath
  and which in lxplus should result in
Changed:
<
<
/etc/grid-security/certificates
>
>
/etc/grid-security/certificates
  3) It uses the proxy file and the capath to query https://cmsweb.cern.ch/sitedb/data/prod/whoami
Changed:
<
<
curl -s --capath <output-from-command-2-above> --cert <output-from-command-1-above> --key <output-from-command-1-above> 'https://cmsweb.cern.ch/sitedb/data/prod/whoami'
>
>
curl -s --capath <output-from-command-2-above> --cert <output-from-command-1-above> --key <output-from-command-1-above> 'https://cmsweb.cern.ch/sitedb/data/prod/whoami'
  which should return something like
Changed:
<
<
>
>
 {"result": [ {"dn": "/DC=ch/DC=cern/OU=Organic Units/OU=Users/CN=atanasi/CN=710186/CN=Andres Jorge Tanasijczuk", "login": "atanasi", "method": "X509Proxy", "roles": {"operator": {"group": ["crab3"], "site": []}}, "name": "Andres Jorge Tanasijczuk"}
Changed:
<
<
]}
>
>
]}
  4) Finally it parses the output from the above query to extract the username from the "login" field (in my case it is atanasi).
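Step 4 can be reproduced by hand with a few lines of python; the sketch below assumes you saved the JSON answer from step 3 into a file named whoami.json (a hypothetical name).

# Minimal sketch: extract the SiteDB username (the "login" field) from the answer obtained in step 3.
import json
with open('whoami.json') as fd:        # hypothetical file holding the output of the curl command above
    reply = json.load(fd)
print(reply['result'][0]['login'])     # e.g. 'atanasi' in the example above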
Line: 378 to 360
 
<!--/twistyPlugin twikiMakeVisibleInline-->
When doing crab submit the user may get this error messages:
Changed:
<
<
>
>
 Error contacting the server. Server answered with: Invalid input parameter
Changed:
<
<
Reason is: User quota limit reached; cannot upload the file
>
>
Reason is: User quota limit reached; cannot upload the file
  Error explanation: The user has reached the limit of 4.88GB in its CRAB cache area. Read more in this FAQ.
Line: 395 to 375
 
<!--/twistyPlugin twikiMakeVisibleInline-->
Typical error in crab status:
Changed:
<
<
>
>
 Failure message: The CRAB server backend was not able to (re)submit the task, because the Grid scheduler answered with an error. This is probably a temporary glitch. Please try again later. If the error persists send an e-mail to hn-cms-computing-tools@cern.ch<mailto:hn-cms-computing-tools@cern.ch>. Error reason: Trapped exception in Dagman.Fork: Unable to edit jobs matching constraint File "/data/srv/TaskManager/3.3.1512.rc6/slc6_amd64_gcc481/cms/crabtaskworker/3.3.1512.rc6/lib/python2.6/site-packages/TaskWorker/Actions/DagmanResubmitter.py", line 113, in executeInternal
Changed:
<
<
schedd.edit(rootConst, "HoldKillSig", 'SIGKILL')
>
>
schedd.edit(rootConst, "HoldKillSig", 'SIGKILL')
  As the error message says, this should be a temporary failure. One should just keep trying until it works. But after doing crab resubmit, give it some time to process the resubmission request; it may take a couple of minutes to see the jobs reacting to the resubmission.
<!--/twistyPlugin-->
Line: 409 to 387
 
<!--/twistyPlugin twikiMakeVisibleInline-->
After doing crab submit and crab status the user may get this error message:
Changed:
<
<
>
>
 Task status: UNKNOWN
Changed:
<
<
Error during task injection: Task failed to bootstrap on schedd
>
>
Error during task injection: Task failed to bootstrap on schedd
  Error explanation: The submission of the task to the scheduler machine has failed.
Line: 426 to 401
 
<!--/twistyPlugin twikiMakeVisibleInline-->
When doing crab status the user may get one of these error messages:
Changed:
<
<
>
>
 Error during task injection: <task-name>: Failed to contact Schedd: Failed to fetch ads from schedd.
Deleted:
<
<

Error during task information retrieval:        <task-name>: Failed to contact Schedd: .
 
Changed:
<
<
Error explanation: This is a temporary communication error with the scheduler machine (submission node), most probably because the scheduler is overloaded.
>
>
Error during task information retrieval: <task-name>: Failed to contact Schedd: .
Error explanation: This is a temporary communication error with the scheduler machine (submission node), most probably because the scheduler is overloaded.
  What to do: Try again after a couple of minutes.
<!--/twistyPlugin-->
Line: 466 to 435
 Some more discussion is in this thread: https://hypernews.cern.ch/HyperNews/CMS/get/computing-tools/2928.html

There are a few datasets in DBS which do not satisfy this limit. If someone really needs to process those, the only way is to do one job per file using the userInputFiles feature of CRAB. An annotated example of how to do this in python is below; note that you have to disable DBS publication, indicate splitting by file and provide the input file locations. Other configuration parameters can be set as usual:

Changed:
<
<
>
>
  # this will use CRAB client API from RawCommand import crabCommand
Line: 520 to 488
  result = crabCommand('submit', config = config)
Changed:
<
<
print (result)
>
>
print (result)
 
<!--/twistyPlugin-->
Line: 537 to 504
Those datasets can only be processed if CRAB can ignore the lumi-list information, i.e. using config.Data.splitting = 'FileBased' and avoiding any extra request which would eventually result in the need to use lumi information. This means no run range, no lumi mask, and no secondary dataset (since CRAB would need to use lumi info to match input files from the two datasets). Note that useParent is allowed, since in that case CRAB uses parentage information stored in DBS to match input files.

In practice your crabConfig file must have:

Changed:
<
<
>
>
config.Data.splitting = 'FileBased'
config.Data.runRange = ''
Changed:
<
<
config.Data.lumiMask = ''
>
>
config.Data.lumiMask = ''
  ( the paremeters with an assigned null value `` can be omitted, but if present must indicate the null string )

and must NOT contain the following parameter

Changed:
<
<
config.Data.secondaryInputDataset
>
>
config.Data.secondaryInputDataset
 
<!--/twistyPlugin-->
Line: 561 to 523
It is important that you as a user are prepared for this to happen and know how to remain productive in your physics analysis with the least effort. While there is a long tradition of "resubmit them until they work", this is hardly useful any more. And while we can't prevent users from trying, we cannot guarantee that it will work, nor that resubmitted jobs will succeed.
Changed:
<
<
In the case where the missing data sample is important, the best recommendation we can give to users is to
USE GENERAL/GENERIC RESCUE PROCEDURES, RATHER THEN TRY AT ALL COSTS TO REVIVE A DEAD TASK.
>
>
In the case where the missing data sample is important, the best recommendation we can give to users is to
USE GENERAL/GENERIC RESCUE PROCEDURES, RATHER THAN TRY AT ALL COSTS TO REVIVE A DEAD TASK.
 We will always welcome problem reports and will try to improve when resubmission failures can be due to CRAB internals, but surely you do not want to hold your breath in the meanwhile.

The safest path is therefore:

Changed:
<
<
  1. let running jobs die or complete and dust settle
  2. use crab klill to make sure everything stops
  3. take stock of what's published in DBS at that point and make sure that it matches what's on disk
>
>
  1. let running jobs die or complete and dust settle
  2. use crab kill to make sure everything stops
  3. take stock of what's published in DBS at that point and make sure that it matches what's on disk
 
    • if your output is not in DBS, you can use crab report; but while DBS information is available forever, crab commands on a specific task may not be
Changed:
<
<
  1. assess whether it is more important to get the last percentage of statistics or go on with other work. Do you really need 100% completion in this task ?
  2. if full statistics is needed, create a recovery task for the missing lumis and run it writing to the same dataset as the original task.
>
>
  1. assess whether it is more important to get the last percentage of statistics or go on with other work. Do you really need 100% completion in this task ?
  2. if full statistics is needed, create a recovery task for the missing lumis and run it writing to the same dataset as the original task.
 
Changed:
<
<
Recovery task is an important concept that can be useful in many circumstances. Please find instructions in this FAQ
>
>
Recovery task is an important concept that can be useful in many circumstances. Please find instructions in this FAQ
At the other extreme there is: forget about this task and simply submit a new one with a new output dataset. In between lies a murky land where many recipes may be more efficient depending on the details, but no simple general rule can be given and there is space for individual creativity and/or desperation.
Line: 584 to 545
 
<!--/twistyPlugin twikiMakeVisibleInline-->
When doing crab submit the user may get one of these error messages:
Changed:
<
<
>
>
 Syntax error in CRAB configuration: invalid syntax (<CRAB-configuration-file-name>.py, <line-where-error-occurred>)
Deleted:
<
<
 
Deleted:
<
<
 Syntax error in CRAB configuration:
Changed:
<
<
'Configuration' object has no attribute '<attribute-name>'

Error explanation: The CRAB configuration file could not be loaded, because there is a syntax error somewhere in it.

>
>
'Configuration' object has no attribute '<attribute-name>'
Error explanation: The CRAB configuration file could not be loaded, because there is a syntax error somewhere in it.
  What to do: Check the CRAB configuration file and fix it. There could be a misspelled parameter or section name, or you could be trying to use a configuration attribute (parameter or section) that was not defined. To get more details on where the error occurred, do:
Changed:
<
<
>
>
 python
Changed:
<
<
import <CRAB-configuration-file-name> #without the '.py'
>
>
import <CRAB-configuration-file-name> #without the '.py'
  which gives:
Changed:
<
<
>
>
 Traceback (most recent call last): File "", line 1, in File "<CRAB-configuration-file-name>.py", <line-where-error-occurred> <error-python-code>
Changed:
<
<
^
>
>
^
  or
Changed:
<
<
>
>
 Traceback (most recent call last): File "", line 1, in File "<CRAB-configuration-file-name>.py", <line-where-error-occurred>, in <error-python-code>
Changed:
<
<
AttributeError: 'Configuration' object has no attribute '<attribute-name>'
>
>
AttributeError: 'Configuration' object has no attribute '<attribute-name>'
  For more information about the CRAB configuration file, see CRAB3ConfigurationFile.
<!--/twistyPlugin-->
Line: 645 to 595
 You should inspect the stdout of one job to find the exception message and traceback which may guide you to the solution.

A particular case is when the exception says An exception of category 'DictionaryNotFound' occurred, like in this example:

Changed:
<
<
>
>
 
Begin Fatal Exception 08-Jun-2017 18:18:04 CEST-----------------------
An exception of category 'DictionaryNotFound' occurred while
   [0] Constructing the EventProcessor
Exception Message:
No Dictionary for class: 'edm::Wrapper<edm::DetSetVector >'
Changed:
<
<

End Fatal Exception -------------------------------------------------
>
>

End Fatal Exception -------------------------------------------------
in this case, most likely the input data have been produced with a CMSSW version not compatible with the one used in the CRAB job. In general, reading data with a release older than the one they were produced with is not supported.

To find out which release was used to produce a given dataset or file, adapt the following examples to your situation:

Changed:
<
<
>
>
belforte@lxplus045/~> dasgoclient --query "release dataset=/DoubleMuon/Run2016C-18Apr2017-v1/AOD"
["CMSSW_8_0_28"]
belforte@lxplus045/~>
Deleted:
<
<
 
Deleted:
<
<
 belforte@lxplus045/~> dasgoclient --query "release file=/store/data/Run2016C/DoubleMuon/AOD/18Apr2017-v1/100001/56D1FA6E-D334-E711-9967-0025905A48B2.root" ["CMSSW_8_0_28"]
Changed:
<
<
belforte@lxplus045/~>
>
>
belforte@lxplus045/~>
 </>
<!--/twistyPlugin-->

Exit code 8028

Line: 685 to 630
  Leaving the AAA error aside for a while, the first thing to contemplate here is to understand why the file could not be loaded from the local storage. Was it because the file is not available at the execution site? And if so, was it supposed to be available? If not, can you force CRAB to submit the jobs to the site(s) where the file is hosted? CRAB should do that automatically if the input file is one from the input dataset specified in the CRAB configuration parameter Data.inputDataset, unless you have set Data.ignoreLocality = True, or except in cases like using a (secondary) pile-up dataset. If yours is the last case, please read Using pile-up in this same twiki.
Changed:
<
<
If you intentionally wanted (and had a good reason) to run jobs reading the input files via AAA, then yes, we have to care about why AAA failed. Since AAA must be able to access any CMS site, the next thing is to discard a transient problem: you can submit your jobs again and see if the error persists. Ultimately, you should write to the Computing Tools HyperNews forum (this is a forum for all kind of issues with CMS computing tools, not only CRAB) following these instructions.
>
>
If you intentionally wanted (and had a good reason) to run jobs reading the input files via AAA, then yes, we have to care about why AAA failed. Since AAA must be able to access any CMS site, the next thing is to discard a transient problem: you can submit your jobs again and see if the error persists. Ultimately, you should write to the Computing Tools HyperNews forum (this is a forum for all kind of issues with CMS computing tools, not only CRAB) following these instructions.
 </>
<!--/twistyPlugin-->
Deleted:
<
<
 

Exit code 50660

<!--/twistyPlugin twikiMakeVisibleInline-->
Line: 704 to 648
 
<!--/twistyPlugin twikiMakeVisibleInline-->
The most common reasons for this error are:
  1. The user is trying to analyze an input dataset, but he/she has specified in the CRAB configuration file JobType.pluginName = 'PrivateMC' instead of JobType.pluginName = 'Analysis'.
Changed:
<
<
  1. The user is generating MC events, correctly specifying in the CRAB configuration file JobType.pluginName = 'PrivateMC', but in the CMSSW parameter set configuration he/she has specified a source of type PoolSource. The solution is to not specify a PoolSource. Note: This doesn't mean to remove process.source completely, as this attribute must be present. One could set process.source = cms.Source("EmptySource") if no input source is used.
>
>
  1. The user is generating MC events, correctly specifying in the CRAB configuration file JobType.pluginName = 'PrivateMC', but in the CMSSW parameter set configuration he/she has specified a source of type PoolSource. The solution is to not specify a PoolSource. Note: This doesn't mean to remove process.source completely, as this attribute must be present. One could set process.source = cms.Source("EmptySource") if no input source is used.
 
<!--/twistyPlugin-->
Deleted:
<
<
 

[ERROR] Operation expired

<!--/twistyPlugin twikiMakeVisibleInline-->
Line: 711 to 654
 

[ERROR] Operation expired

<!--/twistyPlugin twikiMakeVisibleInline-->
Changed:
<
<
Some sites configuration can not handle remote access of large files (> 10 GB) and XRootD fails with a message like
XrdAdaptor::RequestManager::requestFailure Open(name='root://xrootd.echo.stfc.ac.uk//store/data/Run2017F/ZeroBias2/RAW-RECO/05Apr2018-v1/30000/0A7DD85F-0439-E811-BDF8-0CC47AACFCDE.root', flags=0x10, permissions=0660, old source=xrootd.echo.stfc.ac.uk:1094 (site T1_UK_RAL)) => timeout when waiting for file open
>
>
Some sites' storage configuration cannot handle remote access of large files (> 10 GB), and XRootD fails with a message like
 
Changed:
<
<
Additional Info:
>
>
== CMSSW:    [1] Reading branch EventAuxiliary
== CMSSW:    [2] Calling XrdFile::readv()
== CMSSW:    Additional Info:

      [a] Original error: '[ERROR] Operation expired' (errno=0, code=206, source=xrootd.echo.stfc.ac.uk:1094 (site T1_UK_RAL)).

 
Deleted:
<
<
[a] Original error: '[ERROR] Operation expired' (errno=0, code=206, source=xrootd.echo.stfc.ac.uk:1094 (site T1_UK_RAL)).
 As of winter 2019 this almost only happens for files stored at T1_UK_RAL. If you are in this situation, a way out is to submit a new task using CMSSW ≥ 10_4_0 with the following duplicateCheckMode option in the PSet PoolSource
Changed:
<
<
>
>
process.source = cms.Source("PoolSource",
   [...]
   duplicateCheckMode = cms.untracked.string("noDuplicateCheck")
Changed:
<
<
)
>
>
)
  When that is not an option and the problem is persistent, you may need to ask for a replica of the data at another site.
Line: 741 to 686
 The general problem is that CMSSW parameter-set configurations don't like to be loaded twice. In that respect, each time the CRAB client loads a CMSSW configuration, it saves it in a local (temporary) cache identifying the loaded module with a key constructed out of the following three pieces: the full path to the module and the python variables sys.path and sys.argv.

A problem arises when the CRAB configuration parameter JobType.pyCfgParams is used. The arguments in JobType.pyCfgParams are added by CRAB to sys.argv, affecting the value of the key that identifies a CMSSW parameter-set in the above mentioned cache. And that's in principle fine, as changing the arguments passed to the CMSSW parameter-set may change the event processor. But when a python process has to do more than one submission (like the case of multicrab for multiple submissions), the CMSSW parameter-set is loaded again every time the JobType.pyCfgParams is changed and this may result in "duplicate process" errors. Below are two examples of these kind of errors:

Changed:
<
<
>
>
 CmsRunFailure CMSSW error message follows. Fatal Exception An exception of category 'Configuration' occurred while
Changed:
<
<
† †[0] Constructing the EventProcessor † †[1] Constructing module: class=...... label=......
>
>
[0] Constructing the EventProcessor [1] Constructing module: class=...... label=......
 Exception Message: Duplicate Process The process name ...... was previously used on these products. Please modify the configuration file to use a distinct process name.
Deleted:
<
<
 
Deleted:
<
<
 CmsRunFailure CMSSW error message follows. Fatal Exception
Line: 765 to 707
 in vString categories duplication of the string ...... The above are from MessageLogger configuration validation. In most cases, these involve lines that the logger configuration code
Changed:
<
<
would not process, but which the cfg creator obviously meant to have effect.
>
>
would not process, but which the cfg creator obviously meant to have effect.
 One option would be to try to not use JobType.pyCfgParams. But if this is not possible, the more general ad-hoc solution would be to fork the submission into a different python process. For example, if you are doing something like documented in Multicrab using the crabCommand API then we suggest to replace each
Changed:
<
<
submit(config)
>
>
submit(config)
  by
Changed:
<
<
>
>
 from multiprocessing import Process p = Process(target=submit, args=(config,)) p.start()
Changed:
<
<
p.join()
>
>
p.join()
  (Of course, from multiprocessing import Process needs to be executed only once, so put it outside any loop.)
<!--/twistyPlugin-->
Line: 834 to 772
 
  1. submit a new CRAB task B which processes the missing lumis (listB = listIn - listA)

Details are slightly different depending on whether you published your output in DBS or not (a short sketch of the lumi arithmetic is given after the two alternatives below):

Changed:
<
<
output in DBS
follow the procedure in this FAQ
output not in DBS
follow the procedure in this Workbook example
>
>
output in DBS
follow the procedure in this FAQ
output not in DBS
follow the procedure in this Workbook example
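Whichever procedure applies, the core of the arithmetic (listB = listIn - listA) can be sketched with LumiList as below; the file names are placeholders for the lumi-mask you originally targeted and for the lumis reported as successfully processed by task A (e.g. by crab report).

# Minimal sketch: lumi-mask for recovery task B = lumis task A was meant to process minus lumis it actually processed.
from LumiList import LumiList

listIn = LumiList(filename='original_target_lumis.json')    # placeholder: what task A was supposed to process
listA  = LumiList(filename='taskA_processed_lumis.json')    # placeholder: what task A actually processed
listB  = listIn - listA
listB.writeJSON('recovery_task_lumi_mask.json')             # use this as config.Data.lumiMask for task B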
 
<!--/twistyPlugin-->
Line: 845 to 783
While data taking is progressing, the corresponding datasets in DBS and lumi-mask files keep growing. Also, data quality is sometimes improved for already existing data, leading to updated lumi-masks which, compared to older lumi-masks, include luminosity sections that were previously filtered out. Both of these situations lead to the common case where one would like to run a task (let's call it task B) over an input dataset already partially analyzed in a previous task (let's call it task A), where task B should skip the data already analyzed in task A.

This can be accomplished with a few lines in the CRAB configuration file, see an annotated example below.

Changed:
<
<
>
>
 from UserUtilities import config, getLumiListInValidFiles from LumiList import LumiList
Line: 879 to 816
 # and there we, process from input dataset all the lumi listed in the current officialLumiMask file, skipping the ones you already have. config.Data.lumiMask = 'my_lumi_mask.json' config.Data.outputDatasetTag = <TaskA-outputDatasetTag> # add to your existing dataset
Changed:
<
<
...
>
>
...
IMPORTANT NOTE: in this way you will add any lumi section in the initial dataset that was turned from bad to good in the golden list after you ran Task-A, but if some of those data evolved the other way around (from good to bad), there is no way to remove them from your published datasets.
Line: 923 to 858
 

Using pile-up

<!--/twistyPlugin twikiMakeVisibleInline-->
Changed:
<
<
Important Instructions:
Make sure you run your jobs at the site where the pile-up sample is. Not where the signal is.
>
>
Important Instructions:
Make sure you run your jobs at the site where the pile-up sample is. Not where the signal is.
This requires you to override the location list that CRAB would extract from the inputDataset.
Changed:
<
<
Rational and details:
The pile-up files have be specified in the CMSSW parameter-set configuration file. There is no way yet to tell in the CRAB configuration file that one wants to use a pile-up dataset as a secondary input dataset. That means that CRAB doesn't know that the CMSSW code will want to access pile-up files; CRAB only knows about the primary input dataset (if any). This means that, assuming there is a primary input dataset, when CRAB does data discovery to figure out to which sites should it submit the jobs, it will only take into account the input dataset specified in the CRAB configuration file (in the Data.inputDataset parameter) and submit the jobs to sites where this dataset is hosted. If there is no primary input dataset, CRAB will submit the jobs to the less busy sites. In any case, if the pile-up files are not hosted in the execution sites, they will be accessed via AAA (Xrootd). But reading the "signal" events directly from the local storage and the pile-up events via AAA is more inefficient than doing the other way around, since for each "signal" event that is read one needs to read in general many (> 20) pile-up events. Therefore, it is highly recommended that the user forces CRAB to submit the jobs to the sites where the pile-up dataset is hosted by whitelisting these sites using the parameter Site.whitelist in the CRAB configuration file. Note that one also needs to set Data.ignoreLocality = True in the CRAB configuration file in case of using a primary input dataset so to avoid CRAB doing data discovery and eventually complain (and fail to submit) that the input dataset is not available in the whitelisted sites. One can use DAS to get the list of sites that host a dataset.
>
>
Rationale and details:
The pile-up files have to be specified in the CMSSW parameter-set configuration file. There is no way yet to tell in the CRAB configuration file that one wants to use a pile-up dataset as a secondary input dataset. That means that CRAB doesn't know that the CMSSW code will want to access pile-up files; CRAB only knows about the primary input dataset (if any). This means that, assuming there is a primary input dataset, when CRAB does data discovery to figure out to which sites it should submit the jobs, it will only take into account the input dataset specified in the CRAB configuration file (in the Data.inputDataset parameter) and submit the jobs to sites where this dataset is hosted. If there is no primary input dataset, CRAB will submit the jobs to the least busy sites. In any case, if the pile-up files are not hosted at the execution sites, they will be accessed via AAA (Xrootd). But reading the "signal" events directly from the local storage and the pile-up events via AAA is more inefficient than doing it the other way around, since for each "signal" event that is read one needs in general to read many (> 20) pile-up events. Therefore, it is highly recommended that the user forces CRAB to submit the jobs to the sites where the pile-up dataset is hosted, by whitelisting these sites with the parameter Site.whitelist in the CRAB configuration file. Note that, when a primary input dataset is used, one also needs to set Data.ignoreLocality = True in the CRAB configuration file, so as to avoid CRAB doing data discovery and eventually complaining (and failing to submit) that the input dataset is not available at the whitelisted sites. One can use DAS to get the list of sites that host a dataset.
 
<!--/twistyPlugin-->
Deleted:
<
<
 

Miscellanea

How CRAB finds data in input datasets from DBS

Line: 970 to 904
 LFNs are names like /store/user/mario/myoutput; note that a directory is also a file name.

For example, for the LFN /store/user/username/myfile.root stored in T2_IT_Pisa you can do the following (make sure you did cmsenv before, so to use a new version of curl), where you can replace the first two lines with the values which are useful to you and simply copy/paste the long curl command:

Changed:
<
<
>
>
 site=T2_IT_Pisa lfn=/store/user/username/myfile.root
Changed:
<
<
curl -ks "https://cmsweb.cern.ch/phedex/datasvc/perl/prod/lfn2pfn?node=${site}&lfn=${lfn}&protocol=srmv2" | grep PFN | cut -d "'" -f4
>
>
curl -ks "https://cmsweb.cern.ch/phedex/datasvc/perl/prod/lfn2pfn?node=${site}&lfn=${lfn}&protocol=srmv2" | grep PFN | cut -d "'" -f4
  which returns:
Changed:
<
<
srm://stormfe1.pi.infn.it:8444/srm/managerv2?SFN=/cms/store/user/username/myfile.root
>
>
srm://stormfe1.pi.infn.it:8444/srm/managerv2?SFN=/cms/store/user/username/myfile.root
 
<!--
To see full details, you call the PhEDEx API passing the following query data: {'protocol': 'srmv2', 'node': '<CMS-node-name>', 'lfn': '<logical-path-name>'}. For example, for the LFN /store/user/username/myfile.root stored in T2_IT_Pisa, the URL to query the API would be https://cmsweb.cern.ch/phedex/datasvc/xml/prod/lfn2pfn?protocol=srmv2&node=T2_IT_Pisa&lfn=/store/user/username/myfile.root, which would show an output in xml format as this:
Line: 998 to 928
 
-->

Before executing the gfal commands, make sure to have a valid proxy:

Changed:
<
<
voms-proxy-init -voms cms
>
>
voms-proxy-init -voms cms
 Enter GRID pass phrase for this identity: Contacting voms2.cern.ch:15002 [/DC=ch/DC=cern/OU=computers/CN=voms2.cern.ch] "cms"... Remote VOMS server contacted succesfully.
Line: 1011 to 937
  Created proxy in /tmp/x509up_u<user-id>.
Changed:
<
<
Your proxy is valid until
>
>
Your proxy is valid until
  The most useful gfal commands and their usage syntax for listing/removing/copying files/directories are in the examples below (it is recommended to unset the environment when executing gfal commands, i.e. to add env -i in front of the commands). See also the man entry for each command (man gfal-ls etc.):

List a (remote) path:

Changed:
<
<
env -i X509_USER_PROXY=/tmp/x509up_u$UID gfal-ls <physical-path-name-to-directory>
>
>
env -i X509_USER_PROXY=/tmp/x509up_u$UID gfal-ls <physical-path-name-to-directory>
  Remove a (remote) file:
Changed:
<
<
env -i X509_USER_PROXY=/tmp/x509up_u$UID gfal-rm <physical-path-name-to-file>
>
>
env -i X509_USER_PROXY=/tmp/x509up_u$UID gfal-rm <physical-path-name-to-file>
  Recursively remove a (remote) directory and all files in it:
Changed:
<
<
env -i X509_USER_PROXY=/tmp/x509up_u$UID gfal-rm -r <physical-path-name-to-directory>
>
>
env -i X509_USER_PROXY=/tmp/x509up_u$UID gfal-rm -r <physical-path-name-to-directory>
  Copy a (remote) file to a directory in the local machine:
Added:
>
>
env -i X509_USER_PROXY=/tmp/x509up_u$UID gfal-copy <physical-path-name-to-source-file> file://<absolute-path-to-local-destination-directory>
 
Changed:
<
<
env -i X509_USER_PROXY=/tmp/x509up_u$UID gfal-copy <physical-path-name-to-source-file> file://<absolute-path-to-local-destination-directory>
Note: the <absolute-path-to-local-destination-directory> starts with / therefore there are three consecutive / characters like file:///tmp/somefilename.root
>
>
Note: the starts with / therefore there are three consecutive / characters like file:///tmp/somefilename.root
 
<!--/twistyPlugin-->

Why are my jobs submitted to a site that I had explicitly blacklisted (not whitelisted)?

Line: 1047 to 965
 
<!--/twistyPlugin twikiMakeVisibleInline-->
There is a site overflow mechanism in place, which takes place after CRAB submission. Sites are divided in regions of good WAN/xrootd connectivity (e.g. US, Italy, Germany etc.), then jobs queued at one site A for too long are allowed to overflow to a well connected site B which does not host the requested input data but from where data will be read over xrootd. Rationale is that even if those jobs were to fail due to unable to read data or a problem in site B, they will be automatically resubmitted, so nothing is lost with respect to keeping those jobs idle in the queue waiting for free slots at site A. The site overflow can be turned off via the Debug.extraJDL CRAB configuration parameter :
Changed:
<
<
>
>
 config.section_("Debug")
Changed:
<
<
config.Debug.extraJDL = ['+CMS_ALLOW_OVERFLOW=False']
>
>
config.Debug.extraJDL = ['+CMS_ALLOW_OVERFLOW=False']
  Note: if you change this configuration option for an already-created task (for instance if you noticed a lot of job failures at a particular site and even after blacklisting the jobs keep going back), you can't simply change the option in the configuration and resubmit. You'll have to kill the existing task and make a new task to get the option to be accepted. You can't simply change it during resubmission.
<!--/twistyPlugin-->
Line: 1065 to 982
 There is a tool written in python called LumiList.py (available in the WMCore library; is the same code as cmssw/FWCore/PythonUtilities/python/LumiList.py ) that can be used to do lumi-mask arithmetics. The arithmetics can even be done inside the CRAB configuration file (that's the advantage of having the configuration file written in python). Below are some examples.

Example 1: A run range selection can be achieved by selecting from the original lumi-mask file the run range of interest.

Changed:
<
<
>
>
 from LumiList import LumiList

lumiList = LumiList(filename='my_original_lumi_mask.json') lumiList.selectRuns([x for x in range(193093,193999+1)]) lumiList.writeJSON('my_lumi_mask.json')

Changed:
<
<
config.Data.lumiMask = 'my_lumi_mask.json'
>
>
config.Data.lumiMask = 'my_lumi_mask.json'
  Example 2: Use a new lumi-mask file that is the intersection of two other lumi-mask files.
Changed:
<
<
>
>
 from LumiList import LumiList

originalLumiList1 = LumiList(filename='my_original_lumi_mask_1.json')

Line: 1086 to 1000
 newLumiList = originalLumiList1 & originalLumiList2 newLumiList.writeJSON('my_lumi_mask.json')
Changed:
<
<
config.Data.lumiMask = 'my_lumi_mask.json'
>
>
config.Data.lumiMask = 'my_lumi_mask.json'
  Example 3: Use a new lumi-mask file that is the union of two other lumi-mask files.
Changed:
<
<
>
>
 from LumiList import LumiList

originalLumiList1 = LumiList(filename='my_original_lumi_mask_1.json')

Line: 1099 to 1011
 newLumiList = originalLumiList1 | originalLumiList2 newLumiList.writeJSON('my_lumi_mask.json')
Changed:
<
<
config.Data.lumiMask = 'my_lumi_mask.json'
>
>
config.Data.lumiMask = 'my_lumi_mask.json'
  Example 4: Use a new lumi-mask file that is the subtraction of two other lumi-mask files.
Changed:
<
<
>
>
 from LumiList import LumiList

originalLumiList1 = LumiList(filename='my_original_lumi_mask_1.json')

Line: 1112 to 1022
 newLumiList = originalLumiList1 - originalLumiList2 newLumiList.writeJSON('my_lumi_mask.json')
Changed:
<
<
config.Data.lumiMask = 'my_lumi_mask.json'
>
>
config.Data.lumiMask = 'my_lumi_mask.json'
 </>
<!--/twistyPlugin-->

User quota in the CRAB scheduler machines

<!--/twistyPlugin twikiMakeVisibleInline-->
Each user has a home directory with 100GB of disk space in each of the scheduler machines (schedd for short) assigned to CRAB3 for submitting jobs to the Grid. Whenever a task is submitted by the CRAB server to a schedd, a task directory is created in this space containing among other things CRAB libraries and scripts needed to run the jobs. Log files from Condor/DAGMan and CRAB itself are also placed there. (What is not available in the schedds are the cmsRun log files, except for the snippet available in the CRAB job log file.) As a guidance, a task with 100 jobs uses on average 50MB of space, but this number depends a lot on the number of resubmissions, since each resubmission produces its log files. If a user reaches his/her quota in a given schedd, he/she will not be able to submit more jobs via that schedd (he/she may still be able to submit via other schedd, but since the user can not choose the schedd to which to submit -the choice is done by the CRAB server-, he/she would have to keep trying the submission until the task goes to a schedd with non-exahusted quota). To avoid that, task directories are automatically removed from the schedds after 30 days of their last modification. If a user reaches 50% of its quota in a given schedd, an automatic e-mail similar to the one shown below is sent to him/her.
Changed:
<
<
>
>
 Subject: WARNING: Reaching your quota

Dear analysis user ,

Line: 1134 to 1043
  https://twiki.cern.ch/twiki/bin/view/CMSPublic/SWGuideCrabFaq#Disk_space_for_output_files If you have any questions, please contact hn-cms-computing-tools(AT)cern.ch Regards,
Changed:
<
<
CRAB support
>
>
CRAB support
  This e-mail has a link (https://twiki.cern.ch/twiki/bin/view/CMSPublic/SWGuideCrabFaq#How_to_clean_up_your_directory_i) to the instructions on how to clean up space in the user's home directory in a schedd. A user can follow the instructions in that page, or alternatively use the crab purge command.
<!--/twistyPlugin-->
Line: 1146 to 1054
 
<!--/twistyPlugin twikiMakeVisibleInline-->
To overcome the CRAB3 vs CMSSW environment conflicts, you can use the following script available in CVMFS (/cvmfs/cms.cern.ch/crab3/crab-env-bootstrap.sh) without need to source the CRAB3 environment. You could do something like this:
Changed:
<
<
>
>
 cmsenv # DO NOT setup the CRAB3 environment alias crab='/cvmfs/cms.cern.ch/crab3/crab-env-bootstrap.sh' crab submit crab status ...
Changed:
<
<
# check that you can run cmsRun locally
>
>
# check that you can run cmsRun locally
  Details:
Line: 1168 to 1074
 
<!--/twistyPlugin twikiMakeVisibleInline-->
The problematic pset is FWCore/ParameterSet/Mixins.py from CMSSW:
Added:
>
>
[line 714] p = tLPTest("MyType",** { "a"+str(x): tLPTestType(x) for x in xrange(0,300) } )
 
Changed:
<
<
[line 714] p = tLPTest("MyType",** { "a"+str(x): tLPTestType(x) for x in xrange(0,300) } )

This uses dictionary comprehensions, a feature available in python > 2.7. While CMSSW (setup via cmsenv) uses python > 2.7, CRAB (setup via /cvmfs/cms.cern.ch/crab3/crab.sh) still uses python 2.6.8. To overcome this problem, don't setup the CRAB environment and instead use the crab-env-bootstrap.sh script (see this FAQ).

>
>
This uses dictionary comprehensions, a feature only available in python ≥ 2.7. While CMSSW (set up via cmsenv) uses python ≥ 2.7, CRAB (set up via /cvmfs/cms.cern.ch/crab3/crab.sh) still uses python 2.6.8. To overcome this problem, don't set up the CRAB environment and instead use the crab-env-bootstrap.sh script (see this FAQ).
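For illustration only, a minimal sketch (not the actual CMSSW code) of the same kind of mapping written with a dictionary comprehension and with a python 2.6 compatible equivalent:

# dictionary comprehension: valid only in python >= 2.7
d = { "a" + str(x): x for x in range(0, 300) }
# equivalent construct that also works in python 2.6
d = dict(("a" + str(x), x) for x in range(0, 300))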
 
<!--/twistyPlugin-->

Revision 992019-02-19 - StefanoBelforte

Line: 1 to 1
 
META TOPICPARENT name="SWGuideCrab"
<!-- /ActionTrackerPlugin -->
Line: 719 to 719
  [a] Original error: '[ERROR] Operation expired' (errno=0, code=206, source=xrootd.echo.stfc.ac.uk:1094 (site T1_UK_RAL)).
Changed:
<
<
If you see such message in a job log, submitting a new task in a CMSSW ≥ 10_4_0 with the following duplicateCheckMode option in the PSet PoolSource
>
>
As of winter 2019 this almost only happens for files stored at T1_UK_RAL. If you are in this situation, a way out is to submit a new task using CMSSW ≥ 10_4_0 with the following duplicateCheckMode option in the PSet PoolSource
 
process.source = cms.Source("PoolSource",
   [...]
   duplicateCheckMode = cms.untracked.string("noDuplicateCheck")
)
Changed:
<
<
will prevent the error to happen.
>
>
When that is not an option and the problem is persistent, you may need to ask for a replica of the data at another site.
 
<!--/twistyPlugin-->

Revision 982019-02-19 - LeonardoCristella

Line: 1 to 1
 
META TOPICPARENT name="SWGuideCrab"
<!-- /ActionTrackerPlugin -->
Line: 707 to 707
 
  1. The user is generating MC events, correctly specifying in the CRAB configuration file JobType.pluginName = 'PrivateMC', but in the CMSSW parameter set configuration he/she has specified a source of type PoolSource. The solution is to not specify a PoolSource. Note: This doesn't mean to remove process.source completely, as this attribute must be present. One could set process.source = cms.Source("EmptySource") if no input source is used.
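A minimal sketch of the relevant lines in such a CMSSW parameter-set configuration (process name and event count are hypothetical; the generator modules of your own workflow are not shown):

import FWCore.ParameterSet.Config as cms

process = cms.Process("GEN")  # hypothetical process name
# no PoolSource when generating events from scratch with JobType.pluginName = 'PrivateMC'
process.source = cms.Source("EmptySource")
process.maxEvents = cms.untracked.PSet(input = cms.untracked.int32(10))  # hypothetical number of events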
</>
<!--/twistyPlugin-->
Added:
>
>

[ERROR] Operation expired

<!--/twistyPlugin twikiMakeVisibleInline-->
Some sites' configurations cannot handle remote access of large files (> 10 GB) and XRootD fails with a message like
XrdAdaptor::RequestManager::requestFailure Open(name='root://xrootd.echo.stfc.ac.uk//store/data/Run2017F/ZeroBias2/RAW-RECO/05Apr2018-v1/30000/0A7DD85F-0439-E811-BDF8-0CC47AACFCDE.root', flags=0x10, permissions=0660, old source=xrootd.echo.stfc.ac.uk:1094 (site T1_UK_RAL)) => timeout when waiting for file open

  Additional Info:

      [a] Original error: '[ERROR] Operation expired' (errno=0, code=206, source=xrootd.echo.stfc.ac.uk:1094 (site T1_UK_RAL)).
If you see such message in a job log, submitting a new task in a CMSSW ≥ 10_4_0 with the following duplicateCheckMode option in the PSet PoolSource
process.source = cms.Source("PoolSource",
   [...]
   duplicateCheckMode = cms.untracked.string("noDuplicateCheck")
)
will prevent the error to happen.
<!--/twistyPlugin-->
 

CRAB Client API

Multiple submission fails with a CMSSW "duplicate process" error

Revision 972019-01-04 - LeonardoCristella

Line: 1 to 1
 
META TOPICPARENT name="SWGuideCrab"
<!-- /ActionTrackerPlugin -->
Line: 45 to 45
  then simply issue a myproxy-destroy command with same arguments:
Changed:
<
<
#exampe. In real life replace the long hex string with the one from your crab.log
>
>
# example. In real life replace the long hex string with the one from your crab.log
 myproxy-destroy -l ec95456d3589ed395dc47d3ada8c94c67ee588f1 -s myproxy.cern.ch
Changed:
<
<
If things still fail after than, send the following additional info in your request for support , replacing the long hex string with the one that you found in crab.log:
  • output of : voms-proxy-info -all
  • output of '=myproxy-info -d -l -s myproxy.cern.ch
  • content of you crab.log
>
>
If things still fail after that, send the following additional info in your request for support, replacing the long hex string with the one that you found in crab.log (ec95456d3589ed395dc47d3ada8c94c67ee588f1 in the above example):
  • output of voms-proxy-info -all
  • output of myproxy-info -d -l <long-hex-string> -s myproxy.cern.ch
  • content of your crab.log as an attachment
 

</>

<!--/twistyPlugin-->

Revision 962018-12-14 - StefanoBelforte

Line: 1 to 1
 
META TOPICPARENT name="SWGuideCrab"
<!-- /ActionTrackerPlugin -->
Line: 119 to 119
 

What are the files CRAB adds to the user input sandbox?

<!--/twistyPlugin twikiMakeVisibleInline-->
Changed:
<
<
CRAB adds to the user input sandbox the following directories/files:
>
>
CRAB adds to the user input sandbox the following directories/files; when a directory is included, all the contained subdirectories are also recursively included (a minimal configuration sketch follows the list):
 
  • The directories $CMSSW_BASE/lib, $CMSSW_BASE/biglib and $CMSSW_BASE/module. One can also tell CRAB to include the directory $CMSSW_BASE/python by setting JobType.sendPythonFolder = True in the CRAB configuration.
  • Any data and interface directory recursively found in $CMSSW_BASE/src.
  • All additional directories/files specified in the CRAB configuration parameter JobType.inputFiles.
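For example, a minimal sketch of the corresponding lines in the CRAB configuration (the file names are hypothetical):

# hypothetical extra files/directories to ship with every job
config.JobType.inputFiles = ['data/my_calibration.txt', 'my_helper_script.py']
# also include the $CMSSW_BASE/python directory in the sandbox
config.JobType.sendPythonFolder = True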

Revision 952018-11-13 - StefanoBelforte

Line: 1 to 1
 
META TOPICPARENT name="SWGuideCrab"
<!-- /ActionTrackerPlugin -->
Line: 292 to 292
 Users do not have permissions to delete a dataset or a file from DBS. Instead, what users can do is to change the status of the dataset or of individual files in the dataset. For more details see Changing a dataset or file status in DBS.
<!--/twistyPlugin-->
Added:
>
>

Can I send CRAB output to CERNBOX ?

Yes, by doing both the following:

  1. indicating T2_CH_CERNBOX as storage location in CRAB configuration
  2. asking CERNBOX administrators (which are NOT in CMS) to grant proper permission to your DN

Explanation:

The T2_CH_CERNBOX site is not listed among CMS Sites e.g. in https://cms-cric.cern.ch/cms/site/index/ but a trivial file catalog exist for it, and it is known to PhEDEx, i.e. it is a known storage location for CMS in https://cmsweb.cern.ch/phedex/prod/Components::Status (and https://cms-cric.cern.ch/cms/storageunit/detail/T2_CH_CERNBOX/ ) which allows CRAB to use the T2_CH_CERNBOX string to map logical file names of the kind /store/user/somename to gsiftp://eosuserftp.cern.ch/eos/user/s/somename which is the proper end point for writing to CERNBOX.

But since CERNBOX is not part of CMS disk, but a space which CERN offers to all users, access to it is not controlled by CMS, so in order to be able to write there from a grid node using gsiftp (differntly from using e.g. CERNBOX client or fuse mount on lxplus), users need to ask for help from CERN, e.g. via CERN help desk or a SNOW ticket.

 

Jobs status

My jobs are still idle/pending/queued. How can I know why and what can I do?

Revision 942018-09-28 - LeonardoCristella

Line: 1 to 1
 
META TOPICPARENT name="SWGuideCrab"
<!-- /ActionTrackerPlugin -->
Line: 91 to 91
 The Data.splitting parameter has now a default value: 'Automatic'.
With such a setting the task processing is split into three stages:
  1. A "probe" stage, where some probe jobs are submitted to estimate the event throughput of the CMSSW parameter-set configuration provided by the user in the JobType.psetName parameter and possible further arguments. Probe jobs have a job id of the form 0-[1,2,3,...], they can not be resubmitted and the task will fail if none of the probe jobs complete successfully.
Changed:
<
<
  1. A "main" stage, very similar to the conventional stage for other splitting modes, in which a number of main jobs (automatically determined by the probe stage) will process the dataset. These jobs can not be manually resubmitted and have a fixed maximum runtime (specified in the Data.unitsPerJob parameter), after which they gracefully stop processing input data. The remaining data will be processed in the next stage (tail) and accounted as "rescheduled" jobs in the main stage.
>
>
  1. A "main" stage, very similar to the conventional stage for other splitting modes, in which a number of main jobs (automatically determined by the probe stage) will process the dataset. These jobs can not be manually resubmitted and have a fixed maximum runtime (specified in the Data.unitsPerJob parameter), after which they gracefully stop processing input data. The remaining data will be processed in the next stage (tail) and their jobs labelled as "rescheduled" in the main stage (in the dashboard they will always appear as "failed").
 
  1. Some possible "tail" stages. If some main job does not finish successfully ("rescheduled" in the previous stage) or does not completely process the amount of data assigned to it due to the automatically configured maximum job run time, tail jobs are created and submitted in order to fully process the dataset. Tail jobs have a job id of the form n-[1,2,3,...], where n=1,2,... represents the tail stage number. For small tasks, less than 100 jobs, one tail stage is started when all jobs have completed (successfully or failed). For larger tasks, a first tail stage collects all remaining input data from the first 50% of completed jobs, followed by a stage that processes data when 80% of jobs have completed, and finally a stage collecting leftover input data at 100% job completion.
    Failed tail jobs can be manually resubmitted by the users.

Once the probe stage is completed, the plain crab status command shows only the main and tail jobs. For the list of all jobs add the --long option.
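A minimal sketch of the corresponding CRAB configuration lines (the value is hypothetical; in this mode Data.unitsPerJob is the target job runtime described above, assumed here to be expressed in minutes):

config.Data.splitting = 'Automatic'
# hypothetical target runtime per main job (assumed to be in minutes)
config.Data.unitsPerJob = 180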

Revision 932018-08-27 - StefanoBelforte

Line: 1 to 1
 
META TOPICPARENT name="SWGuideCrab"
<!-- /ActionTrackerPlugin -->
Line: 49 to 49
 myproxy-destroy -l ec95456d3589ed395dc47d3ada8c94c67ee588f1 -s myproxy.cern.ch
Added:
>
>
If things still fail after than, send the following additional info in your request for support , replacing the long hex string with the one that you found in crab.log:
  • output of : voms-proxy-info -all
  • output of '=myproxy-info -d -l -s myproxy.cern.ch
  • content of you crab.log
 </>
<!--/twistyPlugin-->

CRAB setup

Revision 922018-07-16 - LeonardoCristella

Line: 1 to 1
 
META TOPICPARENT name="SWGuideCrab"
<!-- /ActionTrackerPlugin -->
Line: 422 to 422
 What to do: Try again after a couple of minutes. </>
<!--/twistyPlugin-->
Changed:
<
<

crab submit fails with "Splitting task ... with LumiBased method does not generate any job"

>
>

crab submit fails with "Splitting task ... on dataset ... with ... method does not generate any job"

 
<!--/twistyPlugin twikiMakeVisibleInline-->
This is not a CRAB error.

Revision 912018-07-13 - LeonardoCristella

Line: 1 to 1
 
META TOPICPARENT name="SWGuideCrab"
<!-- /ActionTrackerPlugin -->
Line: 667 to 667
  Leaving the AAA error aside for a while, the first thing to contemplate here is to understand why the file could not be loaded from the local storage. Was it because the file is not available at the execution site? And if so, was it supposed to be available? If not, can you force CRAB to submit the jobs to the site(s) where the file is hosted? CRAB should do that automatically if the input file is one from the input dataset specified in the CRAB configuration parameter Data.inputDataset, unless you have set Data.ignoreLocality = True, or except in cases like using a (secondary) pile-up dataset. If yours is the last case, please read Using pile-up in this same twiki.
Changed:
<
<
If you intentionally wanted (and had a good reason) to run jobs reading the input files via AAA, then yes, we have to care about why AAA failed. The first thing then is to check that the site where the input file is hosted supports access to its storage via AAA, which is true for the majority of the sites, but not all. If you want to know if your file is available through AAA you can check this link, and in particular you can do cmsenv and then use this command xrdfs cms-xrd-global.cern.ch locate /store/path/to/file. If AAA is supported, the next thing is to discard a transient problem: you can submit your jobs again and see if the error persists. Ultimately, you should write to the Computing Tools HyperNews forum (this is a forum for all kind of issues with CMS computing tools, not only CRAB).
>
>
If you intentionally wanted (and had a good reason) to run jobs reading the input files via AAA, then yes, we have to care about why AAA failed. Since AAA must be able to access any CMS site, the next thing is to rule out a transient problem: you can submit your jobs again and see if the error persists. Ultimately, you should write to the Computing Tools HyperNews forum (this is a forum for all kinds of issues with CMS computing tools, not only CRAB) following these instructions.
 
<!--/twistyPlugin-->

Revision 902018-07-03 - LeonardoCristella

Line: 1 to 1
 
META TOPICPARENT name="SWGuideCrab"
<!-- /ActionTrackerPlugin -->
Line: 85 to 85
 With such a setting the task processing is split into three stages:
  1. A "probe" stage, where some probe jobs are submitted to estimate the event throughput of the CMSSW parameter-set configuration provided by the user in the JobType.psetName parameter and possible further arguments. Probe jobs have a job id of the form 0-[1,2,3,...], they can not be resubmitted and the task will fail if none of the probe jobs complete successfully.
  2. A "main" stage, very similar to the conventional stage for other splitting modes, in which a number of main jobs (automatically determined by the probe stage) will process the dataset. These jobs can not be manually resubmitted and have a fixed maximum runtime (specified in the Data.unitsPerJob parameter), after which they gracefully stop processing input data. The remaining data will be processed in the next stage (tail) and accounted as "rescheduled" jobs in the main stage.
Changed:
<
<
  1. Some possible "tail" stages. If some main job does not finish successfully or does not completely process the amount of data assigned to it due to the automatically configured maximum job run time, tail jobs are created and submitted in order to fully process the dataset. Tail jobs have a job id of the form n-[1,2,3,...], where n=1,2,... represents the tail stage number. For small tasks, less than 100 jobs, one tail stage is started when all jobs have completed (successfully or failed). For larger tasks, a first tail stage collects all remaining input data from the first 50% of completed jobs, followed by a stage that processes data when 80% of jobs have completed, and finally a stage collecting leftover input data at 100% job completion.
    Failed tail jobs can be manually resubmitted by the users.
>
>
  1. Some possible "tail" stages. If some main job does not finish successfully ("rescheduled" in the previous stage) or does not completely process the amount of data assigned to it due to the automatically configured maximum job run time, tail jobs are created and submitted in order to fully process the dataset. Tail jobs have a job id of the form n-[1,2,3,...], where n=1,2,... represents the tail stage number. For small tasks, less than 100 jobs, one tail stage is started when all jobs have completed (successfully or failed). For larger tasks, a first tail stage collects all remaining input data from the first 50% of completed jobs, followed by a stage that processes data when 80% of jobs have completed, and finally a stage collecting leftover input data at 100% job completion.
    Failed tail jobs can be manually resubmitted by the users.
  Once the probe stage is completed, the plain crab status command shows only the main and tail jobs. For the list of all jobs add the --long option. </>
<!--/twistyPlugin-->

Revision 892018-06-13 - StefanoBelforte

Line: 1 to 1
 
META TOPICPARENT name="SWGuideCrab"
<!-- /ActionTrackerPlugin -->
Line: 439 to 439
  </>
<!--/twistyPlugin-->
Changed:
<
<
>
>
 

crab submit fails with "Block ...  contains more than 100000 lumis and cannot be processed for splitting. For memory/time contraint big blocks are not allowed. Use another dataset as input."

<!--/twistyPlugin twikiMakeVisibleInline-->
Line: 507 to 507
 
<!--/twistyPlugin-->
Added:
>
>
 

crab submit fails with "Block ...  contains more than 100000 lumis."

<!--/twistyPlugin twikiMakeVisibleInline-->

Revision 882018-05-22 - StefanoBelforte

Line: 1 to 1
 
META TOPICPARENT name="SWGuideCrab"
<!-- /ActionTrackerPlugin -->
Line: 19 to 19
 
Complete: 3 Go to SWGuideCrab

Help Notice: This is a large page, it works best if you search in it for your problem using the browser search function.

Changed:
<
<
By default all answers are collapsed and search only used the questions text. If you do not find what you need, you can use the buttons below to search inside answers as well:
>
>
By default all answers are collapsed and search only uses the questions text. If you do not find what you need, you can use the buttons below to search inside answers as well:
   

Help if you still have problems, check the CRAB3Troubleshoot guide before asking for support

Line: 668 to 668
 If you intentionally wanted (and had a good reason) to run jobs reading the input files via AAA, then yes, we have to care about why AAA failed. The first thing then is to check that the site where the input file is hosted supports access to its storage via AAA, which is true for the majority of the sites, but not all. If you want to know if your file is available through AAA you can check this link, and in particular you can do cmsenv and then use this command xrdfs cms-xrd-global.cern.ch locate /store/path/to/file. If AAA is supported, the next thing is to discard a transient problem: you can submit your jobs again and see if the error persists. Ultimately, you should write to the Computing Tools HyperNews forum (this is a forum for all kind of issues with CMS computing tools, not only CRAB).
<!--/twistyPlugin-->
Added:
>
>

Exit code 50660

<!--/twistyPlugin twikiMakeVisibleInline-->

Exit code 50660 means "Application terminated by wrapper because using too much RAM (RSS)" (as documented here). The amount of RAM that a job can use on a grid node is always limited and if memory need keeps increasing as the job run (so called "memory leak") the job will need to be killed. Grid sites used by CMS guarantee at least 2.5 GB of RAM per core, so allowing for some overhead, CRAB default is to ask 2GB per job. This is usually enough to run full RECO and user jobs should not normally need more. So the user first action when getting this error is to make sure that code is not leaking memory nor allocating useless large structures. If more RAM is really needed, it can be requested via the JobType.maxMemoryMB parameter in CRAB configuration file. Uselessly requesting too much RAM is very likely to result in wasted CPU (we will run less jobs then there are CPU cores available in a node, to spread the available RAM in fewer, larger, chunks), so you have to be careful, abuse will be monitored and tasks may get killed.

An important exception is in case the user runs multi-threaded applications, in particular CMSSW. In that case a single job will use multiple cores and not only can, but must use more than the default 2GB of RAM. It is up to the user to request the proper amount of memory, e.g. after measuring it running the code interactively, or by looking up what Production is using in similar workflows. As a generic rule of thumb, (1+1*num_threads) GB may be a good starting point.
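As an illustration of the rule of thumb above (the numbers are hypothetical and should be adapted to your own measurements), a four-thread cmsRun job could be configured like this:

config.JobType.numCores = 4
# (1 + 1*4) GB from the rule of thumb above, expressed in MB
config.JobType.maxMemoryMB = 5000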

<!--/twistyPlugin-->
 

Illegal parameter found in configuration. The parameter is named: 'numberEventsInLuminosityBlock'

<!--/twistyPlugin twikiMakeVisibleInline-->

Revision 872018-05-22 - LeonardoCristella

Line: 1 to 1
 
META TOPICPARENT name="SWGuideCrab"
<!-- /ActionTrackerPlugin -->
Line: 893 to 893
 

How many jobs can I run at the same time ?

<!--/twistyPlugin twikiMakeVisibleInline-->
Changed:
<
<
CRAB runs jobs on the Grid using a global HTCondor pool created via glideInWms machinery, thing of it like a global batch system whith execution nodes all over the places. The most important thing which control how many jobs can you run is the overall number of execution slots (CPU's) available for your jobs, i.e. that match your requirement of data access, memory and running time. Then HTCondor tries hard to give to every user the same share of computing resources, i.e. equal resources to everyone at any given time. You are not penalized for having run more jobs yesterday, and not rewared either for not having used your share in the past. In computing the share that you use, HTCondor considers both the number of cores and the number of GB's of RAM that you are using. As of October 2016 the weigth is : (#cores + #GBytes)
>
>
CRAB runs jobs on the Grid using a global HTCondor pool created via glideInWms machinery; think of it like a global batch system with execution nodes all over the place. The most important thing which controls how many jobs you can run is the overall number of execution slots (CPU's) available for your jobs, i.e. that match your requirements of data access, memory and running time. Then HTCondor tries hard to give to every user the same share of computing resources, i.e. equal resources to everyone at any given time. You are not penalized for having run more jobs yesterday, and not rewarded either for not having used your share in the past. To assess the user share, HTCondor considers only the number of cores that you are using (until May 2017 also the number of RAM GB's was accounted for).
 
Changed:
<
<
Beware thus of asking for much more memory per core that you need.
>
>
Beware thus of asking for much more memory per core than you need (see What is the maximum memory per job (maxMemoryMB) I can request?).
 
<!--/twistyPlugin-->

Revision 862018-05-21 - LeonardoCristella

Line: 1 to 1
 
META TOPICPARENT name="SWGuideCrab"
<!-- /ActionTrackerPlugin -->
Line: 72 to 72
 

What is the maximum memory per job (maxMemoryMB) I can request?

<!--/twistyPlugin twikiMakeVisibleInline-->
Changed:
<
<
CRAB requests by default a maximum memory of 2000 MB. This is the maximum all sites guarantee they will run. Some sites, but not many, offer a bit more (typically 2500 MB); and some sites even offer 4000 MB for special users. Memory limits offered by each site are accessible in the GlideinWMS VO Factory Monitor page, http://glidein.grid.iu.edu/factory/monitor/ (choose "Current Status of the Factory" and click on a site CE listed under "Entry Name" in the table), but this should not be considered a documentation. The best advice we can give is: stick to the default, and if you think you need more, find out first if there are sites (and which ones) which can run jobs in that case. If you need help, you can write to us.
>
>
CRAB requests by default a maximum memory of 2000 MB. This is the maximum memory per core that all sites guarantee they will run. Some sites, but not many, offer a bit more (typically 2500 MB); and some sites even offer 4000 MB for special users. Memory limits offered by each site are accessible in the GlideinWMS VO Factory Monitor page, http://glidein.grid.iu.edu/factory/monitor/ (choose "Current Status of the Factory" and click on a site CE listed under "Entry Name" in the table), but this should not be considered documentation. The best advice we can give is: stick to the default, and if you think you need more, find out first if there are sites (and which ones) which can run jobs in that case. If you need help, you can write to us.

note.gif Note: In case of a multi-threaded job (config.JobType.numCores > 1) most likely the default memory value is not enough. The user share of computing resources accounts for the requested memory per core.

 
<!--/twistyPlugin-->

What is the 'Automatic' splitting mode?

Line: 287 to 290
 

My jobs are still idle/pending/queued. How can I know why and what can I do?

<!--/twistyPlugin twikiMakeVisibleInline-->
Changed:
<
<
If jobs are pending for more than ~12 hours, there is certainly a problem somewhere. The first thing to do is to identify to which site(s) the jobs were submitted and check the site(s) status in the Site Readiness Monitor page, http://cms-site-readiness.web.cern.ch/cms-site-readiness/SiteReadiness/HTML/SiteReadinessReport.html. For example, the "HammerCloud" row will tell whether analysis jobs are running at the site and their success rate, and the "Maintenance" row will tell whether the site had/has a downtime (clicking on the corresponding date inset in the table will open a new web page where the downtime reason is explained). If everything looks fine with the site(s) status, it may be that the user jobs are not running because they requested more resources (memory) than what the site(s) can offer (see What is the maximum memory per job (maxMemoryMB) I can request?).
>
>
If jobs are pending for more than ~12 hours, there is certainly a problem somewhere. The first thing to do is to identify to which site(s) the jobs were submitted and check the site(s) status in the Site Readiness Monitor page, http://cms-site-readiness.web.cern.ch/cms-site-readiness/SiteReadiness/HTML/SiteReadinessReport.html. For example, the "HammerCloud" row will tell whether analysis jobs are running at the site and their success rate, and the "Maintenance" row will tell whether the site had/has a downtime (clicking on the corresponding date inset in the table will open a new web page where the downtime reason is explained). If everything looks fine with the site(s) status, it may be that the user jobs are not running because they requested more resources (memory per core) than what the site(s) can offer (see What is the maximum memory per job (maxMemoryMB) I can request?).
 
<!--/twistyPlugin-->

CRAB commands

Line: 893 to 896
 CRAB runs jobs on the Grid using a global HTCondor pool created via glideInWms machinery, thing of it like a global batch system whith execution nodes all over the places. The most important thing which control how many jobs can you run is the overall number of execution slots (CPU's) available for your jobs, i.e. that match your requirement of data access, memory and running time. Then HTCondor tries hard to give to every user the same share of computing resources, i.e. equal resources to everyone at any given time. You are not penalized for having run more jobs yesterday, and not rewared either for not having used your share in the past. In computing the share that you use, HTCondor considers both the number of cores and the number of GB's of RAM that you are using. As of October 2016 the weigth is : (#cores + #GBytes)
Changed:
<
<
Beware thus of asking for much more memory that you need.
>
>
Beware thus of asking for much more memory per core that you need.
 
<!--/twistyPlugin-->

Revision 852018-05-16 - StefanoBelforte

Line: 1 to 1
 
META TOPICPARENT name="SWGuideCrab"
<!-- /ActionTrackerPlugin -->
Line: 463 to 463
 # DBS client returns a list of dictionaries, but we want a list of Logical File Names
lfnList = [ dic['logical_file_name'] for dic in fileDictList ]
Changed:
<
<
# this now standard CRAB configuration
>
>
# this is now standard CRAB configuration
  from WMCore.Configuration import Configuration
config = Configuration()
Line: 504 to 504
  </>
<!--/twistyPlugin-->
Added:
>
>

crab submit fails with "Block ...  contains more than 100000 lumis."

<!--/twistyPlugin twikiMakeVisibleInline-->
The message is self-explanatory. The CRAB server will die due to lack of memory if it needs to process luminosity lists with millions of entries per block. There are two known cases where this can happen:
  • MC datasets which have been created with improper use of lumisections. MC lumi sections have no relation with luminosity but are used only to allow processing less than a file in one job via split by lumi algorithm, in this case it makes no sense to have more lumis than events.
  • nanoAOD or similar super-extra-high compact event formats where one year of data fits in a few files
Those datasets can only be processed if CRAB can ignore the lumi-list information, i.e. using config.Data.splitting = 'FileBased' and avoiding any extra request which would eventually result in the need to use lumi information. This means no run range, no lumi mask, and no secondary dataset (since CRAB would need to use lumi info to match input files from the two datasets). Note that useParent is allowed since in that case CRAB uses parentage information stored in DBS to match input files.

In practice your crabConfig file must have:

config.Data.splitting = 'FileBased'
config.Data.runRange = ''
config.Data.lumiMask  = ''

( the parameters with an assigned null value '' can be omitted, but if present must indicate the null string )

and must NOT contain the following parameter

config.Data.secondaryInputDataset

<!--/twistyPlugin-->
 

CRAB fails to resubmit some jobs

<!--/twistyPlugin twikiMakeVisibleInline-->

Revision 842018-05-11 - LeonardoCristella

Line: 1 to 1
 
META TOPICPARENT name="SWGuideCrab"
<!-- /ActionTrackerPlugin -->
Line: 84 to 84
 
  1. A "main" stage, very similar to the conventional stage for other splitting modes, in which a number of main jobs (automatically determined by the probe stage) will process the dataset. These jobs can not be manually resubmitted and have a fixed maximum runtime (specified in the Data.unitsPerJob parameter), after which they gracefully stop processing input data. The remaining data will be processed in the next stage (tail) and accounted as "rescheduled" jobs in the main stage.
  2. Some possible "tail" stages. If some main job does not finish successfully or does not completely process the amount of data assigned to it due to the automatically configured maximum job run time, tail jobs are created and submitted in order to fully process the dataset. Tail jobs have a job id of the form n-[1,2,3,...], where n=1,2,... represents the tail stage number. For small tasks, less than 100 jobs, one tail stage is started when all jobs have completed (successfully or failed). For larger tasks, a first tail stage collects all remaining input data from the first 50% of completed jobs, followed by a stage that processes data when 80% of jobs have completed, and finally a stage collecting leftover input data at 100% job completion.
    Failed tail jobs can be manually resubmitted by the users.
Changed:
<
<
Once the probe stage is completed, the plain crab status command shows only the main and tail jobs. For the list of all jobs add the - long option.
>
>
Once the probe stage is completed, the plain crab status command shows only the main and tail jobs. For the list of all jobs add the --long option.
 
<!--/twistyPlugin-->

CRAB cache

Revision 832018-04-23 - LeonardoCristella

Line: 1 to 1
 
META TOPICPARENT name="SWGuideCrab"
<!-- /ActionTrackerPlugin -->
Line: 83 to 83
 
  1. A "probe" stage, where some probe jobs are submitted to estimate the event throughput of the CMSSW parameter-set configuration provided by the user in the JobType.psetName parameter and possible further arguments. Probe jobs have a job id of the form 0-[1,2,3,...], they can not be resubmitted and the task will fail if none of the probe jobs complete successfully.
  2. A "main" stage, very similar to the conventional stage for other splitting modes, in which a number of main jobs (automatically determined by the probe stage) will process the dataset. These jobs can not be manually resubmitted and have a fixed maximum runtime (specified in the Data.unitsPerJob parameter), after which they gracefully stop processing input data. The remaining data will be processed in the next stage (tail) and accounted as "rescheduled" jobs in the main stage.
  3. Some possible "tail" stages. If some main job does not finish successfully or does not completely process the amount of data assigned to it due to the automatically configured maximum job run time, tail jobs are created and submitted in order to fully process the dataset. Tail jobs have a job id of the form n-[1,2,3,...], where n=1,2,... represents the tail stage number. For small tasks, less than 100 jobs, one tail stage is started when all jobs have completed (successfully or failed). For larger tasks, a first tail stage collects all remaining input data from the first 50% of completed jobs, followed by a stage that processes data when 80% of jobs have completed, and finally a stage collecting leftover input data at 100% job completion.
    Failed tail jobs can be manually resubmitted by the users.
Added:
>
>
Once the probe stage is completed, the plain crab status command shows only the main and tail jobs. For the list of all jobs add the - long option.
 </>
<!--/twistyPlugin-->

CRAB cache

Line: 1088 to 1090
 

Changed:
<
<
-- AndresTanasijczuk - 23 Oct 2014
>
>
LeonardoCristella - 2018-04-23

Revision 822018-04-23 - LeonardoCristella

Line: 1 to 1
 
META TOPICPARENT name="SWGuideCrab"
<!-- /ActionTrackerPlugin -->
Line: 81 to 81
 The Data.splitting parameter has now a default value: 'Automatic'.
With such a setting the task processing is split into three stages:
  1. A "probe" stage, where some probe jobs are submitted to estimate the event throughput of the CMSSW parameter-set configuration provided by the user in the JobType.psetName parameter and possible further arguments. Probe jobs have a job id of the form 0-[1,2,3,...], they can not be resubmitted and the task will fail if none of the probe jobs complete successfully.
Changed:
<
<
  1. A "main" stage, very similar to the conventional stage for other splitting modes, in which a number of main jobs (automatically determined by the probe stage) will process the dataset. These jobs can not be manually resubmitted and have a fixed maximum runtime (specified in the Data.unitsPerJob parameter), after which they gracefully stop processing input data. The remaining data will be processed in the next stage (tail).
>
>
  1. A "main" stage, very similar to the conventional stage for other splitting modes, in which a number of main jobs (automatically determined by the probe stage) will process the dataset. These jobs can not be manually resubmitted and have a fixed maximum runtime (specified in the Data.unitsPerJob parameter), after which they gracefully stop processing input data. The remaining data will be processed in the next stage (tail) and accounted as "rescheduled" jobs in the main stage.
 
  1. Some possible "tail" stages. If some main job does not finish successfully or does not completely process the amount of data assigned to it due to the automatically configured maximum job run time, tail jobs are created and submitted in order to fully process the dataset. Tail jobs have a job id of the form n-[1,2,3,...], where n=1,2,... represents the tail stage number. For small tasks, less than 100 jobs, one tail stage is started when all jobs have completed (successfully or failed). For larger tasks, a first tail stage collects all remaining input data from the first 50% of completed jobs, followed by a stage that processes data when 80% of jobs have completed, and finally a stage collecting leftover input data at 100% job completion.
    Failed tail jobs can be manually resubmitted by the users.
</>
<!--/twistyPlugin-->

Revision 812018-02-26 - JoseHernandez

Line: 1 to 1
 
META TOPICPARENT name="SWGuideCrab"
<!-- /ActionTrackerPlugin -->
Line: 81 to 81
 The Data.splitting parameter has now a default value: 'Automatic'.
With such a setting the task processing is split into three stages:
  1. A "probe" stage, where some probe jobs are submitted to estimate the event throughput of the CMSSW parameter-set configuration provided by the user in the JobType.psetName parameter and possible further arguments. Probe jobs have a job id of the form 0-[1,2,3,...], they can not be resubmitted and the task will fail if none of the probe jobs complete successfully.
Changed:
<
<
  1. A "main" stage, very similar to the conventional stage for other splitting modes, in which a number of main jobs (automatically determined by the probe stage) will process the dataset. These jobs can not be resubmitted and have a fixed maximum runtime (specified in the Data.unitsPerJob parameter), after which they gracefully stop processing input data. The remaining data will be processed in the next stage (tail).
  2. Some possible "tail" stages. If some main job does not finish successfully or does not completely process the amount of data assigned to it due to the automatically configured maximum job run time, tail jobs are created and submitted in order to fully process the dataset. Tail jobs have a job id of the form n-[1,2,3,...], where n=1,2,... represents the tail stage number. Failed tail jobs can be manually resubmitted by the users.
>
>
  1. A "main" stage, very similar to the conventional stage for other splitting modes, in which a number of main jobs (automatically determined by the probe stage) will process the dataset. These jobs can not be manually resubmitted and have a fixed maximum runtime (specified in the Data.unitsPerJob parameter), after which they gracefully stop processing input data. The remaining data will be processed in the next stage (tail).
  2. Some possible "tail" stages. If some main job does not finish successfully or does not completely process the amount of data assigned to it due to the automatically configured maximum job run time, tail jobs are created and submitted in order to fully process the dataset. Tail jobs have a job id of the form n-[1,2,3,...], where n=1,2,... represents the tail stage number. For small tasks, less than 100 jobs, one tail stage is started when all jobs have completed (successfully or failed). For larger tasks, a first tail stage collects all remaining input data from the first 50% of completed jobs, followed by a stage that processes data when 80% of jobs have completed, and finally a stage collecting leftover input data at 100% job completion.
    Failed tail jobs can be manually resubmitted by the users.
 </>
<!--/twistyPlugin-->

CRAB cache

Revision 802018-02-21 - StefanoBelforte

Line: 1 to 1
 
META TOPICPARENT name="SWGuideCrab"
<!-- /ActionTrackerPlugin -->
Line: 78 to 78
 

What is the 'Automatic' splitting mode?

<!--/twistyPlugin twikiMakeVisibleInline-->
Changed:
<
<
The Data.splitting parameter has [https://hypernews.cern.ch/HyperNews/CMS/get/computing-tools/3578.html][now]] a default value: 'Automatic'.
>
>
The Data.splitting parameter has now a default value: 'Automatic'.
 With such a setting the task processing is split into three stages:
  1. A "probe" stage, where some probe jobs are submitted to estimate the event throughput of the CMSSW parameter-set configuration provided by the user in the JobType.psetName parameter and possible further arguments. Probe jobs have a job id of the form 0-[1,2,3,...], they can not be resubmitted and the task will fail if none of the probe jobs complete successfully.
  2. A "main" stage, very similar to the conventional stage for other splitting modes, in which a number of main jobs (automatically determined by the probe stage) will process the dataset. These jobs can not be resubmitted and have a fixed maximum runtime (specified in the Data.unitsPerJob parameter), after which they gracefully stop processing input data. The remaining data will be processed in the next stage (tail).

Revision 792018-02-21 - LeonardoCristella

Line: 1 to 1
 
META TOPICPARENT name="SWGuideCrab"
<!-- /ActionTrackerPlugin -->
Line: 78 to 78
 

What is the 'Automatic' splitting mode?

<!--/twistyPlugin twikiMakeVisibleInline-->
Changed:
<
<
The Data.splitting parameter has [][now]] a default value: 'Automatic'.
>
>
The Data.splitting parameter has [https://hypernews.cern.ch/HyperNews/CMS/get/computing-tools/3578.html][now]] a default value: 'Automatic'.
 With such a setting the task processing is split into three stages:
  1. A "probe" stage, where some probe jobs are submitted to estimate the event throughput of the CMSSW parameter-set configuration provided by the user in the JobType.psetName parameter and possible further arguments. Probe jobs have a job id of the form 0-[1,2,3,...], they can not be resubmitted and the task will fail if none of the probe jobs complete successfully.
  2. A "main" stage, very similar to the conventional stage for other splitting modes, in which a number of main jobs (automatically determined by the probe stage) will process the dataset. These jobs can not be resubmitted and have a fixed maximum runtime (specified in the Data.unitsPerJob parameter), after which they gracefully stop processing input data. The remaining data will be processed in the next stage (tail).

Revision 782018-02-21 - LeonardoCristella

Line: 1 to 1
 
META TOPICPARENT name="SWGuideCrab"
<!-- /ActionTrackerPlugin -->
Line: 75 to 75
 CRAB requests by default a maximum memory of 2000 MB. This is the maximum all sites guarantee they will run. Some sites, but not many, offer a bit more (typically 2500 MB); and some sites even offer 4000 MB for special users. Memory limits offered by each site are accessible in the GlideinWMS VO Factory Monitor page, http://glidein.grid.iu.edu/factory/monitor/ (choose "Current Status of the Factory" and click on a site CE listed under "Entry Name" in the table), but this should not be considered a documentation. The best advice we can give is: stick to the default, and if you think you need more, find out first if there are sites (and which ones) which can run jobs in that case. If you need help, you can write to us.
<!--/twistyPlugin-->
Added:
>
>

What is the 'Automatic' splitting mode?

<!--/twistyPlugin twikiMakeVisibleInline-->
The Data.splitting parameter has [][now]] a default value: 'Automatic'.
With such a setting the task processing is split into three stages:
  1. A "probe" stage, where some probe jobs are submitted to estimate the event throughput of the CMSSW parameter-set configuration provided by the user in the JobType.psetName parameter and possible further arguments. Probe jobs have a job id of the form 0-[1,2,3,...], they can not be resubmitted and the task will fail if none of the probe jobs complete successfully.
  2. A "main" stage, very similar to the conventional stage for other splitting modes, in which a number of main jobs (automatically determined by the probe stage) will process the dataset. These jobs can not be resubmitted and have a fixed maximum runtime (specified in the Data.unitsPerJob parameter), after which they gracefully stop processing input data. The remaining data will be processed in the next stage (tail).
  3. Some possible "tail" stages. If some main job does not finish successfully or does not completely process the amount of data assigned to it due to the automatically configured maximum job run time, tail jobs are created and submitted in order to fully process the dataset. Tail jobs have a job id of the form n-[1,2,3,...], where n=1,2,... represents the tail stage number. Failed tail jobs can be manually resubmitted by the users.
<!--/twistyPlugin-->
 

CRAB cache

User quota in the CRAB cache

Revision 772018-01-24 - LeonardoCristella

Line: 1 to 1
 
META TOPICPARENT name="SWGuideCrab"
<!-- /ActionTrackerPlugin -->
Line: 12 to 12
 pre.note {background-color: white;}
Changed:
<
<
CRAB Logo
>
>
CRAB Logo
 

CRAB3 Frequently Asked Questions

Line: 17 to 17
 

CRAB3 Frequently Asked Questions

Complete: 3 Go to SWGuideCrab
Added:
>
>
 Help Notice: This is a large page, it works best if you search in it for your problem using the browser search function.
By default all answers are collapsed and search only used the questions text. If you do not find what you need, you can use the buttons below to search inside answers as well:
Changed:
<
<
>
>
 
 
Changed:
<
<
Help if you still have problems, check the CRAB3Troubleshoot guide before asking for support
>
>
Help if you still have problems, check the CRAB3Troubleshoot guide before asking for support
 
Contents:
Line: 34 to 34
  Therefore you should remove credentials from myproxy and then issue the crab commad again. To remove stale credentials:
Changed:
<
<
>
>
 grep myproxy-info /crab.log
Changed:
<
<
# example: grep myproxy-info crab_20160308_140433/crab.log
>
>
# example: grep myproxy-info crab_20160308_140433/crab.log
 you will get something like
Changed:
<
<
 command : myproxy-info -l ec95456d3589ed395dc47d3ada8c94c67ee588f1 -s myproxy.cern.ch
>
>
 command : myproxy-info -l ec95456d3589ed395dc47d3ada8c94c67ee588f1 -s myproxy.cern.ch
  then simply issue a myproxy-destroy command with same arguments:
Changed:
<
<
>
>
 #exampe. In real life replace the long hex string with the one from your crab.log
Changed:
<
<
myproxy-destroy -l ec95456d3589ed395dc47d3ada8c94c67ee588f1 -s myproxy.cern.ch
>
>
myproxy-destroy -l ec95456d3589ed395dc47d3ada8c94c67ee588f1 -s myproxy.cern.ch
 
<!--/twistyPlugin-->
Line: 56 to 55
 

Does CRAB setup conflict with CMSSW setup

<!--/twistyPlugin twikiMakeVisibleInline-->
Changed:
<
<
No. CRAB client runs within the CMSSW environment.
>
>
No. CRAB client runs within the CMSSW environment.
 Make sure you always do cmsenv before source /cvmfs/cms.cern.ch/crab3/crab.sh
<!--/twistyPlugin-->
Line: 123 to 120
 One can use the crab purge command to delete from the CRAB cache files associated to a given task. Actually, crab purge deletes only user input sandboxes (because there is no API to delete other files), but since they are supposed to be the main space consumers in the CRAB cache, this should be enough. If for some reason the crab purge command does not work, one can alternatively use the REST interface of the crabcache component. Instructions oriented for CRAB3 operators can be found here. Jordan Tucker has written the following script based on these instructions that removes all the input sandboxes from the user CRAB cache area (a valid proxy and the CRAB environment are required):

Show Hide script
<!--/twistyPlugin twikiMakeVisibleInline-->
Changed:
<
<
>
>
 #!/usr/bin/env python

import json

Line: 200 to 196
  if '.log' in x: continue
  print 'remove', x
Changed:
<
<
h.fileremove(x)
>
>
h.fileremove(x)
 
<!--/twistyPlugin-->

note.gif Note: Once a task has been submitted, one can safely delete the input sandbox from the CRAB cache, as the sandbox is transferred to the worker nodes from the schedulers.

Line: 230 to 226
 With CRAB3 this should not be any different than with CRAB2. CRAB will look up for the user's username registered in SiteDB (which is the username of the CERN primary account) using for the query the user's DN (which in turn is extracted from the user's credentials) and will try to stage out to /store/user/<username>/ (by default). If the store user area uses a different username, itís up to the destination site to remap that (via a symbolic link or something similar). The typical case is Fermilab; to request the mapping of the store user area, FNAL users should follow the directions on the usingEOSatLPC web page to open a ServiceNow ticket to get this fixed.

To prevent stage out failures, and in case the user has provided in the Data.outLFN parameter of the CRAB configuration file an LFN directory path of the kind /store/user/[<some-username>/<subdir>*] (i.e. a store path that starts with /store/user/), CRAB will check if some-username matches with the user's username extracted from SiteDB. If it doesn't, it will give an error message and not submit the task. The error message would be something like this:

Changed:
<
<
>
>
 Error contacting the server. Server answered with: Invalid input parameter
Changed:
<
<
Reason is: The parameter Data.outLFN in the CRAB configuration file must start with either '/store/user//' or '/store/group//' (or '/store/local//' if publication is off), wher...
>
>
Reason is: The parameter Data.outLFN in the CRAB configuration file must start with either '/store/user//' or '/store/group//' (or '/store/local//' if publication is off), wher...
  Unfortunately the "Reason is:" message it cut at 200 characters. The message should read:
Changed:
<
<
Reason is: The parameter Data.outLFN in the CRAB configuration file must start with either '/store/user/<username>/' or '/store/group/<groupname>/' (or '/store/local/<something>/' if publication is off), where username is your username as registered in SiteDB (i.e. the username of your CERN primary account).
>
>
Reason is: The parameter Data.outLFN in the CRAB configuration file must start with either '/store/user/<username>/' or '/store/group/<groupname>/' (or '/store/local/<something>/' if publication is off), where username is your username as registered in SiteDB (i.e. the username of your CERN primary account).
  A similar message should be given by crab checkwrite if the user does crab checkwrite --site=<CMS-site-name> --lfn=/store/user/<some-username>. </>
<!--/twistyPlugin-->
Line: 247 to 245
 
<!--/twistyPlugin twikiMakeVisibleInline-->
First of all, does CRAB know at all that the job should produce the output file in question? To check that, open one of the job log files linked from the task monitoring pages. Very close to the top it is printed the list of output files that CRAB expects to see once the job finishes (shown below is the case of job number 1 in the task):
Changed:
<
<
>
>
 ==== HTCONDOR JOB SUMMARY at ... START ==== CRAB ID: 1 Execution site: ... Current hostname: ... Destination site: ...
Changed:
<
<
Output files: my_output_file.root=my_output_file_1.root
>
>
Output files: my_output_file.root=my_output_file_1.root
  If the output file in question doesn't appear in that list, then CRAB doesn't know about it, and of course it will not be transferred. This doesn't mean that the output file was not produced; it is simply that CRAB has to know beforehand what are the output files that the job produces.

If the output file is produced by either PoolOutputModule or TFileService, CRAB will automatically recognize the name of the output file when the user submits the task and it will add the output file name to the list of expected output files. On the other hand, if the output file is produced by any other module, the user has to specify the output file name in the CRAB configuration parameter JobType.outputFiles in order for CRAB to know about it. Note that this parameter takes a python list, so the right way to specify it is:

Deleted:
<
<
config.JobType.outputFiles = ['my_output_file.root']
 
Added:
>
>
config.JobType.outputFiles = ['my_output_file.root']
 
<!--/twistyPlugin-->

Can I delete a dataset I published in DBS?

Line: 285 to 286
 crab checkusername uses the following sequence of bash commands, which you should try to execute one by one (make sure you have a valid proxy) to check if they return what is expected.

1) It gets the path to the users proxy file with the command

Changed:
<
<
which scram >/dev/null 2>&1 && eval `scram unsetenv -sh`; voms-proxy-info -path
>
>
which scram >/dev/null 2>&1 && eval `scram unsetenv -sh`; voms-proxy-info -path
  which should return something like
Changed:
<
<
/tmp/x509up_u57506
>
>
/tmp/x509up_u57506
  2) It defines the path to the CA certificates directory with the following python command
Changed:
<
<
>
>
 import os capath = os.environ['X509_CERT_DIR'] if 'X509_CERT_DIR' in os.environ else "/etc/grid-security/certificates"
Changed:
<
<
print capath
>
>
print capath
  which should be equivalent to the following bash command
Changed:
<
<
>
>
 if [ "x$X509_CERT_DIR" = "x" ]; then capath=$X509_CERT_DIR; else capath=/etc/grid-security/certificates; fi
Changed:
<
<
echo $capath
>
>
echo $capath
  and which in lxplus should result in
Changed:
<
<
/etc/grid-security/certificates
>
>
/etc/grid-security/certificates
  3) It uses the proxy file and the capath to query https://cmsweb.cern.ch/sitedb/data/prod/whoami
Changed:
<
<
curl -s --capath <output-from-command-2-above> --cert <output-from-command-1-above> --key <output-from-command-1-above> 'https://cmsweb.cern.ch/sitedb/data/prod/whoami'
>
>
curl -s --capath <output-from-command-2-above> --cert <output-from-command-1-above> --key <output-from-command-1-above> 'https://cmsweb.cern.ch/sitedb/data/prod/whoami'
  which should return something like
Changed:
<
<
>
>
 {"result": [ {"dn": "/DC=ch/DC=cern/OU=Organic Units/OU=Users/CN=atanasi/CN=710186/CN=Andres Jorge Tanasijczuk", "login": "atanasi", "method": "X509Proxy", "roles": {"operator": {"group": ["crab3"], "site": []}}, "name": "Andres Jorge Tanasijczuk"}
Changed:
<
<
]}
>
>
]}
  4) Finally it parses the output from the above query to extract the username from the "login" field (in my case it is atanasi).
Line: 328 to 345
 
<!--/twistyPlugin twikiMakeVisibleInline-->
When doing crab submit the user may get this error messages:
Changed:
<
<
>
>
 Error contacting the server. Server answered with: Invalid input parameter
Changed:
<
<
Reason is: User quota limit reached; cannot upload the file
>
>
Reason is: User quota limit reached; cannot upload the file
  Error explanation: The user has reached the limit of 4.88GB in its CRAB cache area. Read more in this FAQ.
Line: 343 to 362
 
<!--/twistyPlugin twikiMakeVisibleInline-->
Typical error in crab status:
Changed:
<
<
>
>
 Failure message: The CRAB server backend was not able to (re)submit the task, because the Grid scheduler answered with an error. This is probably a temporary glitch. Please try again later. If the error persists send an e-mail to hn-cms-computing-tools@cern.ch<mailto:hn-cms-computing-tools@cern.ch>. Error reason: Trapped exception in Dagman.Fork: Unable to edit jobs matching constraint File "/data/srv/TaskManager/3.3.1512.rc6/slc6_amd64_gcc481/cms/crabtaskworker/3.3.1512.rc6/lib/python2.6/site-packages/TaskWorker/Actions/DagmanResubmitter.py", line 113, in executeInternal
Changed:
<
<
schedd.edit(rootConst, "HoldKillSig", 'SIGKILL')
>
>
schedd.edit(rootConst, "HoldKillSig", 'SIGKILL')
  As the error message says, this should be a temporary failure. One should just keep trying until it works. But after doing crab resubmit, give it some time to process the resubmission request; it may take a couple of minutes to see the jobs reacting to the resubmission.
<!--/twistyPlugin-->
Line: 355 to 376
 
<!--/twistyPlugin twikiMakeVisibleInline-->
After doing crab submit and crab status the user may get this error message:
Changed:
<
<
>
>
 Task status: UNKNOWN
Changed:
<
<
Error during task injection: Task failed to bootstrap on schedd
>
>
Error during task injection: Task failed to bootstrap on schedd
  Error explanation: The submission of the task to the scheduler machine has failed.
Line: 369 to 393
 
<!--/twistyPlugin twikiMakeVisibleInline-->
When doing crab status the user may get one of these error messages:
Changed:
<
<
>
>
 Error during task injection: <task-name>: Failed to contact Schedd: Failed to fetch ads from schedd.
Added:
>
>
 
Changed:
<
<
Error during task information retrieval: <task-name>: Failed to contact Schedd: . Error explanation: This is a temporary communication error with the scheduler machine (submission node), most probably because the scheduler is overloaded.
>
>
Error during task information retrieval:        <task-name>: Failed to contact Schedd: .

Error explanation: This is a temporary communication error with the scheduler machine (submission node), most probably because the scheduler is overloaded.

  What to do: Try again after a couple of minutes.
<!--/twistyPlugin-->
Line: 402 to 433
 Some more discussion is in this thread: https://hypernews.cern.ch/HyperNews/CMS/get/computing-tools/2928.html

There are a few datasets in DBS which do no satisfy this limit, if someone really needs to process those, the only way is to do one job per file using the userInputFiles feature of CRAB. An annotated example of how to do this in python is below, note that you have to disable DBS publication, indicate split by file and provide input file locations, other configuaration parameters can be set as usual:

Changed:
<
<
>
>
  # this will use CRAB client API from RawCommand import crabCommand
Line: 455 to 487
  result = crabCommand('submit', config = config)
Changed:
<
<
print (result)
>
>
print (result)
 
<!--/twistyPlugin-->
Line: 465 to 498
 It is important that you as a user are prepared for this to happen and know how to remain productive in your physics analysis with the least effort. While there is a long tradition of "resubmit them until they work", this is hardly useful any more. And while we can't prevent users from trying we can not guarantee that it will work, nor that resubmitted jobs will succeed.
Changed:
<
<
In the case where the missing data sample is important, the best recommendation we can give to users is to
USE GENERAL/GENERIC RESCUE PROCEDURES, RATHER THEN TRY AT ALL COSTS TO REVIVE A DEAD TASK.
>
>
In the case where the missing data sample is important, the best recommendation we can give to users is to
USE GENERAL/GENERIC RESCUE PROCEDURES, RATHER THEN TRY AT ALL COSTS TO REVIVE A DEAD TASK.
 We will always welcome problem reports and will try to improve when resubmission failures can be due to CRAB internals, but surely you do not want to hold your breath in the meanwhile.

The safest path is therefore:

Changed:
<
<
  1. let running jobs die or complete and dust settle
  2. use crab kill to make sure everything stops
  3. take stock of what's published in DBS at that point and make sure that it matches what's on disk
>
>
  1. let running jobs die or complete and dust settle
  2. use crab kill to make sure everything stops
  3. take stock of what's published in DBS at that point and make sure that it matches what's on disk
 
    • if your output is not in DBS, you can use crab report, but while DBS information is available forever, crab commands on a specific task may not be.
Changed:
<
<
  1. assess whether it is more important to get the last percentage of statistics or go on with other work. Do you really need 100% completion in this task ?
  2. if full statistics is needed, create a recovery task for the missing lumis and run it writing to the same dataset as the original task.
>
>
  1. assess whether it is more important to get the last percentage of statistics or go on with other work. Do you really need 100% completion in this task ?
  2. if full statistics is needed, create a recovery task for the missing lumis and run it writing to the same dataset as the original task.
 
Changed:
<
<
Recovery task is an important concept that can be useful in many circumstances. Please find instructions in this FAQ
>
>
Recovery task is an important concept that can be useful in many circumstances. Please find instructions in this FAQ
  At the other extreme there's: forget about this, and resubmit a new task with new output dataset. In between it is a murky land where many recipes may be more efficient according to details, but no general simple rule can be given and there's space for individual creativity and/or desperation.
Line: 487 to 521
 
<!--/twistyPlugin twikiMakeVisibleInline-->
When doing crab submit the user may get one of these error messages:
Changed:
<
<
>
>
 Syntax error in CRAB configuration: invalid syntax (<CRAB-configuration-file-name>.py, <line-where-error-occurred>)
Added:
>
>
 
Added:
>
>
 Syntax error in CRAB configuration:
Changed:
<
<
'Configuration' object has no attribute '<attribute-name>' Error explanation: The CRAB configuration file could not be loaded, because there is a syntax error somewhere in it.
>
>
'Configuration' object has no attribute '<attribute-name>'

Error explanation: The CRAB configuration file could not be loaded, because there is a syntax error somewhere in it.

  What to do: Check the CRAB configuration file and fix it. There could be a misspelled parameter or section name, or you could be trying to use a configuration attribute (parameter or section) that was not defined. To get more details on where the error occurred, do:
Changed:
<
<
>
>
 python
Changed:
<
<
import <CRAB-configuration-file-name> #without the '.py'
>
>
import <CRAB-configuration-file-name> #without the '.py'
  which gives:
Changed:
<
<
>
>
 Traceback (most recent call last): File "", line 1, in File "<CRAB-configuration-file-name>.py", <line-where-error-occurred> <error-python-code>
Changed:
<
<
^
>
>
^
  or
Changed:
<
<
>
>
 Traceback (most recent call last): File "", line 1, in File "<CRAB-configuration-file-name>.py", <line-where-error-occurred>, in <error-python-code>
Changed:
<
<
AttributeError: 'Configuration' object has no attribute '<attribute-name>'
>
>
AttributeError: 'Configuration' object has no attribute '<attribute-name>'
  For more information about the CRAB configuration file, see CRAB3ConfigurationFile.
<!--/twistyPlugin-->
Line: 537 to 582
 You should inspect the stdout of one job to find the exception message and traceback which may guide you to the solution.

A particular case is when the exception says An exception of category 'DictionaryNotFound' occurred, like in this example:

Changed:
<
<
>
>
 
Begin Fatal Exception 08-Jun-2017 18:18:04 CEST----------------------- An exception of category 'DictionaryNotFound' occurred while [0] Constructing the EventProcessor Exception Message: No Dictionary for class: 'edm::Wrapper<edm::DetSetVector >'
Changed:
<
<

End Fatal Exception -------------------------------------------------
>
>

End Fatal Exception -------------------------------------------------
 In this case, most likely the input data have been produced with a CMSSW version that is not compatible with the one used in the CRAB job. In general, reading data with a release older than the one they were produced with is not supported.

To find out which release was used to produce a given dataset or file, adapt the following examples to your situation:

Changed:
<
<
>
>
 belforte@lxplus045/~> dasgoclient --query "release dataset=/DoubleMuon/Run2016C-18Apr2017-v1/AOD" ["CMSSW_8_0_28"] belforte@lxplus045/~>
Added:
>
>
 
Added:
>
>
 belforte@lxplus045/~> dasgoclient --query "release file=/store/data/Run2016C/DoubleMuon/AOD/18Apr2017-v1/100001/56D1FA6E-D334-E711-9967-0025905A48B2.root" ["CMSSW_8_0_28"]
Changed:
<
<
belforte@lxplus045/~>
>
>
belforte@lxplus045/~>
 
<!--/twistyPlugin-->

Exit code 8028

Line: 572 to 622
  Leaving the AAA error aside for a while, the first thing to contemplate here is to understand why the file could not be loaded from the local storage. Was it because the file is not available at the execution site? And if so, was it supposed to be available? If not, can you force CRAB to submit the jobs to the site(s) where the file is hosted? CRAB should do that automatically if the input file is one from the input dataset specified in the CRAB configuration parameter Data.inputDataset, unless you have set Data.ignoreLocality = True, or except in cases like using a (secondary) pile-up dataset. If yours is the last case, please read Using pile-up in this same twiki.
Changed:
<
<
If you intentionally wanted (and had a good reason) to run jobs reading the input files via AAA, then yes, we have to care about why AAA failed. The first thing then is to check that the site where the input file is hosted supports access to its storage via AAA, which is true for the majority of the sites, but not all. If you want to know if your file is available through AAA you can check this link, and in particular you can do cmsenv and then use this command xrdfs cms-xrd-global.cern.ch locate /store/path/to/file. If AAA is supported, the next thing is to discard a transient problem: you can submit your jobs again and see if the error persists. Ultimately, you should write to the Computing Tools HyperNews forum (this is a forum for all kind of issues with CMS computing tools, not only CRAB).
>
>
If you intentionally wanted (and had a good reason) to run jobs reading the input files via AAA, then yes, we have to care about why AAA failed. The first thing then is to check that the site where the input file is hosted supports access to its storage via AAA, which is true for the majority of the sites, but not all. If you want to know if your file is available through AAA you can check this link, and in particular you can do cmsenv and then use this command xrdfs cms-xrd-global.cern.ch locate /store/path/to/file. If AAA is supported, the next thing is to discard a transient problem: you can submit your jobs again and see if the error persists. Ultimately, you should write to the Computing Tools HyperNews forum (this is a forum for all kind of issues with CMS computing tools, not only CRAB).
 </>
<!--/twistyPlugin-->

Illegal parameter found in configuration. The parameter is named: 'numberEventsInLuminosityBlock'

Line: 580 to 630
 
<!--/twistyPlugin twikiMakeVisibleInline-->
The most common reasons for this error are:
  1. The user is trying to analyze an input dataset, but he/she has specified in the CRAB configuration file JobType.pluginName = 'PrivateMC' instead of JobType.pluginName = 'Analysis'.
Changed:
<
<
  1. The user is generating MC events, correctly specifying in the CRAB configuration file JobType.pluginName = 'PrivateMC', but in the CMSSW parameter set configuration he/she has specified a source of type PoolSource. The solution is to not specify a PoolSource. Note: This doesn't mean to remove process.source completely, as this attribute must be present. One could set process.source = cms.Source("EmptySource") if no input source is used.
>
>
  1. The user is generating MC events, correctly specifying in the CRAB configuration file JobType.pluginName = 'PrivateMC', but in the CMSSW parameter-set configuration he/she has specified a source of type PoolSource. The solution is not to specify a PoolSource. Note: This doesn't mean to remove process.source completely, as this attribute must be present. One could set process.source = cms.Source("EmptySource") if no input source is used, as in the sketch below.
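For the second case, a minimal sketch of the relevant part of the CMSSW parameter-set is shown here; the process name and the number of events are placeholders:

import FWCore.ParameterSet.Config as cms

process = cms.Process("GEN")                 # placeholder process name
# no PoolSource: an EmptySource is enough when there is no input dataset
process.source = cms.Source("EmptySource")
process.maxEvents = cms.untracked.PSet(input = cms.untracked.int32(100))
# ... generator, simulation and output modules go here (omitted) ...
# in the CRAB configuration this goes together with JobType.pluginName = 'PrivateMC'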
 
<!--/twistyPlugin-->

CRAB Client API

Line: 591 to 641
 The general problem is that CMSSW parameter-set configurations don't like to be loaded twice. In that respect, each time the CRAB client loads a CMSSW configuration, it saves it in a local (temporary) cache identifying the loaded module with a key constructed out of the following three pieces: the full path to the module and the python variables sys.path and sys.argv.

A problem arises when the CRAB configuration parameter JobType.pyCfgParams is used. The arguments in JobType.pyCfgParams are added by CRAB to sys.argv, affecting the value of the key that identifies a CMSSW parameter-set in the above mentioned cache. And that's in principle fine, as changing the arguments passed to the CMSSW parameter-set may change the event processor. But when a python process has to do more than one submission (like the case of multicrab for multiple submissions), the CMSSW parameter-set is loaded again every time the JobType.pyCfgParams is changed and this may result in "duplicate process" errors. Below are two examples of these kind of errors:

Changed:
<
<
>
>
 CmsRunFailure CMSSW error message follows. Fatal Exception An exception of category 'Configuration' occurred while
Changed:
<
<
[0] Constructing the EventProcessor [1] Constructing module: class=...... label=......
>
>
[0] Constructing the EventProcessor [1] Constructing module: class=...... label=......
 Exception Message: Duplicate Process The process name ...... was previously used on these products. Please modify the configuration file to use a distinct process name.
Added:
>
>
 
Added:
>
>
 CmsRunFailure CMSSW error message follows. Fatal Exception
Line: 612 to 665
 in vString categories duplication of the string ...... The above are from MessageLogger configuration validation. In most cases, these involve lines that the logger configuration code
Changed:
<
<
would not process, but which the cfg creator obviously meant to have effect.
>
>
would not process, but which the cfg creator obviously meant to have effect.
 One option would be to try to not use JobType.pyCfgParams. But if this is not possible, the more general ad-hoc solution would be to fork the submission into a different python process. For example, if you are doing something like documented in Multicrab using the crabCommand API then we suggest to replace each
Changed:
<
<
submit(config)
>
>
submit(config)
  by
Changed:
<
<
>
>
 from multiprocessing import Process p = Process(target=submit, args=(config,)) p.start()
Changed:
<
<
p.join()
>
>
p.join()
  (Of course, from multiprocessing import Process needs to be executed only once, so put it outside any loop.) </>
<!--/twistyPlugin-->
Deleted:
<
<

Multiple submission produces different PSetDump.py files

 
Added:
>
>

Multiple submission produces different PSetDump.py files

 
<!--/twistyPlugin twikiMakeVisibleInline-->
If the PSetDump.py file (found in task_directory/inputs) differs for the tasks from a multiple-submission python file, try forking the submission into different python processes, as recommended in the previous FAQ.
<!--/twistyPlugin-->
Line: 652 to 708
 It is impossible to guarantee that a given task will always complete to 100% success in a short amount of time. At the same time it is impossible to make sure that all desired input data is available when the task is submitted. Moreover both good sense and experience show that the larger a task is, the larger is the chance it hits some problem. Large workflows therefore benefit from the possibility to run them sort of iteratively, with a short (hopefully one or two at most) succession of smaller and smaller tasks.
Deleted:
<
<

Recovery task: When

 
Added:
>
>

Recovery task: When

 A partial list of real life events where a recovery task is the user's fastest and simplest way to get work done:
  • Something went wrong in the global infrastructure and some jobs are lost beyond recovery
  • Something went wrong inside CRAB (bugs, hardware...) which can't be fixed by crab resubmit command
Line: 677 to 734
 
  1. submit a new CRAB task B which processes the missing lumis (listB = listIn - listA)

Details are slightly different depending on whether you published the output in DBS or not:

Changed:
<
<
output in DBS
follow the procedure in this FAQ
output not in DBS
follow the procedure in this Workbook example
>
>
output in DBS
follow the procedure in this FAQ
output not in DBS
follow the procedure in this Workbook example
  </>
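For the lumi bookkeeping itself (listB = listIn - listA above), a minimal sketch using the same LumiList tool shown in the lumi-mask FAQ below could look like this. The file names are illustrative assumptions: they depend on which lumi-mask you used for the original task and on where crab report wrote its output (e.g. processedLumis.json in the task results directory; the exact name may differ between CRAB versions).

from LumiList import LumiList

listIn = LumiList(filename='original_lumi_mask.json')   # mask used for the original task
listA  = LumiList(filename='processedLumis.json')       # lumis actually processed, from crab report

listB = listIn - listA                                   # lumis still to be processed
listB.writeJSON('recovery_lumi_mask.json')

# then, in the recovery task configuration:
#   config.Data.lumiMask = 'recovery_lumi_mask.json'
#   config.Data.outputDatasetTag = same as the original task, to add to the same dataset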
<!--/twistyPlugin-->
Line: 688 to 745
 While data taking is progressing, the corresponding datasets in DBS and lumi-mask files are growing. Also, data quality is sometimes improved for already existing data, leading to updated lumi-masks which, compared to older lumi-masks, include luminosity sections that were previously filtered out. Both of these situations lead to the common case where one would like to run a task (let's call it task B) over an input dataset partially analyzed already in a previous task (let's call it task A), where task B should skip the data already analyzed in task A.

This can be accomplished with a few lines in the CRAB configuration file, see an annotated example below.

Changed:
<
<
>
>
 from UserUtilities import config, getLumiListInValidFiles from LumiList import LumiList
Line: 721 to 779
 # and there we, process from input dataset all the lumi listed in the current officialLumiMask file, skipping the ones you already have. config.Data.lumiMask = 'my_lumi_mask.json' config.Data.outputDatasetTag = <TaskA-outputDatasetTag> # add to your existing dataset
Changed:
<
<
...
>
>
...
  IMPORTANT NOTE : in this way you will add any lumi section in the intial data set that was turned from bad to good in the golden list after you ran Task-A, but if some of those data evolved the other way around (from good to bad), there is no way to remove those from your published datasets.
Line: 763 to 823
 

Using pile-up

<!--/twistyPlugin twikiMakeVisibleInline-->
Changed:
<
<
Important Instructions:
Make sure you run your jobs at the site where the pile-up sample is. Not where the signal is.
>
>
Important Instructions:
Make sure you run your jobs at the site where the pile-up sample is. Not where the signal is.
 This requires you to overrdie the location list that CRAB would extract from the inputDataset.
Changed:
<
<
Rationale and details:
The pile-up files have to be specified in the CMSSW parameter-set configuration file. There is no way yet to tell in the CRAB configuration file that one wants to use a pile-up dataset as a secondary input dataset. That means that CRAB doesn't know that the CMSSW code will want to access pile-up files; CRAB only knows about the primary input dataset (if any). This means that, assuming there is a primary input dataset, when CRAB does data discovery to figure out to which sites it should submit the jobs, it will only take into account the input dataset specified in the CRAB configuration file (in the Data.inputDataset parameter) and submit the jobs to sites where this dataset is hosted. If there is no primary input dataset, CRAB will submit the jobs to the less busy sites. In any case, if the pile-up files are not hosted at the execution sites, they will be accessed via AAA (Xrootd). But reading the "signal" events directly from the local storage and the pile-up events via AAA is less efficient than doing it the other way around, since for each "signal" event that is read one needs to read in general many (> 20) pile-up events. Therefore, it is highly recommended that the user forces CRAB to submit the jobs to the sites where the pile-up dataset is hosted, by whitelisting these sites with the parameter Site.whitelist in the CRAB configuration file. Note that one also needs to set Data.ignoreLocality = True in the CRAB configuration file when using a primary input dataset, so as to avoid CRAB doing data discovery and then complaining (and failing to submit) because the input dataset is not available at the whitelisted sites. One can use DAS to get the list of sites that host a dataset.
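As an illustration, the relevant CRAB configuration lines could look like the sketch below; the dataset and site names are placeholders, and the real list of sites hosting the pile-up dataset should be taken from DAS:

config.Data.inputDataset = '/MyPrimaryDataset/MyEra/MINIAODSIM'   # placeholder primary ("signal") dataset
config.Data.ignoreLocality = True                # do not require jobs to run where the primary dataset is
config.Site.whitelist = ['T2_XX_Somewhere']      # placeholder: site(s) hosting the pile-up dataset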
The pile-up files have be specified in the CMSSW parameter-set configuration file. There is no way yet to tell in the CRAB configuration file that one wants to use a pile-up dataset as a secondary input dataset. That means that CRAB doesn't know that the CMSSW code will want to access pile-up files; CRAB only knows about the primary input dataset (if any). This means that, assuming there is a primary input dataset, when CRAB does data discovery to figure out to which sites should it submit the jobs, it will only take into account the input dataset specified in the CRAB configuration file (in the Data.inputDataset parameter) and submit the jobs to sites where this dataset is hosted. If there is no primary input dataset, CRAB will submit the jobs to the less busy sites. In any case, if the pile-up files are not hosted in the execution sites, they will be accessed via AAA (Xrootd). But reading the "signal" events directly from the local storage and the pile-up events via AAA is more inefficient than doing the other way around, since for each "signal" event that is read one needs to read in general many (> 20) pile-up events. Therefore, it is highly recommended that the user forces CRAB to submit the jobs to the sites where the pile-up dataset is hosted by whitelisting these sites using the parameter Site.whitelist in the CRAB configuration file. Note that one also needs to set Data.ignoreLocality = True in the CRAB configuration file in case of using a primary input dataset so to avoid CRAB doing data discovery and eventually complain (and fail to submit) that the input dataset is not available in the whitelisted sites. One can use DAS to get the list of sites that host a dataset.
>
>
Rational and details:
The pile-up files have be specified in the CMSSW parameter-set configuration file. There is no way yet to tell in the CRAB configuration file that one wants to use a pile-up dataset as a secondary input dataset. That means that CRAB doesn't know that the CMSSW code will want to access pile-up files; CRAB only knows about the primary input dataset (if any). This means that, assuming there is a primary input dataset, when CRAB does data discovery to figure out to which sites should it submit the jobs, it will only take into account the input dataset specified in the CRAB configuration file (in the Data.inputDataset parameter) and submit the jobs to sites where this dataset is hosted. If there is no primary input dataset, CRAB will submit the jobs to the less busy sites. In any case, if the pile-up files are not hosted in the execution sites, they will be accessed via AAA (Xrootd). But reading the "signal" events directly from the local storage and the pile-up events via AAA is more inefficient than doing the other way around, since for each "signal" event that is read one needs to read in general many (> 20) pile-up events. Therefore, it is highly recommended that the user forces CRAB to submit the jobs to the sites where the pile-up dataset is hosted by whitelisting these sites using the parameter Site.whitelist in the CRAB configuration file. Note that one also needs to set Data.ignoreLocality = True in the CRAB configuration file in case of using a primary input dataset so to avoid CRAB doing data discovery and eventually complain (and fail to submit) that the input dataset is not available in the whitelisted sites. One can use DAS to get the list of sites that host a dataset.
 
<!--/twistyPlugin-->
Added:
>
>
 

Miscellanea

How CRAB finds data in input datasets from DBS

Line: 809 to 870
 LFNs are names like /store/user/mario/myoutput; note that a directory is also a file name.

For example, for the LFN /store/user/username/myfile.root stored in T2_IT_Pisa you can do the following (make sure you did cmsenv before, so to use a new version of curl), where you can replace the first two lines with the values which are useful to you and simply copy/paste the long curl command:

Changed:
<
<
>
>
 site=T2_IT_Pisa lfn=/store/user/username/myfile.root
Changed:
<
<
curl -ks "https://cmsweb.cern.ch/phedex/datasvc/perl/prod/lfn2pfn?node=${site}&lfn=${lfn}&protocol=srmv2" | grep PFN | cut -d "'" -f4
>
>
curl -ks "https://cmsweb.cern.ch/phedex/datasvc/perl/prod/lfn2pfn?node=${site}&lfn=${lfn}&protocol=srmv2" | grep PFN | cut -d "'" -f4
  which returns:
Changed:
<
<
srm://stormfe1.pi.infn.it:8444/srm/managerv2?SFN=/cms/store/user/username/myfile.root
>
>
srm://stormfe1.pi.infn.it:8444/srm/managerv2?SFN=/cms/store/user/username/myfile.root
 
<!--
To see full details, you call the PhEDEx API passing the following query data: {'protocol': 'srmv2', 'node': '<CMS-node-name>', 'lfn': '<logical-path-name>'}. For example, for the LFN /store/user/username/myfile.root stored in T2_IT_Pisa, the URL to query the API would be https://cmsweb.cern.ch/phedex/datasvc/xml/prod/lfn2pfn?protocol=srmv2&node=T2_IT_Pisa&lfn=/store/user/username/myfile.root, which would show an output in xml format as this:
Line: 833 to 898
 
-->

Before executing the gfal commands, make sure to have a valid proxy:

Changed:
<
<
voms-proxy-init -voms cms
>
>
voms-proxy-init -voms cms
 Enter GRID pass phrase for this identity: Contacting voms2.cern.ch:15002 [/DC=ch/DC=cern/OU=computers/CN=voms2.cern.ch] "cms"... Remote VOMS server contacted succesfully.
Line: 842 to 911
  Created proxy in /tmp/x509up_u<user-id>.
Changed:
<
<
Your proxy is valid until
>
>
Your proxy is valid until
  The most useful gfal commands and their usage syntax for listing/removing/copying files/directories are in the examples below (it is recommended to unset the environment when executing gfal commands, i.e. to add env -i in front of the commands). See also the man entry for each command (man gfal-ls etc.):

List a (remote) path:

Changed:
<
<
env -i X509_USER_PROXY=/tmp/x509up_u$UID gfal-ls <physical-path-name-to-directory>
>
>
env -i X509_USER_PROXY=/tmp/x509up_u$UID gfal-ls <physical-path-name-to-directory>
  Remove a (remote) file:
Changed:
<
<
env -i X509_USER_PROXY=/tmp/x509up_u$UID gfal-rm <physical-path-name-to-file>
>
>
env -i X509_USER_PROXY=/tmp/x509up_u$UID gfal-rm <physical-path-name-to-file>
  Recursively remove a (remote) directory and all files in it:
Changed:
<
<
env -i X509_USER_PROXY=/tmp/x509up_u$UID gfal-rm -r <physical-path-name-to-directory>
>
>
env -i X509_USER_PROXY=/tmp/x509up_u$UID gfal-rm -r <physical-path-name-to-directory>
  Copy a (remote) file to a directory in the local machine:
Deleted:
<
<
env -i X509_USER_PROXY=/tmp/x509up_u$UID gfal-copy <physical-path-name-to-source-file> file://<absolute-path-to-local-destination-directory>
 
Changed:
<
<
Note: the starts with / therefore there are three consecutive / characters like file:///tmp/somefilename.root
>
>
env -i X509_USER_PROXY=/tmp/x509up_u$UID gfal-copy <physical-path-name-to-source-file> file://<absolute-path-to-local-destination-directory>
Note: the <absolute-path-to-local-destination-directory> starts with / therefore there are three consecutive / characters like file:///tmp/somefilename.root
 </>
<!--/twistyPlugin-->

Why are my jobs submitted to a site that I had explicitly blacklisted (not whitelisted)?

Line: 870 to 947
 
<!--/twistyPlugin twikiMakeVisibleInline-->
There is a site overflow mechanism in place, which takes place after CRAB submission. Sites are divided into regions of good WAN/xrootd connectivity (e.g. US, Italy, Germany etc.), and jobs queued at one site A for too long are allowed to overflow to a well-connected site B which does not host the requested input data but from where the data can be read over xrootd. The rationale is that even if those jobs were to fail because they cannot read the data or because of a problem at site B, they will be automatically resubmitted, so nothing is lost with respect to keeping those jobs idle in the queue waiting for free slots at site A. The site overflow can be turned off via the Debug.extraJDL CRAB configuration parameter :
Changed:
<
<
>
>
 config.section_("Debug")
Changed:
<
<
config.Debug.extraJDL = ['+CMS_ALLOW_OVERFLOW=False']
>
>
config.Debug.extraJDL = ['+CMS_ALLOW_OVERFLOW=False']
  Note: if you change this configuration option for an already-created task (for instance if you noticed a lot of job failures at a particular site and even after blacklisting the jobs keep going back), you can't simply change the option in the configuration and resubmit. You'll have to kill the existing task and make a new task to get the option to be accepted. You can't simply change it during resubmission.
<!--/twistyPlugin-->
Line: 887 to 965
 There is a tool written in python called LumiList.py (available in the WMCore library; is the same code as cmssw/FWCore/PythonUtilities/python/LumiList.py ) that can be used to do lumi-mask arithmetics. The arithmetics can even be done inside the CRAB configuration file (that's the advantage of having the configuration file written in python). Below are some examples.

Example 1: A run range selection can be achieved by selecting from the original lumi-mask file the run range of interest.

Changed:
<
<
>
>
 from LumiList import LumiList

lumiList = LumiList(filename='my_original_lumi_mask.json') lumiList.selectRuns([x for x in range(193093,193999+1)]) lumiList.writeJSON('my_lumi_mask.json')

Changed:
<
<
config.Data.lumiMask = 'my_lumi_mask.json'
>
>
config.Data.lumiMask = 'my_lumi_mask.json'
  Example 2: Use a new lumi-mask file that is the intersection of two other lumi-mask files.
Changed:
<
<
>
>
 from LumiList import LumiList

originalLumiList1 = LumiList(filename='my_original_lumi_mask_1.json')

Line: 905 to 986
 newLumiList = originalLumiList1 & originalLumiList2 newLumiList.writeJSON('my_lumi_mask.json')
Changed:
<
<
config.Data.lumiMask = 'my_lumi_mask.json'
>
>
config.Data.lumiMask = 'my_lumi_mask.json'
  Example 3: Use a new lumi-mask file that is the union of two other lumi-mask files.
Changed:
<
<
>
>
 from LumiList import LumiList

originalLumiList1 = LumiList(filename='my_original_lumi_mask_1.json')

Line: 916 to 999
 newLumiList = originalLumiList1 | originalLumiList2 newLumiList.writeJSON('my_lumi_mask.json')
Changed:
<
<
config.Data.lumiMask = 'my_lumi_mask.json'
>
>
config.Data.lumiMask = 'my_lumi_mask.json'
  Example 4: Use a new lumi-mask file that is the subtraction of two other lumi-mask files.
Changed:
<
<
>
>
 from LumiList import LumiList

originalLumiList1 = LumiList(filename='my_original_lumi_mask_1.json')

Line: 927 to 1012
 newLumiList = originalLumiList1 - originalLumiList2 newLumiList.writeJSON('my_lumi_mask.json')
Changed:
<
<
config.Data.lumiMask = 'my_lumi_mask.json'
>
>
config.Data.lumiMask = 'my_lumi_mask.json'
 </>
<!--/twistyPlugin-->

User quota in the CRAB scheduler machines

<!--/twistyPlugin twikiMakeVisibleInline-->
Each user has a home directory with 100GB of disk space in each of the scheduler machines (schedd for short) assigned to CRAB3 for submitting jobs to the Grid. Whenever a task is submitted by the CRAB server to a schedd, a task directory is created in this space containing among other things CRAB libraries and scripts needed to run the jobs. Log files from Condor/DAGMan and CRAB itself are also placed there. (What is not available in the schedds are the cmsRun log files, except for the snippet available in the CRAB job log file.) As a guidance, a task with 100 jobs uses on average 50MB of space, but this number depends a lot on the number of resubmissions, since each resubmission produces its log files. If a user reaches his/her quota in a given schedd, he/she will not be able to submit more jobs via that schedd (he/she may still be able to submit via other schedd, but since the user can not choose the schedd to which to submit -the choice is done by the CRAB server-, he/she would have to keep trying the submission until the task goes to a schedd with non-exahusted quota). To avoid that, task directories are automatically removed from the schedds after 30 days of their last modification. If a user reaches 50% of its quota in a given schedd, an automatic e-mail similar to the one shown below is sent to him/her.
Changed:
<
<
>
>
 Subject: WARNING: Reaching your quota

Dear analysis user ,

Line: 948 to 1034
  https://twiki.cern.ch/twiki/bin/view/CMSPublic/SWGuideCrabFaq#Disk_space_for_output_files If you have any questions, please contact hn-cms-computing-tools(AT)cern.ch Regards,
Changed:
<
<
CRAB support
>
>
CRAB support
  This e-mail has a link (https://twiki.cern.ch/twiki/bin/view/CMSPublic/SWGuideCrabFaq#How_to_clean_up_your_directory_i) to the instructions on how to clean up space in the user's home directory in a schedd. A user can follow the instructions in that page, or alternatively use the crab purge command.
<!--/twistyPlugin-->
Line: 959 to 1046
 
<!--/twistyPlugin twikiMakeVisibleInline-->
To overcome the CRAB3 vs CMSSW environment conflicts, you can use the following script available in CVMFS (/cvmfs/cms.cern.ch/crab3/crab-env-bootstrap.sh) without need to source the CRAB3 environment. You could do something like this:
Changed:
<
<
>
>
 cmsenv # DO NOT setup the CRAB3 environment alias crab='/cvmfs/cms.cern.ch/crab3/crab-env-bootstrap.sh' crab submit crab status ...
Changed:
<
<
# check that you can run cmsRun locally
>
>
# check that you can run cmsRun locally
  Details:
Line: 979 to 1068
 
<!--/twistyPlugin twikiMakeVisibleInline-->
The problematic pset is FWCore/ParameterSet/Mixins.py from CMSSW:
Deleted:
<
<
[line 714] p = tLPTest("MyType",** { "a"+str(x): tLPTestType(x) for x in xrange(0,300) } )
 
Changed:
<
<
This uses dictionary comprehensions, a feature available in python > 2.7. While CMSSW (setup via cmsenv) uses python > 2.7, CRAB (setup via /cvmfs/cms.cern.ch/crab3/crab.sh) still uses python 2.6.8. To overcome this problem, don't setup the CRAB environment and instead use the crab-env-bootstrap.sh script (see this FAQ).
>
>
[line 714] p = tLPTest("MyType",** { "a"+str(x): tLPTestType(x) for x in xrange(0,300) } )

This uses dictionary comprehensions, a feature available in python 2.7 and later. While CMSSW (setup via cmsenv) uses python 2.7 or later, CRAB (setup via /cvmfs/cms.cern.ch/crab3/crab.sh) still uses python 2.6.8. To overcome this problem, don't setup the CRAB environment and instead use the crab-env-bootstrap.sh script (see this FAQ).

 
<!--/twistyPlugin-->

Revision 762018-01-24 - LeonardoCristella

Line: 1 to 1
 
META TOPICPARENT name="SWGuideCrab"
<!-- /ActionTrackerPlugin -->
Line: 12 to 12
 pre.note {background-color: white;}
Changed:
<
<
CRAB Logo
>
>
CRAB Logo
 

CRAB3 Frequently Asked Questions

Line: 17 to 17
 

CRAB3 Frequently Asked Questions

Complete: 3 Go to SWGuideCrab
Deleted:
<
<
 Help Notice: This is a large page, it works best if you search in it for your problem using the browser search function.
By default all answers are collapsed and search only used the questions text. If you do not find what you need, you can use the buttons below to search inside answers as well:
Changed:
<
<
 
>
>
 
Changed:
<
<
Help if you still have problems, check the CRAB3Troubleshoot guide before asking for support
>
>
Help if you still have problems, check the CRAB3Troubleshoot guide before asking for support
 
Contents:
Line: 34 to 34
  Therefore you should remove credentials from myproxy and then issue the crab commad again. To remove stale credentials:
Changed:
<
<
>
>
 grep myproxy-info /crab.log
Changed:
<
<
# example: grep myproxy-info crab_20160308_140433/crab.log
>
>
# example: grep myproxy-info crab_20160308_140433/crab.log
 you will get something like
Changed:
<
<
 command : myproxy-info -l ec95456d3589ed395dc47d3ada8c94c67ee588f1 -s myproxy.cern.ch
>
>
 command : myproxy-info -l ec95456d3589ed395dc47d3ada8c94c67ee588f1 -s myproxy.cern.ch
  then simply issue a myproxy-destroy command with same arguments:
Changed:
<
<
>
>
#example. In real life replace the long hex string with the one from your crab.log
Changed:
<
<
myproxy-destroy -l ec95456d3589ed395dc47d3ada8c94c67ee588f1 -s myproxy.cern.ch
>
>
myproxy-destroy -l ec95456d3589ed395dc47d3ada8c94c67ee588f1 -s myproxy.cern.ch
 
<!--/twistyPlugin-->
Line: 55 to 56
 

Does CRAB setup conflict with CMSSW setup

<!--/twistyPlugin twikiMakeVisibleInline-->
Changed:
<
<
No. CRAB client runs within the CMSSW environment.
>
>
No. CRAB client runs within the CMSSW environment.
 Make sure you always do cmsenv before source /cvmfs/cms.cern.ch/crab3/crab.sh
<!--/twistyPlugin-->
Line: 120 to 123
 One can use the crab purge command to delete from the CRAB cache files associated to a given task. Actually, crab purge deletes only user input sandboxes (because there is no API to delete other files), but since they are supposed to be the main space consumers in the CRAB cache, this should be enough. If for some reason the crab purge command does not work, one can alternatively use the REST interface of the crabcache component. Instructions oriented for CRAB3 operators can be found here. Jordan Tucker has written the following script based on these instructions that removes all the input sandboxes from the user CRAB cache area (a valid proxy and the CRAB environment are required):

Show Hide script
<!--/twistyPlugin twikiMakeVisibleInline-->
Changed:
<
<
>
>
 #!/usr/bin/env python

import json

Line: 196 to 200
  if '.log' in x: continue print 'remove', x
Changed:
<
<
h.fileremove(x)
>
>
h.fileremove(x)
 
<!--/twistyPlugin-->

note.gif Note: Once a task has been submitted, one can safely delete the input sandbox from the CRAB cache, as the sandbox is transferred to the worker nodes from the schedulers.

Line: 226 to 230
 With CRAB3 this should not be any different than with CRAB2. CRAB will look up the user's username registered in SiteDB (which is the username of the CERN primary account) using for the query the user's DN (which in turn is extracted from the user's credentials) and will try to stage out to /store/user/<username>/ (by default). If the store user area uses a different username, it's up to the destination site to remap that (via a symbolic link or something similar). The typical case is Fermilab; to request the mapping of the store user area, FNAL users should follow the directions on the usingEOSatLPC web page to open a ServiceNow ticket to get this fixed.

To prevent stage out failures, and in case the user has provided in the Data.outLFN parameter of the CRAB configuration file an LFN directory path of the kind /store/user/[<some-username>/<subdir>*] (i.e. a store path that starts with /store/user/), CRAB will check if some-username matches with the user's username extracted from SiteDB. If it doesn't, it will give an error message and not submit the task. The error message would be something like this:

Changed:
<
<
>
>
 Error contacting the server. Server answered with: Invalid input parameter
Changed:
<
<
Reason is: The parameter Data.outLFN in the CRAB configuration file must start with either '/store/user//' or '/store/group//' (or '/store/local//' if publication is off), wher...
>
>
Reason is: The parameter Data.outLFN in the CRAB configuration file must start with either '/store/user//' or '/store/group//' (or '/store/local//' if publication is off), wher...
  Unfortunately the "Reason is:" message it cut at 200 characters. The message should read:
Changed:
<
<
Reason is: The parameter Data.outLFN in the CRAB configuration file must start with either '/store/user/<username>/' or '/store/group/<groupname>/' (or '/store/local/<something>/' if publication is off), where username is your username as registered in SiteDB (i.e. the username of your CERN primary account).
>
>
Reason is: The parameter Data.outLFN in the CRAB configuration file must start with either '/store/user/<username>/' or '/store/group/<groupname>/' (or '/store/local/<something>/' if publication is off), where username is your username as registered in SiteDB (i.e. the username of your CERN primary account).
  A similar message should be given by crab checkwrite if the user does crab checkwrite --site=<CMS-site-name> --lfn=/store/user/<some-username>. </>
<!--/twistyPlugin-->
Line: 245 to 247
 
<!--/twistyPlugin twikiMakeVisibleInline-->
First of all, does CRAB know at all that the job should produce the output file in question? To check that, open one of the job log files linked from the task monitoring pages. Very close to the top it is printed the list of output files that CRAB expects to see once the job finishes (shown below is the case of job number 1 in the task):
Changed:
<
<
>
>
 ==== HTCONDOR JOB SUMMARY at ... START ==== CRAB ID: 1 Execution site: ... Current hostname: ... Destination site: ...
Changed:
<
<
Output files: my_output_file.root=my_output_file_1.root
>
>
Output files: my_output_file.root=my_output_file_1.root
  If the output file in question doesn't appear in that list, then CRAB doesn't know about it, and of course it will not be transferred. This doesn't mean that the output file was not produced; it is simply that CRAB has to know beforehand what are the output files that the job produces.

If the output file is produced by either PoolOutputModule or TFileService, CRAB will automatically recognize the name of the output file when the user submits the task and it will add the output file name to the list of expected output files. On the other hand, if the output file is produced by any other module, the user has to specify the output file name in the CRAB configuration parameter JobType.outputFiles in order for CRAB to know about it. Note that this parameter takes a python list, so the right way to specify it is:

Added:
>
>
config.JobType.outputFiles = ['my_output_file.root']
 
Deleted:
<
<
config.JobType.outputFiles = ['my_output_file.root']
 
<!--/twistyPlugin-->

Can I delete a dataset I published in DBS?

Line: 286 to 285
 crab checkusername uses the following sequence of bash commands, which you should try to execute one by one (make sure you have a valid proxy) to check if they return what is expected.

1) It gets the path to the users proxy file with the command

Changed:
<
<
which scram >/dev/null 2>&1 && eval `scram unsetenv -sh`; voms-proxy-info -path
>
>
which scram >/dev/null 2>&1 && eval `scram unsetenv -sh`; voms-proxy-info -path
  which should return something like
Changed:
<
<
/tmp/x509up_u57506
>
>
/tmp/x509up_u57506
  2) It defines the path to the CA certificates directory with the following python command
Changed:
<
<
>
>
 import os capath = os.environ['X509_CERT_DIR'] if 'X509_CERT_DIR' in os.environ else "/etc/grid-security/certificates"
Changed:
<
<
print capath
>
>
print capath
  which should be equivalent to the following bash command
Changed:
<
<
>
>
 if [ "x$X509_CERT_DIR" = "x" ]; then capath=$X509_CERT_DIR; else capath=/etc/grid-security/certificates; fi
Changed:
<
<
echo $capath
>
>
echo $capath
  and which in lxplus should result in
Changed:
<
<
/etc/grid-security/certificates
>
>
/etc/grid-security/certificates
  3) It uses the proxy file and the capath to query https://cmsweb.cern.ch/sitedb/data/prod/whoami
Changed:
<
<
curl -s --capath <output-from-command-2-above> --cert <output-from-command-1-above> --key <output-from-command-1-above> 'https://cmsweb.cern.ch/sitedb/data/prod/whoami'
>
>
curl -s --capath <output-from-command-2-above> --cert <output-from-command-1-above> --key <output-from-command-1-above> 'https://cmsweb.cern.ch/sitedb/data/prod/whoami'
  which should return something like
Changed:
<
<
>
>
 {"result": [ {"dn": "/DC=ch/DC=cern/OU=Organic Units/OU=Users/CN=atanasi/CN=710186/CN=Andres Jorge Tanasijczuk", "login": "atanasi", "method": "X509Proxy", "roles": {"operator": {"group": ["crab3"], "site": []}}, "name": "Andres Jorge Tanasijczuk"}
Changed:
<
<
]}
>
>
]}
  4) Finally it parses the output from the above query to extract the username from the "login" field (in my case it is atanasi).
Line: 345 to 328
 
<!--/twistyPlugin twikiMakeVisibleInline-->
When doing crab submit the user may get this error messages:
Changed:
<
<
>
>
 Error contacting the server. Server answered with: Invalid input parameter
Changed:
<
<
Reason is: User quota limit reached; cannot upload the file
>
>
Reason is: User quota limit reached; cannot upload the file
  Error explanation: The user has reached the limit of 4.88GB in its CRAB cache area. Read more in this FAQ.
Line: 362 to 343
 
<!--/twistyPlugin twikiMakeVisibleInline-->
Typical error in crab status:
Changed:
<
<
>
>
 Failure message: The CRAB server backend was not able to (re)submit the task, because the Grid scheduler answered with an error. This is probably a temporary glitch. Please try again later. If the error persists send an e-mail to hn-cms-computing-tools@cern.ch<mailto:hn-cms-computing-tools@cern.ch>. Error reason: Trapped exception in Dagman.Fork: Unable to edit jobs matching constraint File "/data/srv/TaskManager/3.3.1512.rc6/slc6_amd64_gcc481/cms/crabtaskworker/3.3.1512.rc6/lib/python2.6/site-packages/TaskWorker/Actions/DagmanResubmitter.py", line 113, in executeInternal
Changed:
<
<
schedd.edit(rootConst, "HoldKillSig", 'SIGKILL')
>
>
schedd.edit(rootConst, "HoldKillSig", 'SIGKILL')
  As the error message says, this should be a temporary failure. One should just keep trying until it works. But after doing crab resubmit, give it some time to process the resubmission request; it may take a couple of minutes to see the jobs reacting to the resubmission.
<!--/twistyPlugin-->
Line: 376 to 355
 
<!--/twistyPlugin twikiMakeVisibleInline-->
After doing crab submit and crab status the user may get this error message:
Changed:
<
<
>
>
 Task status: UNKNOWN
Changed:
<
<
Error during task injection: Task failed to bootstrap on schedd
>
>
Error during task injection: Task failed to bootstrap on schedd
  Error explanation: The submission of the task to the scheduler machine has failed.
Line: 393 to 369
 
<!--/twistyPlugin twikiMakeVisibleInline-->
When doing crab status the user may get one of these error messages:
Changed:
<
<
>
>
 Error during task injection: <task-name>: Failed to contact Schedd: Failed to fetch ads from schedd.
Deleted:
<
<
 
Changed:
<
<
Error during task information retrieval:        <task-name>: Failed to contact Schedd: .

Error explanation: This is a temporary communication error with the scheduler machine (submission node), most probably because the scheduler is overloaded.

>
>
Error during task information retrieval: <task-name>: Failed to contact Schedd: . Error explanation: This is a temporary communication error with the scheduler machine (submission node), most probably because the scheduler is overloaded.
  What to do: Try again after a couple of minutes.
<!--/twistyPlugin-->
Line: 433 to 402
 Some more discussion is in this thread: https://hypernews.cern.ch/HyperNews/CMS/get/computing-tools/2928.html

There are a few datasets in DBS which do not satisfy this limit. If someone really needs to process those, the only way is to do one job per file using the userInputFiles feature of CRAB. An annotated example of how to do this in python is below; note that you have to disable DBS publication, indicate splitting by file and provide the input file locations. Other configuration parameters can be set as usual:

Changed:
<
<
>
>
  # this will use CRAB client API from RawCommand import crabCommand
Line: 487 to 455
  result = crabCommand('submit', config = config)
Changed:
<
<
print (result)
>
>
print (result)
 
<!--/twistyPlugin-->
Line: 498 to 465
 It is important that you as a user are prepared for this to happen and know how to remain productive in your physics analysis with the least effort. While there is a long tradition of "resubmit them until they work", this is hardly useful any more. And while we can't prevent users from trying we can not guarantee that it will work, nor that resubmitted jobs will succeed.
Changed:
<
<
In the case where the missing data sample is important, the best recommendation we can give to users is to
USE GENERAL/GENERIC RESCUE PROCEDURES, RATHER THEN TRY AT ALL COSTS TO REVIVE A DEAD TASK.
>
>
In the case where the missing data sample is important, the best recommendation we can give to users is to
USE GENERAL/GENERIC RESCUE PROCEDURES, RATHER THEN TRY AT ALL COSTS TO REVIVE A DEAD TASK.
 We will always welcome problem reports and will try to improve when resubmission failures can be due to CRAB internals, but surely you do not want to hold your breath in the meanwhile.

The safest path is therefore:

Changed:
<
<
  1. let running jobs die or complete and dust settle
  2. use crab kill to make sure everything stops
  3. take stock of what's published in DBS at that point and make sure that it matches what's on disk
>
>
  1. let running jobs die or complete and dust settle
  2. use crab kill to make sure everything stops
  3. take stock of what's published in DBS at that point and make sure that it matches what's on disk
 
    • if your output is not in DBS, you can use crab report, but while DBS information is available forever, crab commands on a specific task may not
Changed:
<
<
  1. assess whether it is more important to get the last percentage of statistics or go on with other work. Do you really need 100% completion in this task ?
  2. if full statistics is needed, create a recovery task for the missing lumis and run it writing to the same dataset as the original task.
>
>
  1. assess whether it is more important to get the last percentage of statistics or go on with other work. Do you really need 100% completion in this task ?
  2. if full statistics is needed, create a recovery task for the missing lumis and run it writing to the same dataset as the original task.
 
Changed:
<
<
Recovery task is an important concept that can be useful in many circumstances. Please find instructions in this FAQ
>
>
Recovery task is an important concept that can be useful in many circumstances. Please find instructions in this FAQ
  At the other extreme there's: forget about this, and resubmit a new task with new output dataset. In between it is a murky land where many recipes may be more efficient according to details, but no general simple rule can be given and there's space for individual creativity and/or desperation.
Line: 521 to 487
 
<!--/twistyPlugin twikiMakeVisibleInline-->
When doing crab submit the user may get one of these error messages:
Changed:
<
<
>
>
 Syntax error in CRAB configuration: invalid syntax (<CRAB-configuration-file-name>.py, <line-where-error-occurred>)
Deleted:
<
<
 
Deleted:
<
<
 Syntax error in CRAB configuration:
Changed:
<
<
'Configuration' object has no attribute '<attribute-name>'

Error explanation: The CRAB configuration file could not be loaded, because there is a syntax error somewhere in it.

>
>
'Configuration' object has no attribute '<attribute-name>' Error explanation: The CRAB configuration file could not be loaded, because there is a syntax error somewhere in it.
  What to do: Check the CRAB configuration file and fix it. There could be a misspelled parameter or section name, or you could be trying to use a configuration attribute (parameter or section) that was not defined. To get more details on where the error occurred, do:
Changed:
<
<
>
>
 python
Changed:
<
<
import <CRAB-configuration-file-name> #without the '.py'
>
>
import <CRAB-configuration-file-name> #without the '.py'
  which gives:
Changed:
<
<
>
>
 Traceback (most recent call last): File "", line 1, in File "<CRAB-configuration-file-name>.py", <line-where-error-occurred> <error-python-code>
Changed:
<
<
^
>
>
^
  or
Changed:
<
<
>
>
 Traceback (most recent call last): File "", line 1, in File "<CRAB-configuration-file-name>.py", <line-where-error-occurred>, in <error-python-code>
Changed:
<
<
AttributeError: 'Configuration' object has no attribute '<attribute-name>'
>
>
AttributeError: 'Configuration' object has no attribute '<attribute-name>'
  For more information about the CRAB configuration file, see CRAB3ConfigurationFile.
<!--/twistyPlugin-->
Line: 582 to 537
 You should inspect the stdout of one job to find the exception message and traceback which may guide you to the solution.

A particular case is when the exception says An exception of category 'DictionaryNotFound' occurred, like in this example:

Changed:
<
<
>
>
 
Begin Fatal Exception 08-Jun-2017 18:18:04 CEST----------------------- An exception of category 'DictionaryNotFound' occurred while [0] Constructing the EventProcessor Exception Message: No Dictionary for class: 'edm::Wrapper<edm::DetSetVector >'
Changed:
<
<

End Fatal Exception -------------------------------------------------
>
>

End Fatal Exception -------------------------------------------------
  in this case, most likely the input data have been produced with a CMSSW version not compatible with the one used in CRAB job. In general it's not supported reading data with a release older than what it was produced with.

To find out which release was used to produce a given dataser of file, adapt following examples to your situation:

Changed:
<
<
>
>
 belforte@lxplus045/~> dasgoclient --query "release dataset=/DoubleMuon/Run2016C-18Apr2017-v1/AOD" ["CMSSW_8_0_28"] belforte@lxplus045/~>
Deleted:
<
<
 
Deleted:
<
<
 belforte@lxplus045/~> dasgoclient --query "release file=/store/data/Run2016C/DoubleMuon/AOD/18Apr2017-v1/100001/56D1FA6E-D334-E711-9967-0025905A48B2.root" ["CMSSW_8_0_28"]
Changed:
<
<
belforte@lxplus045/~>
>
>
belforte@lxplus045/~>
 
<!--/twistyPlugin-->

Exit code 8028

Line: 622 to 572
  Leaving the AAA error aside for a while, the first thing to contemplate here is to understand why the file could not be loaded from the local storage. Was it because the file is not available at the execution site? And if so, was it supposed to be available? If not, can you force CRAB to submit the jobs to the site(s) where the file is hosted? CRAB should do that automatically if the input file is one from the input dataset specified in the CRAB configuration parameter Data.inputDataset, unless you have set Data.ignoreLocality = True, or except in cases like using a (secondary) pile-up dataset. If yours is the last case, please read Using pile-up in this same twiki.
Changed:
<
<
If you intentionally wanted (and had a good reason) to run jobs reading the input files via AAA, then yes, we have to care about why AAA failed. The first thing then is to check that the site where the input file is hosted supports access to its storage via AAA, which is true for the majority of the sites, but not all. If you want to know if your file is available through AAA you can check this link, and in particular you can do cmsenv and then use this command xrdfs cms-xrd-global.cern.ch locate /store/path/to/file. If AAA is supported, the next thing is to discard a transient problem: you can submit your jobs again and see if the error persists. Ultimately, you should write to the Computing Tools HyperNews forum (this is a forum for all kind of issues with CMS computing tools, not only CRAB).
>
>
If you intentionally wanted (and had a good reason) to run jobs reading the input files via AAA, then yes, we have to care about why AAA failed. The first thing then is to check that the site where the input file is hosted supports access to its storage via AAA, which is true for the majority of the sites, but not all. If you want to know if your file is available through AAA you can check this link, and in particular you can do cmsenv and then use this command xrdfs cms-xrd-global.cern.ch locate /store/path/to/file. If AAA is supported, the next thing is to discard a transient problem: you can submit your jobs again and see if the error persists. Ultimately, you should write to the Computing Tools HyperNews forum (this is a forum for all kind of issues with CMS computing tools, not only CRAB).
 </>
<!--/twistyPlugin-->

Illegal parameter found in configuration. The parameter is named: 'numberEventsInLuminosityBlock'

Line: 630 to 580
 
<!--/twistyPlugin twikiMakeVisibleInline-->
The most common reasons for this error are:
  1. The user is trying to analyze an input dataset, but he/she has specified in the CRAB configuration file JobType.pluginName = 'PrivateMC' instead of JobType.pluginName = 'Analysis'.
Changed:
<
<
  1. The user is generating MC events, correctly specifying in the CRAB configuration file JobType.pluginName = 'PrivateMC', but in the CMSSW parameter set configuration he/she has specified a source of type PoolSource. The solution is to not specify a PoolSource. Note: This doesn't mean to remove process.source completely, as this attribute must be present. One could set process.source = cms.Source("EmptySource") if no input source is used.
>
>
  1. The user is generating MC events, correctly specifying in the CRAB configuration file JobType.pluginName = 'PrivateMC', but in the CMSSW parameter set configuration he/she has specified a source of type PoolSource. The solution is to not specify a PoolSource. Note: This doesn't mean to remove process.source completely, as this attribute must be present. One could set process.source = cms.Source("EmptySource") if no input source is used.
 
<!--/twistyPlugin-->

CRAB Client API

Line: 641 to 591
 The general problem is that CMSSW parameter-set configurations don't like to be loaded twice. In that respect, each time the CRAB client loads a CMSSW configuration, it saves it in a local (temporary) cache identifying the loaded module with a key constructed out of the following three pieces: the full path to the module and the python variables sys.path and sys.argv.

A problem arises when the CRAB configuration parameter JobType.pyCfgParams is used. The arguments in JobType.pyCfgParams are added by CRAB to sys.argv, affecting the value of the key that identifies a CMSSW parameter-set in the above mentioned cache. And that's in principle fine, as changing the arguments passed to the CMSSW parameter-set may change the event processor. But when a python process has to do more than one submission (like the case of multicrab for multiple submissions), the CMSSW parameter-set is loaded again every time the JobType.pyCfgParams is changed and this may result in "duplicate process" errors. Below are two examples of these kind of errors:

Changed:
<
<
>
>
 CmsRunFailure CMSSW error message follows. Fatal Exception An exception of category 'Configuration' occurred while
Changed:
<
<
† †[0] Constructing the EventProcessor † †[1] Constructing module: class=...... label=......
>
>
[0] Constructing the EventProcessor [1] Constructing module: class=...... label=......
 Exception Message: Duplicate Process The process name ...... was previously used on these products. Please modify the configuration file to use a distinct process name.
Deleted:
<
<
 
Deleted:
<
<
 CmsRunFailure CMSSW error message follows. Fatal Exception
Line: 665 to 612
 in vString categories duplication of the string ...... The above are from MessageLogger configuration validation. In most cases, these involve lines that the logger configuration code
Changed:
<
<
would not process, but which the cfg creator obviously meant to have effect.
>
>
would not process, but which the cfg creator obviously meant to have effect.
 One option would be to try to not use JobType.pyCfgParams. But if this is not possible, the more general ad-hoc solution would be to fork the submission into a different python process. For example, if you are doing something like documented in Multicrab using the crabCommand API then we suggest to replace each
Changed:
<
<
submit(config)
>
>
submit(config)
  by
Changed:
<
<
>
>
 from multiprocessing import Process p = Process(target=submit, args=(config,)) p.start()
Changed:
<
<
p.join()
>
>
p.join()
  (Of course, from multiprocessing import Process needs to be executed only once, so put it outside any loop.) </>
<!--/twistyPlugin-->
Line: 734 to 677
 
  1. submit a new CRAB tasks B which process the missing lumis (listB = listIn - listA)

Details are slighly different if you published output in DBS or not:

Changed:
<
<
output in DBS
follow the procedure in this FAQ
output not in DBS
follow the procedure in this Workbook example
>
>
output in DBS
follow the procedure in this FAQ
output not in DBS
follow the procedure in this Workbook example
  </>
<!--/twistyPlugin-->
Line: 745 to 688
 While data taking is progressing, corresponding datasets in DBS and lumi-mask files are growing. Also data quality is sometimes improved for already existing data, leading to updated lumi-masks which compared to older lumi-masks include luminosity sections that were previously filtered out. Both of these situations lead to the common case where one would like to run a task (lets call it task B) over an input dataset partially analyzed already in a previous task (lets call it task A), where task B should skip the data already analyzed in task A.

This can be accomplished with a few lines in the CRAB configuration file, see an annotated example below.

Changed:
<
<
>
>
 from UserUtilities import config, getLumiListInValidFiles from LumiList import LumiList
Line: 778 to 720
 newLumiMask.writeJSON('my_lumi_mask.json') # and there we, process from input dataset all the lumi listed in the current officialLumiMask file, skipping the ones you already have. config.Data.lumiMask = 'my_lumi_mask.json'
Changed:
<
<
config.Data.outputDatasetTag = <TaskA-output-dataset-name> # add to your existing dataset ...
>
>
config.Data.outputDatasetTag = <TaskA-outputDatasetTag> # add to your existing dataset ...
  IMPORTANT NOTE : in this way you will add any lumi section in the intial data set that was turned from bad to good in the golden list after you ran Task-A, but if some of those data evolved the other way around (from good to bad), there is no way to remove those from your published datasets.
Line: 823 to 763
 

Using pile-up

<!--/twistyPlugin twikiMakeVisibleInline-->
Changed:
<
<
Important Instructions:
Make sure you run your jobs at the site where the pile-up sample is. Not where the signal is.
>
>
Important Instructions:
Make sure you run your jobs at the site where the pile-up sample is. Not where the signal is.
 This requires you to override the location list that CRAB would extract from the inputDataset.
Changed:
<
<
Rational and details:
The pile-up files have be specified in the CMSSW parameter-set configuration file. There is no way yet to tell in the CRAB configuration file that one wants to use a pile-up dataset as a secondary input dataset. That means that CRAB doesn't know that the CMSSW code will want to access pile-up files; CRAB only knows about the primary input dataset (if any). This means that, assuming there is a primary input dataset, when CRAB does data discovery to figure out to which sites should it submit the jobs, it will only take into account the input dataset specified in the CRAB configuration file (in the Data.inputDataset parameter) and submit the jobs to sites where this dataset is hosted. If there is no primary input dataset, CRAB will submit the jobs to the less busy sites. In any case, if the pile-up files are not hosted in the execution sites, they will be accessed via AAA (Xrootd). But reading the "signal" events directly from the local storage and the pile-up events via AAA is more inefficient than doing the other way around, since for each "signal" event that is read one needs to read in general many (> 20) pile-up events. Therefore, it is highly recommended that the user forces CRAB to submit the jobs to the sites where the pile-up dataset is hosted by whitelisting these sites using the parameter Site.whitelist in the CRAB configuration file. Note that one also needs to set Data.ignoreLocality = True in the CRAB configuration file in case of using a primary input dataset so to avoid CRAB doing data discovery and eventually complain (and fail to submit) that the input dataset is not available in the whitelisted sites. One can use DAS to get the list of sites that host a dataset.
>
>
Rationale and details:
The pile-up files have to be specified in the CMSSW parameter-set configuration file. There is no way yet to tell in the CRAB configuration file that one wants to use a pile-up dataset as a secondary input dataset. That means that CRAB doesn't know that the CMSSW code will want to access pile-up files; CRAB only knows about the primary input dataset (if any). This means that, assuming there is a primary input dataset, when CRAB does data discovery to figure out to which sites it should submit the jobs, it will only take into account the input dataset specified in the CRAB configuration file (in the Data.inputDataset parameter) and submit the jobs to sites where this dataset is hosted. If there is no primary input dataset, CRAB will submit the jobs to the least busy sites. In any case, if the pile-up files are not hosted at the execution sites, they will be accessed via AAA (Xrootd). But reading the "signal" events directly from the local storage and the pile-up events via AAA is less efficient than doing it the other way around, since for each "signal" event that is read one needs to read in general many (> 20) pile-up events. Therefore, it is highly recommended that the user forces CRAB to submit the jobs to the sites where the pile-up dataset is hosted, by whitelisting these sites via the parameter Site.whitelist in the CRAB configuration file. Note that one also needs to set Data.ignoreLocality = True in the CRAB configuration file when using a primary input dataset, so as to avoid CRAB doing data discovery and eventually complaining (and failing to submit) that the input dataset is not available at the whitelisted sites. One can use DAS to get the list of sites that host a dataset.
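As an illustration of the recommendation above, a minimal sketch of the relevant CRAB configuration lines follows (the dataset and site names are just examples; pick the sites that actually host your pile-up sample, e.g. from DAS):

config.Data.inputDataset = '/MySignalSample/SomeEra-v1/MINIAODSIM'   # placeholder
config.Data.ignoreLocality = True            # do not restrict to sites hosting the signal dataset
config.Site.whitelist = ['T2_DE_DESY', 'T2_CH_CERN']   # example: sites hosting the pile-up dataset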
 
<!--/twistyPlugin-->
Deleted:
<
<
 

Miscellanea

How CRAB finds data in input datasets from DBS

Line: 870 to 809
 LFNs are names like /store/user/mario/myoutput; note that a directory is also a file name.

For example, for the LFN /store/user/username/myfile.root stored in T2_IT_Pisa you can do the following (make sure you did cmsenv before, so as to use a recent version of curl), where you can replace the first two lines with the values which are useful to you and simply copy/paste the long curl command:

Changed:
<
<
>
>
 site=T2_IT_Pisa lfn=/store/user/username/myfile.root
Changed:
<
<
curl -ks "https://cmsweb.cern.ch/phedex/datasvc/perl/prod/lfn2pfn?node=${site}&lfn=${lfn}&protocol=srmv2" | grep PFN | cut -d "'" -f4
>
>
curl -ks "https://cmsweb.cern.ch/phedex/datasvc/perl/prod/lfn2pfn?node=${site}&lfn=${lfn}&protocol=srmv2" | grep PFN | cut -d "'" -f4
  which returns:
Changed:
<
<
srm://stormfe1.pi.infn.it:8444/srm/managerv2?SFN=/cms/store/user/username/myfile.root
>
>
srm://stormfe1.pi.infn.it:8444/srm/managerv2?SFN=/cms/store/user/username/myfile.root
 
<!--
To see full details, you call the PhEDEx API passing the following query data: {'protocol': 'srmv2', 'node': '<CMS-node-name>', 'lfn': '<logical-path-name>'}. For example, for the LFN /store/user/username/myfile.root stored in T2_IT_Pisa, the URL to query the API would be https://cmsweb.cern.ch/phedex/datasvc/xml/prod/lfn2pfn?protocol=srmv2&node=T2_IT_Pisa&lfn=/store/user/username/myfile.root, which would show an output in xml format as this:
Line: 898 to 833
 
-->

Before executing the gfal commands, make sure to have a valid proxy:

Changed:
<
<
voms-proxy-init -voms cms
>
>
voms-proxy-init -voms cms
 Enter GRID pass phrase for this identity: Contacting voms2.cern.ch:15002 [/DC=ch/DC=cern/OU=computers/CN=voms2.cern.ch] "cms"... Remote VOMS server contacted succesfully.
Line: 911 to 842
  Created proxy in /tmp/x509up_u<user-id>.
Changed:
<
<
Your proxy is valid until
>
>
Your proxy is valid until
  The most useful gfal commands and their usage syntax for listing/removing/copying files/directories are in the examples below (it is recommended to unset the environment when executing gfal commands, i.e. to add env -i in front of the commands). See also the man entry for each command (man gfal-ls etc.):

List a (remote) path:

Changed:
<
<
env -i X509_USER_PROXY=/tmp/x509up_u$UID gfal-ls <physical-path-name-to-directory>
>
>
env -i X509_USER_PROXY=/tmp/x509up_u$UID gfal-ls <physical-path-name-to-directory>
  Remove a (remote) file:
Changed:
<
<
env -i X509_USER_PROXY=/tmp/x509up_u$UID gfal-rm <physical-path-name-to-file>
>
>
env -i X509_USER_PROXY=/tmp/x509up_u$UID gfal-rm <physical-path-name-to-file>
  Recursively remove a (remote) directory and all files in it:
Changed:
<
<
env -i X509_USER_PROXY=/tmp/x509up_u$UID gfal-rm -r <physical-path-name-to-directory>
>
>
env -i X509_USER_PROXY=/tmp/x509up_u$UID gfal-rm -r <physical-path-name-to-directory>
  Copy a (remote) file to a directory in the local machine:
Added:
>
>
env -i X509_USER_PROXY=/tmp/x509up_u$UID gfal-copy <physical-path-name-to-source-file> file://<absolute-path-to-local-destination-directory>
 
Changed:
<
<
env -i X509_USER_PROXY=/tmp/x509up_u$UID gfal-copy <physical-path-name-to-source-file> file://<absolute-path-to-local-destination-directory>
Note: the <absolute-path-to-local-destination-directory> starts with / therefore there are three consecutive / characters like file:///tmp/somefilename.root
>
>
Note: the <absolute-path-to-local-destination-directory> starts with /, therefore there are three consecutive / characters, like file:///tmp/somefilename.root
 </>
<!--/twistyPlugin-->

Why are my jobs submitted to a site that I had explicitly blacklisted (not whitelisted)?

Line: 947 to 870
 
<!--/twistyPlugin twikiMakeVisibleInline-->
There is a site overflow mechanism in place, which takes place after CRAB submission. Sites are divided in regions of good WAN/xrootd connectivity (e.g. US, Italy, Germany etc.), then jobs queued at one site A for too long are allowed to overflow to a well connected site B which does not host the requested input data but from where data will be read over xrootd. Rationale is that even if those jobs were to fail due to unable to read data or a problem in site B, they will be automatically resubmitted, so nothing is lost with respect to keeping those jobs idle in the queue waiting for free slots at site A. The site overflow can be turned off via the Debug.extraJDL CRAB configuration parameter :
Changed:
<
<
>
>
 config.section_("Debug")
Changed:
<
<
config.Debug.extraJDL = ['+CMS_ALLOW_OVERFLOW=False']
>
>
config.Debug.extraJDL = ['+CMS_ALLOW_OVERFLOW=False']
  Note: if you change this configuration option for an already-created task (for instance if you noticed a lot of job failures at a particular site and even after blacklisting the jobs keep going back), you can't simply change the option in the configuration and resubmit. You'll have to kill the existing task and make a new task to get the option to be accepted. You can't simply change it during resubmission.
<!--/twistyPlugin-->
Line: 965 to 887
 There is a tool written in python called LumiList.py (available in the WMCore library; is the same code as cmssw/FWCore/PythonUtilities/python/LumiList.py ) that can be used to do lumi-mask arithmetics. The arithmetics can even be done inside the CRAB configuration file (that's the advantage of having the configuration file written in python). Below are some examples.

Example 1: A run range selection can be achieved by selecting from the original lumi-mask file the run range of interest.

Changed:
<
<
>
>
 from LumiList import LumiList

lumiList = LumiList(filename='my_original_lumi_mask.json')
lumiList.selectRuns([x for x in range(193093,193999+1)])
lumiList.writeJSON('my_lumi_mask.json')

Changed:
<
<
config.Data.lumiMask = 'my_lumi_mask.json'
>
>
config.Data.lumiMask = 'my_lumi_mask.json'
  Example 2: Use a new lumi-mask file that is the intersection of two other lumi-mask files.
Changed:
<
<
>
>
 from LumiList import LumiList

originalLumiList1 = LumiList(filename='my_original_lumi_mask_1.json')

Line: 986 to 905
 newLumiList = originalLumiList1 & originalLumiList2
newLumiList.writeJSON('my_lumi_mask.json')
Changed:
<
<
config.Data.lumiMask = 'my_lumi_mask.json'
>
>
config.Data.lumiMask = 'my_lumi_mask.json'
  Example 3: Use a new lumi-mask file that is the union of two other lumi-mask files.
Changed:
<
<
>
>
 from LumiList import LumiList

originalLumiList1 = LumiList(filename='my_original_lumi_mask_1.json')

Line: 999 to 916
 newLumiList = originalLumiList1 | originalLumiList2
newLumiList.writeJSON('my_lumi_mask.json')
Changed:
<
<
config.Data.lumiMask = 'my_lumi_mask.json'
>
>
config.Data.lumiMask = 'my_lumi_mask.json'
  Example 4: Use a new lumi-mask file that is the subtraction of two other lumi-mask files.
Changed:
<
<
>
>
 from LumiList import LumiList

originalLumiList1 = LumiList(filename='my_original_lumi_mask_1.json')

Line: 1012 to 927
 newLumiList = originalLumiList1 - originalLumiList2
newLumiList.writeJSON('my_lumi_mask.json')
Changed:
<
<
config.Data.lumiMask = 'my_lumi_mask.json'
>
>
config.Data.lumiMask = 'my_lumi_mask.json'
 </>
<!--/twistyPlugin-->

User quota in the CRAB scheduler machines

<!--/twistyPlugin twikiMakeVisibleInline-->
Each user has a home directory with 100GB of disk space in each of the scheduler machines (schedd for short) assigned to CRAB3 for submitting jobs to the Grid. Whenever a task is submitted by the CRAB server to a schedd, a task directory is created in this space containing, among other things, the CRAB libraries and scripts needed to run the jobs. Log files from Condor/DAGMan and CRAB itself are also placed there. (What is not available in the schedds are the cmsRun log files, except for the snippet available in the CRAB job log file.) As guidance, a task with 100 jobs uses on average 50MB of space, but this number depends a lot on the number of resubmissions, since each resubmission produces its own log files. If a user reaches his/her quota in a given schedd, he/she will not be able to submit more jobs via that schedd (he/she may still be able to submit via another schedd, but since the user can not choose the schedd to which to submit -the choice is done by the CRAB server-, he/she would have to keep trying the submission until the task goes to a schedd with non-exhausted quota). To avoid that, task directories are automatically removed from the schedds 30 days after their last modification. If a user reaches 50% of his/her quota in a given schedd, an automatic e-mail similar to the one shown below is sent to him/her.
Changed:
<
<
>
>
 Subject: WARNING: Reaching your quota

Dear analysis user ,

Line: 1034 to 948
  https://twiki.cern.ch/twiki/bin/view/CMSPublic/SWGuideCrabFaq#Disk_space_for_output_files If you have any questions, please contact hn-cms-computing-tools(AT)cern.ch Regards,
Changed:
<
<
CRAB support
>
>
CRAB support
  This e-mail has a link (https://twiki.cern.ch/twiki/bin/view/CMSPublic/SWGuideCrabFaq#How_to_clean_up_your_directory_i) to the instructions on how to clean up space in the user's home directory in a schedd. A user can follow the instructions in that page, or alternatively use the crab purge command.
<!--/twistyPlugin-->
Line: 1046 to 959
 
<!--/twistyPlugin twikiMakeVisibleInline-->
To overcome the CRAB3 vs CMSSW environment conflicts, you can use the following script available in CVMFS (/cvmfs/cms.cern.ch/crab3/crab-env-bootstrap.sh) without need to source the CRAB3 environment. You could do something like this:
Changed:
<
<
>
>
 cmsenv
# DO NOT setup the CRAB3 environment
alias crab='/cvmfs/cms.cern.ch/crab3/crab-env-bootstrap.sh'
crab submit
crab status
...
Changed:
<
<
# check that you can run cmsRun locally
>
>
# check that you can run cmsRun locally
  Details:
Line: 1068 to 979
 
<!--/twistyPlugin twikiMakeVisibleInline-->
The problematic pset is FWCore/ParameterSet/Mixins.py from CMSSW:
Added:
>
>
[line 714] p = tLPTest("MyType",** { "a"+str(x): tLPTestType(x) for x in xrange(0,300) } )
 
Changed:
<
<
[line 714] p = tLPTest("MyType",** { "a"+str(x): tLPTestType(x) for x in xrange(0,300) } )

This uses dictionary comprehensions, a feature available in python > 2.7. While CMSSW (setup via cmsenv) uses python > 2.7, CRAB (setup via /cvmfs/cms.cern.ch/crab3/crab.sh) still uses python 2.6.8. To overcome this problem, don't setup the CRAB environment and instead use the crab-env-bootstrap.sh script (see this FAQ).

>
>
This uses dictionary comprehensions, a feature available only in Python 2.7 and later. While CMSSW (set up via cmsenv) uses Python 2.7 or newer, CRAB (set up via /cvmfs/cms.cern.ch/crab3/crab.sh) still uses Python 2.6.8. To overcome this problem, don't set up the CRAB environment and instead use the crab-env-bootstrap.sh script (see this FAQ).
 
<!--/twistyPlugin-->

Revision 752017-11-03 - ElliotHughes

Line: 1 to 1
 
META TOPICPARENT name="SWGuideCrab"
<!-- /ActionTrackerPlugin -->
Line: 143 to 143
  if user is None: user = getUsernameFromSiteDB()
Changed:
<
<
if not user:https://twiki.cern.ch/twiki/bin/view/CMSPublic/CRAB3FAQ#Why_are_my_jobs_submitted_to_a_s
>
>
if not user:
  raise Crab3ToolsException('could not get username from sitedb, returned %r' % user) self.user = user

Revision 742017-10-20 - AndrewMelo

Line: 1 to 1
 
META TOPICPARENT name="SWGuideCrab"
<!-- /ActionTrackerPlugin -->
Line: 952 to 952
 config.section_("Debug") config.Debug.extraJDL = ['+CMS_ALLOW_OVERFLOW=False']
Added:
>
>
Note: if you change this configuration option for an already-created task (for instance if you noticed a lot of job failures at a particular site and even after blacklisting the jobs keep going back), you can't simply change the option in the configuration and resubmit. You'll have to kill the existing task and make a new task to get the option to be accepted. You can't simply change it during resubmission.
 
<!--/twistyPlugin-->

What is glideinWms Overflow and how can I avoid using it ?

Revision 732017-09-29 - StefanoBelforte

Line: 1 to 1
 
META TOPICPARENT name="SWGuideCrab"
<!-- /ActionTrackerPlugin -->
Line: 109 to 109
 

How are the inputFiles handled in the user input sandbox?

<!--/twistyPlugin twikiMakeVisibleInline-->
Depending on whether filenames or directories are used in the config.JobType.inputFiles parameter, the directory structure inside the sandbox may be different and affect where the files are placed in the working directory of the job.
Changed:
<
<
  • Specific file names are always added to the root directory of the sandbox, whether an absolute or relative file name is used. For example, /afs/cern.ch/user/e/erupeika/supportFiles/PileupData2016B_69200.root will appear as PileupData2016B_69200.root in the sandbox (and will be extracted to the job's root working directory).
>
>
  • Specific file names are always added to the root directory of the sandbox, whether an absolute or relative file name is used. For example, both /afs/cern.ch/user/e/erupeika/supportFiles/foo.root and myfiles/foo.root will appear as foo.root in the sandbox and will be extracted as foo.root to the job's root working directory.
 
  • The directory structure inside each additional input file directory is maintained in the sandbox. The additional directories themselves will be located in the root directory of the sandbox. For example, if a directory foo with files bar1 and bar2 inside it is specified in the inputFiles parameter, the sandbox will contain foo, foo/bar1 and foo/bar2 (the working directory of the job will therefore also contain a directory foo with files bar1 and bar2).
Added:
>
>
  • For example, if your application expects to find mydir/file1, you should put config.JobType.inputFiles=['mydir'] in the CRAB configuration (and of course avoid having extra files in that directory), while if you put config.JobType.inputFiles=['mydir/file1'] your application needs to open file1 directly; see the small illustration after this list.
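A small illustration of the two cases above (the file and directory names are hypothetical):

config.JobType.inputFiles = ['/afs/cern.ch/user/e/erupeika/supportFiles/foo.root',  # job sees ./foo.root
                             'mydir']   # job sees ./mydir/file1, ./mydir/file2, ...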
 
<!--/twistyPlugin-->

How can I clean my area in the CRAB cache?

Revision 722017-07-24 - MargueriteTonjes

Line: 1 to 1
 
META TOPICPARENT name="SWGuideCrab"
<!-- /ActionTrackerPlugin -->
Line: 222 to 222
 

Can I stage out my files into a /store/user/ area that uses a different username than the one of my CERN primary account?

<!--/twistyPlugin twikiMakeVisibleInline-->
Changed:
<
<
With CRAB3 this should not be any different than with CRAB2. CRAB will look up for the user's username registered in SiteDB (which is the username of the CERN primary account) using for the query the user's DN (which in turn is extracted from the user's credentials) and will try to stage out to /store/user/<username>/ (by default). If the store user area uses a different username, itís up to the destination site to remap that (via a symbolic link or something similar). The typical case is Fermilab; to request the mapping of the store user area, FNAL users should write to cms-t1(AT)fnal.gov with both usernames and the certificate DN, and they will sort it out.
>
>
With CRAB3 this should not be any different than with CRAB2. CRAB will look up the user's username registered in SiteDB (which is the username of the CERN primary account), using for the query the user's DN (which in turn is extracted from the user's credentials), and will try to stage out to /store/user/<username>/ (by default). If the store user area uses a different username, it's up to the destination site to remap that (via a symbolic link or something similar). The typical case is Fermilab; to request the mapping of the store user area, FNAL users should follow the directions on the usingEOSatLPC web page to open a ServiceNow ticket to get this fixed.
  To prevent stage out failures, and in case the user has provided in the Data.outLFN parameter of the CRAB configuration file an LFN directory path of the kind /store/user/[<some-username>/<subdir>*] (i.e. a store path that starts with /store/user/), CRAB will check if some-username matches with the user's username extracted from SiteDB. If it doesn't, it will give an error message and not submit the task. The error message would be something like this:

Revision 712017-07-06 - StefanoBelforte

Line: 1 to 1
 
META TOPICPARENT name="SWGuideCrab"
<!-- /ActionTrackerPlugin -->
Line: 563 to 563
 For more information about the CRAB configuration file, see CRAB3ConfigurationFile.
<!--/twistyPlugin-->
Added:
>
>

Problems with the .requestcache file and/or the CRAB project directory

<!--/twistyPlugin twikiMakeVisibleInline-->
If a crab command fails with messages like "Cannot find .requestcache file" or "...  is not a valid CRAB project directory", or otherwise complains that it can not find the task you are trying to send the command to, a problem with the local directory where crab submit caches relevant information is likely (maybe the disk got full or corrupted, or you removed a file unintentionally).

Please find more information about the CRAB project directory and possible recovery actions on your side in CRAB3Commands#CRAB_project_directory

<!--/twistyPlugin-->
 

Problems with job execution

Revision 702017-07-04 - StefanoBelforte

Line: 1 to 1
 
META TOPICPARENT name="SWGuideCrab"
<!-- /ActionTrackerPlugin -->
Line: 18 to 18
 
Complete: 3 Go to SWGuideCrab
Changed:
<
<
Contents:
>
>
Help Notice: This is a large page, it works best if you search in it for your problem using the browser search function.
By default all answers are collapsed and the search only covers the question titles. If you do not find what you need, you can use the buttons below to search inside the answers as well:
 

Help if you still have problems, check the CRAB3Troubleshoot guide before asking for support

 
Changed:
<
<
This twiki is constantly under construction. You might also want to check the CRAB3CommonErrors page. If you don't find the answer to your question, write to the Computing Tools CMS HyperNews forum.
>
>
Contents:
 

Certificates, proxies and all that stuff

Line: 334 to 338
 note.gif Note: Even if crab checkusername gives an error retrieving the username from SiteDB, this should not stop you from trying to submit jobs with CRAB, because the error might just be a problem with crab checkusername itself and not a real problem with your registration in SiteDB (CRAB uses a different mechanism than the one described above to check the users' registration in SiteDB when attempting to submit jobs to the grid). </>
<!--/twistyPlugin-->
Changed:
<
<

crab resubmit fails with "Trapped exception in Dagman.Fork"

>
>

crab submit fails with "User quota limit reached; cannot upload the file"

<!--/twistyPlugin twikiMakeVisibleInline-->
When doing crab submit the user may get this error messages:

Error contacting the server.
Server answered with: Invalid input parameter
Reason is: User quota limit reached; cannot upload the file

Error explanation: The user has reached the limit of 4.88GB in its CRAB cache area. Read more in this FAQ.

What to do: Files in the CRAB cache are automatically deleted after 5 days, but the user can clean his/her cache area at any time. See how in this FAQ.

<!--/twistyPlugin-->

crab (re)submit fails with "Trapped exception in Dagman.Fork"

 
<!--/twistyPlugin twikiMakeVisibleInline-->
Typical error in crab status:
Changed:
<
<
Failure message: The CRAB server backend was not able to resubmit the task, because the Grid scheduler answered with an error. This is probably a temporary glitch. Please try again later. If the error persists send an e-mail to hn-cms-computing-tools@cern.ch<mailto:hn-cms-computing-tools@cern.ch>. Error reason: Trapped exception in Dagman.Fork: Unable to edit jobs matching constraint
>
>
Failure message: The CRAB server backend was not able to (re)submit the task, because the Grid scheduler answered with an error. This is probably a temporary glitch. Please try again later. If the error persists send an e-mail to hn-cms-computing-tools@cern.ch<mailto:hn-cms-computing-tools@cern.ch>. Error reason: Trapped exception in Dagman.Fork: Unable to edit jobs matching constraint
  File "/data/srv/TaskManager/3.3.1512.rc6/slc6_amd64_gcc481/cms/crabtaskworker/3.3.1512.rc6/lib/python2.6/site-packages/TaskWorker/Actions/DagmanResubmitter.py", line 113, in executeInternal schedd.edit(rootConst, "HoldKillSig", 'SIGKILL')
Line: 348 to 371
 As the error message says, this should be a temporary failure. One should just keep trying until it works. But after doing crab resubmit, give it some time to process the resubmission request; it may take a couple of minutes to see the jobs reacting to the resubmission.
<!--/twistyPlugin-->
Added:
>
>

crab submit fails with "Task failed to bootstrap on schedd"

<!--/twistyPlugin twikiMakeVisibleInline-->
After doing crab submit and crab status the user may get this error message:

Task status: UNKNOWN

Error during task injection:    Task failed to bootstrap on schedd

Error explanation: The submission of the task to the scheduler machine has failed.

What to do: Submit again.

<!--/twistyPlugin-->

crab submit fails with "Failed to contact Schedd"

<!--/twistyPlugin twikiMakeVisibleInline-->
When doing crab status the user may get one of these error messages:

Error during task injection:        <task-name>: Failed to contact Schedd: Failed to fetch ads from schedd.

Error during task information retrieval:        <task-name>: Failed to contact Schedd: .

Error explanation: This is a temporary communication error with the scheduler machine (submission node), most probably because the scheduler is overloaded.

What to do: Try again after a couple of minutes.

<!--/twistyPlugin-->
 

crab submit fails with "Splitting task ... with LumiBased method does not generate any job"

Line: 458 to 515
 In between it is a murky land where many recipes may be more efficient according to details, but no general simple rule can be given and there's space for individual creativity and/or desperation. </>
<!--/twistyPlugin-->
Added:
>
>

I get a "Syntax error in CRAB configuration"

<!--/twistyPlugin twikiMakeVisibleInline-->
When doing crab submit the user may get one of these error messages:

Syntax error in CRAB configuration:
invalid syntax (<CRAB-configuration-file-name>.py, <line-where-error-occurred>)

Syntax error in CRAB configuration:
'Configuration' object has no attribute '<attribute-name>'

Error explanation: The CRAB configuration file could not be loaded, because there is a syntax error somewhere in it.

What to do: Check the CRAB configuration file and fix it. There could be a misspelled parameter or section name, or you could be trying to use a configuration attribute (parameter or section) that was not defined. To get more details on where the error occurred, do:

python
import <CRAB-configuration-file-name> #without the '.py'

which gives:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<CRAB-configuration-file-name>.py", <line-where-error-occurred>
    <error-python-code>
                      ^

or

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<CRAB-configuration-file-name>.py", <line-where-error-occurred>, in <module>
    <error-python-code>
AttributeError: 'Configuration' object has no attribute '<attribute-name>'

For more information about the CRAB configuration file, see CRAB3ConfigurationFile.

<!--/twistyPlugin-->
 

Problems with job execution

Exit code 8001

Revision 692017-06-09 - StefanoBelforte

Line: 1 to 1
 
META TOPICPARENT name="SWGuideCrab"
<!-- /ActionTrackerPlugin -->
Line: 460 to 460
 

Problems with job execution

Added:
>
>

Exit code 8001

<!--/twistyPlugin twikiMakeVisibleInline-->
This indicates that cmsRun encountered an otherwise unspecified fatal exception. It usually means a problem in user code or configuration. You should inspect the stdout of one job to find the exception message and traceback, which may guide you to the solution.

A particular case is when the exception says An exception of category 'DictionaryNotFound' occurred, like in this example:

----- Begin Fatal Exception 08-Jun-2017 18:18:04 CEST-----------------------
An exception of category 'DictionaryNotFound' occurred while
   [0] Constructing the EventProcessor
Exception Message:
No Dictionary for class: 'edm::Wrapper<edm::DetSetVector<CTPPSDiamondDigi> >'
----- End Fatal Exception -------------------------------------------------

in this case, most likely the input data have been produced with a CMSSW version not compatible with the one used in the CRAB job. In general, reading data with a release older than the one they were produced with is not supported.

To find out which release was used to produce a given dataset or file, adapt the following examples to your situation:

belforte@lxplus045/~> dasgoclient --query "release dataset=/DoubleMuon/Run2016C-18Apr2017-v1/AOD"
["CMSSW_8_0_28"]
belforte@lxplus045/~> 

belforte@lxplus045/~> dasgoclient --query "release file=/store/data/Run2016C/DoubleMuon/AOD/18Apr2017-v1/100001/56D1FA6E-D334-E711-9967-0025905A48B2.root" 
["CMSSW_8_0_28"]
belforte@lxplus045/~> 

<!--/twistyPlugin-->
 

Exit code 8028

<!--/twistyPlugin twikiMakeVisibleInline-->

Revision 682017-06-09 - StefanoBelforte

Line: 1 to 1
 
META TOPICPARENT name="SWGuideCrab"
<!-- /ActionTrackerPlugin -->
Line: 441 to 441
 While there is a long tradition of "resubmit them until they work", this is hardly useful any more. And while we can't prevent users from trying, we can not guarantee that it will work, nor that resubmitted jobs will succeed.

In the case where the missing data sample is important, the best recommendation we can give to users is to

Changed:
<
<
USE RESCUE PROCEDURES WITHOUT CARING FOR DETAILS.
>
>
USE GENERAL/GENERIC RESCUE PROCEDURES, RATHER THAN TRYING AT ALL COSTS TO REVIVE A DEAD TASK.
 We will always welcome problem reports and will try to improve when resubmission failures can be due to CRAB internals, but surely you do not want to hold your breath in the meanwhile.

The safest path is therefore:

  1. let running jobs die or complete and dust settle
Added:
>
>
  1. use crab kill to make sure everything stops
 
  1. take stock of what's published in DBS at that point and make sure that it matches what's on disk
    • if your output is not in DBS, you can use crab report, but while DBS information is available forever, crab commands on a specific task may not
  2. assess whether it is more important to get the last percentage of statistics or go on with other work. Do you really need 100% completion in this task ?
Line: 578 to 579
 
  • original CRAB project directory if you did not publish output in DBS

The procedure to generate a recovery task is based on these simple steps:

Added:
>
>
  1. issue a crab kill . Killing the current task will guarantee that no change happens anymore
 
  1. make a list of lumis present in the desired input dataset (listIn)
  2. make a list of lumis successfully processed by original CRAB task A (listA)
  3. submit a new CRAB task B which processes the missing lumis (listB = listIn - listA); see the sketch right after this list
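A minimal sketch of steps 2-4, under assumptions: listIn is taken here from the original lumi-mask, listA from the processedLumis.json file that crab report writes in the task results directory (the exact file name may depend on the CRAB version), and all paths are placeholders.

from LumiList import LumiList   # assumes the CRAB3 environment, as in the lumi-mask arithmetics FAQ

listIn = LumiList(filename='my_original_lumi_mask.json')                            # what task A was meant to process
listA  = LumiList(filename='crab_projects/crab_taskA/results/processedLumis.json')  # what task A actually processed
listB  = listIn - listA
listB.writeJSON('recovery_lumi_mask.json')

# then, in the CRAB configuration of recovery task B:
# config.Data.lumiMask = 'recovery_lumi_mask.json'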

Revision 672017-06-09 - StefanoBelforte

Line: 1 to 1
 
META TOPICPARENT name="SWGuideCrab"
<!-- /ActionTrackerPlugin -->
Line: 550 to 550
 Recovery task is an important concept that can be useful in many circumstances.

The general idea is that a CRAB task has run to completion, all re-submission attempts done, but some of the necessary input data was not processed.

Changed:
<
<
A recovery task will run same executable and configuration the missed input data adding result to the same output destination (and DBS dataset) as the original task.
>
>
A recovery task will run the same executable and configuration on the missed input data, and will add results to the same output destination (and DBS dataset) as the original task.
 

Recovery task: Why

Changed:
<
<
CRAB developers try hard to give you a tool with perfect bookkeeping and full automation which brings each task to 100% success. Similarly do strive the operators of the global CMS submission infrastructure (aks HTCondor pool, aka glideIn) and the administrator of the many sites that contribute hardware resources for CMS data analysis. Yet at times things can go wrong, and we may not be able to investigate and fix every small glitch, and surely never within hours or days.
>
>
CRAB developers try hard to give you a tool with perfect bookkeeping and full automation which brings each task to 100% success. The operators of the global CMS submission infrastructure (aka HTCondor pool, aka glideIn) and the administrators of the many sites that contribute hardware resources for CMS data analysis strive for the same. Yet at times things can go wrong, and we may not be able to investigate and fix every small glitch, and surely never within hours or days.
  It is impossible to guarantee that a given task will always complete to 100% success in a short amount of time. At the same time it is impossible to make sure that all desired input data is available when the task is submitted. Moreover both good sense and experience show that the larger a task is, the larger is the chance it hits some problem. Large workflows therefore benefit from the possibility to run them sort of iteratively, with a short (hopefully one or two at most) succession of smaller and smaller tasks.

Revision 662017-06-07 - StefanoBelforte

Line: 1 to 1
 
META TOPICPARENT name="SWGuideCrab"
<!-- /ActionTrackerPlugin -->
Line: 374 to 374
  Some more discussion is in this thread: https://hypernews.cern.ch/HyperNews/CMS/get/computing-tools/2928.html
Changed:
<
<
There are a few datasets in DBS which do no satisfy this limit, if someone really needs to process those, the only way is to do one job per file using the userInputFiles feature of CRAB. An example of how to do this in python is below, note that you have to disable DBS publication and indicate split by file, other configuaration parameters can be set as usual:
>
>
There are a few datasets in DBS which do not satisfy this limit; if someone really needs to process those, the only way is to do one job per file using the userInputFiles feature of CRAB. An annotated example of how to do this in python is below; note that you have to disable DBS publication, indicate splitting by file, and provide the input file locations; other configuration parameters can be set as usual:
 
Changed:
<
<
>
>
 
Added:
>
>
# this will use CRAB client API
 from CRABAPI.RawCommand import crabCommand
Added:
>
>
# talk to DBS to get list of files in this dataset
 from dbs.apis.dbsClient import DbsApi dbs = DbsApi('https://cmsweb.cern.ch/dbs/prod/global/DBSReader')
Line: 388 to 390
  print ("dataset %s has %d files" % (dataset, len(fileDictList)))
Added:
>
>
# DBS client returns a list of dictionaries, but we want a list of Logical File Names
 lfnList = [ dic['logical_file_name'] for dic in fileDictList ]
Added:
>
>
# from here on, this is standard CRAB configuration
 from WMCore.Configuration import Configuration config = Configuration()
Line: 398 to 403
  config.section_("JobType") config.JobType.pluginName = 'Analysis'
Deleted:
<
<
config.JobType.psetName = 'demoanalyzer.py'
 
Added:
>
>
# in the following line, of course, replace with your favorite pset
config.JobType.psetName = 'demoanalyzer.py'
 config.section_("Data")
Added:
>
>
# following 3 lines are the trick to skip DBS data lookup in CRAB Server
 config.Data.userInputFiles = lfnList config.Data.splitting = 'FileBased' config.Data.unitsPerJob = 1
Added:
>
>
# since the input will have no metadata information, output can not be put in DBS
 config.Data.publication = False

config.section_("User")

Added:
>
>
#
  config.section_("Site")
Added:
>
>
# since there is no data discovery and no data location lookup in CRAB
# you have to say where the input files are
 config.Site.whitelist = ['T2_CH_CERN']
Added:
>
>
 config.Site.storageSite = 'T2_CH_CERN'

result = crabCommand('submit', config = config)

Revision 652017-06-05 - StefanoBelforte

Line: 1 to 1
 
META TOPICPARENT name="SWGuideCrab"
<!-- /ActionTrackerPlugin -->
Line: 366 to 366
 
<!--/twistyPlugin-->
Added:
>
>

crab submit fails with "Block ...  contains more than 100000 lumis and cannot be processed for splitting. For memory/time contraint big blocks are not allowed. Use another dataset as input."

<!--/twistyPlugin twikiMakeVisibleInline-->
The message is self explaining. CRAB server will die due to lack of memory if it needs to process luminosity lists with millions of entries per block. This can only happen with MC datasets which have been created with improper use of lumisections, since the limit at 100k lumisection in one block would correspond for real data to 100 days of continuous data taking. For MC lumi sections have no relation with luminosity but are used only to allow processing less than a file in one job via split by lumi algorithm, in this case it makes no sense to have more lumis than events.

Some more discussion is in this thread: https://hypernews.cern.ch/HyperNews/CMS/get/computing-tools/2928.html

There are a few datasets in DBS which do no satisfy this limit, if someone really needs to process those, the only way is to do one job per file using the userInputFiles feature of CRAB. An example of how to do this in python is below, note that you have to disable DBS publication and indicate split by file, other configuaration parameters can be set as usual:


from CRABAPI.RawCommand import crabCommand

from dbs.apis.dbsClient import DbsApi
dbs = DbsApi('https://cmsweb.cern.ch/dbs/prod/global/DBSReader')

dataset = '/BsToJpsiPhiV2_BFilter_TuneZ2star_8TeV-pythia6-evtgen/Summer12_DR53X-PU_RD2_START53_V19F-v3/AODSIM'
fileDictList=dbs.listFiles(dataset=dataset)

print ("dataset %s has %d files" % (dataset, len(fileDictList)))

lfnList = [ dic['logical_file_name'] for dic in fileDictList ]

from WMCore.Configuration import Configuration
config = Configuration()

config.section_("General")
config.General.transferLogs = False

config.section_("JobType")
config.JobType.pluginName = 'Analysis'
config.JobType.psetName = 'demoanalyzer.py'

config.section_("Data")
config.Data.userInputFiles = lfnList
config.Data.splitting = 'FileBased'
config.Data.unitsPerJob = 1
config.Data.publication = False

config.section_("User")

config.section_("Site")
config.Site.whitelist = ['T2_CH_CERN']
config.Site.storageSite = 'T2_CH_CERN'

result = crabCommand('submit', config = config)

print (result)

<!--/twistyPlugin-->
 

CRAB fails to resubmit some jobs

<!--/twistyPlugin twikiMakeVisibleInline-->

Revision 642017-03-28 - StefanoBelforte

Line: 1 to 1
 
META TOPICPARENT name="SWGuideCrab"
<!-- /ActionTrackerPlugin -->
Line: 105 to 105
 

How are the inputFiles handled in the user input sandbox?

<!--/twistyPlugin twikiMakeVisibleInline-->
Depending on whether filenames or directories are used in the config.JobType.inputFiles parameter, the directory structure inside the sandbox may be different and affect where the files are placed in the working directory of the job.
Changed:
<
<
  • Specific file names are always added to the root directory of the sandbox, whether an absolute or relative file name is used. For example, afs/cern.ch/user/e/erupeika/supportFiles/PileupData2016B_69200.root will appear as PileupData2016B_69200.root in the sandbox (and will be extracted to the job's root working directory).
>
>
  • Specific file names are always added to the root directory of the sandbox, whether an absolute or relative file name is used. For example, /afs/cern.ch/user/e/erupeika/supportFiles/PileupData2016B_69200.root will appear as PileupData2016B_69200.root in the sandbox (and will be extracted to the job's root working directory).
 
  • The directory structure inside each additional input file directory is maintained in the sandbox. The additional directories themselves will be located in the root directory of the sandbox. For example, if a directory foo with files bar1 and bar2 inside it is specified in the inputFiles parameter, the sandbox will contain foo, foo/bar1 and foo/bar2 (the working directory of the job will therefore also contain a directory foo with files bar1 and bar2).
<!--/twistyPlugin-->

Revision 632017-03-27 - EmilisAntanasRupeika

Line: 1 to 1
 
META TOPICPARENT name="SWGuideCrab"
<!-- /ActionTrackerPlugin -->
Line: 102 to 102
 
  • The tweaked CMSSW parameter-set configuration file in pickle format (added as PSet.pkl) plus a simple PSet.py file to load the pickle file.
<!--/twistyPlugin-->
Added:
>
>

How are the inputFiles handled in the user input sandbox?

<!--/twistyPlugin twikiMakeVisibleInline-->
Depending on whether filenames or directories are used in the config.JobType.inputFiles parameter, the directory structure inside the sandbox may be different and affect where the files are placed in the working directory of the job.
  • Specific file names are always added to the root directory of the sandbox, whether an absolute or relative file name is used. For example, afs/cern.ch/user/e/erupeika/supportFiles/PileupData2016B_69200.root will appear as PileupData2016B_69200.root in the sandbox (and will be extracted to the job's root working directory).
  • The directory structure inside each additional input file directory is maintained in the sandbox. The additional directories themselves will be located in the root directory of the sandbox. For example, if a directory foo with files bar1 and bar2 inside it is specified in the inputFiles parameter, the sandbox will contain foo, foo/bar1 and foo/bar2 (the working directory of the job will therefore also contain a directory foo with files bar1 and bar2).
<!--/twistyPlugin-->
 

How can I clean my area in the CRAB cache?

<!--/twistyPlugin twikiMakeVisibleInline-->

Revision 622017-03-13 - StefanoBelforte

Line: 1 to 1
 
META TOPICPARENT name="SWGuideCrab"
<!-- /ActionTrackerPlugin -->
Line: 68 to 68
 

What is the maximum memory per job (maxMemoryMB) I can request?

<!--/twistyPlugin twikiMakeVisibleInline-->
Changed:
<
<
CRAB requests by default a maximum memory of 2000 MB. This is the maximum all sites guarantee they will run. Some sites, but not many, offer a bit more (typically 2500 MB); and some sites even offer 4000 MB for special users. Memory limits offered by each site are accessible in the GlideinWMS VO Factory Monitor page, http://vocms32.cern.ch/gfactory/ (choose "Current Status of the Factory" and click on a site CE listed under "Entry Name" in the table), but this should not be considered a documentation. The best advice we can give is: stick to the default, and if you think you need more, find out first if there are sites (and which ones) which can run jobs in that case. If you need help, you can write to us.
>
>
CRAB requests by default a maximum memory of 2000 MB. This is the maximum all sites guarantee they will run. Some sites, but not many, offer a bit more (typically 2500 MB); and some sites even offer 4000 MB for special users. Memory limits offered by each site are accessible in the GlideinWMS VO Factory Monitor page, http://glidein.grid.iu.edu/factory/monitor/ (choose "Current Status of the Factory" and click on a site CE listed under "Entry Name" in the table), but this should not be considered a documentation. The best advice we can give is: stick to the default, and if you think you need more, find out first if there are sites (and which ones) which can run jobs in that case. If you need help, you can write to us.
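If you do decide to raise the limit, the request is a single configuration line; the value below is only an example and, as said above, values beyond 2000 MB are honored only by some sites:

config.JobType.maxMemoryMB = 2500   # example value; the default of 2000 MB is what all sites guarantee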
 
<!--/twistyPlugin-->

CRAB cache

Revision 612017-02-13 - AndrewMelo

Line: 1 to 1
 
META TOPICPARENT name="SWGuideCrab"
<!-- /ActionTrackerPlugin -->
Line: 719 to 719
 

Why are my jobs submitted to a site that I had explicitly blacklisted (not whitelisted)?

<!--/twistyPlugin twikiMakeVisibleInline-->
Changed:
<
<
There is a site overflow mechanism in place, which takes place after CRAB submission. Sites are divided in regions of good WAN/xrootd connectivity (e.g. US, Italy, Germani etc.), then jobs queued at one site A for too long are allowed to overflow to a well connected site B which does not host the requested input data but from where data will be reas over xrootd. Rational is that even if those jobs were to fail due to unable to read data or a problem in site B, they will be automatically resubmitted, so nothing is lost with respect to keeping those jobs idle in the queue waiting for free slots at site A.
>
>
There is a site overflow mechanism in place, which takes place after CRAB submission. Sites are divided in regions of good WAN/xrootd connectivity (e.g. US, Italy, Germany etc.), then jobs queued at one site A for too long are allowed to overflow to a well connected site B which does not host the requested input data but from where data will be read over xrootd. Rationale is that even if those jobs were to fail due to unable to read data or a problem in site B, they will be automatically resubmitted, so nothing is lost with respect to keeping those jobs idle in the queue waiting for free slots at site A.
 The site overflow can be turned off via the Debug.extraJDL CRAB configuration parameter :

Revision 602016-12-01 - StefanoBelforte

Line: 1 to 1
 
META TOPICPARENT name="SWGuideCrab"
<!-- /ActionTrackerPlugin -->
Line: 341 to 341
 As the error message says, this should be a temporary failure. One should just keep trying until it works. But after doing crab resubmit, give it some time to process the resubmission request; it may take a couple of minutes to see the jobs reacting to the resubmission.
<!--/twistyPlugin-->
Added:
>
>

crab submit fails with "Splitting task ... with LumiBased method does not generate any job"

<!--/twistyPlugin twikiMakeVisibleInline-->
This is not a CRAB error.

This usually happens when there is no lumi to process. I.e.the intersection of

  1. the input lumimask (if any)
  2. the selected run range (if any)
  3. the set of runs and lumis in the input dataset
is empty. Typical reasons are using a golden json lumimask from some data acquisition era on data from a different era or looking for a specific run in a dataset which does not include that run.

You should carefully cross check what you are trying to select, possibly use lumi arithmetic to verify, and only report this as a problem if you are sure that there is a bug. Typical error in crab status:

<!--/twistyPlugin-->
 

CRAB fails to resubmit some jobs

<!--/twistyPlugin twikiMakeVisibleInline-->

Revision 592016-10-28 - StefanoBelforte

Line: 1 to 1
 
META TOPICPARENT name="SWGuideCrab"
<!-- /ActionTrackerPlugin -->
Line: 716 to 716
 

Doing lumi-mask arithmetics

<!--/twistyPlugin twikiMakeVisibleInline-->
Changed:
<
<
There is a tool written in python called LumiList.py (available in the WMCore library; is the same code as cmssw/FWCore/PythonUtilities/python/LumiList.py) that can be used to do lumi-mask arithmetics. The arithmetics can even be done inside the CRAB configuration file (that's the advantage of having the configuration file written in python). Below are some examples.
>
>
There is a tool written in python called LumiList.py (available in the WMCore library; is the same code as cmssw/FWCore/PythonUtilities/python/LumiList.py ) that can be used to do lumi-mask arithmetics. The arithmetics can even be done inside the CRAB configuration file (that's the advantage of having the configuration file written in python). Below are some examples.
  Example 1: A run range selection can be achieved by selecting from the original lumi-mask file the run range of interest.

Revision 582016-10-10 - StefanoBelforte

Line: 1 to 1
 
META TOPICPARENT name="SWGuideCrab"
<!-- /ActionTrackerPlugin -->
Line: 48 to 48
 
<!--/twistyPlugin-->

CRAB setup

Changed:
<
<

crab-env-bootstrap.sh script to overcome the CRAB3 and CMSSW environment conflicts

>
>

Does CRAB setup conflict with CMSSW setup?

 
<!--/twistyPlugin twikiMakeVisibleInline-->
Changed:
<
<
To overcome the CRAB3 vs CMSSW environment conflicts, you can use the following script available in CVMFS (/cvmfs/cms.cern.ch/crab3/crab-env-bootstrap.sh) without need to source the CRAB3 environment. You could do something like this:

cmsenv
# DO NOT setup the CRAB3 environment
alias crab='/cvmfs/cms.cern.ch/crab3/crab-env-bootstrap.sh'
crab submit
crab status
...
# check that you can run cmsRun locally

Details:

The usual way to setup CRAB3 is to first source the CMSSW environment using cmsenv and then source the CRAB3 environment using source /cvmfs/cms.cern.ch/crab3/crab.(c)sh. This setup procedure has the disadvantage that, depending on which CMSSW version is used, once the CRAB3 environment is sourced the CMSSW commands like cmsRun will stop working (also other useful commands like gfal-copy will not work). Solving this at the root and make the CRAB client RPM compatible to the CMSSW ones is not possible for the way the tools in the COMP repository are built, and because cmsweb has its own release cycle independent from CMSSW.

>
>
No. CRAB client runs within the CMSSW environment.
Make sure you always do cmsenv before source /cvmfs/cms.cern.ch/crab3/crab.sh
<!--/twistyPlugin-->
 
Changed:
<
<
To overcome this limitation we are now providing a wrapper bash script that can be run in place of the usual crab command. This wrapper script will take care of setting the environment in the correct way before running the usual crab command, and will leave the environment as it was when exiting. The script will be soon available in the CMSSW distribution under the name 'crab' and its usage will be transparent to the user: you will just run the crab commands as you would have done before. In the meantime, the script is available for testing here: /cvmfs/cms.cern.ch/crab3/crab-env-bootstrap.sh.
>
>

I need to use an (old?) CMSSW release where the CRAB client fails, what can I do ?

<!--/twistyPlugin twikiMakeVisibleInline-->
You can use the command below to get a fully consistent environment for CRAB, but be aware that cmsRun will not work anymore after that, you will need a separate shell for that:
  • source /cvmfs/cms.cern.ch/crab3/crab_standalone.sh
 
<!--/twistyPlugin-->

CRAB configuration file

Line: 400 to 390
 
  1. The user is generating MC events, correctly specifying in the CRAB configuration file JobType.pluginName = 'PrivateMC', but in the CMSSW parameter set configuration he/she has specified a source of type PoolSource. The solution is to not specify a PoolSource. Note: This doesn't mean to remove process.source completely, as this attribute must be present. One could set process.source = cms.Source("EmptySource") if no input source is used.
<!--/twistyPlugin-->
Deleted:
<
<

ERROR: SyntaxError: invalid syntax (Mixins.py, line 714)

<!--/twistyPlugin twikiMakeVisibleInline-->
The problematic pset is FWCore/ParameterSet/Mixins.py from CMSSW:

[line 714] p = tLPTest("MyType",** { "a"+str(x): tLPTestType(x) for x in xrange(0,300) } )

This uses dictionary comprehensions, a feature available in python > 2.7. While CMSSW (setup via cmsenv) uses python > 2.7, CRAB (setup via /cvmfs/cms.cern.ch/crab3/crab.sh) still uses python 2.6.8. To overcome this problem, don't setup the CRAB environment and instead use the crab-env-bootstrap.sh script (see this FAQ).

<!--/twistyPlugin-->
 

CRAB Client API

Multiple submission fails with a CMSSW "duplicate process" error

Line: 472 to 450
 If the PSetDump.py file (found in task_directory/inputs) differs for the tasks from a multiple-submission python file, try forking the submission into different python processes, as recommended in the previous FAQ.
<!--/twistyPlugin-->
Changed:
<
<

Running CRAB

>
>

More on CRAB tasks

 

Recovery task

<!--/twistyPlugin twikiMakeVisibleInline-->

Recovery task: What

Line: 517 to 495
 
<!--/twistyPlugin-->
Deleted:
<
<

How CRAB finds data in input datasets from DBS

<!--/twistyPlugin twikiMakeVisibleInline-->
The following remarks apply to the main input dataset provided to CRAB via the Data.inputDataset configuration parameter:

  • Dataset status: Datasets in DBS can have different status. This is controlled by Production and the norm is to use VALID datasets. Datasets with different status may occasionally be useful, e.g. for comparison or dedicated study of the problems which led them to be deprecated. In order to run over a dataset whose status is not VALID, one has to set Data.allowNonValidInputDataset = True in the CRAB configuration.
  • File status: Files have a is_file_valid flag in DBS, usually set to False when file is lost or corrupted. CRAB considers only valid files in the dataset. Invalid files are skipped.
  • Data location: A dataset in DBS is divided in blocks. Blocks can be migrated with PhEDEx, and PhEDEx is the only service that knows about the current locations (host sites) of a dataset block. Therefore CRAB queries PhEDEx to retrieve the locations of the blocks in a dataset. Next, a data location (aka PNN = Phedex None Name) is turned into a site where to run (aka PSN = Prosessing Site Name) using SiteDB. If a block has no valid locations in PhEDEx or no PSN associated in SiteDB, CRAB skips the block.
  • User datasets: For datasets created by users and published in DBS phys03 instance, the above is modified as follows:
    • Dataset status and File status flags are initially set to VALID by CRAB when the dataset is published; then can be changed by the user.
    • Data block location is tracked as origin_site_name in DBS and data are assumed to never move. If datasets are moved, the user can update the origin_site_name. There is no way to have multiple locations.
<!--/twistyPlugin-->
 

Dealing with a growing input dataset and/or changing lumi-mask

<!--/twistyPlugin twikiMakeVisibleInline-->
Line: 614 to 579
 

Using pile-up

<!--/twistyPlugin twikiMakeVisibleInline-->
Added:
>
>
Important Instructions:
Make sure you run your jobs at the site where the pile-up sample is. Not where the signal is.
This requires you to overrdie the location list that CRAB would extract from the inputDataset.

Rational and details:

 The pile-up files have be specified in the CMSSW parameter-set configuration file. There is no way yet to tell in the CRAB configuration file that one wants to use a pile-up dataset as a secondary input dataset. That means that CRAB doesn't know that the CMSSW code will want to access pile-up files; CRAB only knows about the primary input dataset (if any). This means that, assuming there is a primary input dataset, when CRAB does data discovery to figure out to which sites should it submit the jobs, it will only take into account the input dataset specified in the CRAB configuration file (in the Data.inputDataset parameter) and submit the jobs to sites where this dataset is hosted. If there is no primary input dataset, CRAB will submit the jobs to the less busy sites. In any case, if the pile-up files are not hosted in the execution sites, they will be accessed via AAA (Xrootd). But reading the "signal" events directly from the local storage and the pile-up events via AAA is more inefficient than doing the other way around, since for each "signal" event that is read one needs to read in general many (> 20) pile-up events. Therefore, it is highly recommended that the user forces CRAB to submit the jobs to the sites where the pile-up dataset is hosted by whitelisting these sites using the parameter Site.whitelist in the CRAB configuration file. Note that one also needs to set Data.ignoreLocality = True in the CRAB configuration file in case of using a primary input dataset so to avoid CRAB doing data discovery and eventually complain (and fail to submit) that the input dataset is not available in the whitelisted sites. One can use DAS to get the list of sites that host a dataset.
<!--/twistyPlugin-->

Miscellanea

Added:
>
>

How CRAB finds data in input datasets from DBS

<!--/twistyPlugin twikiMakeVisibleInline-->
The following remarks apply to the main input dataset provided to CRAB via the Data.inputDataset configuration parameter:

  • Dataset status: Datasets in DBS can have different status. This is controlled by Production and the norm is to use VALID datasets. Datasets with different status may occasionally be useful, e.g. for comparison or dedicated study of the problems which led them to be deprecated. In order to run over a dataset whose status is not VALID, one has to set Data.allowNonValidInputDataset = True in the CRAB configuration (see the one-line snippet after this list).
  • File status: Files have a is_file_valid flag in DBS, usually set to False when file is lost or corrupted. CRAB considers only valid files in the dataset. Invalid files are skipped.
  • Data location: A dataset in DBS is divided in blocks. Blocks can be migrated with PhEDEx, and PhEDEx is the only service that knows about the current locations (host sites) of a dataset block. Therefore CRAB queries PhEDEx to retrieve the locations of the blocks in a dataset. Next, a data location (aka PNN = PhEDEx Node Name) is turned into a site where to run (aka PSN = Processing Site Name) using SiteDB. If a block has no valid locations in PhEDEx or no PSN associated in SiteDB, CRAB skips the block.
  • User datasets: For datasets created by users and published in DBS phys03 instance, the above is modified as follows:
    • Dataset status and File status flags are initially set to VALID by CRAB when the dataset is published; then can be changed by the user.
    • Data block location is tracked as origin_site_name in DBS and data are assumed to never move. If datasets are moved, the user can update the origin_site_name. There is no way to have multiple locations.
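
As an illustration of the block-location lookup (an example added here, assuming a CMSSW environment where the dasgoclient command is available; the dataset name is a placeholder):

# list the sites that host (blocks of) a given dataset
dasgoclient --query="site dataset=/SomePrimary/SomeProcessed-v1/MINIAOD"
# list the files of the dataset (CRAB will only use the valid ones)
dasgoclient --query="file dataset=/SomePrimary/SomeProcessed-v1/MINIAOD"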
<!--/twistyPlugin-->

How many jobs can I run at the same time ?

<!--/twistyPlugin twikiMakeVisibleInline-->
CRAB runs jobs on the Grid using a global HTCondor pool created via the glideinWms machinery; think of it as a global batch system with execution nodes all over the world. The most important thing that controls how many jobs you can run is the overall number of execution slots (CPUs) available for your jobs, i.e. the slots that match your requirements on data access, memory and running time. HTCondor then tries hard to give every user the same share of computing resources, i.e. equal resources to everyone at any given time. You are not penalized for having run more jobs yesterday, and not rewarded either for not having used your share in the past. In computing the share that you use, HTCondor considers both the number of cores and the number of GB of RAM that you are using. As of October 2016 the weight is: (#cores + #GBytes). For example, a job using 2 cores and 4 GB of RAM weighs 6 units in the fair-share computation, twice as much as a single-core job using 2 GB.

Beware thus of asking for much more memory than you need.

<!--/twistyPlugin-->
 

How to predict how long my jobs will run for?

<!--/twistyPlugin twikiMakeVisibleInline-->
Line: 801 to 794
 This e-mail has a link (https://twiki.cern.ch/twiki/bin/view/CMSPublic/SWGuideCrabFaq#How_to_clean_up_your_directory_i) to the instructions on how to clean up space in the user's home directory in a schedd. A user can follow the instructions in that page, or alternatively use the crab purge command.
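
For example (assuming the standard --dir option that crab commands take; the directory name is a placeholder), the space used by a finished task can be freed with:

# free the space used by this (finished) task; replace the directory with your CRAB project directory
crab purge --dir=crab_projects/crab_my_finished_task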
<!--/twistyPlugin-->
Added:
>
>

Obsolete/Deprecated stuff (kept here as permanent documentation just in case)

crab-env-bootstrap.sh script to overcome the CRAB3 and CMSSW environment conflicts

<!--/twistyPlugin twikiMakeVisibleInline-->
To overcome the CRAB3 vs CMSSW environment conflicts, you can use the following script available in CVMFS (/cvmfs/cms.cern.ch/crab3/crab-env-bootstrap.sh) without needing to source the CRAB3 environment. You could do something like this:

cmsenv
# DO NOT setup the CRAB3 environment
alias crab='/cvmfs/cms.cern.ch/crab3/crab-env-bootstrap.sh'
crab submit
crab status
...
# check that you can run cmsRun locally

Details:

The usual way to setup CRAB3 is to first source the CMSSW environment using cmsenv and then source the CRAB3 environment using source /cvmfs/cms.cern.ch/crab3/crab.(c)sh. This setup procedure has the disadvantage that, depending on which CMSSW version is used, once the CRAB3 environment is sourced the CMSSW commands like cmsRun will stop working (also other useful commands like gfal-copy will not work). Solving this at the root by making the CRAB client RPM compatible with the CMSSW ones is not possible, because of the way the tools in the COMP repository are built and because cmsweb has its own release cycle, independent from CMSSW.

To overcome this limitation we are now providing a wrapper bash script that can be run in place of the usual crab command. This wrapper script will take care of setting the environment in the correct way before running the usual crab command, and will leave the environment as it was when exiting. The script will soon be available in the CMSSW distribution under the name 'crab' and its usage will be transparent to the user: you will just run the crab commands as you would have done before. In the meantime, the script is available for testing here: /cvmfs/cms.cern.ch/crab3/crab-env-bootstrap.sh.

<!--/twistyPlugin-->

ERROR: SyntaxError: invalid syntax (Mixins.py, line 714)

<!--/twistyPlugin twikiMakeVisibleInline-->
The problematic pset is FWCore/ParameterSet/Mixins.py from CMSSW:

[line 714] p = tLPTest("MyType",** { "a"+str(x): tLPTestType(x) for x in xrange(0,300) } )

This uses dictionary comprehensions, a feature available only in Python 2.7 and later. While CMSSW (set up via cmsenv) uses Python 2.7 or later, CRAB (set up via /cvmfs/cms.cern.ch/crab3/crab.sh) still uses Python 2.6.8. To overcome this problem, don't set up the CRAB environment and instead use the crab-env-bootstrap.sh script (see this FAQ).

<!--/twistyPlugin-->
 

-- AndresTanasijczuk - 23 Oct 2014

Revision 572016-07-22 - StefanoBelforte

Line: 1 to 1
 
META TOPICPARENT name="SWGuideCrab"
<!-- /ActionTrackerPlugin -->
Line: 712 to 712
 The site overflow can be turned off via the Debug.extraJDL CRAB configuration parameter:
Added:
>
>
config.section_("Debug")
 config.Debug.extraJDL = ['+CMS_ALLOW_OVERFLOW=False']
<!--/twistyPlugin-->

Revision 562016-05-25 - StefanoBelforte

Line: 1 to 1
 
META TOPICPARENT name="SWGuideCrab"
<!-- /ActionTrackerPlugin -->
Line: 20 to 20
 
Contents:
Changed:
<
<
This twiki is under construction. You might also want to check the CRAB3CommonErrors page. If you don't find the answer to your question, write to the Computing Tools CMS HyperNews forum.
>
>
This twiki is constantly under construction. You might also want to check the CRAB3CommonErrors page. If you don't find the answer to your question, write to the Computing Tools CMS HyperNews forum.
 

Certificates, proxies and all that stuff

Line: 141 to 141
  if user is None: user = getUsernameFromSiteDB()
Changed:
<
<
if not user:
>
>
if not user:https://twiki.cern.ch/twiki/bin/view/CMSPublic/CRAB3FAQ#Why_are_my_jobs_submitted_to_a_s
  raise Crab3ToolsException('could not get username from sitedb, returned %r' % user) self.user = user
Line: 708 to 708
 

Why are my jobs submitted to a site that I had explicitly blacklisted (not whitelisted)?

<!--/twistyPlugin twikiMakeVisibleInline-->
Changed:
<
<
There is a site overflow mechanism in place for US sites, which takes place after CRAB. That means that even if CRAB would submit the jobs to a given US site A, the site overflow allows the jobs to run on another US site B if site B has many more free slots than site A. Of course this may change the way the input dataset is read, i.e. it will be accessed via AAA if site B does not host the input dataset.
>
>
There is a site overflow mechanism in place, which acts after CRAB submission. Sites are divided into regions of good WAN/xrootd connectivity (e.g. US, Italy, Germany, etc.); jobs queued at one site A for too long are then allowed to overflow to a well connected site B which does not host the requested input data, but from where the data will be read over xrootd. The rationale is that even if those jobs were to fail because they cannot read the data or because of a problem at site B, they will be automatically resubmitted, so nothing is lost with respect to keeping those jobs idle in the queue waiting for free slots at site A.
 The site overflow can be turned off via the Debug.extraJDL CRAB configuration parameter:
Line: 717 to 716
 
<!--/twistyPlugin-->
Added:
>
>

What is glideinWms Overflow and how can I avoid using it ?

See above FAQ
 

Doing lumi-mask arithmetics

<!--/twistyPlugin twikiMakeVisibleInline-->

Revision 552016-03-08 - StefanoBelforte

Line: 1 to 1
 
META TOPICPARENT name="SWGuideCrab"
<!-- /ActionTrackerPlugin -->
Line: 24 to 24
 

Certificates, proxies and all that stuff

Added:
>
>

crab command fails with Impossible to retrieve proxy from myproxy.cern.ch ...

<!--/twistyPlugin twikiMakeVisibleInline-->
This can be due to a stale credential in myproxy. CRAB client always tries to keep a valid one there, but there are some known edge cases where this fails, e.g. https://github.com/dmwm/CRABServer/issues/5168.

Therefore you should remove the credentials from myproxy and then issue the crab command again. To remove stale credentials:

grep myproxy-info <CRAB project directory>/crab.log
# example:  grep myproxy-info crab_20160308_140433/crab.log 
You will get something like:
 command : myproxy-info -l ec95456d3589ed395dc47d3ada8c94c67ee588f1 -s myproxy.cern.ch

then simply issue a myproxy-destroy command with same arguments:

# example: in real life, replace the long hex string with the one from your crab.log
myproxy-destroy -l ec95456d3589ed395dc47d3ada8c94c67ee588f1 -s myproxy.cern.ch

<!--/twistyPlugin-->
 

CRAB setup

crab-env-bootstrap.sh script to overcome the CRAB3 and CMSSW environment conflicts

Line: 481 to 504
 You must of course have around the original
  • scram project area
  • crab configuration file including the pset and any other file referenced in there
Changed:
<
<
  • original crab work directory if you did not publish output in DBS
>
>
  • original CRAB project directory if you did not publish output in DBS
  The procedure to generate a recovery task is based on these simple steps:
  1. make a list of lumis present in the desired input dataset (listIn)

Revision 542015-12-15 - AndresTanasijczuk

Line: 1 to 1
 
META TOPICPARENT name="SWGuideCrab"
<!-- /ActionTrackerPlugin -->
Line: 309 to 309
  4) Finally it parses the output from the above query to extract the username from the "login" field (in my case it is atanasi).
Changed:
<
<
When reporting a problem with crab checkusername with "Error: Failed to retrieve username from SiteDB." to the CRAB experts, it would be useful to add the output from the above commands.
>
>
When reporting a problem with crab checkusername with "Failed to retrieve username from SiteDB." to the CRAB experts, it would be useful to add the output from the above commands.
  note.gif Note: Even if crab checkusername gives an error retrieving the username from SiteDB, this should not stop you from trying to submit jobs with CRAB, because the error might just be a problem with crab checkusername itself and not a real problem with your registration in SiteDB (CRAB uses a different mechanism than the one described above to check the users' registration in SiteDB when attempting to submit jobs to the grid).
<!--/twistyPlugin-->
Added:
>
>

crab resubmit fails with "Trapped exception in Dagman.Fork"

<!--/twistyPlugin twikiMakeVisibleInline-->
Typical error in crab status:

Failure message: The CRAB server backend was not able to resubmit the task, because the Grid scheduler answered with an error. This is probably a temporary glitch. Please try again later. If the error persists send an e-mail to hn-cms-computing-tools@cern.ch<mailto:hn-cms-computing-tools@cern.ch>. Error reason: Trapped exception in Dagman.Fork: <type 'exceptions.RuntimeError'> Unable to edit jobs matching constraint <traceback object at 0xa113368>
  File "/data/srv/TaskManager/3.3.1512.rc6/slc6_amd64_gcc481/cms/crabtaskworker/3.3.1512.rc6/lib/python2.6/site-packages/TaskWorker/Actions/DagmanResubmitter.py", line 113, in executeInternal
    schedd.edit(rootConst, "HoldKillSig", 'SIGKILL')

As the error message says, this should be a temporary failure. One should just keep trying until it works. But after doing crab resubmit, give it some time to process the resubmission request; it may take a couple of minutes to see the jobs reacting to the resubmission.

<!--/twistyPlugin-->
 

CRAB fails to resubmit some jobs

Revision 532015-12-06 - WellsWulsin

Line: 1 to 1
 
META TOPICPARENT name="SWGuideCrab"
<!-- /ActionTrackerPlugin -->
Line: 435 to 435
 (Of course, from multiprocessing import Process needs to be executed only once, so put it outside any loop.) </>
<!--/twistyPlugin-->
Added:
>
>

Multiple submission produces different PSetDump.py files

<!--/twistyPlugin twikiMakeVisibleInline-->
If the PSetDump.py file (found in task_directory/inputs) differs for the tasks from a multiple-submission python file, try forking the submission into different python processes, as recommended in the previous FAQ.
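
A minimal sketch of such a forked submission (added here for illustration; it assumes the CRABClient API is available so that crabCommand can be imported, and config_A, config_B are two hypothetical, fully filled CRAB configuration objects):

from multiprocessing import Process          # import once, outside any loop
from CRABAPI.RawCommand import crabCommand

def submit(cfg):
    # running each submission in its own python process keeps the CMSSW
    # parameter sets of the different tasks from interfering with each other
    crabCommand('submit', config=cfg)

for cfg in [config_A, config_B]:             # config_A, config_B: hypothetical pre-built configs
    p = Process(target=submit, args=(cfg,))
    p.start()
    p.join()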
<!--/twistyPlugin-->
 

Running CRAB

Recovery task

<!--/twistyPlugin twikiMakeVisibleInline-->

Revision 522015-11-13 - AndresTanasijczuk

Line: 1 to 1
 
META TOPICPARENT name="SWGuideCrab"
<!-- /ActionTrackerPlugin -->
Line: 369 to 369
 
  1. The user is generating MC events, correctly specifying in the CRAB configuration file JobType.pluginName = 'PrivateMC', but in the CMSSW parameter set configuration he/she has specified a source of type PoolSource. The solution is to not specify a PoolSource. Note: This doesn't mean to remove process.source completely, as this attribute must be present. One could set process.source = cms.Source("EmptySource") if no input source is used.
<!--/twistyPlugin-->
Added:
>
>

ERROR: SyntaxError: invalid syntax (Mixins.py, line 714)

<!--/twistyPlugin twikiMakeVisibleInline-->
The problematic pset is FWCore/ParameterSet/Mixins.py from CMSSW:

[line 714] p = tLPTest("MyType",** { "a"+str(x): tLPTestType(x) for x in xrange(0,300) } )

This uses dictionary comprehensions, a feature available in python > 2.7. While CMSSW (setup via cmsenv) uses python > 2.7, CRAB (setup via /cvmfs/cms.cern.ch/crab3/crab.sh) still uses python 2.6.8. To overcome this problem, don't setup the CRAB environment and instead use the crab-env-bootstrap.sh script (see this FAQ).

<!--/twistyPlugin-->
 

CRAB Client API

Multiple submission fails with a CMSSW "duplicate process" error

Revision 512015-11-05 - AndresTanasijczuk

Line: 1 to 1
 
META TOPICPARENT name="SWGuideCrab"
<!-- /ActionTrackerPlugin -->
Line: 656 to 656
 Note: the <absolute-path-to-local-destination-directory> starts with / therefore there are three consecutive / characters like file:///tmp/somefilename.root </>
<!--/twistyPlugin-->
Changed:
<
<

Why are my jobs submitted to a site that I had explicitly blacklisted?

>
>

Why are my jobs submitted to a site that I had explicitly blacklisted (not whitelisted)?

 
<!--/twistyPlugin twikiMakeVisibleInline-->
Changed:
<
<
There is a site overflow mechanism in place for T[1,2]_US sites, which takes place after CRAB. That means that even if CRAB would submit the jobs to a given US site A, the site overflow allows the jobs to run on another T2_US site B if site B has many more free slots than site A. Of course this may change the way the input dataset is read, i.e. it will be accessed via AAA if site B does not host the input dataset.
>
>
There is a site overflow mechanism in place for US sites, which takes place after CRAB. That means that even if CRAB would submit the jobs to a given US site A, the site overflow allows the jobs to run on another US site B if site B has many more free slots than site A. Of course this may change the way the input dataset is read, i.e. it will be accessed via AAA if site B does not host the input dataset.
  The site overflow can be turned off via the Debug.extraJDL CRAB configuration parameter:

Revision 502015-11-05 - AndresTanasijczuk

Line: 1 to 1
 
META TOPICPARENT name="SWGuideCrab"
<!-- /ActionTrackerPlugin -->
Line: 520 to 520
 newLumiMask.writeJSON('my_lumi_mask.json') # and there we, process from input dataset all the lumi listed in the current officialLumiMask file, skipping the ones you already have. config.Data.lumiMask = 'my_lumi_mask.json'
Changed:
<
<
config.Data.publishDataName = <TaskA-output-dataset-name> # add to your existing dataset
>
>
config.Data.outputDatasetTag = <TaskA-output-dataset-name> # add to your existing dataset
 ...
Line: 553 to 553
 filteredLumiMask = newLumiMask - taskALumis filteredLumiMask.writeJSON('my_lumi_mask.json') config.Data.lumiMask = 'my_lumi_mask.json'
Changed:
<
<
config.Data.publishDataName = <TaskA-output-dataset-name>
>
>
config.Data.outputDatasetTag = <TaskA-output-dataset-name>
 ...

Revision 492015-10-25 - AndresTanasijczuk

Line: 1 to 1
 
META TOPICPARENT name="SWGuideCrab"
<!-- /ActionTrackerPlugin -->
Line: 332 to 332
 
  1. let running jobs die or complete and dust settle
  2. take stock of what's published in DBS at that point and make sure that it matches what's on disk
    • if your output is not in DBS, you can use crab report, but while DBS information is available forever, crab commands on a specific task may not
Changed:
<
<
  1. assess whether it is more important to get the last percentage of statistics or go one with other work. Do you really need 100% completion in this task ?
>
>
  1. assess whether it is more important to get the last percentage of statistics or go on with other work. Do you really need 100% completion in this task ?
 
  1. if full statistics is needed, create a recovery task for the missing lumis and run it writing to the same dataset as the original task.

Recovery task is an important concept that can be useful in many circumstances. Please find instructions in this FAQ

Revision 482015-10-21 - StefanoBelforte

Line: 1 to 1
 
META TOPICPARENT name="SWGuideCrab"
<!-- /ActionTrackerPlugin -->
Line: 452 to 452
 

Recovery task: How

Changed:
<
<
You must have around the original
>
>
You must of course have around the original
 
  • scram project area
  • crab configuration file including the pset and any other file referenced in there
  • original crab work directory if you did not publish output in DBS
Added:
>
>
The procedure to generate a recovery task is based on these simple steps:
  1. make a list of lumis present in the desired input dataset (listIn)
  2. make a list of lumis successfully processed by the original CRAB task A (listA)
  3. submit a new CRAB task B which processes the missing lumis (listB = listIn - listA)

Details are slightly different depending on whether you published the output in DBS or not (a minimal sketch of the lumi-list arithmetic follows this list):

output in DBS
follow the procedure in this FAQ
output not in DBS
follow the procedure in this Workbook example
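
In both cases the core of the recovery is the same lumi-list subtraction; a minimal sketch (assuming the output of task A was published in DBS phys03; dataset and file names are placeholders) could look like:

from CRABClient.UserUtilities import getLumiListInValidFiles
from WMCore.DataStructs.LumiList import LumiList

# listIn: lumis you want in total (e.g. the official golden JSON for the run range); placeholder file name
lumisIn = LumiList(filename='golden_lumis.json')
# listA: lumis already processed successfully by task A (published in phys03); placeholder dataset name
lumisA = getLumiListInValidFiles(dataset='/Primary/username-TaskA-v1/USER', dbsurl='phys03')
# listB: lumis still missing, to be used as Data.lumiMask of the recovery task B
lumisB = lumisIn - lumisA
lumisB.writeJSON('recovery_lumi_mask.json')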
 
Deleted:
<
<
There is currently no way to

This FAQ explains how to create the recovery task, even if it is differently titled: https://twiki.cern.ch/twiki/bin/view/CMSPublic/CRAB3FAQ#Dealing_with_a_growing_input_dat

 
<!--/twistyPlugin-->

How CRAB finds data in input datasets from DBS

Line: 483 to 487
 While data taking is progressing, corresponding datasets in DBS and lumi-mask files are growing. Also data quality is sometimes improved for already existing data, leading to updated lumi-masks which compared to older lumi-masks include luminosity sections that were previously filtered out. Both of these situations lead to the common case where one would like to run a task (lets call it task B) over an input dataset partially analyzed already in a previous task (lets call it task A), where task B should skip the data already analyzed in task A.

This can be accomplished with a few lines in the CRAB configuration file, see an annotated example below.

Deleted:
<
<
A more verbose discussion follows afer that.
 
from CRABClient.UserUtilities import config, getLumiListInValidFiles
Line: 523 to 526
  IMPORTANT NOTE : in this way you will add any lumi section in the initial dataset that was turned from bad to good in the golden list after you ran Task-A, but if some of those data evolved the other way around (from good to bad), there is no way to remove those from your published datasets.
Added:
>
>
<!--
STEFANO COMMENTED OUT THE FOLLOWING TEXT BECAUSE HARD TO READ AND HARDLY USEFUL. 
 In full words and with all rationals:

  1. Get the CMSSW parameter-set configuration file and the CRAB configuration file used in task A. These files can be found inside the TGZ archive file located in the inputs subdirectory of task's A CRAB project directory. After untaring/unzipping the TGZ archive file, the desired files will appear in a debug subdirectory: the files are named originalPSet.py and crabConfig.py respectively.
Line: 552 to 558
 

The input dataset may eventually grow again and one may need to run a task over the extended input dataset with an extended lumi-mask. One can then repeat the process described above, except that in step 2 one has to obtain the lumiSummary.json files from all the previous tasks, and in step 3 one has to construct the lumi-mask to be used as input for the new task as the extended lumi-mask minus the union of luminosity sections already analyzed by the previous tasks (i.e. the union of the corresponding lumiSummary.json files).

Added:
>
>
END OF TEXT COMMENTED OUT BY STEFANO
 -->
 </>
<!--/twistyPlugin-->

Using pile-up

Revision 472015-10-21 - StefanoBelforte

Line: 1 to 1
 
META TOPICPARENT name="SWGuideCrab"
<!-- /ActionTrackerPlugin -->
Line: 315 to 315
 </>
<!--/twistyPlugin-->
Changed:
<
<

crab fails to resubmit some jobs

>
>

CRAB fails to resubmit some jobs

 
<!--/twistyPlugin twikiMakeVisibleInline-->
Deleted:
<
<
CRAB developers try hard to give you a tool with perfect bookkeeping and full automation which brings each task to 100% success. Similarly do strive the operators of the global CMS submission infrastructure (aks HTCondor pool, aka glideIn) and the administrator of the many sites that contribute hardware resources for CMS data analysis. Yet at times things can go wrong, and we may not be able to investigate and fix every small glitch, and surely never within hours or days.
 
Changed:
<
<
It is important that you as a user are prepared for this to happen and know how to remain productive in your Physics analysis with the least effort. While there is a long tradition of 'resubmit them until they work", this is hardly useful any more. And while we can't prevent users from trying we can not guarantee that it will work, nor that CRAB can keep tasks around for indefinite amount of time.
>
>
It is important that you as a user are prepared for this to happen and know how to remain productive in your physics analysis with the least effort. While there is a long tradition of "resubmit them until they work", this is hardly useful any more. And while we can't prevent users from trying we can not guarantee that it will work, nor that resubmitted jobs will succeed.
 
Changed:
<
<
In the case where the missing data sample is important, the best recommendation we can give to users is to USE RESCUE PROCEDURES WITHOUT CARING FOR DETAILS. We will always welcome problem reports and will try to improve when resubmission failures can be due to CRAB internals, but surely you do not want to hold your breath while waiting.
>
>
In the case where the missing data sample is important, the best recommendation we can give to users is to
USE RESCUE PROCEDURES WITHOUT CARING FOR DETAILS.
We will always welcome problem reports and will try to improve when resubmission failures can be due to CRAB internals, but surely you do not want to hold your breath in the meanwhile.
 
Changed:
<
<
The safest path is MORE TO BE ADDED:
>
>
The safest path is therefore:
 
  1. let running jobs die or complete and dust settle
  2. take stock of what's published in DBS at that point and make sure that it matches what's on disk
    • if your output is not in DBS, you can use crab report, but while DBS information is available forever, crab commands on a specific task may not
  3. assess whether it is more important to get the last percentage of statistics or go one with other work. Do you really need 100% completion in this task ?
Changed:
<
<
  1. if full statistics is needed, create a recovery task for the missing lumis and run it writing to the same dataset as the original task
>
>
  1. if full statistics is needed, create a recovery task for the missing lumis and run it writing to the same dataset as the original task.
 
Changed:
<
<
This FAQ explains how to create the recovery task, even if it is differently titled: https://twiki.cern.ch/twiki/bin/view/CMSPublic/CRAB3FAQ#Dealing_with_a_growing_input_dat
>
>
Recovery task is an important concept that can be useful in many circumstances. Please find instructions in this FAQ
 

At the other extreme there's: forget about this, and resubmit a new task with new output dataset.

Line: 426 to 424
 
<!--/twistyPlugin-->

Running CRAB

Added:
>
>

Recovery task

<!--/twistyPlugin twikiMakeVisibleInline-->

Recovery task: What

Recovery task is an important concept that can be useful in many circumstances.

The general idea is that a CRAB task has run to completion, all re-submission attempts are done, but some of the necessary input data was not processed. A recovery task will run the same executable and configuration on the missed input data, adding its results to the same output destination (and DBS dataset) as the original task.

Recovery task: Why

CRAB developers try hard to give you a tool with perfect bookkeeping and full automation which brings each task to 100% success. So do the operators of the global CMS submission infrastructure (aka HTCondor pool, aka glideIn) and the administrators of the many sites that contribute hardware resources for CMS data analysis. Yet at times things can go wrong, and we may not be able to investigate and fix every small glitch, and surely never within hours or days.

It is impossible to guarantee that a given task will always complete to 100% success in a short amount of time. At the same time it is impossible to make sure that all desired input data is available when the task is submitted. Moreover both good sense and experience show that the larger a task is, the larger is the chance it hits some problem. Large workflows therefore benefit from the possibility to run them sort of iteratively, with a short (hopefully one or two at most) succession of smaller and smaller tasks.

Recovery task: When

A partial list of real life events where a recovery task is user's fastest and simplest way to get work done:
  • Something went wrong in the global infrastructure and some jobs are lost beyond recovery
  • Something went wrong inside CRAB (bugs, hardware...) which can't be fixed by crab resubmit command
  • Some site went down for longer than it makes sense to keep jobs in the queue
  • Some data was not available and had to be retransferred and took longer than... see above
  • More data have been added to the input dataset since the original task ran (pretty much as the above)
  • A new lumimask was prepared where lumis declared bad earlier are now good
  • ... more ...

Recovery task: How

You must have around the original

  • scram project area
  • crab configuration file including the pset and any other file referenced in there
  • original crab work directory if you did not publish output in DBS

There is currently no way to

This FAQ explains how to create the recovery task, even if it is differently titled: https://twiki.cern.ch/twiki/bin/view/CMSPublic/CRAB3FAQ#Dealing_with_a_growing_input_dat

<!--/twistyPlugin-->
 

How CRAB finds data in input datasets from DBS

Revision 462015-10-21 - StefanoBelforte

Line: 1 to 1
 
META TOPICPARENT name="SWGuideCrab"
<!-- /ActionTrackerPlugin -->
Line: 314 to 314
 note.gif Note: Even if crab checkusername gives an error retrieving the username from SiteDB, this should not stop you from trying to submit jobs with CRAB, because the error might just be a problem with crab checkusername itself and not a real problem with your registration in SiteDB (CRAB uses a different mechanism than the one described above to check the users' registration in SiteDB when attempting to submit jobs to the grid). </>
<!--/twistyPlugin-->
Added:
>
>

crab fails to resubmit some jobs

<!--/twistyPlugin twikiMakeVisibleInline-->
CRAB developers try hard to give you a tool with perfect bookkeeping and full automation which brings each task to 100% success. Similarly do strive the operators of the global CMS submission infrastructure (aks HTCondor pool, aka glideIn) and the administrator of the many sites that contribute hardware resources for CMS data analysis. Yet at times things can go wrong, and we may not be able to investigate and fix every small glitch, and surely never within hours or days.

It is important that you as a user are prepared for this to happen and know how to remain productive in your Physics analysis with the least effort. While there is a long tradition of 'resubmit them until they work", this is hardly useful any more. And while we can't prevent users from trying we can not guarantee that it will work, nor that CRAB can keep tasks around for indefinite amount of time.

In the case where the missing data sample is important, the best recommendation we can give to users is to USE RESCUE PROCEDURES WITHOUT CARING FOR DETAILS. We will always welcome problem reports and will try to improve when resubmission failures can be due to CRAB internals, but surely you do not want to hold your breath while waiting.

The safest path is MORE TO BE ADDED:

  1. let running jobs die or complete and dust settle
  2. take stock of what's published in DBS at that point and make sure that it matches what's on disk
    • if your output is not in DBS, you can use crab report, but while DBS information is available forever, crab commands on a specific task may not
  3. assess whether it is more important to get the last percentage of statistics or go one with other work. Do you really need 100% completion in this task ?
  4. if full statistics is needed, create a recovery task for the missing lumis and run it writing to the same dataset as the original task

This FAQ explains how to create the recovery task, even if it is differently titled: https://twiki.cern.ch/twiki/bin/view/CMSPublic/CRAB3FAQ#Dealing_with_a_growing_input_dat

At the other extreme there's: forget about this, and resubmit a new task with new output dataset. In between it is a murky land where many recipes may be more efficient according to details, but no general simple rule can be given and there's space for individual creativity and/or desperation.

<!--/twistyPlugin-->
 

Problems with job execution

Exit code 8028

Revision 452015-10-20 - AndresTanasijczuk

Line: 1 to 1
 
META TOPICPARENT name="SWGuideCrab"
<!-- /ActionTrackerPlugin -->
Line: 314 to 314
 note.gif Note: Even if crab checkusername gives an error retrieving the username from SiteDB, this should not stop you from trying to submit jobs with CRAB, because the error might just be a problem with crab checkusername itself and not a real problem with your registration in SiteDB (CRAB uses a different mechanism than the one described above to check the users' registration in SiteDB when attempting to submit jobs to the grid). </>
<!--/twistyPlugin-->
Deleted:
<
<

Task in FAILED status; crab resubmit --jobids is not resubmitting the jobs

<!--/twistyPlugin twikiMakeVisibleInline-->
The crab resubmit command WITH the --jobids option will not work if the crab status command is not returning the status of the jobs in the task (which is the case for example when a task is in FAILED status). The reason is the following: if the --jobids option is specified, the CRAB server will try to check upfront (before sending the resubmission request further down the chain) that the given job ids correspond to failed jobs by doing (the equivalent of) a crab status, and if the status doesn't return the jobs states the CRAB server will refuse to resubmit, giving the same error message as if one would have given job ids that are not in 'failed' state: CRAB3 server refused to resubmit the following jobs: ..... Only jobs in status failed can be resubmitted. Jobs in status finished can also be resubmitted, but only if the jobid is specified and force = True.

If the --jobids option is omitted, the resubmission should work. In this case CRAB will resubmit all terminally failed jobs.

<!--/twistyPlugin-->
 

Problems with job execution

Exit code 8028

Line: 336 to 327
  Exit code 8028 means "FileOpenError with fallback" (as documented here). That means that the some input file could not be opened neither in the first attempt from the local storage in the execution site nor in the fallback attempt from a remote site using AAA. Note that even if the file is not present at the execution site, the job will still try to find/open it from the local storage, and only in case of failure use the fallback procedure.
Changed:
<
<
Leaving the AAA error aside for a while, the first thing to contemplate here is to understand why the file could not be loaded from the local storage. Was it because the file is not available at the execution site? And if so, was it supposed to be available? If not, can you force CRAB to submit the jobs to the site(s) where the file is hosted? CRAB should so that automatically if the input file is one from the input dataset specified in the CRAB configuration parameter Data.inputDataset, unless you have set Data.ignoreLocality = True, or except in cases like using a (secondary) pile-up dataset. If yours is the last case, please read Using pile-up in this same twiki.
>
>
Leaving the AAA error aside for a while, the first thing to contemplate here is to understand why the file could not be loaded from the local storage. Was it because the file is not available at the execution site? And if so, was it supposed to be available? If not, can you force CRAB to submit the jobs to the site(s) where the file is hosted? CRAB should do that automatically if the input file is one from the input dataset specified in the CRAB configuration parameter Data.inputDataset, unless you have set Data.ignoreLocality = True, or except in cases like using a (secondary) pile-up dataset. If yours is the last case, please read Using pile-up in this same twiki.
  If you intentionally wanted (and had a good reason) to run jobs reading the input files via AAA, then yes, we have to care about why AAA failed. The first thing then is to check that the site where the input file is hosted supports access to its storage via AAA, which is true for the majority of the sites, but not all. If you want to know if your file is available through AAA you can check this link, and in particular you can do cmsenv and then use this command xrdfs cms-xrd-global.cern.ch locate /store/path/to/file. If AAA is supported, the next thing is to discard a transient problem: you can submit your jobs again and see if the error persists. Ultimately, you should write to the Computing Tools HyperNews forum (this is a forum for all kind of issues with CMS computing tools, not only CRAB). </>
<!--/twistyPlugin-->

Revision 442015-09-29 - AndresTanasijczuk

Line: 1 to 1
 
META TOPICPARENT name="SWGuideCrab"
<!-- /ActionTrackerPlugin -->
Line: 410 to 410
 
<!--/twistyPlugin twikiMakeVisibleInline-->
The following remarks apply to the main input dataset provided to CRAB via the Data.inputDataset configuration parameter:
Changed:
<
<
  • Dataset status: Datasets in DBS can have different status, this is controlled by Production and the norm is to use VALID datasets. Datasets inwith different status may occasionally be useful e.g. for comparison or dedicated study of the problems which led them to be deprecated. In order to run over a dataset whose status is not VALID, one has to set Data.allowNonValidInputDataset = True in the CRAB configuration.
  • File status: Files have a =is_file_valid_ flag in DBS, usually set to False when file is lost or corrupted. CRAB considers only valid files in the dataset. Invalid files are skipped.
  • Data location: A dataset in DBS is divided in blocks. Blocks can be migrated with PhEDEx, and PhEDEx is only service that knows about the currnet locations (host sites) of a dataset block. Therefore CRAB queries PhEDEx to retrieve the locations of the blocks in a dataset. Next a data location (aka PNN = Phedex None Name) is turned into a site where to run (aka PSN = Prosessing Site Name) using SiteDB. If a block has no valid locations in PhEDEx or no PSN associated in SiteDB), CRAB skips the block.
  • User datasets: for datasets created by users and published in DBS phys03 instance, the above is modified as follows:
    • Dataset status and File status flags are initially set to VALID by CRAB when dataset is published, then can be changed by the user.
    • Data block location is tracked as origin_site_name in DBS and data are assumed never to move. If datasets are moved, the user can update the origin_site_name. There is no way to have multiple locations.
>
>
  • Dataset status: Datasets in DBS can have different status. This is controlled by Production and the norm is to use VALID datasets. Datasets with different status may occasionally be useful, e.g. for comparison or dedicated study of the problems which led them to be deprecated. In order to run over a dataset whose status is not VALID, one has to set Data.allowNonValidInputDataset = True in the CRAB configuration.
  • File status: Files have a is_file_valid flag in DBS, usually set to False when file is lost or corrupted. CRAB considers only valid files in the dataset. Invalid files are skipped.
  • Data location: A dataset in DBS is divided in blocks. Blocks can be migrated with PhEDEx, and PhEDEx is the only service that knows about the current locations (host sites) of a dataset block. Therefore CRAB queries PhEDEx to retrieve the locations of the blocks in a dataset. Next, a data location (aka PNN = Phedex None Name) is turned into a site where to run (aka PSN = Prosessing Site Name) using SiteDB. If a block has no valid locations in PhEDEx or no PSN associated in SiteDB, CRAB skips the block.
  • User datasets: For datasets created by users and published in DBS phys03 instance, the above is modified as follows:
    • Dataset status and File status flags are initially set to VALID by CRAB when the dataset is published; then can be changed by the user.
    • Data block location is tracked as origin_site_name in DBS and data are assumed to never move. If datasets are moved, the user can update the origin_site_name. There is no way to have multiple locations.
 
<!--/twistyPlugin-->

Dealing with a growing input dataset and/or changing lumi-mask

Revision 432015-09-29 - StefanoBelforte

Line: 1 to 1
 
META TOPICPARENT name="SWGuideCrab"
<!-- /ActionTrackerPlugin -->
Line: 405 to 405
 

Running CRAB

Added:
>
>

How CRAB finds data in input datasets from DBS

<!--/twistyPlugin twikiMakeVisibleInline-->
The following remarks apply to the main input dataset provided to CRAB via the Data.inputDataset configuration parameter:

  • Dataset status: Datasets in DBS can have different status, this is controlled by Production and the norm is to use VALID datasets. Datasets inwith different status may occasionally be useful e.g. for comparison or dedicated study of the problems which led them to be deprecated. In order to run over a dataset whose status is not VALID, one has to set Data.allowNonValidInputDataset = True in the CRAB configuration.
  • File status: Files have a =is_file_valid_ flag in DBS, usually set to False when file is lost or corrupted. CRAB considers only valid files in the dataset. Invalid files are skipped.
  • Data location: A dataset in DBS is divided in blocks. Blocks can be migrated with PhEDEx, and PhEDEx is only service that knows about the currnet locations (host sites) of a dataset block. Therefore CRAB queries PhEDEx to retrieve the locations of the blocks in a dataset. Next a data location (aka PNN = Phedex None Name) is turned into a site where to run (aka PSN = Prosessing Site Name) using SiteDB. If a block has no valid locations in PhEDEx or no PSN associated in SiteDB), CRAB skips the block.
  • User datasets: for datasets created by users and published in DBS phys03 instance, the above is modified as follows:
    • Dataset status and File status flags are initially set to VALID by CRAB when dataset is published, then can be changed by the user.
    • Data block location is tracked as origin_site_name in DBS and data are assumed never to move. If datasets are moved, the user can update the origin_site_name. There is no way to have multiple locations.
<!--/twistyPlugin-->
 

Dealing with a growing input dataset and/or changing lumi-mask

<!--/twistyPlugin twikiMakeVisibleInline-->
Line: 488 to 501
 The pile-up files have be specified in the CMSSW parameter-set configuration file. There is no way yet to tell in the CRAB configuration file that one wants to use a pile-up dataset as a secondary input dataset. That means that CRAB doesn't know that the CMSSW code will want to access pile-up files; CRAB only knows about the primary input dataset (if any). This means that, assuming there is a primary input dataset, when CRAB does data discovery to figure out to which sites should it submit the jobs, it will only take into account the input dataset specified in the CRAB configuration file (in the Data.inputDataset parameter) and submit the jobs to sites where this dataset is hosted. If there is no primary input dataset, CRAB will submit the jobs to the less busy sites. In any case, if the pile-up files are not hosted in the execution sites, they will be accessed via AAA (Xrootd). But reading the "signal" events directly from the local storage and the pile-up events via AAA is more inefficient than doing the other way around, since for each "signal" event that is read one needs to read in general many (> 20) pile-up events. Therefore, it is highly recommended that the user forces CRAB to submit the jobs to the sites where the pile-up dataset is hosted by whitelisting these sites using the parameter Site.whitelist in the CRAB configuration file. Note that one also needs to set Data.ignoreLocality = True in the CRAB configuration file in case of using a primary input dataset so to avoid CRAB doing data discovery and eventually complain (and fail to submit) that the input dataset is not available in the whitelisted sites. One can use DAS to get the list of sites that host a dataset.
<!--/twistyPlugin-->
Deleted:
<
<

How CRAB treats input datasets from DBS

<!--/twistyPlugin twikiMakeVisibleInline-->
The following remarks apply to the main input dataset provided to CRAB via the Data.inputDataset configuration parameter:

  • CRAB doesn't care about the status of the dataset in DBS, but in order to run over a dataset whose status is not VALID, one has to set Data.allowNonValidInputDataset = True in the CRAB configuration.
  • CRAB considers only valid files in the dataset. Invalid files are skipped.
  • A dataset in DBS is divided in blocks. When a block is registered in DBS, an origin site name is associated for the site that hosts the block. Blocks can be migrated with PhEDEx, and it is PhEDEx the only service that knows about the updated locations (host sites) of a dataset block. Therefore CRAB queries PhEDEx when it comes to retrieve the locations of the blocks in a dataset. If a block has no valid locations in PhEDEx (for example a location that has no PSN associated in SiteDB), CRAB skips the block.
<!--/twistyPlugin-->
 

Miscellanea

Revision 422015-09-29 - AndresTanasijczuk

Line: 1 to 1
 
META TOPICPARENT name="SWGuideCrab"
<!-- /ActionTrackerPlugin -->
Line: 488 to 488
 The pile-up files have be specified in the CMSSW parameter-set configuration file. There is no way yet to tell in the CRAB configuration file that one wants to use a pile-up dataset as a secondary input dataset. That means that CRAB doesn't know that the CMSSW code will want to access pile-up files; CRAB only knows about the primary input dataset (if any). This means that, assuming there is a primary input dataset, when CRAB does data discovery to figure out to which sites should it submit the jobs, it will only take into account the input dataset specified in the CRAB configuration file (in the Data.inputDataset parameter) and submit the jobs to sites where this dataset is hosted. If there is no primary input dataset, CRAB will submit the jobs to the less busy sites. In any case, if the pile-up files are not hosted in the execution sites, they will be accessed via AAA (Xrootd). But reading the "signal" events directly from the local storage and the pile-up events via AAA is more inefficient than doing the other way around, since for each "signal" event that is read one needs to read in general many (> 20) pile-up events. Therefore, it is highly recommended that the user forces CRAB to submit the jobs to the sites where the pile-up dataset is hosted by whitelisting these sites using the parameter Site.whitelist in the CRAB configuration file. Note that one also needs to set Data.ignoreLocality = True in the CRAB configuration file in case of using a primary input dataset so to avoid CRAB doing data discovery and eventually complain (and fail to submit) that the input dataset is not available in the whitelisted sites. One can use DAS to get the list of sites that host a dataset. </>
<!--/twistyPlugin-->
Changed:
<
<

Input datasets/files from DBS

>
>

How CRAB treats input datasets from DBS

 
<!--/twistyPlugin twikiMakeVisibleInline-->
Changed:
<
<
  • CRAB doesn't care about the status of the dataset in DBS, but in order to run over a dataset in a status that is not VALID one has to set Data.allowNonValidInputDataset = True in the CRAB configuration.
  • CRAB considers only valid files. A dataset in DBS is divided in blocks. When a block is registered in DBS, an origin_site is defined as the site that hosts the block.
  • Blocks can be migrated with PhEDEx, and it is PhEDEx the only service that knows about the updated locations of a dataset block. Therefore CRAB queries PhEDEx when it comes to retrieve the locations of the blocks in a dataset. If a block has no valid locations in PhEDEx (for example a location that has no PSN associated in SiteDB), CRAB skips the entire block.
>
>
The following remarks apply to the main input dataset provided to CRAB via the Data.inputDataset configuration parameter:

  • CRAB doesn't care about the status of the dataset in DBS, but in order to run over a dataset whose status is not VALID, one has to set Data.allowNonValidInputDataset = True in the CRAB configuration.
  • CRAB considers only valid files in the dataset. Invalid files are skipped.
  • A dataset in DBS is divided in blocks. When a block is registered in DBS, an origin site name is associated for the site that hosts the block. Blocks can be migrated with PhEDEx, and it is PhEDEx the only service that knows about the updated locations (host sites) of a dataset block. Therefore CRAB queries PhEDEx when it comes to retrieve the locations of the blocks in a dataset. If a block has no valid locations in PhEDEx (for example a location that has no PSN associated in SiteDB), CRAB skips the block.
 
<!--/twistyPlugin-->

Miscellanea

Revision 412015-09-29 - AndresTanasijczuk

Line: 1 to 1
 
META TOPICPARENT name="SWGuideCrab"
<!-- /ActionTrackerPlugin -->
Line: 488 to 488
 The pile-up files have be specified in the CMSSW parameter-set configuration file. There is no way yet to tell in the CRAB configuration file that one wants to use a pile-up dataset as a secondary input dataset. That means that CRAB doesn't know that the CMSSW code will want to access pile-up files; CRAB only knows about the primary input dataset (if any). This means that, assuming there is a primary input dataset, when CRAB does data discovery to figure out to which sites should it submit the jobs, it will only take into account the input dataset specified in the CRAB configuration file (in the Data.inputDataset parameter) and submit the jobs to sites where this dataset is hosted. If there is no primary input dataset, CRAB will submit the jobs to the less busy sites. In any case, if the pile-up files are not hosted in the execution sites, they will be accessed via AAA (Xrootd). But reading the "signal" events directly from the local storage and the pile-up events via AAA is more inefficient than doing the other way around, since for each "signal" event that is read one needs to read in general many (> 20) pile-up events. Therefore, it is highly recommended that the user forces CRAB to submit the jobs to the sites where the pile-up dataset is hosted by whitelisting these sites using the parameter Site.whitelist in the CRAB configuration file. Note that one also needs to set Data.ignoreLocality = True in the CRAB configuration file in case of using a primary input dataset so to avoid CRAB doing data discovery and eventually complain (and fail to submit) that the input dataset is not available in the whitelisted sites. One can use DAS to get the list of sites that host a dataset. </>
<!--/twistyPlugin-->
Added:
>
>

Input datasets/files from DBS

<!--/twistyPlugin twikiMakeVisibleInline-->
  • CRAB doesn't care about the status of the dataset in DBS, but in order to run over a dataset in a status that is not VALID one has to set Data.allowNonValidInputDataset = True in the CRAB configuration.
  • CRAB considers only valid files. A dataset in DBS is divided in blocks. When a block is registered in DBS, an origin_site is defined as the site that hosts the block.
  • Blocks can be migrated with PhEDEx, and it is PhEDEx the only service that knows about the updated locations of a dataset block. Therefore CRAB queries PhEDEx when it comes to retrieve the locations of the blocks in a dataset. If a block has no valid locations in PhEDEx (for example a location that has no PSN associated in SiteDB), CRAB skips the entire block.
<!--/twistyPlugin-->
 

Miscellanea

How to predict how long my jobs will run for?

Revision 402015-09-26 - AndresTanasijczuk

Line: 1 to 1
 
META TOPICPARENT name="SWGuideCrab"
<!-- /ActionTrackerPlugin -->
Line: 414 to 414
 A more verbose discussion follows afer that.
Changed:
<
<
from UserUtilities import config, getLumisInDatasetFromDBS
>
>
from UserUtilities import config, getLumiListInValidFiles
 from LumiList import LumiList

config = config()

Line: 431 to 431
 # now the list of lumis that you successfully processed in Task-A # it can be done in two ways. Uncomment and edit the appropriate one: #1. (recommended) when Task-A output was a dataset published in DBS
Changed:
<
<
#taskALumis = getLumisInDatasetFromDBS(dataset=<TaskA-output-dataset-name>, dbsurl='phys03')
>
>
#taskALumis = getLumiListInValidFiles(dataset=<TaskA-output-dataset-name>, dbsurl='phys03')
 # or 2. when output from Task-A was not put in DBS #taskAlumis = LumiList(filename=LumiSummary.json file from running crab report on Task-A>
Line: 454 to 454
 In full words and with all rationals:

  1. Get the CMSSW parameter-set configuration file and the CRAB configuration file used in task A. These files can be found inside the TGZ archive file located in the inputs subdirectory of task's A CRAB project directory. After untaring/unzipping the TGZ archive file, the desired files will appear in a debug subdirectory: the files are named originalPSet.py and crabConfig.py respectively.
Changed:
<
<
  1. Get the luminosity sections already analyzed by task A. There are two simple ways of doing that: 1) get the lumiSummary.json file of task A using the crab report command; 2) use the getLumisInDatasetFromDBS function from the CRAB client to get the lumis in the dataset published by Task A.
>
>
  1. Get the luminosity sections already analyzed by task A. There are two simple ways of doing that: 1) get the lumiSummary.json file of task A using the crab report command; 2) use the getLumiListInValidFiles function from the CRAB client to get the lumis in the dataset published by Task A.
 
  1. Construct the lumi-mask to be used as input for task B. This lumi-mask should be equal to the extended lumi-mask published by Physics Validation for the input dataset minus the luminosity sections already analyzed by task A obtained in step 2. The subtraction of lumi-masks can be done beforehand using for example the compareJSON.py utility (available after CMSSW setup; type compareJSON.py --help for the utility help menu), or in the CRAB configuration file as shown in example 4 in Doing lumi-mask arithmetics.
  2. Submit task B using the same CMSSW parameter-set configuration file and CRAB configuration file as used in task A, except that change in the CRAB configuration the task name (for example, add a post-fix _v1 to the old task name) and the lumi-mask (use the JSON file constructed in step 3). If the publication dataset name is not changed, the output files of task B will be published in the same dataset as the output files of task A, which is in general what one would like to. The output files in the destination storage will appear in a different directory (because of the time-stamp in the directory path), but that should not be a problem. An example of the relevant part of the CRAB configuration file would look like this:
Changed:
<
<
from UserUtilities import config, getLumisInDatasetFromDBS
>
>
from UserUtilities import config, getLumiListInValidFiles
 from LumiList import LumiList

config = config()

Line: 470 to 470
 ... config.Data.inputDataset = <TaskA-input-dataset-name> config.Data.inputDBS = 'global'
Changed:
<
<
taskALumis = getLumisInDatasetFromDBS(dataset=<TaskA-output-dataset-name>, dbsurl='phys03')
>
>
taskALumis = getLumiListInValidFiles(dataset=<TaskA-output-dataset-name>, dbsurl='phys03')
 newLumiMask = LumiList(filename='new_lumi_mask.json') filteredLumiMask = newLumiMask - taskALumis filteredLumiMask.writeJSON('my_lumi_mask.json')

Revision 392015-09-24 - StefanoBelforte

Line: 1 to 1
 
META TOPICPARENT name="SWGuideCrab"
<!-- /ActionTrackerPlugin -->
Line: 405 to 405
 

Running CRAB

Changed:
<
<

Dealing with a growing input dataset and/or lumi-mask

>
>

Dealing with a growing input dataset and/or changing lumi-mask

 
<!--/twistyPlugin twikiMakeVisibleInline-->
Changed:
<
<
While data taking is progressing, corresponding datasets in DBS and lumi-mask files are growing. Also data quality is sometimes improved for already existing data, leading to updated lumi-masks which compared to older lumi-masks include luminosity sections that were previously filtered out. Both of these situations lead to the common case where one would like to run a task (lets call it task B) over an input dataset partially analyzed already in a previous task (lets call it task A), where task B should skip the data already analyzed in task A. In such a case, we recommend to proceed as follows:
>
>
While data taking is progressing, corresponding datasets in DBS and lumi-mask files are growing. Also data quality is sometimes improved for already existing data, leading to updated lumi-masks which compared to older lumi-masks include luminosity sections that were previously filtered out. Both of these situations lead to the common case where one would like to run a task (lets call it task B) over an input dataset partially analyzed already in a previous task (lets call it task A), where task B should skip the data already analyzed in task A.

This can be accomplished with a few lines in the CRAB configuration file, see an annotated example below. A more verbose discussion follows afer that.

from CRABClient.UserUtilities import config, getLumisInDatasetFromDBS
from WMCore.DataStructs.LumiList import LumiList

config = config()

config.General.requestName = 'TaskB'
...
 # you want to use same Pset as in previous task, in order to publish in same dataset
config.JobType.psetName = <TaskA-psetName>
...
# and of course same input dataset
config.Data.inputDataset = <TaskA-input-dataset-name>
config.Data.inputDBS = 'global'  # but this will work for a dataset in phys03 as well

# now the list of lumis that you successfully processed in Task-A
# it can be done in two ways. Uncomment and edit the appropriate one:
#1. (recommended) when Task-A output was a dataset published in DBS
#taskALumis = getLumisInDatasetFromDBS(dataset=<TaskA-output-dataset-name>, dbsurl='phys03')
# or 2. when output from Task-A was not put in DBS
#taskALumis = LumiList(filename='<the LumiSummary.json file from running crab report on Task-A>')

# now the current list of golden lumis for the data range you are interested in; it can be different from the one used in Task-A
officialLumiMask = LumiList(filename='<some-kosher-name>.json') 

# this is the main trick. Mask out also the lumis which you processed already
newLumiMask = officialLumiMask - taskALumis 

# write the new lumiMask file, now you can use it as input to CRAB
newLumiMask.writeJSON('my_lumi_mask.json')
# and there we go: process from the input dataset all the lumis listed in the current officialLumiMask file, skipping the ones you already have.
config.Data.lumiMask = 'my_lumi_mask.json' 
config.Data.publishDataName = <TaskA-output-dataset-name> #  add to your existing dataset
...

IMPORTANT NOTE : in this way you will add any lumi section in the intial data set that was turned from bad to good in the golden list after you ran Task-A, but if some of those data evolved the other way around (from good to bad), there is no way to remove those from your published datasets.

In full words and with all rationals:

 
  1. Get the CMSSW parameter-set configuration file and the CRAB configuration file used in task A. These files can be found inside the TGZ archive file located in the inputs subdirectory of task's A CRAB project directory. After untaring/unzipping the TGZ archive file, the desired files will appear in a debug subdirectory: the files are named originalPSet.py and crabConfig.py respectively.
  2. Get the luminosity sections already analyzed by task A. There are two simple ways of doing that: 1) get the lumiSummary.json file of task A using the crab report command; 2) use the getLumisInDatasetFromDBS function from the CRAB client to get the lumis in the dataset published by Task A.

Revision 382015-09-21 - AndresTanasijczuk

Line: 1 to 1
 
META TOPICPARENT name="SWGuideCrab"
<!-- /ActionTrackerPlugin -->
Line: 411 to 411
 While data taking is progressing, corresponding datasets in DBS and lumi-mask files are growing. Also data quality is sometimes improved for already existing data, leading to updated lumi-masks which compared to older lumi-masks include luminosity sections that were previously filtered out. Both of these situations lead to the common case where one would like to run a task (lets call it task B) over an input dataset partially analyzed already in a previous task (lets call it task A), where task B should skip the data already analyzed in task A. In such a case, we recommend to proceed as follows:

  1. Get the CMSSW parameter-set configuration file and the CRAB configuration file used in task A. These files can be found inside the TGZ archive file located in the inputs subdirectory of task's A CRAB project directory. After untaring/unzipping the TGZ archive file, the desired files will appear in a debug subdirectory: the files are named originalPSet.py and crabConfig.py respectively.
Changed:
<
<
  1. Obtain the lumiSummary.json file of task A using the crab report command. This JSON file contains the luminosity sections already analyzed by task A.
  2. Construct the lumi-mask JSON file to be used as input for task B. This lumi-mask should be equal to the extended lumi-mask published by Physics Validation for the input dataset minus the luminosity sections already analyzed by task A obtained in step 2. The subtraction of lumi-mask JSON files can be done beforehand using for example the compareJSON.py utility (available after CMSSW setup; type compareJSON.py --help for the utility help menu), or in the CRAB configuration file as shown in example 4 in Doing lumi-mask arithmetics.
  3. Submit task B using the same CMSSW parameter-set configuration file and CRAB configuration file as used in task A, except that change in the CRAB configuration the task name (for example, add a post-fix _v1 to the old task name) and the lumi-mask (use the JSON file constructed in step 3). If the publication dataset name is not changed, the output files of task B will be published in the same dataset as the output files of task A, which is in general what one would like to. The output files in the destination storage will appear in a different directory (because of the time-stamp in the directory path), but that should not be a problem.
>
>
  1. Get the luminosity sections already analyzed by task A. There are two simple ways of doing that: 1) get the lumiSummary.json file of task A using the crab report command; 2) use the getLumisInDatasetFromDBS function from the CRAB client to get the lumis in the dataset published by Task A.
  2. Construct the lumi-mask to be used as input for task B. This lumi-mask should be equal to the extended lumi-mask published by Physics Validation for the input dataset minus the luminosity sections already analyzed by task A obtained in step 1. The subtraction of lumi-masks can be done beforehand using for example the compareJSON.py utility (available after CMSSW setup; type compareJSON.py --help for the utility help menu), or in the CRAB configuration file as shown in example 4 in Doing lumi-mask arithmetics.
  3. Submit task B using the same CMSSW parameter-set configuration file and CRAB configuration file as used in task A, except that in the CRAB configuration one should change the task name (for example, add a postfix _v1 to the old task name) and the lumi-mask (use the JSON file constructed in step 2). If the publication dataset name is not changed, the output files of task B will be published in the same dataset as the output files of task A, which is in general what one would like to do. The output files in the destination storage will appear in a different directory (because of the time-stamp in the directory path), but that should not be a problem. An example of the relevant part of the CRAB configuration file would look like this:

from CRABClient.UserUtilities import config, getLumisInDatasetFromDBS
from WMCore.DataStructs.LumiList import LumiList

config = config()

config.General.requestName = 'TaskB'
...
config.JobType.psetName = <TaskA-psetName>
...
config.Data.inputDataset = <TaskA-input-dataset-name>
config.Data.inputDBS = 'global'
taskALumis = getLumisInDatasetFromDBS(dataset=<TaskA-output-dataset-name>, dbsurl='phys03')
newLumiMask = LumiList(filename='new_lumi_mask.json')
filteredLumiMask = newLumiMask - taskALumis
filteredLumiMask.writeJSON('my_lumi_mask.json')
config.Data.lumiMask = 'my_lumi_mask.json'
config.Data.publishDataName = <TaskA-output-dataset-name>
...
  The input dataset may eventually grow again and one may need to run a task over the extended input dataset with an extended lumi-mask. One can then repeat the process described above, except that in step 2 one has to obtain the lumiSummary.json files from all the previous tasks, and in step 3 one has to construct the lumi-mask to be used as input for the new task as the extended lumi-mask minus the union of luminosity sections already analyzed by the previous tasks (i.e. the union of the corresponding lumiSummary.json files).
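
As a minimal sketch of this bookkeeping (the JSON file names below are placeholders), the union and the subtraction can be done with LumiList, in the same way as in the examples of Doing lumi-mask arithmetics:

from WMCore.DataStructs.LumiList import LumiList

# union of the lumis already analyzed by all previous tasks (file names are placeholders)
processedLumis = LumiList(filename='lumiSummary_taskA.json') | LumiList(filename='lumiSummary_taskB.json')
# subtract them from the extended official lumi-mask
newLumiMask = LumiList(filename='extended_official_lumi_mask.json') - processedLumis
newLumiMask.writeJSON('my_new_lumi_mask.json')

config.Data.lumiMask = 'my_new_lumi_mask.json'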
<!--/twistyPlugin-->

Revision 372015-09-03 - AndresTanasijczuk

Line: 1 to 1
 
META TOPICPARENT name="SWGuideCrab"
<!-- /ActionTrackerPlugin -->
Line: 16 to 16
 

CRAB3 Frequently Asked Questions

Changed:
<
<
Complete: 1 Go to SWGuideCrab
>
>
Complete: 3 Go to SWGuideCrab
 
Contents:
Line: 60 to 60
 

CRAB cache

Added:
>
>

User quota in the CRAB cache

<!--/twistyPlugin twikiMakeVisibleInline-->
The CRAB User File Cache, or CRAB cache for short, is the place where:
  • the CRAB client puts the user input sandboxes when submitting a task to the CRAB server;
  • the CRAB client puts the crab.log files when they are uploaded via crab uploadlog;
  • the CRAB server puts the archives produced by the dry run submissions;
  • etc.
Each user has a quota of 4.88GB in the CRAB cache. If this limit is reached, new submissions will fail. Files in the CRAB cache are automatically deleted after 5 days. If despite that a user still reaches the quota limit, he/she can free some space manually. See how in this FAQ.
<!--/twistyPlugin-->
 

What is the maximum allowed size of the user input sandbox?

<!--/twistyPlugin twikiMakeVisibleInline-->
Line: 78 to 89
 
  • The tweaked CMSSW parameter-set configuration file in pickle format (added as PSet.pkl) plus a simple PSet.py file to load the pickle file.
<!--/twistyPlugin-->
Changed:
<
<

How can I clean my user cache area in the CRAB server cache?

>
>

How can I clean my area in the CRAB cache?

 
<!--/twistyPlugin twikiMakeVisibleInline-->
Changed:
<
<
Use the crab purge command.
>
>
One can use the crab purge command to delete from the CRAB cache files associated to a given task. Actually, crab purge deletes only user input sandboxes (because there is no API to delete other files), but since they are supposed to be the main space consumers in the CRAB cache, this should be enough. If for some reason the crab purge command does not work, one can alternatively use the REST interface of the crabcache component. Instructions oriented for CRAB3 operators can be found here. Jordan Tucker has written the following script based on these instructions that removes all the input sandboxes from the user CRAB cache area (a valid proxy and the CRAB environment are required):

Show Hide script
<!--/twistyPlugin twikiMakeVisibleInline-->
#!/usr/bin/env python

import json
import os
import pycurl
from cStringIO import StringIO
from pprint import pprint
from CRABClient.UserUtilities import getUsernameFromSiteDB

class Crab3ToolsException(Exception):
    pass

class UserCacheHelper:
    def __init__(self, proxy=None, user=None):
        if proxy is None:
            proxy = os.getenv('X509_USER_PROXY')
        if not proxy or not os.path.isfile(proxy):
            raise Crab3ToolsException('X509_USER_PROXY is %r, get grid proxy first' % proxy)
        self.proxy = proxy

        if user is None:
            user = getUsernameFromSiteDB()
        if not user:
            raise Crab3ToolsException('could not get username from sitedb, returned %r' % user)
        self.user = user

    def _curl(self, url):
        buf = StringIO()
        c = pycurl.Curl()
        c.setopt(pycurl.URL, str(url))
        c.setopt(pycurl.WRITEFUNCTION, buf.write)
        c.setopt(pycurl.SSL_VERIFYPEER, False)
        c.setopt(pycurl.SSLKEY, self.proxy)
        c.setopt(pycurl.SSLCERT, self.proxy)
        c.perform()
        j = buf.getvalue().replace('\n','')
        try:
            return json.loads(j)['result']
        except ValueError:
            raise Crab3ToolsException('json decoding problem: %r' % j)

    def _only(self, l):
        if len(l) != 1:
            raise Crab3ToolsException('return value was supposed to have one element, but: %r' % l)
        return l[0]

    def listusers(self):
        return self._curl('https://cmsweb.cern.ch/crabcache/info?subresource=listusers')

    def userinfo(self):
        return self._only(self._curl('https://cmsweb.cern.ch/crabcache/info?subresource=userinfo&username=' + self.user))

    def quota(self):
        return self._only(self.userinfo()['used_space'])

    def filelist(self):
        return self.userinfo()['file_list']

    def fileinfo(self, hashkey):
        return self._only(self._curl('https://cmsweb.cern.ch/crabcache/info?subresource=fileinfo&hashkey=' + hashkey))

    def fileinfos(self):
        return [self.fileinfo(x) for x in self.filelist() if '.log' not in x] # why doesn't it work for e.g. '150630_200330:tucker_crab_repubmerge_tau0300um_M0400_TaskWorker.log' (even after quoting the :)?

    def fileremove(self, hashkey):
        x = self._only(self._curl('https://cmsweb.cern.ch/crabcache/info?subresource=fileremove&hashkey=' + hashkey))
        if x:
            raise Crab3ToolsException('fileremove failed: %r' % x)

if __name__ == '__main__':
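    # delete every file in the user's CRAB cache area, keeping only the log files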
    h = UserCacheHelper()
    for x in h.filelist():
        if '.log' in x:
            continue
        print 'remove', x
        h.fileremove(x)
<!--/twistyPlugin-->
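
For example, a minimal interactive sketch of using the helper above just to inspect the cache before removing anything (assuming the script has been saved as crabcache_helper.py, a hypothetical file name):

from crabcache_helper import UserCacheHelper  # hypothetical module name for the script above

h = UserCacheHelper()
print h.quota()       # space currently used in the CRAB cache
for hashkey in h.filelist():
    print hashkey     # a hashkey can be passed to h.fileremove() to delete that file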

Note: Once a task has been submitted, one can safely delete the input sandbox from the CRAB cache, as the sandbox is transferred to the worker nodes from the schedulers.

 
<!--/twistyPlugin-->

Stageout and publication

Line: 229 to 323
 If the --jobids option is omitted, the resubmission should work. In this case CRAB will resubmit all terminally failed jobs. </>
<!--/twistyPlugin-->
Deleted:
<
<

Quotas

User quota in the CRAB scheduler machines

<!--/twistyPlugin twikiMakeVisibleInline-->
Each user has a home directory with 100GB of disk space in each of the scheduler machines (schedd for short) assigned to CRAB3 for submitting jobs to the Grid. In this space, all the log files for the user's tasks are saved (except for the cmsRun log files, which are saved in the storage site). As a guidance, a task with 100 jobs uses on average 50MB in log files (this number depends a lot on the number of resubmissions, since each resubmission produces its log files). If a user reaches his/her quota in a given schedd, he/she will not be able to submit more jobs via that schedd (he/she may still be able to submit via other schedd, but since the user can not choose the schedd to which to submit -the choice is done by the CRAB server-, he/she would have to keep trying the submission until the task goes to a schedd with non-exahusted quota). To avoid that, log files are removed automatically after 30 days of their last modification. If a user reaches 50% of its quota in a given schedd, an automatic e-mail similar to the one shown below is sent to him/her.

Subject: WARNING: Reaching your quota

Dear analysis user <username>,

You are using <X>% of your disk quota on the server <schedd-name>. The moment you reach the disk quota of <Y>GB, you will be unable to
run jobs and will experience problems recovering outputs. In order to avoid that, you have to clean up your directory at the server. 
Here are the instructions to do so:
 https://twiki.cern.ch/twiki/bin/view/CMSPublic/SWGuideCrabFaq#How_to_clean_up_your_directory_i
Here it is a more detailed description of the issue:
 https://twiki.cern.ch/twiki/bin/view/CMSPublic/SWGuideCrabFaq#Disk_space_for_output_files
If you have any questions, please contact hn-cms-computing-tools(AT)cern.ch
 Regards,
CRAB support

This e-mail has a link (https://twiki.cern.ch/twiki/bin/view/CMSPublic/SWGuideCrabFaq#How_to_clean_up_your_directory_i) to the instructions on how to clean up space in the user's home directory in a schedd. A user can follow the instructions in that page, or alternatively use the crab purge command.

<!--/twistyPlugin-->
 

Problems with job execution

Exit code 8028

Line: 512 to 580
  </>
<!--/twistyPlugin-->
Added:
>
>

User quota in the CRAB scheduler machines

<!--/twistyPlugin twikiMakeVisibleInline-->
Each user has a home directory with 100GB of disk space in each of the scheduler machines (schedd for short) assigned to CRAB3 for submitting jobs to the Grid. Whenever a task is submitted by the CRAB server to a schedd, a task directory is created in this space containing among other things CRAB libraries and scripts needed to run the jobs. Log files from Condor/DAGMan and CRAB itself are also placed there. (What is not available in the schedds are the cmsRun log files, except for the snippet available in the CRAB job log file.) As guidance, a task with 100 jobs uses on average 50MB of space, but this number depends a lot on the number of resubmissions, since each resubmission produces its own log files. If a user reaches his/her quota in a given schedd, he/she will not be able to submit more jobs via that schedd (he/she may still be able to submit via another schedd, but since the user cannot choose the schedd to which to submit (the choice is made by the CRAB server), he/she would have to keep trying the submission until the task goes to a schedd with non-exhausted quota). To avoid that, task directories are automatically removed from the schedds 30 days after their last modification. If a user reaches 50% of his/her quota in a given schedd, an automatic e-mail similar to the one shown below is sent to him/her.

Subject: WARNING: Reaching your quota

Dear analysis user <username>,

You are using <X>% of your disk quota on the server <schedd-name>. The moment you reach the disk quota of <Y>GB, you will be unable to
run jobs and will experience problems recovering outputs. In order to avoid that, you have to clean up your directory at the server. 
Here are the instructions to do so:
 https://twiki.cern.ch/twiki/bin/view/CMSPublic/SWGuideCrabFaq#How_to_clean_up_your_directory_i
Here it is a more detailed description of the issue:
 https://twiki.cern.ch/twiki/bin/view/CMSPublic/SWGuideCrabFaq#Disk_space_for_output_files
If you have any questions, please contact hn-cms-computing-tools(AT)cern.ch
 Regards,
CRAB support

This e-mail has a link (https://twiki.cern.ch/twiki/bin/view/CMSPublic/SWGuideCrabFaq#How_to_clean_up_your_directory_i) to the instructions on how to clean up space in the user's home directory in a schedd. A user can follow the instructions in that page, or alternatively use the crab purge command.

<!--/twistyPlugin-->
 

-- AndresTanasijczuk - 23 Oct 2014

Revision 362015-09-03 - AndresTanasijczuk

Line: 1 to 1
 
META TOPICPARENT name="SWGuideCrab"
<!-- /ActionTrackerPlugin -->
Line: 84 to 84
 Use the crab purge command. </>
<!--/twistyPlugin-->
Changed:
<
<

Stage out with CRAB

>
>

Stageout and publication

  Documentation about (input/output) data handling in CRAB: Crab3DataHandling.
Line: 145 to 145
  </>
<!--/twistyPlugin-->
Deleted:
<
<

Publication with CRAB

 

Can I delete a dataset I published in DBS?

<!--/twistyPlugin twikiMakeVisibleInline-->
Line: 337 to 335
 (Of course, from multiprocessing import Process needs to be executed only once, so put it outside any loop.)
<!--/twistyPlugin-->
Changed:
<
<

Miscellanea

>
>

Running CRAB

 
Changed:
<
<

How to predict how long my jobs will run for?

>
>

Dealing with a growing input dataset and/or lumi-mask

 
<!--/twistyPlugin twikiMakeVisibleInline-->
Changed:
<
<
Use the --dryrun option when doing crab submit. See crab submit --dryrun.
>
>
While data taking is progressing, corresponding datasets in DBS and lumi-mask files are growing. Also data quality is sometimes improved for already existing data, leading to updated lumi-masks which compared to older lumi-masks include luminosity sections that were previously filtered out. Both of these situations lead to the common case where one would like to run a task (lets call it task B) over an input dataset partially analyzed already in a previous task (lets call it task A), where task B should skip the data already analyzed in task A. In such a case, we recommend to proceed as follows:

  1. Get the CMSSW parameter-set configuration file and the CRAB configuration file used in task A. These files can be found inside the TGZ archive file located in the inputs subdirectory of task's A CRAB project directory. After untaring/unzipping the TGZ archive file, the desired files will appear in a debug subdirectory: the files are named originalPSet.py and crabConfig.py respectively.
  2. Obtain the lumiSummary.json file of task A using the crab report command. This JSON file contains the luminosity sections already analyzed by task A.
  3. Construct the lumi-mask JSON file to be used as input for task B. This lumi-mask should be equal to the extended lumi-mask published by Physics Validation for the input dataset minus the luminosity sections already analyzed by task A obtained in step 2. The subtraction of lumi-mask JSON files can be done beforehand using for example the compareJSON.py utility (available after CMSSW setup; type compareJSON.py --help for the utility help menu), or in the CRAB configuration file as shown in example 4 in Doing lumi-mask arithmetics.
  4. Submit task B using the same CMSSW parameter-set configuration file and CRAB configuration file as used in task A, except that change in the CRAB configuration the task name (for example, add a post-fix _v1 to the old task name) and the lumi-mask (use the JSON file constructed in step 3). If the publication dataset name is not changed, the output files of task B will be published in the same dataset as the output files of task A, which is in general what one would like to. The output files in the destination storage will appear in a different directory (because of the time-stamp in the directory path), but that should not be a problem.

The input dataset may eventually grow again and one may need to run a task over the extended input dataset with an extended lumi-mask. One can then repeat the process described above, except that in step 2 one has to obtain the lumiSummary.json files from all the previous tasks, and in step 3 one has to construct the lumi-mask to be used as input for the new task as the extended lumi-mask minus the union of luminosity sections already analyzed by the previous tasks (i.e. the union of the corresponding lumiSummary.json files).

 
<!--/twistyPlugin-->

Using pile-up

Line: 351 to 356
 The pile-up files have to be specified in the CMSSW parameter-set configuration file. There is no way yet to tell in the CRAB configuration file that one wants to use a pile-up dataset as a secondary input dataset. That means that CRAB doesn't know that the CMSSW code will want to access pile-up files; CRAB only knows about the primary input dataset (if any). This means that, assuming there is a primary input dataset, when CRAB does data discovery to figure out to which sites it should submit the jobs, it will only take into account the input dataset specified in the CRAB configuration file (in the Data.inputDataset parameter) and submit the jobs to sites where this dataset is hosted. If there is no primary input dataset, CRAB will submit the jobs to the least busy sites. In any case, if the pile-up files are not hosted at the execution sites, they will be accessed via AAA (Xrootd). But reading the "signal" events directly from the local storage and the pile-up events via AAA is less efficient than doing it the other way around, since for each "signal" event that is read one needs to read in general many (> 20) pile-up events. Therefore, it is highly recommended that the user forces CRAB to submit the jobs to the sites where the pile-up dataset is hosted, by whitelisting these sites using the parameter Site.whitelist in the CRAB configuration file. Note that one also needs to set Data.ignoreLocality = True in the CRAB configuration file when using a primary input dataset, so as to prevent CRAB from doing data discovery and eventually complaining (and failing to submit) that the input dataset is not available in the whitelisted sites. One can use DAS to get the list of sites that host a dataset. </>
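
For example, a minimal sketch of the relevant CRAB configuration lines (the site names are placeholders for the sites that actually host the pile-up dataset):

config.Data.ignoreLocality = True                       # needed when a primary input dataset is used
config.Site.whitelist = ['T2_XX_SiteA', 'T2_XX_SiteB']  # placeholders: the sites hosting the pile-up dataset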
<!--/twistyPlugin-->
Added:
>
>

Miscellanea

How to predict how long my jobs will run for?

<!--/twistyPlugin twikiMakeVisibleInline-->
Use the --dryrun option when doing crab submit. See crab submit --dryrun.
<!--/twistyPlugin-->
 

How to list/copy/remove files/directories in a storage element area?

<!--/twistyPlugin twikiMakeVisibleInline-->
Line: 442 to 455
 
<!--/twistyPlugin-->
Added:
>
>

Doing lumi-mask arithmetics

<!--/twistyPlugin twikiMakeVisibleInline-->
There is a tool written in Python called LumiList.py (available in the WMCore library; it is the same code as cmssw/FWCore/PythonUtilities/python/LumiList.py) that can be used to do lumi-mask arithmetics. The arithmetics can even be done inside the CRAB configuration file (that's the advantage of having the configuration file written in Python). Below are some examples.

Example 1: A run range selection can be achieved by selecting from the original lumi-mask file the run range of interest.

from WMCore.DataStructs.LumiList import LumiList

lumiList = LumiList(filename='my_original_lumi_mask.json')
lumiList.selectRuns([x for x in range(193093,193999+1)])
lumiList.writeJSON('my_lumi_mask.json')

config.Data.lumiMask = 'my_lumi_mask.json'

Example 2: Use a new lumi-mask file that is the intersection of two other lumi-mask files.

from WMCore.DataStructs.LumiList import LumiList

originalLumiList1 = LumiList(filename='my_original_lumi_mask_1.json')
originalLumiList2 = LumiList(filename='my_original_lumi_mask_2.json')
newLumiList = originalLumiList1 & originalLumiList2
newLumiList.writeJSON('my_lumi_mask.json')

config.Data.lumiMask = 'my_lumi_mask.json'

Example 3: Use a new lumi-mask file that is the union of two other lumi-mask files.

from WMCore.DataStructs.LumiList import LumiList

originalLumiList1 = LumiList(filename='my_original_lumi_mask_1.json')
originalLumiList2 = LumiList(filename='my_original_lumi_mask_2.json')
newLumiList = originalLumiList1 | originalLumiList2
newLumiList.writeJSON('my_lumi_mask.json')

config.Data.lumiMask = 'my_lumi_mask.json'

Example 4: Use a new lumi-mask file that is the subtraction of two other lumi-mask files.

from WMCore.DataStructs.LumiList import LumiList

originalLumiList1 = LumiList(filename='my_original_lumi_mask_1.json')
originalLumiList2 = LumiList(filename='my_original_lumi_mask_2.json')
newLumiList = originalLumiList1 - originalLumiList2
newLumiList.writeJSON('my_lumi_mask.json')

config.Data.lumiMask = 'my_lumi_mask.json'
<!--/twistyPlugin-->
 

-- AndresTanasijczuk - 23 Oct 2014 \ No newline at end of file

Revision 352015-09-02 - AndresTanasijczuk

Line: 1 to 1
 
META TOPICPARENT name="SWGuideCrab"
<!-- /ActionTrackerPlugin -->
Line: 147 to 147
 

Publication with CRAB

Changed:
<
<

How to invalidate (parts of) a dataset published in DBS?

>
>

Can I delete a dataset I published in DBS?

 
<!--/twistyPlugin twikiMakeVisibleInline-->
Changed:
<
<
Datasets published in DBS can not be deleted by users. Instead, what a user can do is to change the state of the dataset or of some files in the dataset, for example from VALID to INVALID or vice versa. CRAB3 does not provide (yet) a command for doing this. The user can use the DBS script DBS3SetDatasetStatus.py (see https://twiki.cern.ch/twiki/bin/view/CMSPublic/SWGuideCrabForPublication#Invalidate_a_dataset_in_DBS) which is available by sourcing the CRAB2 environment (in a new fresh shell: source the UI, cd to a CMSSW area src directory, do cmsenv and source /afs/cern.ch/cms/ccs/wm/scripts/Crab/crab.(c)sh).
>
>
Users do not have permissions to delete a dataset or a file from DBS. Instead, what users can do is to change the status of the dataset or of individual files in the dataset. For more details see Changing a dataset or file status in DBS.
 
<!--/twistyPlugin-->

Jobs status

Revision 342015-08-26 - AndresTanasijczuk

Line: 1 to 1
 
META TOPICPARENT name="SWGuideCrab"
<!-- /ActionTrackerPlugin -->
Line: 354 to 354
 

How to list/copy/remove files/directories in a storage element area?

<!--/twistyPlugin twikiMakeVisibleInline-->
Changed:
<
<
You can use the gfal-* commands from a machine that has GFAL2 utility tools installed (e.g. lxplus). You have to pass Physical File Names (PFNs)as arguments to the commands. To get the Physical File Name given a Logical File Name and a CMS node name, you can use the lfn2pfn PhEDEx API LFNs are names like /store/user/mario/myoutput , note that a directory is also a file name.
>
>
You can use the gfal-* commands from a machine that has GFAL2 utility tools installed (e.g. lxplus). You have to pass Physical File Names (PFNs) as arguments to the commands. To get the Physical File Name given a Logical File Name and a CMS node name, you can use the lfn2pfn PhEDEx API. LFNs are names like /store/user/mario/myoutput; note that a directory is also a file name.

For example, for the LFN /store/user/username/myfile.root stored in T2_IT_Pisa you can do the following (make sure you did cmsenv before, so as to use a recent version of curl), where you can replace the first two lines with the values relevant to you and simply copy/paste the long curl command:

 
Deleted:
<
<
For example, for the LFN /store/user/username/myfile.root stored in T2_IT_Pisa you can do the following (make sure you did cmsenv before to user a new version of curl). Where you can replace the first two lines with the values which are useful to you and simpy copy/paste the long curl command.
 
site=T2_IT_Pisa
lfn=/store/user/username/myfile.root
curl -ks "https://cmsweb.cern.ch/phedex/datasvc/perl/prod/lfn2pfn?node=${site}&lfn=${lfn}&protocol=srmv2" | grep PFN | cut -d "'" -f4
Changed:
<
<
which returns
>
>
which returns:
 
srm://stormfe1.pi.infn.it:8444/srm/managerv2?SFN=/cms/store/user/username/myfile.root

Revision 332015-08-21 - AndresTanasijczuk

Line: 1 to 1
 
META TOPICPARENT name="SWGuideCrab"
<!-- /ActionTrackerPlugin -->
Line: 88 to 88
  Documentation about (input/output) data handling in CRAB: Crab3DataHandling.
Added:
>
>

What are the allowed stageout LFN paths with CRAB?

<!--/twistyPlugin twikiMakeVisibleInline-->
CRAB allows only the following LFN directory paths for stageout:

  • /store/user/<username>[/<subdirs>] where username is the CERN primary account username;
  • /store/group/<groupname>[/<subdirs>] where groupname can be any already existing directory under /store/group/.

If not publishing, /store/local/<dir>[/<subdirs>] is also allowed.

These are all the allowed paths that can be set in the CRAB configuration parameter Data.outLFNDirBase. If any other path is given, the submission of the task will fail.
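
For example, a minimal sketch (where <username> and <groupname> are placeholders for the CERN primary account username and an already existing group directory):

config.Data.outLFNDirBase = '/store/user/<username>/my_analysis'
# or, for a group area:
# config.Data.outLFNDirBase = '/store/group/<groupname>/my_analysis'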

<!--/twistyPlugin-->
 

Can I stage out my files into a /store/user/ area that uses a different username than the one of my CERN primary account?

<!--/twistyPlugin twikiMakeVisibleInline-->
Line: 414 to 427
 Note: the <absolute-path-to-local-destination-directory> starts with / therefore there are three consecutive / characters like file:///tmp/somefilename.root
<!--/twistyPlugin-->
Added:
>
>

Why are my jobs submitted to a site that I had explicitly blacklisted?

<!--/twistyPlugin twikiMakeVisibleInline-->
There is a site overflow mechanism in place for T[1,2]_US sites, which takes place after CRAB. That means that even if CRAB would submit the jobs to a given US site A, the site overflow allows the jobs to run on another T2_US site B if site B has many more free slots than site A. Of course this may change the way the input dataset is read, i.e. it will be accessed via AAA if site B does not host the input dataset.

The site overflow can be turned off via the Debug.extraJDL CRAB configuration parameter:

config.Debug.extraJDL = ['+CMS_ALLOW_OVERFLOW=False']
<!--/twistyPlugin-->
 

-- AndresTanasijczuk - 23 Oct 2014

Revision 322015-08-08 - StefanoBelforte

Line: 1 to 1
 
META TOPICPARENT name="SWGuideCrab"
<!-- /ActionTrackerPlugin -->
Line: 341 to 341
 

How to list/copy/remove files/directories in a storage element area?

<!--/twistyPlugin twikiMakeVisibleInline-->
Changed:
<
<
You can use the gfal-* commands from a machine that has GFAL2 utility tools installed (e.g. lxplus). You have to pass physical path names as arguments to the commands. To get the physical path name given a logical path name and a CMS node name, you can use the lfn2pfn PhEDEx API passing the following query data: {'protocol': 'srmv2', 'node': '<CMS-node-name>', 'lfn': '<logical-path-name>'}. For example, for the LFN /store/user/username/myfile.root stored in T2_IT_Pisa, the URL to query the API would be https://cmsweb.cern.ch/phedex/datasvc/xml/prod/lfn2pfn?protocol=srmv2&node=T2_IT_Pisa&lfn=/store/user/username/myfile.root, which would show an output in xml format as this:
>
>
You can use the gfal-* commands from a machine that has GFAL2 utility tools installed (e.g. lxplus). You have to pass Physical File Names (PFNs)as arguments to the commands. To get the Physical File Name given a Logical File Name and a CMS node name, you can use the lfn2pfn PhEDEx API LFNs are names like /store/user/mario/myoutput , note that a directory is also a file name.

For example, for the LFN /store/user/username/myfile.root stored in T2_IT_Pisa you can do the following (make sure you did cmsenv before to user a new version of curl). Where you can replace the first two lines with the values which are useful to you and simpy copy/paste the long curl command.

site=T2_IT_Pisa
lfn=/store/user/username/myfile.root
curl -ks "https://cmsweb.cern.ch/phedex/datasvc/perl/prod/lfn2pfn?node=${site}&lfn=${lfn}&protocol=srmv2" | grep PFN | cut -d "'" -f4
which returns
srm://stormfe1.pi.infn.it:8444/srm/managerv2?SFN=/cms/store/user/username/myfile.root

<!--
To see full details, you call the PhEDEx API passing the following query data: {'protocol': 'srmv2', 'node': '<CMS-node-name>', 'lfn': '<logical-path-name>'}. For example, for the LFN /store/user/username/myfile.root stored in T2_IT_Pisa, the URL to query the API would be https://cmsweb.cern.ch/phedex/datasvc/xml/prod/lfn2pfn?protocol=srmv2&node=T2_IT_Pisa&lfn=/store/user/username/myfile.root, which would show an output in xml format as this:
 
- <phedex request_timestamp="1438345813.61989" instance="prod" request_url="http://cmsweb.cern.ch:7001/phedex/datasvc/xml/prod/lfn2pfn"
Line: 351 to 366
 
Changed:
<
<
From the above output the important information to copy is the pfn field. This is the physical path name.
>
>
From the above output the important information to copy is the pfn field. This is the Physical File Name that you can use as physical-path-name in the following examples
-->
  Before executing the gfal commands, make sure to have a valid proxy:
Line: 370 to 386
 Your proxy is valid until
Changed:
<
<
The gfal commands and their usage syntax for listing/removing/copying files/directories are (it is recommended to unset the environment when executing gfal commands, i.e. to add env -i in front of the commands):
>
>
The most useful gfal commands and their usage syntax for listing/removing/copying files/directories are in the examples below (it is recommended to unset the environment when executing gfal commands, i.e. to add env -i in front of the commands). See also the man entry for each command (man gfal-ls etc.):
  List a (remote) path:

Line: 395 to 411
 
env -i X509_USER_PROXY=/tmp/x509up_u$UID gfal-copy <physical-path-name-to-source-file> file://<absolute-path-to-local-destination-directory>
Added:
>
>
Note: the <absolute-path-to-local-destination-directory> starts with / therefore there are three consecutive / characters like file:///tmp/somefilename.root
 
<!--/twistyPlugin-->

Revision 312015-08-08 - StefanoBelforte

Line: 1 to 1
 
META TOPICPARENT name="SWGuideCrab"
<!-- /ActionTrackerPlugin -->
Line: 384 to 384
 env -i X509_USER_PROXY=/tmp/x509up_u$UID gfal-rm <physical-path-name-to-file>
Changed:
<
<
Remove a (remote) directory:
>
>
Recursively remove a (remote) directory and all files in it:
 
env -i X509_USER_PROXY=/tmp/x509up_u$UID gfal-rm -r <physical-path-name-to-directory>

Revision 302015-08-04 - AndresTanasijczuk

Line: 1 to 1
 
META TOPICPARENT name="SWGuideCrab"
<!-- /ActionTrackerPlugin -->
Line: 353 to 353
  From the above output the important information to copy is the pfn field. This is the physical path name.
Changed:
<
<
Before executing the gfal commands, make sure to have a valid proxy and have it exported in the environment variable X509_USER_PROXY:
>
>
Before executing the gfal commands, make sure to have a valid proxy:
 
voms-proxy-init -voms cms
Line: 370 to 370
 Your proxy is valid until
Deleted:
<
<
export X509_USER_PROXY=/tmp/x509up_u$UID
# or
#export X509_USER_PROXY=/tmp/x509up_u`id -u`
# or
#export X509_USER_PROXY=`voms-proxy-info --path`
 The gfal commands and their usage syntax for listing/removing/copying files/directories are (it is recommended to unset the environment when executing gfal commands, i.e. to add env -i in front of the commands):

List a (remote) path:

Changed:
<
<
env -i gfal-ls <physical-path-name-to-directory>
>
>
env -i X509_USER_PROXY=/tmp/x509up_u$UID gfal-ls <physical-path-name-to-directory>
 

Remove a (remote) file:

Changed:
<
<
env -i gfal-rm <physical-path-name-to-file>
>
>
env -i X509_USER_PROXY=/tmp/x509up_u$UID gfal-rm <physical-path-name-to-file>
 

Remove a (remote) directory:

Changed:
<
<
env -i gfal-rm -r <physical-path-name-to-directory>
>
>
env -i X509_USER_PROXY=/tmp/x509up_u$UID gfal-rm -r <physical-path-name-to-directory>
 

Copy a (remote) file to a directory in the local machine:

Changed:
<
<
env -i gfal-copy <physical-path-name-to-source-file> file://<absolute-path-to-local-destination-directory>
>
>
env -i X509_USER_PROXY=/tmp/x509up_u$UID gfal-copy <physical-path-name-to-source-file> file://<absolute-path-to-local-destination-directory>
 
<!--/twistyPlugin-->

Revision 292015-07-31 - AndresTanasijczuk

Line: 1 to 1
 
META TOPICPARENT name="SWGuideCrab"
<!-- /ActionTrackerPlugin -->
Line: 338 to 338
 The pile-up files have be specified in the CMSSW parameter-set configuration file. There is no way yet to tell in the CRAB configuration file that one wants to use a pile-up dataset as a secondary input dataset. That means that CRAB doesn't know that the CMSSW code will want to access pile-up files; CRAB only knows about the primary input dataset (if any). This means that, assuming there is a primary input dataset, when CRAB does data discovery to figure out to which sites should it submit the jobs, it will only take into account the input dataset specified in the CRAB configuration file (in the Data.inputDataset parameter) and submit the jobs to sites where this dataset is hosted. If there is no primary input dataset, CRAB will submit the jobs to the less busy sites. In any case, if the pile-up files are not hosted in the execution sites, they will be accessed via AAA (Xrootd). But reading the "signal" events directly from the local storage and the pile-up events via AAA is more inefficient than doing the other way around, since for each "signal" event that is read one needs to read in general many (> 20) pile-up events. Therefore, it is highly recommended that the user forces CRAB to submit the jobs to the sites where the pile-up dataset is hosted by whitelisting these sites using the parameter Site.whitelist in the CRAB configuration file. Note that one also needs to set Data.ignoreLocality = True in the CRAB configuration file in case of using a primary input dataset so to avoid CRAB doing data discovery and eventually complain (and fail to submit) that the input dataset is not available in the whitelisted sites. One can use DAS to get the list of sites that host a dataset. </>
<!--/twistyPlugin-->
Added:
>
>

How to list/copy/remove files/directories in a storage element area?

<!--/twistyPlugin twikiMakeVisibleInline-->
You can use the gfal-* commands from a machine that has GFAL2 utility tools installed (e.g. lxplus). You have to pass physical path names as arguments to the commands. To get the physical path name given a logical path name and a CMS node name, you can use the lfn2pfn PhEDEx API passing the following query data: {'protocol': 'srmv2', 'node': '<CMS-node-name>', 'lfn': '<logical-path-name>'}. For example, for the LFN /store/user/username/myfile.root stored in T2_IT_Pisa, the URL to query the API would be https://cmsweb.cern.ch/phedex/datasvc/xml/prod/lfn2pfn?protocol=srmv2&node=T2_IT_Pisa&lfn=/store/user/username/myfile.root, which would show an output in xml format as this:

- <phedex request_timestamp="1438345813.61989" instance="prod" request_url="http://cmsweb.cern.ch:7001/phedex/datasvc/xml/prod/lfn2pfn"
  request_version="2.3.21-comp3" request_call="lfn2pfn" call_time="0.00454" request_date="2015-07-31 12:30:13 UTC">
    <mapping protocol="srmv2" custodial="" destination="" space_token="" node="T2_IT_Pisa" lfn="/store/user/username/myfile.root" 
    pfn="srm://stormfe1.pi.infn.it:8444/srm/managerv2?SFN=/cms/store/user/username/myfile.root"/>
  </phedex>

From the above output the important information to copy is the pfn field. This is the physical path name.

Before executing the gfal commands, make sure to have a valid proxy and have it exported in the environment variable X509_USER_PROXY:

voms-proxy-init -voms cms

Enter GRID pass phrase for this identity:
Contacting voms2.cern.ch:15002 [/DC=ch/DC=cern/OU=computers/CN=voms2.cern.ch] "cms"...
Remote VOMS server contacted succesfully.


Created proxy in /tmp/x509up_u<user-id>.

Your proxy is valid until <some date-time 12 hours in the future>

export X509_USER_PROXY=/tmp/x509up_u$UID
# or
#export X509_USER_PROXY=/tmp/x509up_u`id -u`
# or
#export X509_USER_PROXY=`voms-proxy-info --path`

The gfal commands and their usage syntax for listing/removing/copying files/directories are (it is recommended to unset the environment when executing gfal commands, i.e. to add env -i in front of the commands):

List a (remote) path:

env -i gfal-ls <physical-path-name-to-directory>

Remove a (remote) file:

env -i gfal-rm <physical-path-name-to-file>

Remove a (remote) directory:

env -i gfal-rm -r <physical-path-name-to-directory>

Copy a (remote) file to a directory in the local machine:

env -i gfal-copy <physical-path-name-to-source-file> file://<absolute-path-to-local-destination-directory>
<!--/twistyPlugin-->
 

-- AndresTanasijczuk - 23 Oct 2014 \ No newline at end of file

Revision 282015-07-29 - AndresTanasijczuk

Line: 1 to 1
 
META TOPICPARENT name="SWGuideCrab"
<!-- /ActionTrackerPlugin -->
Line: 267 to 267
 
<!--/twistyPlugin twikiMakeVisibleInline-->
The most common reasons for this error are:
  1. The user is trying to analyze an input dataset, but he/she has specified in the CRAB configuration file JobType.pluginName = 'PrivateMC' instead of JobType.pluginName = 'Analysis'.
Changed:
<
<
  1. The user is generating MC events, correctly specifying in the CRAB configuration file JobType.pluginName = 'PrivateMC', but in the CMSSW parameter set configuration he/she has specified a PoolSource. The solution is to not define any PoolSource in the CMSSW parameter set configuration.
>
>
  1. The user is generating MC events, correctly specifying in the CRAB configuration file JobType.pluginName = 'PrivateMC', but in the CMSSW parameter set configuration he/she has specified a source of type PoolSource. The solution is to not specify a PoolSource. Note: This doesn't mean to remove process.source completely, as this attribute must be present. One could set process.source = cms.Source("EmptySource") if no input source is used.
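
For example, a minimal sketch of the relevant lines of a CMSSW parameter-set configuration for private MC generation (the process name and the number of events are arbitrary):

import FWCore.ParameterSet.Config as cms

process = cms.Process('GEN')
process.source = cms.Source('EmptySource')                                # no input source
process.maxEvents = cms.untracked.PSet(input = cms.untracked.int32(10))   # some number of events for local tests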
 
<!--/twistyPlugin-->

CRAB Client API

Revision 272015-07-29 - AndresTanasijczuk

Line: 1 to 1
 
META TOPICPARENT name="SWGuideCrab"
<!-- /ActionTrackerPlugin -->
Line: 266 to 266
 
<!--/twistyPlugin twikiMakeVisibleInline-->
The most common reasons for this error are:
Changed:
<
<
  • The user is trying to analyze an input dataset, but he/she has specified in the CRAB configuration file JobType.pluginName = 'PrivateMC' instead of JobType.pluginName = 'Analysis'.
  • The user is generating MC events, correctly specifying in the CRAB configuration file JobType.pluginName = 'PrivateMC', but in the CMSSW parameter set configuration he/she has specified a PoolSource. The solution is to remove the PoolSource from the CMSSW parameter set configuration.
>
>
  1. The user is trying to analyze an input dataset, but he/she has specified in the CRAB configuration file JobType.pluginName = 'PrivateMC' instead of JobType.pluginName = 'Analysis'.
  2. The user is generating MC events, correctly specifying in the CRAB configuration file JobType.pluginName = 'PrivateMC', but in the CMSSW parameter set configuration he/she has specified a PoolSource. The solution is to not define any PoolSource in the CMSSW parameter set configuration.
 
<!--/twistyPlugin-->

CRAB Client API

Revision 262015-07-28 - AndresTanasijczuk

Line: 1 to 1
 
META TOPICPARENT name="SWGuideCrab"
<!-- /ActionTrackerPlugin -->
Line: 60 to 60
 

CRAB cache

Added:
>
>

What is the maximum allowed size of the user input sandbox?

<!--/twistyPlugin twikiMakeVisibleInline-->
100 MB.
<!--/twistyPlugin-->

What are the files CRAB adds to the user input sandbox?

<!--/twistyPlugin twikiMakeVisibleInline-->
CRAB adds to the user input sandbox the following directories/files:
  • The directories $CMSSW_BASE/lib, $CMSSW_BASE/biglib and $CMSSW_BASE/module. One can also tell CRAB to include the directory $CMSSW_BASE/python by setting JobType.sendPythonFolder = True in the CRAB configuration (see the sketch after this list).
  • Any data and interface directory recursively found in $CMSSW_BASE/src.
  • All additional directories/files specified in the CRAB configuration parameter JobType.inputFiles.
  • The original CRAB configuration file (added as debug/crabConfig.py).
  • The original CMSSW parameter-set configuration file (added as debug/originalPSet.py).
  • The tweaked CMSSW parameter-set configuration file in pickle format (added as PSet.pkl) plus a simple PSet.py file to load the pickle file.
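
A minimal sketch of the two CRAB configuration parameters mentioned in the list above (the extra file name is a hypothetical example):

config.JobType.inputFiles = ['myExtraData.txt']  # hypothetical additional file to include in the sandbox
config.JobType.sendPythonFolder = True           # also include the $CMSSW_BASE/python directory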
<!--/twistyPlugin-->
 

How can I clean my user cache area in the CRAB server cache?