CRAB3 Troubleshooting and Problem Solving guide


This page explains what to do if your CRAB task is not successful. Most of the time users can diagnose and solve problems themselves, which is quicker than asking for help.

Expected flow

Once you have run your code interactively on a significant number of events, CRAB3 should be used in the way illustrated by the picture below.

[Figure: CRAB_flow.png, how to use CRAB]
Important points are:

  • The majority of tasks complete with 100% success: CRAB already resubmits failed jobs and failed file transfers whenever this makes sense, and its capabilities in this area are constantly being improved
  • CRAB already has extensive (and expanding) retry logic, and some errors are very well reproducible (as with most software!)
    • blind resubmission is more likely to waste resources than to help
  • Still, CRAB retry logic may not be sufficient in some cases, such as a site having a problem for longer than a few hours. In such situations users can push things along by manually resubmitting failed jobs with crab resubmit, possibly setting a few parameters to values different from those used in the initial task submission; see the crab resubmit command description
  • Users should not interfere with the internal workings of CRAB, so crab kill is to be used only to completely halt all CRAB processing and make sure there is a stable picture
  • When things are done and all balls have stopped rolling, a fraction of the task may still not have been successful. Users should then consider the option of a recovery task
  • When there are too many failures, starting over may be the only solution, but unless there was a global problem in the infrastructure, you first need to make sure that the failure is not due to something in your code or configuration
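The points above can be read as a small decision procedure. The following is a minimal sketch of that reading, not part of CRAB itself: the function name, the state strings ('finished', 'failed', 'running') and the 50% threshold for "too many failures" are all assumptions made for illustration.

```python
from collections import Counter


def next_step(job_states, task_still_running, max_failed_fraction=0.5):
    """Suggest the next action, following the expected flow described above.

    job_states: list of per-job state strings (hypothetical names such as
                'finished', 'failed', 'running').
    task_still_running: True while CRAB is still processing or retrying.
    max_failed_fraction: illustrative threshold only, not a CRAB setting.
    """
    counts = Counter(job_states)
    total = len(job_states)
    if total == 0:
        return "nothing submitted"
    failed = counts["failed"]
    if failed == 0 and counts["finished"] == total:
        return "done"
    if task_still_running:
        # CRAB retries on its own; blind resubmission wastes resources.
        return "wait"
    if failed / total > max_failed_fraction:
        # Too many failures: rule out your own code or configuration first.
        return "check code and configuration, then start over"
    if failed > 0:
        return "crab resubmit, then consider a recovery task"
    return "wait"
```

For example, a task that has stopped with one failed job out of four would be a candidate for crab resubmit, while a task where most jobs failed calls for a look at your own code and configuration before anything is submitted again.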

How to deal with problems

  1. make sure you subscribe to the announcement forum, to be aware of scheduled outages and known global problems
  2. read the documentation
  3. complete the basic or advanced CRAB tutorials appropriate to your use case, then check what is different in your own situation to find what broke things
  4. change from the tutorial (or something that works) to your configuration one step at a time, so you can find where things broke
  5. check CRAB3FAQ
  6. check the messages from the crab status command and, if they contain a suggestion, follow it
  7. if some jobs failed with an exit code, look it up in JobExitCodes and try to diagnose the problem yourself; usually this can be done by looking at the stdout of one failed job. You can get it via the crab getlog --short command or via the link in the Job column of the dashboard page for your task
  8. make sure that your executable works by testing it locally on the same input file(s) as the failed jobs
  9. if the problem is hard to find, consider running and debugging a failing job interactively using the crab preparelocal command
  10. if nothing helps and you are really desperate, or you suspect a bug in the tool or a problem in the infrastructure, you can ask for support
  11. please be aware that bug reports can be followed up only if submitted with full details and a recipe to reproduce them
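The command-line part of steps 6, 7 and 9 can be sketched as a small helper that only builds the command lines and prints them, without running anything. This is an illustration under assumptions: the project directory name is hypothetical, and the --dir and --jobids options are written as commonly used with the CRAB client; check each with crab <command> --help before relying on them.

```python
import shlex


def diagnosis_commands(project_dir, failed_job_ids):
    """Build (but do not run) the commands for steps 6, 7 and 9 above.

    project_dir: the CRAB project directory of the task (hypothetical path).
    failed_job_ids: ids of failed jobs, e.g. as read from crab status output.
    """
    ids = ",".join(str(job_id) for job_id in failed_job_ids)
    return [
        # Step 6: read the status messages and follow any suggestion.
        ["crab", "status", "--dir", project_dir],
        # Step 7: fetch the short log (stdout) of the failed jobs.
        ["crab", "getlog", "--short", "--dir", project_dir, "--jobids", ids],
        # Step 9: prepare the files needed to run a job interactively.
        ["crab", "preparelocal", "--dir", project_dir],
    ]


# Dry run: print the commands so they can be reviewed and pasted by hand.
for command in diagnosis_commands("crab_projects/crab_mytask", [3, 7]):
    print(shlex.join(command))
```

Printing rather than executing keeps the sketch safe to run anywhere, and leaves the decision of what to actually submit with the user.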

-- StefanoBelforte - 2017-07-04

Topic attachments
Attachment | History | Size | Date | Who | Comment
CRAB_flow.png | r1 | 48.7 K | 2017-07-04 12:51 | StefanoBelforte | How to use CRAB
Topic revision: r6 - 2019-12-18 - StefanoBelforte