Problematic cases: chasing the 0.1% of files which don't complete

Could someone experienced please add a better description here?

You will have looked at the current reconstruction at the gross feature level. It will appear to be running fine: few failed jobs, lots of green boxes, and the files associated with each production at 99.x% complete or above. This may look healthy, but it may not be.

To understand why, you need to be aware that although many thousands of files are allocated to jobs, everything works on a run basis. If a single file from any run is problematic, the reconstruction and/or merging of that run will not complete, and it may stall forever. So if you see a production sitting at 99.x% complete, you should investigate. Here are some possibilities.

  • By clicking on the production you can choose "run status" in the pull-down menu. If you go through this you might see a run which is not complete. You can then look at the jobs associated with that run by clicking on "show jobs".

  • By clicking on the production you can choose "file status". If there are many jobs in the assigned state which are not progressing, you can click on "show files" and see where this is happening. Note the "task id" - this may be useful in identifying the jobs (if they exist).

  • By clicking on the production you can choose "file error count". This will show whether there are files for which many retries have been attempted.
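
The reason a high file percentage can hide a stalled run can be illustrated with a small sketch. It is not taken from the production system: the record layout, the status strings ("Processed", "Assigned"), the retry threshold and the function names are all hypothetical, chosen only to mirror the "run status", "file status" and "file error count" views described above.

```python
from collections import defaultdict

# Hypothetical records: (run number, file name, status, retry count).
# Field layout and status strings are illustrative assumptions only.
files = [
    (1001, "a.raw", "Processed", 0),
    (1001, "b.raw", "Processed", 0),
    (1002, "c.raw", "Processed", 0),
    (1002, "d.raw", "Assigned", 7),
    (1003, "e.raw", "Processed", 1),
]


def fraction_done(records, done_status="Processed"):
    """Fraction of all files that have reached the done status."""
    done = sum(1 for _run, _name, status, _retries in records if status == done_status)
    return done / len(records)


def incomplete_runs(records, done_status="Processed"):
    """Map each run with at least one unfinished file to those files."""
    pending = defaultdict(list)
    for run, name, status, _retries in records:
        if status != done_status:
            pending[run].append(name)
    return dict(pending)


def high_retry_files(records, threshold=5):
    """Files whose retry count is at or above an (assumed) threshold."""
    return [name for _run, name, _status, retries in records if retries >= threshold]


print(fraction_done(files))     # overall file-level completion
print(incomplete_runs(files))   # runs that cannot finish yet
print(high_retry_files(files))  # files worth inspecting first
```

In this toy data most files are done, yet run 1002 cannot complete because of a single file, and that same file is the one with many retries. This is the pattern to look for in the three views above.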

-- PeterClarke - 14-Nov-2010
