Shifter Checklist

Shifters should check the following items regularly.

eLogger

  • Read the operations elogger from the previous day to find out what has been happening.

DIRAC monitoring pages

  • The production summary webpage, for the number of active productions and the number of waiting, running and failed jobs in each production.
  • The job plots on the accounting page, looking in particular at the number of failed jobs per site over the past day for the running productions. If one site has a large number of failed jobs, investigate that site first. See the image below for the failed jobs.
  • The pilot summary page, for the number of pilots being submitted, running and aborted. A large number of aborted pilots, or no pilots being submitted or running, indicates a problem that requires immediate investigation: contact the developers.
  • The site downtime calendar. Sites that are in downtime should remain banned; those that have successfully come back from maintenance (SAM jobs passing, etc.) should be unbanned.
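
The triage described in the bullets above can be sketched in a few lines of Python. This is only an illustration: the job records, the site names (Site.A, Site.B, Site.C) and both helper functions are hypothetical, and the data is assumed to have been copied by hand from the monitoring pages rather than fetched through any DIRAC interface.

```python
from collections import Counter


def rank_sites_by_failures(jobs):
    """Return (site, failed_count) pairs, worst site first."""
    failed = Counter(job["site"] for job in jobs if job["status"] == "Failed")
    return failed.most_common()


def sites_to_unban(banned, in_downtime, sam_passing):
    """Banned sites that are out of downtime and whose SAM jobs pass."""
    return sorted((set(banned) - set(in_downtime)) & set(sam_passing))


# Hypothetical records, as a shifter might note them down from the job plots.
jobs = [
    {"site": "Site.A", "status": "Failed"},
    {"site": "Site.A", "status": "Failed"},
    {"site": "Site.A", "status": "Failed"},
    {"site": "Site.B", "status": "Failed"},
    {"site": "Site.C", "status": "Done"},
]

ranking = rank_sites_by_failures(jobs)
unban = sites_to_unban(
    banned={"Site.A", "Site.B", "Site.C"},
    in_downtime={"Site.A"},
    sam_passing={"Site.B"},
)

print(ranking)  # the first entry is the site to investigate first
print(unban)    # candidates for unbanning
```

The first entry of the ranking is the site to start investigating. A site is proposed for unbanning only if it is both out of downtime and passing its SAM jobs, so a site that is back from maintenance but still failing its tests stays banned.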

GGUS tickets

  • The RSS feed of open LHCb tickets in GGUS. This should always be checked before submitting a new ticket.
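
The check before submitting a ticket could be sketched as follows. This is a sketch under stated assumptions: the feed is assumed to be plain RSS 2.0 with one item per ticket and the ticket subject in the item title; the sample feed text, the site names and the helper matching_tickets are all made up for illustration, and the real feed layout may differ.

```python
import xml.etree.ElementTree as ET


def matching_tickets(rss_xml, keyword):
    """Return the titles of feed items whose title contains keyword."""
    root = ET.fromstring(rss_xml)
    keyword = keyword.lower()
    return [
        item.findtext("title", default="")
        for item in root.iter("item")
        if keyword in item.findtext("title", default="").lower()
    ]


# Made-up sample standing in for the downloaded feed content.
SAMPLE_FEED = """<rss version="2.0"><channel>
<title>Open tickets (sample)</title>
<item><title>Site.A: jobs failing at stage-in</title></item>
<item><title>Site.B: pilots aborted</title></item>
</channel></rss>"""

existing = matching_tickets(SAMPLE_FEED, "site.a")
print(existing)
```

If the returned list is not empty, a ticket mentioning that site is already open and should be read before a new one is submitted.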

Email lists

-- GreigCowan - 09 Dec 2008


Topic revision: r1 - 2008-12-09 - GreigCowan
 