Shifter Checklist
Shifters should work through the following checks regularly.
eLogger
- Read the operations elogger from the previous day to find out what has been happening.
DIRAC monitoring pages
- Check the production summary webpage for the number of active productions and the number of waiting, running and failed jobs in each production.
- Check the job plots on the accounting page, looking in particular at the number of failed jobs per site over the past day for the running productions. If one site has a large number of failed jobs, investigate that site first. See image below for the Failed jobs.
- Check the pilot summary page for the number of pilots that are being submitted, running and aborted. A large number of aborted pilots, or no pilots being submitted or running, indicates a problem that requires immediate investigation. Contact the developers.
- Check the site downtime calendar. Sites that are in downtime should remain banned, and those that have successfully come back from maintenance (SAM jobs passing, etc.) should be unbanned.
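The failed-job triage and the ban/unban decision described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the record layout (`site`, `status`), the function names and the site names are hypothetical, and the data is assumed to have been copied by hand from the monitoring pages rather than fetched through any DIRAC interface.

```python
from collections import Counter


def rank_sites_by_failures(job_records):
    """Return (site, failed_count) pairs, worst site first.

    job_records is a hypothetical list of dicts with "site" and "status" keys.
    """
    failed = Counter(
        rec["site"] for rec in job_records if rec["status"] == "Failed"
    )
    return failed.most_common()


def ban_actions(sites_in_downtime, banned_sites, sam_passing):
    """Compare the downtime calendar with the current ban list.

    Returns (to_ban, to_unban) as sorted lists of site names.
    """
    in_downtime = set(sites_in_downtime)
    banned = set(banned_sites)
    # Sites in downtime should be banned if they are not already.
    to_ban = sorted(in_downtime - banned)
    # Only unban sites that are out of downtime AND passing their SAM jobs.
    to_unban = sorted(s for s in banned - in_downtime if s in sam_passing)
    return to_ban, to_unban


if __name__ == "__main__":
    # Hypothetical records, one per job, for the past day.
    records = [
        {"site": "SITE_A", "status": "Failed"},
        {"site": "SITE_B", "status": "Failed"},
        {"site": "SITE_A", "status": "Failed"},
        {"site": "SITE_A", "status": "Done"},
    ]
    print(rank_sites_by_failures(records))
    print(ban_actions({"SITE_C"}, {"SITE_B", "SITE_D"}, {"SITE_B"}))
```

The first site in the ranking is the one to investigate first. A banned site that has left downtime but is not yet passing its SAM jobs (here `SITE_D`) is deliberately left banned.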
GGUS tickets
- The RSS feed of open LHCb tickets in GGUS. This should always be checked before submitting a new ticket.
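As a rough sketch of the "check before submitting" step, the snippet below searches the titles in a feed for a keyword such as a site name. It assumes the feed follows the usual RSS 2.0 layout (`item` elements with `title` and `link` children) and that its XML has already been downloaded into a string; the function name and the sample feed content are hypothetical.

```python
import xml.etree.ElementTree as ET


def find_matching_tickets(rss_xml, keyword):
    """Return (title, link) for feed items whose title mentions keyword."""
    root = ET.fromstring(rss_xml)
    needle = keyword.lower()
    matches = []
    # RSS 2.0 places each entry in an <item> element under <channel>.
    for item in root.iter("item"):
        title = item.findtext("title", default="")
        link = item.findtext("link", default="")
        if needle in title.lower():
            matches.append((title, link))
    return matches


if __name__ == "__main__":
    # Hypothetical feed content, for illustration only.
    sample = """<rss version="2.0"><channel>
      <item><title>Jobs failing at SITE_A</title><link>ticket-1</link></item>
      <item><title>Transfer errors at SITE_B</title><link>ticket-2</link></item>
    </channel></rss>"""
    print(find_matching_tickets(sample, "site_a"))
```

If the search returns a ticket that already covers the problem, follow up on that ticket instead of opening a new one.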
Email lists
-- GreigCowan - 09 Dec 2008