July 2014 Reports

31 July 2014 (Thursday)

  • MC and User jobs mostly
  • T0:
    • Loss of the physical machine hosting the sandboxes (volhcb15: INC0612499 - disk failure?) led to the loss of many jobs; the service has now been migrated to a VM-based replacement with an OpenStack-managed volume for the sandboxes. (If the sandboxes aren't accessible, many user jobs effectively fail.)
    • Loss of the physical machine for lhcb-logs.cern.ch (INC0612668 - disk failure again?) has also disrupted our monitoring; we are again migrating to a VM instance as a replacement.
    • We've increased the number of LHCb ops team members with admin access to the OpenStack tenancy, following the lbvobox11 incident, a scenario which CERN ops can't yet handle themselves.
  • T1:
  • Tuesday's downtime of the central LHCb DIRAC services for database migration etc. completed successfully.

28 July 2014 (Monday)

  • MC and User jobs mostly
  • T0:
    • Problem with another VO box, lbvobox11 (Alarm GGUS:107269), ongoing due to overloading.
  • T1: Overnight problems with transfers to RRC-KI (DNS related?) were ended by a scheduled downtime in the morning.
  • We are having a one-hour downtime tomorrow (29 Jul) at 2:30pm CERN time for a database migration and some other updates. Almost all our central services will be unavailable for part of this time.

21 July 2014 (Monday)

  • MC and User jobs mostly
  • T0:
  • T1: Problem with transfers from SARA-NCBJ (GGUS:106949 against NCBJ) ongoing.

21 July 2014 (Monday)

  • MC and User jobs mostly
  • T0:
    • Problem with lbvobox14 (Alarm GGUS:107065). There was a hardware problem yesterday and the machine is still not back in operation. We would very much like an ETA for the machine, as we are debating what to do with some services which were hosted on it. They will need to be moved if the machine is not back today, but that will take a lot of effort (customisation).
    • Continuing problem with lcg-voms2 (GGUS:107014).
    • Awaiting a CERN update on GGUS:106434 about open files at CERN with Brazilian proxies. Please let us know whether the fix has also been rolled out to production machines.
  • T1: Problem with transfers from SARA-NCBJ (GGUS:106949 against NCBJ). Only this channel is affected; all transfers between other destinations for both these sites are fine.

17 July 2014 (Thursday)

  • MC (74%) and User (26%) jobs only, with no critical problems.
  • T0: NTR
  • T1: NTR

10 July 2014 (Thursday)

  • Mainly MC (80%) and User (19%) jobs, with no critical problems.
  • T0: NTR
  • T1: NTR

7 July 2014 (Monday)

  • No meeting (WLCG workshop in Barcelona). No critical problems; running MC and User jobs.
  • T0: NTR
  • T1: NTR

-- JoelClosier - 31 Mar 2014


Topic revision: r2 - 2014-08-18 - JoelClosier
 