26th September (Monday)

  • Activity
    • Monte Carlo simulation, data reconstruction/stripping and user jobs on the Grid

  • Site Issues
    • T0:
      • Mass recursive deletion by a user causing EOS problems (GGUS:123957) - resolved

    • T1:
      • NIKHEF: CVMFS issues (GGUS:124026) - resolved
      • RAL: Disk server down

  • Others: ARC CEs publish incorrect numbers of Waiting and Running jobs by default. Sites starting to deploy ARC CEs for LHCb should contact LHCb, the ARC developers or other ARC sites.

19th September (Monday)

  • Activity
    • Monte Carlo simulation, data reconstruction/stripping and user jobs on the Grid

  • Site Issues
    • T0:
      • EOS timeouts over the weekend, apparently caused by a mass recursive deletion by a user (GGUS:123957)
      • Serious problems with LSF at the end of last week: LSF CEs were down for 30+ hours from Thursday morning. Will there be a formal incident report? (GGUS:123937)

    • T1:
      • NTR

12th September (Monday)

  • Activity
    • Monte Carlo simulation, data reconstruction/stripping and user jobs on the Grid

  • Site Issues
    • T0:
      • T-Systems cloud extension has been removed from the production mask.
      • CERN-PROD: multiple failures reading from and writing to Castor, srm-lhcb.cern.ch (GGUS:123821) - in progress.
      • CERN-PROD: request for a dual-stack VOMS server for LHCb (GGUS:123799) - in progress.
    • T1:
      • PIC: outage downtime declared for network and dCache upgrades on Wednesday 14th September.
      • NL-T1: warning declared for a network test tomorrow morning (13th September), 6:00-8:00.
      • RAL: will be in Warning tomorrow (13th September) for maintenance on the tape library.
      • IN2P3: problems with memory consumption on Turbo jobs, under investigation.

5th September (Monday)

  • Activity
    • Monte Carlo simulation, data reconstruction/stripping and user jobs on the Grid

  • Site Issues
    • T0:
      • T-Systems cloud extension is draining for LHCb jobs
    • T1:
      • PIC: new HTCondorCE ce13.pic.es has been deployed and added to the BDII, and is being added to the LHCb configuration
      • RAL: disk server failure, 16 files lost

-- JoelClosier - 2017-01-10


This topic: LHCb > WebHome > LHCbComputing > ProductionOperations > ProductionOperationsWLCGdailyReports > ProductionOperationsWLCG2016Reports > ProductionOperationsWLCGSep16Reports
Topic revision: r1 - 2017-01-10 - JoelClosier
 