Week of 200810

WLCG Operations Call details

  • For remote participation we use the Vidyo system. Instructions can be found here.

General Information

  • The purpose of the meeting is:
    • to report significant operational issues (i.e. issues which can or did degrade experiment or site operations) which are ongoing or were resolved after the previous meeting;
    • to announce or schedule interventions at Tier-1 sites;
    • to inform about recent or upcoming changes in the experiment activities or systems having a visible impact on sites;
    • to provide important news about the middleware;
    • to communicate any other information considered interesting for WLCG operations.
  • The meeting should run from 15:00 Geneva time until 15:20, exceptionally to 15:30.
  • The SCOD rota for the next few weeks is at ScodRota.
  • Whenever a particular topic needs to be discussed at the operations meeting and requires information from sites or experiments, it is highly recommended to announce it by email to wlcg-scod@cern.ch. This lets the SCOD make sure that the relevant parties have time to collect the required information, or invite the right people to the meeting.

Best practices for scheduled downtimes



  • local:
  • remote:

Experiments round table:

  • CMS reports -
    • Very good overall CPU utilization: usage reached ~330k cores (~280k production, ~50k analysis) during the weekend, possibly a new record for CMS.
    • No major problems.

  • ALICE -
    • NTR

  • LHCb reports -
    • Activity:
      • Usual MC, user and WG production.
    • Issues:

Sites / Services round table:

  • ASGC:
  • BNL: downtime scheduled for Aug. 24-25 for a dCache upgrade to v6.2 and to enable SRR.
  • EGI:
  • FNAL:
  • IN2P3:
    • Changes to the HTCondor configuration reduced the number of LHCb jobs. This has been fixed.
    • Frontier issues were caused by a heavy traffic load and a disk space problem. Both are now fixed, and disk space will be increased during the next maintenance.
  • JINR:
  • KIT:
    • The downtime for replacing the CMS dCache door node with two new (virtual) machines was extended from 10:00 to 13:30 CEST because firewall ACLs and DNS records could not be updated more quickly.
    • Following this intervention, CMS@GridKa was down for almost two days, until Thursday morning, when we eventually found a misconfigured router that was causing the CMS MC SAM tests to fail.
    • Storage performance was degraded by a load hot spot on two servers caused by extremely high ALICE activity. This points to an issue with how ALICE data is distributed, which we still need to investigate. We had to put an I/O limit for ALICE on GPFS to improve the situation for all other clients.
    • The site downtime previously planned for September 26th has been moved to October 6th. The formal GOC-DB downtime announcement is still pending.
  • NDGF:
  • NL-T1: Sara-matrix dCache upgrade on Wednesday 19th: https://goc.egi.eu/portal/index.php?Page_Type=Downtime&id=29218
  • NRC-KI:
  • OSG:
  • PIC:
  • RAL: NTR

  • CERN computing services: Nothing To Report.
  • CERN storage services:
  • CERN databases: Nothing To Report
  • Monitoring:
    • SiteMon draft availability reports for July have been sent around.
    • Exceptionally, we would like to extend the recomputation request period until the end of August.
  • MW Officer:
  • Networks:
  • Security: NTR


Topic revision: r18 - 2020-08-10 - DavidBouvet