Week of 131202

WLCG Operations Call details

To join the call, at 15.00 CE(S)T, by default on Monday and Thursday (at CERN in 513 R-068), do one of the following:

  1. Dial +41227676000 (Main) and enter access code 0119168, or
  2. To have the system call you, click here

The SCOD rota for the next few weeks is at ScodRota.

WLCG Availability, Service Incidents, Broadcasts, Operations Web

  • VO Summaries of Site Usability: ALICE, ATLAS, CMS, LHCb
  • SIRs: WLCG Service Incident Reports
  • Broadcasts: Broadcast archive
  • Operations Web: Operations Web

General Information

  • General Information: CERN IT status board, WLCG Baseline Versions, WLCG Blogs
  • GGUS Information: GgusInformation
  • LHC Machine Information: Sharepoint site - LHC Page 1



  • local: Stefan, Manuel, Felix, Luca, Maarten
  • remote: Andrei, Michael, Stefano, Lisa, Onno, Alexei, Tiju, WooJin, Pepe, Christian, Rob
  • apologies: Sang-Un, Rolf

Experiments round table:

  • CMS
    • ru-PNPI - During the SL6 installation, a severe power incident burned out the power supplies in the site's disk array. CMS disk space there is currently unavailable.

  • ALICE -
    • NTR

  • LHCb
    • Main activity is simulation at all sites.
    • T0:
    • T1:
      • PIC: network interruption over the weekend
      • GRIDKA: problems downloading files, presumably due to failures getting metadata from the local SRM

Sites / Services round table:

  • BNL: NTR
  • NL-T1: Two pool nodes are currently down because of hardware maintenance on the attached storage controller. On December 5th there will be maintenance on a power feed; this has not yet been submitted to GOCDB.
  • RAL: NTR
  • KIT: NTR
  • PIC: The Saturday incident started at ~18:25 local time because of a power supply problem. It affected only the WAN and was fixed around 22:30. All local jobs continued to run OK.
  • NDGF: dCache upgrade is done and pools are coming back now
  • OSG: Problem with SAM availabilities and reliabilities: the data transfer was not working correctly. The problem is fixed and should be cleared up shortly.
  • ASGC: The Castor server used by CMS for HC and SAM tests is down; it will be brought back ASAP.
  • KISTI: Scheduled downtime on 4th December from 06:00 to 09:00 (UTC) for a network intervention. The network bandwidth for KISTI-CERN will be 2 Gbps after the intervention.
  • IN2P3: Downtime on December 10th for a major update of network equipment (routers for IPv6) and minor updates of CVMFS, the dCache servers, the mass storage system and the batch system controller. Consequences: total network outage in the morning (the Grid Operations portal will also be unavailable at that time, so no downtime notifications for 2 hours). The batch downtime starts in the evening of December 9th and ends in the evening of the 10th.
  • Storage: EOS ALICE downtime in the morning for an upgrade to the latest version. The Castor upgrade for ALICE & LHCb is on Wednesday; the upgrade for ATLAS & CMS took place this morning.
  • Grid Services: Outage of the batch service on Saturday morning because of a faulty configuration; it was not possible to submit jobs. Down from 6am - 12am. SIR




  • local:
  • remote:

Experiments round table:

  • ALICE -

Sites / Services round table:

  • GGUS: For the year-end period, GGUS is monitored by a monitoring system which is connected to the on-call service. In case of total GGUS unavailability, the on-call engineer (OCE) at KIT will be informed and will take appropriate action. Apart from that, WLCG should submit an alarm ticket, which triggers a phone call to the OCE.


Topic revision: r8 - 2013-12-04 - MariaDimou