Week of 140324

WLCG Operations Call details

  • At CERN the meeting room is 513 R-068.

  • For remote participation we use the Alcatel system. At 15.00 CE(S)T on Monday and Thursday (by default) do one of the following:
    1. Dial +41227676000 (just 76000 from a CERN office) and enter access code 0119168, or
    2. To have the system call you, click here

  • In case of problems with Alcatel, we will use Vidyo as backup. Instructions can be found here. The SCOD will email the WLCG operations list in case the Vidyo backup should be used.

General Information

  • The SCOD rota for the next few weeks is at ScodRota
  • General information about the WLCG Service can be accessed from the Operations Web

Monday

Attendance:

  • local: AndreaS, MariaA, MariaD, Kate, Daniel, Xavi, Jerome, Felix
  • remote: Daniela/LHCb, Ulf/NDGF, Lucia/CNAF, Tiju/RAL, Onno/NL-T1, Rolf/IN2P3-CC, Rob/OSG, Lisa/FNAL, Pepe/PIC, Dmitri/KIT, Sang-Un Ahn/KISTI

Experiments round table:

  • CMS reports (raw view) -
    • Still, quiet days
    • Production activities
      • HeavyIon rereco plus standard MC activities are rolling
    • Tickets on CERN:
      • INC:522849: problems reading from t0streamer, which is impacting the Tier0 functionality tests. It can also be seen at https://sls.cern.ch/sls/service.php?id=CASTORCMS_T0STREAMER
    • Tickets on T1s
    • Other tickets (newer on top)
      • GGUS:102334 T2_CH_CERN pilot issues -> still open; it seems it was an ARGUS overload
    • Miscellanea
      • this week is CMS Spring Offline & Computing week at CERN

  • ALICE -
    • NTR

  • LHCb reports (raw view) -
    • Stripping, MCsimulation and User jobs.
    • T0: NTR. It would have been good to have the GOCDB announcement of the LFC DB upgrade a bit earlier; it was announced 20 minutes before the downtime.
    • T1: NTR

Sites / Services round table:

  • ASGC: ntr
  • CNAF: ntr
  • FNAL: ntr
  • IN2P3-CC: working on the tickets quoted by CMS, exchanging information with CMS operations
  • KIT: "at risk" downtime scheduled this Wednesday from 0930 to 1030 UTC for network maintenance
  • KISTI: migrated to EMI-3 last week without problems, upgraded our xrootd redirector for disk to the latest version, 3.3.6
  • NDGF: dCache will be upgraded to 2.2.9 tomorrow. The downtime will probably be around 30 minutes, much less than what is published in GOCDB
  • NL-T1: would like to schedule a full day downtime on April 10 for system updates and a dCache upgrade. Let us know if there are any objections
  • PIC: ntr
  • RAL: ntr
  • OSG: ntr
  • CERN batch and grid services: a very high load on Argus was observed between 0900 and 1000 CET; it seems to have been generated by CMS users. Any idea of a likely explanation?
  • CERN storage services: tomorrow CASTORLHCB will be unavailable from 0800 to 1200 CET, to align with the database intervention. It might take much less time than that.
  • Databases: The LHCb offline database was upgraded to Oracle 12c this morning; everything went well

  • GGUS release on the 26th of March. The release will bring the possibility to notify multiple sites with one ticket, Shibboleth support, and the implementation of several CMS-specific requests. The service will be in downtime from 7:00 to 10:00 UTC, followed by ALARM tests as usual, including the Russian Tier1s. Announcement on http://ggus.eu

AOB:

Thursday

Attendance:

  • local: MariaD (SCOD), MariaA, Felix, Jerome, Daniel, Maarten.
  • remote: Lisa/FNAL, Ulf/NDGF, Kyle/OSG, Tommaso/CMS, John Kelly/RAL, Rolf/IN2P3, Dennis/NL_T1, Alexei/ATLAS, Antonio/CNAF, Michael/BNL, Sang-Un/KISTI.

Experiments round table:

  • ATLAS reports (raw view) -
    • Central services
      • GGUS ok after the downtime
      • AGIS downtimes not cleaned if removed, Illinois downtime is correct, mygrid, indiana.
      • SS CRITICAL errors in e.g. Lyon and CNAF-ASGC
    • T1

  • CMS reports (raw view) -
    • Still, quiet days
    • Production activities
      • Heavy Ion rereco plus standard MC activities are rolling
      • Due to time limitations (QM conference), we plan to deploy this on the HLT farm as well
    • Tickets on CERN:
    • Tickets on T1s
    • Other tickets (newer on top)
      • GGUS:102334 T2_CH_CERN pilot issues -> still open; it seems it was an ARGUS overload
    • Miscellanea
      • this week is CMS Spring Offline & Computing week at CERN

  • ALICE -
    • NTR

Sites / Services round table:

  • BNL: ntr
  • FNAL: ntr
  • OSG: ntr
  • KISTI: ntr
  • RAL: ntr
  • IN2P3: ntr
  • NL_T1: ntr
  • PIC: ntr
  • NDGF:
    • The scheduled dCache upgrade did not take place last Tuesday because bugs were found in the new version. The next attempt will be on Tuesday, April 1st.
    • A bad tape drive was found. ATLAS reads from Stockholm tapes currently fail. Investigating.
  • ASGC: A hardware failure was found on a tape drive. ATLAS is affected.
  • CNAF: ntr
  • Russian Tier1s: not connected

  • CERN Grid Services: ntr

  • GGUS
    • Ticket update e-mails were not delivered for a while on Tuesday early afternoon CET (GGUS:102645). This outage was a consequence of the KIT mail problems of Monday (around 4pm): the changes made on Monday led to some permission problems which blocked the mail delivery. This report was provided by Guenter Grein, GGUS developer.
    • GGUS ALARM tests for CERN didn't send email to the operators. The issue is being followed up in Savannah:142611.

AOB:

  • ATTENTION: start of European Summer Time on Sun March 30!
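
For remote participants outside the CE(S)T zone, the switch to summer time moves the 15.00 call one hour earlier in UTC. The following is a minimal sketch of that conversion, assuming the usual EU rule (summer time from the last Sunday of March to the last Sunday of October) and using hypothetical helper names (`last_sunday`, `cest_offset`, `meeting_start_utc`) that are not part of any WLCG tool. It only handles afternoon times, so the exact switch-over hour on the transition day is ignored.

```python
from datetime import date, datetime, timedelta, timezone

MEETING_HOUR_LOCAL = 15  # 15.00 CE(S)T, as stated in the call details


def last_sunday(year: int, month: int) -> date:
    """Return the last Sunday of the given month."""
    first_of_next = date(year + (month == 12), month % 12 + 1, 1)
    last_day = first_of_next - timedelta(days=1)
    # weekday(): Monday=0 ... Sunday=6, so step back to the previous Sunday.
    return last_day - timedelta(days=(last_day.weekday() + 1) % 7)


def cest_offset(day: date) -> timedelta:
    """UTC offset in force at CERN on the afternoon of `day` (assumed EU rule)."""
    summer_start = last_sunday(day.year, 3)
    summer_end = last_sunday(day.year, 10)
    in_summer_time = summer_start <= day < summer_end
    return timedelta(hours=2 if in_summer_time else 1)


def meeting_start_utc(day: date) -> datetime:
    """Convert the 15.00 local meeting start on `day` to UTC."""
    local_zone = timezone(cest_offset(day))
    local_start = datetime(day.year, day.month, day.day,
                           MEETING_HOUR_LOCAL, 0, tzinfo=local_zone)
    return local_start.astimezone(timezone.utc)


if __name__ == "__main__":
    for day in (date(2014, 3, 27), date(2014, 3, 31)):
        start = meeting_start_utc(day)
        print(day.isoformat(), "->", start.strftime("%H:%M"), "UTC")
```

Under these assumptions the Thursday 27 March call starts at 14:00 UTC and the Monday 31 March call at 13:00 UTC.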

-- SimoneCampana - 20 Feb 2014

Topic revision: r14 - 2014-03-27 - MariaDimou