Week of 140714

WLCG Operations Call details

  • At CERN the meeting room is 513 R-068.

  • For remote participation we use the Alcatel system. At 15.00 CE(S)T on Monday and Thursday (by default) do one of the following:
    1. Dial +41227676000 (just 76000 from a CERN office) and enter access code 0119168, or
    2. To have the system call you, click here

  • In case of problems with Alcatel, we will use Vidyo as backup. Instructions can be found here. The SCOD will email the WLCG operations list in case the Vidyo backup should be used.

General Information

  • The SCOD rota for the next few weeks is at ScodRota
  • General information about the WLCG Service can be accessed from the Operations Web

Monday

Attendance:

  • local: Maria Alandes (chair, minutes), Giuseppe Bagliesi (CMS), Luca Canali (IT-DB), Maria Dimou (GGUS), Alessandro di Girolamo (ATLAS), Felix Lee (ASGC), Maarten Litmaath (ALICE)
  • remote: Thomas Bellman (NDGF), Michael Ernst (BNL), Tiju Idiculla (RAL), Lisa Giacchetti (FNAL), Dmitry Nilsen (KIT), Elisabeth Prout (OSG), Alexander Verkooijen (NL-T1), Matteo (CNAF)

Experiments round table:

Tiju reports that these were in fact two different disk servers and that both of them are now back in production.

  • CMS reports (raw view) -
    • No major issues, processing and production is continuing
      • CSA14 exercise ongoing: kick-off on Monday 7th
    • Problems with voms-proxy-init and CRL on lxplus5 (solved) GGUS:106789
    • T0
      • NTR
    • T1
      • NTR

  • ALICE -
    • NTR

Sites / Services round table:

  • ASGC: NTR
  • BNL: NTR
  • CNAF: Due to a kernel upgrade on SL6 machines, the computing power of the farm has temporarily decreased while the affected machines are being rebooted.
  • FNAL: NTR
  • GridPP: Not present
  • IN2P3: Not present
  • JINR: Not present
  • KISTI: Not present
  • KIT: NTR
  • NDGF: The tape libraries will be in downtime next Wednesday, so ATLAS data will be unavailable for the whole day. The service is expected to be back in production on Thursday.
  • NL-T1: NTR
  • OSG: NTR
  • PIC: NTR (reported offline in email)
  • RAL: NTR
  • RRC-KI: Not present
  • TRIUMF: Not present

  • CERN batch and grid services: Not present
  • CERN storage services: Not present
  • Databases: NTR. Alessandro reports that the FTS 3 pilot and production services have been suffering from instabilities, and that it would probably help to move the MySQL DB, which is hosted on the same physical node, to a different machine. Luca replies that it is better to follow up with the DB on demand team to understand whether this in fact causes any overload, and to study the possibility of moving it to a different machine, which in principle should be possible.
  • GGUS: Reminder! Release this Wednesday with ALARM tests using new GGUS host cert. Maria adds that the release should be transparent in any case.
  • Grid Monitoring: Not present
  • MW Officer: Not present

AOB:

Thursday

Attendance:

  • local:
  • remote:

Experiments round table:

  • ATLAS reports (raw view) -
    • Tier0/1
      • CERNPROD_TZERO: staging errors on Tuesday; overloaded due to a heavy request load, and the system protected itself (GGUS:106878)
      • Taiwan: network issues, under investigation (GGUS:106736); transfers with Taiwan as source are failing
      • FZK: Staging errors, decreasing since 9:00 UTC, under investigation

  • ALICE -
    • NTR

  • LHCb reports (raw view) -
    • MC(74%) and User(26%) jobs only, with no critical problems.
    • T0: NTR
    • T1: NTR

Sites / Services round table:

  • ASGC:
  • BNL:
  • CNAF:
  • FNAL:
  • GridPP:
  • IN2P3:
  • JINR:
  • KISTI:
  • KIT:
  • NDGF:
  • NL-T1:
  • OSG:
  • PIC:
  • RAL:
  • RRC-KI:
  • TRIUMF:

  • CERN batch and grid services:
    • FTS3 software upgrade from 3.2.22 to 3.2.26, Tuesday afternoon next week, transparent (ITSSB entry). Includes workarounds for frequent crashes in the underlying gridsite.
    • An attempt to upgrade our QA CEs went wrong on Monday and had to be rolled back. As a result, these CEs were unavailable for 2-3 h.
  • CERN storage services:
  • Databases:
  • GGUS: Message from Guenter Grein (GGUS developer): During yesterday's alarm tests we faced a couple of problems with the new certificate. The reason is that the new certificate has the attribute "X509v3 Extended Key Usage" with the values "TLS Web Server Authentication, TLS Web Client Authentication" but not the value "emailProtection". Therefore the verify operations at various T1s failed. We have now rolled back to the old certificate, which is valid until July 28. Meanwhile our CA has to fix the attribute issue. Progress is recorded in JIRA:1276
  • Grid Monitoring:
  • MW Officer:
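
The certificate problem described in the GGUS item above can be illustrated with a minimal sketch. It assumes the usual RFC 5280 rule: when a certificate carries an Extended Key Usage (EKU) extension, it should only be used for the purposes listed there, so a verifier checking for "emailProtection" rejects it. The function name permits_purpose and the OpenSSL-style short names are illustrative assumptions; this is not the code used by GGUS or by the T1 sites.

```python
# Illustrative sketch only, not the GGUS or T1 verification code.
# Purposes use OpenSSL short names; the rule follows RFC 5280, section 4.2.1.12.

ANY_PURPOSE = "anyExtendedKeyUsage"


def permits_purpose(eku, purpose):
    """Return True if a certificate with this EKU list may be used for `purpose`.

    `eku` is None when the certificate carries no Extended Key Usage
    extension, in which case the extension imposes no restriction.
    """
    if eku is None:
        return True
    return purpose in eku or ANY_PURPOSE in eku


# EKU values of the new certificate as quoted in the GGUS message
# (TLS Web Server Authentication, TLS Web Client Authentication).
NEW_CERT_EKU = ["serverAuth", "clientAuth"]

if __name__ == "__main__":
    for purpose in ("serverAuth", "clientAuth", "emailProtection"):
        print(purpose, permits_purpose(NEW_CERT_EKU, purpose))
```

Under these assumptions the new certificate passes the server and client authentication checks but fails the emailProtection check, which is consistent with the failed verify operations reported during the alarm tests.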

AOB:

Topic attachments
  • 2014-07-15.pptx (r2, 2841.2 K, 2014-07-14 15:44, MariaDimou): Final GGUS slides for the 2014/07/15 WLCG MB
Topic revision: r10 - 2014-07-17 - UlrichSchwickerath
 