Week of 150316

WLCG Operations Call details

  • At CERN the meeting room is 513 R-068.

  • For remote participation we use the Vidyo system. Instructions can be found here.

General Information

  • The purpose of the meeting is:
    • to report significant operational issues (i.e. issues which can or did degrade experiment or site operations) which are ongoing or were resolved after the previous meeting;
    • to announce or schedule interventions at Tier-1 sites;
    • to inform about recent or upcoming changes in the experiment activities or systems having a visible impact on sites;
    • to provide important news about the middleware;
    • to communicate any other information considered interesting for WLCG operations.
  • The meeting should run from 15:00 until 15:20, exceptionally to 15:30.
  • The SCOD rota for the next few weeks is at ScodRota
  • General information about the WLCG Service can be accessed from the Operations Web
  • Whenever a particular topic that requires information from sites or experiments needs to be discussed at the daily meeting, it is highly recommended to announce it by email to wlcg-operations@cern.ch, so that the relevant parties have time to collect the required information or invite the right people to the meeting.

Monday

Attendance:

  • local: Maria D. (SCOD), Maarten (ALICE), Massimo Lamanna (CERN Data mgnt), Ale di Gi (ATLAS), Mark Slater (LHCb), Ulrich S. (CERN Grid Services).
  • remote: Dimitri (KIT), Hung-Te Lee (ASGC), Rolf Rumler (IN2P3), Michael Ernst (BNL), Onno Zweers (NL_T1), Tiju (RAL), Ulf (NDGF), Sang-Un (KISTI), Pepe Flix (PIC and CMS), Matteo (CNAF), Kyle (OSG), Di Q. (Triumf).

Experiments round table:

ATLAS, ALICE and CERN Data Mgnt requested a SIR (Service Incident Report) from the CERN network team for last Thu-Fri's network trouble: the CERN Service Status Board gave some information, but the issues did not feel clear to people until a solution was announced after 9am on Fri. Maria D., as SCOD, will follow up with the network team.

  • ALICE -
    • CERN:
      • CASTOR team restored previous behavior of staging through Xrootd - thanks!
      • 2 team tickets opened on Fri because of issues caused by the network problem
      • 1 team ticket GGUS:112354 opened on Sun because of expired PK-GRID-CA CRL

  • LHCb reports (raw view) -
    • Distributed computing dominated by Monte Carlo and user activities.
      • T0: VOMS tickets : 112279, 112281
      • T1:
        • Issues with a large number of SRM requests to PIC and IN2P3. We're investigating the cause.
Discussion at the meeting, with contributions from NL_T1, PIC and IN2P3, led to the conclusion that the dCache update might be linked to the many SRM requests being queued. Mark/Pepe will open a GGUS ticket to the dCache dev. team and sites will add their experience to the ticket diary.

Sites / Services round table:

  • ASGC: ntr
  • BNL: ntr
  • CNAF: ntr
  • FNAL: not connected
  • GridPP: not connected
  • IN2P3: nta. Commented on the LHCb SRM issue above.
  • JINR: not connected
  • KISTI: ntr
  • KIT: ntr
  • NDGF: will do dCache updates (probably 2.12.2) on Wednesday morning 7:30 - 8:30 UTC (from Ulf by email).
  • NL-T1: nta. Commented on the LHCb SRM issue above. They have a single instance, so they can't see which VO is affected, but they have observed getTURL problems since the dCache upgrade to v.2.10.20.
  • NRC-KI: not connected
  • OSG: ntr
  • PIC: working on the SRM problem presented by LHCb
  • RAL: ntr
  • RRC-KI: not connected
  • TRIUMF: ntr

  • CERN batch and grid services: ntr
  • CERN storage services: ntr
  • Databases: not present
  • GGUS: not present
  • Grid Monitoring: not present
  • MW Officer: reports on Thursdays

AOB:

Thursday

Attendance:

  • local:
  • remote:

Experiments round table:

  • CMS reports (raw view) -
    • Since the CMS Computing and Offline week is ongoing, most likely nobody will be available to join
      • Please address follow-up questions to Christoph
    • Rather successful test of distributed PromptRECO using ~50% of Tier-1 CPU resources in multi-core mode
    • Had some issues with EOS, all under investigation by experts:

  • ALICE -

  • LHCb reports (raw view) -
    • T0: VOMS tickets : 112279, 112281
    • T1:
      • Recent issue with overloading dCache SRMs with gfal_getturlfromsurl requests: found to be due to a bug fix in the srm_ifce library, which is used by gfal 1. A temporary fix of rolling back to the previous version is in place. TURLs will soon be constructed using string manipulation instead. Many, many thanks to both the sites and the dCache developers for the help in debugging this (GGUS:112413).
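
As a rough illustration of the "string manipulation" approach mentioned above, the following Python sketch derives a TURL from a SURL without calling the SRM. It is not the experiment's implementation: the function name `surl_to_turl`, the default `root` protocol, port 1094, the example hosts and the resulting TURL layout are all assumptions made for the example; real door hosts, ports and path conventions are site-specific.

```python
from urllib.parse import urlsplit, parse_qs


def surl_to_turl(surl, protocol="root", door_host=None, door_port=1094):
    """Build a TURL from a SURL by string manipulation only.

    Hypothetical helper: protocol, door host/port and the output layout
    are illustrative assumptions, not a site's or experiment's convention.
    """
    parts = urlsplit(surl)
    if parts.scheme != "srm":
        raise ValueError(f"not an SRM URL: {surl}")

    # Long-form SURLs carry the file path in the SFN query parameter,
    # short-form SURLs carry it directly in the URL path.
    sfn = parse_qs(parts.query).get("SFN")
    path = sfn[0] if sfn else parts.path
    if not path.startswith("/"):
        raise ValueError(f"no absolute file path found in: {surl}")

    # Assumption: the transfer door runs on the SRM host unless told otherwise.
    host = door_host or parts.hostname
    return f"{protocol}://{host}:{door_port}{path}"


if __name__ == "__main__":
    example = ("srm://srm.example.org:8443/srm/managerv2"
               "?SFN=/pnfs/example.org/data/lhcb/file.dst")
    print(surl_to_turl(example))
```

Because no request is sent to the SRM, a helper of this kind would avoid the getTURL load described above, at the cost of hard-coding knowledge about each site's doors.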

Sites / Services round table:

  • ASGC:
  • BNL:
  • CNAF:
  • FNAL:
  • GridPP:
  • IN2P3:
  • JINR:
  • KISTI:
  • KIT:
  • NDGF:
  • NL-T1:
  • OSG:
  • PIC:
  • RAL:
  • TRIUMF:

  • CERN batch and grid services:
  • CERN storage services:
  • Databases:
  • GGUS:
    • Network maintenance on central KIT infrastructure might affect GGUS from 23-Mar-15 06:00:00 to 23-Mar-15 09:00:00
    • Next GGUS release from 25-Mar-15 07:00:00 to 25-Mar-15 09:00:00.
  • Grid Monitoring:
  • MW Officer:

AOB: The CERN network team provided this page describing the 12-13 March incident.

-- AndreaSciaba - 2015-02-27

Topic attachments

  • MB-Mar-15.pptx (pptx, r1, 2868.8 K, 2015-03-17 09:38, PabloSaiz)
Topic revision: r15 - 2015-03-19 - MariaDimou
 