Week of 130610

Daily WLCG Operations Call details

To join the call, at 15.00 CE(S)T Monday to Friday inclusive (in CERN 513 R-068) do one of the following:

  1. Dial +41227676000 (Main) and enter access code 0119168, or
  2. To have the system call you, click here

The scod rota for the next few weeks is at ScodRota

WLCG Availability, Service Incidents, Broadcasts, Operations Web

  • VO Summaries of Site Usability: ALICE, ATLAS, CMS, LHCb
  • SIRs: WLCG Service Incident Reports
  • Broadcasts: Broadcast archive
  • Operations Web

General Information

  • GGUS Information (GgusInformation)
  • LHC Machine Information (LHC Page 1)
  • CERN IT status board
  • WLCG Baseline Versions
  • WLCG Blogs
  • Sharepoint site


Monday

Attendance:

  • local: Stefan, Maarten, Xavi, Ivan, Manuel
  • remote: Xavier, Maria, David, Lisa, Vladimir, Gareth, Wei-Jen, Paolo, Rob, Rolf

Experiments round table:

  • ATLAS
    • T0/Central services
      • Jobs at OPENSTACK_CLOUD at CERN are failing because of bad credentials (GGUS:94716 & INC:12844). Ongoing.
    • T1
      • NDGF-T1: GGUS:94670 Transfers from NDGF-T1 were failing with "An end of file occurred"; the pool servers were reset. Fixed.

  • CMS
    • Production at moderate levels with upgrade MC production
    • GGUS:94505, GGUS:94615 File read issues at RAL due to high load there.
    • GGUS:94741, GGUS:94748, GGUS:94750 File read issues at IN2P3, possible need for file replication. The GGUS tickets appear all to stem from the same Savannah ticket.
      • Followed up and fixed after the meeting by Guenter.
    • GGUS:94595 Test alarm ticket -- we have been quietly watching this -- is there anything we need to do here?
      • Maarten: no further action needed from your side

  • ALICE -
    • Russian network: on June 4 the GEANT link to Moscow was reduced to 100 Mbit/s (GGUS:94540) and since then there have been many job failures at most of the Russian sites due to timeouts. To alleviate the congestion and avoid job loss, the most affected ALICE sites were closed for job processing on Sunday evening:
      • IHEP
      • JINR
      • MEPHI
      • PNPI

  • LHCb
    • Incremental stripping campaign in progress and MC productions ongoing
    • T0:
    • T1:
      • GRIDKA: Problem with staging during the weekend; solved.

Sites / Services round table:

  • KIT: The CMS Squids were overloaded; former ATLAS Squids were used to mitigate the problem.
  • FNAL: NTR
  • RAL: NTR
  • ASGC: NTR
  • CNAF: NTR
  • OSG: NTR
  • IN2P3: Currently in downtime for the MSS due to robotics maintenance. dCache will be upgraded to 2.2.10. Long downtime tomorrow for several services, except FTS and the operations portal. A CREAM-CE under SL6 is being added. Batch will be back Tuesday evening; the robotics will be back Wednesday morning.
  • Storage: Hotfix next Monday/Tuesday for the experiments' CASTOR instances; a 5-minute transparent change.
  • Dashboard: NTR
  • Data processing: NTR

AOB:

Thursday

Attendance:

  • local: Stefan, Manuel, Ivan, Maarten
  • remote: Xavier, Stefano, Vladimir, Lisa, Ronald, Wei-Jen, Paolo, Rob, Gareth, Rolf, Jeremy, Pepe

Experiments round table:

  • ATLAS
    • T1: RAL disk server down since this morning
      • ongoing
    • T1: decreased efficiency at RRC-KI-T1: GGUS:94827
    • T0/Central services
      • Jobs at OPENSTACK_CLOUD at CERN are succeeding again (GGUS:94716 & INC:312844). Ticket still open. Ongoing.

  • CMS
    • Production at moderate levels with upgrade MC production, but ...
    • ... close to the launch of the 2011 MC reprocessing at the T1s; the start should be a matter of a few days, after some effort on how to match the 2011 data with recent releases.
    • The 2011 data reprocessing will instead happen at CERN only; it is also in validation, and the start is also a matter of days (could be this week).
    • GGUS:94505, GGUS:94615 File read issues at RAL due to high load there.
      • It was related to lazy-download having been disabled in the software; it was re-enabled, and everything is back to normal. Tickets closed (see the configuration sketch after this list).
    • GGUS:94741, GGUS:94748, GGUS:94750 File read issues at IN2P3, possible need for file replication. The GGUS tickets appear all to stem from the same Savannah ticket.
      • They have been collapsed into a single ticket. It seems a re-transfer is indeed needed; investigating whether this is again an effect of lazy-download having been disabled.
    • Some SLS alarms were raised for database access. It turned out to be a monitoring problem following a hostname change. Solved; no real effect.
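    • For reference, a minimal sketch of how lazy-download is typically toggled in a CMSSW job configuration, assuming the AdaptorConfig service with its cacheHint/readHint parameters (names per the CMSSW documentation of that era; the exact settings used in CMS production are not given in these minutes):

        # Sketch of a CMSSW configuration fragment enabling lazy-download for remote reads.
        # Parameter names are assumptions based on CMSSW documentation; adjust to the release in use.
        import FWCore.ParameterSet.Config as cms

        process = cms.Process("EXAMPLE")

        # The AdaptorConfig service controls how ROOT files are read over the network:
        # "lazy-download" prefetches whole blocks into a local cache instead of issuing
        # many small remote reads, which reduces load on the site storage.
        process.AdaptorConfig = cms.Service(
            "AdaptorConfig",
            cacheHint = cms.untracked.string("lazy-download"),
            readHint = cms.untracked.string("auto-detect"),
        )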

  • ALICE -
    • KISTI: splendid participation in raw data production cycles for p-Pb periods!
    • KIT: the site firewall got overloaded again due to jobs failing over to remote storage elements after suffering errors with the local SE. The KIT Xrootd and network experts are investigating. Meanwhile the cap on concurrent jobs was lowered from 10k all the way down to 1k on Tuesday evening, then ramped back up in the course of Wednesday and Thursday morning, to 4k at the moment. To be continued.

  • LHCb
    • Incremental stripping campaign in progress and MC productions ongoing
    • T0:
    • T1:
      • IN2P3: Alarm ticket (GGUS:94810) "SE return wrong tURL". The problem was fixed very quickly.
        • Rolf: the issue is related to yesterday's dCache version upgrade
    • Other: Web Server overloaded (GGUS:94824), under investigation.

Sites / Services round table:

  • CNAF: NTR
  • KIT: NTR
  • FNAL: NTR
  • NL-T1: NTR
  • ASGC: NTR
  • OSG: NTR
  • RAL: Issue with the batch job starter; currently looking into it (a GGUS ticket from CMS was submitted).
  • IN2P3: NTA
  • GridPP: NTR
  • PIC: Downtime last Monday to upgrade the dCache head nodes and for network-level interventions. The network intervention was not successful and the SRM service was not stable Tuesday night; the problem was fixed Wednesday morning.
  • Dashboard: NTR
  • Grid services: NTR

AOB:
