Week of 130722

Daily WLCG Operations Call details

To join the call, at 15.00 CE(S)T Monday to Friday inclusive (in CERN 513 R-068) do one of the following:

  1. Dial +41227676000 (Main) and enter access code 0119168, or
  2. To have the system call you, click here

The SCOD rota for the next few weeks is at ScodRota.

WLCG Availability, Service Incidents, Broadcasts, Operations Web

VO Summaries of Site Usability: ALICE, ATLAS, CMS, LHCb
SIRs: WLCG Service Incident Reports
Broadcasts: Broadcast archive
Operations Web: Operations Web

General Information

General Information: CERN IT status board, WLCG Baseline Versions, WLCG Blogs
GGUS Information: GgusInformation
LHC Machine Information: Sharepoint site - LHC Page 1


Monday

Attendance:

  • local: AndreaS, Steve, Maarten, Ken/CMS, Eddie, Massimo
  • remote: Alexander/NL-T1, Michael/BNL, Peter/ATLAS, Xavier/KIT, Saverio/CNAF, Boris/NDGF, Nadia/IN2P3-CC, Kyle/OSG, Tiju/RAL, Lisa/FNAL, Wei-Jen/ASGC, Sang-Un/KISTI

Experiments round table:

  • ATLAS reports (raw view) -
    • T0/Central services
      • NTR
    • T1
      • BNL file recovery progressing
      • RAL disk server problem. All files successfully recovered.
      • SARA: transfer errors ("Failed to contact on remote SRM"), GGUS:95911. Fixed.

  • CMS reports (raw view) -
    • Continuing 2011 legacy rereco activity and some Upgrade MC generation, everything pretty quiet
    • From last week: RAL GGUS:95825 was declared a CASTOR bug.
    • The CASTOR T1TRANSFER service was degraded again over the weekend. It is still not a real problem, but it probably deserves some discussion with CMS as to what the shifter instructions should be. [Massimo: the problem is related to an increasing mismatch between the CASTOR tuning and the evolution of the CMS usage patterns; it would be good to have a meeting to discuss these issues.]
    • GGUS:95887 to T1_FR_IN2P3 is about file read errors. The ticket came in on Friday; the site started working on it today.
    • The Wigner T-Systems 100 Gbit link was down for several hours over the weekend. [Maarten: Wigner vs. CERN computer centre should be transparent: do not worry too much about SLS errors unless they have a direct effect. Ken: will check if there was any impact on CMS.]

  • ALICE -
    • CERN: all CERN CEs started refusing proxies of ALICE and other VOs around 18:20 Sunday evening; fixed around 09:15 Monday morning (TEAM ticket GGUS:95914). [Maarten: the problem was most likely related to a bad ARGUS configuration. It was solved by Ulrich, but there are no details in the ticket.]

Sites / Services round table:

  • ASGC: ntr
  • BNL: ntr
  • CNAF: ntr
  • FNAL: ntr
  • IN2P3-CC: ntr
  • KIT: ntr; just a reminder that a downtime starts tomorrow, beginning with the closing of the queues, to update the SE for LHCb
  • KISTI: ntr
  • NDGF: SRM downtime tomorrow from 7 to 9, to update dCache and install a new host certificate; some disk pools will also be updated.
  • NL-T1: ntr
  • RAL: tomorrow, "at risk" intervention on the site firewall, followed from 10 to 16 by a CASTOR upgrade for LHCb and CMS
  • OSG: ntr
  • CERN batch and grid services: ntr
  • CERN storage services: just upgraded SRM for all experiments (apart from LHCb, which was upgraded last week). The upgrade should have been transparent.
  • Dashboards: ntr

AOB:

Thursday

Attendance:

  • local:
  • remote:

Experiments round table:

  • ATLAS reports (raw view) -
    • T0/Central services
      • ATLR database outage yesterday evening; looks fine now (INCIDENT)
    • T1
      • All PRODDISK endpoints except those at CERN and BNL have been removed from ATLAS data management
      • FZK and DE T2 sites now configured to use FTS3

  • CMS reports (raw view) -
    • T1 disk/tape separation is being carried out (RAL is the first site), and we are starting to nail down the details of how to send user analysis jobs to T1s using pilots
    • Continuing 2011 legacy rereco activity and some Upgrade MC generation, everything pretty quiet
    • GGUS:95887 to T1_FR_IN2P3 about file read errors was solved by reconfiguring dCache there (giving more time to the dCache movers). Details not understood.
    • GGUS:96102: file read error at T1_UK_RAL. The site is now looking into this.
    • nothing else worth mentioning

  • ALICE -
    • NTR

Sites / Services round table:

AOB:

Topic revision: r5 - 2013-07-25 - MaartenLitmaath