Week of 131014

WLCG Operations Call details

To join the call, at 15.00 CE(S)T, by default on Monday and Thursday (at CERN in 513 R-068), do one of the following:

  1. Dial +41227676000 (Main) and enter access code 0119168, or
  2. To have the system call you, click here

The SCOD rota for the next few weeks is at ScodRota

WLCG Availability, Service Incidents, Broadcasts, Operations Web

  • VO Summaries of Site Usability: ALICE, ATLAS, CMS, LHCb
  • SIRs: WLCG Service Incident Reports
  • Broadcasts: Broadcast archive
  • Operations Web: Operations Web

General Information

  • General Information: CERN IT status board, WLCG Baseline Versions, WLCG Blogs
  • GGUS Information: GgusInformation
  • LHC Machine Information: Sharepoint site - LHC Page 1


Monday

Attendance:

  • local: Belinda, Maarten, Pablo, Ulrich
  • remote: Alexey, David, Gareth, Kyle, Lucia, Onno, Pepe, Peter, Rolf, Sang-Un, Thomas, Wei-Jen, Xavier

Experiments round table:

  • ATLAS reports (raw view) -
    • NTR
      • issues from last Thursday have been addressed.

  • CMS reports (raw view) -
    • Continuing 2011 Legacy rereco and some MC production -- otherwise very little to report!
    • GGUS:97974 CVMFS bad node at KIT -- fixed right away
    • GGUS:97938 Transfers failing to T2_RU_SINP since end of September -- still waiting for news!

  • ALICE -
    • the outage due to the AliEn CA having expired was resolved Thu late afternoon; all production sites were working again by Fri afternoon
      • Pablo: the renewed CA has a lifetime of 20 years

  • LHCb reports (raw view) -
    • Main activities are incremental stripping (T0/1) and Simulation
    • T0: NTR
    • T1: NTR

Sites / Services round table:

  • ASGC
    • power cut this morning, all services down, working on recovery
  • CNAF - ntr
  • IN2P3 - ntr
  • KISTI - ntr
  • KIT
    • Wed-Thu at-risk downtime for tape library maintenance requiring a 1h cut on both days
  • NDGF
    • Wed morning FTS downtime to patch Oracle
  • NLT1
    • FTS downtime Tue
    • NDGF-NLT1 transfer failures reported in GGUS:97937: no network issue found on our side, can NDGF experts look into their dCache logs?
      • Thomas: will inform the experts
  • OSG
    • was there an issue with MyWLCG late last week?
      • Maarten: nothing we know of here
      • Pablo: there was a transparent DB cleanup on Tue
  • PIC - ntr
  • RAL - ntr

  • dashboards - ntr
  • GGUS: (Text entered by MariaD before leaving for CHEP)
    • Progress on the GGUS ALARMers' disappearance can be followed in GGUS:97755.
    • Progress on the missing operators' notification for the Oct 7th ALICE ALARM can be found in INC:406556 and GGUS:97817.
      • the problem was caused by the files needed to sign emails having inadvertently been moved to their new location ahead of time
      • a test alarm was sent OK to KIT after correcting the issue
      • a full set of alarm tests will be done on Oct 23 for the next GGUS release
  • grid services
    • high load on batch system last week, reaching the 300k maximum of jobs being managed; the 2 top users have throttled their jobs and the service is OK for the time being
      • the issue will be mitigated as more SLC6 capacity gets put into production
  • storage
    • CASTOR upgrades with downtimes from 09:00 to 14:00 CEST:
      • Oct 21 ATLAS
      • Oct 28 ALICE + CMS
      • Oct 29 LHCb + public
    • EOS upgrades:
      • Oct 23 ATLAS
      • others to be decided later

AOB:

Thursday

Attendance:

  • local: Belinda, Jacobo, Luca M, Maarten
  • remote: Alexey, David, Kyle, Lisa, Lucia, Michael, Pepe, Peter, Rolf, Ronald, Sang-Un, Thomas, Tiju, Wei-Jen, Xavier

Experiments round table:

  • ATLAS reports (raw view) -
    • Central services
      • FTS error with proxy handling, a known bug seen previously. Can central services take a look, understand the issue and perhaps apply a patch? (GGUS:97960) (ATLAS Twiki)
        • Maarten: there have been similar errors on a few FTS-2 instances in the last few months. Those instances were running the latest version, so the problems would not have the same cause as last year. The first thing for the FTS admin to try is to restart the service, which will force the daemons to reload their memory caches and/or certain DB tables with up-to-date information. Feel free to open tickets for the FTS developers, but these recurrent issues give more ammunition for a quick replacement of FTS-2 by FTS-3, hopefully still this year.
    • Luca: GGUS:97662 is about a missing file in CASTOR-ATLAS, which was garbage-collected because its directory has a no-tape file class, whereas ATLAS expected the file to be on tape
      • Peter: the directory file class actually looks correct, will follow up with ADC colleagues and update the ticket

  • ALICE -
    • CERN: as of late yesterday evening first successful jobs on SLC6 using CVMFS
      • the SLC6 share of ALICE has been very low so far...
      • SLC5 jobs are still using Torrent for now
    • US T2 LLNL needed to be switched off due to government shutdown!
      • should be back again soon...

  • LHCb reports (raw view) -
    • Main activities are incremental stripping (T0/1) and Simulation
    • T0: NTR
    • T1: NTR

Sites / Services round table:

  • ASGC
    • recovered OK from power cut on Mon
  • BNL - ntr
  • CNAF - ntr
  • FNAL - ntr
  • IN2P3
    • on Oct 23 the SL6 batch capacity will increase from 50% to 90%
  • KISTI - ntr
  • KIT - ntr
  • NDGF - ntr
  • NLT1 - ntr
  • OSG
    • RSV transfers to SAM failed ~8h ago, the data are being sent again and the cause is under investigation
  • PIC - ntr
  • RAL - ntr

  • dashboards
    • SAM DB intervention Mon Oct 21 14:00 CEST, test results may be delayed by 30 min
  • storage
    • reminder: ATLAS CASTOR and EOS interventions on Mon and Wed next week

AOB:

Topic revision: r8 - 2013-10-17 - MaartenLitmaath
 