Week of 130520

Daily WLCG Operations Call details

To join the call, at 15.00 CE(S)T Monday to Friday inclusive (in CERN 513 R-068) do one of the following:

  1. Dial +41227676000 (Main) and enter access code 0119168, or
  2. To have the system call you, click here
The SCOD rota for the next few weeks is at ScodRota.

WLCG Availability, Service Incidents, Broadcasts, Operations Web

  • VO Summaries of Site Usability: ALICE, ATLAS, CMS, LHCb
  • SIRs: WLCG Service Incident Reports
  • Broadcasts: Broadcast archive
  • Operations Web: Operations Web

General Information

  • General Information: CERN IT status board, WLCG Baseline Versions, WLCG Blogs
  • GGUS Information: GgusInformation
  • LHC Machine Information: Sharepoint site - LHC Page 1


Monday: Whit Monday holiday

Tuesday

Attendance:

  • local: AndreaV/SCOD, Jarka/ATLAS, Felix/ASGC, Eddie/Dashboard
  • remote: Xavier/KIT, Michael/BNL, John/RAL, Ulf/NDGF, Onno/NLT1, Lisa/FNAL, Rolf/IN2P3, Kyle/OSG, David/CMS, Pepe/PIC
Experiments round table:

  • CMS reports (raw view) -
    • Getting more work from MC reprocessing at all sites -- should have seen job counts increase significantly from late in the day on the 20th
    • GGUS:94104 -- this was from last week -- we have not seen a repeat of this problem. Continuing to follow up with operators who first reported it.
    • GGUS:94126 -- ticket originally about a CVMFS black-hole node at KIT -- SE issues were then observed, suspected to be caused by the load from the large number of CMS jobs running at KIT at the time. Following up.
    • Quite a few tickets were issued on Friday to follow up with T2 sites with recurring problems: GGUS:94145, GGUS:94146, GGUS:94147, GGUS:94148, GGUS:94149, GGUS:94150, GGUS:94151

  • ALICE -
    • NTR

Sites / Services round table:
  • Xavier/KIT: still having issues with tape libraries; the technician left this morning and said that all should be OK now
  • Michael/BNL: ntr
  • John/RAL: had a network intervention this morning, it went ok
  • Ulf/NDGF: there will be a 1h outage for dcache on Thursday morning
  • Onno/NLT1: mass storage system is currently in maintenance at SARA
  • Lisa/FNAL: ntr
  • Rolf/IN2P3: ntr
  • Kyle/OSG: ntr
  • Felix/ASGC: ntr
  • Pepe/PIC: ntr

  • Eddie/Dashboard: ntr

AOB: none

Thursday

Attendance:

  • local: AndreaV/SCOD, Jarka/ATLAS, Felix/ASGC, David/Dashboard, Steve/Grid, Przemek/Database, Luca/Storage, MariaD/GGUS
  • remote: Michael/BNL, Lisa/FNAL, Ulf/NDGF, Kyle/OSG, Gareth/RAL, Jeremy/Gridpp, Ronald/NLT1, Paolo/CNAF; Stefano/CMS

Experiments round table:

  • CMS reports (raw view) -
    • Production activity in full swing, everything OK
    • GGUS:94104 -- this was from last week -- we have not seen a repeat of this problem. Our operators had a question relating to the ticket, but we think we can close it since the original issue has not reoccurred.
    • No other relevant GGUS opened.

  • ALICE -
    • NTR

Sites / Services round table:

  • Michael/BNL: ntr
  • Lisa/FNAL: ntr
  • Ulf/NDGF: downtime foreseen for today has been postponed to next week, exact date to be scheduled
  • Kyle/OSG: reminder about maintenance next week on Tue
  • Gareth/RAL: Castor update foreseen for next week has been postponed and removed from GOCDB
  • Jeremy/Gridpp: ntr
  • Ronald/NLT1: moved most of the Nikhef WNs to EL6; a few nodes are still on EL5 and will be moved today. No issues seen so far.
  • Paolo/CNAF: ntr
  • Felix/ASGC: ntr

  • David/Dashboard: ntr
  • Luca/Storage: rolling interventions scheduled on Castor DB to apply Oracle security patches next week:
    • Mon June 3, 14:00-16:00: PUBLIC, ALICE
    • Tue June 4, 09:30-11:30: CMS, ATLAS
    • Wed June 5, 14:00-16:30: Name server, Repack
    • Thu June 6, 13:00-15:00: LHCb
  • Przemek/Database: nta
  • Steve/Grid: CERN central services: WMS security update deployed on Wednesday afternoon, should be transparent, https://itssb.web.cern.ch/planned-intervention/security-update-wms-servers/22-05-2013
  • MariaD/GGUS: For the information of GGUS interface developers: there have been cases where a ticket in status '(un)solved' was changed directly to status 'assigned'. A fix that will enter production with the coming GGUS release on 2013-06-05 sets the ticket to status 'reopened' first. Details in Savannah:137603. Reminder: possible short (few minutes) unavailability of GGUS due to urgent network activities at KIT on 2013-05-27 between 5:30 and 6:00 UTC: https://goc.egi.eu/portal/index.php?Page_Type=View_Object&object_id=122717&grid_id=0

AOB: back to normal schedule, next meeting will be on Monday May 27

Topic revision: r9 - 2013-05-23 - AndreaValassi