Week of 130225

Daily WLCG Operations Call details

To join the call, at 15.00 CE(S)T Monday to Friday inclusive (in CERN 513 R-068) do one of the following:

  1. Dial +41227676000 (Main) and enter access code 0119168, or
  2. To have the system call you, click here
  3. The scod rota for the next few weeks is at ScodRota

WLCG Availability, Service Incidents, Broadcasts, Operations Web

  • VO Summaries of Site Usability: ALICE, ATLAS, CMS, LHCb
  • SIRs: WLCG Service Incident Reports
  • Broadcasts: Broadcast archive
  • Operations Web: Operations Web

General Information

  • General Information: CERN IT status board, WLCG Baseline Versions, WLCG Blogs
  • GGUS Information: GgusInformation
  • LHC Machine Information: Sharepoint site - LHC Page 1


Monday

Attendance: local(Simone - SCOD, Alexandre - Dashboards, Jan - CERN Storage, Alessandro - ATLAS); remote(Ulf - NDGF, Michael - BNL, Joel - LHCb, MariaD - GGUS, Wei-Jen - ASGC, Saverio - CNAF, Tiju - RAL, Onno - NL-T1, Dimitri - KIT, Rob - OSG, Pepe - PIC)

Experiments round table:

  • ATLAS reports -
    • Central services
    • T0/1s
      • PIC_MCTAPE ~6,000 transfer failures: "server err.451.No write pools configured" Friday night. GGUS:91739 verified: an error creating tape file families. Fixed on Saturday at ~10am.
      • RAL-LCG2 ~10,000 transfer failures: "SOURCE:SRM_ABORTED". GGUS:91743 filed on Saturday, in progress.
      • FZK-LCG2: as listed under "ongoing issues" at the top of ADCOperationsDailyReports2013 (file transfer problems from UK sites), these errors are still observed, mostly at the FZK-LCG2_PERF-IDTRACKING and FZK-LCG2_SCRATCHDISK tokens, at a rate of about a thousand errors in 4 hours on Friday at ~11pm. GGUS:87958 in progress, updated.

  • ALICE reports -
    • Central services: this morning the AliEn catalogue DB was moved to a new, more powerful machine to sustain its steady growth.

  • LHCb reports -
    • Ongoing activity as before: reprocessing (CERN, IN2P3 + T2s), some prompt-processing (CERN + T2s), MC and user jobs.
    • T0:
    • T1: IN2P3: authentication problem with one certificate used for Production (GGUS:91760). SARA: downtime.

Sites / Services round table:

  • NDGF: there will be a three-day interruption of the NDGF-CERN primary OPN link (starting from today). The backup should be working.
    • More information collected after the meeting concerning the maintenance windows:
      • Maint. win. Start : 20130225 15:59 UTC
      • Maint. win. End : 20130225 22:59 UTC
      • Maint. win. Start : 20130227 15:59 UTC
      • Maint. win. End : 20130227 22:59 UTC
      • Maint. win. Start : 20130228 15:59 UTC
      • Maint. win. End : 20130228 22:59 UTC

  • ASGC: the ASGC-CERN network is down. Under investigation.
    • More info collected after the meeting. From ASGC: "both IPLC 10G link and 2.5G link are disconnected since 04:47am GMT/UTC, Feb. 25, 2013. Out telecom carriers, CHT-i and FarEastone, have confirmed that the disconnection is caused by electric power down after a fire disaster event of the IPLC city pop(Chief Data Center located at Neihu, Taipei). The blaze has been stop, but the electric power cannot be rebooted until The Fire Brigade release the building. The IPLC link will remain in disconnection before electric power system restarted."
  • RAL: tomorrow morning the site will be AT RISK for maintenance
  • SARA: will be in downtime today and tomorrow (the intervention will include the migration of the WNs to SL6+EMI2)
  • KIT: VMEM has been set to 10GB per job slot at ALICE's request.
  • OSG: tomorrow there will be the monthly scheduled maintenance of central OSG operational services. Some services might have a very short outage.
  • PIC: after the discussion on proof-lite on grid sites last Thursday, PIC verified that 1% of jobs at PIC exploit multiple cores within the same job slot.
  • GGUS: Reminder! As announced last week, there will be a GGUS Release this Wednesday 2013/02/27 with ALARM tests as usual. The interface to Ibergrid changes; PIC is affected!

AOB:

Tuesday

Attendance: local();remote().

Experiments round table:

  • ATLAS reports -
    • Central services
      • NTR
    • T1s and network
      • TRIUMF-LCG2: many job failures, "no such file or dir" (group gener input for Sherpa). GGUS:91766 in progress.
      • RRC-KI-T1 commissioning is ongoing.

Sites / Services round table:

  • From John Shade: the 10G IPLC link of ASGC recovered at 19:45 UTC on Feb. 25, 2013. Communication among ASGC, LHCOPN, and LHCONE has resumed.

AOB:

Wednesday

Attendance: local();remote().

Experiments round table:

Sites / Services round table:

AOB:

Thursday

Attendance: local();remote().

Experiments round table:

Sites / Services round table:

AOB:

Friday

Attendance: local();remote().

Experiments round table:

Sites / Services round table:

AOB:

Topic revision: r8 - 2013-02-26 - MaartenLitmaath
 