Week of 120806

Daily WLCG Operations Call details

To join the call, at 15.00 CE(S)T Monday to Friday inclusive (in CERN 513 R-068) do one of the following:

  1. Dial +41227676000 (Main) and enter access code 0119168, or
  2. To have the system call you, click here

The SCOD rota for the next few weeks is at ScodRota.

WLCG Service Incidents, Interventions and Availability, Change / Risk Assessments

VO Summaries of Site Usability: ALICE, ATLAS, CMS, LHCb
SIRs, Open Issues & Broadcasts: WLCG Service Incident Reports, WLCG Service Open Issues, Broadcast archive
Change assessments: CASTOR Change Assessments

General Information

General Information: CERN IT status board, M/W PPSCoordinationWorkLog, WLCG Baseline Versions, WLCG Blogs
GGUS Information: GgusInformation
LHC Machine Information: Sharepoint site - Cooldown Status - News


Monday

Attendance: local (AndreaV, Alexandre, Ken, Marc, LucaM, Philippe, Alessandro, Eva); remote (Ulf/NDGF, Kyle/OSG, Michael/BNL, Jhen-Wei/ASGC, John/RAL, Alexander/NLT1, Marc/IN2P3, Dimitri/KIT, Burt/FNAL).

Experiments round table:

  • ATLAS reports -
    • T1s:
      • PIC: Issue exporting T0 data to PIC during the night. GGUS:84833. Solved this morning. From PIC: "We had a huge load on ATLAS pools due to data replication to a new hardware. It was solved more than one hour ago but we'll keep the ticket opened a couple of hours until we are sure that current load won't affect data transfers anymore."

  • CMS reports -
    • LHC machine / CMS detector
      • Not much data this weekend due to a series of unfortunate events.
    • CERN / central services and T0
      • Had to delay prompt reconstruction of some recent runs because of a problem updating databases that is still being investigated.
      • Some files have been inaccessible from Castor, see INC:151642. [LucaM: files were inaccessible from EOS (not Castor) due to two machines down, now fixed.]
    • Tier-1/2:
      • ASGC is having downtime today to fix Castor problems. Currently have open tickets for Hammercloud (GGUS:84658), SUM test (GGUS:84632), and MC production (SAV:130787).
    • Other:
      • Daniele Bonacorsi is the next CRC.

  • LHCb reports -
    • Ongoing issues with RAW export to Gridka. Still many files in "Ready" status. Very low transfer rate. Latest updates on tickets say that CERN see SRM timeouts? (GGUS:84550, GGUS:84778 (Alarm), GGUS:84670).
      • [Dimitri/KIT: problems ongoing with connectivity to tape, experts are following up. Marc: could these issues also explain the problem with low space on tape cache? Dimitri: not sure, will ask the experts.]
      • [LucaM: will do some debugging from CERN too, is this issue also seen at other sites? Marc: no, this is only Gridka.]
    • T1:
      • GridKa: Very low space left on GridKa-Tape cache. Possibly related to the above. Ticket opened (GGUS:84838).
      • IN2P3: Investigating why pilots aren't using CVMFS both here and at the Tier2. This caused some job failures last week.

Sites / Services round table:

  • Ulf/NDGF: ALICE is writing a lot of data and we are having trouble coping with the data rates. Tomorrow the situation may get worse because of a network intervention: we will only have 2/3 of the writing capacity for three hours.
  • Kyle/OSG: ntr
  • Michael/BNL: ntr
  • Jhen-Wei/ASGC: Castor downtime is ongoing, scheduled until 4pm UTC (6pm CEST). [Ale: will switch on a few transfers to Taiwan tonight but will wait for tomorrow morning before switching on T0 exports to Taiwan, after we are sure that everything is back to normal.]
  • John/RAL: ntr
  • Alexander/NLT1: ntr
  • Marc/IN2P3: ntr
  • Dimitri/KIT: nta
  • Burt/FNAL: ntr

  • Philippe/Grid: ntr
  • Eva/Databases: ntr
  • LucaM/Storage: nta
  • Alexandre/Dashboard: SRM voput tests failing for CMS/ASGC. [Jhen-Wei: due to the ongoing Castor intervention.]

AOB: none

Tuesday

Attendance: local();remote().

Experiments round table:

  • ALICE reports -
    • NDGF: experts at CERN and NDGF are looking into the unexpectedly high write rates reported yesterday and discovered a few issues that are being followed up.

Sites / Services round table:

AOB:

Wednesday

Attendance: local();remote().

Experiments round table:

Sites / Services round table:

AOB:

Thursday

Attendance: local();remote().

Experiments round table:

Sites / Services round table:

AOB:

Friday

Attendance: local();remote().

Experiments round table:

Sites / Services round table:

AOB:

-- JamieShiers - 02-Jul-2012

Topic revision: r5 - 2012-08-07 - MaartenLitmaath
 