-- HarryRenshall - 17 Apr 2009

Week of 090420

WLCG Baseline Versions

WLCG Service Incident Reports

  • This section lists WLCG Service Incident Reports from the previous weeks (new or updated only).

GGUS Team / Alarm Tickets during last week

Weekly VO Summaries of Site Availability

Daily WLCG Operations Call details

To join the call, at 15.00 CE(S)T Monday to Friday inclusive (in CERN 513 R-068) do one of the following:

  1. Dial +41227676000 (Main) and enter access code 0119168, or
  2. To have the system call you, click here

General Information

See the weekly joint operations meeting minutes

Additional Material:

Monday:

Attendance: local(Nick, Julia, Alessandro, Harry, Gavin, Ewan, Olof, Roberto, Patricia, Steve);remote(Angela, Michael, Gareth, Ronald).

Experiments round table:

  • ATLAS report - During the weekend many Tier-1 failures were observed, in particular PIC to INFN Tier-1 and FZK. GGUS tickets were submitted. The problems were solved at PIC and FZK; no news from INFN. Many other small issues were observed. There is also a problem with a Tier-2 (Toronto): since March it has been down 80% of the time, sometimes scheduled and sometimes not, but from the point of view of availability for ATLAS it is the same. How should ATLAS cope with such fluctuations? Harry: the MB decided that it is up to the experiment to decommission a site if it is considered unusable. If so, the information about the decommissioning should be made public. Harry will alert the Canadian Tier-2 federation contact person, with a copy to the MB.

  • ALICE - (Patricia) Several issues. First: the second VOBox at the GridKa Tier-1 is not performing well because of late proxy renewal; this should be increased to 48 hrs. Second: Maarten said that several WMS at CERN suffered from large queues last Friday. The WMS were submitting to a site with which there had already been problems during Christmas. Patricia corrected the repeated resubmissions in the ALICE software at that site (it had not been patched), but that seems to have moved the problem to a different WMS, which looks like a problem with the WMS itself. Still investigating. Nick: there is already the GD WMS, which could be used for debugging in a production-like environment. Patricia will start to use it.

  • LHCb reports - (Roberto) All hands on MC09 preparation, which is the next round of physics MC production. In parallel there will be a test of the time-left utility used by the LHCb software to use up the CPU slots on the WNs. Other issues in the last couple of days: a PIC problem with a rogue user, which was immediately fixed. Also a problem with CASTOR SRM publication in the CNAF BDII. Gavin: this seems to be because CNAF is publishing with the wrong site name. LHCb did not receive the notification about the downtime extension at CNAF at the end of March. LHCb suggests that sites register a new downtime instead of an extension in this case. Nick: developers are working on a solution for flagging the different types of downtime extensions.

Sites / Services round table:

  • FZK (Angela): Seeing submission problems: the globus-gma daemon keeps dying for unknown reasons and has to be restarted repeatedly. Also confirms the problems reported by ATLAS: they should have been fixed in the latest PBS patch, but that does not seem to be the case.
  • BNL (Michael): NTR
  • RAL (Gareth): There were some problems with the national BDII last week. It is all working fine now and they have some ideas about the root cause. A downtime is scheduled for tomorrow while they repower service nodes (BDII, FTS,...). An 'at-risk' has also been put in for CASTOR next week: VDQM update (tape queue).
  • NIKHEF (Ronald): Announcement: no worker nodes will be available during the upcoming weekend. Queues will start draining from Wednesday (ATLAS) and Thursday (others).
  • CERN (Ewan): very long tape queue for CMS tape recalls. WMS problems for ALICE and also ATLAS.

AOB:

  • FZK: seeing SAM test failures because the site BDII cannot be queried from Taiwan. Losing a few percent availability per day. Could be related to the network connection. Has anybody else seen this problem before? Ronald: yes, some time ago at NIKHEF.

Tuesday:

Attendance: local();remote().

Experiments round table:

  • ATLAS -

  • ALICE -

Sites / Services round table:

AOB:

Wednesday:

Attendance: local();remote().

Experiments round table:

  • ATLAS -

  • ALICE -

Sites / Services round table:

AOB:

Thursday:

Attendance: local();remote().

Experiments round table:

  • ATLAS -

  • ALICE -

Sites / Services round table:

AOB:

Friday:

Attendance: local();remote().

Experiments round table:

  • ATLAS -

  • ALICE -

Sites / Services round table:

AOB:

Topic revision: r3 - 2009-04-20 - OlofBarring