Week of 090525

WLCG Baseline Versions

WLCG Service Incident Reports

GGUS Team / Alarm Tickets during last week

Archive of Broadcasts

Weekly VO Summaries of Site Availability

Daily WLCG Operations Call details

To join the call, at 15.00 CE(S)T Monday to Friday inclusive (in CERN 513 R-068) do one of the following:

  1. Dial +41227676000 (Main) and enter access code 0119168, or
  2. To have the system call you, click here

General Information

See the weekly joint operations meeting minutes

Additional Material:

Monday:

Attendance: local(Jamie, Gang, Stephane, Nick, Harry, Andrea, Patricia, MariaD, Simone, Roberto, Diana, Dirk);remote(Andrea (CNAF), JT, Daniele).

Experiments round table:

  • ATLAS - (Stephane/Simone) An issue with the LFC at PIC over the weekend triggered a discussion on the mechanism for sites to contact experiment experts. ATLAS proposes to set up an email list / egroup of experiment experts, which a small group of known T1 site contacts (10-20 in total) could use to reach the experiment expert on call (and possibly later the Point 1 experts). MariaG: Could VOMS groups + email list be used? Or GGUS tickets? Simone: it would need to be able to generate SMS. Jamie: traceability of this channel would be useful. On the technical side of the problem: ATLAS was trying to delete LFC entries at PIC with the central deletion tool. This triggered instability of the PIC LFC instance, which had not been seen under similar activity/load before. The full analysis is still going on, but the problem seems to be related to particular non-canonical ACL values rather than to unusual load. PIC experts and LFC support are working on the remaining cleanup and on understanding the problem (ticket 48980). Simone: T2 transfers from Taiwan have been enabled again as of Sunday. No problems observed so far. Jeff mentioned problems on Sunday with failing SAM tests due to expired proxies. Andrea saw similar problems for submissions via the WMS, even though direct submission is working fine. To be checked by ATLAS.

  • CMS reports - (Daniele) Daniele reminded sites to consult the wiki documents which have been put in place in preparation for STEP09. He asked whether June 1st (CERN holiday) will be considered the first full day of STEP09. Jamie: June 1st will be a best effort day.

  • ALICE - (Patricia) No major issues during the weekend - smooth running. Some problems at the CERN T0 with jobs that required manual killing. It is not yet clear why this apparently happens only at CERN. The new T2 at Hiroshima is ramping up. There are still some issues with submission via their VObox. Working on the issue together with another new site in Spain.

  • LHCb reports - (Roberto) MC09 production going on. Cooling problems at Lyon and problems with CRL at CNAF (being worked on by CNAF experts). Some jobs had been killed by NIKHEF watchdog - Jeff added that the problem has been analysed and was due to external database connections.

Sites / Services round table:

  • CNAF (Andrea) - working on LHCb certificate problems and also cross checking on other nodes to avoid similar problems. The site also saw some job submission problems (~20 jobs died) which are being analysed.

  • NIKHEF(Jeff): dCache upgrade to 1.9.2-5 showed problems with memory consumption (>8GB) for gsi-dcap doors. The issue has been forwarded to dCache support and the site is preparing for a possible downgrade to 1.9.0-10 in case the problem cannot be fixed quickly. Simone/Jeff warned other sites to take this into account when planning dCache upgrades.

  • ASGC(Gang): SRM problems last week have been identified as another instance of the Oracle "BigID" problem.

  • CERN-PROD: the ATLAS issue with CASTOR SRM last Thursday is not yet understood. A service request has been opened with the developers. A SIR will follow once we understand it better.

AOB:

Tuesday:

Attendance: local();remote().

Experiments round table:

  • ATLAS -

  • ALICE -

Sites / Services round table:

AOB: (MariaDZ) Since the GGUS May 2009 Release (last week), the LHCOPN tickets are linked from the GGUS homepage (bottom left). I am putting this link in the template for these minutes. When people need more info, a network person will be called to the wlcg-operations meeting. This was also discussed today at an LHCOPN meeting. We need feedback from OSG, or at least from BNL, on the provision of a site contact and emergency email for use by GGUS. This was discussed in 2 dedicated meetings (last December and March). Details in http://savannah.cern.ch/support/?107531. Regarding the ATLAS requirement presented yesterday, please check https://savannah.cern.ch/support/?108277; we should refine the requirement in this ticket.

Wednesday:

Attendance: local();remote().

Experiments round table:

  • ATLAS -

  • ALICE -

Sites / Services round table:

AOB:

Thursday:

Attendance: local();remote().

Experiments round table:

  • ATLAS -

  • ALICE -

Sites / Services round table:

AOB:

Friday:

Attendance: local();remote().

Experiments round table:

  • ATLAS -

  • ALICE -

Sites / Services round table:

AOB:

-- JamieShiers - 25 May 2009

Topic revision: r5 - 2009-05-26 - MariaDimou
 