Week of 090525

WLCG Baseline Versions

WLCG Service Incident Reports

GGUS Team / Alarm Tickets during last week

Archive of Broadcasts

Weekly VO Summaries of Site Availability

Daily WLCG Operations Call details

To join the call, at 15.00 CE(S)T Monday to Friday inclusive (in CERN 513 R-068) do one of the following:

  1. Dial +41227676000 (Main) and enter access code 0119168, or
  2. To have the system call you, click here

General Information

See the weekly joint operations meeting minutes

Additional Material:

Monday:

Attendance: local( Jamie, Gang, Stephane, Nick, Harry, Andrea, Patricia, MariaD, Simone, Roberto, Diana, Dirk);remote(Andrea (CNAF), JT, Daniele ).

Experiments round table:

  • ATLAS - (Stephane/Simone) An issue with the LFC at PIC over the weekend triggered a discussion on a mechanism for sites to contact experiment experts. ATLAS proposes to set up an email list / egroup of experiment experts, which a small group of known T1 site contacts (10-20 in total) could use to reach the experiment expert on call (and possibly, later, Point 1 experts). MariaG: Could VOMS groups + email list be used? Or GGUS tickets? Simone: it would need to be able to generate SMS. Jamie: traceability of this channel would be useful. On the technical side of the problem: ATLAS was trying to delete LFC entries at PIC with the central deletion tool. This triggered instability of the PIC LFC instance, which had not been seen with similar activities/load before. The full analysis is still ongoing, but the problem seems to be related to particular non-canonical ACL values rather than unusual load. PIC experts and LFC support are working on the remaining cleanup and on understanding the problem (ticket 48980). Simone: T2 transfers from Taiwan have been enabled again as of Sunday. No problems observed so far. Jeff mentioned problems on Sunday with failing SAM tests due to expired proxies. Andrea saw similar problems for submissions via the WMS, even though direct submission is working fine. To be checked by ATLAS.

  • CMS reports - (Daniele) Daniele reminded sites to consult the wiki documents which have been put in place in preparation for STEP09. He asked whether June 1st (CERN holiday) will be considered the first full day of STEP09. Jamie: June 1st will be a best effort day.

  • ALICE - (Patricia) No major issues during the weekend - smooth running. Some problems at CERN T0 with jobs which required manual killing. Not clear yet why this happens apparently only at CERN. New T2 at Hiroshima is ramping up. Still some issues with submission via their VObox. Working on the issue together with another new site in Spain.

  • LHCb reports - (Roberto) MC09 production going on. Cooling problems at Lyon and problems with CRL at CNAF (being worked on by CNAF experts). Some jobs had been killed by NIKHEF watchdog - Jeff added that the problem has been analysed and was due to external database connections.

Sites / Services round table:

  • CNAF (Andrea) - working on LHCb certificate problems and also cross checking on other nodes to avoid similar problems. The site also saw some job submission problems (~20 jobs died) which are being analysed.

  • NIKHEF(Jeff): The dCache upgrade to 1.9.2-5 showed problems with memory consumption (>8GB) for gsi-dcap doors. The issue has been forwarded to dCache support and the site is preparing for a possible downgrade to 1.9.0-10 in case the problem cannot be fixed quickly. Simone/Jeff warned other sites to take this into account when planning dCache upgrades.

  • ASGC(Gang): SRM problems last week have been identified as another instance of the Oracle "BigID" problem.

  • CERN-PROD: The ATLAS issue with Castor SRM last Thursday is not yet understood. A service request has been opened with the developers. A SIR will follow once we understand it better.

AOB:

Tuesday:

Attendance: local( Nick, Roberto, Julia, Gavin, MariaD, Gang, Dirk);remote( Daniele, JohnK (RAL), Andrea(CNAF), JT, Michael ).

Experiments round table:

  • ATLAS - (Simone) ASGC progress: ATLAS started to test the recall-from-tape functionality. Recalls without space tokens result in the correct space. The access to a VO box for Simone has been appreciated and simplifies the recommissioning. There is still an issue with the small number of available tape drives (2), which are shared with CMS. A further 6 drives are expected to become available soon (ASGC is waiting for an IBM intervention). Simone remarked that some new recall requests have overtaken older ones and asked the site to investigate.

  • CMS reports - (Daniele) CMS CRL UI expired at CERN (ticket 49039). The CASTOR upgrade for CMS finished smoothly with some minor issues. The upgrade broke some CMS download scripts because the stager query output format had changed. CMS has in the meantime adjusted their production PhEDEx to cope with the unintended change. The old output format will be restored with the upcoming minor upgrade to 2.1.8-8. Daniele reported that CMS got confirmation from ATLAS for multi-VO tape I/O tests. A more concrete schedule is expected after a meeting between CMS and ATLAS tomorrow. Prestaging tests have started at CNAF and will continue with other sites. Transfer test samples have also been prepared and the tests will proceed soon.

  • ALICE -

  • LHCb reports - (Roberto) Monte Carlo production is continuing smoothly, but issues with gsi-dcap have been observed at Lyon and, during the weekend, at GridKA. Roberto reported an issue with the UI in CERN AFS. Gavin added that this issue is currently being looked at by the support team in FIO.

Sites / Services round table:

  • RAL (John): RAL had scheduled downtime for a network change which unfortunately failed. The intervention will have to be repeated.

  • CNAF (Andrea) : Several tickets solved and closed. Upgraded to latest glite version and did not observe any problems so far.

  • BNL (Michael) : BNL upgraded to the latest version of the ATLAS site services. Michael mentioned some issues but will provide more detail later. Simone added that this change was required because two weeks ago, while still using a candidate release of the site services, BNL did not properly mark locations in the DDM catalog. This issue is now fixed with the upgrade, but it is not yet clear what happened to the failed locations from before. ATLAS will follow up with the site.

  • NIKHEF/SARA (Jeff) : Discussions / investigations are still ongoing on a dCache downgrade or possibly an upgrade to a new patch level, triggered by problems after the recent dCache upgrade.

  • CERN(Gavin): The ATLAS SRM issue of last Thursday has now been understood.

AOB: (MariaDZ) Since the GGUS May 2009 Release (last week) one can now see the LHCOPN tickets linked from the GGUS homepage (bottom left). I am putting this link in the template for these minutes. When people need more info, a network person will be called to the wlcg-operations meeting. This was also discussed today at an LHCOPN meeting. We need feedback from OSG, or at least from BNL, on the provision of a site contact and emergency email for use by GGUS. This was discussed in 2 dedicated meetings (last December and March). Details in http://savannah.cern.ch/support/?107531. About the ATLAS requirement presented yesterday, please check https://savannah.cern.ch/support/?108277; we should refine the requirement in this ticket.

Wednesday:

Attendance: local();remote().

Experiments round table:

  • ATLAS -

  • ALICE -

Sites / Services round table:

AOB: (MariaDZ): USAG meeting tomorrow agenda http://indico.cern.ch/conferenceDisplay.py?confId=59811

Thursday:

Attendance: local();remote().

Experiments round table:

  • ATLAS -

  • ALICE -

Sites / Services round table:

AOB:

Friday:

Attendance: local();remote().

Experiments round table:

  • ATLAS -

  • ALICE -

Sites / Services round table:

AOB:

-- JamieShiers - 25 May 2009

Topic revision: r7 - 2009-05-26 - DirkDuellmann
 