WLCG-OSG-EGEE Operations Meeting Minutes, Mon 16 Mar 2009

Summary

gLite 3.1 Update 42 (scheduled for 23-MAR-2009) will include VDT 1.6.1 Release 9, which fixes a Globus bug that affects most m/w services.
gLite 3.2 Update 01 (scheduled for 30-MAR-2009) will include new SL5 WNs.
LHCb reported being plagued by WMS problems.

Attendance

EGEE

  • Asia Pacific ROC: Absent
  • Central Europe ROC: Malgorzata Krakowian
  • OCC / CERN ROC: John Shade, Antonio Retico, Steve Traylen, Diana Bosio, Maria Dimou
  • French ROC: Pierre Girard and an unidentified dial-in participant (0033478930880)
  • German/Swiss ROC: Wen Mei
  • Italian ROC: Absent
  • Northern Europe ROC: Gert Svensson
  • Russian ROC: Lev Shamardin
  • South East Europe ROC: Kostas Koumantaros
  • South West Europe ROC: Absent
  • UK/Ireland ROC: Jeremy Coles
  • GGUS: Guenter Grein
  • COD: Vera Hansper

WLCG

  • WLCG Service Coordination: Harry Renshall

WLCG Tier 1 Sites

  • ASGC: Absent
  • BNL: Absent
  • CERN site: Absent
  • FNAL: Catalin Dumitrescu, Rob Quick
  • FZK: Angela Poschlad
  • IN2P3: Pierre Girard
  • INFN: Absent
  • NDGF: Absent
  • PIC: Absent
  • RAL: Gareth Smith
  • SARA/NIKHEF: Absent
  • TRIUMF: Absent

LHC Experiments

  • ATLAS: Absent
  • LHCb: Roberto Santinelli
  • CMS: Absent
  • ALICE: Absent

Feedback on Last Week's Minutes

None was given.

EGEE Items

Grid Operator Hand Over on Duty

  • From: ROC Russia (primary team), ROC UK/I (secondary team)
  • To: ROC DECH (primary team), ROC SEE (secondary team)

Sites Considered For Suspension
  • Vera, once again reporting from a train, thought that since the IN-DAE-VECC-EUINDIAGRID site had not responded for a few weeks, it was worth mentioning in the hand-over log. Shu-Ting followed it up and discovered that the lack of response had a different cause.
  • INFN-FERRARA has been in scheduled downtime since the end of January, but the situation is understood.

PPS Reports and Issues

  • CERN Prod is now running two CREAM-CEs for ALICE production. The EMT is tracking a series of patches targeted for 4 April.
  • Antonio tried to get the BDII 5 patch accelerated in the EMT, but the EMT had other priorities. However, they would be happy to have volunteers to test it (please contact Antonio at pps-support AT cern.ch).

  • gLite 3.1 PPS Update 45:
    • fixes an FTS job submission problem (invalidated proxy).
    • includes new VOMS server certificates.

gLite Release News

  • gLite 3.1 Update 42 to production: in preparation, scheduled for 23-MAR-2009
    • adds a dependency on "which"
    • includes VDT 1.6.1 Release 9, which fixes a Globus bug (affects most m/w services)
    • includes a new version of the VOMS certificates

  • gLite 3.2 Update 01 to production: in preparation, scheduled for 30-MAR-2009
    • new SL5 WNs
    • signed repository for RPMs

Antonio commented that the approach is to make quick, well-focused releases (with a roll-back facility) rather than big-bang ones.

EGEE Items From ROC Reports

  • The CE ROC reported a problem with the installed capacity figures shown by GStat. Steve commented that having more free CPUs than total CPUs was “a wonderful position to be in”, but mentioned that the GStat developers were at CERN this week and that he hoped the problem would be fixed soon.
  • France had a more severe outage than anticipated last Tuesday, but was back to normal the same day.

Grid Service Interventions

  • There will be significant disruption at CERN on Thursday 19th March:
    • From 05:00 to 08:00: DNS, DHCP, DBs, EDH, NICE, email, web, AFS and lxplus will be affected.
    • From 08:00 to 13:00: LCG disks, tapes and CPUs will be affected.

Sites, ROCs etc. are requested to report any collateral damage (i.e. spot the things that break because of dependencies when they shouldn’t). Jeremy: the point of raising it today is to make sure that everyone is looking out for things that break, and that each region/country keeps an ordered list that can be combined with the others and referenced.

GridMap Update

  • For those who missed the announcement, a new and improved version of GridMap is available. See the agenda page for the list of new features. Comments are welcome at Max.Boehm AT eds.com.

WLCG Items

WLCG issues coming from ROC reports

  • None

Upcoming WLCG Service Interventions

  • Consult links on the agenda page.

WLCG Service Coordination

ATLAS Service

  • Absent

ALICE Service

  • Absent

CMS Service

  • [Pointer to the daily WLCG reports is not very helpful. A summary for this meeting would be appreciated - John]

LHCb Service

  • Roberto reported an LHCb show-stopper caused by a WMS outage, possibly related to the “mega-patch”. Restarting the WMS every two minutes is a terrible workaround.
  • He asked that the CIC portal provide a virtual memory field in the VO card (only physical memory is available now). [The obvious question is "did he submit a GGUS ticket?" - John]
  • The CNAF StoRM issue needs escalating (CNAF is not responding to the GGUS ticket). Maria: did you click on the escalate button? An FAQ on the escalation policy is available on the GGUS homepage.

Harry reiterated (or clarified, as the case may be) that the Castor and FTS services at CERN would be down on the 19th until 18:00.

OSG Items

Rob covered four escalated tickets:
  • 1 will take about a month to solve
  • 46646 and 46647 seem to be duplicates, but are awaiting customer feedback
  • 46682 was resolved on the 11th.
Maria: why can’t the progress fields be updated? Does Guenter need to debug something? Should tickets awaiting customer feedback be set to resolved? Maria was asked to take the issue off-line with Rob and Guenter.

Newly Created Action Items

  • Assigned to: All; Due date: 2009-03-23; Description: Note all problems linked to the CERN outage of the 19th

Update 23/3/09: Other than some SAM alarms due to temporary glitches with the central LFC and top-level BDII, no problems were noted.


Review of Open Action Items

Open Action Items

  • None

Actions Closed in Last 20 Days

  • None

AOB

None... and then, suddenly, Helene: GOCDB will be down on Wednesday morning for a new release, which puts the CIC portal at risk. She will send a broadcast.

Meeting ended at 16:34

Next Meeting

The next meeting will be on Monday, 23 MAR 2009, at 15:00 UTC (16:00 Swiss local time).

  • Attendees can join from 14:45 UTC (15:45 Swiss local time) onwards.
  • The meeting will start promptly at 15:00 UTC (16:00 Swiss local time).
  • The WLCG section will start at the fixed time of 15:30 UTC (16:30 Swiss local time).
  • To dial in to the conference:
    • Dial +41227676000
    • Enter access code 0148141


Topic revision: r4 - 2009-03-24 - JohnShade