WLCG-OSG-EGEE Ops' Minutes Mon 01 Dec 2008

Summary

Reminder from LHCb that sites should deploy the VOMS role pilot.
CERN has a CE that provides access to SL5 WNs (requested by CMS).

Attendance

EGEE

  • Asia Pacific ROC: Jason Shih
  • Central Europe ROC: Marcin Radecki
  • OCC / CERN ROC: John Shade, Nick Thackray, Steve Traylen, Farida Naz
  • French ROC: Pierre Girard
  • German/Swiss ROC: Angela Poschlad
  • Italian ROC: Absent
  • Northern Europe ROC: Absent
  • Russian ROC: Lev Shamardin
  • South East Europe ROC: Emanouil Atanassov
  • South West Europe ROC: Absent
  • UK/Ireland ROC: Jeremy Coles
  • GGUS: Torsten Antoni
  • GOCDB: Gilles Mathieu
  • CIC Portal: Osman Aidel

WLCG

  • WLCG Service Coordination: Harry Renshall, Jamie Shiers
  • GGUS Ticket Police: Maria Dimou

WLCG Tier 1 Sites

  • ASGC: Jason Shih
  • BNL: Absent
  • CERN site: Ewan Roche, Ulrich Schwickerath
  • FNAL: Joe Kaiser
  • FZK: Torsten Antoni
  • IN2P3: Pierre Girard
  • INFN: Absent
  • NDGF: Jens Larsson
  • PIC: Absent
  • RAL: Gareth Smith
  • SARA/NIKHEF: Absent
  • TRIUMF: Absent

LHC Experiments

  • ATLAS: Alessandro di Girolamo
  • LHCb: Roberto Santinelli
  • CMS: Absent
  • ALICE: Absent

Feedback on Last Week's Minutes

None was given, but Joe Kaiser (jumping the gun a bit) was concerned that he hadn't received any notification of the "VOMS outage". Steve pointed out that VOM-RS registrations had been blocked for 1 minute at most, and that a broadcast had been sent. All VO managers were notified, but it seems that OSG operations weren't. Steve will check, but he suggested that Rob should subscribe to the bits of interest in the CIC portal using the RSS feed.

EGEE Items

Grid Operator Hand Over on Duty

       Primary Team   Secondary Team
From   ROC DECH       ROC SEE
To     ROC SWE        ROC NE

  • Emanouil stated that definitive deadlines were needed for site suspension (3 days used to no avail, what now?). Nick confirmed that suspension should occur after three days of radio silence - but in this particular case, the site responded after the second warning.

PPS Reports

  • Please see the detailed Wiki page advertised on the agenda. With Antonio already in Abington, Nick read the text out loud, and added that there was no longer a fixed two-week period in PPS. Only deployment testing is done systematically; functionality & stress testing is only done in PPS when there are specific requests.

gLite Release News

  • The agenda contains all the details.
  • gLite 3.1 Update 37 was released to Production today. The fix to avoid recursive publishing had to be re-instated.

EGEE Items From ROC Reports

  • Only one, from IN2P3, about a downtime there affecting CEs & SEs today & tomorrow.
  • Osman intervened to say that the CIC portal would be switched to CNAF at 18:00 UTC today until 19:00 to avoid any inconveniences tomorrow during the electrical maintenance.

Nick pointed out that there had been Data Management issues with the biomed VO, but only 2 GGUS tickets were opened. He urged sites to raise tickets if they witness poor data management practices, since details are needed for investigations.

WLCG Items

WLCG issues coming from ROC reports

  • Jason from Taiwan commented on his Castor incident report (attached to the agenda). Jamie stressed that incidents should always lead to reports, and thanked Jason for his, and for having stayed up for the meeting. He said that daily reports were useful, especially when they contained messages about a problem being solved. The OCFS2 to ASM update will no doubt lead to outages, but it is hoped that it will be complete by the end of the year.

  • Nick passed the message to LCG Tier1s that after 1 working day of an incident occurring, a report to the daily WLCG meeting is needed, followed by regular updates.

  • Ulrich informed the meeting that CERN has a CE providing access to several SL5 WNs. He'd received a request from CMS to publish the CE in the production BDII, and wanted to check whether this could be done. There were no objections (2 SL5 WNs are concerned), but Harry asked that it only be done as of tomorrow morning.

Upcoming WLCG Service Interventions

  • See the various links provided on the agenda page.

ATLAS Service

  • Alessandro reminded the audience that SRMv2 is used as of today for availability calculations. He wished to inform sites that ATLAS-specific SE tests will be removed & SRMv2 added. The granularity of space-tokens is being discussed (ATLAS checks each space token). A new dashboard was developed 3 months ago to view the different results.

ALICE Service

CMS Service

  • Please refer to Daniele's notes on the agenda page.

LHCb Service

  • Roberto stated that, like ATLAS, they will be setting a few more tests to critical.
  • Since Thursday, all LHCb sites have been requested to deploy the VOMS role pilot. A tag is used in the JDL to steer jobs to appropriate sites (50 sites configured so far). Angela (FZK) wondered how many pool accounts were needed, and Roberto replied that 5-10 accounts should be sufficient.
  • Last week, Jeff Templon animated a thread about setting critical tests, including for services that GridView doesn’t use for availability calculations. Jeff considers a test critical if it implies that jobs going to the site will fail. John explained that the term "critical" is overloaded. Originally, it implied that the test was included in availability calculations, would colour the SAM portal appropriately, and raise COD alarms. However, these three things can happen independently (cf. APEL tests which raise alarms but are not included in availability calculations).
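
As an illustration of John's point, the three effects historically bundled under "critical" can be modelled as independent flags. This is only a sketch: the class name `SamTestFlags`, its field names, and the portal-colouring value chosen for APEL are assumptions made for the example, not SAM or GridView settings.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SamTestFlags:
    """Hypothetical record of the three effects a 'critical' test may have."""
    name: str
    in_availability: bool    # counted in availability calculations
    colours_portal: bool     # changes the site's colour in the SAM portal
    raises_cod_alarm: bool   # raises an alarm for the operator on duty (COD)

    def is_critical_in_original_sense(self) -> bool:
        # Originally "critical" implied all three effects at once.
        return (self.in_availability
                and self.colours_portal
                and self.raises_cod_alarm)


# APEL, per the minutes: raises alarms but is not in availability calculations.
# colours_portal=False is an assumption; the minutes do not state it.
apel = SamTestFlags("APEL", in_availability=False,
                    colours_portal=False, raises_cod_alarm=True)

print(apel.raises_cod_alarm, apel.is_critical_in_original_sense())
```

Keeping the flags separate makes it possible to ask "does this test raise an alarm?" without implying anything about availability figures.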

Alessandro pointed out that the new dashboard had been developed to be more flexible (e.g. offers the ability to decouple site critical tests, data management tests, etc.). Alessandro suggested adding the link to minutes, and lo and behold, here it is! He said that WMS tests should maybe be critical so as to appear in some of the higher-level tools (like GridMap), and mentioned a possible criticality ranking (1-10).

WLCG Service Coordination

  • Nick mentioned that, as always, the link to the recommended versions of storage s/w was on the agenda page, but Jamie pointed out that the link was stale. Nick will update it!

OSG Items

Nothing from Rob's side, but Maria queried GGUS:43840, which was opened on 20-Nov, concerned a problem at SLAC, and was labelled urgent. The ticket had been reacted to immediately, but there had been no activity since. Joe had checked that the ticket had gone to the correct T2 in sunny California on the day it was opened. He will chase the T2.

Action Items

Newly Created Action Items

Assigned to Due date Description State Closed Notify  
Main.OCC 2007-03-05 Example Action Item 2007-03-06 SteveTraylen

Review of Open Action Items

Open Action Items

Id  Submitter  Description  Creation  Due  Assigned To

Actions Closed in Last 20 Days

Id  Submitter  Description  Creation  Due  Assigned To  Closed

Next Meeting

The next meeting will be Monday, 8 DEC 2008 15:00 UTC (16:00 Swiss local time).

  • Attendees can join from 14:45 UTC (15:45 Swiss local time) onwards.
  • The meeting will start promptly at 15:00 UTC (16:00 Swiss local time).
  • The WLCG section will start at the fixed time of 15:30 UTC (16:30 Swiss local time).
  • To dial in to the conference:
    • Dial +41227676000
    • Enter access code 0157610


Topic revision: r2 - 2008-12-02 - JohnShade
 