None was given, but Joe Kaiser (jumping the gun a bit) was concerned that he hadn't received any notification of the "VOMS outage". Steve pointed out that VOM-RS registrations had been blocked for 1 minute at most, and that a broadcast had been sent. All VO managers were notified, but it seems that OSG operations weren’t. Steve will check, but he suggested that Rob should subscribe to the bits of interest in CIC portal using the RSS feed.
EGEE Items
Grid Operator Hand Over on Duty
            Primary Team    Secondary Team
From        ROC DECH        ROC SEE
To          ROC SWE         ROC NE
Emanouil stated that definitive deadlines were needed for site suspension (3 days used to no avail, what now?). Nick confirmed that suspension should occur after three days of radio silence - but in this particular case, the site responded after the second warning.
PPS Reports
Please see the detailed Wiki page advertised on the agenda. With Antonio already in Abington, Nick read the text out loud, and added that there was no longer a fixed two-week period in PPS. Only deployment testing is done systematically; functionality & stress testing is only done in PPS when there are specific requests.
gLite Release News
The agenda contains all the details.
gLite 3.1 Update 37 was released to Production today. The fix to avoid recursive publishing had to be re-instated.
EGEE Items From ROC Reports
Only one, from IN2P3, about downtime there affecting CEs & SEs today & tomorrow.
Osman intervened to say that the CIC portal would be switched to CNAF from 18:00 to 19:00 UTC today, to avoid any inconvenience during tomorrow's electrical maintenance.
Nick pointed out that there had been Data Management issues with the biomed VO, but only 2 GGUS tickets were opened. He urged sites to raise tickets if they witness poor data management practices, since details are needed for investigations.
WLCG Items
WLCG issues coming from ROC reports
Jason from Taiwan commented on his Castor incident report (attached to the agenda). Jamie stressed that incidents should always lead to reports, and thanked Jason for his, and for having stayed up for the meeting. He said that daily reports were useful, especially when they contained messages about a problem being solved. The OCFS2 to ASM update will no doubt lead to outages, but it is hoped that it will be complete by end of the year.
Nick passed the message to LCG Tier1s that within 1 working day of an incident occurring, a report to the daily WLCG meeting is needed, followed by regular updates.
Ulrich informed the meeting that CERN has a CE providing access to several SL5 WNs. He'd received a request from CMS to publish the CE in the production BDII, and wanted to check whether this could be done. There were no objections (2 SL5 WNs are concerned), but Harry asked that it only be done as of tomorrow morning.
Upcoming WLCG Service Interventions
See the various links provided on the agenda page.
ATLAS Service
Alessandro reminded the audience that SRMv2 tests are used as of today for availability calculations. He wished to inform sites that ATLAS-specific SE tests will be removed & SRMv2 tests added. The granularity of space-tokens is being discussed (ATLAS checks each space token). A new dashboard was developed 3 months ago to view the different results.
ALICE Service
CMS Service
Please refer to Daniele's notes on the agenda page.
LHCb Service
Roberto stated that, like ATLAS, they will be setting a few more tests to critical.
Since Thursday, all LHCb sites have been requested to deploy VOMS role pilot. A tag is used in the JDL to steer jobs to appropriate sites (50 sites configured so far). Angela (FZK) wondered how many pool accounts were needed, and Roberto replied that 5-10 accounts should be sufficient.
Last week, Jeff Templon animated a thread about setting critical tests, including for services that GridView doesn’t use for availability calculations. Jeff considers a test critical if it implies that jobs going to the site will fail. John explained that the term "critical" is overloaded. Originally, it implied that the test was included in availability calculations, would colour the SAM portal appropriately, and raise COD alarms. However, these three things can happen independently (cf. APEL tests which raise alarms but are not included in availability calculations).
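John's point about the overloaded term can be illustrated with a minimal sketch in which the three consequences of "critical" are kept as separate flags. This is not how SAM or GridView are implemented; the class, field, and test names below are hypothetical, and only the APEL case (alarms raised, not counted for availability) comes from the discussion above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SamTestPolicy:
    """Hypothetical record: one flag per meaning of 'critical'."""
    name: str
    in_availability: bool   # counted in availability calculations
    colours_portal: bool    # colours the SAM portal
    raises_alarm: bool      # raises COD alarms

    @property
    def legacy_critical(self) -> bool:
        # The original meaning of "critical": all three at once.
        return self.in_availability and self.colours_portal and self.raises_alarm


POLICIES = [
    # Hypothetical test that is critical in the original sense.
    SamTestPolicy("job-submit", True, True, True),
    # APEL: raises alarms but is not in availability calculations
    # (from the minutes); the portal flag here is an assumption.
    SamTestPolicy("apel", False, True, True),
]


def names_where(policies, flag):
    """Return the names of tests for which the given flag is set."""
    return [p.name for p in policies if getattr(p, flag)]
```

With the flags decoupled, `names_where(POLICIES, "raises_alarm")` and `names_where(POLICIES, "in_availability")` can return different sets of tests, which is the situation John described.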
Alessandro pointed out that the new dashboard had been developed to be more flexible (e.g. it offers the ability to decouple site critical tests, data management tests, etc.). Alessandro suggested adding the link to the minutes, and lo and behold, here it is! He said that WMS tests should maybe be critical so as to appear in some of the higher-level tools (like GridMap), and mentioned a possible criticality ranking (1-10).
WLCG Service Coordination
Nick mentioned that, as always, the link to the recommended versions of storage s/w was on the agenda page - but Jamie pointed out that the link was stale. Nick will update it!
Nothing from Rob's side, but Maria queried GGUS:43840, which was opened on 20-Nov, concerned a problem at SLAC, and was labelled urgent. The ticket had been reacted to immediately, but there had been no activity since. Joe had checked that the ticket had gone to the correct T2 in sunny California on the day it was opened. He will chase the T2.