Meeting got underway five minutes late due to Nick and his problematic laptop.
Feedback on Last Week's Minutes
None was given.
EGEE Items
Grid Operator Hand Over on Duty
| | Primary Team | Secondary Team |
| From | ROC DECH | ROC CERN |
| To | ROC Russia | ROC Italy |
For the two Italian sites mentioned in the agenda (INFN Napoli, INFN Lecce), Paolo said that Lecce will be set to uncertified until the site manager responds, and Napoli should be fixed today. The Russian COD asked what to do about the Lecce site ticket. Nick replied that GGUS:39533 can be closed & the other one can be closed when the site emerges from downtime.
PPS Reports
None received.
gLite Release News
gLite 3.1 Update 29 was released to production with LFC and DPM patches.
The release of gLite 3.1 Update 30 to production is in preparation. The update, to be released next Thursday, will affect the vast majority of services. It will contain, notably:
Cream CE and clients
A patch to Globus VDT, fixing the issue raised with BUG:37563 (limit in proxy delegation chain)
dCache 1.8.0-15p5
New torque clients for 64bit WNs
GFAL/lcg_util bugfix release
EGEE Items From ROC Reports
None seen this week & none raised during meeting.
WLCG Items
Well ahead of schedule at 16:12...
Free beer. This point inserted by minute-taker to see who bothers to read the minutes. Send me a mail to claim yours.
WLCG issues coming from ROC reports
None.
End points for FTM service at tier-1 sites
End-points of the FTM service: no mails were received from BNL or NDGF. There was no-one on the line from BNL, but Jens from NDGF stated that the service is currently being installed, and he will send Nick the end-point. RAL is also missing but they’re on holiday today.
FTS SL4 - required by the experiments or tier-1 sites?
The FTS situation is somewhat unclear. It's available on SL4, but buggy. The question is whether WLCG actually wants to have FTS on SL4. Most said they were not keen to make changes due to stability concerns. Nick then asked those present what their position was:
INFN: difficult to update without service downtime (half a day)
NDGF: will probably be under pressure to upgrade this winter
PIC: stable for the moment
The question that needs to be decided is when we will upgrade (during LHC downtime?)
FTS patches in production should not be considered as production – Antonio hopes to clear up the mess “somehow”. In the meantime, please don’t install SL4 FTS.
As an aside, Roberto wanted to know when the Globus patch will be released; Antonio expected Thursday.
The fix for the WMS proxy mix-up will be fast-tracked because the EMT thinks it’s important.
Roberto: WMS proxy is not VOMS-aware, which is a problem (one user using two roles doesn’t work).
Upcoming WLCG Service Interventions
Four interventions at the larger sites were mentioned (see agenda or CIC portal).
Some ATLAS data will be unavailable during the NDGF downtime.
WLCG Service Coordination
Harry pointed out that the link in the agenda was wrong, but there was nothing notable to mention that isn't in the reports from the experiments.
ATLAS Service
Alessandro: Has the Tier-2 list of sites been added to the GGUS team link? Torsten: no, this will be done only once the GOCDB interface is working (to avoid manual interventions). No timescale other than “pretty soon”.
Alessandro: is it possible to have a common shifter at Point 1 who can submit team tickets? Torsten, who just got back to the office, had no update but pointed out that the Savannah shopping-list ticket is open.
ALICE Service
CMS Service
See the details on the agenda page, but the summary is:
Continuing export activities to Tier 1s. Jamboree on Wednesday to wrap up.
CMS on-line DB issues due to CMS internal communication glitch.
Castor problems & degraded PhEDEx service. More than 40K stager “gets” for non-existent files, working on a solution.
Tier 1s receiving data, but not at a high rate.
LHCb Service
See the details on the agenda page or the CIC weekly report filled by Roberto. Roberto had four points for the assembly:
Is it valid that a downtime declared less than 24 hours in advance should be considered as scheduled? Nick/Harry replied that the length of the downtime determines how far in advance it needs to be declared.
Cf. the CNAF problems: sites should have fabric monitoring, or how about a SAM test (ATLAS have something)? He emphasised that loading shared libraries should not take several minutes. Action on Nick to remind sites that the shared area is a critical service.
GridView & SRM service needs to be fixed to include SRMv2 in availability calculations. GridView is using a hard-coded & perhaps obsolete list of services. John replied that SAM's SRMv2 tests are in validation and will be released shortly. In due time, GridView will include the results in site availability calculations.
The alarm ticket for CNAF never arrived. The “why” needs to be clarified. Diana said that ROC Italy’s connection to GGUS was at fault. Diana will check the GGUS side of things. ROC Italy needs to check on its side, but their main man is on vacation. The mail arrived, but the corresponding ticket in GGUS wasn’t found. N.B. GGUS tickets can always be replied to (even by people not registered in GGUS) and this will provide an update to the ticket.
Antonio queried whether LHCb could test the glexec that’s in pre-production. Roberto will raise this in their task-force meeting tomorrow.
The recommended dCache version has now been updated (see list of base services on agenda page).
Rob Quick, live from CERN, stated that there were no open tickets for OSG. On the other side of the table, Maria was happy.
Action Items
Newly Created Action Items
| Assigned to | Due date | Description | State | Closed | Notify |
| Main.OCC | 2008-09-01 | Remind sites that the shared area is a critical service. Update 10/9/08: Nick sent an EGEE broadcast about this. In fact, he sent two, to explain that this only concerned sites supporting VOs that had explicitly mentioned the shared software area in their respective VO cards. | | | |