The SL5 64-bit worker node has been running at CERN for some time without problems. It will be packaged and released as soon as possible. 32-bit versions of SL5 services will only be worked on upon request and if sufficient effort is available. The preference is for the SL5 version of the middleware to be 64-bit only.
With BNL and FNAL completing their move out of the EGEE infrastructure and into the OSG infrastructure, a meeting has been set up to finalize the details of how trouble tickets will be dealt with.
Attendance
EGEE
Asia Pacific ROC: ShuTing Liao
Central Europe ROC: Malgorzata Krakowian
OCC / CERN ROC: John Shade, Antonio Retico, Nick Thackray, Steve Traylen, Maite Barosso
French ROC: Pierre Girard, Osman Aidel, Rolf Rummler
German/Swiss ROC: Angela Poschlad
Italian ROC: Alessandro Cavalli
Northern Europe ROC: Ron Trompert
Russian ROC: Lev Shamardin, Victor Edneral
South East Europe ROC: Kostas Koumantaros, Ioannis Liabotis
South West Europe ROC: Kai Neuffer, Gonzalo Merino
UK/Ireland ROC: Jeremy Coles
GGUS: Torsten Antoni
GOCDB: Gilles Mathieu
WLCG
WLCG Service Coordination: Harry Renshall, Jamie Shiers
New alarms were again raised for nodes that were already in scheduled downtime (SD). This was fixed during the week.
The new version of https://lcg-sam.cern.ch:8443/sam/sam.py?... looks more attractive, but unfortunately it is not so clear or easy to deal with cases where an alarm is in ERROR but the last SAM test shows that the corresponding service is still OK. Please submit requests for changes through GGUS.
Report from DECH Europe:
Problems Encountered during shift:
GGUS ticket: GGUS:46448. Site USCMS-FNAL-WC1 is an OSG site, so alarms should not be raised for it. This nevertheless happened this week when the site started to publish its resources in a resource group. It seems to be fixed now.
GGUS ticket: GGUS:46448. The alarm FTS-infosites on fts-t1import.cern.ch is failing because the middleware does not foresee the current production scenario in use at CERN. Developers are aware and a bug has been opened (Savannah bug #46083); the alarm will keep failing until the bug is fixed. STEVE: This is now fixed properly. If any more problems are seen, raise another ticket.
gLite 3.1 PPS Update 44 went through the deployment test and is now being installed by the remaining PPS sites. The update contains:
New version of CREAM CE (PATCH:2667, PATCH:2669). Among other things, this version provides:
Short-term proxy renewal solution in the CREAM-based CE
Fixes, in particular for BUG:44712 (problem with the lcmaps conf file used for glexec), currently affecting ALICE
[YAIM] glite-yaim-core 4.0.6 with many bug fixes (PATCH:2636, PATCH:2697)
[BDII] Default DB cache size reduced to 50Mb (PATCH:2679) for x86_64
[WN] New glite-wn-info command designed to be executed on the WN by a job submitter. It returns information about that worker node to be used in a grid context (PATCH:2757, PATCH:2758)
Release of gLite 3.1 Update 41 to production is in preparation. The update, scheduled for the 25th of February, will contain:
Update to WMS 3.1 with numerous bug fixes. ROBERTO (LHCb): Will CERN-PROD deploy this update? If so, when? EWAN: Usually within the week following the release to the production repositories.
New version of CREAM CE (PATCH:2667, PATCH:2669). Among other things, this version provides:
Short-term proxy renewal solution in the CREAM-based CE
Fixes, in particular for BUG:44712 (problem with the lcmaps conf file used for glexec), currently affecting ALICE
EGEE Items From ROC Reports
SWE ROC: We would like to know the status of the new gLite "authorization framework", the "framework to identify local T2 users" at a site. This will be dealt with next week.
UKI ROC: We received a GGUS ticket (GGUS:46475) but believe ticketing should not apply, and are still waiting for feedback on this. The CE is flagged as "Not in Production" in the GOCDB. Monitoring is turned on for troubleshooting purposes during commissioning. Our understanding is that GGUS ticketing does not apply in these circumstances. JOHN S.: Ideally SAM should not raise alarms in this case. Will look into honouring the "not in production" flag. Will check if a bug already exists and open one if not. ACTION on John.
Grid Service Interventions.
See the links on the agenda page.
SL5
OLIVER: A version of the 64-bit SL5 WN has been running under production conditions at CERN. This "testing" is now finished and the patch will be built and put into certification. It will be released as gLite 3.2.
In gLite 3.2 on SL5, 64-bit services/clients will be prioritized. 32-bit versions will be done where there is a need and resources are available.
The LHC experiments have declared (in the LCG Architects Forum) that they want 64-bit SL5 rolled out as soon as possible. Antonio: Has any formal meeting been held with the experiments to get final sign-off? Oliver: No, although the LCG Architects Forum could be seen as such. Antonio: I will do this through the PPS pilot so we can close the pilot.
WLCG Items
WLCG issues coming from ROC reports
DECH: FZK-LCG2: A new FTS instance (2.1) is in production. The two instances will run in parallel for some time until all experiments have switched to the new instance. The new service name is fts-fzk.gridka.de.
DECH: A CMS user with VOMS group /cms/dcms cannot run at CERN and various other sites, see https://gus.fzk.de/ws/ticket_info.php?ticket=46019. Not supporting this group, and probably many other groups, makes no sense; otherwise the groups are pointless. In my opinion, when a site supports a VO it should use wildcards to ensure support for all users' proxies. If it does not use wildcards, the queues in the information system should be published only for the supported groups and roles. Is there a standard way to deal with this situation? Or is it possible to exclude specific groups or roles in the information system (blacklist)? Steve T. is looking into this and all information will be put into the ticket. EWAN: It might not be technically possible to do this for the WMS. Looking into this.
Maria: GGUS:45094. Felipe Silva is very unresponsive. Rob: Added some more contact names to the ticket.
The date for the meeting to discuss streamlining of tickets to FNAL and BNL has now been set.
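The wildcard question raised in the DECH item on /cms/dcms can be illustrated with a minimal sketch. It uses Python's standard fnmatch module as a stand-in for a site's group mapping; the pattern lists and the function name is_supported are hypothetical, and real mapping tools may apply different matching rules.

```python
from fnmatch import fnmatchcase
from typing import Iterable


def is_supported(fqan: str, patterns: Iterable[str]) -> bool:
    """Return True if the VOMS group/role string matches any site pattern."""
    return any(fnmatchcase(fqan, pattern) for pattern in patterns)


# Hypothetical site configurations (not taken from any real site).
explicit_only = ["/cms", "/cms/Role=production"]  # no wildcard entry
with_wildcard = ["/cms", "/cms/*"]                # wildcard covers all subgroups

user_fqan = "/cms/dcms"

# Without a wildcard the subgroup is rejected; with one it is accepted.
print(is_supported(user_fqan, explicit_only))
print(is_supported(user_fqan, with_wildcard))
```

Under these assumptions, a site that lists only explicit groups would either have to add each subgroup by hand or publish its queues only for the groups it actually lists, which is the alternative proposed in the item.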
Newly Created Action Items
Assigned to: John Shade
Due date: 2009-03-09
Description: When an individual service at a site is marked as "not in production" in the GOCDB, but the site is "in production", SAM continues to test the service. This is not the intended functionality. Check if there is a bug outstanding on this already, and if not, create one. Update 27/2/09: It turns out that GridView does not synchronize on that particular GOCDB field, so it isn't available to SAM. The recommended workaround is to create a scheduled downtime: tests will still run, but no tickets will be raised. The requested functionality will be in the new Aggregated Topology Provider, and GOCDB will have a production attribute associated with each service. Update 9/3/09: closing during meeting.
State, Closed, Notify: no entries
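The behaviour described in this action item can be summarised as a small decision sketch. The class and function names below are hypothetical and only model what the minutes state (tests run for sites in production, a scheduled downtime suppresses tickets, and a per-service production attribute is planned); this is not SAM or GOCDB code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ServiceState:
    site_in_production: bool
    service_in_production: bool   # per-service flag, not yet visible to SAM
    in_scheduled_downtime: bool


def runs_tests(s: ServiceState) -> bool:
    # Tests run whenever the site is in production, downtime or not.
    return s.site_in_production


def raises_ticket_now(s: ServiceState) -> bool:
    # Current behaviour: only a scheduled downtime suppresses tickets.
    return s.site_in_production and not s.in_scheduled_downtime


def raises_ticket_intended(s: ServiceState) -> bool:
    # Requested behaviour: also honour the per-service production flag.
    return raises_ticket_now(s) and s.service_in_production


commissioning_ce = ServiceState(True, False, False)
with_downtime = ServiceState(True, False, True)

print(raises_ticket_now(commissioning_ce))       # ticket raised today
print(raises_ticket_intended(commissioning_ce))  # no ticket once flag is honoured
print(raises_ticket_now(with_downtime))          # workaround: no ticket
```

The sketch shows why the scheduled-downtime workaround has the same effect on ticketing as the requested flag, while still letting tests run for troubleshooting.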