Only half of the sites running a CREAM instance are passing the SAM tests for CREAM in validation. The present version of CREAM is working; sites should check what the issues are.
A new version of the simplified EGEE intervention procedures has been ratified by the ROC managers: https://edms.cern.ch/document/1032984
Sites with a reliability or availability of less than 50% during three consecutive months will be suspended and will have to go through the certification process again.
Downtimes longer than one month should be exceptional, should be approved beforehand by the corresponding ROC, and should be notified to this body.
Attendance
EGEE
Asia Pacific ROC:
Central Europe ROC:
OCC / CERN ROC: John Shade, Antonio Retico, Maite Barroso
French ROC: Helene Cordier
German/Swiss ROC: Sven, Wen Mei
Italian ROC: Paolo Veronesi
Northern Europe ROC: Thomas Bellman, Gert Svensson
Russian ROC: Lev Shamardin
South East Europe ROC: Marios Chatziangelou
South West Europe ROC:
UK/Ireland ROC: Jeremy Coles
GGUS:
GOCDB: Gilles Mathieu
Feedback on Last Week's Minutes
None was given.
EGEE Items
Grid Operator Hand Over on Duty
c-COD Team: from ROC France to ROC CE
Report from c-COD:
One ticket (GGUS #51458) for ROC_SE has been open for more than one month. I have sent ROC_SE a reminder asking them to check whether the problem can be solved.
ROC_AP has two APEL tickets that have been open for more than one month. Work is in progress for MY_MIMOS-GC-01. For TW-FTT, ROC_AP has sent a reminder and will escalate to the last step if there is no answer.
Gilles will check if there is any ticket related to these sites.
Question about the APEL test: we cannot take a site out of production because an APEL problem has remained unsolved for more than one month. How should we handle such tickets?
Good practice: the ROC should first try to isolate and debug the problem. If that does not help, involve the APEL team in the ticket.
Only half of the sites running a CREAM instance are passing the CREAM tests in validation. The present version of CREAM is working; sites should check what the issues are.
MPI tests? They are part of the CE/CREAM CE tests and are included there, in validation.
Since Monday last week: security update 56, including CREAM CE version 1.5 (PATCH:3259). It is being applied at some sites, as observed in rollout.
gLite 3.2 07 is in preparation, with VOBOX and a new version of the WMS UI configuration. It was scheduled one month ago.
Additionally, a security fix for a vulnerability is being prepared; it will be moved to production quickly, within this week. The patch will not require reconfiguration, only a restart, and it affects ~8 node types.
EGEE Items From ROC Reports
FZK [INFO]: On the 14th of October DE-KIT will run a test of the LHCOPN backup link infrastructure. We expect this intervention to be completely transparent. The link test will start at 9:00 am (CEST).
FZK [AT RISK]: Planned intervention, 20-10-2009 8:00 - 10:00 UTC. Due to the application of an Oracle patch, GridKa/DE-KIT's LHCb 3D/LFC database is at risk.
Grid Service Interventions
Consult links on the agenda page.
Miscellaneous
Recent vulnerability: 7 sites are still not patched; OSCT is following up.
EGEE ROC managers ratified the simplified intervention procedures. We are asking sites to declare downtimes correctly and on time. Gilles: this will be enforced in GOCDB as of next Wednesday, 21st of October, 2 pm UTC, and also at the CIC portal. https://edms.cern.ch/document/1032984
Sites with a reliability of less than 50% over 3 consecutive months will be suspended. Who should suspend them? Their ROC.
Downtimes longer than 1 month should be exceptional and closely followed up by the ROC.
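For illustration only, the suspension rule could be checked along the following lines. This is a minimal sketch, not part of the ratified procedure: the function name, the input layout, and the reading that a month counts against a site when either its availability or its reliability is below 50% are all assumptions made here. The authoritative wording is in the EDMS document.

```python
from typing import Sequence, Tuple

THRESHOLD = 0.50  # 50%, as stated in the minutes
WINDOW = 3        # number of consecutive months, as stated in the minutes


def should_suspend(monthly: Sequence[Tuple[float, float]]) -> bool:
    """Hypothetical helper: decide whether a site is a suspension candidate.

    `monthly` holds (availability, reliability) fractions per month,
    oldest first. Assumption: a month is "bad" when either metric is
    strictly below THRESHOLD. Returns True if WINDOW bad months occur
    in a row anywhere in the sequence.
    """
    run = 0
    for availability, reliability in monthly:
        if availability < THRESHOLD or reliability < THRESHOLD:
            run += 1
            if run >= WINDOW:
                return True
        else:
            run = 0  # a good month breaks the streak
    return False
```

A ROC could run such a check on its monthly figures to spot candidates early; the decision to suspend remains with the ROC.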