Dear FR CCOD,
this morning there are a lot of items on the dashboard:
- ROC_CERN: tickets expired for CA-ALBERTA-WESTGRID-T2, CERN-PROD and
CEFET-RJ, old alarm for CERN-PROD
- ROC_AP: tickets unsolved for 30 days for VN-IOIT-KEYLAB, ticket
expired for BEIJING-LCG2
- ROC_NE: ticket expired for KTU-BG-GLITE
Let's see whether the ROD teams handle these items properly in the next few hours. As in past weeks,
an item for discussion at the operations meeting may be the APEL problem that has affected two AP
sites (TW-FTT and IN-DAE-VECC-02) for some months:
GGUS:52103, GGUS:51229.
ROC_AP has involved the APEL supporters to try to solve the problems.
Best Regards,
Alessandro Paolini
Response from ROC CERN:
ALBERTA: the site BDII is failing, but the problem might be due to the top-level BDII.
Last week a 2-month pilot started, focused on glexec and Argus. It is being run in collaboration with the VOs, especially ATLAS; 5 sites are involved. There are no 'vacancies', but if a site is interested, the contact e-mail is on the wiki page above.
We started the staged rollout of several updates, for 3.1 and 3.2.
UPDATE on 3.2: started today.
Release notes are linked from the release page above. The update contains new versions of glexec, SCAS, CREAM, Torque and Maui. Please note that server and client versions of Torque have to match.
The list of sites is on the release page.
Some services are not covered: SCAS, VO-BOX, Torque server.
If you are interested in participating, let Antonio know; the mailing list is grid-deployment-managers@cern.ch. The aim is to release to production on December 8th.
UPDATE on 3.1: started last Friday.
Torque. Known issue: conflicts with MPI; the fix is currently in certification.
Update of WMS: update of ICE
Update of CREAM: many bug fixes
Vacancy: mpi-utils. If anyone is interested, contact Antonio at the address above.
The list of vacancies is always at the end of the release wiki page.
EGEE Items From ROC Reports
ROC_Canada and Russia did not validate their reports this week.
South West Europe ROC: Last week asked "There is a new value in gstat2.0: GlueCEPolicyAssignedJobSlots, which is not queried yet by SGE (Sun Grid Engine). Therefore, our SGE sites will have a critical error. According to a mail from Gonçalo Borges, the request to query this variable has not properly reached the SGE supporters. Is it possible to change the error to a warning until they have implemented it in SGE?"
This is now released within the gstat-validation RPM.
Germany-Switzerland: FZK [Announcement]: Planned OUTAGE: Tuesday, 1. December, 09:00 - 11:00 UTC atlas dCache atlassrm-fzk.gridka.de: During this downtime we will update all ATLAS dCache pools and the headnode to the newest officially released dCache version (1.9.5-9 or higher).
John Shade: today 27 CREAM CEs pass the tests, 9 do not, and one is down. Tomorrow I will suggest at the SA1 CM that the test be made critical.
Jason Shih: the update of the gstat hostcert caused a short outage of SAM status publishing. The hostcert was updated around 'Nov 30 09:43', but the operator forgot to correct the group permission; this was solved by Joanna 10 minutes after the event was escalated. It may have caused the SAM status to be missing from the gstat page.
Next Meeting
The next meeting will be Monday, 07 Dec 2009 15:00 UTC (16:00 Swiss local time).
Attendees can join from 14:45 UTC (15:45 Swiss local time) onwards.
The meeting will start promptly at 15:00 UTC (16:00 Swiss local time).