Subject: GD Group Report for C5-06-Feb-2009
From: Laurence Field
Date: Thu, 5 Feb 2009 18:50:36 +0100
To: "c5-members (C5 - Members)"
CC:

Response to the question from the previous C5 meeting.
====================================

Based on work originally done with the WLCG monitoring working group, Nagios is seen as a replacement for SAM for remote testing of grid sites from the ROC within EGEE. The results of these tests will then be used to calculate availability for the sites. CERN ROC will use this system to monitor the sites within the CERN region.

We also supply Nagios as a reference implementation of site monitoring for grid sites, in particular those that currently deploy no site monitoring. Sites are free to deploy a different site monitoring system (e.g. LEMON) and monitor their sites to an equivalent level using it.

SAM Prod Service
============

All 4 SAM Production machines have been replaced with more powerful quad-core servers for performance reasons.

The version of the SAM portal that embeds GridView test history graphs in the SAM history pages is now installed in Production.

The proper separation of the database topology objects used by SAM and GridView was carried out successfully on Wednesday. Thanks to careful planning and testing in Validation, as well as close and fruitful collaboration with the IT/DM team, the downtime of the SAM and GridView services was kept to a minimum.

There have been several occurrences of strange SAM test failures due to "request expired" on the WMS. These are being investigated by WMS support.

EGEE Pre-Production Service Coordination:
============================

2009-02-04: gLite 3.1 Update 40 and gLite 3.0 Update 45 were released to production.
Release notes are at http://glite.web.cern.ch/glite/packages/R3.1/updates.asp and http://glite.web.cern.ch/glite/packages/R3.1/x86_64/updates.asp

CERN_PPS
=======

* PPS AFS UI upgraded to the latest version 3.1.29 (update43)
* Latest version of lcg-CAs v1.27-1 tested.

Operational Security
==============

A new Security Service Challenge campaign (i.e. a security drill) will be launched in the coming weeks against the different WLCG Tier1s. The challenge will be the same as last year. The objective is to simulate a grid security incident at a given site, then carefully monitor and review the response of the site in order to make it more effective. More information is available at: http://cern.ch/osct/ssc.html

Integration, Test & Release
==================

Implementation of release rollback in the gLite release scripts has started.

Draft reference cards for gLite services have been completed: http://twiki.cern.ch/twiki/bin/view/EGEE/ServiceReferenceCards