Subject: GD Group Report for C5-16-Jan-2009
From: Laurence Field
Date: Thu, 15 Jan 2009 20:20:56 +0100
To: Laurence Field

LCG deployment
==============

- Total number of Sites (1): 266
- Status -> Num. Sites (1):
    ok       -> 193
    degraded -> 15
    down     -> 58
- Software -> Num. Sites (2):
    gLite-3_1_0 -> 235
    gLite-3_0_2 -> 10
    gLite-3_0_0 -> 0
- Average of concurrently running jobs during this week (3): 50.9k

(1) Sites that are Certified, in Production and that have been monitored
    by SAM during the last week under OPS credentials.
    SAM is available at: https://lcg-sam.cern.ch:8443/sam/sam.py
    To see this page one needs a grid certificate loaded in the browser.
    The calculation of the Site availability (Status) is described at:
    https://cern.ch/twiki/pub/LCG/GridView/Gridview_Service_Availability_Computation.pdf
(2) Software version is taken from the 'CE-sft-softver' CE test. Sites
    not supporting the SAM 'CE' service, or not having sent results for
    this particular test during the last week, are not counted.
(3) Job statistics taken from GStat:
    http://goc.grid.sinica.edu.tw/gstat/
    http://goc.grid.sinica.edu.tw/gstat/total/GIISQuery_Usage_job_.html

EGEE Pre-Production Service Coordination:
-----------------------------------------

2009-01-14: Release of gLite 3.1 Update 39 to production in preparation
The update, scheduled for the 21st of January, will contain:
* New version of CREAM (PATCH:2415).

2009-01-13: Pilot service of CREAM CE: in progress
* A new version of CREAM was released to the pilot. This version fixes
  BUG:45437 and BUG:45736.
* Within the SA1 coordination meeting the ROCs were invited to use the
  pilot version of CREAM for their regional installation.
* Stress test of the ICE+CREAM submission chain: a submission rate of
  40 jobs/min was attained, but a failure rate higher than expected was
  observed. The issue is currently under analysis.
* Pilot end-date moved to mid-March.
* Minutes in http://indico.cern.ch/conferenceDisplay.py?confId=47118
* Details about the pilot (planning, layout, technical info) can be
  found in the page https://twiki.cern.ch/twiki/bin/view/LCG/PpsPilotCream
* Details about the single tasks can be found in the tracker
  http://www.cern.ch/pps/index.php?dir=./ActivityManagement/SA1DeploymentTaskTracking
  specifically listing the subtasks of TASK:7981

2009-01-09: In order to improve the roll-out procedure of the BDII
service and to minimise the overall risk of service disruption, we are
looking for a production site running a top-level BDII to join the
Release test process. Upon a new update of the BDII software in
production, the site would be requested to be the first to upgrade the
BDII and to confirm that the updated service works as expected.
More info about release test procedures in
https://twiki.cern.ch/twiki/bin/view/LCG/PPS_Release_Testing
Contact: pps-support@cern.ch

2008-11-09: Pilot service of SLC5 WN at CERN: in progress
* LHCb tests on the pilot pointed out some issues with the gssklog
  mechanism when submitting from DIRAC3. The issue apparently arises
  with the newer version of VDT distributed in the WN. Under
  investigation.
* In accordance with the plans, two production CEs are being
  reconverted to use SLC5. They will be made available for production
  next week (19th of Jan).
* Details about the pilot (including planning, layout, technical info)
  can be found in the page
  https://twiki.cern.ch/twiki/bin/view/LCG/PpsPilotSLC5
* Details about the single tasks can be found in the tracker
  http://www.cern.ch/pps/index.php?dir=./ActivityManagement/SA1DeploymentTaskTracking
  specifically listing the subtasks of TASK:8350

Operational Security
====================

A security incident is currently being investigated, following
suspicious SSH connection attempts from a Chinese EGEE site to a site in
Portugal. With the information available so far, the risk for the
infrastructure is believed to be low.

CERN GRID Pre-Production Site (CERN_PPS):
-----------------------------------------

* Pre-deployment test of gLite 3.1 PPS Update 42. Done:
  - glite-AMGA (oracle)
  - CREAM CE
  - glite-WN
  - DPM MySQL
* PPS AFS UI upgraded to latest version 3.1.28.
* CREAM pilot resources of INFN-CNAF and INFN-BARI are now published by
  the CERN_PPS site-BDII so that they can be tested by SAM.
* Malfunctioning of the LB server is affecting job submission via the
  WMS pps-wms.cern.ch after PPS Update 41.
  - Bugs already reported.
  - To fix the bugs, a patch
    https://savannah.cern.ch/patch/index.php?2562
    is in certification with high priority.
  - We are in touch with the developers to obtain a workaround pending
    the fix.

SAM
-------

Production:
* Upgraded gLite UI middleware to version 3.1.23 on the SAM prod UIs.

Integration, Test & Release Report
-----------------------------------

* Patches Certified
    None
    SCAS server update received and now in certification.

* Releases
    Production gLite releases were suspended during the reporting period
    while a post-mortem was conducted on recent releases. They will
    recommence next week.

    Patches scheduled for release to PPS:
    #2652 Fixes for FQAN order, short FQANs + miscellaneous [4] x86_64
    #2680 VDT 1.6.1 Release 9 SL4/x86
    #2681 VDT 1.6.1 Release 9 SL4/x86_64
    #2701 Adding which dependency to glite-WN
    #2702 Adding which dependency to glite-WN x86_64

* Other work
    A plan for implementing rpm signing in the release is being drawn up.
    The glite-WN in a single rpm is being tested. Results are
    encouraging, but the rpm has yet to be made relocatable.

ETICS
-----

A new revision release is being tested; deployment is foreseen for next
Monday. This release includes a new version of the repository service
that automatically generates YUM repositories out of each submitted
build, to be used for deployment tests or other purposes.

We had two new loss-of-connectivity incidents this week. The first, at
around 14:30 on Monday, left AFS unavailable both from our own
repository interface and from the centrally hosted web site
https://eticssoft.web.cern.ch. The second, on Wednesday at 16:00,
affected users connecting to CERN from Germany. It is probably related
to the incident logged at around 17:00 on the IT Support Service Status
page, although that entry only mentions outgoing connectivity:
http://it-support-servicestatus.web.cern.ch/it-support-servicestatus/IncidentArchive/090114-internet.htm