GD Group Report for C5-28-Mar-2008
===================================

LCG deployment:
---------------

- Total number of Sites (*): 242

- Software -> Num. Sites (*):
    gLite-3_1_0 -> 163
    gLite-3_0_2 ->  68
    gLite-3_0_0 ->   1
    LCG-2_7_0   ->   2
    unknown     ->   8

- Status -> Num. Sites (*):
    ok       -> 178
    degraded ->  12
    down     ->  52

- Average of concurrently running jobs during this week (+): ~25k

(*) Sites that are Certified _and_ Production _and_ Monitored by SAM:
    https://lcg-sam.cern.ch:8443/sam/sam.py
    To see this page one needs a grid certificate loaded in the browser.
    The calculation of the Site availability (Status) is described at:
    http://goc.grid.sinica.edu.tw/gocwiki/SAM_Metrics_calculation
    The software version comes from the 'CE-sft-softver' CE test.
    Sites not supporting the SAM 'CE' service, or not having sent results
    for this particular test during the last week, are counted as 'unknown'.

(+) Job statistics taken from GStat:
    http://goc.grid.sinica.edu.tw/gstat/
    http://goc.grid.sinica.edu.tw/gstat/total/GIISQuery_Usage_cpu_.html
    For the time being we do not report CPU numbers:
    1. Not all the reported CPUs are actually available for grid jobs.
    2. Sites with multiple CEs may have their CPUs double-counted.
    3. GStat includes sites that are not considered by the SFTs.

EGEE Pre-Production Service Coordination:
-----------------------------------------

Nothing to report this week.

CERN GRID Pre-Production Site (CERN_PPS):
-----------------------------------------

Nothing to report this week.

WLCG Transfer Service:
----------------------

Nothing to report this week.

SAM (Service Availability Monitoring):
--------------------------------------

* Closed a number of Savannah bugs, and released a new CA (Certificate
  Authority) test. The release caused problems because an obscure
  modification on one of the SAM UIs made it start submitting jobs to
  WMS rather than RBs. The WMS server used (wms112.cern.ch) caused
  random false failures for many CEs in all EGEE ROCs.
  The WMS node is being checked, and SAM has reverted to using RBs only.
  Problem period: from Wednesday 19 March 21:00 to Tuesday 25 March 14:00.

* After the switch back to RBs, some jobs were still being submitted
  using WMS syntax, which caused job submission failures at certain
  sites. This created some confusion, but was relatively harmless since
  one UI was working correctly, and the tests only threw warnings.

* In terms of bug fixes, an urgent patch was applied to ensure that SRMs
  publishing themselves as "srm" rather than "SRM" are not filtered out
  of the SAM database. Note that this is a workaround for the fact that
  YAIM tagged services in a way that does not conform to the Glue
  standard, which specifies that SRMs should be of type "SRM" and have
  a version attribute. Several variants exist in practice, such as
  srm_v1, srm_v2, SRM, and srm.

* SAM sensors release 1.4.3-2 was installed in Validation on 2008-03-26.
  The RPM was updated to contain CA v1.20. The submission framework was
  modified slightly to improve the retrieval of stdout messages from
  tests. This exposed a bug in HEPIX which required a workaround (HEPIX
  have since fixed the bug). This release also fixes a bug affecting
  SEE-GRID, who use environment variables within JDLs to change the
  location of test files on Worker Nodes.

* See https://twiki.cern.ch/twiki/bin/view/LCG/SAMProdServUnavail for
  outage information.

Grid Operational Security:
--------------------------

* The minutes of the recent OSCT meeting are available at:
  http://indico.cern.ch/materialDisplay.py?materialId=minutes&confId=29322

* Security Services Challenges are still in progress. All EGEE Tier1s
  except France have been challenged so far. More information about the
  challenges is available at:
  http://cern.ch/osct/ssc.html

Grid Authentication & Authorization Services:
---------------------------------------------

Nothing to report this week.

gLite 3.x Build & Integration:
------------------------------

Nothing to report this week.

gLite 3.x Testing & Certification:
----------------------------------

* Certification - patches certified:
    #1645 R3.1/SLC4/x86_64: GFAL/lcg_util update
    #1680 R3.1/SLC4/x86_64: GFAL 1.10.8

* Releases
    No middleware releases this week.

Grid User Support:
------------------

Nothing to report this week.

The full GD report can be consulted on:
---------------------------------------

https://twiki.cern.ch/twiki/bin/view/LCG/GDC5Reports

---Zdenek