GD Group Report for C5-30-May-2008
==================================

LCG deployment
==============

- Total number of Sites (1): 286

- Status -> Num. Sites (1):
    ok       -> 189
    degraded -> 17
    down     -> 78
    na       -> 2

- Software -> Num. Sites (2):
    gLite-3_1_0 -> 202
    gLite-3_0_2 -> 37
    gLite-3_0_0 -> 1
    LCG-2_7_0   -> 2

- Average of concurrently running jobs during this week (3): ~19k

(1) Sites that are Certified, in Production and that have been monitored
    by SAM during the last week under OPS credentials.
    SAM is available at:
    https://lcg-sam.cern.ch:8443/sam/sam.py
    To see this page one needs a grid certificate loaded in the browser.
    The calculation of the Site availability (Status) is described at:
    https://cern.ch/twiki/pub/LCG/GridView/Gridview_Service_Availability_Computation.pdf

(2) Software version is coming from the 'CE-sft-softver' CE test.
    Sites not supporting SAM 'CE' service, or not having sent results
    for this particular test during the last week, are not counted.

(3) Job statistics taken from GStat:
    http://goc.grid.sinica.edu.tw/gstat/
    http://goc.grid.sinica.edu.tw/gstat/total/GIISQuery_Usage_job_.html

EGEE Pre-Production Service Coordination:
-----------------------------------------

* Pilot of AMGA at CERN_PPS in progress
    * Installation completed: LHCb is now using the service for the
      integration of the bookkeeping service

* A self-consistent draft of the "PPS Service Description" was sent to
  the EGEE ROC Managers for comments:
  https://twiki.cern.ch/twiki/bin/view/LCG/PreProductionServiceDescription
  The document is also being circulated in EGEE/NA4.

* Set-up of the pre-production LB notification service for ARDA in
  progress: analysis with developers is under way.

* *gLite3.1.0 PPS Update29* was released to PPS. The update, currently
  in pre-deployment testing, affects the BDII service, namely:
    - The contents of the GlueSite object have been improved, resulting
      in changes to YAIM to reflect the recommendations in
      http://goc.grid.sinica.edu.tw/gocwi...
    - Both the site and the top level BDIIs publish the GlueService
      object using a new information service provider (see details in
      PATCH:1278).
    - BDII_regions_URL variables are now required by YAIM if they are
      not defined in the config files

* After pre-deployment testing the PPS is currently upgrading to
  *gLite3.1.0 PPS Update28*. The update contains:
    - WMS: fix for minor configuration bug

* *gLite3.1 Update25* to production in preparation. The update, to be
  released on Thursday 29th May, contains:
    - glite-WMS 3.1 for SL4
    - glite-LB 3.1 for SL4
    - SGE-utils - new jobmanager version supporting DGAS job records

* *gLite3.0 Update43* released to production. The update contains:
    .. lcg-vomscerts-5.0.0 with new host certificate for the VOMS
       server vo.racf.bnl.gov
       Affected metapackages:
       - lcg-RB
       - glite-SE_classic
       - glite-VOBOX
       - glite-WMS
       - glite-LB
       - glite-WMSLB
       The following metapackages, now supported with gLite version
       3.1, are affected as well if still deployed at some sites in
       version 3.0:
       - lcg-CE
       - lcg-CE_torque
       - glite-LFC_mysql
       - glite-LFC_oracle
       - glite-SE_dpm_disk
       - glite-SE_dpm_mysql
       - glite-SE_dpm_oracle

* *gLite3.1 Update24* was released to production. The update contains:
    .. lcg-vomscerts-5.0.0 with new host certificate for the VOMS
       server vo.racf.bnl.gov
       The affected metapackages are:
       - lcg-CE
       - lcg-CE_torque
       - glite-LFC_mysql
       - glite-LFC_oracle
       - glite-SE_dpm_disk
       - glite-SE_dpm_mysql
       - glite-SE_dpm_oracle
    .. Yaim core and yaim lcg-ce 4.0.4 series
       - Job Priorities Implementation

* The WMS 3.1 PPS pilot at CERN-PROD and INFN CNAF for ATLAS and CMS
  was successfully terminated. More details in
  https://twiki.cern.ch/twiki/bin/view/LCG/PPIslandFollowUp2008x05x22

CERN GRID Pre-Production Site (CERN_PPS):
-----------------------------------------

* Pre-deployment test for PPS-update 29
    - on top-BDII and site-BDII: done
  Test was OK.

* Pre-production instance of the AMGA server has been started.
  LHCb has started running tests.
* pps-wms.cern.ch was reinstalled with SL4 and the latest WMS for
  gLite 3.1 on SL4. Tested. Required workarounds applied.
  The service is OK now.

SAM:
----

Nothing to report other than standard bug-fixing activities and the
implementation of Lemon sensors to test the health of the SAM service
(proactive detection of problems with test submissions and other
anomalies).

Grid operational Security:
--------------------------

Nothing to report this week.

gLite 3.x Integration & Build:
------------------------------

- Certification repository

  gLite 3.0 ---------------------------------------
  .. Presently
     .. 0 in preparation
     .. 0 in configuration
     .. 0 in certification

  gLite 3.1 ---------------------------------------
  .. Presently
     .. 0 in preparation
     .. 1 in configuration
     .. 18 in certification

- PPS repository

  gLite 3.0 ---------------------------------------
  .. No new release (latest: 3.0.2 PPS Update 49)
  .. Next set of patches scheduled for release to PPS : None

  gLite 3.1 ---------------------------------------
  .. 3.1 PPS Update 29
       #1786 Updated Yaim BDII
       #1854 New yaim to fix the bug #36982 in WMS patch 1726
  .. Next set of patches scheduled for release to PPS : None

- Production repository

  gLite 3.0 ---------------------------------------
  .. 3.0.2 Update 43
       #1810 R3.0 lcg-vomscerts-5.0.0 ...
       #1811 R3.0 WMS lcg-vomscerts-5.0.0 ...
  .. Next set of patches scheduled for release to production : None

  gLite 3.1 ---------------------------------------
  .. 3.1 Update 24
       #1709 [ YAIM ] yaim core and yaim lcg-ce 4.0.4 series Job Priorities implementation
       #1812 R3.1 lcg-vomscerts-5.0.0 adds next cert for vo.racf.bnl.gov
       #1813 [ YAIM ] yaim core 4.0.4-2 containing a quick fix
  ..
Next set of patches scheduled for release to production:
       #1726 gLite 3.1 WMS for slc4/i386 platform
       #1727 gLite 3.1 LB for slc4/i386 platform
       #1809 New JobManager version for SGE
       #1820 New YAIM for WMS to fix bug 36476

gLite 3.x testing & Certification:
----------------------------------

* Certification
    - patches certified:
        #1854: New yaim to fix the bug #36982 in WMS patch 1726
    - patches rejected:
        #1522: glite-CONDOR_utils for lcg-CE3.1

  CREAM / lcas/lcmaps / glexec all currently in certification.
  Blocking glexec issue identified - #37063.

* Configuration
  Work on integration of WLCG monitoring tools

* Testing
  Testing of yaim configuration for cluster publishing recommendations
  of WN Resource Working Group

Previous (full) reports can be consulted at:
https://twiki.cern.ch/twiki/bin/view/LCG/GDC5Reports

--Zdenek