LCG deployment
==============

- Total number of Sites (1): 264

- Status -> Num. Sites (1):
    ok       -> 219
    degraded -> 6
    down     -> 39

- Software -> Num. Sites (2):
    gLite-3_1_0 -> 238
    gLite-3_0_2 -> 11
    gLite-3_0_0 -> 1

- Average of concurrently running jobs during this week (3): 41.2k

(1) Sites that are Certified, in Production and that have been monitored
    by SAM during the last week under OPS credentials.
    SAM is available at:
    https://lcg-sam.cern.ch:8443/sam/sam.py
    To see this page one needs a grid certificate loaded in the browser.
    The calculation of the Site availability (Status) is described at:
    https://cern.ch/twiki/pub/LCG/GridView/Gridview_Service_Availability_Computation.pdf

(2) The software version comes from the 'CE-sft-softver' CE test.
    Sites that do not support the SAM 'CE' service, or that have not sent
    results for this particular test during the last week, are not
    counted.

(3) Job statistics taken from GStat:
    http://goc.grid.sinica.edu.tw/gstat/
    http://goc.grid.sinica.edu.tw/gstat/total/GIISQuery_Usage_job_.html

EGEE Pre-Production Service Coordination:
==============================

2008-12-17: gLite 3.1 Update 38 was released to production.
The update contains:
* First release of Hydra (encryption of files on SE) for SLC4 32/64bit
* Bug fixes of Proxy renewal mechanism on FTA (PATCH:2344)
* MyProxy: Info provider configuration + improvements (PATCH:2518)
* lcg-vomscerts: renamed all certificates with ".pem" suffixes because
  of BUG:43395 (PATCH:2598, PATCH:2599)
Release notes in
http://glite.web.cern.ch/glite/packages/R3.1/updates.asp
and
http://glite.web.cern.ch/glite/packages/R3.1/x86_64/updates.asp

2008-12-17: Pilot service of SLC5 WN at CERN: in progress
* Two days of testing were reserved for LHCb as agreed, and the CE ce110
  is now back publishing the 'Preproduction' state. ALICE can use the CE
  again as of Thursday 18; in total we now have 58 nodes behind this CE.
  They are expected to go into production before Christmas. A downtime
  for ce118 and ce119 was scheduled, and they will be drained over the
  Christmas break before being attached to the pilot
* A check-point meeting was held with the site and the VOs to define a
  detailed timeline for the pilot activities
* The tentative end-date of the pilot was agreed to be the end of January
* The existing nodes were upgraded with the new WN software, based on
  Java 1.6 and VDT1.10
* Minutes of the check-point meeting can be read at
  https://twiki.cern.ch/twiki/bin/view/LCG/PPIslandFollowUp2008x12x11
* Details about the pilot (including planning, layout, technical info)
  can be found in the page
  https://twiki.cern.ch/twiki/bin/view/LCG/PpsPilotSLC5
* Details about the individual tasks can be found in the tracker
  http://www.cern.ch/pps/index.php?dir=./ActivityManagement/SA1DeploymentTaskTracking
  specifically listing the subtasks of TASK:8350

2008-12-14: Pilot service of CREAM CE: in progress
* A new version of CREAM corresponding to PATCH:2667 was released by the
  developers and was installed on the CREAM PPS pilot
* A request was sent to PPS site admins and the EGEE regional managers
  to join for an extension of the pilot. The request was also presented
  at the EGEE SA1 Coordination meeting
  (http://indico.cern.ch/conferenceDisplay.py?confId=44163)
* Details about the pilot (planning, layout, technical info) can be
  found in the page
  https://twiki.cern.ch/twiki/bin/view/LCG/PpsPilotCream
* Details about the individual tasks can be found in the tracker
  http://www.cern.ch/pps/index.php?dir=./ActivityManagement/SA1DeploymentTaskTracking
  specifically listing the subtasks of TASK:7981

2008-12-12: gLite 3.1 PPS Update 41 went through the PPS deployment test
and it is now being installed by the remaining PPS sites.
This update contains:
* hot fix to glite-BDII, glite-SE_dcache_info and lcg-CE (already
  deployed in production with PATCH:2649 and PATCH:2651)
* New WMS 3.1.100 with support for ARC CEs (PATCH:1841)
* Enhancements of the glite-yaim-condor-utils and related configuration
* trustmanager configure.sh fix for new bouncycastle (PATCH:2644,
  PATCH:2645)
* fix to glite-yaim-mon and APEL to deal with bcprov location (effect of
  upgrade of bouncycastle) (PATCH:2647)
* Bug fixes in glite-yaim-clients 4.0.5 for x86_64 (PATCH:2672)
Release notes in
https://twiki.cern.ch/twiki/bin/view/EGEE/PPSReleaseNotes_310_PPS_Update41
Deployment test reports in:
http://www.cern.ch/pps/index.php?dir=./release/testreports/gLite3.1.0/gLite3.1.0-PPS-UPDATE41/

Service Availability Monitoring
=====================

Problems with both our Production WMS systems last Thursday caused
intermittent job failures for almost a day. A Tomcat/Oracle problem on
Sunday prevented test results from being published for about six hours.
Finally, an operational error caused the SAM cluster to be unavailable
for about two hours on Tuesday. All these unavailabilities are described
at https://twiki.cern.ch/twiki/bin/view/LCG/SAMProdServUnavail.

On the development front, good progress was made with testing the
publishing of test results from both the submission framework and WNs
using the Message Bus. This is part of the project to gradually move to
a Nagios-based monitoring infrastructure.

Released new Certificate Authority RPMs based on IGTF 1.26-1.

VOMS Service
==========

The LCG VOMS service was lost completely on Sunday morning, also due to
the Tomcat/Oracle problems that the SAM service experienced, which
encroached on the VOMS portion of the Oracle service.

Operational Security
==============

A root-compromised host has been reported at a partner site, where an
attacker used a known kernel exploit to gain root access on an unpatched
host. The attacker then installed an SSH credential sniffer and captured
a number of passwords.
While the impact of this incident on CERN is rather limited, it is
important to note that several incidents in 2008 involved unpatched
hosts. It is essential to ensure that our hosts are patched effectively
and regularly against the important security vulnerabilities published
by the vendors, so that local attackers cannot use these vulnerabilities
to escalate to root.

Integration, Test & Release Report
==================================

* Patches Certified
  patch #2563: R3.1/i386/SLC4: DPM/LFC v1.7.0
  patch #2564: R3.1/x86_64/SLC4: DPM/LFC v1.7.0
  patch #2652: Fixes for FQAN order, short FQANs + miscellaneous [4] x86_64
  patch #2680: VDT 1.6.1 Release 9 SL4/x86
  patch #2681: VDT 1.6.1 Release 9 SL4/x86_64
  patch #2705: Removing Multivalue SE from GlueCESEBind
  patch #2706: Removing Multivalue SE from GlueCESEBind x86_64
  patch #2707: [ yaim-torque ] YAIM release for torque server, client and utils

* Patches rejected
  patch #2684, SCAS, was rejected due to a memory leak

* Releases
  The following patches were released to production:
  patch #1579 R3.1/SLC4/noarch: Hydra service
  patch #2017 R3.1/SLC4/i386: Hydra client
  patch #2344 R3.1/SLC4/x86_64: Proxy renewal 1.3.6
  patch #2518 MyProxy Updates, myproxy-config, yaim and info provider.
  patch #2598 R3.1 lcg-vomscerts-5.2.0 renames certificates
  patch #2599 R3.1 lcg-vomscerts-5.2.0 renames certificates x86_64
  patch #2644 trustmanager configure.sh fix for new bouncycastle
  patch #2645 trustmanager configure.sh fix for new bouncycastle (64bit)
  patch #2647 Patch for glite-yaim-mon and APEL to deal with bcprov location
  patch #2705 Removing Multivalue SE from GlueCESEBind
  patch #2706 Removing Multivalue SE from GlueCESEBind x86_64