GD Group Report for C5-16-Nov-2007
==================================

We are changing the format of the GD report as follows:
- the C5 report will now contain only items that are relevant to other groups.
- it will give the URL of the GD twiki page where the full report can be seen.

Since the twiki is not ready yet, this time (only) we are still including the full report (below).

---------------- C5 report -----------------------

CERN GRID Pre-Production Site (CERN_PPS):
-----------------------------------------
* Set-up of lsf-based CE lxb2090 implementing the new (uncertified) version of the jobpriority configuration for Atlas: In progress.

EGEE Pre-Production Service Coordination:
-----------------------------------------
* New gLite3.1 update was released to production; it includes
  - lcg-CE for SLC4
  - BDII for SLC4

Gridview:
---------
* The proposal for a new addition in the service availability computation algorithm has been written. This addition will handle those services for which no critical tests have been defined by a VO. This proposal will be presented for approval to the WLCG PMB.

Service Availability Monitoring (SAM):
---------------------------------------
* Unavailabilities:
  07-11-2007 (Wed) 16:00-17:00 Reason: central DPM was restarted

gLite 3.x Certification & Testing:
----------------------------------
* SL4, gLite 3.0
  LFC/DPM 1.6.7-1: LFC certified, still need a fix for the DPM info provider
* SL4, gLite 3.1
  Testing dual architecture runtime on WN to ensure 32 and 64 bit binaries can link to the middleware.

Grid Authentication and Authorization Services:
-----------------------------------------------
* Host certificate of voms.cern.ch was changed on Nov/15/2007.
* VOMS proxy generation from voms.cern.ch is not possible until end of November
  - to give sites more time to upgrade lcg-vomscerts
  - to avoid giving users VOMS proxies which would be rejected by most sites

---> The link to GD full reports twiki page will be here in the future.
---Zdenek

---------------- end of GD C5 report ---------------------

===============================================================

------ full GD report (on twiki only in the future) -----------

LCG deployment:
--------------
- Total number of Sites (*): 247
- Software -> Num. Sites (*):
    gLite-3_1_0 -> 79
    gLite-3_0_2 -> 148
    gLite-3_0_0 -> 3
    LCG-2_7_0   -> 6
    unknown     -> 11
- Status -> Num. Sites (*):
    ok       -> 184
    degraded -> 12
    down     -> 51
- Average of concurrently running jobs during this week (+): ~18k

(*) Sites that are Certified _and_ Production _and_ Monitored by SAM:
    https://lcg-sam.cern.ch:8443/sam/sam.py
    To see this page one needs a grid certificate loaded in the browser.
    The calculation of the Site availability (Status) is described at:
    http://goc.grid.sinica.edu.tw/gocwiki/SAM_Metrics_calculation
    Software version is coming from the 'CE-sft-softver' CE test. Sites not supporting SAM 'CE' service, or not having sent results for this particular test during the last week, are counted as 'unknown'.

(+) Job statistics taken from GStat:
    http://goc.grid.sinica.edu.tw/gstat/
    http://goc.grid.sinica.edu.tw/gstat/total/GIISQuery_Usage_cpu_.html
    For the time being we do not report CPU numbers:
    1. Not all the reported CPUs are actually available for grid jobs.
    2. Sites with multiple CEs may have their CPUs double-counted.
    3. GStat includes sites that are not considered by the SFTs.

CERN Grid Operations managed by GD:
-----------------------------------
* Update 36 for gLite 3.0 done on all the LCG RB nodes.
* Current status of the gLite WMS, gLite LB and LCG RB nodes can be found at https://twiki.cern.ch/twiki/bin/view/LCG/CurrentStatusWMSLBNodesCERN.

CERN GRID Pre-Production Site (CERN_PPS):
-----------------------------------------
* Upgrade of the site to glite 3.0 PPS-update42: Done
  - CEs, DPM, FTS, FTS, MONBOX+e2emonit, WMSLB.
* High load on roc-wms, used as fail-over node for SAM Admins Page and CERN ROC certification.
* Set-up of lsf-based CE lxb2090 implementing the new (uncertified) version of the jobpriority configuration for Atlas: In progress.

Note: thanks to all cern-pps admins for the quality and detail of the worklog.

EGEE Pre-Production Service Coordination:
-----------------------------------------
* gLite3.1.0-PPS-UPDATE09 was released to PPS and is currently in the phase of pre-deployment testing. This new version of the middleware contains 7 new fixes.
* 3 Patches moved directly to production with gL3.1 Update06
* After the pre-deployment test, the gLite 3.1 part of PPS is now upgraded to gLite3.0.2-PPS-UPDATE42. This upgrade affects all services using a MySQL server and contains:
  - new voms certificate for the WMS repository
  - upgraded MySQL server
* gLite3.1 Update06 was released to production. The release contains mainly:
  - lcg-CE for SLC4
  - BDII for SLC4
  - fixes for an issue, found in PPS, in publishing the site name entry.
  Several issues were found in PPS and reported to production as known issues. A relevant one is the GStat error report:
  https://gus.fzk.de/ws/ticket_info.php?ticket=28922
* gLite3.0 Update36 was released to production. The release contains:
  - new host certificates of the voms.cern.ch server

CERN ROC:
---------
* GGUS6 functionality testing, including testing of the correct interaction between GGUS6 and the CERN PRMS ticketing system

WLCG Transfer Service:
----------------------
* Transfer rates ranging from 270 to 930 MB/s, averaging around 480 MB/s per day.
* Involving all major T1 sites.
* Mostly traffic from CMS and Atlas
* 3 open tickets in total
* Throughput plots: http://gridview.cern.ch/GRIDVIEW/

Gridview:
---------
* New CVS repository for the gridview project created. The old Gridview repository has been moved to gridview_old. The new structure for the gridview-common module, frontend, summarizers and synchronizers has been committed to CVS. This was done to facilitate better packaging and deployment in production environments.
* Proposal for a new addition in the service availability computation algorithm. This addition will handle those services for which no critical tests have been defined by a VO. This proposal will be presented for approval to the WLCG PMB.

Service Availability Monitoring (SAM):
---------------------------------------
* Unavailabilities:
  07-11-2007 (Wed) 16:00-17:00 Reason: central DPM was restarted
* Updated the version of several components
* More information about new components is available in the validation release notes:
  o https://twiki.cern.ch/twiki/bin/view/LCG/SamValidationClient
  o https://twiki.cern.ch/twiki/bin/view/LCG/SamValidationSensors
  o https://twiki.cern.ch/twiki/bin/view/LCG/SamValidationDB
* SAM Portal for the validation instance is available here:
  https://lcg-sam-val.cern.ch:8443/sam-val/sam.py
* New CE-wn-sec-fp security test moved to validation and successfully tested. Ready for production.

gLite 3.x Build & Integration:
------------------------------
* Certification repository:
  - gLite 3.0 (patches): 3 in preparation, 2 in configuration, 12 in certification
  - gLite 3.1: 2 in preparation, 2 in configuration, 16 in certification
* PPS repository:
  - gLite 3.0 - PPS Update 42 contains 3 new patches.
    Next set of patches scheduled for release: None.
  - gLite 3.1 - PPS Update 09 contains 7 new patches.
    Next set of patches scheduled for release: None.
* Production repository:
  - gLite 3.0 - Update 36 contains 2 new patches.
    Next set of patches to be released: 2 (MySQL)
  - gLite 3.1 - Update 06 contains 12 new patches.
    Next set of patches scheduled for release: None.

gLite 3.x Certification & Testing:
----------------------------------
* Certification, gLite 3.0
  - SL4 .. LFC/DPM 1.6.7-1 on SL4 ; LFC certified, still need a fix for the DPM info provider
  - SL3 .. LFC/DPM 1.6.7-1 on SL3 ; stress tests on DPM being run, LFC OK
* Patches certified: None.
* gLite 3.1 / SL4
  - 32bit ..
    lcg-CE and glite-BDII released to production.
    Fixed various issues on VOBOX and resubmitted it to certification.
  - 64bit .. testing dual architecture runtime on WN to ensure 32 and 64 bit binaries can link to the middleware
* Configuration
  - Integration of MPI config into yaim
  - Finalization of yaim for glite3.1/SL4 WMS & LB
* Other work
  Working with PIC to get lcg-CE working with the Condor batch system.

gLite Support:
--------------
Regular analysis of GGUS ticket processing time and quality is recorded and presented to the ROC managers. This week's report, as an example:
http://goc.grid.sinica.edu.tw/gocwiki/Week_2007/10/30_-_2007/11/12
summarizes problem cases and reminds all ROCs of the upcoming major GGUS 6.0 Release due on Nov. 29th. The CERN ROC and CERN Remedy PRMS managers are valuable participants in the preparation testing.

Grid Authentication and Authorization Services:
-----------------------------------------------
* Host certificate of voms.cern.ch was changed on Nov/15/2007.
* VOMS proxy generation from voms.cern.ch is not possible until end of November
  - to give sites more time to upgrade lcg-vomscerts
  - to avoid giving users VOMS proxies which would be rejected by most sites

Grid Operational Security:
--------------------------
Nothing to report this week.

ETICS:
------
* A new project for the VDT (Virtual Data Toolkit) distribution has been registered in ETICS, hosting a number of middleware components from the VDT stack (globus, myproxy, gpt, etc).
* A cleanup of the current Externals repository is ongoing in order to remove components that are now hosted by more specific projects. The owners of components affected by these changes will be informed about the consequences of the operations and the steps to be taken.
* New components registered this week: VDT 1.8.1 (releases 1 and 2) containing globus 4.0.5
* Internal testing of the new ETICS 2.0 release (containing the new ETICS Client 1.3.0) is almost finished.
  The final release candidate is scheduled to be out on November 20th and the final release is currently planned for Wednesday November 28th.
* The next ETICS All-Hands meeting will take place at CERN from 21 to 23 November.

OMII-Europe:
------------
The All-Hands meeting took place as announced in Edinburgh last week. Nothing special to report.

---Zdenek