WLCG Tier1 Service Coordination Minutes - 19 January 2012

Attendance

Local:

Remote:

Action list review

Release update

Data Management & Other Tier1 Service Issues

Site Status Recent changes Planned changes
CERN CASTOR 2.1.11-9+SRM-2.11 for all main instances, tapegateway active on CASTORPUBLIC ; xrootd: 2.1.11-1
FTS: 5 nodes in SLC5 3.7.0-3; 7 nodes in SLC4 3.2.1
EOSATLAS: 0.1.1-7/xrootd-3.1; EOSCMS: 0.1.0/xrootd-3.0.4
  • The experience with the new Tape Gateway (2.1.11-9 TG) is satisfactory (since Jan 10)
  • The NS has been migrated to a new HW infrastructure and Oracle 11g on Jan 11
  • By the end of next week all DB used by CASTOR at CERN will be running against Oracle 11g
  • Plan to switch to TG for all instances in the next two weeks
  • Intensive tests for 2.1.12 (candidate release for 2012 data taking) continuing

ASGC CASTOR 2.1.11-6
SRM 2.11-0
DPM 1.8.2-3
Dec 2011: DPM upgraded, both head node and disk servers
31/12/11: unscheduled intervention for tape system due to network switch failure
18/01/12: scheduled intervention for DC power maintenance: CASTOR and DPM were unavailable
None
BNL dCache 1.9.12.10 (Chimera, Postgres 9 w/ hot backup)
http (aria2c) and xrootd/Scalla on each pool
postgres hot backup None
CNAF StoRM 1.8.0 (Atlas, CMS, LHCb) None None
FNAL dCache 1.9.5-23 (PNFS, postgres 8 with backup, distributed SRM) httpd=2.2.3
Scalla xrootd 2.9.7/3.1.0.osg
Oracle Lustre 1.8.6
EOS 0.1.1-4/xrootd 3.1.0.osg with Bestman 2.0.10
use OSG packages for Xrootd EOS-0.1.1-8 (01/24/2012)
Lustre 1.8.7 (opportunity window)
IN2P3 dCache 1.9.5-29 (Chimera) on core servers and pool nodes   Upgrade of core servers and pool servers to 1.9.12-14 (Chimera) on 2012-02-07 (site downtime). New hardware for core servers (more RAM, SSD disks). Move to postgres 9.0.
KIT dCache
atlassrm-fzk.gridka.de: 1.9.12-11 (Chimera)
cmssrm-fzk.gridka.de: head nodes 1.9.5-26 (Chimera), pool nodes 1.9.5-6 through -25
gridka-dcache.fzk.de: head nodes 1.9.5-26 (PNFS), pool nodes 1.9.5-24,-25
xrootd (version 20100510-1509_dbg)
   
NDGF dCache 1.9.14 (Chimera) on core servers. Mix of 1.9.13 and 2.0.0 on pool nodes.    
NL-T1 dCache 1.9.12-10 (Chimera) (SARA), DPM 1.7.3 (NIKHEF) None None
PIC dCache 1.9.12-14 (last upgrade to patch release on 14-Dec); PNFS on Postgres 9.0 None None
RAL CASTOR 2.1.10-1
2.1.10-0 (tape servers)
SRM 2.10-2
none
  • Beginning of next week: upgrade to SRM 2.11
  • Jan 26 2012: upgrade to CASTOR information provider
  • Feb 14 2012: upgrade to CASTOR 2.1.11-8 and to new Transfer Manager
TRIUMF dCache 1.9.5-28 with Chimera namespace None None

Other site news

  • Michael explained the recent changes at BNL. Two new access protocols were added: http and xrootd. They allow data to be accessed even if all dCache servers are down (apart from Chimera). BNL took advantage of the Local Site Mover: instead of accessing dCache data via dcap, a wrapper is used. The new protocols can therefore work as fallback solutions and are also useful for the federated xrootd storage. All this is limited to reads (not writes).
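
The fallback idea described for BNL can be illustrated with a minimal sketch. This is not the actual Local Site Mover wrapper. The protocol order, the function name `read_with_fallback` and the injected reader callables are assumptions made for illustration only, and no real dCache, xrootd or http client API is used.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# Assumed order for illustration: primary protocol first, then the two
# fallback protocols mentioned in the minutes. Read access only.
PROTOCOL_ORDER = ("dcap", "xrootd", "http")

Reader = Callable[[str], bytes]  # hypothetical: takes a path, returns file content


def read_with_fallback(
    path: str,
    readers: Dict[str, Reader],
    order: Sequence[str] = PROTOCOL_ORDER,
) -> Tuple[str, bytes]:
    """Try each configured protocol in turn and return (protocol, data)
    from the first reader that succeeds."""
    errors: List[str] = []
    for proto in order:
        reader = readers.get(proto)
        if reader is None:
            continue  # protocol not configured at this site
        try:
            return proto, reader(path)
        except OSError as exc:
            # Remember the failure and move on to the next protocol.
            errors.append(f"{proto}: {exc}")
    raise OSError(f"all protocols failed for {path}: " + "; ".join(errors))
```

The readers are passed in as plain callables so that the sketch stays independent of any particular client library; a site would plug in its own transfer commands.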

CASTOR news

CERN operations and development

EOS news

xrootd news

dCache news

  • Baseline : 1.9.12-15 : Release Notes
  • As of Jan 1, dCache is open source. Mostly AGPL, some libraries LGPL.
  • Next dCache workshop scheduled for April 17/18 at DESY/Zeuthen. Agenda will be ready end of next week.

StoRM news

FTS news

  • FTS 2.2.8 is now installed on the CERN pilot service. Stress tests from ATLAS, PhEDEx transfers, and the Oracle 11 DB are all working.

DPM news

LFC news

LFC deployment

| Site | Version | OS, n-bit | Backend | Upgrade plans |
| ASGC | NA | NA | NA | NA |
| BNL | 1.8.0-1 | SL5, 64-bit | Oracle | None |
| CERN | 1.8.2-0 | 64-bit SLC5 | Oracle | Upgrade to SLC5 64-bit only pending for lfcshared1/2 |
| CNAF | 1.8.0-1 | SL5 64-bit | Oracle | None |
| FNAL | N/A | | | Not deployed at Fermilab |
| IN2P3 | 1.8.2-2 | SL5 64-bit | Oracle 11g | |
| KIT | 1.7.4-7 | SL5 64-bit | Oracle | Oracle backend migration pending |
| NDGF | 1.7.4.7-1 | Ubuntu 10.04 64-bit | MySQL | None |
| NL-T1 | 1.7.4-7 | CentOS5 64-bit | Oracle | |
| PIC | 1.7.4-7 | SL5 64-bit | Oracle | |
| RAL | 1.7.4-7 | SL5 64-bit | Oracle | |
| TRIUMF | 1.7.3-1 | SL5 64-bit | MySQL | None |

Experiment issues

WLCG Baseline Versions

Status of open GGUS tickets

  • Experiments reported no issues.
  • IN2P3 reported progress on the long-standing network issues they are experiencing with remote sites in the USA and in Japan. Many GGUS tickets are involved and both ATLAS and CMS are affected. VOs are asked to update GGUS:75983 and GGUS:74268 if the IN2P3 LAN reconfiguration has helped.

Review of recent / open SIRs and other open service issues

Conditions data access and related services

  • The validation of COOL query performance on Oracle 11g servers has been completed. It is confirmed that the problems previously observed in the tests on Oracle 11.2.0.2 servers were due to bug 10405897 in this Oracle server version, as described in task #23366.
    • This bug is fixed in Oracle 11.2.0.3, which is the version being deployed by IT-DB on the databases of all LHC experiments.
    • It should be noted that this bug was absent in 11.2.0.1 and was introduced in 11.2.0.2, i.e. in a minor patch update. For the future, this suggests that performance validation may also be needed for minor patch updates.
    • More details about the COOL performance validation, together with detailed reports including performance plots and execution plans for the relevant queries, may be found on the COOL twiki. For instance, this is the COOL performance report on Oracle 11.2.0.3
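
As an illustration of the point about minor patch updates, the sketch below flags any change in a dotted version string, including one in the last component, as a candidate for performance validation. The function names and the rule itself are assumptions for illustration only. They are not part of the COOL validation procedure.

```python
from typing import Optional, Tuple


def parse_version(version: str) -> Tuple[int, ...]:
    """Split a dotted version string such as '11.2.0.2' into integers."""
    return tuple(int(part) for part in version.split("."))


def change_level(old: str, new: str) -> Optional[int]:
    """Return the index of the first differing component, or None if equal."""
    a, b = parse_version(old), parse_version(new)
    for i, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return i
    # Same common prefix: the versions differ only if one has extra components.
    return None if len(a) == len(b) else min(len(a), len(b))


def needs_performance_validation(old: str, new: str) -> bool:
    """Assumed policy: any version change, however minor, triggers validation."""
    return change_level(old, new) is not None


# The regression discussed above appeared in the fourth component only.
print(change_level("11.2.0.1", "11.2.0.2"))
print(needs_performance_validation("11.2.0.2", "11.2.0.3"))
```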

Database services

  • Experiment reports:
    • ALICE: New online database being prepared.
    • ATLAS: ATONR, ATLR, ADCR databases upgraded (+ migration to new hw) to 11g in the last 2 weeks. ATLARC and ATLDSC (downstream capture) upgraded (+ migration to new hw) to 11g before Xmas. A few incidents affecting Streams replication (Streams Capture Aborting With ORA-26767 Due To Temp Tables Created By DBMS_COMPRESSION [ID 1082323.1] and a logminer issue with the compatible parameter) are fully understood and fixed.
    • CMS: New online database being prepared. Upgrade schedule to be confirmed. CMSARC database upgraded to 11g.
    • LHCb: New online database being prepared. LHCBR upgrade confirmed for 24th January.

  • Site reports:
| Site | Status, recent changes, incidents, ... | Planned interventions |
| BNL | Deployment of CPU patches on 12/12/2011 and 12/13/2011 in Oracle databases. No issues observed or reported following deployment of these patches. | None |
| CNAF | | LHCb database upgrade scheduled on 31st January |
| KIT | Nothing to report | ATLAS and LHCb database upgrades to 11g on Tue 24.01 and Wed 01.02 |
| IN2P3 | ATLAS database upgraded on 19th January | LHCb database upgrade scheduled on 2nd February |
| PIC | Upgraded TAGs database to 11.2.0.3 on 12th of January | LHCb database upgrade scheduled on 30th January. FTS upgrade (and LFC stop) to be scheduled |
| RAL | ATLAS database upgraded on 18th January. CASTOR DBs have been moved to two new 3-node RACs. Waiting for ATLAS to move the LFC back to CERN; after that, FTS will be upgraded to 11g | LHCb database upgrade scheduled on 31st January |
| SARA | Tested the upgrade to 11.2.0.3 twice on a stand-alone Oracle database (no RAC cluster) | Upgrade to 11g scheduled on February 1st |
| TRIUMF | Nothing to report | ATLAS 3D Oracle upgrade to 11.2.0.3 scheduled for Monday Jan 23rd |

AOB

-- AndreaSciaba - 18-Jan-2012

Topic revision: r21 - 2012-01-19 - MariaDimou
 