WLCG Tier1 Service Coordination Minutes - 15 July 2010

Data Management & Other Tier1 Service Issues

| Site | Status | Recent changes | Planned changes |
| ASGC | CASTOR 2.1.7-19 (stager, nameserver); CASTOR 2.1.8-14 (tape server); SRM 2.8-2 | None | None |
| CNAF | SRM 2.8-5 (ALICE); StoRM 1.5.1-3 (ATLAS, CMS, LHCb, ALICE) | | |
| RAL | CASTOR 2.1.7-27 (stagers); CASTOR 2.1.8-3 (nameserver central node); CASTOR 2.1.8-17 (nameserver local node on SRM machines); CASTOR 2.1.8-8, 2.1.8-14 and 2.1.9-1 (tape servers); SRM 2.8-2 | None | Plans are in place to upgrade to 2.1.9 later this year. Stress and functional testing of 2.1.9 on our test systems is complete; end-user testing will be included from August. Information about the upgrade is available here |
| CERN | CASTOR 2.1.9-5 (all); SRM 2.9-3 (all) | None | CASTOR and the xroot plugin to be upgraded to 2.1.9-7 for all instances during the technical stop; name server memory upgrade. Changes should be implemented online without service interruption. |
| BNL | dCache 1.9.4-3 (PNFS) | None | None |
| FNAL | dCache 1.9.5-10 (admin nodes) (PNFS); dCache 1.9.5-12 (pool nodes) | None | Will upgrade PNFS server hardware during the technical stop |
| IN2P3 | dCache 1.9.5-11 (Chimera) | | |
| KIT | dCache 1.9.5-15 (admin nodes) (Chimera); dCache 1.9.5-5 to 1.9.5-15 (pool nodes) | | |
| TRIUMF | dCache 1.9.5-17 with Chimera namespace | None | Storage firmware upgrade during the technical stop. New DB infrastructure deployed to host the TAGS + FTS DBs; 2-hour FTS downtime on 19 July to move to the new instance. |
| NL-T1 | dCache 1.9.5-19 (Chimera) (SARA); DPM 1.7.3 (NIKHEF) | | |
| PIC | dCache 1.9.5-20rc1 (PNFS) | | 20/7: scheduled downtime from 06:00 to 18:00 for OS and firmware upgrades to storage, computing and Oracle (3D, FTS, LFC) services; FTS queues will be drained; dCache will be upgraded to 1.9.5-21 |
| NDGF | dCache 1.9.7 (head nodes) (Chimera); dCache 1.9.5, 1.9.6 (pool nodes) | | |


SRM 2.9-4 is now officially available in the Savannah release area. Full release notes and upgrade instructions are available.

dCache news

Nothing to report.

StoRM news

Version 1.5.3 has just been released and is available for installation. The release is currently available for SL4 only; version 1.5.4 should also be released for SL5 during the first week of August.

Known issues: 1.5.3 does not support tape; tape support will be added in version 1.5.4.

DPM news

See comments about LFC.

LFC news

LFC 1.7.4-6 is now the recommended version for SLC5. For SLC4 it is still 1.7.3 due to an issue with the VOMS library.

LFC 1.7.4-7 is in staged rollout but the only difference is some fixes for the Python 2.5 interface.

FTS news

FTS 2.2.5 (supporting sites without SRM and .lsc files) will enter certification next week.
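The `.lsc` mechanism lets a site validate VOMS attribute certificates from the VOMS server's subject and issuer DNs, instead of keeping a copy of the server's host certificate that must be refreshed at every certificate renewal. As an illustrative sketch only (the path and DNs below are the conventional ones for the CERN ATLAS VOMS server; sites should check the values published by each VO), a file such as /etc/grid-security/vomsdir/atlas/voms.cern.ch.lsc contains exactly two lines, the DN of the VOMS server's host certificate followed by the DN of the CA that issued it:

```
/DC=ch/DC=cern/OU=computers/CN=voms.cern.ch
/DC=ch/DC=cern/CN=CERN Trusted Certification Authority
```

With such a file in place, the server certificate found inside the proxy's attribute certificate is checked against these DNs and the locally installed CA certificates, so no per-VOMS-server host certificate needs to be distributed.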

Database services

  • Experiment reports:
    • ALICE:
      • Planned shut-down of ALIONR cluster on Monday 19th for storage array reboot
    • ATLAS:
      • New standby database for the ATLR cluster is currently being installed, as we were observing problems with the old standby DB that may be hardware-related
      • A new version of the PVSS Streams apply handler was deployed on 12 July for the ATLAS online-to-offline replication. The change was developed and tested together with ATLAS to improve the performance of the DCS client tools.
    • CMS:
      • Problems with CMS trigger online application - follow-up in progress
      • Some problems with online -> offline conditions replication on 13 July - a single crash; restarted automatically during the night
      • User errors caused PVSS replication to fail around 18:30 on 13th of July - transactions skipped, users notified
      • A replacement plan for hardware hosting CMS databases deployed at P5 has been agreed with CMS database coordinators. The plan will be implemented this autumn.
    • LHCb:
      • NTR

  • Site reports:
| Site | Status, recent changes, incidents, ... | Planned interventions |
| ASGC | SRM DB high-load issue again last week: some ora_jxxx jobs occupying a lot of memory without releasing it for a long time; under investigation. Still working on our Oracle RAC testbed verification. | |
| BNL | Nothing to report; waiting for the July PSU and evaluating a possible rollback of the April PSU followed by application of the July PSU. | |
| KIT | Saturday 10.07.2010: air conditioning failure at GridKa; part of the infrastructure, including the 3D Oracle RACs, went down. As a consequence the LHCb RACs were down for approximately 4 hours. The notification about broken streams on the 3D databases arrived at 20:26 CET; after DBA intervention the LHCb and LFC/FTS databases were back online at 22:36 CET. Due to a SAN failure the ATLAS DB remained offline until 00:13 the following day (11.07.2010), since when all 3D databases at KIT-T1 have been fully online. | |
| IN2P3 | Nothing to report | None |
| NDGF | Nothing to report | None |
| PIC | Last week: rolled back the PSU patch on the ATLAS, LHCb and LFC databases; auditing was also turned on for the ATLAS and LHCb DBs. During the scheduled downtime we will fix LAN problems on an FTS database server and upgrade the firmware of all blades hosting Oracle. | 06:00-18:00 on 20 July: series of interventions (firmware and OS upgrades) affecting storage, computing and Oracle (3D, FTS, LFC) services. |
| RAL | Nothing to report | Multipath configuration changes on this timetable: Tuesday 20th 11:00-15:00 at risk on OGMA (ATLAS); Wednesday 21st 10:00-14:00 at risk on LUGH (LHCb); Thursday 22nd 10:00-14:00 at risk on SOMNUS (LFC/FTS). |
| SARA | Nothing to report | No interventions |
| TRIUMF | Nothing to report | Monday 19 July: move of the FTS Oracle RAC to new servers. In addition, interventions are planned for the ATLAS 3D Oracle RAC: Linux OS upgrade to RHEL 5 and storage upgrades, with a short downtime on Tuesday 20 July. |

-- JamieShiers - 13-Jul-2010

Topic revision: r15 - 2010-07-15 - AndreaSciaba