WLCG Tier1 Service Coordination Minutes - 10 February 2011

Attendance

LHC run in 2011 / 2012

Action list review

Release update

Data Management & Other Tier1 Service Issues

| Site | Status | Recent changes | Planned changes |
| CERN | CASTOR 2.1.10 (CMS, ATLAS and ALICE); CASTOR 2.1.9-9 (LHCb); SRM 2.9-4 (all); xrootd 2.1.9-7 | | |
| ASGC | CASTOR 2.1.7-19 (stager, name server); CASTOR 2.1.8-14 (tape server); SRM 2.8-2 | 28/1: 30 minutes of unscheduled CASTOR downtime; core servers had to be rebooted after a blade firmware upgrade | None |
| BNL | dCache 1.9.5-23 (PNFS, Postgres 9) | None | In the process of adding disk space |
| CNAF | StoRM 1.5.6-3 (ATLAS, CMS, LHCb, ALICE) | Certification completed on 9 Feb | OS upgrade to SL5 slightly delayed |
| FNAL | dCache 1.9.5-23 (PNFS); Scalla xrootd 2.9.1/1.4.2-4; Oracle Lustre 1.8.3 | None | Moving unmerged pools from dCache to Lustre; deploying scalable SRM servers with DNS load balancing |
| IN2P3 | dCache 1.9.5-24 (Chimera) | Upgraded to 1.9.5-24 on 2011-02-08 | |
| KIT | dCache 1.9.5-15 (admin nodes, Chimera); dCache 1.9.5-5 to 1.9.5-15 (pool nodes) | | |
| NDGF | dCache 1.9.11 | None | None |
| NL-T1 | dCache 1.9.5-23 (Chimera) (SARA); DPM 1.7.3 (NIKHEF) | | On 8 March a core router will be replaced at NIKHEF; services at NIKHEF will be unavailable during the intervention and at risk for two days afterwards |
| PIC | dCache 1.9.5-23 (PNFS) | | |
| RAL | CASTOR 2.1.9-6 (stagers); 2.1.9-1 (tape servers); SRM 2.8-6 | CMS disk servers upgraded to SL5 64-bit on 31/1/11 | ALICE disk servers to be upgraded to SL5 64-bit on 15/2/11; CASTOR upgrade to 2.1.10 planned for March |
| TRIUMF | dCache 1.9.5-21 with Chimera namespace | None | None |
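FNAL's planned SRM scaling relies on DNS load balancing: one service alias resolves to several A records, and clients spread their connections across the returned addresses. The sketch below illustrates the client-side pattern only; the alias, port and addresses are invented for illustration and are not FNAL's actual configuration.

```python
import itertools
import socket

def resolve_all(alias, port=8443):
    """Return every IPv4 address published behind a DNS alias."""
    infos = socket.getaddrinfo(alias, port, socket.AF_INET, socket.SOCK_STREAM)
    return sorted({info[4][0] for info in infos})

def round_robin(addresses):
    """Cycle through the resolved endpoints, as a round-robin client would."""
    return itertools.cycle(addresses)

# Simulated resolution result for a hypothetical SRM alias (no network needed):
srm_endpoints = ["192.0.2.10", "192.0.2.11", "192.0.2.12"]
picker = round_robin(srm_endpoints)
first_four = [next(picker) for _ in range(4)]
print(first_four)  # the fourth pick wraps back to the first endpoint
```

With a real alias, `resolve_all("srm.example.org")` would replace the hard-coded list; the scaling benefit is that servers can be added or drained purely by editing the DNS records.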

CASTOR news

CERN operations

Development.

xrootd news

dCache news

StoRM news

FTS news

DPM news

  • DPM 1.8.0-1 for gLite 3.2 has been released to Production on Feb 9.
  • For gLite 3.1 it remains in the Staged Rollout and the memory leak appears not to be fixed.

LFC news

  • LFC 1.8.0-1 for gLite 3.2 has been released to Production on Feb 9.
  • For gLite 3.1 it remains in the Staged Rollout and the memory leak appears not to be fixed.

LFC deployment

| Site | Version | OS, n-bit | Backend | Upgrade plans |
| ASGC | 1.7.4-7 | SLC5 64-bit | Oracle | None |
| BNL | 1.8.0-1 | SL5 64-bit | Oracle | None |
| CERN | 1.7.3 | SLC4 64-bit | Oracle | Will upgrade to SLC5 64-bit by the end of January or beginning of February |
| CNAF | 1.7.4-7 | SL5 64-bit | Oracle | |
| FNAL | N/A | | | Not deployed at Fermilab |
| IN2P3 | 1.8.0-1 | SL5 64-bit | Oracle | Upgraded to LFC 1.8.0 on 4 January |
| KIT | 1.7.4 | SL5 64-bit | Oracle | |
| NDGF | 1.7.4.7-1 | Ubuntu 9.10 64-bit | MySQL | None |
| NL-T1 | 1.7.4-7 | CentOS5 64-bit | Oracle | |
| PIC | 1.7.4-7 | SL5 64-bit | Oracle | |
| RAL | 1.7.4-7 | SL5 64-bit | Oracle | |
| TRIUMF | 1.7.3-1 | SL5 64-bit | MySQL | |

Experiment issues

WLCG Baseline Versions

Status of open GGUS tickets

GGUS - Service Now interface: update

Review of recent / open SIRs and other open service issues

Conditions data access and related services

Database services


  • 10.2.0.5 patching status: ALL databases are now running 10.2.0.5. No major issues were found; minor issues included:
    • OEM agents not reading the 10.2.0.5 DB alert logs properly (bug 10170020)
    • Oracle bug 9184754, specific to the ATLAS PANDA workload

  • Experiment reports:
    • ALICE:
      • Nothing to report
    • ATLAS:
      • Applied patch 9184754 on the ADCR production DB; the bug affected only the PANDA application and was causing single-instance crashes every few days.
      • On Friday morning (4 Feb) ATLAS PVSS replication aborted due to a foreign key violation on the target database (ATLAS offline) caused by five transactions. The offending transactions had previously been applied on the source database (ATLAS online) without problems, despite violating the constraint, which should never happen. The constraint inconsistency in the PVSS schema is being investigated, but the root cause is not yet known. To restart replication and keep the replica consistent, the problematic transactions were applied without constraint validation.
    • CMS:
      • Nothing to report
    • LHCb:
      • Nothing to report
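The PVSS recovery described above reduces to an apply loop that checks each replicated row against the target's foreign key and, when validation is deliberately skipped, applies offending rows anyway while flagging them. The toy model below illustrates only that logic; the table, rows and function are invented, and the real mechanism is Oracle replication, not Python.

```python
# Toy model: 'parent_ids' plays the role of the referenced parent table;
# each transaction inserts a child row pointing at a parent id.

def apply_transactions(transactions, parent_ids, target, skip_validation=False):
    """Apply transactions; a FK violation aborts unless validation is skipped."""
    applied, violations = [], []
    for txn in transactions:
        if txn["parent_id"] not in parent_ids:
            violations.append(txn)
            if not skip_validation:
                raise ValueError("FK violation: parent %s missing" % txn["parent_id"])
        target.append(txn)
        applied.append(txn)
    return applied, violations

parent_ids = {1, 2, 3}
transactions = [{"parent_id": 1, "value": "ok"},
                {"parent_id": 99, "value": "orphan"}]  # violates the FK

target = []
try:
    apply_transactions(transactions, parent_ids, target)
except ValueError:
    # Replication aborted, as on 4 Feb; re-apply without constraint validation
    target.clear()
    applied, violations = apply_transactions(
        transactions, parent_ids, target, skip_validation=True)

print(len(applied), len(violations))  # every row applied, one flagged violation
```

The trade-off mirrored here is the one the minutes record: skipping validation keeps the replica in sync with the source, at the cost of carrying the same constraint inconsistency until the root cause is found.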

  • Site reports:
| Site | Status, recent changes, incidents | Planned interventions |
| ASGC | | |
| BNL | Conditions database successfully upgraded to 10.2.0.5; no issues occurred during this upgrade. Former LFC_FTS cluster reconfigured as a physical standby database: upgrades included the OS (RHEL5), cluster/database server (10.2.0.5) and storage firmware; initially enabled on only one of the production clusters, as part of integrating Data Guard into Oracle database operations. | Enable IPMI on all Oracle production clusters; enable Data Guard for the LFC database; decommission the TAGS database service. |
| CNAF | | 16 Feb: LHCb cluster upgrade to 10.2.0.5. 2 Mar (to be confirmed): FTS DB upgrade to 10.2.0.5, purge of old FTS data, and set-up of the periodic cleaning job that was missing before. |
| KIT | 26 Jan: upgrade of the 3D RACs (ATLAS, LHCb) to 10.2.0.5. | None |
| IN2P3 | | |
| NDGF | Nothing to report | None |
| PIC | 8 Feb: upgraded the FTS database. | Plan to upgrade all other databases before the end of February; no exact date yet. |
| RAL | Upgraded the 3D, LFC, FTS and CASTOR databases to 10.2.0.5. | Expecting new hardware in a few days, ready for Oracle installation and testing (this is the hardware to be used for Data Guard for CASTOR and FTS/LFC). |
| SARA | Nothing to report | No interventions |
| TRIUMF | Upgraded the Oracle 3D RAC to 10.2.0.5. | None |

AOB

-- JamieShiers - 03-Feb-2011

Topic revision: r13 - 2011-02-10 - MatthewViljoenExCern
 