WLCG Tier1 Service Coordination Minutes - 2nd December 2010

Attendance

LHC machine - shutdown and 2011 startup plans

Release Update

WLCG Baseline Versions

Data Management & Other Tier1 Service Issues

| Site | Status | Recent changes | Planned changes |
| CERN | CASTOR 2.1.9-8 (ATLAS); CASTOR 2.1.9-9 (ALICE, CMS and LHCb); SRM 2.9-4 (all); xrootd 2.1.9-7 | | |
| ASGC | CASTOR 2.1.7-19 (stager, nameserver); CASTOR 2.1.8-14 (tapeserver); SRM 2.8-2 | 29/11: network maintenance, storage services stopped | None |
| BNL | dCache 1.9.4-3 (PNFS) | None | None |
| CNAF | StoRM 1.5.4-5 (ATLAS, CMS, LHCb, ALICE) | | |
| FNAL | dCache 1.9.5-23 (PNFS); Scalla xrootd 2.9.1/1.4.2-4 | None | None |
| IN2P3 | dCache 1.9.5-22 (Chimera) | | |
| KIT | dCache 1.9.5-15 (admin nodes) (Chimera); dCache 1.9.5-5 to 1.9.5-15 (pool nodes) | | |
| NDGF | dCache 1.9.7 (head nodes) (Chimera); dCache 1.9.5, 1.9.6 (pool nodes) | | |
| NL-T1 | dCache 1.9.5-23 (Chimera) (SARA), DPM 1.7.3 (NIKHEF) | | |
| PIC | dCache 1.9.5-23 (PNFS) | | |
| RAL | CASTOR 2.1.7-27 and 2.1.9-6 (stagers); 2.1.9-1 (tape servers); SRM 2.8-2 and SRM 2.8-6 | Added 2 new SRM backends for ATLAS | ATLAS upgrade to 2.1.9-6 on 6-8/12/10 |
| TRIUMF | dCache 1.9.5-21 with Chimera namespace | | |

Other site news

The FTS channels to TW-FTT were created at: CERN, BNL, ASGC, IN2P3, SARA, PIC, TRIUMF, KIT.

CASTOR news

CERN operations

Development

No significant news.

xrootd news

dCache news

No significant news.

StoRM news

FTS news

FTS 2.2.5 still in certification.

DPM news

No significant news.

LFC news

No significant news.

LFC deployment

| Site | Version | OS, n-bit | Backend | Upgrade plans |
| ASGC | 1.7.2-4 | SLC4 64-bit | Oracle | Testing ongoing, upgrade by the end of the year |
| BNL | 1.7.2-4 | SL4 | Oracle | 1.7.4 on SL5 postponed to January |
| CERN | 1.7.3 | SLC4 64-bit | Oracle | Will upgrade to SLC5 64-bit by the end of the year |
| CNAF | 1.7.2-4 | SLC4 32-bit | Oracle | 1.7.4 on SL5 64-bit in November |
| FNAL | N/A | | | Not deployed at Fermilab |
| IN2P3 | 1.7.4-7 | SL5 64-bit | Oracle | |
| KIT | 1.7.4 | SL5 64-bit | Oracle | |
| NDGF | | | | |
| NL-T1 | 1.7.4-7 | CentOS5 64-bit | Oracle | |
| PIC | 1.7.4-7 | SL5 64-bit | Oracle | |
| RAL | 1.7.4-7 | SL5 64-bit | Oracle | |
| TRIUMF | 1.7.3-1 | SL5 64-bit | MySQL | |

BDII deployment plan

| Site | Plan |
| NL-T1 | There are in total more than 5 top-level BDIIs at the NL-T1. At both SARA and NIKHEF, three top-level BDIIs are configured in LCG_GFAL_INFOSYS: at NIKHEF, two NIKHEF BDIIs and one SARA BDII; at SARA, two SARA BDIIs and one NIKHEF BDII. |
| US ATLAS-T1 | Working with OSG on the deployment of a resilient and performant top-level BDII infrastructure in the US |

Status of open GGUS tickets

Review of recent / open SIRs and other open service issues

Conditions Data Access and related services

Experiment Database Service Issues

  • Experiment reports:
    • ALICE:
      • Nothing to report
    • ATLAS:
      • The ATLAS offline database suffered four instance reboots this week. Instance 4 rebooted on 28.11, 30.11 and on the morning of 02.12 around 4 AM, and instance 3 rebooted on 02.12 around 11:30 AM. Initially, high load caused by the COOL application was suspected as the root cause; however, corresponding I/O errors and spikes in physical writes were observed on 02.12, which point to disk or other hardware problems. The DBAs are working to understand the root cause and provide a fix as soon as possible.
    • CMS:
      • On Wednesday (1st Dec) morning, CMS PVSS streaming aborted once again for 30 minutes while executing modifications (adding new table partitions for 2011) on one of the replicated tables. All the changes were in fact already present, having been applied manually by a user job. This caused a dictionary inconsistency and the apply process aborted. The colliding changes were marked to be skipped and the apply process was restarted.
      • On Thursday (2nd Dec), CMS PVSS streaming aborted several times due to missing tablespaces on the offline database: they had not been created together with the corresponding tablespaces on the online database. All related Streams errors were resolved manually by creating the proper tablespaces on the offline database.
    • LHCb:
      • Nothing to report

  • Site reports:
| Site | Status, recent changes, incidents, ... | Planned interventions |
| ASGC | Nothing to report | None |
| BNL | Validations for new hardware; working on improvements for weekly reports | None |
| CNAF | Nothing to report | None |
| KIT | Nothing to report | None |
| IN2P3 | Nothing to report | None |
| NDGF | Nothing to report | None |
| PIC | Nothing to report | None |
| RAL | Nothing to report | None |
| SARA | Nothing to report | Migration to the cluster next Tuesday |
| TRIUMF | The database was not accessible last weekend because the maximum number of sessions was exceeded: the resource_limit parameter was set to FALSE, so the profile limits were not being enforced. | None |

Dates & topics for future meetings

AOB

-- JamieShiers - 23-Nov-2010

Topic revision: r11 - 2010-12-02 - ZbigniewBaranowski
 