WLCG Tier1 Service Coordination Minutes - 21 April 2011

Attendance

  • Local: Manuel, Stephane, Simone, Alessandro, Steven, Maarten, Zbyszek, Andrea S, Fernando, Nicolo, Maria, Jamie, Ian, Elisa, Markus, Fabrizio
  • Remote: Jon, Alexander Verkooijen, Michael, Felix, Carlos, Patrick, RAL, Carmine, TRIUMF - Andrew, Alexei, Rolf, Jhen-Wei, Gareth, Elizabeth

Action list review

Release update

Data Management & Other Tier1 Service Issues

Site Status Recent changes Planned changes
CERN CASTOR 2.1.10 (all)
SRM 2.10-x
xrootd: ALICE 2.1.10 update 1, others: 2.1.9-7
   
ASGC CASTOR 2.1.7-19 (stager, nameserver)
CASTOR 2.1.8-14 (tapeserver)
SRM 2.8-2
DPM 1.8.0-1
  26-27/4: network maintenance (link 1 link 2)
26-29/4: CASTOR upgrade: all CMS services will be unavailable; affecting tape for ATLAS; STAR channels stopped
BNL dCache 1.9.5-23 (PNFS, Postgres 9) none Transition to Chimera in summer 2011
CNAF StoRM 1.5.6-3 SL4 (CMS, LHCb,ALICE)
StoRM 1.6 SL5 (ATLAS)
   
FNAL dCache 1.9.5-23 (PNFS) httpd=1.9.5.-25
Scalla xrootd 2.9.1/1.4.2-4
Oracle Lustre 1.8.3
none none
IN2P3 dCache 1.9.5-24 (Chimera) on all core servers and pool nodes   Upgrade to version 1.9.5-25 on 2011-05-24 (scheduled site downtime).
KIT dCache (admin nodes): 1.9.5-15 (Chimera), 1.9.5-24 (PNFS)
dCache (pool nodes): 1.9.5-9 through 1.9.5-24
   
NDGF dCache 1.9.12    
NL-T1 dCache 1.9.5-23 (Chimera) (SARA), DPM 1.7.3 (NIKHEF)    
PIC dCache 1.9.5-25 (PNFS, Postgres 9)    
RAL CASTOR 2.1.10-0
2.1.9-1 (tape servers)
SRM 2.10-1,2.8-6
Upgrade of SRMs to 2.10-2 (apart from ALICE) Gen SRM upgrade to 2.10-2. Updates to support T10KC
TRIUMF dCache 1.9.5-21 with Chimera namespace None None

CASTOR news

CERN operations

Development

xrootd news

dCache news

StoRM news

FTS news

DPM news

  • DPM 1.8.0-2 for gLite 3.1 was released to production on April 13

LFC news

  • LFC 1.8.0-1 for gLite 3.1: waiting for rebuild of the meta package with the correct VOMS libraries (1.9.10-14)

LFC deployment

| Site | Version | OS, n-bit | Backend | Upgrade plans |
| ASGC | 1.7.4-7 | SLC5 64-bit | Oracle | None |
| BNL | 1.8.0-1 | SL5, 64-bit | Oracle | None |
| CERN | 1.7.3 | 64-bit SLC4 | Oracle | Upgrade to SLC5 64-bit pending |
| CNAF | 1.7.4-7 | SL5 64-bit | Oracle | |
| FNAL | N/A | | | Not deployed at Fermilab |
| IN2P3 | 1.8.0-1 | SL5 64-bit | Oracle 11g | Oracle DB migrated to 11g on Feb. 8th |
| KIT | 1.7.4-7 | SL5 64-bit | Oracle | Oracle backend migration pending |
| NDGF | 1.7.4.7-1 | Ubuntu 9.10 64-bit | MySQL | None |
| NL-T1 | 1.7.4-7 | CentOS5 64-bit | Oracle | |
| PIC | 1.7.4-7 | SL5 64-bit | Oracle | |
| RAL | 1.7.4-7 | SL5 64-bit | Oracle | |
| TRIUMF | 1.7.3-1 | SL5 64-bit | MySQL | |

Experiment issues

  • ATLAS asked which is the minimum version of StoRM that supports checksumming and proposed adopting that version as the baseline.
  • ATLAS asked whether the overwrite option is supposed to work, as it does not at some sites. Patrick said that it is an option that can be enabled or disabled, and he will provide instructions. Maarten mentioned that it is a global site setting, but Simone said that this is not a problem for ATLAS-only sites. Simone will check with FTS where the overwrite works and where it does not, as it is also possible that FTS does not use the option and performs an rm instead.

WLCG Baseline Versions

S2 status

Following the presentation on the developer needs for S2, Patrick remarked that S2 is useful not only in case of changes in the protocol but also to check that nothing breaks.

Jamie asked if dCache and CASTOR agree that S2 should be maintained by a third party; Patrick said yes, but there was no representative for CASTOR. Patrick said that if it was developed only by dCache and CASTOR, it would not be useful.

Consistency of Storage Elements and LFC

Status of open GGUS tickets

Review of recent / open SIRs and other open service issues

Conditions data access and related services

Database services

  • Experiment reports:
    • ALICE:
      • nothing to report
    • ATLAS:
      • Replication to CNAF, SARA and NDGF was shut down permanently on Tuesday 12.04.
      • ATLASDD, which holds the ATLAS geometry data, was added to the ATLAS conditions replication to the Tier1s on Thursday 14.04.
      • On Monday (18.04) morning, replication of ATLAS conditions data to the T1s was unavailable from 9:30 until 11:45 because of a Streams process deadlock that occurred after the short weekly maintenance stop of the replication service. Clearing the lock required a restart of the downstream instances.
    • LHCB:
      • On Monday (18.04) morning the same problem as in the ATLAS case affected LHCb conditions replication. The replication was inactive between 9:30 and 11:30.
    • CMS:
      • On Tuesday 12th April between 15:30 and 16:30 there was a rolling intervention on the CMS offline production database to install a patch intended to fix a bug causing occasional crashes of the CMS T0AST application.
      • The CMS conditions apply process crashed on Thursday afternoon (14.04) because the procedure suggested by the Oracle analyst was not correct (dropping partitions was suggested without cleaning up the support table).

  • Site reports:
| Site | Status, recent changes, incidents, ... | Planned interventions |
| CERN | First successful upgrade of the TEST2 db to 11.2.0.2; the April CPU is out but does not contain relevant fixes | Upgrade of LCG integration to 11.2.0.2 |
| ASGC | | |
| BNL | Following up on some Streams connectivity problems (SR opened), the database was rebooted in rolling fashion on Friday (15.04) | On 27.04.2011 the ATLAS Conditions Oracle database service will be moved to a new data center room. |
| CNAF | | |
| KIT | Migration of the FTS/LFC Oracle backend on April 7 failed due to DataGuard problems. Details under investigation. | |
| IN2P3 | ntr | none |
| NDGF | | |
| PIC | The yearly power maintenance took place on 19.04. Due to problems in the cooling systems during the scheduled downtime, the downtime was extended by 24 hours. | none |
| RAL | Castor SRM has been upgraded to 2.10. Transparent network intervention on Tuesday (19.04). High load on SRM, caused by a wrong execution plan. | none |
| SARA | Removal of all ATLAS conditions data | Planning of upgrade to 10.2.0.5 |
| TRIUMF | ntr | none |

AOB

-- JamieShiers - 19-Apr-2011

Topic revision: r17 - 2011-05-05 - AndreaSciaba
 