WLCG Tier1 Service Coordination Minutes - 3rd May 2012


  • Local: Alexei, Andrea S, Eva, Jan, Jhen-Wei, Luca C, Maarten, Maite, Massimo, NN, Nicolo, Simone, Stefan, Zbyshek
  • Remote: Alessandro C, Andreas P, Carlos, Carmine, Giovanni, Gonzalo, John, Lisa, Michael, Rob, Roger, Rolf, Ronald, Stephen, Xavier M

Action list review

Release update

Data Management & Other Tier1 Service Issues

| Site | Status | Recent changes | Planned changes |
| CERN | CASTOR 2.1.12-4 for all instances (w/ xroot-xcastor2fs_2112-1.1.0-1); SRM-2.11 for all instances; FTS: all nodes in SLC5 3.7.7-2; EOS 0.1.2-2 (w/ xrootd-3.1) for all instances | None | update EOS - not released/scheduled yet |
| ASGC | CASTOR 2.1.11-6; SRM 2.11-0; DPM 1.8.2-3 | 11-12/04: unscheduled downtime due to hardware failure in CASTOR; 25-27/04: downtime (link1 link2) for CASTOR DB migration | |
| BNL | dCache (Chimera, Postgres 9 w/ hot backup); http (aria2c) and xrootd/Scalla on each pool | None | None |
| CNAF | StoRM 1.8.1 (Atlas, CMS, LHCb) | | |
| FNAL | dCache 1.9.5-23 (PNFS, postgres 8 with backup, distributed SRM) httpd=2.2.3; Scalla xrootd 2.9.7/3.1.0.osg; Oracle Lustre 1.8.6; EOS 0.1.1-12/xrootd 3.1.0.osg with Bestman 2.0.10; FTS 3.7.7 on SL5 | | |
| IN2P3 | dCache 1.9.12-16 (Chimera) on core servers and pool nodes; new hardware (more RAM, SSD disks) for Chimera and SRM servers (with SL6); Postgres 9.1 | None | None |
| KIT | dCache: atlassrm-fzk.gridka.de: 1.9.12-11 (Chimera); cmssrm-fzk.gridka.de: 1.9.12-17 (Chimera); gridka-dcache.fzk.de: 1.9.12-17 (PNFS); xrootd (version 20100510-1509_dbg) | | |
| NDGF | dCache 2.1 (Chimera) on core servers. Mix of 1.9.13 and 2.0.1 on pool nodes. | | |
| NL-T1 | dCache 1.9.12-10 (Chimera) (SARA), DPM 1.7.3 (NIKHEF) | | |
| PIC | dCache 1.9.12-14; PNFS on Postgres 9.0 | none | none |
| RAL | CASTOR 2.1.11-8; 2.1.11-8 (tape servers); SRM 2.11-1 | none | 2.1.11-9 upgrade at beginning of June |
| TRIUMF | dCache 1.9.5-28 with Chimera namespace | None | None |

Other site news


CERN operations and development

EOS news

xrootd news

A last-minute bug was found in version 3.2.0; version 3.2.1 is to be released soon.

dCache news

StoRM news

FTS news

  • IN2P3-CC: FTS 2.2.8 EMI. Patch for globus-fts-client installed on Friday 27th of April: transfers are in GFTPv2 since then.
  • PIC: same as IN2P3-CC

  • Maarten: the only outstanding issue with the FTS appears to be the intermittent failures due to expired proxies on the server, e.g. reported by RAL
    • Michail: we are investigating that with high priority
    • Andreas P: also KIT is seeing that
  • Andreas P: some FTS rpms have dependency issues, had to be installed with "--nodeps"
    • after the meeting: Michail will contact FTS experts at KIT for further details
  • Andreas P: where are FTS-3 demos announced?
    • on fts3-steering AT cern.ch; we will announce the purpose of that list on fts-users AT cern.ch
      • all FTS admins should at least join the latter list!

DPM news

  • DPM 1.8.3 released by EMI
    • EPEL compatibility
    • synchronous GET for performance improvements
    • HTTP interface
    • Integration of gridPP admin toolkit
    • Added thread safe CSEC_MECH
    • Upgrade recommended for all sites running the EMI/UMD version, as some bugs were fixed that could hit any site during maintenance operations
      • gLite 3.2 version could also be affected
  • First SL6 support expected with EMI-II (May 2012)
  • DPM 1.8.2-3 (EMI release) in production
  • DPM 1.8.2-3 (gLite release) in production
  • Periodic releases of new unstable components can be followed on the blog: https://svnweb.cern.ch/trac/lcgdm/blog

LFC news

  • LFC 1.8.3 released by EMI
    • Added thread safe CSEC_MECH
    • Packaging is now Fedora compliant
    • Added support for MySQL 5.5 for the db creation script
    • Added sample my.cnf configuration file
    • New HTTP/DAV frontend daemon
  • LFC 1.8.2-3 (EMI release) in production
  • LFC 1.8.2-2 (gLite release) in production

LFC deployment

| Site | Version | OS, n-bit | Backend | Upgrade plans |
| BNL | 1.8.0-1 | SL5, 64-bit | Oracle | None |
| CERN | 1.8.2-0 | SLC5 64-bit | Oracle | all servers are SLC5 64-bit virtual machines |
| CNAF | 1.8.2-2 | SL5 64-bit | Oracle | None |
| IN2P3 | 1.8.2-2 | SL5 64-bit | Oracle 11g | |
| KIT | 1.8.2-2 | SL5 64-bit | Oracle | Oracle backend migration pending |
| NDGF | | Ubuntu 10.04 64-bit | MySQL | None |
| NL-T1 | 1.8.2-2 | CentOS5 64-bit | Oracle | |
| PIC | 1.8.2-2 | SL5 64-bit | Oracle | |
| RAL | 1.8.2-2 | SL5 64-bit | Oracle | None |
| TRIUMF | 1.7.3-1 | SL5 64-bit | MySQL | None |
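
As an illustration only, the deployment table above can be checked mechanically against a reference release. The sketch below is not part of the minutes: the choice of 1.8.2-2 (the gLite release listed as in production under "LFC news") as the reference is an assumption, and the names parse_version, DEPLOYED, REFERENCE and sites_behind are hypothetical helpers introduced for this example. The site versions are copied from the table; NDGF has no version listed and is therefore skipped.

```python
import re


def parse_version(text):
    """Turn a version string such as '1.8.2-2' into a tuple of ints, (1, 8, 2, 2)."""
    return tuple(int(part) for part in re.split(r"[.-]", text))


# Versions copied from the LFC deployment table; None marks a site with no version listed.
DEPLOYED = {
    "BNL": "1.8.0-1",
    "CERN": "1.8.2-0",
    "CNAF": "1.8.2-2",
    "IN2P3": "1.8.2-2",
    "KIT": "1.8.2-2",
    "NDGF": None,
    "NL-T1": "1.8.2-2",
    "PIC": "1.8.2-2",
    "RAL": "1.8.2-2",
    "TRIUMF": "1.7.3-1",
}

# Assumption: use the gLite production release from "LFC news" as the reference.
REFERENCE = "1.8.2-2"


def sites_behind(deployed, reference):
    """Return the sorted list of sites whose listed version is older than the reference."""
    ref = parse_version(reference)
    return sorted(
        site
        for site, version in deployed.items()
        if version is not None and parse_version(version) < ref
    )


if __name__ == "__main__":
    for site in sites_behind(DEPLOYED, REFERENCE):
        print(f"{site}: {DEPLOYED[site]} is older than {REFERENCE}")
```

Comparing tuples of integers rather than raw strings avoids the usual pitfall where "1.8.10" would sort before "1.8.2" lexically.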

Experiment issues

WLCG Baseline Versions

  • WLCG Baseline versions: table
    • Maarten: that page should be taken seriously!
      • for example, the EMI/UMD top-level BDII has caching enabled; moreover, the gLite BDII is unsupported since May 1
      • for the EMI WN, a few sites have set up or are setting up test queues, which we will also want to use to test the EMI-2 SL6 versions at some point; we aim to clarify the EMI-1 SL5 WN situation w.r.t. ATLAS and CMS jobs in June, so that sites can hopefully upgrade in the near future; the gLite WN is supported until the end of September

Status of open GGUS tickets

Review of recent / open SIRs and other open service issues

  • 2012-04-02 CASTORLHCB - 3 filesystems lost on diskonly pool: IncidentsDiskOnlyDataLoss20120402
  • 2012-04-17 GGUS:81352 : a short intrusive intervention on a disk server delayed the ATLAS Tier0 processing by 2 hours.
  • 2012-04-23 GGUS:81512 : a backlog of recall requests on a tape library impacted the ATLAS Tier0 activity. Reported at C5.

  • 2012-04-11: CASTOR crashed on April 11 2012 at ASGC (SIR)

Conditions data access and related services

Database services

  • Experiment reports:
    • ALICE:
      • Replication using Streams will be stopped in the coming weeks. An Active DataGuard copy is ready and is currently being tested.
    • ATLAS:
      • New ATLAS online Active DataGuard copy ready.
    • CMS:
      • Replication using Streams has been stopped. Using Active DataGuard copy.
      • CMS online Active DataGuard copy being prepared at BNL by Carlos.
    • LHCb: the Conditions schema had to be recovered (rolled back) at LHCb's request. Consequently, the replication setup had to be stopped and recovered as well.

  • Site reports:
| Site | Status, recent changes, incidents, ... | Planned interventions |
| CERN | Oracle April 2012 security patches have been released. Being deployed at CERN on test/integration/development databases. Oracle Security Alert for CVE-2012-1675: vulnerability with the TNS listener. | |
| BNL | Propagation connectivity from CERN to the BNL database was affected when the connector descriptor used the scan address. Propagation resumed after the connector descriptor at the CERN source was redefined to use the virtual IP address. No issues observed from a local client using the scan address or the VIP when connecting to the database service. Applied latest OS and Database Security patches in test cluster. CMS Active Data Guard prototype: legacy hardware for the CMS Active Standby replication prototype was identified/upgraded/enabled. 11g( installed. Oracle Enterprise Manager 12c installed; agents have been enabled initially in the Atlas LFC replication test cluster. | To apply latest CPU patches and OS kernel in production database services, schedule TBD. |
| KIT | January CPU applied. Compatible parameter | April CPU to be scheduled |
| PIC | ntr | |
| RAL | Waiting for test system to test the patch about the scan-listener that becomes not reachable after a failover + April PSU. In two weeks time, upgrade the file catalog to version 182-2 and after that database upgrade to waiting for the schemas to be upgraded to 2.1.11-9 before database upgrade from to | |
| TRIUMF | ntr | ntr |


-- JamieShiers - 02-May-2012
Topic revision: r15 - 2012-05-04 - MaartenLitmaath