WLCG Tier1 Service Coordination Minutes - 5 April 2012


Local: Massimo, Oliver, Alexandre, MariaG, Maarten, Stefan, AndreaS, Eva, IanF, MariaDZ, Ale, Stephane.

Remote: Burt, Jhen-Wei, IN2P3, RAL, Carlos (BNL), Andrew (Triumf), Andreas (KIT), Joel, Alexei.

Action list review

Release update

Data Management & Other Tier1 Service Issues

Site Status Recent changes Planned changes
CERN CASTOR 2.1.12-4 for all instances (w/ xroot-xcastor2fs_2112-1.1.0-1); SRM-2.11 for all instances.
FTS: all nodes in SLC5 3.7.7-2
EOS 0.1.2-2 (w/ xrootd-3.1) for all instances
None None
ASGC CASTOR 2.1.11-6
SRM 2.11-0
DPM 1.8.2-3
None None
BNL dCache (Chimera, Postgres 9 w/ hot backup)
http (aria2c) and xrootd/Scalla on each pool
None None
CNAF StoRM 1.8.1 (Atlas, CMS, LHCb)    
FNAL dCache 1.9.5-23 (PNFS, postgres 8 with backup, distributed SRM) httpd=2.2.3
Scalla xrootd 2.9.7/3.1.0.osg
Oracle Lustre 1.8.6
EOS 0.1.1-12/xrootd 3.1.0.osg with Bestman 2.0.10
FTS 3.7.7 on SL5
IN2P3 dCache 1.9.12-16 (Chimera) on core servers and pool nodes.
New hardware (more RAM, SSD disks) for Chimera and SRM servers (with SL6).
Postgres 9.1
KIT dCache
atlassrm-fzk.gridka.de: 1.9.12-11 (Chimera)
cmssrm-fzk.gridka.de: 1.9.12-17 (Chimera)
gridka-dcache.fzk.de: 1.9.12-17 (PNFS)
xrootd (version 20100510-1509_dbg)
Upgrade of cmssrm-fzk.gridka.de and gridka-dcache.fzk.de on 15.03.2012. Build of lhcbsrm-fzk.gridka.de sometime after May.
NDGF dCache 2.1 (Chimera) on core servers. Mix of 1.9.13 and 2.0.1 on pool nodes.    
NL-T1 dCache 1.9.12-10 (Chimera) (SARA), DPM 1.7.3 (NIKHEF)    
PIC dCache 1.9.12-14; PNFS on Postgres 9.0 None None
RAL CASTOR 2.1.11-8
2.1.11-8 (tape servers)
SRM 2.11-1
None None
TRIUMF dCache 1.9.5-28 with Chimera namespace None None

Other site news

  • FNAL: testing EOS version 0.1.4-1 (0.1.2-2 is currently deployed). Saw some of the load issues with FTS 2.2.8, but has not yet applied the patches.


CERN operations and development

EOS news

xrootd news

dCache news

StoRM news

FTS news

  • FTS 2.2.8 EMI is now running in production at: CERN on all the FTS servers (pilot, T2Export and T0Export), NDGF-T1, RAL-LCG2, Taiwan-LCG2 and FZK-LCG2 (upgraded just yesterday, awaiting confirmation that everything is OK). Rollout to the remaining T1s is ongoing.
  • Summary of 2.2.8 rollout - https://svnweb.cern.ch/trac/glitefts/wiki/FTS228RolloutPlanning
    • Detailed FTS server deployment plan: https://docs.google.com/spreadsheet/ccc?key=0AthhzXLQok7XdFpUeDBfLXE2S1RDZE4zcHp6QWVpUFE .
    • Alessandro (CNAF) asked to postpone the CNAF FTS upgrade towards the end of the required period (28-29.3). Ale confirmed that 4 hours at most are enough.
    • CMS (Nicolo) reported that 3 GGUS tickets were opened recently about FTS 2.2.8. The new 'resume' transfer functionality currently checkpoints every second, which overloads the system. A patch that makes the checkpoint less frequent is ready and is being tested on the pilot system. The new EMI release in April will contain this patch.
  • FTS 3 update by Oliver (slides on the agenda). Next demo on 21st March.

DPM news

LFC news

LFC deployment

| Site | Version | OS, n-bit | Backend | Upgrade plans |
| BNL | 1.8.0-1 | SL5 64-bit | Oracle | None |
| CERN | 1.8.2-0 | SLC5 64-bit | Oracle | all servers are SLC5 64-bit virtual machines |
| CNAF | 1.8.0-1 | SL5 64-bit | Oracle | 24/4: upgrade to 1.8.2-2 |
| IN2P3 | 1.8.2-2 | SL5 64-bit | Oracle 11g |  |
| KIT | 1.7.4-7 | SL5 64-bit | Oracle | Oracle backend migration pending |
| NDGF |  | Ubuntu 10.04 64-bit | MySQL | None |
| NL-T1 | 1.7.4-7 | CentOS5 64-bit | Oracle |  |
| PIC | 1.8.2-2 | SL5 64-bit | Oracle |  |
| RAL | 1.8.2-2 | SL5 64-bit | Oracle | None |
| TRIUMF | 1.7.3-1 | SL5 64-bit | MySQL | None |

Experiment issues

  • Joel (LHCb) requests that the deployment of new major versions of LFC be tracked by this meeting, as there are still sites running very old versions.

WLCG Baseline Versions

  • WLCG Baseline versions: table
  • Recent updates:
    • CASTOR: bug fix release.
    • CernVM-FS: revised RPMs; fixed presentation bug in cvmfs_config chksetup. The default maximum number of open files has been increased from 32k to 64k.
    • StoRM: fixed some bugs in BE and FE; fixed bugs in YAIM related to publication of GLUE information.
    • VOBOX: complete overhaul of the proxy renewal facility that fixes several bugs and adds new features.

Status of open GGUS tickets

No issues this time.

Review of recent / open SIRs and other open service issues

Conditions data access and related services

Database services

  • Experiment reports:
    • ALICE: ntr
    • ATLAS: Increased the number of sessions for ATLAS LFC.
    • CMS: ntr
    • LHCb: ntr

  • Site reports:
| Site | Status, recent changes, incidents, ... | Planned interventions |
| BNL | Interesting tests on SDU negotiation between client and server (see details below). CMS conditions database replication to BNL via Active Data Guard is being investigated. |  |
| CNAF | ntr |  |
| KIT | 11g migration of all 3 RACs finished (FTS/LFC in March, LHCb 3D/LFC in Feb, ATLAS 3D in Jan) | Compatible parameter change to be scheduled |
| PIC | ntr |  |
| RAL | Fixed the problem seen for FTS running on 10g: Oracle bug, patch 9949948 (PROCESS SPIN UNDER KSFDRWAT0 IF AIO-MAX-NR TOO LOW) |  |
| TRIUMF | ntr |  |

  • From Carlos (BNL):
    • Functional tests between a client and a test database server showed that, in 11g, the Session Data Unit (SDU) size negotiated between client and database server could not be greater than 8*1024 when the DEFAULT_SDU_SIZE parameter is set in the sqlnet.ora in the GRID_HOME, as stated in various Oracle documentation for this database release.
    • After following this up with Oracle Support (SR 3-5430820151) and providing different test scenarios, Oracle Support acknowledged that the documented information is not accurate and that this parameter should be set via the sqlnet.ora file in the database DB_HOME. Oracle Support based this advice on Oracle's internal documentation.
    • After moving the sqlnet.ora file as advised, functional tests showed the client and the database server negotiating SDU values greater than 8*1024, up to 32*1024.
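
The behaviour reported above is consistent with Oracle Net settling on the smaller of the SDU sizes offered by the client and the server. The following Python sketch only illustrates that rule and is not the test procedure used at BNL. The function names are hypothetical, and the 8*1024 fallback is an assumption taken from the ceiling observed in the tests.

```python
from typing import Optional

# Assumed fallback: the 8*1024 ceiling observed in the tests above.
ASSUMED_DEFAULT_SDU = 8 * 1024


def effective_sdu(configured: Optional[int]) -> int:
    """SDU a side offers: its configured DEFAULT_SDU_SIZE, or the fallback
    when its sqlnet.ora is not read (for example, placed in the wrong home)."""
    return ASSUMED_DEFAULT_SDU if configured is None else configured


def negotiated_sdu(client: Optional[int], server: Optional[int]) -> int:
    """Both sides settle on the smaller of the two offered sizes."""
    return min(effective_sdu(client), effective_sdu(server))


# sqlnet.ora in GRID_HOME: the server setting is not picked up.
before = negotiated_sdu(client=32 * 1024, server=None)
# sqlnet.ora in DB_HOME: the server setting takes effect.
after = negotiated_sdu(client=32 * 1024, server=32 * 1024)
print(before, after)
```

Under these assumptions, a client configured for 32*1024 stays capped at the fallback value until the server side also picks up its setting.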


-- AndreaSciaba - 04-Apr-2012

Topic revision: r20 - 2012-05-03 - GiuseppeLoPresti