WLCG Tier1 Service Coordination Minutes - 5 April 2012
Attendance
Local: Massimo, Oliver, Alexandre, MariaG, Maarten, Stefan, AndreaS, Eva, IanF, MariaDZ, Ale, Stephane.
Remote: Burt, Jhen-Wei, IN2P3, RAL, Carlos (BNL), Andrew (Triumf), Andreas (KIT), Joel, Alexei.
Action list review
Release update
Data Management & Other Tier1 Service Issues
Site | Status | Recent changes | Planned changes |
CERN | CASTOR 2.1.12-4 for all instances (w/ xroot-xcastor2fs_2112-1.1.0-1); SRM-2.11 for all instances. FTS: all nodes in SLC5 3.7.7-2 EOS 0.1.2-2 (w/ xrootd-3.1) for all instances | None | None |
ASGC | CASTOR 2.1.11-6 SRM 2.11-0 DPM 1.8.2-3 | None | None |
BNL | dCache 1.9.12.10 (Chimera, Postgres 9 w/ hot backup) http (aria2c) and xrootd/Scalla on each pool | None | None |
CNAF | StoRM 1.8.1 (ATLAS, CMS, LHCb) | | |
FNAL | dCache 1.9.5-23 (PNFS, postgres 8 with backup, distributed SRM) httpd=2.2.3 Scalla xrootd 2.9.7/3.1.0.osg Oracle Lustre 1.8.6 EOS 0.1.1-12/xrootd 3.1.0.osg with Bestman 2.0.10 FTS 3.7.7 on SL5 | | |
IN2P3 | dCache 1.9.12-16 (Chimera) on core servers and pool nodes. New hardware (more RAM, SSD disks) for Chimera and SRM servers (with SL6). Postgres 9.1 | | |
KIT | dCache atlassrm-fzk.gridka.de: 1.9.12-11 (Chimera) cmssrm-fzk.gridka.de: 1.9.12-17 (Chimera) gridka-dcache.fzk.de: 1.9.12-17 (PNFS) xrootd (version 20100510-1509_dbg) | Upgrade of cmssrm-fzk.gridka.de and gridka-dcache.fzk.de on 15.03.2012 | Build of lhcbsrm-fzk.gridka.de sometime after May. |
NDGF | dCache 2.1 (Chimera) on core servers. Mix of 1.9.13 and 2.0.1 on pool nodes. | | |
NL-T1 | dCache 1.9.12-10 (Chimera) (SARA), DPM 1.7.3 (NIKHEF) | | |
PIC | dCache 1.9.12-14; PNFS on Postgres 9.0 | None | None |
RAL | CASTOR 2.1.11-8 2.1.11-8 (tape servers) SRM 2.11-1 | None | None |
TRIUMF | dCache 1.9.5-28 with Chimera namespace | None | None |
Other site news
- FNAL: testing EOS version 0.1.4-1 (they have deployed 0.1.2-2). Saw some of the load issues with FTS 2.2.8, but have not yet applied the patches.
CASTOR news
CERN operations and development
EOS news
xrootd news
dCache news
StoRM news
FTS news
- FTS 2.2.8 (EMI) is now running in production at CERN on all the FTS servers (pilot, T2Export and T0Export), NDGF-T1, RAL-LCG2, Taiwan-LCG2 and FZK-LCG2 (upgraded just yesterday, awaiting confirmation that everything is OK). Rollout to the T1s is ongoing.
- Summary of 2.2.8 rollout - https://svnweb.cern.ch/trac/glitefts/wiki/FTS228RolloutPlanning
- Detailed FTS server deployment plan: https://docs.google.com/spreadsheet/ccc?key=0AthhzXLQok7XdFpUeDBfLXE2S1RDZE4zcHp6QWVpUFE
- Alessandro (CNAF) asked to postpone their FTS upgrade date towards the end of the required period (28-29.3). Ale confirmed that 4 hours at most are enough.
- CMS (Nicolo) said 3 GGUS tickets were opened recently following experience with FTS 2.2.8. The new 'resume' functionality for transfers currently check-points every second, which overloads the system. A patch that makes the check-point less frequent is ready and is being tested on the pilot system. The new EMI release in April will contain this patch.
- FTS 3 update by Oliver (slides on the agenda). Next demo on 21st March.
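The check-point overload reported by CMS above can be illustrated with a minimal sketch of interval-throttled check-pointing. This is not the FTS patch: the class name `ThrottledCheckpointer`, the `save` callback and the 30-second interval are all hypothetical, chosen only to show how a minimum interval turns one write per second into one write per interval.

```python
import time


class ThrottledCheckpointer:
    """Persist transfer progress at most once per `min_interval` seconds.

    Hypothetical illustration only; it does not reflect FTS internals.
    """

    def __init__(self, save, min_interval, clock=time.monotonic):
        self._save = save                  # callback that stores the state
        self._min_interval = min_interval  # seconds between check-points
        self._clock = clock                # injectable for testing
        self._last = None                  # time of the last check-point

    def maybe_checkpoint(self, state):
        """Save `state` if enough time has passed; return True if saved."""
        now = self._clock()
        if self._last is None or now - self._last >= self._min_interval:
            self._save(state)
            self._last = now
            return True
        return False


# Simulate one progress update per second for a minute, using a fake clock.
saved = []
ticks = iter(range(60))
checkpointer = ThrottledCheckpointer(saved.append, min_interval=30,
                                     clock=lambda: next(ticks))
for progress in range(60):
    checkpointer.maybe_checkpoint(progress)
```

With the assumed 30-second interval, the 60 simulated updates result in two stored check-points instead of 60.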
DPM news
- DPM 1.8.3, certification completed
- Will be released for EMI-I on SL5
- EPEL compatibility
- synchronous GET for performance improvements
- HTTP interface
- Integration of gridPP admin toolkit
- First SL6 support expected with EMI-II (May 2012)
- DPM 1.8.2-3 (EMI release) in production
- DPM 1.8.2-3 (gLite release) in production
- Periodic releases of new unstable components can be followed on the blog: https://svnweb.cern.ch/trac/lcgdm/blog
LFC news
- LFC 1.8.2-3 (EMI release) in production
- LFC 1.8.2-2 (gLite release) in production
LFC deployment
Site | Version | OS, n-bit | Backend | Upgrade plans |
NDGF | 1.7.4.7-1 | Ubuntu 10.04 64-bit | MySQL | None |
TRIUMF | 1.7.3-1 | SL5 64-bit | MySQL | None |
BNL | 1.8.0-1 | SL5 64-bit | Oracle | None |
CERN | 1.8.2-0 | SLC5 64-bit | Oracle | all servers are SLC5 64-bit virtual machines |
CNAF | 1.8.0-1 | SL5 64-bit | Oracle | 24/4: upgrade to 1.8.2-2 |
KIT | 1.7.4-7 | SL5 64-bit | Oracle | Oracle backend migration pending |
NL-T1 | 1.7.4-7 | CentOS5 64-bit | Oracle | |
PIC | 1.8.2-2 | SL5 64-bit | Oracle | |
RAL | 1.8.2-2 | SL5 64-bit | Oracle | None |
IN2P3 | 1.8.2-2 | SL5 64-bit | Oracle 11g | |
Experiment issues
- Joel (LHCb) requests that deployment of new major versions of LFC is followed by this meeting, as there are still sites with very old versions.
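As a possible aid for following up on this request, the sketch below flags sites whose LFC version is older than a given threshold. The version strings are copied from the LFC deployment table in these minutes. The threshold of 1.8 is an assumption made for illustration (the minutes do not define what counts as "very old"), and the helper names `parse_version` and `sites_older_than` are hypothetical.

```python
import re

# LFC versions as listed in the deployment table of these minutes.
LFC_VERSIONS = {
    "NDGF": "1.7.4.7-1",
    "TRIUMF": "1.7.3-1",
    "BNL": "1.8.0-1",
    "CERN": "1.8.2-0",
    "CNAF": "1.8.0-1",
    "KIT": "1.7.4-7",
    "NL-T1": "1.7.4-7",
    "PIC": "1.8.2-2",
    "RAL": "1.8.2-2",
    "IN2P3": "1.8.2-2",
}


def parse_version(version):
    """Turn '1.7.4-7' into (1, 7, 4, 7) so versions compare numerically."""
    return tuple(int(part) for part in re.split(r"[.-]", version))


def sites_older_than(versions, threshold):
    """Return the sites whose version is strictly below `threshold`."""
    limit = parse_version(threshold)
    return [site for site, version in versions.items()
            if parse_version(version) < limit]


# Assumed threshold: anything below the 1.8 series is reported.
outdated = sites_older_than(LFC_VERSIONS, "1.8")
print(outdated)
```

Comparing tuples of integers avoids the pitfalls of plain string comparison, where for example "1.10" would sort before "1.8".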
WLCG Baseline Versions
- WLCG Baseline versions: table
- Recent updates:
- CernVM-FS: revised RPMs; fixed presentation bug in cvmfs_config chksetup. The default maximum number of open files has been increased from 32k to 64k.
- StoRM: fixed some bugs in BE and FE; fixed bugs in YAIM related to publication of GLUE information.
- VOBOX: complete overhaul of the proxy renewal facility that fixes several bugs and adds new features.
Status of open GGUS tickets
No issues this time.
Review of recent / open SIRs and other open service issues
Conditions data access and related services
Database services
- Experiment reports:
- ALICE: ntr
- ATLAS: Increased the number of sessions for ATLAS LFC.
- CMS: ntr
- LHCb: ntr
Site | Status, recent changes, incidents, ... | Planned interventions |
BNL | Interesting tests on SDU negotiation between client and server (see details below). CMS conditions database replication to BNL via Active Data Guard is being investigated. | |
CNAF | ntr | |
KIT | 11g migration of all 3 RACs finished (FTS/LFC in March, LHCb 3D/LFC in Feb, ATLAS 3D in Jan) | Compatible parameter change to 11.2.0.3.0 being scheduled |
IN2P3 | | |
PIC | ntr | |
RAL | Fixed the problem seen for FTS running on 10g: Oracle bug, Patch 9949948: PROCESS SPIN UNDER KSFDRWAT0 IF AIO-MAX-NR TOO LOW | |
SARA | | |
TRIUMF | ntr | |
- From Carlos (BNL):
- Functional tests between a client and a test database server showed that, in 11g (11.2.0.3), the Session Data Unit (SDU) size negotiated between client and database server could not be greater than 8*1024 if the DEFAULT_SDU_SIZE parameter is set in the sqlnet.ora under GRID_HOME, as stated in various Oracle documentation for this database release.
- After following this up with Oracle Support (SR 3-5430820151) and providing different test scenarios, Oracle Support acknowledged that the documented information is not accurate and that this parameter should be set via the sqlnet.ora file in the database DB_HOME. Oracle Support based this advice on Oracle's internal documentation.
- After the location of the sqlnet.ora file was changed as advised, functional tests showed that client and database server negotiated SDU values greater than 8*1024, up to 32*1024.
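The behaviour Carlos describes can be modelled with a small sketch. It relies on two assumptions not stated in these minutes: that Oracle Net settles on the lower of the client and server SDU values, and that the 11g default SDU is 8*1024 bytes. The function `effective_server_sdu` is a hypothetical model of the observation that a DEFAULT_SDU_SIZE set under GRID_HOME did not take effect. It is not an Oracle API.

```python
KIB = 1024

# Assumed 11g default SDU; it matches the 8*1024 ceiling seen in the tests.
ORACLE_NET_DEFAULT_SDU = 8 * KIB


def effective_server_sdu(configured_sdu, sqlnet_ora_home):
    """Model of the observation: only a DEFAULT_SDU_SIZE placed in the
    DB_HOME sqlnet.ora is honoured; elsewhere the default applies."""
    if sqlnet_ora_home == "DB_HOME":
        return configured_sdu
    return ORACLE_NET_DEFAULT_SDU


def negotiated_sdu(client_sdu, server_sdu):
    """Assumed negotiation rule: the lower of the two sides wins."""
    return min(client_sdu, server_sdu)


client_sdu = 32 * KIB

# DEFAULT_SDU_SIZE set in the GRID_HOME sqlnet.ora: capped at the default.
before = negotiated_sdu(client_sdu, effective_server_sdu(32 * KIB, "GRID_HOME"))

# Same parameter moved to the DB_HOME sqlnet.ora: the larger value is used.
after = negotiated_sdu(client_sdu, effective_server_sdu(32 * KIB, "DB_HOME"))

print(before, after)
```

Under these assumptions the model reproduces both observations: a negotiated SDU stuck at 8*1024 with the GRID_HOME setting, and 32*1024 once the parameter is read from DB_HOME.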
AOB
-- AndreaSciaba - 04-Apr-2012
Topic revision: r20 - 2012-05-03