WLCG Tier1 Service Coordination Minutes - 21 April 2011
Attendance
- Local: Manuel, Stephane, Simone, Alessandro, Steven, Maarten, Zbyszek, Andrea S, Fernando, Nicolo, Maria, Jamie, Ian, Elisa, Markus, Fabrizio
- Remote: Jon, Alexander Verkooijen, Michael, Felix, Carlos, Patrick, RAL, Carmine, TRIUMF - Andrew, Alexei, Rolf, Jhen-Wei, Gareth, Elizabeth
Action list review
Release update
Data Management & Other Tier1 Service Issues
| Site | Status | Recent changes | Planned changes |
| CERN | CASTOR 2.1.10 (all); SRM 2.10-x; xrootd: ALICE 2.1.10 update 1, others: 2.1.9-7 | | |
| ASGC | CASTOR 2.1.7-19 (stager, nameserver); CASTOR 2.1.8-14 (tapeserver); SRM 2.8-2; DPM 1.8.0-1 | | 26-27/4: network maintenance (link 1, link 2). 26-29/4: CASTOR upgrade: all CMS services will be unavailable; affecting tape for ATLAS; STAR channels stopped |
| BNL | dCache 1.9.5-23 (PNFS, Postgres 9) | none | Transition to Chimera in summer 2011 |
| CNAF | StoRM 1.5.6-3 SL4 (CMS, LHCb, ALICE); StoRM 1.6 SL5 (ATLAS) | | |
| FNAL | dCache 1.9.5-23 (PNFS); httpd=1.9.5.-25; Scalla xrootd 2.9.1/1.4.2-4; Oracle Lustre 1.8.3 | none | none |
| IN2P3 | dCache 1.9.5-24 (Chimera) on all core servers and pool nodes | | Upgrade to version 1.9.5-25 on 2011-05-24 (scheduled site downtime). |
| KIT | dCache (admin nodes): 1.9.5-15 (Chimera), 1.9.5-24 (PNFS); dCache (pool nodes): 1.9.5-9 through 1.9.5-24 | | |
| NDGF | dCache 1.9.12 | | |
| NL-T1 | dCache 1.9.5-23 (Chimera) (SARA), DPM 1.7.3 (NIKHEF) | | |
| PIC | dCache 1.9.5-25 (PNFS, Postgres 9) | | |
| RAL | CASTOR 2.1.10-0, 2.1.9-1 (tape servers); SRM 2.10-1, 2.8-6 | Upgrade of SRMs to 2.10-2 (apart from ALICE) | Gen SRM upgrade to 2.10-2. Updates to support T10KC |
| TRIUMF | dCache 1.9.5-21 with Chimera namespace | None | None |
CASTOR news
CERN operations
Development
xrootd news
dCache news
StoRM news
FTS news
DPM news
- DPM 1.8.0-2 for gLite 3.1 has been released to production on April 13
LFC news
- LFC 1.8.0-1 for gLite 3.1: waiting for rebuild of the meta package with the correct VOMS libraries (1.9.10-14)
LFC deployment
| Site | Version | OS, n-bit | Backend | Upgrade plans |
| ASGC | 1.7.4-7 | SLC5 64-bit | Oracle | None |
| BNL | 1.8.0-1 | SL5 64-bit | Oracle | None |
| CERN | 1.7.3 | SLC4 64-bit | Oracle | Upgrade to SLC5 64-bit pending |
| CNAF | 1.7.4-7 | SL5 64-bit | Oracle | |
| FNAL | N/A | | | Not deployed at Fermilab |
| IN2P3 | 1.8.0-1 | SL5 64-bit | Oracle 11g | Oracle DB migrated to 11g on Feb. 8th |
| KIT | 1.7.4-7 | SL5 64-bit | Oracle | Oracle backend migration pending |
| NDGF | 1.7.4.7-1 | Ubuntu 9.10 64-bit | MySQL | None |
| NL-T1 | 1.7.4-7 | CentOS5 64-bit | Oracle | |
| PIC | 1.7.4-7 | SL5 64-bit | Oracle | |
| RAL | 1.7.4-7 | SL5 64-bit | Oracle | |
| TRIUMF | 1.7.3-1 | SL5 64-bit | MySQL | |
Experiment issues
- ATLAS asked which is the minimum version of StoRM that supports checksumming and proposes to adopt it as the baseline version.
- ATLAS asked whether the overwrite option is supposed to work, as it does not at some sites. Patrick said that it is an option that can be enabled or disabled, and he will provide instructions. Maarten mentioned that it is a global site setting, but Simone said that this is not a problem for ATLAS-only sites. Simone will check with FTS where the overwrite works and where it does not, as it is also possible that FTS does not use the option and does an rm instead.
WLCG Baseline Versions
S2 status
Following the presentation on the developer needs for S2, Patrick remarked that S2 is useful not only when the protocol changes but also to check that nothing breaks.
Jamie asked if dCache and CASTOR agree that S2 should be maintained by a third party; Patrick said yes, but there was no representative for CASTOR. Patrick said that if it were developed only by dCache and CASTOR, it would not be useful.
Consistency of Storage Elements and LFC
Status of open GGUS tickets
Review of recent / open SIRs and other open service issues
Conditions data access and related services
Database services
- Experiment reports:
- ALICE:
- ATLAS:
- Replication to CNAF, SARA and NDGF was shut down permanently on Tuesday 12.04.
- ATLASDD, which holds the ATLAS geometry data, was added to the ATLAS conditions replication to the Tier1s on Thursday 14.04.
- On Monday (18.04) morning, replication of ATLAS conditions data to the T1s was unavailable from 9:30 until 11:45 because of a Streams process deadlock that occurred after the short weekly maintenance stop of the replication service. Clearing the lock required a restart of the downstream instances.
- LHCb:
- On Monday (18.04) morning the same problem as in the ATLAS case affected LHCb conditions replication, which was inactive between 9:30 and 11:30.
- CMS:
- On Tuesday 12th April between 15:30 and 16:30 there was a rolling intervention on the CMS offline production database to install a patch intended to fix a bug causing occasional crashes of the CMS T0AST application.
- The CMS conditions apply process crashed on Thursday afternoon (14.04) because the procedure suggested by the Oracle analyst was not correct (dropping partitions was suggested without cleaning up the support table).
| Site | Status, recent changes, incidents, ... | Planned interventions |
| CERN | First successful upgrade of the TEST2 db to 11.2.0.2; the April CPU is out but it does not contain relevant fixes | Upgrade of LCG integration to 11.2.0.2 |
| ASGC | | |
| BNL | Following up on some Streams connectivity problems (SR opened), the database was rebooted in a rolling fashion on Friday (15.04) | On 27.04.2011 the ATLAS Conditions Oracle database service will be moved to a new data center room. |
| CNAF | | |
| KIT | Migration of the FTS/LFC Oracle backend on April 7 failed due to DataGuard problems. Details under investigation. | |
| IN2P3 | ntr | none |
| NDGF | | |
| PIC | The yearly power maintenance took place on 19.04. Due to problems with the cooling systems during the scheduled downtime, the downtime was extended by 24 hours | none |
| RAL | CASTOR SRM has been upgraded to 2.10. Transparent network intervention on Tuesday (19.04). High load on SRM, caused by a wrong execution plan. | none |
| SARA | Removal of all ATLAS conditions data | Planning of upgrade to 10.2.0.5 |
| TRIUMF | ntr | none |
AOB
--
JamieShiers - 19-Apr-2011