WLCG Operations Coordination Minutes - 18th October 2012
Agenda
Attendance
- Local: Andrea Sciabà (secretary), Maria Girone, Andrea Valassi, Stefan Roiser, Felix Lee, Joel Closier, Domenico Giordano, Maite Barroso, Oliver Keeble, Maarten Litmaath, Nicolò Magini, Stephen Gowdy, Manuel Guijarro
- Remote: Gonzalo Merino (chair), Donato Di Girolamo, Stephen Burke, Onno Zweers, Daniele Bonacorsi, Massimo Sgaravatto, Di Qing, Andreas Petzold, Oliver Gutsche, Gareth Smith
- Apologies: Maria Dimou, Ikuo Ueda (on behalf of ATLAS)
Task Force reports
CVMFS
Progress since last meeting
We need to distinguish between CVMFS being deployed at a site and being in production for a VO: some sites have deployed CVMFS but it is not used yet, and is therefore not visible in the statistics of the VOs.
- Sites that have deployed CVMFS since last meeting
- AUVERGRID (ATLAS, LHCb)
- CERN-PROD (CMS)
- FZK-LCG2 (CMS)
- INFN-T1 (CMS)
- JINR-LCG2 (CMS)
- MPPMU (ATLAS)
- TR-10-ULAKBIM (ATLAS)
Gonzalo: do the experiments have deadlines for the CVMFS deployment?
Stefan: LHCb does not, but it should be completed as soon as possible; the aim is to have a single software deployment process.
Stephen & Oli: CMS does not have a deadline; in any case we will support both CVMFS and a cron-based software installation system.
gLExec
- Deployment status on Oct 9 (not presented at the GDB).
PerfSonar
Maria thanks Donato (CNAF) for volunteering for the PS task force and points out that the site representation is still a bit weak.
Tracking tools
The issues in Savannah:131988 and Savannah:132582 will be discussed at the next meeting on 2012/11/01. Please add your comments to these tickets so that we have as much input as possible for our preparation.
SHA-2 migration
- For CREAM only the EMI-3 release will support SHA-2.
- See OSG update in Oct GDB
- JGlobus-2 updated with support for legacy proxies
- hopefully this will allow us to move to RFC proxies after SHA-2
- the dCache developers still need to assess if the improved JGlobus-2 is sufficiently OK for the dCache deployments and use cases that we need to care about
- to be continued...
Gonzalo points out that there is a GGUS ticket for dCache where one can follow the developments on this topic and Maarten says that indeed this is a top priority for the dCache developers.
Generic links:
FTS 3 integration and deployment
On 17 October we had the FTS 3 demo (http://indico.cern.ch/conferenceDisplay.py?confId=211609). After the demo we discussed the series of tests for the coming weeks. CMS is already running functional tests between RAL, ASGC and CERN, and these will continue. ATLAS will set up new DDM endpoints (one certainly at BNL; whether others are needed remains to be seen) to start functional tests as well. Depending on the results, we will then see whether we can start stress testing one of the instances.
FTS3 features demonstrated:
- python client bindings
- gridftp session reuse for gsiftp->gsiftp transfers
- extra gridftp debug log file per transfer
Gonzalo: do you plan to ask more sites to join the testing?
Nicolò: no, this is not needed at this stage, while we are running functional tests. Sites can still deploy an FTS3 pilot if they wish, in particular CNAF, because we still lack a site with StoRM.
Middleware deployment
- The EGI security dashboard will flag hosts found running services that are unsupported (gLite 3.1 and the majority of gLite 3.2 services). The COD team will open tickets for the corresponding sites to upgrade or decommission unsupported services by the end of this month or risk suspension of the entire site:
- Worker Node testing for WLCG
- Good progress: for most sites the experiments have given the green light to upgrade to the next EMI-2 WN update (which will include fixes for all issues that were blockers before).
Maarten: the only sites for which there are still issues are ATLAS sites using gsidcap. In parallel, there is also the migration to SL6; some sites have already upgraded.
AndreaS: remember that there is a Nagios bug with SL6 worker nodes.
Maarten: yes, but there is a workaround, which consists of using the nagios executable from the previous version.
Daniele announces that Christoph Wissing declared StoRM validated for CMS (it was another non-OK case).
XrootD
Domenico reports that the TF will meet next week, but has already started collecting requirements from ATLAS and CMS. The TF membership is also expected to grow. They are in close contact with the storage federations working group. More news at the next meeting.
Squid monitoring
The squid monitoring task force met once and made some progress, but since the leader Dave Dykstra is on vacation this Thursday, a full report will be delayed until the next meeting. The other members of the task force are Dario Barberis, Alexandre Beche, Doug Benjamin, Barry Blumenfeld, Simone Campana, Alastair Dewhurst, Alessandro Di Girolamo, Stefan Roiser, and Andrea Valassi.
WMS future
Maarten: given the lower priority of this TF, it has not started yet.
Gonzalo: do the experiments have a date by when they want to stop using the WMS?
Maarten: there is some non-trivial effort to reduce the remaining dependencies, and there have been no strong complaints from experiments or sites about WMS issues.
News from other WLCG working groups
No report.
Experiment operations review and plans
ALICE
- EOS-ALICE suffered a number of instabilities, but appears to handle the full load (event and conditions data) OK now. Thanks IT-DSS for the various optimizations!
- Migration from CASTOR ALICE_DISK to EOS: all read access to disk-based data now goes via EOS, which will redirect clients to CASTOR for files that have not been migrated yet.
ATLAS
See the report.
CMS
See the slides.
LHCb
See the slides.
GGUS tickets
- PIC Tier1 is waiting for an answer from APEL in GGUS:87263. The APEL parser crashes because some values in the PBS virtual memory field exceed the MySQL int size. A jar that fixes the problem is available, but it has not yet been posted on the ticket.
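The overflow described in the ticket can be illustrated with a minimal Python sketch. It is not the APEL parser code: it assumes the target column is a signed MySQL INT (documented range -2147483648 to 2147483647), and the field name `vmem_kb` and the sample records are hypothetical.

```python
# Range of a signed MySQL INT column (assumption: the column is signed INT).
MYSQL_INT_MIN = -2**31
MYSQL_INT_MAX = 2**31 - 1


def fits_mysql_int(value: int) -> bool:
    """Return True if value can be stored in a signed MySQL INT column."""
    return MYSQL_INT_MIN <= value <= MYSQL_INT_MAX


def find_oversized(records, field="vmem_kb"):
    """Return the records whose field value would overflow a signed INT.

    The field name "vmem_kb" is hypothetical, chosen for illustration only.
    """
    return [r for r in records if not fits_mysql_int(r[field])]


# Hypothetical accounting records; job names and values are made up.
sample = [
    {"job": "a", "vmem_kb": 1_500_000},
    {"job": "b", "vmem_kb": 3_000_000_000},
]
oversized = find_oversized(sample)
```

A check of this kind, run before the insert, would flag the offending records instead of letting the load fail.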
Tier-1 Grid services
Storage deployment
| Site | Status | Recent changes | Planned changes |
| CERN | CASTOR 2.1.13-5; SRM-2.11 for all instances. EOS 0.2.18 for all instances except CMS | Upgraded all CASTOR instances to 2.1.13-5; upgraded all EOS instances except CMS to 0.2.18; upgraded SRM-EOS enabling root protocol TURLs. EOSALICE and EOSATLAS had issues after upgrade, now stabilized. | EOSCMS upgrade to 0.2.18 postponed |
| ASGC | CASTOR 2.1.11-9 SRM 2.11-0 DPM 1.8.2-5 | None | None |
| BNL | dCache 1.9.12.10 (Chimera, Postgres 9 w/ hot backup) http (aria2c) and xrootd/Scalla on each pool | None | None |
| CNAF | StoRM 1.8.1 (Atlas, CMS, LHCb) | | |
| FNAL | dCache 1.9.5-23 (PNFS, postgres 8 with backup, distributed SRM) httpd=2.2.3 Scalla xrootd 2.9.7/3.2.2-1.osg Oracle Lustre 1.8.6 EOS 0.2.19/xrootd 3.2.2-1.osg with Bestman 2.2.2.0.10 | | |
| IN2P3 | dCache 1.9.12-16 (Chimera) on core servers and 1.9.12-24 on pool nodes. New hardware (more RAM, SSD disks) for Chimera and SRM servers (with SL6). Postgres 9.1 xrootd 3.0.4 | migration from remaining solaris to linux servers | upgrade xroot to 3.2.4 |
| KIT | dCache atlassrm-fzk.gridka.de: 1.9.12-11 (Chimera) cmssrm-fzk.gridka.de: 1.9.12-17 (Chimera) gridka-dcache.fzk.de: 1.9.12-17 (PNFS) xrootd (version 20100510-1509_dbg) | | |
| NDGF | dCache 2.3 (Chimera) on core servers. Mix of 2.3 and 2.2 versions on pool nodes. | | |
| NL-T1 | dCache 2.2.4 (Chimera) (SARA), DPM 1.8.2 (NIKHEF) | | |
| PIC | dCache 1.9.12-20 (Chimera) | | |
| RAL | CASTOR 2.1.11-8/2.1.12-10 2.1.11-8/2.1.12-10 (tape servers) SRM 2.11-1 | ATLAS and CMS upgraded to 2.1.12 | Gen and LHCb will be upgraded to 2.1.12 before end of Oct |
| TRIUMF | dCache 1.9.12-19 with Chimera namespace | | |
FTS deployment
| Site | Version | Recent changes | Planned changes |
| CERN | 2.2.8 - transfer-fts-3.7.10-1 | "expired proxy" FTS patch applied this week | |
| ASGC | 2.2.8 - transfer-fts-3.7.10-1 | transfer-fts upgraded | |
| BNL | 2.2.8 - transfer-fts-3.7.10-1 | FTS patch applied | None |
| CNAF | 2.2.8 - transfer-fts-3.7.10-1 | transfer-fts upgraded | |
| FNAL | 2.2.8 - transfer-fts-3.7.7-2 | transfer-fts upgraded on fts2 | after 1 week will apply to fts1 |
| IN2P3 | 2.2.8 - transfer-fts-3.7.10-1 | transfer-fts upgraded | |
| KIT | 2.2.8 - transfer-fts-3.7.10-1 | transfer-fts upgraded | |
| NDGF | 2.2.8 - transfer-fts-3.7.10-1 | transfer-fts upgraded | |
| NL-T1 | 2.2.8 - transfer-fts-3.7.10-1 | transfer-fts upgraded | |
| PIC | 2.2.8 - transfer-fts-3.7.10-1 | | |
| RAL | 2.2.8 - transfer-fts-3.7.10-1 | | |
| TRIUMF | 2.2.8 - transfer-fts-3.7.10-1 | transfer-fts upgraded | None |
LFC deployment
Other site news
Data management provider news
CASTOR news
CERN operations and development
EOS news
xrootd news
dCache news
StoRM news
DPM news
1.8.4 has been certified and is expected in the next EMI2 release (due before end of October)
- First "dmlite" release bringing performance improvements to HTTP interface
- rewritten dpm-xrootd plugin (already production tested)
- 32bit library support, improved draining, bugfixes.
Latest developments in DPM/LFC will be presented Fri morning (accessible via Vidyo): https://indico.cern.ch/conferenceDisplay.py?confId=204968
FTS news
A patch for the proxy expiration issue is available and has been widely deployed, details in GGUS:81844. All remaining sites are encouraged to install this, instructions in the GGUS ticket.
The provision of a 2.2.9 release consolidating all outstanding fixes has been proposed.
LFC news
gfal/lcg_util news
gfal/lcg_util 1.13.9 has been certified and is expected in the next EMI2 update (before end of October). It contains fixes for the following issues found during EMI2 WN validation:
Middleware news and baseline versions (Nicolò)
https://twiki.cern.ch/twiki/bin/view/LCG/WLCGBaselineVersions
Gonzalo asks what the plans are for the VOBOX. Maarten replies that WLCG plans to create a WLCG VOBOX by the end of the year.
AOB
Gonzalo asks if our WG should follow up on the ongoing work reported by Ulrich at the last GDB about the WN information.
Maria: yes, we should discuss it at the next planning meeting and possibly create a TF. In other news, the list of the critical services and their criticality is finally complete, as ATLAS confirmed their latest numbers. We will show it at the planning meeting.
Action list
--
AndreaSciaba - 15-Oct-2012