WLCG Operations Coordination Minutes - 18th October 2012

Agenda

Attendance

  • Local: Andrea Sciabà (secretary), Maria Girone, Andrea Valassi, Stefan Roiser, Felix Lee, Joel Closier, Domenico Giordano, Maite Barroso, Oliver Keeble, Maarten Litmaath, Nicolò Magini, Stephen Gowdy, Manuel Guijarro
  • Remote: Gonzalo Merino (chair), Donato Di Girolamo, Stephen Burke, Onno Zweers, Daniele Bonacorsi, Massimo Sgaravatto, Di Qing, Andreas Petzold, Oliver Gutsche, Gareth Smith
  • Apologies: Maria Dimou, Ikuo Ueda (on behalf of ATLAS)

Task Force reports

CVMFS

Progress since last meeting

We need to distinguish between CVMFS being deployed at a site and being in production for a VO: some sites have deployed CVMFS but it is not used yet, so it does not appear in the VOs' statistics either.

  • Sites that have deployed CVMFS since last meeting
    • AUVERGRID (ATLAS, LHCb)
    • CERN-PROD (CMS)
    • FZK-LCG2 (CMS)
    • INFN-T1 (CMS)
    • JINR-LCG2 (CMS)
    • MPPMU (ATLAS)
    • TR-10-ULAKBIM (ATLAS)

Gonzalo: do the experiments have deadlines for the CVMFS deployment?
Stefan: LHCb does not, but it should be completed as soon as possible; the aim is to have a single software deployment process.
Stephen & Oli: CMS does not have a deadline; in any case we will support both CVMFS and a cron-based software installation system.

gLExec

  • Deployment status on Oct 9 (not presented at the GDB).

PerfSonar

Maria thanks Donato (CNAF) for volunteering for the perfSONAR task force and points out that site representation is still rather weak.

Tracking tools

The issues in Savannah:131988 and Savannah:132582 will be discussed at the next meeting on 2012/11/01. Please add your comments to these tickets so that we have as much input as possible for our preparation.

SHA-2 migration

  • For CREAM only the EMI-3 release will support SHA-2.
  • See OSG update in Oct GDB
    • JGlobus-2 updated with support for legacy proxies
    • hopefully this will allow us to move to RFC proxies after SHA-2
    • the dCache developers still need to assess if the improved JGlobus-2 is sufficiently OK for the dCache deployments and use cases that we need to care about
    • to be continued...

Gonzalo points out that there is a GGUS ticket for dCache where one can follow the developments on this topic and Maarten says that indeed this is a top priority for the dCache developers.

Generic links:

FTS 3 integration and deployment

On 17 October we had the FTS 3 Demo (http://indico.cern.ch/conferenceDisplay.py?confId=211609). After the demo we discussed the series of tests for the coming weeks: CMS is already running functional tests between RAL, ASGC and CERN, and these will continue. ATLAS will set up new DDMEndpoints (one for sure at BNL; it remains to be seen whether others are needed) to start functional tests too. Depending on the results, we will then see whether we can start stress testing one of the instances.

FTS3 features demonstrated:

  • python client bindings
  • gridftp session reuse for gsiftp->gsiftp transfers
  • extra gridftp debug log file per transfer
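As an illustration of the idea behind session reuse for gsiftp->gsiftp transfers, the following is a minimal Python sketch, not FTS3 code: it groups a hypothetical list of (source URL, destination URL) transfer jobs by endpoint pair, since in principle one GridFTP control session between the same two hosts can serve every file in such a group, while transfers using other protocols are left to run individually. The function name `group_for_session_reuse` and the job format are assumptions made for this example.

```python
from collections import defaultdict
from urllib.parse import urlparse


# Hypothetical helper, not part of FTS3: it only illustrates grouping
# gsiftp->gsiftp transfers so that one GridFTP session per endpoint pair
# could, in principle, be reused for every file in the group.
def group_for_session_reuse(transfers):
    """Split (source_url, destination_url) pairs into reusable groups and singles.

    Returns a tuple (groups, singles):
      groups  -- dict mapping (source_host, destination_host) to the list of
                 gsiftp->gsiftp transfers between those two endpoints
      singles -- transfers using any other protocol, handled one by one
    """
    groups = defaultdict(list)
    singles = []
    for source, destination in transfers:
        src, dst = urlparse(source), urlparse(destination)
        if src.scheme == "gsiftp" and dst.scheme == "gsiftp":
            # netloc keeps host and optional port, e.g. "host.example.org:2811"
            groups[(src.netloc, dst.netloc)].append((source, destination))
        else:
            singles.append((source, destination))
    return dict(groups), singles


if __name__ == "__main__":
    jobs = [
        ("gsiftp://a.example.org/data/f1", "gsiftp://b.example.org/data/f1"),
        ("gsiftp://a.example.org/data/f2", "gsiftp://b.example.org/data/f2"),
        ("srm://c.example.org/data/f3", "gsiftp://b.example.org/data/f3"),
    ]
    groups, singles = group_for_session_reuse(jobs)
    for (src_host, dst_host), files in groups.items():
        print(f"{src_host} -> {dst_host}: {len(files)} files over one session")
    print(f"{len(singles)} transfer(s) without session reuse")
```

Grouping by host pair rather than by full URL is the design choice that makes reuse possible: files differ, but the authenticated control channel between the two servers is the same.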

Gonzalo: do you plan to ask more sites to join the testing?
Nicolò: no, it is not needed at this stage, while we run functional tests. Sites can still deploy an FTS3 pilot if they wish, in particular CNAF, because we still lack a site with StoRM.

Middleware deployment

  • The EGI security dashboard will flag hosts found running services that are unsupported (gLite 3.1 and the majority of gLite 3.2 services). The COD team will open tickets for the corresponding sites to upgrade or decommission unsupported services by the end of this month or risk suspension of the entire site:

  • Worker Node testing for WLCG
    • Good progress: for most sites the experiments have given the green light to upgrade to the next EMI-2 WN update (which will include fixes for all issues that were blockers before).

Maarten: the only sites for which there are still issues are ATLAS sites using gsidcap. In parallel, the migration to SL6 is ongoing and some sites have already upgraded.
AndreaS: remember that there is a Nagios bug with SL6 worker nodes.
Maarten: yes, but there is a workaround: use the Nagios executable from the previous version.
Daniele announces that Christoph Wissing declared StoRM validated for CMS (it was another non-OK case).

XrootD

Domenico reports that the TF will meet next week but has already started collecting requirements from ATLAS and CMS. The TF membership is also expected to grow. They are in close contact with the storage federations working group. More news at the next meeting.

Squid monitoring

The squid monitoring task force met once and made some progress, but since the leader Dave Dykstra is on vacation this Thursday, a full report will be delayed until the next meeting. The other members of the task force are Dario Barberis, Alexandre Beche, Doug Benjamin, Barry Blumenfeld, Simone Campana, Alastair Dewhurst, Alessandro Di Girolamo, Stefan Roiser, and Andrea Valassi.

WMS future

Maarten: given the lower priority of this TF, it has not started yet.
Gonzalo: do the experiments have a date by when they want to stop using the WMS?
Maarten: there is some non-trivial effort to reduce the remaining dependencies, and there have been no strong complaints from experiments or sites about WMS issues.

News from other WLCG working groups

No report.

Experiment operations review and plans

ALICE

  • EOS-ALICE suffered a number of instabilities, but appears to handle the full load (event and conditions data) OK now.
    Thanks IT-DSS for the various optimizations!
  • Migration from CASTOR ALICE_DISK to EOS: all read access to disk-based data now goes via EOS,
    which will redirect clients to CASTOR for files that have not been migrated yet.

ATLAS

See report.

CMS

See the slides.

LHCb

See the slides.

GGUS tickets

  • PIC Tier-1 is waiting for an APEL answer in GGUS:87263. The APEL parser crashes because some values in the PBS virtual memory field exceed the size of a MySQL int. A jar that fixes it is available but has not yet been posted in the ticket.
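The overflow can be illustrated in isolation. The following is a minimal Python sketch under stated assumptions: that PBS reports virtual memory as an integer with a unit suffix (for example `2097152kb`), and that the value ends up in kB in a signed MySQL `INT` column, whose maximum is 2,147,483,647. The helper names `vmem_to_kb` and `fits_mysql_int` are hypothetical, not part of APEL, and the sketch does not reproduce the fix contained in the jar.

```python
import re

# Largest value a signed MySQL INT column (4 bytes) can store.
MYSQL_INT_MAX = 2**31 - 1

# Assumed PBS-style memory suffixes, expressed as multipliers to bytes.
_UNIT_BYTES = {"b": 1, "kb": 1024, "mb": 1024**2, "gb": 1024**3, "tb": 1024**4}


def vmem_to_kb(field: str) -> int:
    """Convert a memory value such as '2097152kb' or '8gb' to whole kilobytes."""
    match = re.fullmatch(r"(\d+)(b|kb|mb|gb|tb)", field.strip().lower())
    if match is None:
        raise ValueError(f"unrecognised memory value: {field!r}")
    number, unit = int(match.group(1)), match.group(2)
    return number * _UNIT_BYTES[unit] // 1024


def fits_mysql_int(value: int) -> bool:
    """True if value fits in a signed MySQL INT column."""
    return -(2**31) <= value <= MYSQL_INT_MAX


if __name__ == "__main__":
    for raw in ("2097152kb", "3000gb"):
        kb = vmem_to_kb(raw)
        status = "OK" if fits_mysql_int(kb) else "overflows INT"
        print(f"{raw}: {kb} kB -> {status}")
```

Widening the column to a 64-bit `BIGINT` is one generic remedy for this kind of overflow; it is not necessarily what the fixed jar does.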

Tier-1 Grid services

Storage deployment

| Site | Status | Recent changes | Planned changes |
|---|---|---|---|
| CERN | CASTOR 2.1.13-5; SRM-2.11 for all instances. EOS 0.2.18 for all instances except CMS | Upgraded all CASTOR instances to 2.1.13-5; upgraded all EOS instances except CMS to 0.2.18; upgraded SRM-EOS enabling root protocol TURLs. EOSALICE and EOSATLAS had issues after the upgrade, now stabilized. | EOSCMS upgrade to 0.2.18 postponed |
| ASGC | CASTOR 2.1.11-9; SRM 2.11-0; DPM 1.8.2-5 | None | None |
| BNL | dCache 1.9.12.10 (Chimera, Postgres 9 w/ hot backup); http (aria2c) and xrootd/Scalla on each pool | None | None |
| CNAF | StoRM 1.8.1 (ATLAS, CMS, LHCb) | | |
| FNAL | dCache 1.9.5-23 (PNFS, postgres 8 with backup, distributed SRM) httpd=2.2.3; Scalla xrootd 2.9.7/3.2.2-1.osg; Oracle Lustre 1.8.6; EOS 0.2.19/xrootd 3.2.2-1.osg with Bestman 2.2.2.0.10 | | |
| IN2P3 | dCache 1.9.12-16 (Chimera) on core servers and 1.9.12-24 on pool nodes. New hardware (more RAM, SSD disks) for Chimera and SRM servers (with SL6). Postgres 9.1; xrootd 3.0.4 | migration from remaining Solaris to Linux servers | upgrade xroot to 3.2.4 |
| KIT | dCache: atlassrm-fzk.gridka.de: 1.9.12-11 (Chimera); cmssrm-fzk.gridka.de: 1.9.12-17 (Chimera); gridka-dcache.fzk.de: 1.9.12-17 (PNFS). xrootd (version 20100510-1509_dbg) | | |
| NDGF | dCache 2.3 (Chimera) on core servers. Mix of 2.3 and 2.2 versions on pool nodes. | | |
| NL-T1 | dCache 2.2.4 (Chimera) (SARA), DPM 1.8.2 (NIKHEF) | | |
| PIC | dCache 1.9.12-20 (Chimera) | | |
| RAL | CASTOR 2.1.11-8/2.1.12-10; 2.1.11-8/2.1.12-10 (tape servers); SRM 2.11-1 | ATLAS and CMS upgraded to 2.1.12 | Gen and LHCb will be upgraded to 2.1.12 before end of Oct |
| TRIUMF | dCache 1.9.12-19 with Chimera namespace | | |

FTS deployment

| Site | Version | Recent changes | Planned changes |
|---|---|---|---|
| CERN | 2.2.8 - transfer-fts-3.7.10-1 | "expired proxy" FTS patch applied this week | |
| ASGC | 2.2.8 - transfer-fts-3.7.10-1 | transfer-fts upgraded | |
| BNL | 2.2.8 - transfer-fts-3.7.10-1 | FTS patch applied | None |
| CNAF | 2.2.8 - transfer-fts-3.7.10-1 | transfer-fts upgraded | |
| FNAL | 2.2.8 - transfer-fts-3.7.7-2 | transfer-fts upgraded on fts2 | after 1 week will apply to fts1 |
| IN2P3 | 2.2.8 - transfer-fts-3.7.10-1 | transfer-fts upgraded | |
| KIT | 2.2.8 - transfer-fts-3.7.10-1 | transfer-fts upgraded | |
| NDGF | 2.2.8 - transfer-fts-3.7.10-1 | transfer-fts upgraded | |
| NL-T1 | 2.2.8 - transfer-fts-3.7.10-1 | transfer-fts upgraded | |
| PIC | 2.2.8 - transfer-fts-3.7.10-1 | | |
| RAL | 2.2.8 - transfer-fts-3.7.10-1 | | |
| TRIUMF | 2.2.8 - transfer-fts-3.7.10-1 | transfer-fts upgraded | None |

LFC deployment

| Site | Version | OS, distribution | Backend | WLCG VOs | Upgrade plans |
|---|---|---|---|---|---|
| BNL | 1.8.0-1 for T1 and 1.8.3.1-1 for US T2s | SL5, gLite | Oracle | ATLAS | None |
| CERN | 1.8.2-0 | SLC5, gLite | Oracle | ATLAS, LHCb | |

Other site news

Data management provider news

CASTOR news

CERN operations and development

EOS news

xrootd news

dCache news

StoRM news

DPM news

1.8.4 has been certified and is expected in the next EMI2 release (due before end of October)

  • First "dmlite" release bringing performance improvements to HTTP interface
  • rewritten dpm-xrootd plugin (already production tested)
  • 32bit library support, improved draining, bugfixes.

The latest developments in DPM/LFC will be presented on Friday morning (accessible via Vidyo): https://indico.cern.ch/conferenceDisplay.py?confId=204968

FTS news

A patch for the proxy expiration issue is available and has been widely deployed; details are in GGUS:81844. All remaining sites are encouraged to install it, following the instructions in the GGUS ticket.

The provision of a 2.2.9 release consolidating all outstanding fixes has been proposed.

LFC news

gfal/lcg_util news

gfal/lcg_util 1.13.9 has been certified and is expected in the next EMI2 update (before end of October). It contains fixes for the following issues found during EMI2 WN validation:

Middleware news and baseline versions (Nicolò)

https://twiki.cern.ch/twiki/bin/view/LCG/WLCGBaselineVersions

Gonzalo asks what the plans for the VOBOX are. Maarten replies that WLCG plans to create a WLCG VOBOX by the end of the year.

AOB

Gonzalo asks if our WG should follow up on the ongoing work reported by Ulrich at the last GDB about the WN information.
Maria: yes, we should discuss it at the next planning meeting and possibly create a TF. Another piece of news is that the list of critical services and their criticality is finally complete, as ATLAS has confirmed their latest numbers. We will show it at the planning meeting.

Action list

-- AndreaSciaba - 15-Oct-2012

Topic attachments
| Attachment | Size | Date | Who | Comment |
|---|---|---|---|---|
| glexec-121010.pdf | 291.1 K | 2012-10-17 22:35 | MaartenLitmaath | gLExec deployment status on Oct 9 |
Topic revision: r25 - 2012-10-23 - MaartenLitmaath
 