WLCG Operations Coordination Minutes - May 22nd, 2014

Agenda

Attendance

  • local:
  • remote:

News

  • 2014 WLCG Workshop in Barcelona (7-9 July):
    • The WLCG workshop agenda is now available in Indico. We will start contacting the speakers to define the contents and details of each talk, and we will also contact the experiments to start thinking about the experiment session. We would like to see part of each experiment session dedicated to the long term future and hear about computing model evolution for Run3/Run4.
    • Please register for the workshop if you are planning to come! The registration deadline is 9th of June.

Middleware news and baseline versions

https://twiki.cern.ch/twiki/bin/view/LCG/WLCGBaselineVersions

Tier-0 and Tier-1 Grid services

Storage deployment

Site Status Recent changes Planned changes
CERN CASTOR:
v2.1.14-11 and SRM-2.11-2 on ATLAS, ALICE, CMS and LHCb
EOS:
ALICE (EOS 0.3.4 / xrootd 3.3.4)
ATLAS (EOS 0.3.8 / xrootd 3.3.4 / BeStMan2-2.3.0)
CMS (EOS 0.3.7 / xrootd 3.3.4 / BeStMan2-2.3.0)
LHCb (EOS 0.3.3 / xrootd 3.3.4 / BeStMan2-2.3.0 (OSG pre-release))
   
ASGC CASTOR 2.1.13-9
CASTOR SRM 2.11-2
DPM 1.8.7-3
xrootd
3.3.4-1
None None
BNL dCache 2.6.18 (Chimera, Postgres 9.3 w/ hot backup)
http (aria2c) and xrootd/Scalla on each pool
None None
CNAF StoRM 1.11.3 emi3 (ATLAS, LHCb, CMS)    
FNAL dCache 2.2 (Chimera, postgres 9) for disk instance; dCache 1.9.5-23 (PNFS, postgres 8 with backup, distributed SRM) for tape instance; httpd=2.2.3
Scalla xrootd 3.3.6-1
EOS 0.3.21-1/xrootd 3.3.6-1.slc5 with Bestman 2.3.0.16
   
IN2P3 dCache 2.6.18-1 (Chimera) on SL6 core servers and pool nodes
Postgres 9.2
xrootd 3.3.4 (Alice T1), xrootd 3.3.4 (Alice T2)
FAX federation enabled
CMS (T1+T2) federation enabled
perhaps dCache upgrade on some Solaris servers holding staging pools
JINR-T1 dCache
  • srm-cms.jinr-t1.ru: 2.6.25
  • srm-cms-mss.jinr-t1.ru: 2.2.24 with Enstore
xrootd federation host for CMS: 3.3.6
   
KISTI xrootd v3.3.4 on SL6 (redirector only; servers are still 3.2.6 on SL5 to be upgraded) for disk pools (ALICE T1)
xrootd 20100510-1509_dbg on SL6 for tape pool
xrootd v3.2.6 on SL5 for disk pools (ALICE T2)
dpm 1.8.7-4
   
KIT dCache
  • atlassrm-fzk.gridka.de: 2.6.21-1
  • cmssrm-kit.gridka.de: 2.6.17-1
  • lhcbsrm-kit.gridka.de: 2.6.17-1
xrootd
  • alice-tape-se.gridka.de 20100510-1509_dbg
  • alice-disk-se.gridka.de 3.2.6
  • ATLAS FAX xrootd redirector 3.3.3-1
  Downtime 26th/27th May for updating to the latest available dCache release in the 2.6 branch, probably 2.6.28
NDGF dCache 2.8.2 (Chimera) on core servers and on pool nodes.    
NL-T1 dCache 2.2.17 (Chimera) (SURFsara), DPM 1.8.7-3 (NIKHEF)    
PIC dCache head nodes (Chimera) and doors at 2.2.23-1
xrootd door to VO servers (3.3.4)
Recent changes: xrootd proxy deployed for ATLAS / being deployed for CMS
Planned changes: before summer 2.6 / after summer 2.10, most likely
RAL CASTOR 2.1.13-9
2.1.14-5 (tape servers)
SRM 2.11-1
  Upgrade to CASTOR 2.1.14: nameserver planned for 10th June (date to be confirmed), stagers over the following 2 to 3 weeks.
RRC-KI-T1 dCache 2.2.24 + Enstore (ATLAS)
dCache 2.6.22 (LHCb)
xrootd - EOS 0.3.19 (Alice)
   
TRIUMF dCache 2.6.21 None None

FTS deployment

| Site | Version | Recent changes | Planned changes |
| CERN | 2.2.8 - transfer-fts-3.7.12-1 | | |
| CERN | 3.2.22 | | |
| ASGC | 2.2.8 - transfer-fts-3.7.12-1 | None | None |
| BNL | 2.2.8 - transfer-fts-3.7.10-1 | None | None |
| CNAF | 2.2.8 - transfer-fts-3.7.12-1 | | |
| FNAL | 2.2.8 - transfer-fts-3.7.12-1 | | |
| FNAL | fts-server-3.2.3-5 | | |
| IN2P3 | 2.2.8 - transfer-fts-3.7.12-1 | | |
| JINR-T1 | 2.2.8 - transfer-fts-3.7.12-1 | | |
| KIT | 2.2.8 - transfer-fts-3.7.12-1 | | |
| NDGF | 2.2.8 - transfer-fts-3.7.12-1 | | |
| NL-T1 | 2.2.8 - transfer-fts-3.7.12-1 | | |
| PIC | 2.2.8 - transfer-fts-3.7.12-1 | None | Deprecation by August 2014 |
| RAL | 2.2.8 - transfer-fts-3.7.12-1 | | |
| RAL | 3.2.22 | | |
| TRIUMF | 2.2.8 - transfer-fts-3.7.12-1 | | |

LFC deployment

| Site | Version | OS, distribution | Backend | WLCG VOs | Upgrade plans |
| BNL | 1.8.3.1-1 for T1 and US T2s | SL6, gLite | Oracle 11gR2 | ATLAS | None |
| CERN | 1.8.7-4 | SLC6, EPEL | Oracle 11 | ATLAS, OPS, ATLAS Xroot federations | |
| CERN | 1.8.7-4 | SLC6, EPEL | Oracle 12 | LHCb | |

Oracle deployment

  • Note: only Oracle instances with a direct impact on offline computing activities of LHC experiments are tracked here
  • Note: an explicit entry for specific instances is needed only during upgrades, listing affected services. Otherwise sites may list a single entry.

| Site | Instances | Current Version | WLCG services | Upgrade plans |
| CERN | CMSR | 11.2.0.4 | CMS computing services | Done on Feb 27th |
| CERN | CASTOR Nameserver | 11.2.0.4 | CASTOR for LHC experiments | Done on Mar 04th |
| CERN | CASTOR Public | 11.2.0.4 | CASTOR for LHC experiments | Done on Mar 06th |
| CERN | CASTOR Alicestg, Atlasstg, Cmsstg, LHCbstg | 11.2.0.4 | CASTOR for LHC experiments | Done: 10-14-25th March |
| CERN | LCGR | 11.2.0.4 | All other grid services (including e.g. Dashboard, FTS) | Done: 18th March |
| CERN | LHCBR | 12.1.0.1 | LHCb LFC, LHCb Dirac bookkeeping | Done: 24th of March |
| CERN | ATLR, ADCR | 11.2.0.4 | ATLAS conditions, ATLAS computing services | Done: April 1st |
| CERN | HR DB | 11.2.0.4 | VOMRS | Done: April 14th |
| CERN | CMSONR_ADG | 11.2.0.4 | CMS conditions (through Frontier) | Done: May 7th |
| BNL | | 11.2.0.3 | ATLAS LFC, ATLAS conditions | TBA: upgrade to 11.2.0.4 (tentatively September) |
| RAL | | 11.2.0.3 | ATLAS conditions | TBA: upgrade to 11.2.0.4 (tentatively September) |
| IN2P3 | | 11.2.0.4 | ATLAS conditions | Done: 13th of May |
| TRIUMF | TRAC | 11.2.0.4 | ATLAS conditions | Done |

T0 news

  • Quattor phase-out: CERN is currently migrating all centrally managed services from Quattor to a new Puppet-based configuration management system. This migration is meant to be fully finished by 31st October 2014. On that date, all components of the Quattor infrastructure (CDB, CDBSQL, CDBWeb, SINDES, SWREP, SMS, LEAF tools and CLUMAN) will stop working.
  • SHA-2 Certificates have been automatically added to all users in the 4 LHC VOs.

Other site news

Data management provider news

Experiments operations review and Plans

ALICE

  • activities for Quark Matter 2014 have ramped down
  • high production activity has taken over
  • CERN
    • SLC6 job efficiencies:
      • various data analytics and comparison efforts ongoing
      • new VOBOX has been set up to target physical SLC6 hosts only

ATLAS

  • MC production and analysis: stable load in the past week
    • MC prod workload is available until the start of DC14 (mid/end June), but single-core only
    • occasional multi-core validation tasks
  • Rucio full chain testing is starting now, ramping up over the next 4-6 weeks to stress test the Rucio components. Proper monitoring is still missing; more news in one week at the ADC weekly. The associated data transfers will not impact normal ATLAS activities, nor other experiments or sites.
  • migration of sites from LFC to Rucio: all clouds have been migrated except the US, which is ongoing. The CERN LFC is not used anymore. We will discuss with CERN-IT in the coming week about snapshotting the DB and closing the frontends.
  • issue with RFC proxy support in Condor CREAM submission (GGUS:105188)

CMS

  • High-priority production and processing
    • Heavy Ion MC (almost done now)
    • Upgrade MC
    • CSA14 preparation (13 TeV MC)
  • Made CMS SAM test for glexec critical on May 19th
  • SAM test for xrootd fallback (a minimal fallback sketch follows this list)
    • Not yet critical
    • Still waiting (mainly) for RAL to fix some issues
  • FTS3 for PhEDEx Debug transfers is becoming mandatory now
    • Have sent tickets to sites
  • Need to deprecate the Savannah-to-GGUS bridge in the GGUS May release (May 26th)
    • Relies on the old CMS SiteDB API, which is being decommissioned on June 3rd
    • Changing CMS shifter instructions to use GGUS directly
    • Operations effort is moving to GGUS anyway
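
For illustration, a minimal sketch of the logic such an xrootd fallback probe exercises: try to read a test file through the site's local xrootd door first and, if that fails, retry via a federation redirector. This is not the actual SAM probe; the hostnames, the test file path and the use of xrdcp below are assumptions made for the example.

  #!/usr/bin/env python
  # Sketch of an xrootd fallback check (not the real SAM probe).
  # Hostnames and the test file path are placeholders.
  import subprocess

  LOCAL_DOOR = "root://xrootd.example-site.org"        # hypothetical local xrootd door
  REDIRECTOR = "root://xrootd-federation.example.org"  # hypothetical federation redirector
  TEST_FILE = "/store/test/xrootd/site/testfile.root"  # hypothetical test file

  def can_read(endpoint):
      # xrdcp exits with 0 if the copy (here to /dev/null) succeeds
      cmd = ["xrdcp", "-f", endpoint + "/" + TEST_FILE, "/dev/null"]
      return subprocess.call(cmd) == 0

  if can_read(LOCAL_DOOR):
      print("OK: file readable from the local xrootd door")
  elif can_read(REDIRECTOR):
      print("WARNING: local read failed, fallback via the federation worked")
  else:
      print("CRITICAL: file not readable locally nor via the federation")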

LHCb

Ongoing Task Forces and Working Groups

Tracking tools evolution TF

  • GGUS release on May 26th. The alarms for the UK and USA will be done on the 27th; the rest on the 26th.
  • The Savannah-GGUS bridge for CMS will be decommissioned in this release

FTS3 Deployment TF

  • CMS opening GGUS tickets to sites to complete migration to FTS3 in PhEDEx Debug.

gLExec deployment TF

Machine/Job Features

  • NTR

Middleware readiness WG

  • 4th meeting took place on May 15
    • agenda
    • minutes will be announced
  • next meeting on Wed July 2, 16:00-17:30 CEST

Multicore deployment

  • NTR

SHA-2 Migration TF

  • introduction of the new VOMS servers
    • blocking issue: job submission to CREAM fails when the proxy is signed by a VOMS server with a SHA-512 host certificate (GGUS:104768)
    • the fix has been put into the EMI-3 third-party repo on May 16
      • bouncycastle-mail-1.46-2
    • no official announcement yet about which node types are affected
      • CREAM and/or UI
      • others?
    • we also need the fix to become available in UMD
    • all sites then need to update their affected hosts
    • we will define a new timeline accordingly
  • RFC proxies (a proxy-format check sketch follows this list)
    • ATLAS discovered that Condor still uses an old CREAM client that does not support RFC proxies
      • their pilot factories thus need to keep using legacy proxies for the time being
      • this matter will be followed up with the Condor devs
    • CMS intended to switch their SAM preprod instance to RFC proxies
      • that should still work now, but would fail when WMS submission is replaced by Condor-G submission
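
For background on the proxy formats discussed above: RFC 3820 ("RFC") proxies carry the proxyCertInfo X.509 extension (OID 1.3.6.1.5.5.7.1.14), while legacy Globus proxies do not and are instead marked by a "CN=proxy" or "CN=limited proxy" subject suffix. Below is a rough sketch of how the two can be told apart with the Python cryptography package; the per-user proxy path is an assumption and this is not an official WLCG tool.

  # Rough sketch: distinguish an RFC 3820 proxy from a legacy proxy by the
  # presence of the proxyCertInfo extension. Not an official tool.
  import os
  from cryptography import x509
  from cryptography.hazmat.backends import default_backend

  PROXY_CERT_INFO_OID = x509.ObjectIdentifier("1.3.6.1.5.5.7.1.14")

  def proxy_flavour(path="/tmp/x509up_u%d" % os.getuid()):
      with open(path, "rb") as f:
          pem = f.read()
      # the proxy certificate itself is the first PEM block in the file
      cert = x509.load_pem_x509_certificate(pem, default_backend())
      try:
          cert.extensions.get_extension_for_oid(PROXY_CERT_INFO_OID)
          return "RFC 3820 proxy"
      except x509.ExtensionNotFound:
          return "legacy (pre-RFC) proxy"

  print(proxy_flavour())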

WMS decommissioning TF

IPv6 validation and deployment TF

HTTP proxy discovery TF

  • NTR

Network and transfer metrics WG

Action list

AOB

-- NicoloMagini - 19 May 2014
