Week of 130812
Daily WLCG Operations Call details
To join the call, at 15.00 CE(S)T Monday to Friday inclusive (in CERN 513 R-068) do one of the following:
- Dial +41227676000 (Main) and enter access code 0119168, or
- To have the system call you, click here
The SCOD rota for the next few weeks is at ScodRota
WLCG Availability, Service Incidents, Broadcasts, Operations Web
General Information
Monday
Attendance:
- local: Simone (SCOD), Ivan (WLCG Monitoring), Luca (CERN Databases), Luca (CERN Storage)
- remote: Dimitri (KIT), Sang-Un (KISTI), David (CMS), Michael (BNL), Tiju (RAL), Matteo (CNAF), Wei-Jen (ASGC), Onno (NL-T1), Pavol (ATLAS), Rob (OSG), Ulf (NDGF)
Experiments round table:
- ATLAS reports (raw view) -
- T0
- CERN-PROD GGUS:96524
The FTS2 channel CERN->ASGC got stuck again at 04:00 on Friday. After some investigation, a spurious agent was removed in the afternoon, but it took around 48 hours for the backlog to clear.
- CERN EOS GGUS:96519
The ATLAS EOS instance had memory problems and was restarted.
- T1
- FZK (GGUS:96245): Transfer problems from/to FZK with different sites, affecting a fraction of transfers. Under investigation.
- CMS reports (raw view) -
- MC production and rereconstruction continue
- GGUS:96546/INC:356501
CMSEOS files are not appearing in the global xrootd redirector, but are visible directly in eoscms.cern.ch -- possibly similar to an issue on May 30.
- Luca: the cmsd daemon had a problem reading a config file after a restart. Fixed.
- GGUS:96504
User with possibly expired certificate
- GGUS:96482
Transfers from Caltech to T1_UK_RAL -- investigation continues.
- GGUS:96559
Hammercloud failures reading files at ASGC -- in progress
- Wei-Jen: ASGC failed HC due to an expired host certificate. A new one has been requested.
Sites / Services round table:
- NL-T1: SARA had one pool node in HW maintenance this morning. Some files were unavailable.
- ASGC: scheduled downtime tonight for 1 day for network hardware upgrade.
- NDGF: problem with SRM during the weekend. Half-hour downtime between Saturday and Sunday.
AOB:
Thursday
Attendance:
- local: Simone (SCOD), Yvan (WLCG Monitoring), Luca (CERN Databases), Luca (CERN Storage), Alex (CERN Grid Services), Vito (CERN Grid Services)
- remote: Michael (BNL), Woo Jin (KIT), David (CMS), John (RAL), Jeremy (GridPP), Rob (OSG), Wei-Jen (ASGC), Ulf (NDGF)
Experiments round table:
- CMS reports (raw view) -
- Relatively light activity -- primarily upgrade MC production
- No new GGUS tickets -- GGUS:96482 (Caltech/RAL transfers) is waiting for more info; the CMS transfer team will follow up.
- LHCb reports (raw view) -
- Mostly MC productions ongoing, tail of reprocessing and restripping campaign
- T0:
- T1:
- Recovered from network interruptions at RAL earlier in the week
- Local transfer failures at IN2P3 and SARA resolved (SRM overloads?)
Sites / Services round table:
- WLCG Monitoring:
- WLCG Transfers Dashboard: a new dashboard prototyping the future evolution of the WLCG Transfers Dashboard has been deployed. http://dashb-wlcg-transfers-new.cern.ch/
- follows a hierarchical architecture designed to provide a common feature set independent of transfer protocol, while delegating to FTS and XRootD Dashboards for protocol-specific features
- includes monitoring of ALICE XRootD data traffic.
- The current production WLCG Transfers Dashboard remains available. http://dashb-wlcg-transfers.cern.ch/
- ATLAS DDM Dashboard: monitoring of on-demand transfers for ATLAS (dq2-get / dq2-put) has been deployed to the integration version of ATLAS DDM Dashboard (http://dashb-atlas-data-soup-tbed.cern.ch/ddm2/#activity=%288%29). This feature is currently undergoing validation by ATLAS before release to production.
- CERN Storage:
- On Tue 20 the CASTOR Oracle backend will be upgraded. Transparent.
- Grid services:
- the batch farm nodes will be reinstalled in a rolling fashion over the next few days (transparent)
AOB: