WLCG Operations Coordination Minutes, September 17th 2015
Highlights
Agenda
Attendance
- local: Maria Dimou (Minutes), Andrea Sciaba (chair), Maarten Litmaath, Andrea Manzi, Marian Babik, David Cameron, Giuseppe Lo Presti
- remote: Alessandra Forti, Antonio Maria Perez Calero Yzquierdo, Christoph Wissing, Frederique Chollet, Felix Lee, Maite Barroso, Michael Ernst, Peter Gronbech, Renaud Vernet, Ult Tigerstedt, Rob Quick, Thomas Hartmann, Alessandro Cavalli, Vincenzo Spinoso, Pepe Flix, Alessandra Doria
- apologies:
Operations News
Middleware News
- Baselines:
- Issues:
- T0 and T1 services
- CERN
- FTS3 will be upgraded to 3.3.1 on Monday 21st of September
- IN2P3
- dCache upgrade (2.10.40) on core servers next week (22/09/2015)
- JINR
- dCache updated to 2.10.40
- KIT
- All of dCache will be updated to the latest version in the 2.13 branch during GridKa's annual downtime, from 29th of September to 1st of October.
- RRC-KI-T1
- dCache upgrade to 2.10.39 on pools and gridftp doors
Tier 0 News
Tier 1 Feedback
- NDGF-T1 enabled IPv6 for SRM on 14.9.2015. We request that the FTS3 developers take https://its.cern.ch/jira/browse/DMC-681 seriously and fix the issue. As long as we are the only site with IPv6 enabled this should not be a problem, but there are already dual-stack T2s. In addition, ARC did not support delayed passive with IPv6, which caused all reads and writes to be proxied. This has already been fixed in ARC, and a new release is coming soon.
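For sites wondering whether one of their endpoints counts as dual-stack in the sense used above, the following is a minimal sketch using only the Python standard library. The host name `srm.example.org` and port 8443 are hypothetical placeholders, not taken from these minutes. The check only inspects which address families DNS returns; it does not verify that transfers over IPv6 actually work.

```python
import socket


def address_families(addrinfo):
    """Return the set of IP families found in getaddrinfo()-style results."""
    families = set()
    for family, *_rest in addrinfo:
        if family == socket.AF_INET:
            families.add("IPv4")
        elif family == socket.AF_INET6:
            families.add("IPv6")
    return families


def is_dual_stack(addrinfo):
    """True if the results contain both an IPv4 and an IPv6 address."""
    return address_families(addrinfo) == {"IPv4", "IPv6"}


def lookup(host, port=8443):
    """Resolve a host (live DNS query). Host and port are placeholders."""
    return socket.getaddrinfo(host, port, proto=socket.IPPROTO_TCP)


if __name__ == "__main__":
    # Hypothetical endpoint; replace with a real service name to try it.
    try:
        print(is_dual_stack(lookup("srm.example.org")))
    except socket.gaierror as err:
        print("lookup failed:", err)
```

Keeping the classification separate from the DNS lookup means the logic can be exercised with canned results, without network access.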
Tier 2 Feedback
Experiments Reports
ALICE
- high activity
- CERN
- team ticket GGUS:116095 about expired CRLs on myproxy.cern.ch
- IPv6 connectivity issue in the Wigner data center was fixed
- Accessing CASTOR for reading or writing raw data files:
- Various constructive meetings between ALICE experts and the CASTOR team.
- Short- and longer-term ideas were discussed.
- Reco jobs now download the raw data files instead of streaming them.
- The effect should become visible when more data is ready for processing.
- Further ideas involving EOS are being investigated.
- DAQ and CASTOR experts also retraced how a particular file ended up lost.
- Thanks for the good support!
ATLAS
CMS
LHCb
- Data Processing
- Validation and data quality verification of the data are ongoing. All data is buffered on disk-resident areas, so no staging is needed.
- Operations
- Ongoing discussion with IT/PES about worker nodes that are executing payloads significantly slower (e.g. GGUS:116023)
Ongoing Task Forces and Working Groups
gLExec Deployment TF
HTTP Deployment TF
Information System Evolution
- WLCG Information System Use Cases document presented at the MB
- MB gave feedback to work on several areas that need further discussion and agreement within the TF:
- Future Use Cases: the use cases document describes the current interactions with the IS. The TF should now investigate what is actually needed, so that we can better understand how the IS could evolve.
- Static vs Dynamic: the MB would like to see a summary of the types of information actually needed by the experiments, probably a more elaborate version of what is already summarised in this twiki under Types of Information, focusing only on the future use cases.
- "Indicative pledges" per site in REBUS: The TF requested the MB to include "indicative pledges" per site in REBUS. MB would like to understand why this information is needed and have a concrete proposal on how it will be collected.
- Installed capacity: a better definition, and maybe also a better name, is needed for what is today called "installed capacity". The MB would also like to understand why this information is needed and how it will be collected.
- T3s and opportunistic resources: it would be good to understand how information is going to be collected from T3s and opportunistic resources.
- OSG, NDGF and EGI will present their plans to provide information about their resources in the future at the next TF meeting. GOCDB will also present the latest features.
IPv6 Validation and Deployment TF
Update on the status of IPv6 deployment in WLCG (from Bruno Hoeft)
Tier-1
| Site | LHCOPN IPv6 peering | LHCONE IPv6 peering | perfSONAR via IPv6 |
| ASGC | - | - | - |
| BNL | not on their priority list |
| CH-CERN | yes | yes | LHC[OPN/ONE] |
| DE-KIT | yes | yes | LHC[OPN/ONE] |
| FNAL | yes | yes | LHC[OPN/ONE] but not yet visible in Dashboard |
| FR-CCIN2P3 | yes | yes | LHC[OPN/ONE] but not yet visible in Dashboard |
| IT-INFN-CNAF | - | yes | LHCONE |
| NDGF | yes | yes | LHC[OPN/ONE] |
| ES-PIC | yes | yes | LHCOPN |
| KISTI | started but no peering implemented |
| NL-T1 | no peering implemented |
| TRIUMF | IPv6 peering planned at end of 2015 |
| RRC-KI-T1 | - | - | - |

Tier-2
| Site | LHCONE IPv6 peering | perfSONAR |
| DESY | yes | LHCONE |
| CEA SACLAY | yes | - |
| ARNES | yes | - |
| WISC-MADISON | yes | - |
| UK sites | QMUL peers with LHCONE but not for IPv6 |
| Prague FZU | IPv6 still working but the previous contact person left |

There are additional IPv6 perfSONAR servers at Tier-2 centres, but not via LHCONE.
Machine/Job Features
Middleware Readiness WG
Multicore Deployment
Network and Transfer Metrics WG
- The OSG perfSONAR datastore entered production on 14th of Sept, providing storage and an interface for all perfSONAR results.
- Publishing of perfSONAR results using the pre-production (ITB) services was successfully established. Work continues to resolve an issue with some event types not being published; production is still pending an SLA.
- The WLCG-wide mesh campaign with latency testing, ramped up to 81 sonars, caused some instabilities on the sonars with 4 GB of RAM. We therefore decreased the number of tests performed, which has improved the situation.
- The final version of perfSONAR 3.5 is planned for release on 28th of September and will be auto-deployed to all WLCG instances. No issues were found in the testbed, but we plan to update a couple of production instances in advance to check that everything is fine.
- ESNet and OSG have started development of the perfSONAR configuration interface, an open source project motivated by the existing version developed for WLCG. GEANT and ESNet have also expressed interest in collaborating on an open source project based on the existing proximity service.
- A follow-up meeting was held to discuss the findings of the FTS performance study led by Saul Youssef (Boston University); a new optimization algorithm was proposed and discussed.
- The next WG meeting will be on 30th of Sept (https://indico.cern.ch/event/400643/)
RFC proxies
Squid Monitoring and HTTP Proxy Discovery TFs
- Alastair is making progress on the next deliverable (a flexible squid registration exception list), but it is not quite ready to go into production.
- We agreed to change the squid registration documentation to make it clear that T3s not already registered in GOCDB do not have to register their squids to have them monitored; they can send an email and we will add an exception.
Action list
| Creation date | Description | Responsible | Status | Comments |
| 2015-09-03 | Status of multi-core accounting | John Gordon | ONGOING | A presentation on the plans to provide multicore accounting data in the Accounting portal should be given at the next Ops Coord meeting on October 1st (https://indico.cern.ch/event/393617/), since this is a long-standing issue. |
| 2015-06-04 | Status of fix for Globus library (globus-gssapi-gsi-11.16-1) released in EPEL testing | Andrea Manzi | ONGOING | GGUS:114076 is now closed. However, host certificates need to be fixed for any service in WLCG that does not yet work OK with the new algorithm. Otherwise we will get hit early next year when the change finally comes in Globus 6.1. A broadcast message has been sent by EGI. |
Specific actions for experiments
Specific actions for sites
AOB
--
MariaDimou - 2015-09-14