---+!! WLCG Network Throughput WG
%TOC%

---++ Mandate
   * Ensure sites and experiments can better understand and fix networking issues

---++ Objectives
   * Oversight of the perfSONAR network infrastructure
   * Coordination of the WLCG network performance incidents
   * Detection and follow up on issues seen by the perfSONAR network

---++ Meetings
Bi-weekly meetings, European and North American throughput [[https://indico.cern.ch/category/4372/][calls]]

---++ Members
Shawn !McKee (chairperson), Marian Babik (co-chair), ATLAS (Simone Campana), CMS (Nicolo Magini), LHCb (Stefan Roiser, Joel Closier), Alice (Latchezar Betev, Costin Grigoras), FAX (Ilija Vukotic), FTS (Michail Salichos, Oliver Keeble), Panda (Kaushik De), Rucio (Vincent Garonne), !BelleII (Malachi Schram)

perfSONAR contacts: US-ATLAS (Shawn !McKee), US-CMS (Jorge Alberto Diaz Cruz), UK-ALL (Alessandra Forti, Duncan Rand), IT-ATLAS (Alessandro de Salvo), IT-CMS (Enrico Mazzoni), CA-ALL (Rolf Seuster), FR-ALL (Frederique Chollet, Laurent Caillat, Frederic Schaer), TW-ALL (Hsin-Yen Chen), ND-ALL (Ulf Tigerstedt), DE-ALL (Guenter Duckeck, Andreas Petzold, DE-KIT: Bruno Hoeft, Aurelie Reymund), ES-ALL (Fernando Lopez, Josep Flix), CERN (Stefan Stancu), LHCOPN/LHCONE (John Shade, ESNet: Mike OConnor), RU-ALL (Victor Kotlyar), ESnet Science Engagement group (Jason Zurawski), !BelleII (Malachi Schram)

---++ Contacts
Primary contact is via the mailing list wlcg-network-throughput-wg@cern.ch. The previous mailing lists (wlcg-ops-coord-tf-perfsonar and wlcg-ops-coord-wg-metrics) are still active and are defined as aliases of the primary mailing list. The primary mailing list has two sub-groups, wlcg-perfsonar-support@cern.ch and throughput-l@lists.bnl.gov, which are used to organize and follow up on the corresponding European and North American throughput calls.

---++ Coming Events
   * https://indico.cern.ch/category/4372/

---++ Network Throughput Support Unit
---+++!! Network Performance Incidents Follow up Procedure
The main motivation for this procedure is to investigate network *performance* issues with the assistance of the perfSONAR team. The focus is on *performance* issues, and the primary objective is to confirm whether an observed transfer problem is network related. If it is confirmed to be a WAN issue, the next step is to work with the perfSONAR team to narrow it down to a particular network link and thus help identify who might be responsible for it. The full text of the procedure follows:
   * A new GGUS support unit (WLCG Network Throughput; https://wiki.egi.eu/wiki/GGUS:WLCG_Network_Throughput) can be used to report incidents. The corresponding mailing list is wlcg-network-throughput at cern.ch; initial participation there is the same as for the WG mailing list (transfer systems, experiments, perfSONAR support, ESNet, LHCOPN/LHCONE).
   * Experiments can report potential network *performance* incidents/degradations to the mailing list. The WLCG perfSONAR support unit will investigate and confirm whether the issue is network related. Once confirmed, it will notify the relevant sites and try to assist in narrowing down the problem to particular link(s). Affected sites will be contacted and should open an incident with their network providers. Ongoing incidents are tracked on the WG page.
   * Sites observing a network performance problem should follow their standard procedure, i.e. report to their network team and, if necessary, escalate to their network provider while informing the wlcg-network-throughput mailing list. If the problem is confirmed to be WAN related, the WLCG perfSONAR support unit can assist in further debugging. For non-technical (policy) issues, or if unclear, sites should escalate to WLCG operations coordination.

---+++!! Network Performance Incidents
| *Incident* | *Ticket* | *Comments* |
| RAL IPv6 consistent loss | GGUS:140447 | Resolved: External router upgrade/fix |
| UK sites | GGUS:143218 GGUS:143220 | Resolved: Router hosting the GEANT connection not fully distributing the affected prefixes to all of the JANET core |
| CERN inbound | OTG:0052301 | Resolved: All CERN IPv4 prefixes were leaked to LHCONE GEANT by TIFR |
| JINR inbound | GGUS:141954 | Resolved: packet loss seen between Geneva and Moscow, trans. module had to be replaced in Frankfurt |
| EU sites to IHEP/CN | via mailing list | Resolved: Routing issue - ticket with GEANT was opened by IHEP, peerings were updated |
| UFL to IC | via mailing list | Resolved: transfer rates improved before root cause was found |
| US T2s/AMS to CERN | GGUS:139866 GGUS:139874 | Resolved: ESNet network incident impacting US to CERN connectivity (also impacted AMS) |
| SARA to CERN | GGUS:138472 | Resolved: MTU issue on IPv6 suspected, but was just packet loss in the end |
| RAL/SARA to IN2P3 | GGUS:137967 GGUS:137972 GGUS:137994 GGUS:139756 | Resolved: Packet loss on the link due to congestion, IN2P3 has a ticket with RENATER (resolved by upgrading) |
| IN2P3-CC to UTA_SWT2 | via mailing list | Resolved: Possible saturation on LHCONE at/close to IN2P3-CC |
| AGLT2 inbound | via mailing list | On-going: Narrowed down to ESNet -> AGLT2 segment |
| FNAL inbound | GGUS:137632 | Resolved: Bad link was identified by FNAL |
| IHEP-CN - JINR/IHEP-SU | GGUS:136606 GGUS:136332 | On-going: more efficient transit path is missing between the concerned NRENs, to be followed up in Asia Forum/LHCOPN-LHCONE WS |
| DESY/FNAL | GGUS:135962 | Resolved: Tests didn't indicate any obvious network issue (but not all relevant network aspects could be tested) |
| UFlorida - Kharkov | via mailing list | Resolved: MTU step-down issue - pmtu discovery ACL was fixed by UF |
| UNI-Freiburg | GGUS:135304 | Resolved: CERN prefixes missing in the routing announcements to SWITCH |
| DESY inbound | GGUS:134470 | Resolved: Network configuration tuned/changed at DESY |
| AGLT2/LHCONE | via mailing list | Resolved: Performance issue to LHCONE sites, narrowed down to US/ESNet segment (module issue) |
| NCP/Pakistan commissioning | via mailing list | On-going: Investigated in collaboration with GlobalNOC ([[https://docs.google.com/document/d/1HHhK9t4PpYPzZOfJUAhupRAodhPT6HNIZi6J8ljpBtw/edit#][report]]), proposed routing changes for TEIN3 |
| CYFRONET/RRC-KI | GGUS:131375 | Resolved: MTU step-down (resolved by PSNC NREN) |
| BEgrid-ULB-VUB UKI-LT2-IC-HEP | GGUS:132286 | Resolved: IceCUBE flows overloading BEgrid-ULB-VUB networks |
| NDGF/BNL from multiple locations | GGUS:131975, GGUS:131981 | Resolved: Issue with FTS at RAL |
| RO-02-NIPNE to multiple locations | GGUS:128489 | Resolved: MTU step-down + load balancing suspected; NREN was contacted by NIPNE |
| PIC to PL Swierk | GGUS:130112 | Resolved: Unable to investigate as no pS at PL Swierk, but error suggesting a storage problem |
| CNAF/RALPP | GGUS:130112 | Resolved: Investigated and resolved as non-network issue |
| Oxford | GGUS:130032 | On-going: Significant issues seen during August (down to 50 Mb/s), performance improved afterwards but still not at levels seen last year |
| SARA/IC | GGUS:129964 | Resolved: Issue with router firmware at SARA network provider |
| NCP/Pakistan | via mailing list | Resolved: QoS issue, IPv6 performs fine |
| CBPF to CNAF, PIC and IN2P3 | via mailing list to LHCONE ops, GGUS:129561 from LHCb | Resolved: MTU step-down issue within RNP |
| T0 to JINR | GGUS:129544 | Resolved: by JINR putting in place new Moscow - Dubna link and fixing asymmetries in routing |
| IN2P3 NIKHEF to UC | via mailing list | Resolved: Univ. of Chicago investigated (root cause unknown) |
| BNL ASGC | ESNet ticket ESNET-20170123-005 | Resolved: Issue opened by WG; resolved by ESNet |
| IHEP EU | GGUS:125623 | Resolved: by NREN (site was not notified of the ongoing network issue) |
| UNL FNAL | via mailing list | Resolved: UNL investigated (root cause unknown) |
| CERN RRCKI | GGUS:124538 | Resolved: RRCKI re-routed from AMS to BUD, root cause for congested path RRCKI-AMS was not understood |
| MIT inbound throughput | via mailing list | Resolved: MIT opened ticket with Internet2 |
| EELA-UTFSM MWT2_UC | via mailing list | Resolved: gsiftp timeouts, non-network issue |
| McGill BU | GGUS:123285 | Resolved: gridftp timeouts, but re-appeared; network seems to perform well, likely an issue with storage |
| Victoria - Prague | via mailing list | Resolved: grid output retrieval failing; asymmetric paths and MTU step-down issues |
| SARA consistent loss | GGUS:121687 | Resolved after SARA migrated to the new data centre |
| RAL consistent loss | GGUS:121687 | Resolved: RAL router upgraded |
| BNL RAL CERN | GGUS:121687 | Resolved: issue with RAL router |
| BNL SARA CERN | GGUS:120957 | Resolved: issue with ESNet router at CERN and saturated link CERN/SARA (was upgraded to 20 Gbps) |
| ASGC CERN IJS | GGUS:119820 | Resolved: issue with router at ASGC and IJS firewall |
| CBPF | GGUS:120081 | Resolved: RNP stopped publishing to ESNet CBPF IPs |
| FNAL CERN | GGUS:119551 | Resolved: fixed by ESNet - faulty router interface in New York |
| PIC inbound | via mailing list | Resolved: 10 Gbps WAN link at PIC, shared by LHCOPN and LHCONE, was completely saturated, causing input discards |
| BNL to PIC | via mailing list | Resolved: LHCOPN link CERN-PIC was flapping a lot due to an issue with the GEANT fibre to Spain |
| MAINZ CA | via mailing list | Resolved: MAINZ uses a "commercial" network provider and Canadian sites only peer with R&E networks |
| OU inbound | via mailing list | Resolved: Narrowed down to a faulty switch on site |
| CA EU | GGUS:118748, GGUS:118730 | Resolved: Trans-atlantic channel instability, resolved by re-routing at Canarie |

---++ Security Announcements
---
   * Security: New SSL vulnerability dubbed Logjam (https://weakdh.org/sysadmin.html). WLCG perfSONAR hosts should NOT be vulnerable to this attack. The Apache configuration installed by the Toolkit disables the cipher suites in question by default.
   * Security: CVE released on 2nd of April 2015 for Cassandra, which is used by the perfSONAR measurement archive software, esmond. NO action is required to protect the perfSONAR Toolkit since the vulnerable ports are both disabled and firewalled.
---

---++ Links
Deployment Guide:
   * https://twiki.opensciencegrid.org/bin/view/Documentation/DeployperfSONAR

Installation and Configuration Guides:
   * https://twiki.opensciencegrid.org/bin/view/Documentation/InstallUpdatePS

Infrastructure Monitoring:
   * OSG Production instance: https://psetf.grid.iu.edu/etf/check_mk/

Global Configuration Interface (meshes, tests):
   * OSG Production instance: http://meshconfig.grid.iu.edu (requires authorization, please contact GOC support)

Collector and Central Store for all perfSONAR metrics:
   * OSG Production collector: http://psds0.grid.iu.edu/rsv/
   * OSG Production store: http://psds.grid.iu.edu/

perfSONAR stream:
   * All metrics published via CERN ActiveMQ (for details contact perfsonar-esmond-mq@cern.ch) and GOC RabbitMQ

Dashboards:
   * perfSONAR monitoring and debugging dashboard (maddash)
      * [[http://psmad.grid.iu.edu/maddash-webui/index.cgi?dashboard=OPN%20Config][OPN production dashboard]]
      * [[http://psmad.grid.iu.edu/maddash-webui/index.cgi?dashboard=LHCONE%20Mesh%20Config][LHCONE production dashboard]]
      * *WLCG latency production dashboards*: [[http://psmad.grid.iu.edu/maddash-webui/index.cgi?dashboard=WLCG%20ATLAS%20Latency%20Mesh][ATLAS]], [[http://psmad.grid.iu.edu/maddash-webui/index.cgi?dashboard=WLCG%20CMS%20Latency%20Mesh][CMS]], 
[[http://psmad.grid.iu.edu/maddash-webui/index.cgi?dashboard=WLCG%20LHCb%20Latency%20Mesh][LHCb]] * *WLCG bandwidth production dashboards*: [[http://psmad.grid.iu.edu/maddash-webui/index.cgi?dashboard=WLCG%20ATLAS%20Bandwidth%20Mesh][ATLAS]], [[http://psmad.grid.iu.edu/maddash-webui/index.cgi?dashboard=WLCG%20CMS%20Bandwidth%20Mesh][CMS]], [[http://psmad.grid.iu.edu/maddash-webui/index.cgi?dashboard=WLCG%20LHCb%20Bandwidth%20Mesh][LHCb]] * GRAFANA dashboards * [[http://monit-grafana-open.cern.ch/?orgId=16][LHCOPN traffic]] ---++ Meetings * Past WG meetings can be found at https://indico.cern.ch/category/4372/ * 01/28/2015 perfSONAR operations [[https://indico.cern.ch/event/369420/][meeting]] * 11/26/2014 Metrics area [[https://indico.cern.ch/event/354593/][meeting]] * 10/22/2014 perfSONAR operations [[https://indico.cern.ch/event/347735/][meeting]] * 10/3/2014 perfSONAR operations [[https://indico.cern.ch/event/342995/][meeting]] * 9/8/2014 Network and Transfer Metrics WG Kick-off [[https://indico.cern.ch/event/336520/][meeting]] * 9/15/2014 LHCOPN and LHCONE joint Meeting: Ann Arbor (US) 15-16 of September [[https://indico.cern.ch/event/318811/][agenda]] ---++ Presentations * Regular updates are presented during HEPiX, WLCG workshop, LHCOPN/LHCONE workshop and GDB * [[https://indico.cern.ch/event/466991/][HEPiX Spring Workshop]] * [[https://indico.cern.ch/event/384358/][GDB]] * [[https://indico.cern.ch/event/384358/][HEPiX Fall Workshop]] * [[https://indico.cern.ch/event/376098/][LHCONE/LHCOPN]] joint meeting 1-2 June * [[http://chep2015.kek.jp/][CHEP2015]] * [[https://indico.cern.ch/event/346931/session/3/#20150324?slotId=0][HEPiX Workshop]] Update on WLCG/OSG perfSONAR infrastructure * OSG Area Coordinators Meeting - Networking [[https://twiki.grid.iu.edu/bin/view/Management/20141029AgendaMinutes][agenda]] * Update on Network and Transfer Metrics WG at GDB [[https://indico.cern.ch/event/272778/][agenda]] * Network and Transfer Metrics WG Kick-off 
[[https://indico.cern.ch/event/336520/][meeting]] * Network Monitoring and Metrics at [[https://indico.cern.ch/event/305362/other-view?view=standard][WLCG workshop, Barcelona]] * Proposal for new Working Group: Network and Transfer Metrics at [[https://indico.cern.ch/event/313378/][WLCG Operations Coordination]] ---++ Reports ---+++!! Report 16/05/2019 %STARTSECTION{"16052019"}% * *Detailed status update was presented at HEPiX (https://indico.cern.ch/event/765497/contributions/3351215/)* * CHEP abstract to be submitted (https://docs.google.com/document/d/1O5PhgCmdwbYJpL7qHpPFxxMLaGh1aWbMWyO69pXk-H0/edit) * perfSONAR infrastructure status - CC7/4.1 campaign * All T1s updated and re-configured, except TRIUMF (waiting for hw) and RRC-KI (missing IPv6); we have started to follow up with T2s * Overall we have 176 perfSONARs on 4.1 (137 on 4.1.6); status has significantly improved * 4.2.0 release soon - will bring preemptive scheduling & gridftp testing * WLCG/OSG network services were updated * Issues with the psmad dashboard were fixed, dashboard now well populated (OPN, UK and FR meshes in very good shape; [[http://psmad.opensciencegrid.org/maddash-webui/index.cgi][psmad/maddash]]) * http://monit-grafana-open.cern.ch also now well populated, some issues with site mapping due to IPv6 fixed, others still remain (mostly due to too many sources/complex topology processing) * New collector is now in production, re-written from scratch within SAND project, improved performance (lowered latency) * Work is on-going in both SAND and IRIS-HEP to switch all perfSONAR to report measurements directly to the message bus (real-time measurements capability) * WLCG Network Throughput Support Unit: see [[NetworkTransferMetrics#Network_Throughput_Support_Unit][twiki]] for summary of recent activities. 
* *100 Gbps perfSONARs* now at SARA, CERN, CSCS, BNL (80Gbps), KIT (in QA) * *perfSONAR now part of the cloud benchmark testing developed in OCRE project (https://github.com/cern-it-efp/OCRE-Testsuite/)* * Will be presented at GEANT perfSONAR workshop (https://wiki.geant.org/display/gn43wp6/European+perfSONAR+workshop+2019+-+London) %ENDSECTION{"16052019"}% ---+++!! Report 07/03/2019 %STARTSECTION{"07032019"}% * *perfSONAR infrastructure status - CC7/4.1 campaign ongoing* * perfSONAR 4.0 and perfSONARs on SL6 are no longer supported since Q4 2018 - please update ASAP * *New baseline version for perfSONAR is the latest release 4.1.6 (fixes important bug causing duplicate testing)* * WLCG/OSG network services were updated * All meshes were updated to test throughput and traceroutes over both IPv4 and IPv6; dual stack mesh was retired * Monitoring was updated with new thresholds and now also tracks IPv4/IPv6 efficiency (https://psetf.aglt2.org/etf/check_mk/) * Documentation was updated as well (https://opensciencegrid.org/networking/) * *perfSONAR dashboard ( [[http://psmad.opensciencegrid.org/maddash-webui/index.cgi][psmad/maddash]]) was re-configured and fixed* * WLCG Network Throughput Support Unit: see [[NetworkTransferMetrics#Network_Throughput_Support_Unit][twiki]] for summary of recent activities. %ENDSECTION{"07032019"}% ---+++!! 
Report 07/02/2019 %STARTSECTION{"07022019"}% * *perfSONAR infrastructure status - CC7/4.1 campaign ongoing* * *perfSONAR 4.0 and perfSONARs on SL6 are no longer supported since Q4 2018 - please update ASAP* * We have started ticketing sites, starting with T1s and major T2s * WG update will be presented at HEPiX in San Diego * WLCG/OSG network services were updated * StashCache will use perfSONAR to track network performance, new mesh was already added * FR region mesh was added * All meshes were updated to test throughput and traceroutes over both IPv4 and IPv6; dual stack mesh was retired * Monitoring was updated with new thresholds and now also tracks IPv4/IPv6 efficiency (https://psetf.aglt2.org/etf/check_mk/) * Documentation was updated as well (https://opensciencegrid.org/networking/) * WLCG Network Throughput Support Unit: see [[NetworkTransferMetrics#Network_Throughput_Support_Unit][twiki]] for summary of recent activities. %ENDSECTION{"07022019"}% ---+++!! Report 08/11/2018 %STARTSECTION{"08112018"}% * *perfSONAR infrastructure status - CC7/4.1 campaign ongoing* * Sites were reminded to upgrade to CC7 and review their configuration (preferably by end of October) * Still only around 50% of nodes are on CC7 as of today - we'll soon start contacting sites directly * Some sites waiting for/deploying new hardware; e.g. SARA deployed 100Gbps perfSONAR (first in Europe), BNL deployed 2x40 Gbps perfSONAR * *WG update was presented at HEPiX and LHCOPN/LHCONE workshop* * LHCOPN/LHCONE workshop had a dedicated talk on MTU - topic that was raised by the WG - draft recommendation was proposed (https://indico.cern.ch/event/725706/) * WLCG/OSG network services working fine * *WLCG Network Throughput Support Unit*: see [[NetworkTransferMetrics#Network_Throughput_Support_Unit][twiki]] for summary of recent activities. 
* A number of cases have come up recently; some are still unresolved and are being followed up by the corresponding R&E networks and/or will be raised at the upcoming Asia Tier Forum * Case in point (NCP): https://docs.google.com/document/d/1HHhK9t4PpYPzZOfJUAhupRAodhPT6HNIZi6J8ljpBtw/edit# %ENDSECTION{"08112018"}% ---+++!! Report 13/09/2018 %STARTSECTION{"13092018"}% * perfSONAR infrastructure status * perfSONAR 4.1 was released a few weeks ago - main new feature is an improved central/remote configuration * *WLCG broadcast was sent this week to remind sites to upgrade to CC7* and review their configuration (preferably by end of October) * Around 50% of perfSONARs are on CC7 as of today * *WG update will be presented at the upcoming HEPiX* * WLCG/OSG network services * Central configuration service (meshconfig/psconfig) was updated to the version released in 4.1 (officially supported by the perfSONAR team) * *psconfig.opensciencegrid.org is currently unreachable via IPv6 from non-LHCONE sites* due to a routing issue; this is being followed up by the network team at MSU * NSF funded projects: *SAND and IRIS-HEP are starting*, both will contribute in different ways to the OSG Network Area - more details will be provided in the HEPiX talk * WLCG Network Throughput Support Unit: see [[NetworkTransferMetrics#Network_Throughput_Support_Unit][twiki]] for summary of recent activities. %ENDSECTION{"13092018"}% ---+++!! 
Report 07/06/2018 %STARTSECTION{"07062018"}% * perfSONAR infrastructure status * perfSONAR 4.1 beta will be released in the coming weeks - main new feature is an improved central/remote configuration * CC7 campaign made only modest progress recently - 86 instances on CC7 (up from 81 in April, out of a total of 210) * *WLCG broadcast* will be sent to remind sites to plan an upgrade to CC7 and review their configuration * WG update was presented at HEPiX and will be presented at CHEP * WLCG/OSG network services * Following the retirement of *OSG GOC*, all central services were migrated to AGLT2, which took considerable effort in planning and deployment * Transition happened without downtime and was transparent to all sites * The one exception is *sites using the old OIM/myOSG central configuration URL*, which was deprecated during the 3.5 update campaign (meshconfig URLs starting with myosg.grid.iu.edu/pfmesh...) * Impacted sites are asked to update their meshconfig-agent.conf following http://opensciencegrid.org/networking/perfsonar/installation/#installation ASAP * WLCG Network Throughput Support Unit: see [[NetworkTransferMetrics#Network_Throughput_Support_Unit][twiki]] for summary of recent activities. %ENDSECTION{"07062018"}% ---+++!! 
Report 12/04/2018 %STARTSECTION{"12042018"}% * perfSONAR 4.0.2 and CC7 campaign - 210 instances updated to 4.0.2; 81 instances already on CC7 * *WLCG broadcast* will be sent to remind sites to plan an upgrade to CC7 and review the firewall port openings * perfSONAR 4.1 release, planned in Q2 2018 will no longer ship SL6 packages * Attended perfSONAR developers F2F meeting in Amsterdam and presented feedback from OSG/WLCG * WG reports planned for upcoming HEPiX and CHEP * Networking and perfSONAR were also major topics at the OSG-All Hands (https://indico.fnal.gov/event/15344/) * 4 presentations were given on various topics related to the WG * One of the outcomes was a proposal to create a dedicated site-based documentation showing all links relevant to a given site * WLCG/OSG network services * *Successfully migrated and commissioned new data pipeline*, we now have data flowing to UC/UNL/FNAL via RabbitMQ * Grafana was updated to reflect recent changes (http://monit-grafana-open.cern.ch/dashboard/db/home?orgId=16) * [[http://monit-grafana-open.cern.ch/dashboard/db/perfsonar-ipv6?orgId=16][IPv6 dashboard]] was added to help compare IPv4 vs IPv6 performance side by side * Outreach and other activities: * *GEANT has added several perfSONAR instances on LHCONE* at their major network hubs (ams, gva, lon, par, fra) - both IPv4 and IPv6 * Advania was added to HNSciCloud test mesh * MGHPCC (http://www.mghpcc.org/) plans to deploy up to 22 perfSONARs, currently in discussion how we can help * WLCG Network Throughput Support Unit: see [[NetworkTransferMetrics#Network_Throughput_Support_Unit][twiki]] for summary of recent activities. %ENDSECTION{"12042018"}% ---+++!! 
Report 01/03/2018 %STARTSECTION{"01032018"}% * perfSONAR 4.0.2 and CC7 campaign - 190 instances updated to 4.0.2; 64 instances already on CC7 * *WLCG broadcast* will be sent to remind sites to plan an upgrade to CC7 and review the firewall port openings * perfSONAR 4.1 release, planned in Q2 2018 will no longer ship SL6 packages * WLCG/OSG network services * On-going work on improving OSG collector and migrating to new message bus (RabbitMQ) * Migration to ES version 6 at UC is being planned, changes are needed to the underlying data model, which will require changes in the Grafana dashboards * Revised documentation is available at https://opensciencegrid.github.io/networking/ * LHCOPN and perfSONAR dashboards done in collaboration with OSG, UC, CERN IT/CS and IT/MONIT are available at http://monit-grafana-open.cern.ch/dashboard/db/home?orgId=16 * WLCG Network Throughput Support Unit: see [[NetworkTransferMetrics#Network_Throughput_Support_Unit][twiki]] for summary of recent activities. * LHCOPN/LHCONE Workshop will take place next week - update on WG activities will be presented (https://indico.cern.ch/event/681168/) * perfSONAR developers F2F meeting will take place next week in Amsterdam - feedback from OSG/WLCG will be presented %ENDSECTION{"01032018"}% ---+++!! 
Report 18/01/2018 %STARTSECTION{"18012018"}% * perfSONAR 4.0.2 - 190 instances updated out of which 53 are already on CC7 * *WLCG broadcast* will be re-sent next week to remind sites of the *upcoming important dates* and new documentation * perfSONAR 4.1 release, planned in Q1 2018 will no longer ship SL6 packages * EOL for SL6 support in Q3 2018 * All sites are encouraged to upgrade to CC7 as soon as possible * WLCG/OSG network services * On-going work on improving OSG collector and broadcasting results via GOC's RabbitMQ * Revised documentation is available at https://opensciencegrid.github.io/networking/ * LHCOPN and perfSONAR dashboards done in collaboration with OSG, UC, CERN IT/CS and IT/MONIT are available at http://monit-grafana-open.cern.ch/dashboard/db/home?orgId=16 * WLCG Network Throughput Support Unit: see [[NetworkTransferMetrics#Network_Throughput_Support_Unit][twiki]] for summary of recent activities. %ENDSECTION{"18012018"}% ---+++!! Report 06/12/2017 %STARTSECTION{"06122017"}% * perfSONAR workshop held by JISC in November ( [[https://wiki.geant.org/download/attachments/84476097/SIG-PMV-Nov2017-Jisc-perfSONAR-v2.pdf?version=1&modificationDate=1511887250545&api=v2][slides]]) - WLCG WG activities mentioned * perfSONAR 4.0.2 was released on November 28th * *WLCG broadcast* will be sent this week to notify sites of the *upcoming important dates* and new documentation * perfSONAR 4.1 release, planned in Q1 2018 will no longer ship SL6 packages * EOL for SL6 support in Q3 2018 * All sites are encouraged to upgrade to CC7 as soon as possible * WLCG/OSG network services * On-going work on improving OSG collector and broadcast results via GOC's RabbitMQ * New documentation is available at https://opensciencegrid.github.io/networking/ * New LHCOPN and perfSONAR dashboards done in collaboration with OSG, UC, CERN IT/CS and IT/MONIT are available at http://monit-grafana-open.cern.ch/dashboard/db/home?orgId=16 * WLCG Network Throughput Support Unit: see 
[[NetworkTransferMetrics#Network_Throughput_Support_Unit][twiki]] for summary of recent activities. * !HNSciCloud will use perfSONAR results for network performance evaluation of the providers * *HEPiX WG on SDN/NFV* was established and will look into networking R&D topics - sites interested please subscribe via https://listserv.in2p3.fr/cgi-bin/wa?SUBED1=hepix-nfv-wg * WG scope and objectives yet to be defined, Doodle for an initial meeting will be sent soon * WG proposal presentation is at https://indico.cern.ch/event/637013/contributions/2739266/ %ENDSECTION{"06122017"}% ---+++!! Report 02/11/2017 %STARTSECTION{"02112017"}% * WG update was presented at HEPiX and LHCOPN/LHCONE workshop (co-located) * perfSONAR 4.0.2 is planned to be released in November * WLCG/OSG network services * *New documentation* is available at https://opensciencegrid.github.io/networking/ (still work in progress) * Once 4.0.2 is out, WLCG broadcast will be sent to all sites about important updates and actions to take * *New LHCOPN and perfSONAR dashboards* done in collaboration with OSG, CERN IT/CS and IT/MONIT are available at http://monit-grafana-open.cern.ch/dashboard/db/home?orgId=16 * WLCG Network Throughput Support Unit: see [[NetworkTransferMetrics#Network_Throughput_Support_Unit][twiki]] for summary of recent activities. * !HNSciCloud meshes are being created (per provider), will enable tests between !HNSciCloud sites and providers %ENDSECTION{"02112017"}% ---+++!! 
Report 05/10/2017 %STARTSECTION{"05102017"}% * WG update will be presented at HEPiX and LHCOPN/LHCONE workshop (co-located) * perfSONAR YouTube channel at https://www.youtube.com/channel/UCjK-P49pAKK9hUrrNbbe0Sg * perfSONAR 4.0.1 auto-deployed to 197 instances (21 are already on centos7) * Port 443/https is now used as a controller port for pscheduler and needs to be open on central firewalls * Some sites suffer from an MA access issue after the upgrade, this is being followed up * perfSONAR 4.0.2 is planned to be released in November * Brings new SNMP plugin that can be used to retrieve local site router traffic * WLCG/OSG network services * New documentation is in preparation and will be hosted at https://opensciencegrid.github.io/networking/ * OSG collector handling multiple backends (Datastore, CERN ActiveMQ and GOC RabbitMQ) now in production * GOC will distribute raw data to 3 different locations, FNAL for tape archive, Nebraska for long-term ES storage, Chicago for short-term ES storage * Preparing new LHCOPN and perfSONAR dashboards in collaboration with CERN IT/CS and IT/MONIT * WLCG Network Throughput Support Unit: see [[NetworkTransferMetrics#Network_Throughput_Support_Unit][twiki]] for summary of recent activities. * HNSciCloud will create its own perfSONAR mesh to follow up on the network performance btw. providers and sites %ENDSECTION{"05102017"}% ---+++!! 
Report 14/09/2017 %STARTSECTION{"14092017"}% * WG update will be presented at HEPiX and LHCOPN/LHCONE workshop (co-located) * perfSONAR 4.0.1 was released and was auto-deployed to 187 instances (21 are already on centos7) * Port 443/https is now used as a controller port for pscheduler and needs to be open on central firewalls * perfSONAR YouTube channel at https://www.youtube.com/channel/UCjK-P49pAKK9hUrrNbbe0Sg * WLCG/OSG network services * New documentation is in preparation and will be hosted at https://opensciencegrid.github.io/networking/ * New central mesh configuration interface (MCA) and monitoring (ETF) in production (http://meshconfig.grid.iu.edu; https://psetf.grid.iu.edu/etf/check_mk/) * OSG collector handling multiple backends (Datastore, CERN ActiveMQ and GOC RabbitMQ) now in production * GOC will distribute raw data to 3 different locations, FNAL for tape archive, Nebraska for long-term ES storage, Chicago for short-term ES storage * Central dashboard service (psmad.grid.iu.edu) suffers from a bug which prevents showing statuses correctly (as well as retrieve the graphs), ESNet is working on a fix * Preparing new LHCOPN and perfSONAR dashboards in collaboration with CERN IT/CS and IT/MONIT * WLCG Network Throughput Support Unit: see twiki for summary of recent activities. %ENDSECTION{"14092017"}% ---+++!! 
Report 06/07/2017 %STARTSECTION{"06072017"}% * Detailed WG update presented as part of the network session at the [[https://indico.cern.ch/event/609911/contributions/2604121/][WLCG workshop in Manchester]] * perfSONAR 4.0 was released on 17th of April * 194 nodes updated so far * ES/Kibana dashboard showing perfSONAR infrastructure status in testing * WLCG/OSG network services * New central mesh configuration interface (MCA) in production (http://meshconfig.grid.iu.edu) * Accessible to mesh administrators only - please contact wlcg-perfsonar-support@cern.ch to request access * New monitoring based on ETF in production (https://psetf.grid.iu.edu/etf/check_mk/) * New OSG collector handling multiple backends (Datastore, CERN ActiveMQ and GOC RabbitMQ) in production * New LHCOPN grafana dashboards done in collaboration with CERN IT/CS and IT/MONIT in testing * Now with open access at http://monit-grafana-open.cern.ch/dashboard/db/lhcopn?orgId=16 * Additional perfSONAR dashboards to be added soon * Throughput call was held on Wed May 24th at 4pm CEST (https://indico.cern.ch/event/640627/) mainly focusing on review of new production services %ENDSECTION{"06072017"}% ---+++!! Report 18/05/2017 %STARTSECTION{"18052017"}% * perfSONAR 4.0 was released on 17th of April * 180 sites have updated so far * Some sites reported issues with load after updating, under investigation * WLCG/OSG network services * New central mesh configuration interface (MCA) will be deployed to production next week - transition will be transparent to all sites * [[https://github.com/soichih/meshconfig-admin][MCA]] was developed by OSG and becomes part of perfSONAR. 
* Monitoring based on ETF is planned to be deployed in ITB * OSG collector will be updated to handle multiple backends (datastore, two message buses) * LHCOPN grafana dashboards established in collaboration with CERN IT/CS and MONIT team (access restricted to CERN users, public access in the works) * https://monit-grafana.cern.ch/dashboard/db/lhcopn?orgId=14 * https://monit-grafana.cern.ch/dashboard/db/lhcopn-detailed?orgId=14 * Next Throughput call will be on Wed May 24th at 4pm CEST (https://indico.cern.ch/event/640627/) %ENDSECTION{"18052017"}% ---+++!! Report 06/04/2017 %STARTSECTION{"06042017"}% * LHCOPN/LHCONE workshop in BNL took place this week (https://indico.cern.ch/event/581520/) * Russian T1/T2 provisioning direct peering with GEANT, RAL provisioning 3rd 10Gbit link in LHCOPN * Both ESNet and GEANT reported 85% growth in LHCONE (YoY) * Update on WG activities was presented (https://indico.cern.ch/event/581520/#17-lhconelhcopn-perfsonar-upda) * perfSONAR 4.0 to be released on 17th of April * Site on auto-updates will get it automatically - no action needed. 
* _Sites planning to update perfSONARs to CC7 are encouraged to wait until 4.1 is released._ * *Minimal hardware requirements were shifted*: Sites running perfSONARs with less than 4GB RAM and 2 core CPU with clock speed less than 2GHz are encouraged to keep running the old version (3.5.1) * WLCG/OSG network services * New central mesh configuration interface (MCA) will be deployed to production - transition will be transparent to all sites * [[https://github.com/soichih/meshconfig-admin][MCA]] was developed by OSG, but becomes part of perfSONAR * Integrates perfSONAR lookup service with OIM/GOCDB services, so we can now easily add NREN perfSONARs into our meshes * Monitoring was updated to cover new features released in 4.0 and is now based on [[http://etf.cern.ch/docs][ETF]] * OSG collector was updated to collect additional perfSONAR metrics (such as TCP retransmits, path MTU, etc) * LHCOPN traffic and LHCONE [[https://gitlab.cern.ch/network-analytics/ps-telemetry/tree/master][simulated link utilisation]] now available for subscriptions from the netmon brokers * WLCG Network Throughput Support Unit: see [[NetworkTransferMetrics#Network_Throughput_Support_Unit][twiki]] for summary of recent activities. * BNL/ASGC throughput improved by factor 10 - details reported at the LHCOPN/LHCONE [[https://indico.cern.ch/event/581520/#17-lhconelhcopn-perfsonar-upda][workshop]] %ENDSECTION{"06042017"}% ---+++!! Report 26/01/2017 %STARTSECTION{"26012017"}% * pre-GDB on NETWORKING took place on 10th of January (https://indico.cern.ch/event/571501/) * Summary was presented at the GDB next day (https://indico.cern.ch/event/578982/) * Throughput meeting held on 14th of Dec (https://indico.cern.ch/event/595286/) * Discussed ongoing validation of perfSONAR 4.0; RC3 to be released next week * perfSONAR developers meeting ongoing this week * WLCG Network Throughput Support Unit: see [[NetworkTransferMetrics#Network_Throughput_Support_Unit][twiki]] for summary of recent activities. 
%ENDSECTION{"26012017"}% ---+++!! Report 01/12/2016 %STARTSECTION{"01122016"}% * pre-GDB on NETWORKING will take place on 10th of January, preliminary agenda now available at https://indico.cern.ch/event/571501/ * If you plan to attend, please register at https://indico.cern.ch/event/571501/ to help us with logistics * Please let us know what you would like to see come out of this meeting. If there are additional topics you would like to see in the agenda or modifications to existing items, please let us know. * Invitation was sent to all four experiments * Next throughput meeting planned 14th of Dec * Focus on perfSONAR RC validation * perfSONAR team announced that they plan to release 4.0 RC3, which will push final release to next year * WLCG Network Throughput Support Unit: see [[NetworkTransferMetrics#Network_Throughput_Support_Unit][twiki]] for summary of recent activities. %ENDSECTION{"01122016"}% ---+++!! Report 03/11/2016 %STARTSECTION{"03112016"}% * pre-GDB on NETWORKING will take place on 10th of January, participation of experiments and sites is crucial, if you plan to attend PLEASE register at https://indico.cern.ch/event/571501/ * WG results recently reported at various events: * [[https://indico.cern.ch/event/555063/sessions/203482/#20161008][Network session at the WLCG workshop]] * [[https://indico.cern.ch/event/505613/sessions/205333/#20161011][CHEP 2016 networking plenary talks]] * [[https://indico.cern.ch/event/505613/contributions/2227437/][CHEP 2016 infrastructure track]] * [[https://indico.cern.ch/event/531810/sessions/208397/#20161017][HEPiX Security and Networking sessions]] * Throughput meeting was held on 27th Oct: * Focused on the network analytics, see [[https://indico.cern.ch/event/577645/][minutes]] for details * perfSONAR 4.0 RC2 was released yesterday, we will intensify validation effort towards final release planned end of November, update campaign to follow once final release is out * We are now using a new mailing list 
wlcg-network-throughput-wg@cern.ch - joint mailing list for European and NA throughput [[https://indico.cern.ch/category/4372/][meetings]] * WLCG Network Throughput Support Unit: CERN - RRCKI followed up, see [[NetworkTransferMetrics#Network_Throughput_Support_Unit][twiki]] for details. %ENDSECTION{"03112016"}% ---+++!! Report 29/09/2016 %STARTSECTION{"29092016"}% * [[https://indico.cern.ch/event/555063/sessions/203482/#20161008][Network session]] at the WLCG workshop * Q&A session planned, questions will be sent in advance, we encourage all to participate * Inder Monga (Director of ESNet) will join the session * LHCOPN/LHCONE workshop was held in Helsinki, Sept 19-20 (https://indico.cern.ch/event/527372/) * GEANT reported peaks over 100GBps and growth of over 65% from Q2 2015 to Q2 2016 * ESNet reported that LHCONE traffic has increased 118% in the past year * Positive feedback received on the LHC Network Evolution talk * pre-GDB on networking focusing on the long-term network evolution planned on *January 10th* - save the date * Throughput meetings were held on 15th Sept: * Hendrik Borras (Univ. of Heidelberg) presented early results on the network telemetry based on perfSONAR * perfSONAR 4.0 RC1 was released, RC2 planned in October with final release sometime in November * We are now using a new mailing list wlcg-network-throughput-wg@cern.ch - joint mailing list for European and NA throughput [[https://indico.cern.ch/category/4372/][meetings]] * WLCG Network Throughput Support Unit: New cases were reported on IPv6 and are being followed up, see [[NetworkTransferMetrics#Network_Throughput_Support_Unit][twiki]] for details. %ENDSECTION{"29092016"}% ---+++!! 
Report 01/09/2016 %STARTSECTION{"01092016"}% * Network session is planned at the WLCG workshop covering IPv6, LHCOPN/LHCONE status and LHC network evolution * LHCOPN/LHCONE workshop will be held in Helsinki, Sept 19-20 (https://indico.cern.ch/event/527372/) * pre-GDB on networking focusing on the long-term network evolution postponed to January * Throughput meetings were held on July, 27 and August, 16: * Mark Feit from Internet2 presented pScheduler (new test scheduler in perfSONAR 4.0) * Xinran Wang and Ilija Vukotic from Univ. of Chicago presented their Network Analytics work * OSG datastore and collector are experiencing problems since the upgrade last week, the issue is being followed up by [[https://ticket.grid.iu.edu/goc/28117][OSG]] * Plan on migration to the new perfSONAR 4.0 configuration was drafted and will be followed up with OSG * We are now using a new mailing list wlcg-network-throughput-wg@cern.ch - joint mailing list for European and NA throughput [[https://indico.cern.ch/category/4372/][meetings]] * WLCG Network Throughput Support Unit: New cases were reported and are being followed up, see [[NetworkTransferMetrics#Network_Throughput_Support_Unit][twiki]] for details. %ENDSECTION{"01092016"}% ---+++!! Report 07/07/2016 %STARTSECTION{"07072016"}% * WG update presented at ATLAS TIM including discussion on the mid-long term network evolution * pre-GDB on networking focusing on the mid-long term network evolution will be held in December * North American Throughput meeting held on 22nd of June: * Andy Lake presented new features planned in perfSONAR 4.0 * Next meeting end of July, main topic: pScheduler (replaces bwctl) * WLCG Throughput meeting held 16th of June: * Main topic was re-organization of the meshes, the proposal was agreed and implemented * New experiment-based meshes were introduced in the production dashboard in effect (see [[NetworkTransferMetrics#Links][twiki]]) * Next meeting in Sept. 
(co-located with LHCOPN/LHCONE) * WLCG Network Throughput Support: Several new cases were reported and are being followed, see [[NetworkTransferMetrics#Network_Throughput_Support_Unit][twiki]] for details. * perfSONAR 4.0 (formerly 3.5) RC to become available end of August, WLCG validation and deployment campaign will follow. * Introduces several major changes such as new configuration management and interface as well as migration from BWCTL to pScheduler %ENDSECTION{"07072016"}% ---+++!! Report 02/06/2016 %STARTSECTION{"02062016"}% * WLCG Network Throughput SU: * ASGC connectivity - After numerous tests performed in collaboration with ASGC and ESNet (http://etf.cern.ch/perfsonar_asgc.txt) the root cause has been confirmed to be the local N7K router at ASGC. Once the perfSONARs were moved directly to the central router the measured network performance has improved by factor 10. Our recommendation is to re-wire all the existing data transfer nodes to bypass the local router as well as to tune the central router and data transfer nodes to improve their performance for long path transfers (200ms+). * Two new tickets received related to packet loss observed at RAL and SARA * Stable operations of the perfSONAR pipeline (collector, datastore, publisher and dashboard) * North American Throughput meeting held on 1st of June: * Shawn presented OSG network area roadmap, main focus will be on developing notification/alerting, support for higher-level services (analytics) and prepare for SDN * Next meeting is on 22nd June - main topic will be perfSONAR 4.0 * WLCG Throughput meeting will be held on 16th of June - main topic is re-organization of the meshes * perfSONAR 4.0 (formerly 3.6) RC expected end of June %ENDSECTION{"02062016"}% ---+++!! Report 28/05/2016 * WG review took place (https://indico.cern.ch/event/514078/) ---+++!! 
Report 28/04/2016 %STARTSECTION{"28042016"}% * WLCG Network Throughput SU: * CBPF connectivity (https://ggus.eu/index.php?mode=ticket_info&ticket_id=120081) - resolved * ASGC connectivity (https://ggus.eu/index.php?mode=ticket_info&ticket_id=119820) - ongoing * BNL-SARA-CERN connectivity (https://ggus.eu/index.php?mode=ticket_info&ticket_id=120957) - ongoing * Stable operations of the perfSONAR pipeline (collector, datastore, publisher and dashboard) * North American Throughput meeting held on 6th April: * Jason Zurawski from ESNet presented the art of debugging network issues with perfSONAR * WLCG Throughput meeting held on 14th of April: * Discussed design and limitations of the current WLCG bandwidth mesh, throughput tests between WLCG sites * Followed up on the WLCG deployment/operations status * WG was presented at HEPiX in DESY * Added section with useful links to the WG homepage https://twiki.cern.ch/twiki/bin/view/LCG/NetworkTransferMetrics %ENDSECTION{"28042016"}% ---+++!! Report 07/04/2016 %STARTSECTION{"07042016"}% * WLCG Network Throughput SU: * CBPF connectivity (https://ggus.eu/index.php?mode=ticket_info&ticket_id=120081) - resolved * ASGC connectivity (https://ggus.eu/index.php?mode=ticket_info&ticket_id=119820) - ongoing * Packet loss and high latency for certain packets (queuing issue ?) 
reported by perfSONAR on ASGC to CERN, but not confirmed by the counters * Narrowed down to the StarLight to ASGC segment, but unfortunately there are very few sonars in Asia with very limited peering, which will impact further investigation * Throughput tests show peaks of 400Mbit/s (200Mbit/s usual) with frequent retransmissions occurring in bunches, we'll try to run tcpdump to understand the root cause * Stable operations of the perfSONAR pipeline (collector, datastore, publisher and dashboard) * Throughput meeting held on 6th April: * Jason Zurawski from ESNet presented the art of debugging network issues with perfSONAR * Next meeting is 14th of April (https://indico.cern.ch/event/517373/) * Update on WG will be presented at HEPiX in DESY * WG review will take place at the next WLCG ops coordination on 28th April * Added section with useful links to the WG homepage https://twiki.cern.ch/twiki/bin/view/LCG/NetworkTransferMetrics %ENDSECTION{"07042016"}% ---+++!! Report 17/03/2016 %STARTSECTION{"17032016"}% * ICFA SCIC meeting was held at J-Park in February, slides from the report (including WG contribution) can be found at http://icfa-scic.web.cern.ch/ICFA-SCIC/meetings.html * LHCOPN/LHCONE Meeting held in Taipei (https://indico.cern.ch/event/461511/) * WLCG Network Throughput SU: ASGC connectivity * Packet loss and high latency for certain packets (queuing issue ?) 
reported by perfSONAR on ASGC to CERN, but not confirmed by the counters * Narrowed down to the StarLight to ASGC segment, but unfortunately there are very few sonars in Asia with very limited peering, which will impact further investigation * Stable operations of the perfSONAR pipeline (collector, datastore, publisher and dashboard) * Throughput meetings held on Feb 24th and March 9th: * Soichi Hayashi presented the new configuration interface that will become part of perfSONAR 3.6 * Shawn presented the way we currently monitor the perfSONAR infrastructure, including OSG production services * perfSONAR 3.5.1 released, 184 instances were auto-updated, only 13 instances on 3.4 %ENDSECTION{"17032016"}% ---+++!! Report 18/02/2016 %STARTSECTION{"18022016"}% * WG has contributed to the International Committee for Future Accelerators (ICFA) Annual networking report (https://cds.cern.ch/record/2130751) * WLCG Network Throughput SU: BNL to PIC throughput degradation * Root cause was instability of the GEANT Spain fiber channels * Issue was reported by ATLAS and involved ESNet, LHCONE, perfSONAR and BNL * WLCG Network Throughput SU: FNAL to CERN * Issue at ESNet, resolved by LHCOPN ops * Stable operations of the perfSONAR pipeline (collector, datastore, publisher and dashboard) * Meeting held on LHCb DIRAC bridge on January 18th: * Ongoing developments on adding additional graphs (latencies, throughput) and bug fixing, plan is to go production by Q3 2016 * Throughput meeting held on January 27th: * Ilija gave a presentation on [[https://docs.google.com/presentation/d/1hnKjcE3FJjgSHTFhM2XfVpASZRT4UsbdEiW5L0yEOCU/edit?usp=sharing][Accessing and Analyzing OSG/WLCG network metrics using ElasticSearch and Kibana]] * perfSONAR RC for v3.5.1 released, validation by WLCG started %ENDSECTION{"18022016"}% ---+++!! 
Report 21/01/2016 %STARTSECTION{"21012016"}% * WLCG Network Throughput SU: [[https://ggus.eu/index.php?mode=ticket_info&ticket_id=118730][GGUS-118730]] Throughput degradation between CA and EU * Root cause was instability of the transatlantic link (WIX reported submarine shunt fault), which in turn impacted the GEANT-CANARIE link. * perfSONAR network helped to identify the problematic segment and once CANARIE was notified the issue was resolved by re-routing. * Issue was reported by ATLAS, but many different people were involved (ATLAS, TRIUMF, perfSONAR support, LHCONE, CANARIE, WIX). * Multiple GGUS tickets were opened, but only one was followed up, something to improve in the future. * Experiments: Please check if everyone was notified of the on-going incident and let us know if we need to add additional contacts (wlcg-network-throughput mailing list) * OSG perfSONAR production services: Storage failure (OASIS) at GOC has impacted the entire perfSONAR pipeline, initially just the datastore, but later on also collector and publisher. The issue was resolved yesterday and the systems are recovering now. We have proposed changes that would remove dependency on the shared storage. %ENDSECTION{"21012016"}% ---+++!! Report 07/01/2016 %STARTSECTION{"07012016"}% * Stable operations of the perfSONAR pipeline (collector, datastore, publisher and dashboard), minor instability in the dashboard reported yesterday, being followed up by OSG * Additional monitoring metrics will be added to psomd.grid.iu.edu to capture collector's efficiency and report on freshness of the metadata in the OSG Datastore (for each sonar). 
* Proposed re-organization of the WG meetings, split into two areas, perfSONAR operations (throughput calls) and research/pilot projects * perfSONAR operations - main scope would be to continue with perfSONAR support, follow up on the existing infrastructure while at the same time start looking into issues already shown by the existing tools and try to fix them at the source. As this scope is well aligned with the existing North American throughput calls, we could alternate the meetings and publish common notes. * Research/pilot projects - will have separate on-demand meetings with notes published to WG mailing list * F2F meeting once a year, co-located with GDB or other workshop/conference * Pilot projects: LHCb DIRAC bridge available [[https://dirac.cis.gov.pl:8443/DIRAC/CIS-Development/visitor/systems/accountingPlots/network][online]] %ENDSECTION{"07012016"}% ---+++!! Report 19/11/2015 %STARTSECTION{"19112015"}% * perfSONAR collector, datastore, publisher and dashboard in production (stable operations) * Additional monitoring metrics will be added to psomd.grid.iu.edu to capture collector's efficiency and report on freshness of the metadata in the OSG Datastore (for each sonar). * perfSONAR 3.5: 205 sonars were updated, ALL sites are encouraged to enable auto-updates for perfSONAR * Pilot projects: ATLAS Panda, perfSONAR stream now in ATLAS Network Analytics (https://twiki.cern.ch/twiki/bin/view/AtlasComputing/ATLASAnalytics), several KIBANA dashboards available - [[http://cern.ch/go/sk8j][Site link stats]]. Jorge and Ilija working on cost matrix using the round-trip time and packet loss in Mathis's formula to infer bandwidth (predictions based on this model will follow). * Pilot projects: LHCb DIRAC bridge is now functional, processing perfSONAR stream and inserting packet loss metrics in DIRAC, includes mapping to LHCb sites. Henryk, Federico and Stefan are working on this. %ENDSECTION{"19112015"}% ---+++!! 
Report 05/11/2015 %STARTSECTION{"05112015"}% * perfSONAR collector, datastore, publisher and dashboard now in production (stable operations) * perfSONAR 3.5: 205 sonars were updated, ALL sites are encouraged to enable auto-updates for perfSONAR * Detailed report from the WG presented at [[https://indico.cern.ch/event/319753/][GDB]] * Meeting held yesterday, encouraging all mesh leaders to participate * Started discussion on the network outage and at risk announcements from NRENs * Pilot projects: ATLAS Panda, perfSONAR stream now in ATLAS Network Analytics (https://twiki.cern.ch/twiki/bin/view/AtlasComputing/ATLASAnalytics), several KIBANA dashboards available [[http://cern.ch/go/Z7F8][MWT2]] [[http://cern.ch/go/C7pv][FZK2]]. Jorge and Ilija working on cost matrix using the round-trip time and packet loss in Mathis's formula to infer bandwidth (predictions based on this model will follow). %ENDSECTION{"05112015"}% ---+++!! Report 22/10/2015 %STARTSECTION{"22102015"}% * perfSONAR collector, datastore, publisher and dashboard now in production ! * [[psmad.grid.iu.edu/maddash-webui/][psmad]] becomes the official dashboard for perfSONAR meshes * perfSONAR 3.5: 183 sonars were updated, ALL sites are encouraged to enable auto-updates for perfSONAR. * Detailed report from the WG presented at [[https://indico.cern.ch/event/384358/session/7/#20151013?slotId=2][HEPiX/GDB]], we will also present status update again at the November's GDB * ATLAS started processing perfSONAR stream to create a network cost-matrix for use by PANDA with additional use cases in scheduled transfers and dynamic data access * LHCb also started processing perfSONAR stream and correlates it with the network and transfer metrics in DIRAC * Next WG meetings will be on 4th of Nov and 2nd of Dec %ENDSECTION{"22102015"}% ---+++!! 
Report 01/10/2015 %STARTSECTION{"01102015"}% * Meeting held yesterday, https://indico.cern.ch/event/400643/ * Publishing of the perfSONAR results using OSG production service planned for 13th of October (OSG production date) * OSG dashboard (psmad.grid.iu.edu) will go production on the same date, already showing more recent results than maddash.aglt2.org, one issue to be fixed is to correctly show tests done in one-direction only * WLCG-wide meshes campaign finalized with 94 sonars in latency testing, 115 sonars in traceroutes and 104 in throughput. * Sonars that were not included in the WLCG-wide meshes were reported to the mesh leaders and will be followed up (currently they reside in the global meshes, once issues are fixed they'll be moved to WLCG meshes) * Started re-creating project meshes, Belle II and Dual-stack (IPv4/IPv6 bandwidth), plans for other meshes to be discussed * Once infrastructure is in production, we plan to focus on the integration projects, there are ongoing pilot projects for ATLAS and LHCb * There is also interest in perfSONAR in the IT Analytics WG as well as from the network community Asia Tier Centre Forum (https://indico.cern.ch/event/395656/) * perfSONAR 3.5 was released on Monday 28th Sept, 162 sonars were auto-updated, 68 still on 3.4, all sites are encouraged to enable auto-updates for perfSONAR * Next WG meetings will be on 4th of Nov and 2nd of Dec %ENDSECTION{"01102015"}% <verbatim> WLCG perfSONAR service status report on 2015-10-01 04:02:21.078035 ======= Active perfSONAR instances: 250 GOCDB registered total: 193 OIM registered total: 85 perfSONAR-PS versions deployed: 3.4.1 : 7 3.4.2 : 61 3.5.0 : 162 Unknown: 18 Incorrectly configured (failing >4 metrics): 5 </verbatim> ---+++!! Report 17/09/2015 %STARTSECTION{"17092015"}% * [[http://psds.grid.iu.edu][OSG perfSONAR datastore]] entered production on 14th of Sept providing storage and interface for all perfSONAR results. 
* Publishing of the perfSONAR results using pre-production (ITB) services was successfully established, working to resolve issue with some event types not being published, production still pending SLA. * WLCG-wide meshes campaign with latency testing ramped up to 81 sonars caused some instabilities of the sonars with 4GB RAM, therefore we have decreased the number of tests performed and this has improved the situation. * Final version of the perfSONAR 3.5 is planned to be released on 28th of September and will be auto-deployed to all WLCG instances. There were no issues found in the testbed, but we plan to update a couple of production instances in advance to check if everything is fine. * ESNet and OSG have started developments on the perfSONAR configuration interface - open source project motivated by the existing version developed for WLCG. There has also been interest from GEANT and ESNet to collaborate on an open source project based on the existing proximity service. * Follow up meeting was held to discuss findings of the FTS performance study led by Saul Youssef (Boston University), new optimization algorithm was proposed and discussed. * Next WG meeting will be on 30th of Sept (https://indico.cern.ch/event/400643/) %ENDSECTION{"17092015"}% ---+++!! Report 03/09/2015 %STARTSECTION{"03092015"}% * Meeting held yesterday, 2nd of September https://indico.cern.ch/event/393102/ * OSG enabled publishing of the perfSONAR results to the netmon-test-mb.cern.ch from the ITB collector service today. Production setup is still pending SLA. * OSG perfSONAR dashboard (psmad.grid.iu.edu), which is connected to the OSG datastore, is already showing up-to-date content. * MadAlert - new project to analyse meshes and report infrastructure issues vs network problems already reporting from psmad (MadAlert http://maddash.aglt2.org/madalert.html). 
* perfSONAR operations status * Latency mesh: 81 sonars (94% efficiency) * Traceroute mesh: 112 sonars (90% efficiency) * perfSONAR 3.5rc2 was released yesterday and will be auto-deployed to all testbed instances, one issue with PostgreSQL reported from UC instance %ENDSECTION{"03092015"}% ---+++!! Report 20/08/2015 %STARTSECTION{"20082015"}% * Established production and validation ActiveMQ brokers at CERN (netmon-mb.cern.ch and netmon-test-mb.cern.ch), they will be used to broadcast data collected by perfSONARs to experiments. * OSG will test-enable publishing of the perfSONAR results to the netmon-test-mb.cern.ch from the ITB collector service. * Proximity service - developed mapping matrix that experiments could use to map storages to sonars and to process the perfSONAR stream. Currently tested by LHCb, which is developing a perfSONAR to DIRAC connector. * New project to analyse meshes and report infrastructure issues vs network problems is being developed at AGLT2 (MadAlert http://maddash.aglt2.org/madalert.html). Plan is to continue to develop it targeting an eventual way to automate problem finding. * perfSONAR operations status * Progress made on the WLCG-wide meshes, latency mesh now with 70 sonars. * Validation of the perfSONAR 3.5rc1 started, final release expected in October. * ESNet is finalizing the development design document on the perfSONAR configuration interface - open source project motivated by the existing version developed for WLCG. %ENDSECTION{"20082015"}% ---+++!! Report 30/07/2015 %STARTSECTION{"30072015"}% * Successfully tested publishing of the perfSONAR results to the message bus directly from the OSG collector. Discussing possible SLA to run this as a production service in collaboration with OSG. 
* OSG datastore on track to go production at the end of July, this will be a service provided to the WLCG, it will store all the perfSONAR data and provide an API * Started testing proximity service, which helps to map sonars to storages and thus enables integration of the network and transfer metrics. * Review of the experiments use cases was presented/discussed at the last meeting, see slides for details (https://indico.cern.ch/event/393101/) * FTS performance study update - see slides for details (https://indico.cern.ch/event/393101/), observations from the report so far: * Peak transfer rates between Europe and North America are less asymmetric than they were last month (to be followed up) * Almost all incoming to BNL uses TCP=1 (Alejandro confirmed this is how BNL is configured right now, the other FTS instances use auto-tuning) * CMS T1s have better transfer rates compared to ATLAS and LHCb (to be followed up) * CMS uses TCP=1 more often than ATLAS and LHCb for large files * TCP stream=1 transfers time out about 2-3% of the time; however, the timeouts are concentrated at a few sites. * Throughput dependence on TCP streams possibly understood (see http://egg.bu.edu/lhc/fts/docs/2015-05-26-status/results_so_far.pdf) * perfSONAR operations status * Agreed to establish WLCG-wide meshes for top 100 sites (based on the contributed storage and location). This will enable full mesh testing of latencies, traceroutes and throughput (ongoing). * ESNet interested in the perfSONAR configuration interface developed for WLCG, development design document for an open-source project based on it is currently being discussed. %ENDSECTION{"30072015"}% ---+++!! Report 02/07/2015 %STARTSECTION{"02072015"}% * perfSONAR status * Agreed to establish WLCG-wide meshes for top 100 sites (based on the contributed storage and location). 
This will enable full mesh testing of latencies, traceroutes and throughput * Working in collaboration with ESNet to narrow down on an issue affecting latency measurements for long distance testing (US to Europe, Europe to Asia, etc.). A fix has been released and will be auto-deployed to all sites. * perfSONAR 3.5 RC is planned to be released next week. The following sites agreed to participate in the validation testbed: Nebraska, BNL, SWT2, AGLT2, MWT, TAMU, IEPSAS-Kosice * perfSONAR support involved in debugging the network issues at RAL * Successfully tested publishing perfSONAR results directly from the OSG collector (that populates OSG/esmond datastore). * Started testing proximity service, which helps to map sonars to storages and thus enables integration of the network and transfer metrics. * Next meeting will be on 8th of July (https://indico.cern.ch/event/393101/), planning a detailed update on OSG datastore and FTS performance study. %ENDSECTION{"02072015"}% ---+++!! Report 18/06/2015 %STARTSECTION{"18062015"}% * perfSONAR status * Proposed to establish WLCG-wide meshes for top 100 sites (based on their storage contribution and geographical location). This would enable full mesh testing of latencies, traceroutes and bandwidth. * Potential bug was identified and submitted to ESNet affecting latency measurements for long distance testing (US to Europe, Europe to Asia, etc.). * Currently evaluating the possibility to publish perfSONAR results directly from the OSG collector (that populates OSG/esmond datastore). Set of patches to extend the OSG collector were submitted for consideration. * Next meeting will be on 8th of July (https://indico.cern.ch/event/393101/), planning a detailed update on OSG datastore and FTS performance study. %ENDSECTION{"18062015"}% ---+++!! 
Report 04/06/2015 %STARTSECTION{"04062015"}% * perfSONAR status * Detailed report from the WG was presented on Monday at the LHCOPN-LHCONE meeting - LBL Berkeley (US) (https://indico.cern.ch/event/376098/) * Both LHCOPN and LHCONE meshes stable now, consistently delivering metrics. RAL shows signs of continuing network problems in both latency and bandwidth. * Based on the positive experience in ramping up latency mesh, we plan to establish full WLCG meshes for all types of tests and use it as a baseline for other meshes * In collaboration with ESNet, a bug was found in parsing tracepath results, causing significant reduction in efficiency of getting tracepath results. Plan is to revert to traceroutes and only run low frequency tracepath tests until the issue is fixed. * The old mesh configuration interface hosted from grid-deployment.web.cern.ch will be decommissioned on Monday (8th of June). Few sites that still have the old URLs configured have been notified. * Network performance incidents process - new GGUS SU (WLCG Network Throughput) already available, more information at https://twiki.cern.ch/twiki/bin/view/LCG/NetworkTransferMetrics#Network_Performance_Incidents * Test deployed esmond2mq at CERN (developed in collaboration with LHCb), core functionality works fine, waiting for the OSG datastore to enter production in order to run it continuously * Next meeting postponed to 10th of June (https://indico.cern.ch/event/382624/). Plan is to focus it on discussing full WLCG meshes proposal, proximity service and initial report from the FTS performance study. * Very special thanks for major contributions to the WG and farewell to Soichi Hayashi (OSG) and Aaron Brown (Internet2). 
%ENDSECTION{"04062015"}% <verbatim> WLCG perfSONAR service status report on 2015-06-04 04:02:22.794725 ======= Active perfSONAR instances: 240 Registered/monitored perfSONAR instances: 260 perfSONAR-PS versions deployed: 3.4.1 : 17 3.4.2 : 200 Unknown: 21 Incorrectly configured (failing >4 metrics): 23 </verbatim> ---+++!! Report 21/05/2015 %STARTSECTION{"21052015"}% * perfSONAR status * Security: New SSL vulnerability dubbed Logjam: https://weakdh.org/sysadmin.html. WLCG perfSONAR hosts should NOT be vulnerable to this attack. The Apache configuration installed by the Toolkit disables the cipher suites in question by default. * Network performance incidents process - new GGUS SU (WLCG Network Throughput) will become available on 24th of June. * Next meeting 3rd of June (https://indico.cern.ch/event/382624/). Plan is to focus it on latency ramp up and proximity service. %ENDSECTION{"21052015"}% ---+++!! Report 07/05/2015 %STARTSECTION{"07052015"}% * perfSONAR status * Security: NDT 3.7.0.1 was released, fixing potential security issue in NDT. This shouldn't affect WLCG sites that followed our instructions, since they should have NDT/NPAD disabled. We encourage ALL sites to double check this and also to ensure they have auto-updates enabled. The latest perfSONAR Toolkit version that all sites should be running is 3.4.2-12.pSPS (Latest versions of all sub-components are Toolkit-3.4.2 (3.4.2-12.pSPS), BWCTL-1.5.4-1.el6, OWAMP-3.4-10.el6, NDT-3.7.0.1-2.el6, NPAD-1.5.6-3.el6, esmond-1.0-13.el6, Regular Testing Daemon-3.4.2-4.pSPS, iperf3-3.0.11-1.el6). * All meshes migrated from iperf to iperf3 and from traceroute to tracepath. This should improve our bandwidth measurements and enable MTU path discovery. * Very good progress in ramping up latency tests, currently with 34 sonars, we're able to consistently get results for all tested links. 
   * Network performance incidents process put in place as was agreed at the last meeting (https://twiki.cern.ch/twiki/bin/view/LCG/NetworkTransferMetrics#Network_Performance_Incidents)
   * OSG/Datastore validation progressing well, resolved all performance issues and targeting July for production (progress already visible at http://psmad.grid.iu.edu/maddash-webui/).
   * Publishing results to message bus progressing, development of the esmond2mq prototype has been finalized and we plan to enter pilot phase. Initial version of the proximity service (mapping sonars to storages) in testing.
   * Last meeting held yesterday (https://indico.cern.ch/event/382623/) - focused on FTS performance
      * Hassen Riahi (FTS dashboard) reported on FTS performance for WLCG during the first phase of production (3 months)
      * Initial report on the FTS performance study presented by Saul Youssef (Boston University), common study for ATLAS, CMS and LHCb. Early results already provide valuable insights and also show how we could benefit from integrating FTS and perfSONAR. Agreed to follow up on a regular basis at the next meetings.
   * Next meeting 3rd of June (https://indico.cern.ch/event/382624/). Plan is to focus it on latency ramp up and proximity service.
%ENDSECTION{"07052015"}%
<verbatim>
WLCG perfSONAR service status report on 2015-05-07 04:02:24.706444
=======
Active perfSONAR instances: 235
Registered/monitored perfSONAR instances: 259
perfSONAR-PS versions deployed:
  3.4.1 : 23
  3.4.2 : 183
  Unknown: 25
Incorrectly configured (failing >4 metrics): 17
</verbatim>

---+++!! Report 02/04/2015
%STARTSECTION{"02042015"}%
   * perfSONAR status
      * Security: CVE released today for Cassandra, which is used by the perfSONAR measurement archive software, esmond. NO action required to protect perfSONAR Toolkit since vulnerable ports are both disabled and firewalled.
      * perfSONAR 3.4.2 was released and auto-deployed to 163 sonars, there are 42 instances still on 3.4.1.
        We no longer have any active instances on older versions.
      * We encourage ALL sites that are still on 3.4.1 to check the status of their sonars (mainly disk space) and enable auto updates ASAP.
      * Significant improvement observed in consistently getting all the needed metrics after this update. The plan is to resume validation in LHCOPN/LHCONE and continue with a ramp up to full mesh latency tests.
      * Full mesh trace paths now at 80%
   * Network performance incidents follow up (proposal):
      * New mailing list and GGUS SU will be established to follow up, proposed name is wlcg-network-throughput, initial participation will be the same as for the WG mailing list (transfer systems, experiments, perfsonar support, esnet, lhcopn/lhcone).
      * Experiments can report potential network performance incidents/degradations to the GGUS SU/mailing list, WLCG perfSONAR support unit will investigate and confirm whether this is a network-related issue. Once confirmed, it will notify relevant sites and will try to assist in narrowing down the problem to particular link(s). Affected sites will be contacted and should open an incident with their network providers. Tracking of the ongoing incidents will be done on the WG page (https://twiki.cern.ch/twiki/bin/view/LCG/NetworkTransferMetrics#Network_Performance_Incidents).
      * Sites observing a network performance problem should follow their standard procedure, i.e. report to their network team and if necessary escalate to their network provider while informing the wlcg-network-throughput mailing list. If confirmed to be WAN related, WLCG perfSONAR support unit can assist in further debugging of the problem. For non-technical (policy) issues, or if unclear, sites should escalate to the WLCG operations coordination.
   * Next WG meeting: 8th of April (https://indico.cern.ch/event/382622/)
%ENDSECTION{"02042015"}%
<verbatim>
WLCG perfSONAR service status report on 2015-04-02 04:02:22.925555
=======
Active perfSONAR instances: 233
Registered/monitored perfSONAR instances: 259
perfSONAR-PS versions deployed:
  3.4.1 : 42
  3.4.2 : 163
  Unknown: 26
Incorrectly configured (failing >4 metrics): 27
</verbatim>

---+++!! Report 19/03/2015
%STARTSECTION{"19032015"}%
   * WG meeting was held on 18th of March (https://indico.cern.ch/event/379017/)
   * perfSONAR status
      * All sites should be running 3.4.1, final deadline was 16th of February, 5 sites received tickets (3 of them responded)
      * Testing/evaluation of the 3.4.2rc (release candidate) ongoing, additional issues were identified and fixed by the ESNet developers team.
      * Plan is to follow up the testbed for the next couple of days, if there are no issues reported, 3.4.2rc will get a green light (once released, this should propagate to all sites within 24 hours)
   * Datastore (esmond) status
      * Esmond testing is ongoing, gathering 100% of the meshes (some with missing data due to issues in 3.4.1)
   * Network performance incidents follow up
      * Procedure was proposed and is still under discussion within the WG.
   * Integration projects
      * Revised proposal for the experiments interface to perfSONAR, esmond2mq prototype was developed and tested, feedback will be reported to OSG and ESNet.
   * Next meeting: 8th of April (https://indico.cern.ch/event/382622/)
%ENDSECTION{"19032015"}%

---+++!! Report 05/03/2015
%STARTSECTION{"05032015"}%
   * WG meeting was held on 18th of February (https://indico.cern.ch/event/372546/)
   * All sites should be running 3.4.1, final deadline was 16th of February, 5 sites received tickets (2 of them responded)
   * Follow up campaign to bring all perfSONARs to the correct configuration ongoing, started with LHCOPN/LHCONE instances, several issues found and reported
   * Testbed established to evaluate/test 3.4.2rc (release candidate), which was released last week. Several issues fixed that were reported by us during LHCOPN/LHCONE configuration campaign. One new issue found and reported to the development team.
   * New meshes: IPv6/IPv4 dual stack (led by Duncan Rand), Latin America (led by Renato Santana, Pedro Diniz)
   * Testing and evaluation of the pilot instances for esmond/maddash ongoing (psds.grid.iu.edu, psmad.grid.iu.edu)
   * Production instance of the infrastructure monitoring (psomd.grid.iu.edu) updated with new tests that check completeness/freshness of data in the local measurement archives (high level functional test)
   * Integration of the network and transfer metrics: two pilot projects proposed in the last WG meeting
      * LHCb pilot project to provide experiment agnostic prototype to access central datastore (esmond) and publish available metrics to messaging
      * Extending ATLAS FTS performance study to CMS and LHCb
   * Networking degradation between SARA and AGLT2 under investigation - to be followed up at the next WG meeting
      * Original issue noted when many large file transfers SARA->AGLT2 failed. Cause was FTS timeout since files 2-6GB were moving at 10-100s of Kbytes/sec. Problem reported to this working group.
      * perfSONAR regular tests between T2 and T1 have been paused so manual perfSONAR tests were done showing poor performance (200-500 Kbytes/sec).
      * Saul Youssef's examination of FTS logs indicated possible problematic trans-Atlantic link was involved.
        Additional reports of poor performance between CERN EOS and MWT2 used same link.
      * Recommended procedure (by LHCONE/LHCOPN working group) is to have either end-site contact their R&E network provider to open a ticket. AGLT2 contacted Internet2 and opened a ticket (ISSUE=2688 PROJ=144)
      * Temporary debug mesh setup to test paths between SARA, CERN and AGLT2, MWT2. See https://maddash.aglt2.org/maddash-webui/index.cgi?dashboard=Debug%20Mesh%20(temp)
      * BW graph SARA-AGLT2 at https://maddash.aglt2.org/serviceTest/graphWidget.cgi?url=http://ps.lhcopn-ps.sara.nl/esmond/perfsonar/archive/&source=ps.lhcopn-ps.sara.nl&dest=psmsu02.aglt2.org#
      * Internet2 has opened ticket with GEANT (TT#2015022734000453) and the issue is actively being pursued.
      * Work underway getting suitable intermediate perfSONAR instances onto LHCONE to help localize the issue.
   * Next WG meeting will be on 18th of March (https://indico.cern.ch/event/379017/)
%ENDSECTION{"05032015"}%
<verbatim>
WLCG perfSONAR service status report on 2015-03-05 04:02:24.416548
=======
Active perfSONAR instances: 225
Registered/monitored perfSONAR instances: 249
perfSONAR-PS versions deployed:
  3.2.2 : 1
  3.3.2 : 1
  3.4.1 : 207
  3.4.2 : 13
  Unknown: 27
Incorrectly configured (failing >4 metrics): 31
</verbatim>

---+++!! Report 05/02/2015
%STARTSECTION{"05022015"}%
   * WG still waiting on input from ATLAS on use-cases/requirements for network metrics
      * Meeting to discuss the use cases will be held on 18th of February (https://indico.cern.ch/event/372546/)
   * 2nd broadcast was sent to remind sites to update to 3.4.1 - final deadline is 16th of February - sites that won't update by this date will receive tickets
   * Production version of perfSONAR infrastructure monitoring available at http://pfomd.grid.iu.edu/ (you need to have your certificate loaded in the browser to access)
   * Pilot versions of maddash and datastore (http://pfds.grid.iu.edu) available
   * perfSONAR operations meeting was held last week - minutes available at https://indico.cern.ch/event/369420/
      * Agreed to start full mesh latency testing starting with top-k sites and gradually moving to all sites
      * Follow up campaign to bring all perfSONARs to the correct configuration
   * perfSONAR Workshop was held in Columbus Ohio January 21-22, 2015, press release available at http://www.internet2.edu/news/detail/7727/
      * Shawn presented input from WLCG/OSG (https://meetings.internet2.edu/2015-ftw-perfsonardeployment-best-practices/program)
   * LHCOPN/LHCONE meeting to be held in Cambridge, 9-10th of February (https://indico.cern.ch/event/342059/)
%ENDSECTION{"05022015"}%
<verbatim>
WLCG perfSONAR service status report on 2015-02-05 04:02:21.711952
=======
Active perfSONAR instances: 220
Registered/monitored perfSONAR instances: 241
perfSONAR-PS versions deployed:
  3.2.2 : 1
  3.3.2 : 4
  3.4.1 : 185
  Unknown: 51
Incorrectly configured (failing >4 metrics): 51
</verbatim>

---+++!! Report 04/12/2014
%STARTSECTION{"04122014"}%
   * Metrics area meeting held last week, minutes available at https://indico.cern.ch/event/354593/
   * WG waiting on input from transfer systems and experiments on use-cases/requirements for network metrics
      * Strawman planned for early next year
   * Status of perfSONAR presented also at ATLAS jamboree yesterday
   * Update campaign ongoing, hard deadline for all sites to update is 8th January 2015
   * perfSONAR data store configured in ITB; stress testing ongoing
%ENDSECTION{"04122014"}%
<verbatim>
WLCG perfSONAR service status report on 2014-12-04 04:02:19.233227
=======
perfSONAR instances monitored: 214
perfSONAR-PS versions deployed:
  3.3.2 : 26
  3.4.1 : 118
  Unknown: 66
GOCDB registered total: 190
OIM registered total: 80
Unreachable instances (not monitored): 42
Incorrectly configured (failing >4 metrics): 69
</verbatim>

---+++!! Report 20/11/2014
%STARTSECTION{"20112014"}%
   * 107 instances updated to 3.4.1 following the WLCG and EGI broadcasts sent with the new install/update instructions
   * Second broadcast to be sent next week, deadline to update will be 8th January 2015
   * Planning to start validation of the existing 3.4.1 sonars next week
   * perfSONAR data store configured in ITB; stress testing to start next week
   * Metrics area meeting to be held next week (http://doodle.com/ezrfh8eybu7iybxyqzrcbze9)
%ENDSECTION{"20112014"}%
<verbatim>
WLCG perfSONAR service status report on 2014-11-20 04:02:13.263575
=======
perfSONAR instances monitored: 214
perfSONAR-PS versions deployed:
  3.3.2 : 29
  3.4.1 : 107
  Unknown: 74
GOCDB registered total: 190
OIM registered total: 70
Unreachable instances (not monitored): 45
Incorrectly configured (failing >4 metrics): 70
</verbatim>

---+++!! Report 06/11/2014
%STARTSECTION{"06112014"}%
   * WLCG OPS broadcast sent with new install/update instructions for perfSONAR 3.4+, requesting ALL sites to update before 8th of December
   * New perfSONAR configuration system in production, announced as part of the WLCG OPS broadcast
   * Two major milestones close to being finalized ([[https://its.cern.ch/jira/browse/METRICS-5][T2.1 perfSONAR Commissioning/Operations]], [[https://its.cern.ch/jira/browse/METRICS-7][T2.3 perfSONAR Configuration]])
   * Testing and validation of the [[https://its.cern.ch/jira/browse/METRICS-6][perfSONAR data store]] ongoing - production date to be announced
   * Next perfSONAR operations meeting to be held next week (http://doodle.com/qydib32fkv48er2r)
   * Metrics area meeting to be announced
%ENDSECTION{"06112014"}%
<verbatim>
WLCG perfSONAR service status report on 2014-11-06 04:02:17.829838
=======
perfSONAR instances monitored: 214
perfSONAR-PS versions deployed:
  3.2.2 : 1
  3.3.1 : 2
  3.3.2 : 47
  3.4.1 : 82
  Unknown: 78
GOCDB registered total: 188
OIM registered total: 55
Unreachable instances (not monitored): 47
Incorrectly configured (failing >4 metrics): 78
</verbatim>

---+++!! Report 16/10/2014
%STARTSECTION{"16102014"}%
   * Update on WG presented at GDB last week (Details at [[https://indico.cern.ch/event/272778/][agenda]])
   * perfSONAR 3.4 released 7th of October, we recommend ALL sites to wait with upgrade until the re-install instructions are broadcast via WLCG and EGI
   * Performed internal security audit in collaboration with perfSONAR developers - summary to be provided in the re-install instructions
   * Metrics area meeting was canceled, doodle for the new one will be sent shortly
   * POODLE: SSLv3.0 vulnerability (CVE-2014-3566) announced yesterday - https://access.redhat.com/articles/1232123 - affecting perfSONARs as well.
     Patches from distributions not available yet (16th Oct) - perfSONAR team provided their own fixes yesterday (perl-perfSONAR_PS-Toolkit-3.4-29.pSPS and perl-perfSONAR_PS-Toolkit-SystemEnvironment-3.4-29.pSPS). We recommend all sites running 3.3 to temporarily disable SSL3. We recommend ALL sites to wait with upgrade to 3.4 until the re-install instructions are broadcast via WLCG and EGI.
   * perfSONAR operations meeting was held on Friday, Oct 3 at 3PM, minutes at https://indico.cern.ch/event/342995/
      * Highlights: Agreed to introduce several major changes in operations (introduce GGUS SU, security mailing list, setup infrastructure monitoring, introduce automated mesh configurations)
   * Next operations meeting will be held next week, please vote at http://doodle.com/qydib32fkv48er2r
%ENDSECTION{"16102014"}%
<verbatim>
WLCG perfSONAR service status report on 2014-10-16 04:03:54.594325
=======
perfSONAR instances monitored: 214
perfSONAR-PS versions deployed:
  3.2.2 : 1
  3.3.1 : 2
  3.3.2 : 66
  Unknown: 141
GOCDB registered total: 172
OIM registered total: 55
Unreachable instances (not monitored): 79
Incorrectly configured (failing >4 metrics): 109
</verbatim>

---+++!! Report 02/10/2014
%STARTSECTION{"02102014"}%
   * Details on the Shellshock vulnerabilities and their impact on perfSONAR available at https://twiki.cern.ch/twiki/bin/view/LCG/ShellShockperfSONAR
   * We recommend ALL sites that didn't patch bash before Friday Sep 26 to terminate their instances and wait until perfSONAR 3.4 is released
   * perfSONAR 3.4 to be released on Mon Oct 6, WLCG and EGI broadcasts will be sent with the installation instructions
   * perfSONAR operations meeting this Friday (Oct 3 at 3PM), agenda at https://indico.cern.ch/event/342995/
%ENDSECTION{"02102014"}%
<verbatim>
WLCG perfSONAR service status report on 2014-10-02 04:02:15.996763
=======
perfSONAR instances monitored: 214
perfSONAR-PS versions deployed:
  3.3.1 : 2
  3.3.2 : 96
  Unknown: 112
GOCDB registered total: 173
OIM registered total: 55
Unreachable instances (not monitored): 90
Incorrectly configured (failing >4 metrics): 111
</verbatim>

---+++!! Report 18/09/2014
%STARTSECTION{"18092014"}%
   * Kick-off meeting minutes and slides available at https://indico.cern.ch/event/336520/
      * The meeting had very good participation including experiments, ESNet Science Engagement Group (perfSONAR development team), Panda, PhEDEx, FTS, FAX as well as the majority of the perfSONAR regional contacts. An initial overview of the current status in the network and transfer metrics was presented and a list of topics and tasks to work on in the short-term was proposed. Very good feedback was received and we have agreed on the topics to discuss at the follow up meetings.
   * Please check [[NetworkTransferMetrics][Twiki]] for updated task table
   * 5 sites received tickets on running an outdated version of perfSONAR
   * Follow up meetings:
      * Metrics area meeting focusing on use cases and review of the transfer systems (T1.1, T1.2)
         * 13-17th October http://doodle.com/xvwdvysdrdzap8wh
      * Meetings focusing on perfSONAR operations (T2.1):
         * 29 Sept - 3 October http://doodle.com/e6epkkqmdx6ka3r7
         * 20 Oct - 24 October http://doodle.com/qydib32fkv48er2r
%ENDSECTION{"18092014"}%
<verbatim>
WLCG perfSONAR service status report on 2014-09-18 09:56:52.693187
=======
perfSONAR instances monitored: 214
perfSONAR-PS versions deployed:
  3.2.2 : 4
  3.3.1 : 3
  3.3.2 : 175
  Unknown: 28
GOCDB registered total: 172
OIM registered total: 53
Unreachable instances (not monitored): 7
Incorrectly configured (failing >4 metrics): 26
</verbatim>

---+++!! Report 04/09/2014
%STARTSECTION{"04092014"}%
   * [[https://indico.cern.ch/event/336520/][Kick-off meeting]] will take place on Mon 8th of Sept at 3PM CEST
   * Early version of the WLCG perfSONAR configuration interface will be deployed to production next week.
   * Pythia Network Diagnosis Infrastructure (PuNDIT) project will be funded by NSF and starts at the beginning of September (led by Shawn). The project will use perfSONAR-PS data to identify and localize network problems using the Pythia algorithms. PuNDIT will collaborate with OSG and WLCG over its two year duration.
   * Sites with incorrect versions of perfSONAR will receive tickets at the beginning of next week (9 sites in total)
<verbatim>
WLCG perfSONAR service status report on 2014-09-04 10:03:22.949799
=======
perfSONAR instances monitored: 214
perfSONAR-PS versions deployed:
  3.2.2 : 6
  3.3.1 : 3
  3.3.2 : 173
  Unknown: 28
GOCDB registered total: 170
OIM registered total: 53
Unreachable instances (not monitored): 8
Incorrectly configured (failing >4 metrics): 28
</verbatim>
%ENDSECTION{"04092014"}%

---+++!! Report 21/08/2014
%STARTSECTION{"21082014"}%
   * Updated WG page with list of members, task tracking, coming events and reports (https://twiki.cern.ch/twiki/bin/view/LCG/NetworkTransferMetrics)
   * [[https://indico.cern.ch/event/336520/][Kick-off meeting]] will take place on Mon 8th of Sept at 3PM CEST
   * On July 21st perfSONAR Toolkit 3.4rc2 became available for testing. Version 3.4 is a major milestone for the WG as it enables access via REST API and introduces several important performance improvements, therefore a deployment campaign will follow once we get a stable release
   * Work is progressing on the WLCG perfSONAR configuration interface (finalized design, work is ongoing on a prototype implementation)
   * OSG perfSONAR datastore plan has been agreed and testing of the store based on [[http://software.es.net/esmond/intro.html][esmond]] is ongoing
<verbatim>
WLCG perfSONAR service level report on 2014-08-20 16:59:32.876708
=======
perfSONAR instances monitored: 214
perfSONAR-PS versions deployed:
  3.2.2 : 6
  3.3.1 : 3
  3.3.2 : 174
  Unknown: 27
GOCDB registered total: 170
OIM registered total: 53
Unreachable instances (not monitored): 8
Incorrectly configured (failing >4 metrics): 30
</verbatim>
%ENDSECTION{"21082014"}%

-- Main.MarianBabik - 19 May 2014
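The status report blocks above all follow the same =label: count= layout, so they can be turned into structured data, for example to track how many instances run each perfSONAR version over time. The following is a minimal sketch, assuming one field per line as in the blocks above; the function name =parse_status_report= and the layout of the returned dictionary are hypothetical and not part of any WG tooling.

```python
import re

# A version row is either a dotted version number or the literal "Unknown".
VERSION_KEY = re.compile(r"^(\d+(\.\d+)+|Unknown)$")


def parse_status_report(text):
    """Parse one status report block into date, version counts and other counts."""
    report = {"date": None, "versions": {}, "counts": {}}
    match = re.search(r"report on (\d{4}-\d{2}-\d{2})", text)
    if match:
        report["date"] = match.group(1)
    for line in text.splitlines():
        # Split on the last colon so labels such as "(failing >4 metrics)" stay whole.
        key, sep, value = line.rpartition(":")
        key, value = key.strip(), value.strip()
        # Skip the title line, the separator and labels without a numeric value.
        if not sep or not value.isdigit():
            continue
        target = "versions" if VERSION_KEY.match(key) else "counts"
        report[target][key] = int(value)
    return report


# Sample copied from the 04/06/2015 report on this page.
SAMPLE = """WLCG perfSONAR service status report on 2015-06-04 04:02:22.794725
=======
Active perfSONAR instances: 240
Registered/monitored perfSONAR instances: 260
perfSONAR-PS versions deployed:
  3.4.1 : 17
  3.4.2 : 200
  Unknown: 21
Incorrectly configured (failing >4 metrics): 23
"""

if __name__ == "__main__":
    print(parse_status_report(SAMPLE))
```

Splitting on the last colon keeps the timestamp in the title line from being mistaken for a count, since its final segment contains a decimal point and is therefore skipped.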
Topic revision: r162 - 2019-09-27 - MarianBabik
Copyright © 2008-2021 by the contributing authors. All material on this collaboration platform is the property of the contributing authors.