Summary of the HTTP Deployment TF's Activities
The HTTP Deployment TF's activities fell into two areas, as described below.
Policy and advice on HTTP for WLCG
The TF created documents and tools to guide the adoption of HTTP within WLCG.
Operational support for HTTP deployment
The TF created a Nagios probe for use with the SAM/Nagios framework and, with the help of the monitoring team, began to monitor the endpoints listed by the experiments.
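The report does not describe how the probe works. Purely as an illustration of the Nagios plugin conventions such a probe follows (a single status line on stdout and an exit code of 0 for OK, 1 for WARNING, 2 for CRITICAL and 3 for UNKNOWN), here is a minimal, hypothetical sketch. It sends an HTTP OPTIONS request to an endpoint and checks whether the server advertises WebDAV through a `DAV` response header. The function names, the OPTIONS-based test and the WARNING/CRITICAL rules are all assumptions. This is not the TF's probe.

```python
"""Hypothetical sketch of a Nagios-style HTTP/WebDAV endpoint check."""
import urllib.error
import urllib.request
from typing import Mapping, Optional, Tuple

# Standard Nagios plugin exit codes.
NAGIOS_OK, NAGIOS_WARNING, NAGIOS_CRITICAL, NAGIOS_UNKNOWN = 0, 1, 2, 3


def evaluate(status: Optional[int], headers: Mapping[str, str]) -> Tuple[int, str]:
    """Map the outcome of an OPTIONS request to a Nagios state and message.

    The rules below are illustrative assumptions, not WLCG policy.
    """
    if status is None:
        return NAGIOS_CRITICAL, "CRITICAL - endpoint unreachable"
    # HTTP header names are case-insensitive, so compare lower-cased keys.
    has_dav = any(name.lower() == "dav" for name in headers.keys())
    if 200 <= status < 300 and has_dav:
        return NAGIOS_OK, f"OK - HTTP {status}, WebDAV advertised"
    if 200 <= status < 300:
        return NAGIOS_WARNING, f"WARNING - HTTP {status}, no DAV header"
    return NAGIOS_CRITICAL, f"CRITICAL - HTTP {status}"


def probe(url: str, timeout: float = 30.0) -> Tuple[int, str]:
    """Send an OPTIONS request to `url` and classify the result."""
    try:
        request = urllib.request.Request(url, method="OPTIONS")
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return evaluate(response.status, response.headers)
    except ValueError:
        # Malformed URL, e.g. missing scheme: the check itself cannot run.
        return NAGIOS_UNKNOWN, f"UNKNOWN - invalid URL {url!r}"
    except urllib.error.HTTPError as err:
        # 4xx/5xx responses are raised as exceptions but carry a status code.
        return evaluate(err.code, err.headers)
    except OSError:
        # Covers URLError (DNS, refused connection, TLS) and socket timeouts.
        return evaluate(None, {})


def main(argv: list) -> int:
    """Print one Nagios status line and return the exit code for sys.exit()."""
    if len(argv) != 2:
        print("UNKNOWN - usage: check_http_dav.py URL")
        return NAGIOS_UNKNOWN
    code, message = probe(argv[1])
    print(message)
    return code
```

To run the sketch as a plugin, finish the script with `sys.exit(main(sys.argv))` and point it at an endpoint URL (for example a placeholder such as `https://se.example.org/atlas/`). `HTTPError` is caught before `OSError` because it is a subclass of `URLError`, which is itself an `OSError`. Catching them in that order keeps the server's real status code in the message.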
At the beginning of the operational push, the monitoring showed 36 problematic endpoints for Atlas and a fraction of the LHCb ones (the exact number was not recorded). The sites were then ticketed, and 79 GGUS tickets were issued and followed up. At the time of writing, the monitoring shows 7 problematic endpoints for Atlas and 2 for LHCb. All sites have been notified about the problems detected.
Once a number of configuration issues in the sites, in the experiment configuration databases and in the monitoring itself had been cleaned up, most of the effort went into tackling endpoint instability.
Several DPM configuration issues were uncovered, and advice on how to support HTTP properly was summarised and circulated.
During the life of the TF, the monitoring was based purely on ETF. It was decided that once this system became the source of data for production monitoring via SAM3, responsibility for "HTTP operations" would pass to the experiments. This has now happened.
ETF Monitoring:
Atlas -
https://etf-atlas-prod.cern.ch/etf/check_mk/index.py?start_url=%2Fetf%2Fcheck_mk%2Fview.py%3Fview_name%3Dservicedesc%26service%3Dwebdav.HTTP-All-%2Fatlas%2FRole%253Dproduction
LHCb -
https://etf-lhcb-prod.cern.ch/etf/check_mk/index.py?start_url=%2Fetf%2Fcheck_mk%2Fview.py%3Fview_name%3Dservicedesc%26service%3Dwebdav.HTTP-All-%2Flhcb%2FRole%253Dproduction
Site View:
Atlas -
http://wlcg-mon.cern.ch/dashboard/request.py/siteviewhistory?columnid=1337&debug=false
LHCb -
http://wlcg-mon.cern.ch/dashboard/request.py/siteviewhistory?columnid=1318&debug=false
The future
To benefit fully from the monitoring now in place, the experiments will have to enable VO feeds that explicitly identify HTTP as a separate service and list the relevant endpoints. This will let them integrate HTTP endpoints into their standard operations and so maintain or improve the stability of the infrastructure.
--
OliverKeeble - 2016-04-29