Summary of the HTTP Deployment TF's Activities
The HTTP Deployment TF's activities fell into two areas:
Policy and advice on HTTP for WLCG
Here the TF created documents and tools to guide the adoption of HTTP within WLCG.
Operational support for HTTP deployment
The TF created a Nagios probe for use with the SAM/Nagios framework and, with the help of the monitoring team, began to monitor endpoints based on lists from the experiments.
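As an illustration of what such a probe involves, the following is a minimal sketch of a Nagios-style HTTP endpoint check. It is not the TF's actual probe: the names `classify`, `probe` and `main`, the use of a HEAD request, and the policy of treating 401/403 as a warning are all assumptions made for the example. Only the exit codes (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN) follow the standard Nagios plugin convention.

```python
import urllib.error
import urllib.request

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
STATE_NAMES = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}


def classify(status):
    """Map an HTTP status code to a Nagios state (illustrative policy only)."""
    if 200 <= status < 400:
        return OK
    if status in (401, 403):
        # Assumption: the endpoint is alive but rejected our credentials.
        return WARNING
    return CRITICAL


def probe(url, timeout=10.0):
    """Send a HEAD request to url and return (state, message)."""
    try:
        request = urllib.request.Request(url, method="HEAD")
        with urllib.request.urlopen(request, timeout=timeout) as response:
            status = response.status
    except urllib.error.HTTPError as err:
        # The server answered, but with an error status (4xx/5xx).
        status = err.code
    except ValueError as err:
        # Malformed URL: the check itself could not be run.
        return UNKNOWN, f"bad URL {url!r}: {err}"
    except OSError as err:
        # Covers URLError, timeouts and connection failures.
        return CRITICAL, f"no response from {url}: {err}"
    return classify(status), f"HTTP {status} from {url}"


def main(argv):
    """Print one Nagios status line and return the matching exit code."""
    if len(argv) != 2:
        print("UNKNOWN - usage: check_http_endpoint URL")
        return UNKNOWN
    state, message = probe(argv[1])
    print(f"{STATE_NAMES[state]} - {message}")
    return state
```

To run it as a plugin, one would call `sys.exit(main(sys.argv))` under the usual `__main__` guard, so that the framework can read the state from the process exit code and the first line of output.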
At the beginning of the operational push, the monitoring showed 36 problematic endpoints for ATLAS and a fraction of the LHCb ones (the exact number was not recorded). After a period of ticketing the sites, during which 79 GGUS tickets were issued and followed up, the situation (at time of writing) shows 7 problematic endpoints for ATLAS and 2 for LHCb. All sites have been notified about the problems detected.
After cleaning up a number of configuration issues related to sites, to the experiment configuration databases, and to the monitoring, most of the effort was dedicated to tackling instability in the endpoints.
A number of issues in DPM configuration were uncovered and advice on how to properly support HTTP was circulated.
While TF monitoring was based purely on ETF, it was decided that once this system became the source of data for production monitoring via SAM3, responsibility would pass to the experiments. This has now happened.
The future
To fully profit from the monitoring now in place, the experiments will have to enable vo-feeds which explicitly identify HTTP as a separate service and list the relevant endpoints.
--
OliverKeeble - 2016-04-29