Difference: HTTPTFSummary (1 vs. 5)

Revision 5 - 2016-08-04 - OliverKeeble

Line: 1 to 1
 
META TOPICPARENT name="HTTPDeployment"

Summary of the HTTP Deployment TF's Activities

Line: 24 to 24
 After cleaning up a number of configuration issues related to sites, to the experiment configuration databases, and to the monitoring, most of the effort was dedicated to tackling instability in the endpoints. A number of issues in DPM configuration were uncovered and advice on how to properly support HTTP was summarised and circulated.
Added:
>
>

Monitoring

 During the life of the TF, the monitoring was based purely on ETF. It was decided that once this system became the source of data for production monitoring via SAM3, responsibility for "HTTP operations" would pass to the experiments. This has now happened.
Changed:
<
<
ETF Monitoring;
>
>

ETF Monitoring

  Atlas - https://etf-atlas-prod.cern.ch/etf/check_mk/index.py?start_url=%2Fetf%2Fcheck_mk%2Fview.py%3Fview_name%3Dservicedesc%26service%3Dwebdav.HTTP-All-%2Fatlas%2FRole%253Dproduction

LHCb - https://etf-lhcb-prod.cern.ch/etf/check_mk/index.py?start_url=%2Fetf%2Fcheck_mk%2Fview.py%3Fview_name%3Dservicedesc%26service%3Dwebdav.HTTP-All-%2Flhcb%2FRole%253Dproduction

Changed:
<
<
Site View;
>
>

Site View

  Atlas - http://wlcg-mon.cern.ch/dashboard/request.py/siteviewhistory?columnid=1337&debug=false

Revision 4 - 2016-05-11 - OliverKeeble

Line: 1 to 1
 
META TOPICPARENT name="HTTPDeployment"

Summary of the HTTP Deployment TF's Activities

Line: 22 to 22
 about problems detected.

After cleaning up a number of configuration issues related to sites, to the experiment configuration databases, and to the monitoring, most of the effort was dedicated to tackling instability in the endpoints.

Changed:
<
<
A number of issues in DPM configuration were uncovered and advice on how to properly support HTTP was circulated.
>
>
A number of issues in DPM configuration were uncovered and advice on how to properly support HTTP was summarised and circulated.
 
Changed:
<
<
While TF monitoring was based purely on ETF, it was decided that one this system became the source of data for production monitoring via SAM3, responsibility would pass to the experiments. This has now happened.
>
>
During the life of the TF, the monitoring was based purely on ETF. It was decided that once this system became the source of data for production monitoring via SAM3, responsibility for "HTTP operations" would pass to the experiments. This has now happened.
 
Changed:
<
<
ETF Monitoring
>
>
ETF Monitoring;
  Atlas - https://etf-atlas-prod.cern.ch/etf/check_mk/index.py?start_url=%2Fetf%2Fcheck_mk%2Fview.py%3Fview_name%3Dservicedesc%26service%3Dwebdav.HTTP-All-%2Fatlas%2FRole%253Dproduction

Revision 3 - 2016-05-11 - OliverKeeble

Line: 1 to 1
 
META TOPICPARENT name="HTTPDeployment"

Summary of the HTTP Deployment TF's Activities

Changed:
<
<
The HTTP Deployment TF's activities fell into two areas;
>
>
The HTTP Deployment TF's activities fell into two areas, as described below.
 

Policy and advice on HTTP for WLCG

Changed:
<
<
Here the TF created documents and tools to guide adoption of HTTP within WLCG
>
>
The TF created documents and tools to guide adoption of HTTP within WLCG
 
Changed:
<
<
  • For sites - HTTPTFSAMProbe
>
>
 

Operational support for HTTP deployment

The TF created a Nagios probe for use with the SAM/Nagios framework and, with the help of the monitoring team, began to monitor endpoints based on lists from the experiments.

At the beginning of the operational push, the monitoring showed 36 problematic endpoints for Atlas and a fraction of the LHCb ones (the exact number was not recorded). After a period

Changed:
<
<
of ticketing the sites, during which 79 GGUS tickets were issued and followed up, the situation (at time of writing) shows 7 problematic endpoints for Atlas, and 2 for LHCb. All sites have been notified
>
>
of ticketing the sites, during which 79 GGUS tickets were issued and followed up, the situation (at time of writing) shows 7 problematic endpoints for Atlas, and 2 for LHCb. All sites have been notified
 about problems detected.

After cleaning up a number of configuration issues related to sites, to the experiment configuration databases, and to the monitoring, most of the effort was dedicated to tackling instability in the endpoints.

Line: 26 to 26
  While TF monitoring was based purely on ETF, it was decided that one this system became the source of data for production monitoring via SAM3, responsibility would pass to the experiments. This has now happened.
Added:
>
>
ETF Monitoring

Atlas - https://etf-atlas-prod.cern.ch/etf/check_mk/index.py?start_url=%2Fetf%2Fcheck_mk%2Fview.py%3Fview_name%3Dservicedesc%26service%3Dwebdav.HTTP-All-%2Fatlas%2FRole%253Dproduction

LHCb - https://etf-lhcb-prod.cern.ch/etf/check_mk/index.py?start_url=%2Fetf%2Fcheck_mk%2Fview.py%3Fview_name%3Dservicedesc%26service%3Dwebdav.HTTP-All-%2Flhcb%2FRole%253Dproduction

Site View;

Atlas - http://wlcg-mon.cern.ch/dashboard/request.py/siteviewhistory?columnid=1337&debug=false

LHCb - http://wlcg-mon.cern.ch/dashboard/request.py/siteviewhistory?columnid=1318&debug=false

 

The future

Changed:
<
<
To fully profit from the monitoring now in place, the experiments will have to enable vo-feeds which explicitly identify HTTP as a separate service and list the relevant endpoints.
>
>
To fully profit from the monitoring now in place, the experiments will have to enable vo-feeds which explicitly identify HTTP as a separate service and list the relevant endpoints. This will enable them to integrate HTTP endpoints into their standard operations and thus maintain or improve the stability of the infrastructure.
 

Revision 2 - 2016-05-11 - OliverKeeble

Line: 1 to 1
 
META TOPICPARENT name="HTTPDeployment"
Deleted:
<
<
 

Summary of the HTTP Deployment TF's Activities

Added:
>
>
The HTTP Deployment TF's activities fell into two areas;

Policy and advice on HTTP for WLCG

Here the TF created documents and tools to guide adoption of HTTP within WLCG

Operational support for HTTP deployment

The TF created a Nagios probe for use with the SAM/Nagios framework and, with the help of the monitoring team, began to monitor endpoints based on lists from the experiments.

At the beginning of the operational push, the monitoring showed 36 problematic endpoints for Atlas and a fraction of the LHCb ones (the exact number was not recorded). After a period of ticketing the sites, during which 79 GGUS tickets were issued and followed up, the situation (at time of writing) shows 7 problematic endpoints for Atlas, and 2 for LHCb. All sites have been notified about problems detected.

After cleaning up a number of configuration issues related to sites, to the experiment configuration databases, and to the monitoring, most of the effort was dedicated to tackling instability in the endpoints. A number of issues in DPM configuration were uncovered and advice on how to properly support HTTP was circulated.

While TF monitoring was based purely on ETF, it was decided that one this system became the source of data for production monitoring via SAM3, responsibility would pass to the experiments. This has now happened.

The future

To fully profit from the monitoring now in place, the experiments will have to enable vo-feeds which explicitly identify HTTP as a separate service and list the relevant endpoints.

 
Added:
>
>

 

-- OliverKeeble - 2016-04-29

Revision 1 - 2016-04-29 - OliverKeeble

Line: 1 to 1
Added:
>
>
META TOPICPARENT name="HTTPDeployment"

Summary of the HTTP Deployment TF's Activities

-- OliverKeeble - 2016-04-29

 