SSB abstract for EGI CF 2012

Title

Site Status Board: a flexible monitoring system developed in close collaboration with user communities.

Overview

Developing highly customizable and flexible solutions requires close collaboration between developers and the user community. Gathering user requirements and understanding user needs helps developers to provide common but highly customizable solutions that fit the needs of different groups of users. One example of such successful collaborative development is the Dashboard Site Status Board (SSB) framework, which allows Virtual Organizations (VOs) to monitor their computing activities at distributed sites and to evaluate site performance from the VO perspective.

Description

Collaborative development proved to be key to the success of the SSB, which is heavily used by the LHC VOs for computing shifts and site commissioning activities. The selection, significance and combination of monitoring metrics fall clearly in the domain of the VO administrators. Therefore, VO administrators and computing teams define monitoring metrics and custom views of the monitoring data, in addition to developing sensors and data publishers. The SSB team is responsible for developing and supporting the SSB framework and the SSB services which store, aggregate and visualize monitoring data. The collaboration extends beyond the customization of metrics and views to the development of new functionality and visualizations. SSB developers and VO administrators cooperate closely to ensure that requirements are met and that, wherever possible, new functionality is pushed upstream to benefit all users and VOs.

This contribution covers the evolution of the SSB over recent years to satisfy diverse use cases through this collaborative development process.

Impact

The Dashboard SSB is used intensively by ATLAS and CMS for distributed computing shifts, for estimating data processing and data transfer efficiencies at a particular site, and for implementing automatic exclusion of sites from computing activities in the event of problems. ATLAS administrators have defined 30 views in which they monitor ~200 metrics, and CMS administrators have defined 8 views with ~100 metrics. In recent years, 93 million records were collected for CMS and 56 million for ATLAS. The ATLAS and CMS SSB web services have 100-250 unique users per service per week.

Conclusions

Close collaboration between SSB developers and VO administrators has led to the creation of a successful, flexible, highly customizable and easy-to-configure solution for monitoring computing activities at distributed sites. The SSB framework was designed in a generic way, which allows it to be easily adapted to the needs of different Virtual Organizations.

Track classification

Users and communities

Comments

(None).

-- IvanDzhunov - 17-Nov-2011

Topic revision: r1 - 2011-11-17 - IvanDzhunov
 