Grid Service Monitoring Tool Summary

This document collects links and summary information for a wide range of monitoring tools.

Tool Categories

We split the tools into three categories according to their main function, although the categories overlap considerably.

Tools

Nagios

  • What they say : "Nagios is a host and service monitor designed to inform you of network problems before your clients, end-users or managers do. It has been designed to run under the Linux operating system, but works fine under most *NIX variants as well. The monitoring daemon runs intermittent checks on hosts and services you specify using external "plugins" which return status information to Nagios. When problems are encountered, the daemon can send notifications out to administrative contacts in a variety of different ways (email, instant message, SMS, etc.). Current status information, historical logs, and reports can all be accessed via a web browser." [ From http://www.nagios.org/about/ ]
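The external "plugins" mentioned in the quote are ordinary programs: each prints a one-line status message and reports its result through the exit code (0 = OK, 1 = WARNING, 2 = CRITICAL, 3 = UNKNOWN). The following is a minimal sketch of such a check, not an official Nagios plugin. The function names, the disk-usage metric and the 80%/90% thresholds are assumptions chosen for illustration.

```python
import shutil

# Exit codes that Nagios interprets as plugin status.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def classify_disk(used_pct, warn=80.0, crit=90.0):
    """Map a disk usage percentage to an (exit_code, status_line) pair.

    The thresholds are illustrative defaults, not Nagios settings.
    """
    if used_pct >= crit:
        return CRITICAL, f"DISK CRITICAL - {used_pct:.1f}% used"
    if used_pct >= warn:
        return WARNING, f"DISK WARNING - {used_pct:.1f}% used"
    return OK, f"DISK OK - {used_pct:.1f}% used"


def check_disk(path="/"):
    """Measure disk usage for `path` and classify it."""
    try:
        usage = shutil.disk_usage(path)
    except OSError as exc:
        # A check that cannot run should report UNKNOWN rather than OK.
        return UNKNOWN, f"DISK UNKNOWN - {exc}"
    return classify_disk(100.0 * usage.used / usage.total)


def main():
    """Print the status line and return the exit code for the daemon."""
    code, line = check_disk()
    print(line)
    return code

# When installed as a plugin script, finish with: sys.exit(main())
```

Keeping the classification separate from the measurement makes the thresholds easy to test without touching a real filesystem.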

Nagios usage at EGEE CEE ROC

LEMON

  • What they say : "Lemon is a server/client based monitoring system. On every monitored node, a monitoring agent launches and communicates using a push/pull protocol with sensors which are responsible for retrieving monitoring information. The extracted samples are stored on a local cache and forwarded to a central Measurement Repository using UDP or TCP transport protocol with or without authentication/encryption of data samples. Sensors can collect information on behalf of remote entities like switches or power supplies. The Measurement Repository can interface to a relational database or a flat-file backend for storing the received samples. Web based interface is provided for visualizing the data. Lemon is part of the ELFms toolsuite, which includes as well quattor and LEAF. (It has no functional dependencies on neither quattor nor LEAF.)" [ from http://lemon.web.cern.ch/lemon/index.shtml ]
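The agent, sensor and local cache flow described in the quote can be sketched roughly as follows. This is not Lemon's code or wire format. The class name, the JSON encoding and the `send` callback are all hypothetical, chosen only to show samples being collected, cached locally and then forwarded to a repository.

```python
import json
import time
from collections import deque


class Agent:
    """Hypothetical monitoring agent: polls sensors, caches, forwards."""

    def __init__(self, node, sensors, send):
        self.node = node
        self.sensors = sensors  # metric name -> callable returning a value
        self.send = send        # callable taking bytes (e.g. a UDP sendto wrapper)
        self.cache = deque()    # local cache of samples not yet forwarded

    def collect(self, now=None):
        """Ask every sensor for a sample and store it in the local cache."""
        ts = time.time() if now is None else now
        for name, read in self.sensors.items():
            self.cache.append(
                {"node": self.node, "metric": name, "value": read(), "ts": ts}
            )

    def flush(self):
        """Forward cached samples; a sample is dropped only after send() returns."""
        sent = 0
        while self.cache:
            payload = json.dumps(self.cache[0]).encode("utf-8")
            self.send(payload)  # if this raises, the sample stays cached
            self.cache.popleft()
            sent += 1
        return sent
```

For UDP transport, `send` could wrap `socket.sendto` on a datagram socket pointed at the repository host. Passing the transport in as a callable keeps the sketch independent of whether UDP or TCP is used.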

LEMON Usage at CERN

Ganglia

  • What they say : "Ganglia is a scalable distributed monitoring system for high-performance computing systems such as clusters and Grids. It is based on a hierarchical design targeted at federations of clusters. It leverages widely used technologies such as XML for data representation, XDR for compact, portable data transport, and RRDtool for data storage and visualization. It uses carefully engineered data structures and algorithms to achieve very low per-node overheads and high concurrency. The implementation is robust, has been ported to an extensive set of operating systems and processor architectures, and is currently in use on thousands of clusters around the world." [from http://www.ganglia.info ]
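Because Ganglia represents its data as XML, a consumer can read cluster state with any XML parser. The sketch below parses a small hand-written sample modelled on the kind of XML that Ganglia's gmond daemon emits. The element and attribute names (`CLUSTER`, `HOST`, `METRIC`, `NAME`, `VAL`) should be treated as assumptions to check against a real dump, and the host and metric values are invented.

```python
import xml.etree.ElementTree as ET

# Hand-written, abridged sample; not output captured from a real cluster.
SAMPLE_XML = """<GANGLIA_XML VERSION="3.0" SOURCE="gmond">
  <CLUSTER NAME="example-cluster">
    <HOST NAME="node01.example.org" IP="192.0.2.1">
      <METRIC NAME="load_one" VAL="0.42" TYPE="float" UNITS=""/>
      <METRIC NAME="cpu_num" VAL="4" TYPE="uint16" UNITS="CPUs"/>
    </HOST>
  </CLUSTER>
</GANGLIA_XML>"""


def parse_metrics(xml_text):
    """Return {host name: {metric name: value string}} from the XML text."""
    root = ET.fromstring(xml_text)
    result = {}
    for host in root.iter("HOST"):
        result[host.get("NAME")] = {
            metric.get("NAME"): metric.get("VAL")
            for metric in host.iter("METRIC")
        }
    return result


metrics = parse_metrics(SAMPLE_XML)
```

Values are left as strings here. A fuller consumer would convert them according to each metric's `TYPE` attribute.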

SLS

  • What they say : "To provide a web-based tool that dynamically shows availability, basic information and/or statistics about IT services, as well as dependencies between them."

SAM

  • What they say : "The monitoring of grid sites in production/preproduction via SFT (Site Functionality Test) has been replaced by a new monitoring tool, by SAM. SAM stands for Service Availability Monitor."

GridICE

  • What they say : "GridICE is a distributed monitoring tool designed for Grid systems. It promotes the adoption of de-facto standard Grid Information Service interfaces, protocols and data models. Further, different aggregations and partitions of monitoring data are provided based on the specific needs of different users categories (VO, GOC, Site). Being able to start from summary views and to drill down to details, it is possible to verify the composition of virtual pools or to sketch the sources of problems. A complete history of monitoring data is also maintained to deal with the need for retrospective analysis." [from http://gridice.forge.cnaf.infn.it/ ]

GridICE monitored domains:

GStat

  • What they say :

GridView

  • What they say : "Gridview is a monitoring and visualization tool being developed to provide a high level view of various functional aspects of the Worldwide LHC Computing Grid (LCG). Currently it shows the statistics of data transfers, jobs running and service availability information for the WLCG."

IC Real Time Monitor

  • What they say: "In 2007 the Large Hadron Collider will be turned on in Switzerland. This will be the worlds largest particle accelerator and will need massive computing power. GridPP is a UK collaboration of particle physicists building the British part of the Grid which will provide this computing power. The Grid's computer resources are distributed around the world and GridPP gathers data from various sources about all of the sites on the Grid. This data collection means we can monitor the Grid working in real time."
-- JamesCasey - 20 Feb 2007
Topic revision: r10 - 2007-10-28 - JamesCasey