Grid Service Monitoring Tool Summary

This document collects links and summary information for a wide range of Grid service monitoring tools.

Tool Categories

We split the tools into three categories according to their main function: Fabric Monitoring, Service Availability and Passive monitoring reporting. Note, however, that there is considerable overlap between the categories.

Fabric Monitoring

Nagios

  • What they say : "Nagios is a host and service monitor designed to inform you of network problems before your clients, end-users or managers do. It has been designed to run under the Linux operating system, but works fine under most *NIX variants as well. The monitoring daemon runs intermittent checks on hosts and services you specify using external "plugins" which return status information to Nagios. When problems are encountered, the daemon can send notifications out to administrative contacts in a variety of different ways (email, instant message, SMS, etc.). Current status information, historical logs, and reports can all be accessed via a web browser." [From http://www.nagios.org/about/ ]

Nagios usage at EGEE CEE ROC
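The plugin mechanism in the Nagios description is easy to illustrate. A Nagios plugin is an external program that prints a one-line status message, optionally followed by performance data after a `|`, and reports its state through its exit code: 0 for OK, 1 for WARNING, 2 for CRITICAL and 3 for UNKNOWN. The sketch below is a minimal check of the 1-minute load average; the thresholds (4.0 and 8.0) and the `LOAD` label are arbitrary choices for illustration, not Nagios defaults, and `os.getloadavg()` is only available on Unix-like systems.

```python
import os

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
LABELS = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}


def check_load(load1, warn=4.0, crit=8.0):
    """Classify a 1-minute load value; return (exit_code, status_line).

    The warn/crit thresholds are illustrative assumptions.
    """
    if load1 >= crit:
        code = CRITICAL
    elif load1 >= warn:
        code = WARNING
    else:
        code = OK
    # Text before '|' is the human-readable status; text after it is
    # performance data in the form label=value;warn;crit.
    line = f"LOAD {LABELS[code]} - load1={load1:.2f} | load1={load1:.2f};{warn};{crit}"
    return code, line


def main():
    """Print one status line and return the exit code Nagios should see."""
    try:
        load1 = os.getloadavg()[0]
    except (OSError, AttributeError):  # load average not available on this platform
        print("LOAD UNKNOWN - load average not available")
        return UNKNOWN
    code, line = check_load(load1)
    print(line)
    return code
```

Saved as a standalone script, the file would end with `raise SystemExit(main())` so that the returned value becomes the process exit status that Nagios reads.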

LEMON

  • What they say : "Lemon is a server/client based monitoring system. On every monitored node, a monitoring agent launches and communicates using a push/pull protocol with sensors which are responsible for retrieving monitoring information. The extracted samples are stored on a local cache and forwarded to a central Measurement Repository using UDP or TCP transport protocol with or without authentication/encryption of data samples. Sensors can collect information on behalf of remote entities like switches or power supplies. The Measurement Repository can interface to a relational database or a flat-file backend for storing the received samples. Web based interface is provided for visualizing the data. Lemon is part of the ELFms toolsuite, which includes as well quattor and LEAF. (It has no functional dependencies on neither quattor nor LEAF.)" [from http://lemon.web.cern.ch/lemon/index.shtml]

LEMON Usage at CERN
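The sensor, local cache and central repository flow in the Lemon description can be sketched as follows. This is a toy model, not Lemon's actual protocol or API: sensors are assumed to be plain Python callables, samples are cached in memory, and each sample is forwarded as a single JSON-encoded UDP datagram. The class name and message fields (`CachingAgent`, `node`, `metric`, `value`, `ts`) are hypothetical.

```python
import json
import socket
import time


class CachingAgent:
    """Hypothetical agent: pulls samples from sensors, caches them, forwards over UDP.

    The JSON message format is an assumption for illustration, not Lemon's wire format.
    """

    def __init__(self, node, repository_addr):
        self.node = node
        self.repository_addr = repository_addr  # (host, port) of the repository
        self.sensors = {}  # metric name -> zero-argument callable returning a value
        self.cache = []    # samples not yet forwarded

    def add_sensor(self, metric, read):
        self.sensors[metric] = read

    def poll(self):
        """Pull one sample from every sensor into the local cache."""
        now = time.time()
        for metric, read in self.sensors.items():
            self.cache.append({"node": self.node, "metric": metric,
                               "value": read(), "ts": now})

    def forward(self):
        """Send cached samples one datagram each; return how many were sent.

        A sample leaves the cache only after the local send call succeeds.
        """
        sent = 0
        with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
            while self.cache:
                payload = json.dumps(self.cache[0]).encode()
                try:
                    sock.sendto(payload, self.repository_addr)
                except OSError:
                    break  # keep the remaining samples for the next attempt
                self.cache.pop(0)
                sent += 1
        return sent
```

With UDP, a successful `sendto` only means the datagram left the host, not that the repository received it; a TCP transport, as the description also mentions, would be needed for delivery confirmation.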

Ganglia

  • What they say : "Ganglia is a scalable distributed monitoring system for high-performance computing systems such as clusters and Grids. It is based on a hierarchical design targeted at federations of clusters. It leverages widely used technologies such as XML for data representation, XDR for compact, portable data transport, and RRDtool for data storage and visualization. It uses carefully engineered data structures and algorithms to achieve very low per-node overheads and high concurrency. The implementation is robust, has been ported to an extensive set of operating systems and processor architectures, and is currently in use on thousands of clusters around the world." [from http://www.ganglia.info]
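Because Ganglia represents monitoring data as XML, a client can consume it with any XML parser. The sketch below parses a trimmed, invented sample shaped like the XML a `gmond` daemon reports (`CLUSTER`, `HOST` and `METRIC` elements, with each metric's value in its `VAL` attribute) and builds a per-host dictionary of metric values. It does not fetch data over the network; the host names, addresses and values are made up.

```python
import xml.etree.ElementTree as ET

# Invented, trimmed sample in the shape of gmond's XML output.
SAMPLE = """<GANGLIA_XML VERSION="3.0.3" SOURCE="gmond">
  <CLUSTER NAME="example-cluster">
    <HOST NAME="node01" IP="10.0.0.1">
      <METRIC NAME="load_one" VAL="0.12" TYPE="float" UNITS=" "/>
      <METRIC NAME="cpu_num" VAL="2" TYPE="uint16" UNITS="CPUs"/>
    </HOST>
    <HOST NAME="node02" IP="10.0.0.2">
      <METRIC NAME="load_one" VAL="1.50" TYPE="float" UNITS=" "/>
    </HOST>
  </CLUSTER>
</GANGLIA_XML>"""

NUMERIC_TYPES = {"float", "double", "int8", "uint8", "int16", "uint16", "int32", "uint32"}


def metrics_by_host(xml_text):
    """Return {host: {metric: value}}, converting numeric metric types to float."""
    root = ET.fromstring(xml_text)
    result = {}
    for host in root.iter("HOST"):  # finds hosts at any nesting depth
        values = {}
        for metric in host.iter("METRIC"):
            raw = metric.get("VAL")
            values[metric.get("NAME")] = float(raw) if metric.get("TYPE") in NUMERIC_TYPES else raw
        result[host.get("NAME")] = values
    return result


if __name__ == "__main__":
    for host, values in metrics_by_host(SAMPLE).items():
        print(host, values.get("load_one"))
```

Run as a script, this prints `node01 0.12` and `node02 1.5`.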

Service Availability

SLS

  • What they say : "To provide a web-based tool that dynamically shows availability, basic information and/or statistics about IT services, as well as dependencies between them."
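The SLS description mentions both service availability and dependencies between services. One simple way to combine the two, offered here as a hypothetical model rather than how SLS actually computes its figures, is to count a service as available in a status snapshot only when it and everything it depends on (directly or indirectly) is up, and to report availability as the fraction of snapshots in which that holds. The service names and samples are invented.

```python
def closure(service, depends_on):
    """Return the service plus everything it depends on, directly or indirectly.

    Uses an iterative depth-first walk, so dependency cycles do not loop forever.
    """
    seen, stack = set(), [service]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(depends_on.get(current, ()))
    return seen


def availability(service, samples, depends_on):
    """Fraction of snapshots in which the service and all its dependencies were up."""
    if not samples:
        raise ValueError("no status samples")
    needed = closure(service, depends_on)
    up = sum(1 for snapshot in samples if all(snapshot[s] for s in needed))
    return up / len(samples)


# Hypothetical services and status snapshots (True = up).
DEPENDS_ON = {"grid-portal": ["database", "auth"], "auth": ["database"]}
SAMPLES = [
    {"grid-portal": True, "database": True, "auth": True},
    {"grid-portal": True, "database": False, "auth": True},
    {"grid-portal": True, "database": True, "auth": True},
    {"grid-portal": False, "database": True, "auth": True},
]
```

With these samples, `grid-portal` is available in 2 of 4 snapshots (0.5), while `auth` and `database` each score 0.75: the portal inherits the database outage in the second snapshot and also fails on its own in the fourth.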

SAME

  • Links: SFTDevel

  • What they say :

Passive monitoring reporting

GridView

  • Links :

  • What they say :

IC RTM

-- JamesCasey - 20 Feb 2007

Topic revision: r3 - 2007-02-21 - JamesCasey