How to validate a ROC or NGI Nagios box


The egee-NAGIOS packaging and configuration covers four different configurations:

  • Site-Nagios - Monitoring of a site
  • Regional-Nagios - Monitoring of an EGEE ROC
  • National-Nagios - Monitoring at the NGI level
  • Project-Nagios - Central project monitoring

This document covers what you need to do to configure a National-Nagios or Regional-Nagios so that it can take over the definitive testing role within EGEE from the Project Nagios instance currently running.

Process for a National or Regional Nagios to get validated

A high-level description of the whole process is:

  1. Join tool-admins mailing list. Register here
  2. Register your node as the relevant flavour of Nagios in GOCDB (Regional-Nagios, National-Nagios)
  3. Register for access to the SAM PI (ONLY if you don't want to use ATP as the topology provider for NCG)
    1. Open a GGUS ticket.
    2. Ask in the ticket for it to be assigned to the 'Nagios' Support Unit.
    3. Mention in the ticket the IP address of your Nagios instance and that you need access to the SAM PI to configure it.
  4. Install egee-NAGIOS using the relevant configuration below
  5. Publish the GRIS running on your Nagios node into the information system
  6. Raise a GGUS ticket to the Nagios support unit to start the validation process; to prepare for this, first follow the steps below.

Validation Process

The project-level Nagios hosts are listed here. The validation process consists of comparing the setup of a new Regional or National Nagios against the current Project Nagios instance.

In order to validate your instance, please follow these steps:

  1. Ensure that all the egee-sa1 packages are up to date. The following query should return no data:
    [root~]# repoquery --pkgnarrow=updates --disablerepo=\* --enablerepo=egee-sa1 -qa --queryformat ' yum update %{name} '
  2. Ensure that the ncg cron job is executed regularly (every 3 hours in our case).
  3. Check if your services are being tested by the metrics defined in the ROC SAM critical profile, described here:
  4. Once you have done this, open a GGUS ticket to be assigned to the 'Nagios' support unit. Please mention in the ticket which ROC/NGI Nagios instance is yours, so that we can:
    1. add the Nagios instance to the ops-monitor Nagios to compare the number of services and hosts with the project-level instance;
    2. compare the status of your services with the ones defined in the central Nagios instance at CERN.
    3. For a ROC to validate an NGI Nagios instance, you should use the ops-monitor Nagios:
      1. Select Service Groups --> Summary --> SERVICE_NagiosNGIDiff
      2. Select your NGI and then check ngi.nagios.diff, which describes the differences between the services and probes run by your ROC and by your NGI. This check compares the metrics defined in the ROC_CRITICAL profile, that is, it covers the sBDII, SRMv2 and CE services.
      3. Go to your NGI Nagios instance and check those services and metrics to see why they are failing there but not in the ROC Nagios instance.
      4. Once all those discrepancies are fixed or understood, the ROC considers that the NGI Nagios instance is validated.
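
Steps 1 and 2 above can be scripted so that they are easy to repeat before opening the ticket. The following is only a minimal sketch: the helper names report_pending_updates and modified_within are hypothetical, and the path in the usage comment is a placeholder, since this page does not say where ncg writes its output. It assumes GNU find, as shipped on the el5 platform.

```shell
#!/bin/sh
# Sketch only: helper names and the example path below are hypothetical.

# report_pending_updates: reads the output of the step 1 repoquery on stdin.
# Succeeds when there is no non-blank line, i.e. no pending egee-sa1 update.
report_pending_updates() {
    pending=$(grep '[^[:space:]]' || true)   # keep only non-blank lines
    if [ -z "$pending" ]; then
        echo "egee-sa1 packages are up to date"
        return 0
    fi
    echo "pending egee-sa1 updates:"
    echo "$pending"
    return 1
}

# modified_within PATH MINUTES: succeeds if PATH exists and was modified
# less than MINUTES minutes ago (uses GNU find's -mmin test).
modified_within() {
    [ -n "$(find "$1" -maxdepth 0 -mmin "-$2" 2>/dev/null)" ]
}

# Usage on the Nagios box (the query is the one from step 1; the path is a
# placeholder for a file that your ncg cron job regenerates):
#   repoquery --pkgnarrow=updates --disablerepo=\* --enablerepo=egee-sa1 -qa \
#       --queryformat ' yum update %{name} ' | report_pending_updates
#   modified_within /path/to/file/written/by/ncg 180 \
#       || echo "ncg does not appear to have run in the last 3 hours"
```

Reading the query output from stdin keeps the check independent of repoquery itself, so the helper can be tried on any machine before it is used on the Nagios node.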

List of ROC Nagios to validate.

Software version

You should deploy the same versions of the components that are running on the corresponding CERN Nagios instance. These are the ones included in the meta package egee-NAGIOS-1.0.0-48.el5.noarch.rpm, available from the egee-SA1 repository. Releases are advertised through the tool-admins mailing list and through the SAM blog.


General Configuration details

VO to run the tests as

The tests should be run as the ops VO. We will accept two DNs per ROC or NGI for testing purposes, and you can join the ops VO from here

For ROC Nagios submissions, you should also request the lcgadmin role (/ops/Role=lcgadmin), which is being used for SAM and CERN ROC Nagios submissions.

For NGIs, you should join your corresponding /ops/NGI/* group, so you do not need to use the lcgadmin role.
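
As an illustration of the difference between the two cases, the sketch below builds the value that would be passed to the --voms option of voms-proxy-init when creating a proxy by hand. The function name ops_voms_arg is hypothetical, the NGI group name in the usage comment is a placeholder for your own /ops/NGI/* group, and whether you need a manual proxy at all depends on how your instance is configured.

```shell
#!/bin/sh
# Sketch only: ops_voms_arg is a hypothetical helper, not part of egee-NAGIOS.

# ops_voms_arg roc        -> ops VO with the lcgadmin role (ROC submissions)
# ops_voms_arg ngi GROUP  -> ops VO with the /ops/NGI/GROUP group (NGI submissions)
ops_voms_arg() {
    case "${1:-}" in
        roc)
            echo "ops:/ops/Role=lcgadmin"
            ;;
        ngi)
            if [ -z "${2:-}" ]; then
                echo "usage: ops_voms_arg ngi GROUP" >&2
                return 2
            fi
            echo "ops:/ops/NGI/$2"
            ;;
        *)
            echo "usage: ops_voms_arg roc|ngi [GROUP]" >&2
            return 2
            ;;
    esac
}

# Usage (EXAMPLE is a placeholder for your NGI group name):
#   voms-proxy-init --voms "$(ops_voms_arg roc)"
#   voms-proxy-init --voms "$(ops_voms_arg ngi EXAMPLE)"
```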

DNs already registered in OPS

VOs/users to be supported

For the web interface you should support the dteam VO at a minimum. This will allow first-level support to access the web interface. ROC and site contacts will be automatically added from the GOCDB, and Nagios will be configured so that ROC and site contacts can resubmit jobs.

Topology sources

Currently we use SAM as the source of topology (i.e. site names and services at the site).

Probes to use

In these instances we don't import any remote probe results (e.g. SAM, NPM).

This leads to the following config for all configurations


Specific Configuration details

We consider two different possibilities, each of which requires slightly different additional configuration in YAIM:

Existing EGEE ROC


A new NGI having sites in a ROC in GOCDB



If you have any questions, do not hesitate to send an email to tool-admins AT
Topic revision: r33 - 2011-03-22 - EmirImamagic