This page documents the tools we use for configuring Nagios headnodes. There are two specific use cases:
  • Configuring Nagios for a single VO (i.e. those instances managed by IT-SDC-MI).
  • Configuring Nagios at a grid site (quite possibly for more than one VO).


Here's how we think this will all work in the future.
  • A machine that uses this approach to install Nagios must have:
    • The necessary (VO-specific?) ncg-metric-config and grid-monitoring-probes RPMs installed.
    • A network connection to reach the POEM and VO feed URLs.
  • We (will) ship an RPM which contains the parser.rb script (see below) along with a few Ruby template (.erb) files.
  • The parser.rb script itself needs to be run every n hours. In our use case, we'll probably use Puppet to manage a cronjob to do this.

Puppet-specific notes

  • In addition to the RPMs described above, you will need to set the nagconf_vo parameter in Foreman to the VO of your choice; this triggers the initial installation of NagConf. Currently NagConf is installed simply by copying files from the Puppet files directory, but in the future this will be an RPM install.
  • It's possible to configure NagConf to perform only "hostcheck" tests against a node (i.e. is this node up?) rather than all of the grid-specific tests, by setting the Foreman parameter nagconf_hostcheck to true.



The parser.rb script does the main work of configuring a Nagios instance. In production environments it will most likely be executed by a cronjob, but it can also be called on the command line with the following parameters.
  • --vo: (Mandatory). Specify a single VO or a comma-separated list of VOs (e.g. atlas,cms)
  • --poem: (Optional). The POEM URL is hardcoded into parser.rb, but can be overridden with this command line option
  • --site: (Optional). If provided, Nagios will only be configured for the specified site's nodes. Use the GOCDB-registered name for the site.
  • --confdir: (Optional). Location to store Nagios configuration. Defaults to /etc/nagios/nagconf
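
As an illustration of the options listed above, here is a minimal sketch of how they could be handled with Ruby's standard OptionParser. It is not the actual parser.rb code: the function name parse_options and the layout of the returned hash are assumptions made for this example; only the option names and the default confdir come from the description above.

```ruby
require 'optparse'

# Default taken from the --confdir description above.
DEFAULT_CONFDIR = '/etc/nagios/nagconf'

# Hypothetical helper: turns an argument list into an options hash.
def parse_options(argv)
  options = { vos: nil, poem: nil, site: nil, confdir: DEFAULT_CONFDIR }

  OptionParser.new do |opts|
    opts.banner = 'Usage: parser.rb --vo VO[,VO...] [options]'
    # The Array type makes OptionParser split "atlas,cms" into ["atlas", "cms"].
    opts.on('--vo LIST', Array, 'VO or comma-separated list of VOs (mandatory)') { |v| options[:vos] = v }
    opts.on('--poem URL', 'Override the built-in POEM URL') { |v| options[:poem] = v }
    opts.on('--site NAME', 'GOCDB-registered site name') { |v| options[:site] = v }
    opts.on('--confdir DIR', 'Where to write the Nagios configuration') { |v| options[:confdir] = v }
  end.parse(argv)

  # --vo is the only mandatory option.
  raise OptionParser::MissingArgument, '--vo' if options[:vos].nil? || options[:vos].empty?

  options
end
```

A script would typically call parse_options(ARGV) once at start-up and pass the resulting hash on to the rest of the code.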

Execution logic

  1. With getPOEM(), get the POEM XML data and store it in a nested hash (@poem_hash). An MD5 checksum of the XML data is calculated, and the XML itself is stored in a temporary file, so we know if it has changed since the last execution.
  2. For each VO that we want to configure:
    1. With getVOfeed(), get the VO feed (as before, calculating an MD5 checksum and storing the XML in a file) and create two lists and a hash:
      • The list @all_sites contains a simple list of all sites in the VO feed or, if the --site parameter was passed on the command line, the single site that we're configuring Nagios for. This is only used by the template_host.groups.erb file to create the Nagios host.groups.cfg file.
      • The list @all_flavours contains all service flavours for the site(s) we're interested in. It is used to create hostgroups and servicegroups (i.e. by template_host.groups.erb and template_service.groups.erb).
      • @atp_hash is a nested hash. Each top-level key is a site name, which points to a sub-hash containing mappings between hostnames and their corresponding service flavours. This is used by all .erb files in one way or another.
    2. Run buildNagios(), which is responsible for reading in the configuration files provided by the ncg-metric-config and grid-monitoring-probes RPMs. This function is also responsible for running the .erb template files to generate the Nagios config.
  3. Once all config files have been generated, the temporary files containing XML are renamed, so that their MD5 checksums can be calculated the next time parser.rb runs. This allows us to decide whether the POEM or VO feed data have changed since the last execution.
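
The checksum-based change detection in steps 1 to 3 can be sketched roughly as follows. This is only an illustration under stated assumptions: the function names (feed_changed?, stage_feed, promote_feed) and the ".new" suffix for the staged file are hypothetical and are not taken from parser.rb.

```ruby
require 'digest/md5'
require 'fileutils'

# True when the freshly downloaded XML differs from the copy kept from the
# previous run. A missing saved copy counts as "changed".
def feed_changed?(xml, saved_path)
  return true unless File.exist?(saved_path)

  Digest::MD5.hexdigest(xml) != Digest::MD5.file(saved_path).hexdigest
end

# Write the new XML next to the saved copy without replacing it yet.
# The ".new" suffix is an assumption made for this sketch.
def stage_feed(xml, saved_path)
  staged_path = "#{saved_path}.new"
  File.write(staged_path, xml)
  staged_path
end

# Step 3: once all config files have been generated, rename the staged file
# so the next run compares against it.
def promote_feed(staged_path, saved_path)
  FileUtils.mv(staged_path, saved_path)
end
```

Renaming only after the configuration has been generated means that a run which fails part-way leaves the old XML in place, so the next run still sees the feed as changed.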

*.erb files

These are Ruby template files. They're basically Ruby (with some additional, ugly, markup) that is executed to produce (many) Nagios configuration stanzas. It's a very efficient, though not pretty, way of doing this.
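
To give a feel for how such a template works, here is a small self-contained sketch that renders Nagios host stanzas from a hash of hostnames and service flavours, i.e. the per-site sub-hash shape described for @atp_hash. The template text, the render_hosts helper, the generic-host template name and the hostnames are all made up for this example; the shipped .erb files are not reproduced here.

```ruby
require 'erb'

# Hypothetical template. Lines ending in "-%>" produce no output line of
# their own when ERB is created with trim_mode '-'.
HOST_TEMPLATE = <<~TEMPLATE
  <% hosts.each do |hostname, flavours| -%>
  define host {
      use         generic-host
      host_name   <%= hostname %>
      hostgroups  <%= flavours.join(',') %>
  }
  <% end -%>
TEMPLATE

# hosts: hostname => list of service flavours (one site's entry in a
# structure shaped like @atp_hash).
def render_hosts(hosts)
  ERB.new(HOST_TEMPLATE, trim_mode: '-').result_with_hash(hosts: hosts)
end

# Illustrative data only.
example_site = {
  'ce01.example.org' => ['CREAM-CE'],
  'se01.example.org' => ['SRMv2']
}
puts render_hosts(example_site)
```

Each entry in the hash yields one "define host" stanza, which is why a short template can generate a large amount of Nagios configuration.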

*.templates.cfg files

These are template configuration files, used by Nagios (not parser.rb) and stolen from an existing WLCG Nagios installation. They should be included in our own RPM.

To do

  • Hard-code the cms.conf etc. locations, rather than use the current directory
  • Write the config and then validate it before restarting Nagios - not sure if this is possible (nagios.conf would need to have an entry?). How about: move the old config to nagconf.old; write the new config in nagconf; if nagios -v succeeds, great! Otherwise move nagconf -> broken and move nagconf.old -> nagconf...
  • Recently implemented multiple VOs. This is probably going to result in duplicated configuration objects in the *.host.cfg files, but will likely only affect site-specific installations.
  • Ability to force a re-generation of the config; this can currently be achieved by deleting the files ncg_poem.tmp and ncg_vofeed*.tmp; in fact, these files should probably also be renamed to nagconf_poem etc
  • There are some cases where the same node is defined to exist at two sites (with the same service). This causes problems because we end up creating duplicate host definitions in Nagios (i.e. multiple SITENAME directories contain the host definition in their host.cfg files). Nagios does not allow this. The current workaround is to only add the host into a host.cfg file once (the first time it's seen). Whether this will result in an unstable nagios config is unclear at the time of writing. Either way, we need to fix this in the long term. Oh, this also means we may end up with some empty host.cfg files (such as for CERN-PROD-AI, CERN-PROD-HLT and CERN-PROD).
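
The "first time it's seen" workaround from the last item can be sketched as below. This is an illustration rather than the code in parser.rb: the function name assign_hosts_once is hypothetical, and the input is assumed to have the shape described for @atp_hash (site name => hostname => service flavours).

```ruby
require 'set'

# Returns site name => list of hostnames that should be defined in that
# site's host.cfg. A host that appears under several sites is kept only
# for the first site in which it is seen.
def assign_hosts_once(atp_hash)
  seen = Set.new
  atp_hash.each_with_object({}) do |(site, hosts), result|
    # Set#add? returns nil if the hostname was already in the set,
    # so select drops hosts that an earlier site has claimed.
    result[site] = hosts.keys.select { |hostname| seen.add?(hostname) }
  end
end
```

A site whose hosts were all claimed by earlier sites ends up with an empty list, which corresponds to the empty host.cfg files mentioned above. Because Ruby hashes iterate in insertion order, which site "wins" depends on the order in which sites were added to the hash.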

Topic revision: r5 - 2013-10-25 - MikeKenyon