NCG Overview

NCG is a three-phase configuration generator that automates the production of monitoring configurations for Grid sites and their services. The "N" in NCG originally stood for "Nagios", the original target monitoring system. NCG now applies more generally thanks to a modular, extensible design, with Nagios remaining the default target.

The three phases of NCG are as follows:

  1. Gather information about all hosts and Grid services associated with a named Site. This is referred to as topology information gathering and is primarily derived from the Information System (BDII), the SAM database and site-defined data held locally in text files.
  2. Merge the topology information with data defining tests (probe description database) which are appropriate for gathering metrics of the state of each type of Grid service. After this merging a complete map of the site monitoring system is available.
  3. Use the output map from the first two phases to generate configuration files for a specific target monitoring tool (e.g. Nagios). Conceptually, this is the only phase which is dependent on the target tool.
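
The three phases above can be pictured with a minimal Python sketch. It is not NCG's implementation (NCG is written in perl); the function names, host names, metric names and output format are all hypothetical and only illustrate how the phases feed one another.

```python
# Illustrative sketch of the three phases; every name and value is hypothetical.

def gather_topology(site):
    """Phase 1: return the hosts of a site and the Grid services on each."""
    # A real run would query the BDII, the SAM database and local text files.
    return {"ce01.example.org": ["CE"], "se01.example.org": ["SRM"]}

# Stand-in for the probe description database: service type -> metrics.
PROBE_DB = {"CE": ["job-submit"], "SRM": ["srm-ping", "srm-put"]}

def merge_probes(topology, probe_db):
    """Phase 2: attach to every (host, service) the metrics that test it."""
    return [
        (host, service, metric)
        for host, services in sorted(topology.items())
        for service in services
        for metric in probe_db.get(service, [])
    ]

def generate_config(site_map):
    """Phase 3: render the map for one target tool (plain text lines here)."""
    return "\n".join(
        f"host={host} service={service} metric={metric}"
        for host, service, metric in site_map
    )

site_map = merge_probes(gather_topology("MY-SITE"), PROBE_DB)
print(generate_config(site_map))
```

Only generate_config would need replacing to target a different monitoring tool, which mirrors the statement that the third phase is the only tool-dependent one.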

The following sections cover manual installation of a Nagios Grid Service monitor. They describe a simple installation from scratch, so administrators with existing Nagios installations will have to adapt them. Please send feedback to egee3-operations-automation-discuss@cernNOSPAMPLEASE.ch (register here) or file a Savannah bug.

Details on how to install Nagios along with NCG via Yaim modules are available here.

Probe type options

NCG distinguishes between two classes of probe: local and remote.

In this context, remote means probes which are executed against your site services by some external agent. Two such external agents are the central SAM grid monitoring services and the network monitoring probes run by the ENOC. Both of these central services test site services and publish the results through well-defined interfaces. Configuring your site monitoring to use remote probes means that your site monitoring fetches the results from the central service and displays or acts upon the state determined by the external agent. In the case of Nagios, these remote results are displayed as passive service checks.

Local probes are tests which a site monitoring service schedules itself and which result in some interaction between the monitoring service and the monitored grid service. This generally means executing a command on a User Interface to check some functionality against an expected result, or testing at a lower level, for instance checking that a service is listening on a specific port. In the Nagios sense, local probes are displayed as active service checks.

Using a remote gLite UI via NRPE

Local probes require User Interface (UI) middleware to be deployed on the Nagios server. If this is not an option, admins can use an existing UI node on the site for running local probes. The following sections emphasize the specific actions needed when such an NRPE UI is used.

RPM Installation Repositories

In order to install the egee-SA1 repository, create a file with the following contents in /etc/yum.repos.d/egee-SA1.repo:

[egee-SA1]
name=EGEE SA1 software
baseurl=http://www.sysadmin.hep.ac.uk/rpms/egee-SA1/sl4/$basearch/
enabled=1
gpgcheck=0

NCG Configuration

Run the WLCG Nagios configuration generator:

ncg.pl

By default, the output is a set of Nagios configuration files created in the directory /etc/nagios/wlcg.d.

Important: each time the site configuration changes (e.g. new services are added or hosts are removed), it is necessary to rerun ncg.pl and restart the nagios service (/etc/init.d/nagios restart).

Input to the configurator is taken by default from the file /etc/ncg/ncg.conf, which the site administrator must edit to reflect the desired input. The location of the config input file can be changed using the --config option. To view all valid options, run: ncg.pl --help. At a minimum you will have to specify your site name as SITENAME. The following paragraphs describe the format of the configuration file and how to access information about the valid input parameters.
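
The rerun-and-restart step can be wrapped in a small helper, sketched here in Python. The helper name regenerate and its run parameter are hypothetical, and the form "--config <path>" (option and path as separate arguments) is an assumption about how ncg.pl parses the option; check ncg.pl --help before relying on it.

```python
import subprocess

def regenerate(config="/etc/ncg/ncg.conf", run=subprocess.run):
    """Rerun ncg.pl, then restart Nagios; stop at the first failing step."""
    commands = [
        # Assumption: ncg.pl accepts the config path as a separate argument.
        ["ncg.pl", "--config", config],
        ["/etc/init.d/nagios", "restart"],
    ]
    for command in commands:
        # check=True makes subprocess.run raise CalledProcessError on failure,
        # so Nagios is not restarted if configuration generation failed.
        run(command, check=True)
    return commands
```

Calling regenerate() on the Nagios server, with sufficient privileges, would perform both steps; the run parameter exists only so the helper can be exercised without executing anything.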

The config file ncg.conf uses the same structure as the Apache HTTP Server configuration and thus allows 'global' and module-specific parameters to be set in markup sections which correspond to the perl modules inside NCG. A little knowledge of perl helps in understanding the structure of the input but is not a prerequisite. For example, consider the following snippet taken from ncg.conf (note that this does not define a complete configuration):

GLITE_VERSION=3.1.0
<NCG::ConfigGen>
  <Nagios>
    MYPROXY_SERVER=${MYPROXY_SERVER}
    PROBES_TYPE=remote
  </Nagios>
</NCG::ConfigGen>

In the above snippet, the global parameter GLITE_VERSION is given the value '3.1.0'. Other parameters are specified for the module NCG::ConfigGen::Nagios, with the value for MYPROXY_SERVER being taken from the same-named environment variable. Each module has a separate section, allowing for a flexible and modular configuration. Example configurations are included in the distribution in the directory /etc/ncg and are generally self-documenting.
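
To make the nesting concrete, the following Python sketch reads such a snippet into nested dictionaries. It is not NCG's own parser: the function name parse_ncg_conf is hypothetical, treating lines starting with # as comments is an assumption borrowed from Apache-style files, and unset environment variables are simply replaced by an empty string.

```python
import os
import re

def parse_ncg_conf(text, env=None):
    """Parse Apache-style sections with KEY=VALUE lines into nested dicts."""
    env = os.environ if env is None else env
    root = {}
    stack = [root]  # stack[-1] is the section currently being filled
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        if line.startswith("</"):
            stack.pop()  # closing tag: return to the parent section
        elif line.startswith("<"):
            name = line[1:-1].strip()
            stack.append(stack[-1].setdefault(name, {}))  # opening tag: descend
        else:
            key, _, value = line.partition("=")
            # Replace ${NAME} with the environment value (empty if unset).
            value = re.sub(r"\$\{(\w+)\}",
                           lambda m: env.get(m.group(1), ""), value.strip())
            stack[-1][key.strip()] = value
    return root

SNIPPET = """
GLITE_VERSION=3.1.0
<NCG::ConfigGen>
  <Nagios>
    MYPROXY_SERVER=${MYPROXY_SERVER}
    PROBES_TYPE=remote
  </Nagios>
</NCG::ConfigGen>
"""

conf = parse_ncg_conf(SNIPPET, env={"MYPROXY_SERVER": "myproxy.example.org"})
print(conf["GLITE_VERSION"])
print(conf["NCG::ConfigGen"]["Nagios"])
```

The nested keys NCG::ConfigGen and Nagios together correspond to the module NCG::ConfigGen::Nagios, while GLITE_VERSION sits at the top level as a global parameter.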

In addition to the global section the following module sections are defined:

  • Topology -
    • NCG::SiteInfo - controls the gathering of information describing a Site's hosts and services
    • NCG::LocalRules - controls the local manipulation of the configuration by the addition or removal of hosts, services, contact information etc.
  • Probe Description (it is not expected that the average user using default installations should have to change or configure probe descriptions) -
    • NCG::LocalMetrics - defines metrics in terms of probe to be used, attributes to pass to the probe, VO and other metric dependencies
    • NCG::LocalMetricSets - controls which sets of metrics are appropriate to test a specific Grid service (sometimes called node type). In effect this defines a grouping of sub-services.
    • NCG::LocalMetricsAttrs - controls the gathering of variable metric attributes (e.g. actual service port number used) from information sources such as the information system or by applying service specific heuristics.
    • NCG::RemoteMetrics - controls the extraction of the lists of remote (off-site/central services which probe the site services) metrics available for the site services.
  • Configuration Generation -
    • NCG::ConfigGen - controls the final phase of configuration generation for a specific monitoring tool

The NCG configurator is written in perl and each module is self-documenting. Additional information describing available keyword parameters and local file formats can be found by using the perldoc utility.

  1. Example: to see information about the file format for defining local metrics, use the command perldoc NCG::LocalMetrics::File
  2. Example: to see information about generating configuration for Nagios, use the command perldoc NCG::ConfigGen::Nagios

Specifying a full path instead of a module name is also possible, e.g. perldoc /usr/lib/perl/vendor_perl/5.8.5/NCG/LocalMetrics/File.pm.
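
The relationship between a module name and its file follows the usual perl convention: each :: becomes a directory separator and .pm is appended. A minimal Python sketch of that mapping, with the hypothetical helper name module_to_relpath:

```python
def module_to_relpath(module):
    """Map a perl module name to its path relative to a perl library directory."""
    return module.replace("::", "/") + ".pm"

for name in ("NCG::LocalMetrics::File", "NCG::ConfigGen::Nagios"):
    print(name, "->", module_to_relpath(name))
```

The library directory in front of the relative path (such as the vendor_perl directory in the example above) depends on the local perl installation.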

Look in GridMonitoringNcgRecipes for examples on using NCG configuration.

By default for Nagios, NCG generates the following configuration files in /etc/nagios/wlcg.d: commands.cfg, contacts.cfg, hosts.cfg, services.cfg and wlcg.nagios.cfg. Keeping this output separate from the rest of the Nagios configuration makes further local customisation easier in cases where an existing Nagios installation cannot be properly integrated with the Grid service monitoring through /etc/ncg/ncg.conf and the files it references alone.
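
After a run, a quick check that the expected files were produced can be sketched in Python as follows. The helper name missing_outputs is hypothetical; the file names and default directory are the ones listed above.

```python
from pathlib import Path

# File names generated by default for Nagios, as listed above.
EXPECTED = ["commands.cfg", "contacts.cfg", "hosts.cfg",
            "services.cfg", "wlcg.nagios.cfg"]

def missing_outputs(directory="/etc/nagios/wlcg.d"):
    """Return the expected configuration files that are absent from directory."""
    return [name for name in EXPECTED if not (Path(directory) / name).is_file()]

print(missing_outputs())
```

An empty list means all five files are present; it says nothing about whether their contents are valid Nagios configuration.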

Multi site configuration

Starting from version 0.9.12-0, NCG supports generating configuration for monitoring multiple sites with a single Nagios instance. Currently the only supported mechanism is to list the sites in a static file.

  • Add NCG::SiteSet::File section to /etc/ncg/ncg.conf:
<NCG::SiteSet>
  <File>
    DB_FILE=<LOCAL_FILE_CONFIG>
  </File>
</NCG::SiteSet>

  • Comment out the global definition of the variable BDII in the file /etc/ncg/ncg.conf. This is required because of the following bug.

  • Add the list of sites to the file referenced by DB_FILE:
    • If the site is present in SAM (module NCG::SiteInfo::SAM is included):
SITE!<sitename1>
SITE!<sitename2>
...
    • If the site is not present in SAM (only module NCG::SiteInfo::LDAP is used):
SITE_BDII!<sitename1>!<site1_bdii>
SITE_BDII!<sitename2>!<site2_bdii>
...

  • If sites are not present in SAM, make sure that the ADD_HOSTS variable in NCG::SiteInfo::LDAP is switched on:
<NCG::SiteInfo>
  <LDAP>
    ...
    ADD_HOSTS=1
  </LDAP>
</NCG::SiteInfo>

  • If you had an existing single-site installation, manually remove the config directory /etc/nagios/wlcg.d.

  • Rerun ncg.pl
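
For sites with many entries, the static site list can be generated rather than typed by hand. The Python sketch below emits lines in the SITE! and SITE_BDII! forms shown in the steps above; the helper name site_list_lines, the site names and the BDII host name are hypothetical.

```python
def site_list_lines(sites):
    """Build site list lines from (sitename, bdii) pairs.

    bdii is None for a site that is present in SAM, otherwise the address
    of the site BDII.
    """
    lines = []
    for name, bdii in sites:
        if bdii is None:
            lines.append(f"SITE!{name}")
        else:
            lines.append(f"SITE_BDII!{name}!{bdii}")
    return lines

# Hypothetical sites: the first is in SAM, the second is only known by its BDII.
example = [("SITE-A", None), ("SITE-B", "bdii.site-b.example.org")]
print("\n".join(site_list_lines(example)))
```

The printed lines would then be saved to the file named by DB_FILE before rerunning ncg.pl.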

Static file rules

File modules enable users to modify the configuration generated from various information sources. With this method users can add hosts and services which are not published in the information system, tune the generated config, or completely remove hosts and services which they do not want monitored. Below is the full list of rules which can be used in static files. The description of the rules is divided into blocks, but all rules can be listed in the same file in any order.

NCG::SiteSet rules are applied to the list of sites gathered by other NCG::SiteSet modules:

  1. ADD_SITE!sitename: add new site to multisite configuration
  2. ADD_SITE_BDII!sitename!site_bdii_address: add new site with the defined BDII address to multisite configuration
  3. REMOVE_SITE!sitename: remove site from the multisite configuration; useful if you don't want to monitor a site which is defined in an external information source.

NCG::SiteInfo rules are applied to individual sites:

  1. ADD_HOST_SERVICE_VO!host!service!VO: add service for defined VO to the defined host
  2. ADD_HOST_SERVICE!host!service: add service to the defined host
  3. REMOVE_HOST!host: remove host from the site
  4. REMOVE_SERVICE!service: remove service from all hosts gathered by other SiteInfo modules
  5. REMOVE_HOST_SERVICE!host!service: remove service from the defined host
  6. ADD_LB!host!node: add load balancing node for the defined host
  7. REMOVE_LB!host!node: remove load balancing node for the defined host
  8. SITE_COUNTRY!country: define site's country
  9. SITE_GRID!grid: define the grid to which the site belongs
  10. SITE_PARENT!router.fqdn or SITE_PARENT!router.fqdn!router.ip: define site's border router

NCG::LocalMetrics rules are applied to individual sites:

  1. SERVICE_METRIC!metricset!metric: define new metric for defined service
  2. METRIC_PROBE!metric!probe: define probe for the new metric
  3. METRIC_METRICSET!metric!metricset: define metricset for the new metric
  4. METRIC_DOCURL!metric!url: url with probe documentation
  5. METRIC_NATIVE!metric!native: native probe (this must be defined)
  6. METRIC_CONFIG!metric!config!value: set config parameter for metric (e.g. timeout) (can be defined multiple times)
  7. METRIC_DEPENDENCY!metric!metricParent!value: set metricParent on which the metric depends (can be defined multiple times)
  8. METRIC_ATTRIBUTE!metric!attribute!value: define which attribute metric requires
  9. METRIC_FLAG!metric!flag: define flag for the new metric (can be defined multiple times)
  10. METRIC_PARENT!metric!parent: define parent for the new metric (if the metric has one)
  11. REMOVE_HOST_SERVICE_METRIC!host!service!metric: remove single metric for defined host and service
  12. REMOVE_SERVICE_METRIC!service!metric: remove single metric for defined service from all hosts gathered by other SiteInfo modules
  13. REMOVE_HOST_METRIC!host!metric: remove single metric for defined host
  14. REMOVE_METRIC!service!metric: remove single metric from all hosts gathered by other SiteInfo modules

NCG::LocalMetricsAttrs rules are applied to individual sites:

  1. ATTRIBUTE!name!value: define global attribute
  2. HOST_ATTRIBUTE!host!name!value: define attribute for defined host
  3. SERVICE_ATTRIBUTE!service!name!value: define attribute for each host which contains defined service

Troubleshooting

There is a list of FAQs related to Nagios and NCG maintained here.

Topic revision: r7 - 2010-05-18 - DavidCollados
 