My Nagios Deployment History

This page collects notes written for personal use. It is NOT an official document on Nagios or on Nagios installation at CERN.

Installation and configuration

For the installation I used a single virtual machine and followed the instructions at GridMonitoringNcgYaim. YAIM terminated successfully. Some of the links to the repo files are broken, so it is advisable to use yaimgen to install the UI and then install Nagios separately.

Issues

  • The yaim function config_nrpe_share fails when NCG_NRPE_OUTPUT_DIR is not set.

  • Nagios is not started when it is configured with YAIM.

The httpd and nagios services are running correctly.

The file /var/log/httpd/error_log contains:
[Tue Apr 07 09:31:53 2009] [error] [client 127.0.0.1] Directory index forbidden by rule: /var/www/html/
The httpd server answers HTTP requests correctly, but HTTPS requests fail:

[root@vtb-generic-80 yum.repos.d]# curl http://localhost/
HELLO GIANNI!
[root@vtb-generic-80 yum.repos.d]# curl https://localhost/
curl: (60) SSL certificate problem, verify that the CA cert is OK. Details:
error:14090086:SSL routines:SSL3_GET_SERVER_CERTIFICATE:certificate verify failed
More details here: http://curl.haxx.se/docs/sslcerts.html

To address this, I appended the BitFace CA certificate to the file /usr/share/ssl/certs/ca-bundle.crt and added the line

SSLCACertificateFile /usr/share/ssl/certs/ca-bundle.crt
to the file /etc/httpd/conf.d/ssl.conf (YAIM had removed this line from that file during the Nagios configuration). Even so, the curl https://localhost/ test still gives the same error. At this point one should nevertheless be able to see the Nagios web interface at: https://SERVER_NAME/nagios/.
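
The CA bundle step above can be sketched as a small shell function. This is only a sketch under assumptions: the function name append_ca_cert and the certificate file name bitface-ca.pem are hypothetical, and the duplicate check is a heuristic that compares only the first base64 line of the certificate.

```shell
# Append a PEM CA certificate to a CA bundle, unless it already seems to be there.
# Hypothetical helper, not part of YAIM, httpd or Nagios.
append_ca_cert() {
    cert=$1
    bundle=$2
    [ -r "$cert" ] || { echo "cannot read $cert" >&2; return 1; }
    # First base64 line of the certificate body, used as a rough fingerprint.
    marker=$(grep -v -- '-----' "$cert" | head -n 1)
    if [ -n "$marker" ] && grep -qF -- "$marker" "$bundle" 2>/dev/null; then
        return 0    # looks like the certificate is already in the bundle
    fi
    # Keep a copy of the previous bundle, if there is one, before changing it.
    cp -p "$bundle" "$bundle.bak" 2>/dev/null
    cat "$cert" >> "$bundle"
}

# Possible usage (as root; bitface-ca.pem is an assumed file name):
#   append_ca_cert bitface-ca.pem /usr/share/ssl/certs/ca-bundle.crt
#   curl --cacert /usr/share/ssl/certs/ca-bundle.crt https://localhost/
```

Passing the bundle explicitly with --cacert makes it clear which CA file curl is verifying against.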

Monitoring a Linux machine with native checks

Using NRPE

To install and use NRPE I followed the NRPE2.0 document. Thanks to Ethan Galstad for writing such a clear introduction!

Issues

  • NRPE configuration: on an SLC4 machine where the FTS service was installed, the configuration failed because the C compiler was missing. I installed it with 'yum install gcc'. The configure script then failed because of missing SSL headers, which I installed with 'yum install openssl-devel'.
  • iptables configuration: if you get the following error when inserting a rule in the iptables chain:
[root@lxbra2310 nrpe-2.12]# iptables -I INPUT -p tcp -m tcp --dport 5666 -j accept
iptables v1.2.11: Couldn't load target `accept':/lib/iptables/libipt_accept.so: cannot open shared object file: No such file or directory
Try `iptables -h' or 'iptables --help' for more information.
you need to change '-j accept' to '-j ACCEPT': iptables target names are case-sensitive.
  • iptables: the Nagios host cannot run check_nrpe against the remote host:
[root@vtb-generic-69 nrpe-2.12]# /usr/local/nagios/libexec/check_nrpe -H 128.142.182.87   
Connection refused by host

After re-executing the previous iptables -I command (with '-j ACCEPT'), the problem disappeared and the remote host is now contacted correctly:

[root@vtb-generic-69 nrpe-2.12]# /usr/local/nagios/libexec/check_nrpe -H 128.142.182.87
NRPE v2.12

I found out that on the monitored machine there is a cron job that runs hourly to maintain a certain firewall configuration, which presumably explains why the rule had to be inserted again. The right setup for a production environment still has to be clarified.
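
Since the rule may have to be re-applied, the insertion can be made safe to repeat. The following is a minimal sketch, not a production firewall setup: the function name open_nrpe_port and the IPTABLES override variable are hypothetical, and the check parses the output of 'iptables -L INPUT -n' because old releases such as v1.2.11 have no rule-check option.

```shell
# Command used to talk to the firewall; overridable for dry runs (assumption).
IPTABLES=${IPTABLES:-iptables}

# Insert an ACCEPT rule for the NRPE port only if no such rule is listed yet.
open_nrpe_port() {
    port=${1:-5666}
    # A matching rule is listed as: ACCEPT  tcp  --  ...  tcp dpt:5666
    if "$IPTABLES" -L INPUT -n | grep -Eq "^ACCEPT .*tcp dpt:${port}( |\$)"; then
        echo "port $port already open"
        return 0
    fi
    # The target name must be upper case: ACCEPT, not accept.
    "$IPTABLES" -I INPUT -p tcp -m tcp --dport "$port" -j ACCEPT
}

# Possible usage on the monitored host (as root):
#   open_nrpe_port 5666
```

Whether the rule survives the hourly cron job depends on how that job rebuilds the firewall, which this sketch does not address.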

After this, I added the check_nrpe!check_load service to the object definition for the remote host, and it worked fine. The service details window in Nagios looks like the following picture: screenshot1.png
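
For reference, the object definitions behind check_nrpe!check_load can be generated as sketched below. The command and service definitions follow the pattern given in the NRPE documentation; the function name write_nrpe_load_check, the output file name and the service description "CPU Load" are illustrative assumptions, and the actual definitions used on this installation may differ.

```shell
# Write a check_nrpe command definition and a check_load service for one host.
# Hypothetical helper; skip the command block if check_nrpe is already defined.
write_nrpe_load_check() {
    host=$1
    outfile=$2
    # \$ keeps the Nagios macros literal; $host is expanded by the shell.
    cat > "$outfile" <<EOF
define command{
        command_name    check_nrpe
        command_line    \$USER1\$/check_nrpe -H \$HOSTADDRESS\$ -c \$ARG1\$
}
define service{
        use                     generic-service
        host_name               $host
        service_description     CPU Load
        check_command           check_nrpe!check_load
}
EOF
}

# Possible usage (nrpe-load.cfg is an assumed file name):
#   write_nrpe_load_check lxbra2310.cern.ch /tmp/nrpe-load.cfg
```

The text after the '!' is passed to the command as $ARG1$, so check_load is the name of the command that NRPE runs on the remote host.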

Using specific tests without proxy

For this test I used the FTS-basic test available from the certification tests repository. The bash script FTS-basic checks the host, the Tomcat server and the LDAP server. The test script was copied to the /tmp directory and assigned to the group nagioscmd.

The object file created to manage the FTS host checks is fts32.cfg.
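
Nagios interprets the exit status of a check as 0 = OK, 1 = WARNING, 2 = CRITICAL and 3 = UNKNOWN. When a test script was not written as a Nagios plugin, a thin wrapper can map its result onto these codes. The sketch below is an illustration only: the function name run_as_nagios_check is hypothetical, and it assumes that the wrapped script exits with 0 on success and prints its verdict on the last line, which I have not verified for FTS-basic.

```shell
# Run a test script and translate its result into Nagios plugin conventions.
# Hypothetical wrapper; the exit-code mapping is an assumption about the script.
run_as_nagios_check() {
    script=$1
    shift
    if [ ! -x "$script" ]; then
        echo "UNKNOWN: $script not found or not executable"
        return 3
    fi
    output=$("$script" "$@" 2>&1)
    status=$?
    # Use the last output line of the script as the one-line summary.
    summary=$(printf '%s\n' "$output" | tail -n 1)
    if [ "$status" -eq 0 ]; then
        echo "OK: $summary"
        return 0
    fi
    echo "CRITICAL: $summary (exit status $status)"
    return 2
}

# Possible usage:
#   run_as_nagios_check /tmp/FTS-basic; exit $?
```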

Using specific tests that require a proxy

Most of the tests used to monitor grid services need a VOMS proxy in order to execute commands from a UI. As a first test, I used the FTS-service script, which checks some FTS properties using the CLI and therefore needs a valid proxy. The proxy file was created using the nagios account (test_user key/cert owned by nagios):
[root@vtb-generic-69 ~]# ls -ltr /tmp/x509up_u100
-rw-------  1 nagios nagios 6415 May 28 10:55 /tmp/x509up_u100
The test script is in /tmp. After testing the script manually from the nagios account, I updated the object file fts32.cfg to include:
define command{
        command_name    FTS-services
        command_line     /tmp/FTS-services --site cert-tb-cern --fts $HOSTADDRESS$ --bdii lxbra2305.cern.ch
}
define service{
        use generic-service
        host_name       lxbra2310.cern.ch
        service_description      FTS service checks
        check_command   FTS-services
}

The following screenshot shows the successful execution of the check: Screenshot-3.png

In a production installation, the proxy used by Nagios has to be renewed periodically. NCG (see below) provides a script to do this using the MyProxy server.
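
Independently of how the renewal is done, it can help to check how long the current proxy remains valid. The following is a rough sketch that assumes an openssl version supporting the -checkend option of 'openssl x509'; the function name proxy_valid_for is hypothetical and this is not the NCG renewal script.

```shell
# Succeed (status 0) if the proxy file stays valid for at least N more seconds.
# Hypothetical helper; relies on 'openssl x509 -checkend'.
proxy_valid_for() {
    proxy=$1
    seconds=${2:-3600}
    if [ ! -r "$proxy" ]; then
        echo "proxy $proxy missing or unreadable" >&2
        return 2
    fi
    # openssl reads the first certificate in the file, which in a proxy file
    # is the proxy certificate itself.
    openssl x509 -in "$proxy" -noout -checkend "$seconds" >/dev/null 2>&1
}

# Possible usage from the nagios account:
#   proxy_valid_for /tmp/x509up_u100 3600 || echo "proxy needs renewal"
```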

Using NCG

NCG is the Nagios configuration generator. It reads site-specific information from a BDII and produces Nagios configuration files to monitor the resources published in the BDII for that site. The NCG installation is described here: GridMonitoringNcgYaim. I tried the NCG installation on a new virtual machine, vtb-generic-95.

Issues

  • The yaim function config_nrpe_share fails when NCG_NRPE_OUTPUT_DIR is not set.
  • The default ncg.conf works, but to automatically add hosts found in the BDII you have to set ADD_HOST=1 in the NCG::SiteInfo::LDAP module and then restart ncg.pl and the Nagios daemon.

At this point Nagios shows in the web interface all the hosts found in the BDII with the CERN site name: Screenshot2.png

Tech Corner

This section gathers technical notes and tips about Nagios that I picked up while reading various docs.

  • A service, in Nagios language, is always a (host, service) pair. Therefore, you can have two service definitions with the same name on different hosts.
  • The file resource.cfg, readable only by nagios, is a good place to store passwords defined as macros.
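
As an illustration of the resource.cfg tip, user macros are written as $USERn$=value lines and can then be referenced as $USERn$ in command definitions. The helper below is a minimal sketch: the function name add_user_macro, the macro number and the sample value are assumptions, and on a real installation the file should also be owned by the nagios user.

```shell
# Append a $USERn$ macro to a resource file and restrict it to its owner.
# Hypothetical helper; run it as the user that should own the file.
add_user_macro() {
    file=$1
    n=$2
    value=$3
    # Create the file if needed and make it readable by its owner only.
    touch "$file" && chmod 600 "$file" || return 1
    # Single quotes keep $USER literal; %s is filled in by printf.
    printf '$USER%s$=%s\n' "$n" "$value" >> "$file"
}

# Possible usage (path and password are placeholders):
#   add_user_macro /usr/local/nagios/etc/resource.cfg 3 'my_password'
# A command_line can then use $USER3$ instead of the clear-text password.
```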

-- GianniPucciani - 07 Apr 2009

Topic attachments
Attachment         History  Size    Date                Who             Comment
Screenshot-3.png   r1       55.8 K  2009-05-28 - 11:08  GianniPucciani
Screenshot2.png    r1       87.3 K  2009-04-21 - 16:58  GianniPucciani
fts32.cfg          r1       1.2 K   2009-04-21 - 13:35  GianniPucciani
screenshot1.png    r1       61.3 K  2009-04-20 - 17:11  GianniPucciani  service details nagios screenshot
Topic revision: r15 - 2009-05-29 - GianniPucciani
 