Yaim Based Installation of Nagios & NCG
NCG Overview
An overview of NCG is provided at
GridMonitoringNcgOverview. This document describes an automated installation based on
YAIM and yum.
Tutorial
Configuring the repositories
In order to install via YAIM, you need to add some yum repositories.
SL5 is now the only maintained version. These are:
Requirements
You need
- a host certificate in order to secure the Nagios web portal.
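As a rough pre-flight check for the requirement above, the sketch below tests that a host certificate is readable and not about to expire. The default path /etc/grid-security/hostcert.pem is the conventional grid location and is an assumption here, as is the 7-day threshold; the function name check_host_cert is hypothetical.

```shell
#!/bin/sh
# Sketch: verify that a host certificate exists and is not close to expiry.
# The default path is an assumption (conventional grid location);
# pass another path as the first argument if yours differs.
check_host_cert() {
    cert="${1:-/etc/grid-security/hostcert.pem}"
    if [ ! -r "$cert" ]; then
        echo "missing or unreadable certificate: $cert" >&2
        return 1
    fi
    # -checkend N exits non-zero if the certificate expires within N seconds.
    if ! openssl x509 -in "$cert" -noout -checkend 604800 >/dev/null; then
        echo "certificate invalid or expiring within 7 days: $cert" >&2
        return 2
    fi
    # Print the subject DN and expiry date for a visual check.
    openssl x509 -in "$cert" -noout -subject -enddate
}
```

Calling check_host_cert with no arguments checks the assumed default location; a non-zero return value means the certificate should be fixed before continuing.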
Installing packages
Once this is done, you can install by running
yum install httpd && yum install egee-NAGIOS lcg-CA
The explicit httpd install is needed because httpd must be installed before the nagios RPM; the nagios RPM as supplied by DAG is missing an RPM prerequisite.
All On One Box
- yum install httpd
- yum install glite-UI or yum groupinstall 'glite-UI (production - x86_64)'
- yum install lcg-CA egee-NAGIOS
- vi site-info.def # Configure with the parameters below.
- /opt/glite/yaim/bin/yaim -s /root/site-info.def -c -n glite-UI -n glite-NAGIOS
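The steps above can be sketched as one script. This is only an illustration: the helper names run and install_all_on_one_box and the DRY_RUN switch are hypothetical, and DRY_RUN defaults to 1 so that the commands are printed rather than executed. It assumes site-info.def has already been edited.

```shell
#!/bin/sh
# Sketch of the "All On One Box" steps as a single script.
# DRY_RUN=1 (the default) only prints each command; DRY_RUN=0 executes it.
SITE_INFO="${SITE_INFO:-/root/site-info.def}"

run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

install_all_on_one_box() {
    # httpd goes first because it must be installed before the nagios RPM.
    run yum install httpd || return 1
    # Alternative for this step: yum groupinstall 'glite-UI (production - x86_64)'
    run yum install glite-UI || return 1
    run yum install lcg-CA egee-NAGIOS || return 1
    # Configure both node types from the same site-info.def.
    run /opt/glite/yaim/bin/yaim -s "$SITE_INFO" -c -n glite-UI -n glite-NAGIOS
}
```

Calling install_all_on_one_box with DRY_RUN unset lists the four commands in order, which is a cheap way to review them before running with DRY_RUN=0 as root.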
On Two Boxes
- Box 1, NAGIOS Host.
  - yum install httpd
  - yum install egee-NAGIOS lcg-CA
  - /opt/glite/yaim/bin/yaim -s /root/site-info.def -c -n glite-NAGIOS
- Box 2, UI and NRPE Node.
  - SL4: yum install lcg-CA glite-UI egee-NRPE
  - SL5: yum install lcg-CA egee-NRPE && yum groupinstall 'glite-UI (production - x86_64)'
  - /opt/glite/yaim/bin/yaim -s /root/site-info.def -c -n glite-NRPE -n glite-UI
If you plan to have an NRPE node running the local tests on an existing UI, then install that with
yum install egee-NRPE
YAIM's site-info.def File
The configuration requires you to set the following variables in the
YAIM site-info.def file:
| Variable | Description | Example |
| INSTALL_ROOT | Location of grid middleware. | /opt |
| SITE_NAME | The site you wish to monitor. | MY-SITE |
| NAGIOS_HOST | Nagios hostname. | nagios.example.org |
| NCG_NRPE_UI | UI hostname for running NRPE. Set this only if using a remote UI; if the UI is on the local box, leave it unset. | ui.example.org |
| PX_HOST | MyProxy server from which to retrieve a certificate to run local tests under. | myproxy.example.org |
| SITE_BDII_HOST | The site BDII for the monitored site, SITE_NAME. | site-bdii.example.org |
| BDII_HOST | A top-level BDII that you can use. | e.g. lcg-bdii.cern.ch |
| VOS | A list of VOs that can view the Nagios information. | "ops dteam alice" |
| VO_<VONAME>_VOMS_SERVERS | URI for the VOMS service. | vomss://voms.cern.ch:8443/voms/ops?/ops/ |
| NAGIOS_ADMIN_DNS | Comma-separated list of local admin DNs that can perform actions via the Nagios web interface. | "/DC=ch/OU=Users/CN=Dr Kildare,/DC=ch/OU=User/CN=Dr Who" |
| MYSQL_ADMIN | Root password of MySQL. | Unset by default. If this is set, MySQL will be configured on the localhost to support NDOUtils. If it is not set, MySQL and the NDOUtils schema must be loaded by hand outside of YAIM. The easiest option is to set this to a string of your choice. |
| NAGIOS_NSCA_PASS | The shared secret used by NSCA for sending results back to the Nagios server. | tomato |
| ATP_DB_PASS | The MySQL password for the ATP database. | lemon |
| ATP_DB_NAME | The database name for the ATP database. | atp |
| MS_DB_PASS | The MySQL password for the Metric Store database. | lemon |
| MS_DB_NAME | The database name for the Metric Store database. | metricstore |
| MDDB_DB_PASS | The MySQL password for the MDDB database. | lemon |
| MDDB_DB_NAME | The database name for the MDDB database. | mddb |
| MYEGEE_DB_TYPE | Type of the DB (Pdo_Mysql/Oracle). | Pdo_Mysql |
| MYEGEE_DB_USER | The database username for the MyEGEE portal. | lemon |
| MYEGEE_DB_PASS | The database password for the MyEGEE portal. | lemon |
| MYEGEE_DB_SCHEMA | The database schema of the MyEGEE portal. | atp |
| MYEGEE_DB_ATP | The ATP database name of the MyEGEE portal. | atp |
| MYEGEE_DB_MS | The Metric Store database name of the MyEGEE portal. | metricstore |
| MYEGEE_DB_MDDB | The Metric Description database name of the MyEGEE portal. | mddb |
| MYEGEE_DB_HOST | The MySQL hostname of the MyEGEE portal. It is only needed when used with MySQL. | localhost |
| MYEGI_ADMIN_NAME | System administrator name (optional). | David Horat |
| MYEGI_ADMIN_EMAIL | System administrator email (optional). | example@example.com |
| MYEGI_DEFAULT_PROFILE | Default Nagios profile (default: ROC_CRITICAL). | ROC_CRITICAL |
| MYEGI_DATABASE_ENGINE | Type of DB (values: mysql/oracle; default: mysql). | mysql |
| MYEGI_DATABASE_USER | The database username for the MyEGI portal (default: myegi). | myegi |
| MYEGI_DATABASE_NAME | The database name used by the MyEGI portal (default: mrs). | mrs |
| MYEGI_DATABASE_PASSWORD | The database password for the MyEGI portal (mandatory). | lemon |
| MYEGI_DATABASE_HOST | The database hostname (default: localhost). | localhost |
| MYEGI_DATABASE_PORT | The database port (optional). | 3306 |
| MYEGI_DEBUG | Turn debug mode for MyEGI on or off (values: True/False; default: False). | False |
| ROC_NAME | Your ROC name; use GOCDB names. Only compulsory if you are a ROC. Cannot contain multiple values. In case of multiple ROCs, remove ROC_NAME and use NCG_GOCDB_ROC_NAME. | no default |
| ATP_WEB_SECRET_KEY | A secret key string based on a UUID for the ATP web front-end. | no default |
| NCG_MDDB_SUPPORTED_PROFILES | List of supported profiles; MRS needs it to know the profiles for status recalculations. | ROC,ROC_CRITICAL, ROC_OPERATORS |
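For orientation, here is a minimal sketch of what part of a site-info.def could look like, using only the example values from the table above. It is deliberately a subset, all values are placeholders, and the uppercase VO name in VO_OPS_VOMS_SERVERS is an assumption about how <VONAME> is expanded. The check_required helper is hypothetical and not part of YAIM; it is included only to show one way of catching unset variables before running YAIM.

```shell
# Sketch of a site-info.def subset; every value is a placeholder from the table.
INSTALL_ROOT=/opt
SITE_NAME=MY-SITE
NAGIOS_HOST=nagios.example.org
PX_HOST=myproxy.example.org
SITE_BDII_HOST=site-bdii.example.org
BDII_HOST=lcg-bdii.cern.ch
VOS="ops dteam alice"
# Assumption: <VONAME> is written in upper case.
VO_OPS_VOMS_SERVERS="vomss://voms.cern.ch:8443/voms/ops?/ops/"
NAGIOS_ADMIN_DNS="/DC=ch/OU=Users/CN=Dr Kildare,/DC=ch/OU=User/CN=Dr Who"
MYSQL_ADMIN="a-string-of-your-choice"
NAGIOS_NSCA_PASS="tomato"
MYEGI_DATABASE_PASSWORD="lemon"

# Hypothetical helper (not part of YAIM): report any named variable that is empty.
check_required() {
    missing=0
    for name in "$@"; do
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            echo "site-info.def: $name is not set" >&2
            missing=1
        fi
    done
    return "$missing"
}

check_required INSTALL_ROOT SITE_NAME NAGIOS_HOST PX_HOST SITE_BDII_HOST \
    BDII_HOST VOS NAGIOS_ADMIN_DNS NAGIOS_NSCA_PASS MYEGI_DATABASE_PASSWORD
```

In practice the assignments would live in /root/site-info.def and the check in a separate script that sources it; they are combined here only to keep the example self-contained.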
The following variables have defaults but you may well want to change them.
| Variable | Default | Description |
| NCG_INCLUDE_EMPTY_HOSTS | 1 | Show hosts without associated services. |
| NCG_ENABLE_NOTIFICATIONS (TODO) | 0 | If set to true, Nagios will be configured to send notifications. |
| NCG_NAGIOS_ADMIN (TODO) | root@localhost | Email address which will receive notifications for Nagios internal checks (e.g. GridProxy-Get, GridProxy-Valid, MyProxy-ProxyLifetime, org.egee.SendToMsg, etc.). |
| NAGIOS_MYPROXY_USER | nagios | The MyProxy username, i.e. the -l option to myproxy-init and myproxy-login. |
| MSG_BROKER_CACHE_NETWORK | PROD | The broker service to look for in the information system. |
| MSG_BROKER_CACHE_HOST | null | The broker hostname to hard-code. Setting this disables the variable MSG_BROKER_CACHE_NETWORK and auto-discovery of the broker. |
| NAGIOS_HTTPD_ENABLE_CONFIG | false | Set true to update the Apache configuration for X509 authentication. This will overwrite /etc/httpd/conf.d/nagios.conf and ssl.conf. If you don't do this, you will have to configure Apache by hand for X509 certificate authentication. |
| NAGIOS_NCG_ENABLE_CRON (TODO) | false | Set true for YAIM to enable the ncg cron job, which reruns ncg.pl every 3 hours. |
| NAGIOS_NCG_ENABLE_CONFIG | false | Set true for YAIM to write /etc/ncg/ncg.conf and execute ncg.pl for you. |
| NAGIOS_SUDO_ENABLE_CONFIG | false | Set true for YAIM to append to /etc/sudoers to allow nagios to call certain probes as root. |
| NAGIOS_NAGIOS_ENABLE_CONFIG | false | Set true for YAIM to write /etc/nagios/nagios.cfg for you and reload Nagios. |
| NAGIOS_CGI_ENABLE_CONFIG | false | Set true for YAIM to write /etc/nagios/cgi.cfg for you. |
| NCG_LDAP_FILTER | "" | If set, your Nagios will not monitor the SITE_NAME specified above but will instead query the top-level BDII for GlueSite objects that match this filter. E.g. to monitor all sites maintained under the Italian ROC, set this value to GlueSiteOtherInfo=EGEE_ROC=ITALY. |
| NCG_GOCDB_ROC_NAME | "" | Set this to a GOCDB ROC name to collect the list of sites within that ROC from GOCDB, e.g. CERN. For multiple ROCs, set a space-separated list. |
| NCG_GOCDB_COUNTRY_NAME | "" | Set this to a GOCDB country name to collect the list of sites within that country from GOCDB. |
| NCG_TOPOLOGY_USE_SAM | false | If true, uses SAM for getting services. |
| NCG_TOPOLOGY_USE_GOCDB | true | If true, uses GOCDB for getting services. |
| NCG_TOPOLOGY_USE_ENOC | true | If true, uses ENOC for getting services. |
| NCG_TOPOLOGY_USE_LDAP | true | If true, uses LDAP for getting sitenames. |
| NCG_REMOTE_USE_SAM | true | If true, show SAM remote results in Nagios. |
| NCG_REMOTE_USE_NAGIOS | false | If true, show project or ROC remote results in Nagios. |
| NCG_REMOTE_USE_ENOC | true | If true, show ENOC (DownCollector) remote results in Nagios. |
| NAGIOS_ROLE | site | One of Site, ROC, Project or VO; denotes whether this Nagios acts in a site, ROC or project level monitoring role. |
| NCG_PROBES_TYPE | remote,local,all | Defines which type of probes should be configured. Local probes are probes executed by the Nagios. Remote probes are probes imported from external systems (e.g. SAM, remote Nagios, ENOC DownCollector). Default is all. |
| NCG_VO | dteam | List of VOs the tests should run as. A space-separated list, e.g. "dteam cms lhcb". You must have a member of each VO willing to store a proxy for your retrieval. |
| GGUS_SERVER_FQDN | null | The hostname of the GGUS endpoint. Setting this also opens GGUS tickets for service notifications. |
| ATP_WEB_DB_USER | no default | User name for the ATP database. |
| ATP_WEB_DB_PASS | lemon | Password for the ATP database. |
| ATP_WEB_DB_NAME | no default | Name of the ATP database. |
| ATP_WEB_DB_ENGINE | oracle | Database engine for the ATP database. |
| ATP_WEB_DEBUG | false | Debug flag for the ATP web front-end. |
| ATP_WEB_TEMPLATE_DEBUG | false | Template debug flag for the ATP web front-end. |
| ATP_WEB_VIEW_TEST | false | Functional-test flag for the ATP web front-end. |
| ATP_WEB_INTERNAL_IPS | 127.0.0.1 | Internal IP setting for the ATP web front-end. |
| ATP_WEB_SERVER_EMAIL | root@localhost | Server email address for the ATP web front-end. |
| ATP_WEB_EMAIL_HOST | localhost | Email host for the ATP web front-end. |
| ATP_WEB_PREFIX | localhost | Server prefix for the ATP web front-end. |
The following variables are optional and have no default.
| Variable | Default | Description |
| VO_<VONAME>_NCG_VO_FQAN | no default | A comma-separated list of VOMS FQANs to run metrics as for a given VO. You must have a member of each VO with the appropriate FQAN willing to store a proxy for your retrieval. |
Defining a Set of Sites
Note that setting more than one of NCG_LDAP_FILTER, NCG_GOCDB_ROC_NAME or NCG_GOCDB_COUNTRY_NAME will give the union of the sites.
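As an illustration of the union behaviour, the site-info.def fragment below sets two of the three selectors, reusing the example values given in the variable descriptions. The preview_ldap_sites helper is hypothetical and rests on assumptions not stated in this document: that the top-level BDII answers LDAP on port 2170 with base o=grid, and that site names are published as GlueSiteName.

```shell
# site-info.def fragment (sketch): monitor the union of two site selections.
NCG_LDAP_FILTER="GlueSiteOtherInfo=EGEE_ROC=ITALY"   # sites matched in the top-level BDII
NCG_GOCDB_ROC_NAME="CERN"                            # plus the sites of this GOCDB ROC

# Hypothetical helper: list the sites the LDAP filter would match.
# Assumptions: BDII on port 2170, search base o=grid, attribute GlueSiteName.
preview_ldap_sites() {
    bdii="${1:-lcg-bdii.cern.ch}"
    ldapsearch -x -LLL -H "ldap://$bdii:2170" -b o=grid \
        "(&(objectClass=GlueSite)($NCG_LDAP_FILTER))" GlueSiteName
}
```

Running preview_ldap_sites by hand before YAIM gives a rough idea of how many sites the filter adds; the GOCDB part of the union is not covered by this check.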
Co-Existing with Existing Nagios and/or Apache
The 4 variables NAGIOS_*_ENABLE_CONFIG above are false by default but can be set to true, resulting in more configuration being done for you. Setting all of them to true is the easiest option but may clobber any existing work. When merging with an existing configuration of Nagios or Apache, you may wish to leave some or all of them as false.
| Variable | Files that will be Edited or Replaced by YAIM | Actions |
| NAGIOS_HTTPD_ENABLE_CONFIG | /etc/httpd/conf.d/nagios.conf & ssl.conf | apache will be reloaded |
| NAGIOS_NCG_ENABLE_CONFIG | /etc/ncg/ncg.conf | ncg.pl will be executed, creating /etc/nagios/wlcg.d/* and /etc/nagios/nrpe/*. |
| NAGIOS_NAGIOS_ENABLE_CONFIG | /etc/nagios/nagios.cfg | nagios will be reloaded |
| NAGIOS_CGI_ENABLE_CONFIG | /etc/nagios/cgi.cfg | No actions |
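As an example of a mixed setting, the fragment below is a sketch for a host whose Apache configuration is already maintained by hand while Nagios itself is new. Which combination is right depends entirely on the existing installation; this particular choice is only an illustration.

```shell
# site-info.def fragment (sketch): keep the hand-maintained Apache setup,
# let YAIM manage everything else.
NAGIOS_HTTPD_ENABLE_CONFIG=false    # leave nagios.conf and ssl.conf untouched
NAGIOS_NCG_ENABLE_CONFIG=true       # write /etc/ncg/ncg.conf and run ncg.pl
NAGIOS_NAGIOS_ENABLE_CONFIG=true    # write /etc/nagios/nagios.cfg and reload nagios
NAGIOS_CGI_ENABLE_CONFIG=true       # write /etc/nagios/cgi.cfg
```

With NAGIOS_HTTPD_ENABLE_CONFIG left false, X509 certificate authentication in Apache has to be configured by hand, as noted in the variable description.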
The following configuration files are edited by YAIM even if all of the above are set to false. If this is a problem for your existing installation, please let us know.
| Will be altered with every run of YAIM |
| /etc/nagios/ndo2db.cfg |
| /etc/voms2htpasswd.conf |
| /etc/nagios-proxy-refresh.conf |
| /etc/httpd/conf.d/nrpe.conf |
| /etc/broker-cache-file.conf |
| /etc/sysconfig/msg-to-queue |
| /etc/nagios/nsca.cfg |
| /etc/nagios/send_nsca.cfg |
| /etc/ncg/ncg-localdb.d/yaim-nagios-to-msg-queue.conf |
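Since these files are rewritten on every run, it may be worth copying them aside first when an installation already exists. The following is a minimal sketch; the function name backup_yaim_files, its optional root prefix and the default backup directory under /root are all hypothetical choices for illustration.

```shell
#!/bin/sh
# Sketch: copy the files that YAIM always rewrites into a backup directory.
YAIM_ALWAYS_EDITED="
/etc/nagios/ndo2db.cfg
/etc/voms2htpasswd.conf
/etc/nagios-proxy-refresh.conf
/etc/httpd/conf.d/nrpe.conf
/etc/broker-cache-file.conf
/etc/sysconfig/msg-to-queue
/etc/nagios/nsca.cfg
/etc/nagios/send_nsca.cfg
/etc/ncg/ncg-localdb.d/yaim-nagios-to-msg-queue.conf
"

backup_yaim_files() {
    root="${1:-}"    # optional path prefix; empty means the real filesystem
    dest="${2:-/root/yaim-backup-$(date +%Y%m%d%H%M%S)}"
    for f in $YAIM_ALWAYS_EDITED; do
        # Only files that currently exist are copied, preserving their layout.
        if [ -f "$root$f" ]; then
            mkdir -p "$dest$(dirname "$f")"
            cp -p "$root$f" "$dest$f"
        fi
    done
    # Print the backup location so callers can record it.
    echo "$dest"
}
```

Running backup_yaim_files without arguments just before YAIM leaves a timestamped copy to diff against afterwards.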
The following variables have defaults, and can be changed if you have a non-standard installation of nagios or httpd.
| Variable | Default | Description |
| NCG_MAIN_DB_FILE | /etc/ncg/ncg.localdb | Location of your local configurations for NCG. |
| NCG_TEMPLATES_DIR | /usr/share/grid-monitoring/config-gen/nagios | The location of the NCG configuration templates. |
| NCG_OUTPUT_DIR | /etc/nagios/wlcg.d | Where the Nagios configuration files for the server will be generated. |
| NCG_NRPE_OUTPUT_DIR | /etc/nagios/nrpe/ | Where the NRPE configuration files will be generated. |
| NAGIOS_HTPASSWD_FILE | /etc/nagios/htpasswd.users | Location of the allowed users for the Nagios web portal. |
| NAGIOS_DB_HOST (removed) | localhost | Hostname of MySQL for NDOUtils; localhost makes everything very easy. |
| NAGIOS_DB_USER (removed) | ndouser | MySQL user name for NDOUtils. |
| NAGIOS_DB_PASS (removed) | ndopassword | Can be left alone, especially if MySQL is on localhost, though you may wish to change it. |
Configuration
To configure, you just need to run YAIM. There are two deployment options to consider: all on one box, or on two boxes, as described above.
Once this has completed successfully, you should be able to browse the Nagios web portal at
https://SERVER_NAME/nagios/
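A quick command-line check of the portal can be sketched as below. The certificate and CA paths are assumptions (the usual grid user and CA locations), and the helper names nagios_portal_url and check_portal are hypothetical; curl will prompt for the key pass phrase if the key is encrypted.

```shell
#!/bin/sh
# Sketch: build the portal URL and probe it with a client certificate.
nagios_portal_url() {
    echo "https://$1/nagios/"
}

check_portal() {
    host="$1"
    # Assumed locations of the user certificate, key and trusted CAs.
    cert="${2:-$HOME/.globus/usercert.pem}"
    key="${3:-$HOME/.globus/userkey.pem}"
    # Print only the HTTP status code of the response.
    curl --silent --output /dev/null --write-out '%{http_code}\n' \
        --cert "$cert" --key "$key" \
        --capath /etc/grid-security/certificates \
        "$(nagios_portal_url "$host")"
}
```

For example, check_portal nagios.example.org prints the HTTP status code returned by the portal, or 000 if the connection could not be established.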
You may find that your web server hangs connections because of BUG:48458. If this is the case, installing the dummy-ca-certs package and restarting apache should resolve this.
Security
If you use local probes, these require a valid proxy certificate, which is obtained from a MyProxy service. The host running the local probes (the NRPE node) must be allowed to retrieve a valid proxy, i.e. MyProxy should have a YAIM configuration of at least:
PX_HOST=myproxy.example.ch
GRID_TRUSTED_RETRIEVERS="'/DC=ch/DC=cern/OU=computers/CN=nrpe-ui.example.ch'"
GRID_AUTHORIZED_RETRIEVERS="'/DC=ch/DC=cern/OU=computers/CN=nrpe-ui.example.ch'"
SITE_EMAIL="mybigsite@example.ch"
Finally, from a UI somewhere, you must upload a proxy to MyProxy that can be retrieved by the NRPE node.
$ myproxy-init -c 336 -k NagiosRetrieve-nrpe-ui.example.ch-dteam -s myproxy.example.org \
-l nagios -x -Z "/DC=ch/OU=computers/CN=nrpe-ui.example.ch"
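Because a proxy has to be stored for every VO listed in NCG_VO, it can help to generate the commands rather than type them. The sketch below only prints them; the credential name pattern NagiosRetrieve-<NRPE host>-<VO> is inferred from the single example above and is an assumption, as is the helper name myproxy_init_cmd. Each printed command would be run by a member of the corresponding VO.

```shell
#!/bin/sh
# Sketch: print one myproxy-init command per VO in NCG_VO.
# Assumption: credential names follow NagiosRetrieve-<nrpe host>-<vo>,
# as in the single dteam example.
myproxy_init_cmd() {
    vo="$1"; nrpe_host="$2"; px_host="$3"; retriever_dn="$4"
    printf 'myproxy-init -c 336 -k NagiosRetrieve-%s-%s -s %s -l nagios -x -Z "%s"\n' \
        "$nrpe_host" "$vo" "$px_host" "$retriever_dn"
}

NCG_VO="${NCG_VO:-dteam}"
for vo in $NCG_VO; do
    myproxy_init_cmd "$vo" nrpe-ui.example.ch myproxy.example.org \
        "/DC=ch/OU=computers/CN=nrpe-ui.example.ch"
done
```

The -l value corresponds to NAGIOS_MYPROXY_USER, so it would need to change if that variable is not left at its default of nagios.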
Information System
Nagios now publishes important information about itself into the information system. Please add the GRIS running on the Nagios node to your site BDII.
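If the site BDII is itself configured with YAIM, adding the Nagios GRIS might look like the sketch below. Everything here is an assumption rather than something stated in this document: the BDII_REGIONS and BDII_<REGION>_URL variables, the region name NAGIOS, the existing CE and SE regions, port 2170 and the mds-vo-name=resource,o=grid base. Check them against your site BDII setup before use.

```shell
# site-info.def fragment for the site BDII host (sketch, see assumptions above).
NAGIOS_HOST=nagios.example.org
# Append a hypothetical NAGIOS region to whatever regions are already defined.
BDII_REGIONS="CE SE NAGIOS"
BDII_NAGIOS_URL="ldap://${NAGIOS_HOST}:2170/mds-vo-name=resource,o=grid"
```

After changing these values, the site BDII would need to be reconfigured for the new region to be picked up.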