Slide 1: YAIM, NAGIOS and NCG Tutorial Outline
- Introduction
- Deployment scenarios, in order of increasing complexity:
- One NAGIOS Host - remote and native probes.
- MyProxy and Proxy Retrieval.
- One NAGIOS Host + One NRPE Host - remote, native and local probes.
- Meta Packages, YAIM variables and targets.
- Live install and configuration of YAIM, NCG and NAGIOS.
- Troubleshooting probe failures.
- Disabling probes.
- Tuning probes.
- Enabling notifications.
- Message bridge.
Slide 2: Introduction
- gLite-style meta packages egee-NAGIOS and egee-NRPE exist.
- The YAIM module glite-yaim-nagios has been written.
- Configures NAGIOS to test your Site using NCG utilities.
- YAIM configures:
- NAGIOS and NCG for a deployment against a single site.
- Single site is defined by a siteBDII endpoint.
- Especially useful if you have limited or no NAGIOS experience.
- The required software is in an RPM repository hosted at Manchester for SA1.
- Prerequisites:
- The DAG repository must be enabled for nagios, Perl modules, etc.
- A vaguely working site.
- You must have a siteBDII populated to some extent.
- Small amount of hardware, SL4, i386 or x86_64.
- Testing against a Tier-1 completed on a 256 MB virtual machine for me.
- X509 host certificates.
- lcg-CA Certificate Authority Set.
- Access to the SAM tests database.
Slide 3: Repository Information
- SA1 package repository is hosted in Manchester.
- To enable it, add the following egee-SA1.repo file to /etc/yum.repos.d/:
[egee-SA1]
name=EGEE SA1 software
baseurl=http://www.sysadmin.hep.ac.uk/rpms/egee-SA1/sl4/$basearch/
enabled=1
[egee-SA1-SRPMS]
name=EGEE SA1 SRPMS
baseurl=http://www.sysadmin.hep.ac.uk/rpms/egee-SA1/sl4/SRPMS/
enabled=1
- Source RPMs are available for all packages.
- createrepo metadata is generated for the SRPMS packages as well.
- Repoview HTML pages are available in the repodata directory.
- Change logs are being added to each package's RPM %changelog.
Slide 4: Recap of Native, Remote and Local Probes.
- Remote Probes
- Collect results from tests executed elsewhere, e.g. from SAM (gather_sam) or from ENOC (gather_npm).
- Native Probes
- Use standard NAGIOS probes, e.g. check_ldap, check_tcp, ...
- Local Probes
- Use probes written in the WLCG format, e.g. the LFC-probe is executed with the check_wlcg NAGIOS probe.
- Native vs Local Probes
- Generally "local" probes require a proxy whilst "native" probes are network-level checks.
This is only a rule of thumb, NOT a strict definition.
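To make "network-level check" concrete, here is a minimal sketch of a native-style probe in the spirit of check_tcp. It is not the real check_tcp: the names check_tcp and main are hypothetical, and it only shows the NAGIOS plugin convention of printing one status line and returning 0 (OK), 2 (CRITICAL) or 3 (UNKNOWN).

```python
import socket

# Standard NAGIOS plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def check_tcp(host, port, timeout=10.0):
    """Try a plain TCP connect and return (exit_code, status_line)."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return OK, f"TCP OK - connected to {host}:{port}"
    except OSError as exc:  # refused, unreachable, timed out, DNS failure
        return CRITICAL, f"TCP CRITICAL - {host}:{port}: {exc}"


def main(argv):
    """Plugin entry point: argv = [host, port]; prints one line, returns the exit code."""
    if len(argv) != 2:
        print("UNKNOWN - usage: check_tcp_sketch HOST PORT")
        return UNKNOWN
    code, line = check_tcp(argv[0], int(argv[1]))
    print(line)
    return code
```

Run as a plugin with `sys.exit(main(sys.argv[1:]))`, e.g. against the R-GMA port 8443 from the diagram. No proxy is involved, which is why such checks can run directly on the NAGIOS host.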
Slide 5: NAGIOS Host with Remote and Native Probes.
digraph G {
graph [bgcolor="#ffff99", rankdir="LR"];
bdii [fillcolor="white", style=filled, label="SiteBDII at your Site.", shape=box, fontname=Courier, fontsize=11];
nagios [fillcolor="white", style=filled, label="Nagios Host at your Site.", shape=box, fontname=Courier, fontsize=11];
sam [fillcolor="white", style=filled, label="SAM Service", shape=box, fontname=Courier, fontsize=11];
rgma [fillcolor="white", style=filled, label="A site service port, e.g R-GMA on 8443.", shape=box, fontname=Courier, fontsize=11];
enoc [fillcolor="white", style=filled, label="ENOC Service", shape=box, fontname=Courier, fontsize=11];
nagios -> sam [label="Remote Probe"];
nagios -> enoc [label="Remote Probe"];
nagios -> rgma [fontcolor="blue", color="blue", label="Native Probe"];
nagios -> bdii [fontcolor="red" , color="red", label="NCG"] ;
{ rank=same; nagios;bdii };
}
- Suitable for a site starting from scratch, very simple.
- yum install httpd
(An annoying bug: httpd should be an RPM PreReq of the nagios RPM.)
- yum install egee-NAGIOS lcg-CA
- /opt/glite/yaim/bin/yaim -c -s /root/site-info.def -n glite-NAGIOS
- NCG, executed by YAIM, queries the siteBDII and remote testers (e.g. SAM, ENOC).
- NCG populates the NAGIOS configuration with hosts and the probes to run against those hosts.
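NCG's output is ordinary NAGIOS object configuration. The sketch below is not NCG; it only illustrates the kind of host and service definitions a generator writes from a list of (host, service, check command) tuples. The template names generic-host and generic-service, and any check command names you pass in, are assumptions.

```python
from collections import defaultdict


def render_nagios_config(services):
    """Render NAGIOS 'define host' / 'define service' blocks.

    services: iterable of (host_name, service_description, check_command).
    Each distinct host gets one host block, followed by its service blocks.
    """
    by_host = defaultdict(list)
    for host, description, command in services:
        by_host[host].append((description, command))

    blocks = []
    for host in sorted(by_host):
        # 'generic-host' / 'generic-service' are assumed template names.
        blocks.append(
            "define host {\n"
            "    use                 generic-host\n"
            f"    host_name           {host}\n"
            f"    address             {host}\n"
            "}\n"
        )
        for description, command in by_host[host]:
            blocks.append(
                "define service {\n"
                "    use                 generic-service\n"
                f"    host_name           {host}\n"
                f"    service_description {description}\n"
                f"    check_command       {command}\n"
                "}\n"
            )
    return "".join(blocks)
```

Grouping by host mirrors how the discovered siteBDII contents map onto NAGIOS: one host object per node, with one service object per probe.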
Comments
The egee-NAGIOS meta package brings in at least the following major packages:
- nagios
- httpd
- mod_ssl: for X509 authentication.
- nagios-plugins
- msg-nagios-bridge: sends NAGIOS notifications out to the wider grid.
- yaim
- voms2htpasswd: maintains an htpasswd file of the DNs, taken from VOMS servers, that have access.
- fetch-crl: maintains CRLs for Apache.
- grid-monitoring-*: WLCG monitoring probes.
- nagios-proxy-refresh: refreshes grid proxies, though it is not enabled in this version, which runs no "local" tests.
Slide 6: YAIM Configuration for NAGIOS Host with Remote and Native Probes.
Variable | Example | Notes
INSTALL_ROOT | /opt | YAIM always needs this.
SITE_NAME | BIGMAN-LCG2 | Needed by NCG.
SITE_BDII_HOST | site-bdii.example.org | Needed by NCG.
PX_HOST | myproxy.example.org | MyProxy host. Required only because of a bug; it should not be needed yet.
VOS | "dteam ops" | List of VOs whose members have read access to the NAGIOS portal.
NAGIOS_HOST | my-nagios.example.org | Host where you are installing NAGIOS.
NAGIOS_ADMIN_DNS | /DC=ch/OU=Users/CN=Dr Kildare,/DC=ch/OU=User/CN=Dr Who | People with write access to the NAGIOS portal (comma-separated DNs).
NAGIOS_HTTPD_ENABLE_CONFIG | true | Your /etc/httpd/conf.d/ssl.conf and nagios.conf will be overwritten.
NAGIOS_MYPROXY_NAME | myproxy_credential_name | MyProxy credential name (the -k option).
NCG_PROBES_TYPE | remote,native | These are the probes we want.
- /opt/glite/yaim/bin/yaim -c -s /root/site-info.def -n glite-NAGIOS
- NCG collects its hosts and services from the BDII endpoint ldap://site-bdii.example.org:2170/mds-vo-name=BIGMAN-LCG2,o=grid.
- The resulting NAGIOS web interface is located at https://my-nagios.example.org/nagios
- It is visible read-only to members of the VOs listed in VOS, with access maintained by voms2htpasswd.
- Write operations, e.g. rerunning a test, are restricted to the DNs you list in NAGIOS_ADMIN_DNS, i.e. your local admins.
- Configuration of the main Apache files, e.g. ssl.conf, is switched off by default.
- We will shortly extend the same concept to /etc/nagios/nagios.cfg as well.
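The BDII endpoint above is built from SITE_BDII_HOST and SITE_NAME. As a sketch only (YAIM itself sources site-info.def with bash, which this does not reproduce), the following reads simple KEY=value lines and assembles that URL, assuming the standard siteBDII port 2170. The function names are hypothetical.

```python
import shlex


def parse_site_info(text):
    """Read simple KEY=value lines; a tiny subset of what bash would accept.

    No variable expansion ($INSTALL_ROOT etc.) is attempted.
    """
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, rhs = line.partition("=")
        # shlex strips shell quoting (VOS="dteam ops" -> dteam ops) and trailing comments.
        values[key.strip()] = " ".join(shlex.split(rhs, comments=True))
    return values


def site_bdii_url(values, port=2170):
    """Build the siteBDII LDAP URL that NCG queries for this site."""
    return (f"ldap://{values['SITE_BDII_HOST']}:{port}/"
            f"mds-vo-name={values['SITE_NAME']},o=grid")
```

With the example values from the table, this yields ldap://site-bdii.example.org:2170/mds-vo-name=BIGMAN-LCG2,o=grid, which is handy for checking by hand (e.g. with ldapsearch) that your siteBDII answers before running YAIM.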
Comments
- YAIM with the glite-NAGIOS target configures NCG, NAGIOS and Apache, creating or editing the following files:
- /etc/nagios/nagios.cfg: used by NAGIOS. Warning: this file is always overwritten, which is most likely wrong.
- /etc/ncg/ncg.conf: main NCG configuration file. Always overwritten.
- /etc/httpd/conf.d/ssl.conf: overwritten if enabled. Sets up X509 client certificate authentication.
- /etc/httpd/conf.d/nagios.conf: overwritten if enabled. Sets up X509 client certificate authentication.
- /etc/voms2htpasswd.conf
- /etc/nagios/cgi.cfg: edited by YAIM, which sets '*' for read operations and so relies on Apache's FakeBasicAuth. Write privileges are populated with the admin DNs above.
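A sketch of what such a cgi.cfg edit might look like, assuming the standard NAGIOS authorized_for_* directives; the exact set of directives YAIM touches is not stated on the slide, and cgi_auth_lines is a hypothetical name. With FakeBasicAuth the user name Apache hands to NAGIOS is the client certificate DN, so '*' (any authenticated user) gives read access to everyone who passed the htpasswd check.

```python
def cgi_auth_lines(admin_dns):
    """Turn a comma-separated NAGIOS_ADMIN_DNS value into cgi.cfg directives."""
    admins = [dn.strip() for dn in admin_dns.split(",") if dn.strip()]
    writers = ",".join(admins)
    return [
        # Read access: '*' means any user the web server has authenticated.
        "authorized_for_all_services=*",
        "authorized_for_all_hosts=*",
        # Write access (rerun checks, acknowledge, etc.): admin DNs only.
        f"authorized_for_all_service_commands={writers}",
        f"authorized_for_all_host_commands={writers}",
        f"authorized_for_system_commands={writers}",
    ]
```

Note that both NAGIOS_ADMIN_DNS and cgi.cfg use commas as separators, so a DN that itself contains a comma cannot be expressed this way.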
Slide 7: Live Demo for NAGIOS Host with Remote and Native Probes.
- The site-info.def used is available as site-info-remote-native-nagios.def in nagios-tutorial.tgz.
Comments
- All the output has been removed from the screenshot above only so that it fits on one screen.
Slide 8: Live Demo Results for NAGIOS Host with Remote and Native Probes
- Native Probe: the org.Nagios.BDII-Check uses NAGIOS's standard check_ldap probe.
- Remote Probe: the gather_npm and gather_sam probes query ENOC or SAM and submit the results for all the SAM tests to NAGIOS as passive check results.
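Passive results reach NAGIOS through its external command interface. Whether gather_sam and gather_npm write to it directly is not stated here, so treat this as a sketch of the mechanism only: it formats a PROCESS_SERVICE_CHECK_RESULT command and appends it to the command file (whose path is whatever command_file is set to in nagios.cfg). The service name used in the usage note is hypothetical.

```python
import time

STATES = {"OK": 0, "WARNING": 1, "CRITICAL": 2, "UNKNOWN": 3}


def passive_result(host, service, state, output, now=None):
    """Format one NAGIOS external command carrying a passive service result."""
    ts = int(time.time() if now is None else now)
    # Defensively replace ';' and newlines so the line stays one well-formed command.
    safe = output.replace(";", ",").replace("\n", " ")
    return (f"[{ts}] PROCESS_SERVICE_CHECK_RESULT;"
            f"{host};{service};{STATES[state]};{safe}\n")


def submit(command_file, line):
    """Append the command to the NAGIOS command file (normally a named pipe)."""
    with open(command_file, "a") as pipe:
        pipe.write(line)
```

For example, passive_result("ce1.example.org", "CE-example", "CRITICAL", "job failed") produces a line that NAGIOS records exactly as if an active check on that service had returned exit code 2.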
Slide 9: HowTo Get Results for Multiple Sites.
- Ask for access to the sites' SAM test results if you did not do so in your original request.
- An extra YAIM variable, NCG_LDAP_FILTER, should be added to the site-info.def file on your NAGIOS host.
- This is an LDAP filter that can be used to extract the GlueSite objects of interest to you from the information system.
- Examples:
NCG_LDAP_FILTER value | Result
GlueSiteName=UKI-NORTHGRID* | Captures all sites whose names start with UKI-NORTHGRID.
GlueSiteOtherInfo=EGEE_ROC=ITALY | Captures all sites hosted by the Italian ROC.
- This would be entered in the site-info.def file as: NCG_LDAP_FILTER="GlueSiteOtherInfo=EGEE_ROC=ITALY"
- For ideas on how to match a GlueSite object, see: How to publish my GlueSite.
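To see why the two example filters select what they do, here is a simplified matcher for single attr=value LDAP filters. It is an illustration, not how NCG or the BDII evaluates filters: it ignores the &, | and ! operators, treats * as the only wildcard, and assumes case-insensitive matching. Note that the filter is split on the first '=' only, which is why GlueSiteOtherInfo=EGEE_ROC=ITALY compares against the whole value EGEE_ROC=ITALY. The site entries in the test are illustrative.

```python
import re


def parse_simple_filter(flt):
    """Split 'attr=value' (optionally in parentheses) on the first '='."""
    flt = flt.strip()
    if flt.startswith("(") and flt.endswith(")"):
        flt = flt[1:-1]
    attr, sep, value = flt.partition("=")
    if not sep:
        raise ValueError(f"not an attr=value filter: {flt!r}")
    return attr, value


def matches(entry, flt):
    """entry: dict of attribute -> list of values (LDAP attributes are multi-valued)."""
    attr, pattern = parse_simple_filter(flt)
    # '*' is the LDAP substring wildcard; every other character is literal.
    regex = ".*".join(re.escape(part) for part in pattern.split("*"))
    values = next((v for k, v in entry.items() if k.lower() == attr.lower()), [])
    return any(re.fullmatch(regex, v, re.IGNORECASE) for v in values)
```

Because GlueSiteOtherInfo holds several values per site, a site matches if any one of them equals EGEE_ROC=ITALY.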
Comments
Slide 10: MyProxy and Proxy Retrieval to Nagios.
- Most of the "local" probes require a valid user proxy to run, e.g. to globus-url-copy a file to a CE.
- The model chosen makes the NAGIOS host a trusted_retriever of your proxy.
- The user uploads a proxy to the MyProxy service with myproxy-init, specifying exactly which DNs can retrieve it.
- The NAGIOS service runs nagios-proxy-refresh to keep a valid proxy from MyProxy.
- The NAGIOS host certificate is used to identify the host to the MyProxy service.
- The NAGIOS service never has access to the host certificate, which is owned by root.
- Assumes PATCH:2184 has been released.
- Configurations of MyProxy beyond what is needed by the WMS become possible.
- Easy to work around if PATCH:2184 is not released.
- Choice of MyProxy is yours.
- Use one you trust in your region.
- Set one up yourself. It's easy and can coexist with NAGIOS if you like.
Slide 11: MyProxy Configuration With YAIM.
Comments
- In this case the myproxy-init command uploads a proxy to myproxy.example.org that can be retrieved for 2 weeks by the host key of nrpe-ui.example.ch.
- Use an existing MyProxy host if you have one and its admin is willing to alter the above. Otherwise, just install one on your NAGIOS host.
- It is the easiest gLite service to run.
- An example site-info.def file for a myproxy service is available as site-info-myproxy.def in nagios-tutorial.tgz.
Slide 12: Enabling Local Probes and NRPE on a Separate glite-UI.
digraph G {
graph [bgcolor="#ffff99", rankdir="LR"];
bdii [fillcolor="white", style=filled, label="SiteBDII at your Site.", shape=box, fontname=Courier, fontsize=11];
srm [fillcolor="white", style=filled, label="SRM at your Site.", shape=box, fontname=Courier, fontsize=11];
nagios [fillcolor="white", style=filled, label="Nagios Host at your Site.", shape=box, fontname=Courier, fontsize=11];
nrpe [fillcolor="white", style=filled, label="NRPE Host at your Site.", shape=box, fontname=Courier, fontsize=11];
myproxy [fillcolor="white", style=filled, label="MyProxy Host at your Site.", shape=box, fontname=Courier, fontsize=11];
sam [fillcolor="white", style=filled, label="SAM Service", shape=box, fontname=Courier, fontsize=11];
rgma [fillcolor="white", style=filled, label="A site service port, e.g R-GMA on 8443.", shape=box, fontname=Courier, fontsize=11];
enoc [fillcolor="white", style=filled, label="ENOC Service", shape=box, fontname=Courier, fontsize=11];
nagios -> sam [label="Remote Probe"];
nagios -> enoc [label="Remote Probe"];
nagios -> rgma [fontcolor="blue", color="blue", label="Native Probe"];
nagios -> bdii [fontcolor="red" , color="red", label="NCG"] ;
nagios -> nrpe [fontcolor="green", color="green", label="NRPE"] ;
nrpe -> nagios [label="Fetch NRPE Configuration"];
nrpe -> myproxy [fontcolor="darkorange4", color="darkorange4", label="Retrieve Valid Proxy"] ;
nrpe -> srm [fontcolor="purple", color="purple", label="Local Probe"];
{ rank=same; nagios;bdii; nrpe; myproxy };
}
- A glite-UI has been added, which is called via NRPE from the NAGIOS service to run "local" probes.
- NRPE requires configuration, which it gathers from the NAGIOS host (see later).
Slide 13: Setting Up the UI with NRPE Service for Local Probes.
- A standard glite-UI is needed.
- To it we add the egee-NRPE meta package and configure both with YAIM.
- yum install glite-UI lcg-CA egee-NRPE
- Additional YAIM variables.
Variable | Example | Notes
NCG_PROBES_TYPE | remote,native,local | Now with the addition of local probes.
NCG_NRPE_UI | nrpe-ui.example.ch | The hostname of the NRPE UI node.
- Plus standard UserInterface variables.
Variable | Example | Notes
BDII_HOST | lcg-bdii.cern.ch | Top-level BDII.
MON_HOST | mon01.cern.ch | Local R-GMA MON box.
REG_HOST | lcgic01.gridpp.rl.ac.uk | Central R-GMA registry.
RB_HOST | wms103.cern.ch | Any old WMS.
VO_DTEAM_VOMSES | "'dteam lcg-voms.cern.ch 15004 /DC=ch/DC=cern/OU=computers/CN=lcg-voms.cern.ch dteam'" | VOMS server entry for dteam.
- /opt/glite/yaim/bin/yaim -c -s /root/site-info.def -n glite-UI -n glite-NRPE
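The VO_DTEAM_VOMSES value packs a whole vomses entry into one string. As a sketch, assuming the value as seen after bash removes the outer double quotes (one single-quoted group per VOMS server, with fields alias, host, port, server DN and VO name), it can be split like this; parse_vomses is a hypothetical helper, not part of YAIM.

```python
import shlex


def parse_vomses(value):
    """Split a VO_<VO>_VOMSES value into one dict per VOMS server."""
    entries = []
    # shlex turns "'a b c' 'd e f'" into one string per single-quoted group.
    for item in shlex.split(value):
        fields = item.split()
        if len(fields) < 5:
            raise ValueError(f"expected alias host port DN vo, got {item!r}")
        entries.append({
            "alias": fields[0],
            "host": fields[1],
            "port": int(fields[2]),
            # A DN may itself contain spaces, so take everything up to the VO name.
            "dn": " ".join(fields[3:-1]),
            "vo": fields[-1],
        })
    return entries
```

Taking the VO name from the end rather than a fixed position keeps DNs containing spaces (such as CN=Dr Who) intact.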
Slide 14: Live Demo of NRPE and UI Node.
- YAIM sets up NRPE: /etc/nagios/nrpe.cfg
- Sets up proxy retrieval: /etc/nagios-proxy-refresh.conf
- Sets up NRPE configuration retrieval: /etc/mirror-nrpe-conf.conf
Comments
- Again, the lack of output above is just so that it fits on a single screen.
- The file /etc/nagios/nrpe.cfg is edited to:
- Allow connections from the NAGIOS host.
- Change the default timeouts on commands.
- Include all NCG-created files located in /etc/nagios/nrpe.
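The three edits above correspond to the standard nrpe.cfg directives allowed_hosts, command_timeout and include_dir. The slide does not show the exact lines YAIM writes, so this is only a sketch of an idempotent "set key=value" edit; set_nrpe_options is hypothetical and the values in the test (such as a 300 s timeout) are illustrative.

```python
def set_nrpe_options(cfg_text, options):
    """Return nrpe.cfg text with the given key=value directives set.

    The first uncommented occurrence of each key is rewritten in place;
    keys that do not occur yet are appended at the end.
    """
    remaining = dict(options)
    out = []
    for line in cfg_text.splitlines():
        stripped = line.strip()
        key = stripped.split("=", 1)[0].strip()
        if stripped and not stripped.startswith("#") and key in remaining:
            out.append(f"{key}={remaining.pop(key)}")
        else:
            out.append(line)
    out.extend(f"{key}={value}" for key, value in remaining.items())
    return "\n".join(out) + "\n"
```

Rewriting in place rather than blindly appending means running the configuration step twice leaves the file unchanged, which matters because YAIM is typically rerun whenever site-info.def changes.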
Slide 15: Nagios Proxy Refresh
- The command nagios-proxy-refresh attempts to retrieve a proxy to run tests with.
- Proxies are retrieved from the MyProxy service, of course.
- A cron runs every 4 hours calling nagios-proxy-refresh.
- The log file is located at /var/log/nagios-proxy-refresh.conf and is rotated.
- /sbin/service nagios-proxy-refresh start|stop enables and disables the cron, respectively.
- A start also runs a proxy refresh there and then.
- There are also nagios checks to:
- Check the validity of the retrieved proxy.
- Check the time left to expiry on the credential stored in MyProxy.
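Both checks boil down to comparing an expiry time with the current time. A minimal sketch, assuming the expiry is available as an OpenSSL-style date such as the output of openssl x509 -enddate (notAfter=Jun  1 12:00:00 2009 GMT); the 24 h and 6 h thresholds and the function names are illustrative, not the values used by the real checks.

```python
from datetime import datetime, timezone

OK, WARNING, CRITICAL = 0, 1, 2


def parse_not_after(text):
    """Parse an OpenSSL date such as 'Jun  1 12:00:00 2009 GMT' as UTC."""
    text = text.strip()
    if text.startswith("notAfter="):
        text = text[len("notAfter="):]
    if text.endswith(" GMT"):
        text = text[:-4]
    parsed = datetime.strptime(text, "%b %d %H:%M:%S %Y")
    return parsed.replace(tzinfo=timezone.utc)


def check_time_left(not_after, now, warn_hours=24, crit_hours=6):
    """Map the remaining lifetime onto NAGIOS states (thresholds are illustrative)."""
    hours = (not_after - now).total_seconds() / 3600
    if hours <= crit_hours:
        return CRITICAL, f"CRITICAL - proxy has {hours:.1f}h left"
    if hours <= warn_hours:
        return WARNING, f"WARNING - proxy has {hours:.1f}h left"
    return OK, f"OK - proxy has {hours:.1f}h left"
```

The same function serves both checks; only the source of the expiry date differs (the local proxy file versus the credential stored in MyProxy).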
Slide 16: Enabling Local Tests on the NAGIOS Node.
- Use the same YAIM configuration as on the NRPE node.
- With a different target, of course: glite-NAGIOS.
- /opt/glite/yaim/bin/yaim -c -s /root/site-info.def -n glite-NAGIOS
- NCG creates NAGIOS configuration as before.
- NCG configures NAGIOS to call check_nrpe to run tests via the NRPE node.
- NCG now also creates the actual NRPE configuration for the NRPE node.
- It is generated in /etc/nagios/nrpe on the NAGIOS host!
- However, the NRPE configuration is needed on the NRPE node.
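Each local probe therefore needs two matching pieces: a command[NAME]=... line in the NRPE configuration and a check_nrpe -H HOST -c NAME call on the NAGIOS side. This sketch produces such a pair; nrpe_pair is hypothetical, the plugin directory is an assumption (it differs between i386 and x86_64 installs), and the probe path in the test is made up.

```python
def nrpe_pair(nrpe_host, name, command_line,
              plugin_dir="/usr/lib/nagios/plugins"):
    """Return (NRPE config line, command NAGIOS runs) for one local probe.

    The first string belongs in a file under /etc/nagios/nrpe on the NAGIOS host
    (later mirrored to the NRPE node); the second is what NAGIOS executes.
    """
    if not name or "]" in name:
        raise ValueError(f"bad NRPE command name: {name!r}")
    nrpe_line = f"command[{name}]={command_line}"
    nagios_command = f"{plugin_dir}/check_nrpe -H {nrpe_host} -c {name}"
    return nrpe_line, nagios_command
```

The NAGIOS host only ever sends the command name; the actual probe command line lives on the NRPE node, which is why the generated configuration has to be copied there.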
Slide 17: Use of mirror-nrpe-config
- The NAGIOS box web server now serves /etc/nagios/nrpe.
- The mirror-nrpe-config command runs on the NRPE node.
- It mirrors the NRPE configuration from the NAGIOS host.
- Again, the cron is enabled/disabled with /sbin/service mirror-nrpe-config start/stop.
- A start issues a mirror.
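The core decision in any such mirror is which files to fetch and which to drop. The slide does not describe how mirror-nrpe-config does this, so the following is a hypothetical model: compare checksums of local and served files, download anything new or changed, and (an assumption) delete files the NAGIOS host no longer serves.

```python
import hashlib


def checksum(data: bytes) -> str:
    """Content fingerprint used to decide whether a file changed."""
    return hashlib.sha256(data).hexdigest()


def sync_plan(local, remote):
    """Compare {filename: checksum} maps; return (to_download, to_delete)."""
    to_download = sorted(name for name, digest in remote.items()
                         if local.get(name) != digest)
    # Assumption: files no longer served by the NAGIOS host are stale.
    to_delete = sorted(name for name in local if name not in remote)
    return to_download, to_delete
```

Deleting stale files matters here: if NCG stops generating a probe (e.g. after a REMOVE_SERVICE entry), the NRPE node should not keep an orphaned command definition.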
Slide 18: Live Enabling of Local Probes on NAGIOS box.
Slide 19: Diagnosing Test Failures
- Some tests have "more info" configured in NAGIOS, e.g.:
- check_ldap links to the NAGIOS documentation for check_ldap.
- ENOC tests link through to the ENOC test results on their web page.
- nagios-run-check
Slide 20: Customising Probes and Probe Documentation
- For various reasons we may need to disable or tune some tests, e.g.:
- We test ResourceBDIIs but these need not be visible from a ROC.
- We test LocalLoggers on CEs but again this need only be visible from the WNs at a site.
- The NCG configuration can be altered with an /etc/ncg/ncg.localdb file.
- See ncg.localdb.example for some examples.
REMOVE_SERVICE!ce1.triumf.ca!org.glite.LocalLogger
REMOVE_SERVICE!ce2.triumf.ca!org.glite.LocalLogger
REMOVE_SERVICE!srm.triumf.ca!BDII
- These lines remove the org.glite.LocalLogger service checks on ce1.triumf.ca and ce2.triumf.ca, and the BDII check on srm.triumf.ca.
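The REMOVE_SERVICE!host!service lines are easy to reason about as a set of (host, service) pairs subtracted from what NCG discovered. The sketch below models only that directive; skipping '#' comment lines is an assumption, other ncg.localdb directives are ignored, and the non-removed services in the test are made up.

```python
def parse_localdb(text):
    """Collect (host, service) pairs from REMOVE_SERVICE!host!service lines.

    Blank lines and lines starting with '#' are skipped (an assumption);
    other directives are ignored by this sketch.
    """
    removals = set()
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split("!")
        if len(fields) == 3 and fields[0] == "REMOVE_SERVICE":
            removals.add((fields[1], fields[2]))
    return removals


def apply_removals(services, removals):
    """Drop (host, service, ...) tuples whose (host, service) was removed."""
    return [s for s in services if (s[0], s[1]) not in removals]
```

Because removal is per host, the LocalLogger check disappears only on ce1 and ce2; any other CE keeps it.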
Slide 21: NAGIOS Publication to ActiveMQ
- NAGIOS probe results are being published via the messaging system.
- You can subscribe to them with a STOMP client.
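STOMP is a simple text protocol, so the client side is small. This sketch builds the STOMP 1.0 CONNECT and SUBSCRIBE frames a client would send after opening a TCP connection to the broker, and decodes a MESSAGE frame coming back. The destination used in the test is a placeholder; the actual topic names used by the NAGIOS bridge are not given here.

```python
def stomp_frame(command, headers, body=""):
    """Encode a STOMP 1.0 frame: command, header lines, blank line, body, NUL."""
    head = "\n".join([command] + [f"{k}:{v}" for k, v in headers.items()])
    return (head + "\n\n" + body + "\x00").encode("utf-8")


def subscribe_frames(destination, login="", passcode=""):
    """CONNECT then SUBSCRIBE frames a client would send after opening the socket."""
    connect = stomp_frame("CONNECT", {"login": login, "passcode": passcode})
    subscribe = stomp_frame("SUBSCRIBE", {"destination": destination, "ack": "auto"})
    return connect, subscribe


def parse_frame(data):
    """Decode one frame (e.g. a MESSAGE from the broker); ignores content-length."""
    text = data.split(b"\x00", 1)[0].decode("utf-8").lstrip("\n")
    head, _, body = text.partition("\n\n")
    command, *header_lines = head.split("\n")
    headers = dict(line.split(":", 1) for line in header_lines)
    return command, headers, body
```

In practice you would send these frames over a socket to the broker's STOMP port (61613 is ActiveMQ's usual default) or use an existing STOMP client library; the point here is only what goes over the wire.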
Slide 22: Current Problems And Short Term Additions
- Discovered in last few days, will be fixed shortly.
- Only VOs hosted on voms.cern.ch are supported by voms2htpasswd.
- Two NAGIOS boxes cannot use the same MyProxy.
- Add new YAIM options to:
- Disable editing of main nagios configuration files.
- Allow easier integration into existing NAGIOS-based sites.
- Disable generation of the ncg.conf file.
- Direct NCG configuration will always be richer than YAIM.
- In all but the simplest cases, sites may well end up tuning this.
Slide 23: Support, Bug Reports and Contributions.