Slide 1: YAIM, NAGIOS and NCG Tutorial Outline
- Introduction
- Software Repositories
- Site, Multi-Site or ROC.
- Introduce deployment scenarios, increasing complexity.
- One NAGIOS Host including a UI - example will be a site.
- One NAGIOS Host + One NRPE Host - example will be a ROC.
- MyProxy and Proxy Retrieval.
- Troubleshooting probe failures.
- Disabling/Tuning probes.
Slide 2: Introduction
- gLite style meta packages egee-NAGIOS and egee-NRPE exist.
- YAIM, glite-yaim-nagios configures:
- NAGIOS and NCG for a deployment against a single/multi site or ROC.
- Prerequisites:
- The DAG repository must be enabled for nagios, perl-Modules, ...
- A vaguely working site.
- You must have a siteBDII populated to some extent.
- Small amount of hardware, SL4/5 - i386/x86_64 all possible.
- Testing against the CERN ROC was completed on a 512 MB virtual machine.
- X509 host certificates.
- lcg-CA Certificate Authority Set.
- Access to SAM tests database
- Access to the GOCDB PI; for ROCs, GOCDB PI level 2 is required.
Slide 3: Repository Information
- SA1 package repository is hosted in Manchester.
- To enable, add the sa1-release RPM from the repository.
- Contains /etc/yum.repos.d/sa1-release.repo
- Only production repos are enabled, but the file also contains disabled testing and devel repos as well as SRPMS and debuginfo repos.
- Repoview pretty HTML pages are enabled in the repoview directory.
- Change logs for packages are being added to the RPM %changelogs.
- RSS feeds exist from the repoview pages.
- Transparent upgrades to SA1 software announced here only.
- EGEESA1PackageRepository has full details of repositories.
Slide 4: Remote and Local Probes.
- Local Probes
- These are standard nagios probes run locally from the nagios box.
- Perl ones are mostly written with the standard CPAN Nagios::Plugin module.
- Python ones are written with our own python-GridMon module.
- Remote Probes
- Collect results from tests executed elsewhere, e.g. from SAM (gather_sam) or from ENOC (gather_npm).
- NCG queries central SAM or ENOC, e.g. for SAM:
- NCG creates passive probe slots on NAGIOS for all SAM probes on the correct hosts.
- gather_sam then runs as an active nagios check and drops passive results on nagios command pipe for each SAM result.
- i.e central SAM results appear nearly immediately in your local nagios.
- Probe Contribution
- Probe contributions are welcome. We have SVN, build services and repositories.
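The passive-result mechanism described under Remote Probes relies on the standard Nagios external command PROCESS_SERVICE_CHECK_RESULT. The following is a minimal sketch of that line format only, not the actual gather_sam code; the host name and service name are illustrative assumptions.

```shell
# Format one Nagios external command that submits a passive service result.
# Arguments: unix_timestamp host service return_code plugin_output
format_passive_result() {
    printf '[%s] PROCESS_SERVICE_CHECK_RESULT;%s;%s;%s;%s\n' \
        "$1" "$2" "$3" "$4" "$5"
}

# Example with a hypothetical host and service name.
format_passive_result 1253180000 ce.example.org SAM-CE-JobSubmit 0 "OK: job ran"
```

On a live host the line would be appended to the command pipe named by command_file in nagios.cfg; its location depends on the installation.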
Slide 5: NAGIOS Host with Remote and Local Probes.
- Suitable for a site starting from scratch, very simple.
- Set up a gLite UI first.
- NCG executed by YAIM queries SiteBDII and remote testers (e.g SAM, ENOC).
- NCG populates NAGIOS configuration with hosts and probes to run against those hosts.
- NCG also processes configurations from a ROC nagios via the msg-system - see later.
- NCG creates nagios and/or NRPE configuration from this data.
Slide 6: YAIM Configuration for NAGIOS Host with Remote and Native Probes.
| Variable | Example | Notes |
| INSTALL_ROOT | /opt | YAIM always needs this. |
| SITE_NAME | BIGMAN-LCG2 | Needed by NCG. |
| SITE_BDII_HOST | site-bdii.example.org | Needed by NCG. |
| PX_HOST | myproxy.example.org | MyProxy host to contact for credentials. |
| VOS | "dteam ops" | List of VOs whose members have read access to the NAGIOS portal. |
| NAGIOS_HOST | my-nagios.example.org | Host where you are installing nagios. |
| NAGIOS_ADMIN_DNS | /DC=ch/OU=Users/CN=Dr Kildare,/DC=ch/OU=User/CN=Dr Who | Persons with write access to the NAGIOS portal. |
| NAGIOS_HTTPD_ENABLE_CONFIG | true | Your /etc/httpd/conf.d/ssl.conf and nagios.conf will be clobbered. |
| NAGIOS_NCG_ENABLE_CONFIG | true | Your /etc/ncg/ncg.conf configuration will be clobbered. |
| NAGIOS_SUDO_ENABLE_CONFIG | true | Your /etc/sudoers will be appended to. |
| NAGIOS_CGI_ENABLE_CONFIG | true | Your /etc/nagios/cgi.cfg will be edited. |
| NAGIOS_NAGIOS_ENABLE_CONFIG | true | Your /etc/nagios/nagios.cfg will be clobbered. |
| NCG_VO | "ops dteam" | VOs to run tests as; requires a friendly VO member. |
| NAGIOS_ROLE | roc or site | Whether this is a site-level or ROC-level nagios. |
| NCG_NRPE_UI | ui.example.org | Set if and only if you want an NRPE enabled UI. |
| NCG_GOCDB_ROC_NAME | CERN | GOCDB ROC name, if this is a ROC nagios. |
- The NAGIOS_*_ENABLE_CONFIG variables allow YAIM to work with existing installs; all are false by default.
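As an illustration, the variables on this slide could be collected into a site-info.def fragment like the one below. This is a sketch for a site-level NAGIOS host using only the example values from the slide; the choice of NAGIOS_ROLE=site and of enabling every *_ENABLE_CONFIG switch is an assumption for a fresh install, not a recommendation for existing ones.

```shell
# Fragment of site-info.def for a site-level NAGIOS host.
# All values are the example values from the slide, not real hosts.
INSTALL_ROOT=/opt
SITE_NAME=BIGMAN-LCG2
SITE_BDII_HOST=site-bdii.example.org
PX_HOST=myproxy.example.org
VOS="dteam ops"
NAGIOS_HOST=my-nagios.example.org
NAGIOS_ADMIN_DNS="/DC=ch/OU=Users/CN=Dr Kildare,/DC=ch/OU=User/CN=Dr Who"
NAGIOS_ROLE=site
NCG_VO="ops dteam"

# Let YAIM manage each component's configuration (assumed fresh install;
# leave these unset on a host whose existing files must be preserved).
NAGIOS_HTTPD_ENABLE_CONFIG=true
NAGIOS_NCG_ENABLE_CONFIG=true
NAGIOS_SUDO_ENABLE_CONFIG=true
NAGIOS_CGI_ENABLE_CONFIG=true
NAGIOS_NAGIOS_ENABLE_CONFIG=true
```

Values containing spaces, such as the VO list and the admin DNs, need quoting because the file uses shell assignment syntax.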
Slide 7: Site or ROC level and Multisite.
- Role controls the location of the nagios box in the full architecture, e.g.:
- A SiteA Nagios collects results from the ROC.
- A ROC nagios does not probe resource BDIIs.
- Multisite - A site nagios can probe multiple sites, e.g. via the YAIM variable NCG_LDAP_FILTER:
| NCG_LDAP_FILTER value | Result |
| GlueSiteName=UKI-NORTHGRID* | Will capture all sites prefixed with UKI-NORTHGRID. |
| GlueSiteOtherInfo=EGEE_ROC=ITALY | Will capture all sites hosted by the Italian ROC. |
- Sites can be added/deleted easily with fine tuning of NCG - see later.
- Possible deployment configurations:
- Two ROC Nagios covering half or all the sites.
- One Site Nagios covering more than one site - Site A and C.
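Before committing an NCG_LDAP_FILTER value it can help to preview which sites it matches. The sketch below assumes a BDII that publishes the sites under the base o=grid and uses the standard OpenLDAP ldapsearch client; the helper name preview_sites and the BDII URL are hypothetical, and how NCG itself combines the filter with its own query is not shown here.

```shell
# Filter exactly as it would be given to YAIM in site-info.def.
NCG_LDAP_FILTER="GlueSiteName=UKI-NORTHGRID*"

# Hypothetical helper: list the site names a filter matches in a BDII.
# $1 is the LDAP URL of a BDII publishing the sites (an assumption).
preview_sites() {
    ldapsearch -x -LLL -H "$1" -b o=grid "($NCG_LDAP_FILTER)" GlueSiteName
}

# Usage (needs network access, so it is not run here):
#   preview_sites ldap://top-bdii.example.org:2170
```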
Slide 8: Live Demo for NAGIOS Host with UI Pointing at a Site.
- Commands to install a Nagios Host and UI Pointing at the site.
Slide 9: Installing a Site Nagios and UI together
- Install httpd && gLite UI yumgroup && egee-NAGIOS && lcg-CA && dummy-ca-certs.
- Configure site-info.def
- /opt/glite/yaim/bin/yaim -c -s /root/site-info.def -n glite-NAGIOS -n glite-UI
- NCG collects its hosts and services from the GOCDB and possibly a BDII endpoint such as ldap://site-bdii.example.org:2170/mds-vo-name=BIGMAN-LCG2,o=grid.
- The resulting NAGIOS web interface is located at https://nagioshost.example.org/nagios
- Visible read-only to a set of VO members, maintained by voms2htpasswd.
- A static list of extra DNs can also be specified.
- Write operations, e.g rerun test, are restricted to:
- The list of DNs you specify above.
- The site owners as specified in the GOCDB.
Slide 10: Probes Running On the local Nagios Box
- On installation verify probes running on the local nagios box first.
- Probes check a valid grid proxy is available.
- Probes check and send messages to the msg-system.
- Probes contact the GOCDB to import service downtimes.
- Remote probes like gather_sam are associated to the nagios host.
Slide 11: MyProxy and Proxy Retrieval to Nagios.
- Grid probes require a valid user proxy to run, e.g. to globus-url-copy a file to a CE.
- The NAGIOS host is permitted to be a trusted_retriever of your proxy.
- The user uploads a proxy to the MyProxy service with myproxy-init, specifying exactly which DNs can retrieve it.
myproxy-init -c 336 -k NagiosRetrieve-nagios.example.org-<VO> \
-s myproxy.example.org \
-l nagios -x -Z "/DC=ch/OU=computers/CN=nagios.example.ch"
- The NAGIOS host uses its host certificate to authenticate to the MyProxy service.
- Retrieval is now a nagios probe running on the nagios node.
Comments
- YAIM configuration for a MyProxy service.
SITE_NAME=YourSite
PX_HOST=myproxy.example.ch
GRID_TRUSTED_RETRIEVERS="'/DC=ch/DC=cern/OU=computers/CN=nrpe-ui.example.ch'"
GRID_AUTHORIZED_RETRIEVERS="'/DC=ch/DC=cern/OU=computers/CN=nrpe-ui.example.ch'"
SITE_EMAIL="mybigsite@example.ch"
- This configuration allows the user to specify at myproxy-init time that nrpe-ui.example.ch can retrieve the proxy.
- It is NOT the default that all uploaded proxies can be retrieved by nrpe-ui.example.ch.
- On a UI, run the following to upload your own proxy.
myproxy-init -c 336 -k NagiosRetrieve-nagios.example.org-<VO> -s myproxy.example.org -l nagios \
-x -Z "/DC=ch/OU=computers/CN=nagios.example.ch"
Slide 12: A ROC nagios calling glite-UI via NRPE.
- One node with nagios, one node with gLite UI.
- Allows less gLite on your precious Nagios node.
- An existing UI can be used - tests your UI as well.
- Suitable for ROC or site Nagios.
- Other nodes, e.g. DPM, WN, ... can be tested the same way.
- YAIM on UI just requires one extra variable NAGIOS_HOSTNAME
- Install the egee-NRPE meta package and configure UI with:
- /opt/glite/yaim/bin/yaim -c -s /root/site-info.def -n glite-UI -n glite-NRPE
- YAIM on NAGIOS just requires one extra variable NCG_NRPE_UI
- Install only egee-NAGIOS on your NAGIOS box and configure:
- /opt/glite/yaim/bin/yaim -c -s /root/site-info.def -n NAGIOS
Slide 13: Demo of Installing ROC Nagios & gLite UI.
Slide 14: Transmission of NRPE Configuration
- Problem: NRPE node requires probe configuration generated by NCG.
- NCG places a tarball of the NRPE configuration on the NAGIOS node.
- NAGIOS calls a probe via NRPE, which then, on the gLite UI:
- Pulls the tarball from the web server (certificate auth).
- Unpacks the tarball of NRPE configuration.
- Reloads (HUPs) the NRPE daemon if needed.
- Probe Name - NRPE-Push
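The pull, unpack and reload steps above can be sketched in shell as follows. This is not the NRPE-Push probe itself: the function names, the working-directory layout, the use of curl with the host certificate, and the "service nrpe reload" call are all assumptions chosen to illustrate the flow.

```shell
# Return 0 (true) when the newly fetched tarball differs from the last
# one applied, or when nothing has been applied yet.
nrpe_config_changed() {
    new_tarball=$1
    last_tarball=$2
    [ ! -f "$last_tarball" ] || ! cmp -s "$new_tarball" "$last_tarball"
}

# Hypothetical driver: fetch with the host certificate, unpack, reload.
# $1 = tarball URL on the NAGIOS node, $2 = work dir, $3 = NRPE config dir.
pull_nrpe_config() {
    url=$1; workdir=$2; confdir=$3
    curl --silent --fail \
         --cert /etc/grid-security/hostcert.pem \
         --key /etc/grid-security/hostkey.pem \
         --capath /etc/grid-security/certificates \
         --output "$workdir/new.tgz" "$url" || return 1
    if nrpe_config_changed "$workdir/new.tgz" "$workdir/last.tgz"; then
        tar -xzf "$workdir/new.tgz" -C "$confdir" &&
        cp "$workdir/new.tgz" "$workdir/last.tgz" &&
        /sbin/service nrpe reload   # assumed init script name
    fi
}
```

Comparing against the last applied tarball keeps the daemon from being reloaded on every run when the configuration has not changed.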
Slide 15: More Info, PNP4Nagios and Debugging.
- Some tests have "more info" configured in NAGIOS, e.g.:
- check_ldap links to the NAGIOS documentation for check_ldap.
- ENOC links through to the ENOC test results on their web page.
- When metrics are available PNP4Nagios plots them.
- Debugging with nagios-run-check
- Requires a host and nagios service, look them up on the web interface.
- It provides the command line to run the test.
# nagios-run-check -v -d -s org.nagios.LocalLogger-PortCheck \
-H ce1.triumf.ca
Executing command:
su nagios -c '"/usr/lib64/nagios/plugins"/"check_tcp" -H "206.12.1.15" -t "60" "-p 9002"'
# nagios-run-check -s org.nagios.LocalLogger-PortCheck -H ce1.triumf.ca
CRITICAL - Socket timeout after 60 seconds
- You can then copy and paste the same command but add a -v, for instance.
Slide 16: ENOC Used to Populate Network Parents.
- Site routers are collected from the ENOC, which knows this.
- Site nodes are configured with the router as their parent.
Slide 17: Tuning Probe Configuration
- NCG supports configuration tweaks in /etc/ncg/ncg-localdb.d
- Example: ADD_HOST_SERVICE!myproxy.example.org!MyProxy
- Example: ADD_SITE!SecretSite and ADD_SITE_BDII!SecretSite!secretbdii.example.org
- Adds a new site and specifies its BDII.
- Other possibilities:
- Remove a site.
- Define a completely new probe.
- Change a flag to probe, e.g change a timeout.
- Remove a service or host.
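The example rules on this slide could be placed in a file under /etc/ncg/ncg-localdb.d roughly as sketched below. The file name 50-local-tweaks, the helper name write_localdb_rules, and the one-rule-per-line layout are assumptions; only the three rules themselves come from the slide.

```shell
# Write the example tuning rules into an NCG local database directory.
# $1 is the target directory; on a real host this would be
# /etc/ncg/ncg-localdb.d. The file name is a hypothetical choice.
write_localdb_rules() {
    cat > "$1/50-local-tweaks" <<'EOF'
ADD_HOST_SERVICE!myproxy.example.org!MyProxy
ADD_SITE!SecretSite
ADD_SITE_BDII!SecretSite!secretbdii.example.org
EOF
}

# Usage on a NAGIOS host: write_localdb_rules /etc/ncg/ncg-localdb.d
```

The rules presumably take effect the next time NCG regenerates the nagios configuration.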
Slide 18: Support, Bug Reports and Contributions.
--
SteveTraylen - 2009-09-17