
Slide 1: YAIM, NAGIOS and NCG Tutorial Outline

  • Introduction
  • Introduce deployment scenarios of increasing complexity.
    • One NAGIOS Host - remote and native probes.
    • MyProxy and Proxy Retrieval.
    • One NAGIOS Host + One NRPE Host - remote, native and local probes.
  • Meta Packages, YAIM variables and targets.
  • Live install and configuration of YAIM, NCG and NAGIOS.
  • Troubleshooting probe failures.
  • Disabling probes.
  • Tuning probes.
  • Enabling notifications.
  • Message bridge.

Slide 2: Introduction

  • gLite style meta packages egee-NAGIOS and egee-NRPE exist.
  • The YAIM module glite-yaim-nagios has been written.
    • Configures NAGIOS to test your Site using NCG utilities.
  • YAIM configures:
    • NAGIOS and NCG for a deployment against a single site.
    • Single site is defined by a siteBDII endpoint.
  • Especially useful if you have limited or no NAGIOS experience.
  • Required software is in an RPM repository hosted at Manchester for SA1.
  • Prerequisites:
    • The DAG repository must be enabled for nagios, perl-Modules, ..
    • A vaguely working site.
      • You must have a siteBDII populated to some extent.
    • Small amount of hardware, SL4, i386 or x86_64.
      • Testing against a Tier-1 completed on a 256 MB virtual machine for me.
    • X509 host certificates.
    • lcg-CA Certificate Authority Set.
    • Access to SAM tests database

Slide 3: Repository Information

  • SA1 package repository is hosted in Manchester.
  • To enable it, add the egee-SA1.repo file to /etc/yum.repos.d/.
[egee-SA1]
name=EGEE SA1 software
baseurl=http://www.sysadmin.hep.ac.uk/rpms/egee-SA1/sl4/$basearch/
enabled=1

[egee-SA1 SRPMS]
name=EGEE SA1 SRPMS
baseurl=http://www.sysadmin.hep.ac.uk/rpms/egee-SA1/sl4/SRPMS/
enabled=1
  • Source RPMs are available for all packages.
    • yum's createrepo is enabled for the SRPMS packages.
    • Repoview pretty HTML pages are enabled in the repodata directory.
  • Change logs for packages are being added to the RPM %changelogs.
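
The repository file above can also be written from a script. The following is a minimal sketch, not part of the original tutorial: the function name write_sa1_repo and its optional directory argument are illustrative assumptions, and only the binary repository stanza is reproduced (the SRPMS stanza can be appended in the same way).

```shell
# Write the egee-SA1 binary repository stanza to DIR/egee-SA1.repo.
# DIR defaults to /etc/yum.repos.d; pass another directory to try it out safely.
write_sa1_repo() {
    repo_dir=${1:-/etc/yum.repos.d}
    # The quoted 'EOF' stops the shell from expanding $basearch:
    # yum itself must see the literal string and substitute the architecture.
    cat > "$repo_dir/egee-SA1.repo" <<'EOF'
[egee-SA1]
name=EGEE SA1 software
baseurl=http://www.sysadmin.hep.ac.uk/rpms/egee-SA1/sl4/$basearch/
enabled=1
EOF
}
```

Running write_sa1_repo as root with no argument writes /etc/yum.repos.d/egee-SA1.repo.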

Slide 4: Recap of Native, Remote and Local Probes.

Remote Probes
Collect results from tests executed elsewhere, e.g. from SAM (gather_sam) or from ENOC (gather_npm).
Native Probes
Use standard NAGIOS probes, e.g. check_ldap, check_tcp, ...
Local Probes
Use probes written in the WLCG format, e.g. the LFC-probe is executed with the check_wlcg NAGIOS probe.
Native vs Local Probes
Generally "local" probes require a proxy whilst "native" probes are network level checks.
This is only a rule of thumb, though, NOT a strict distinction.

Slide 5: NAGIOS Host with Remote and Native Probes.

DirectedGraphPlugin_1.png diagram

  • Suitable for a site starting from scratch, very simple.
  • yum install httpd (an annoying bug: httpd should be an RPM PreReq of the nagios RPM).
  • yum install egee-NAGIOS lcg-CA
  • /opt/glite/yaim/bin/yaim -c -s /root/site-info.def -n glite-NAGIOS
  • NCG executed by YAIM queries SiteBDII and remote testers (e.g. SAM, ENOC).
    • NCG populates NAGIOS configuration with hosts and probes to run against those hosts.
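
The install steps above can be collected into one script. This is a sketch under assumptions: the run helper, the DRY_RUN switch, the function name and the -y flag to yum are illustrative additions, not part of the tutorial. By default it only prints the commands it would run.

```shell
# Print each command instead of running it unless DRY_RUN=0 is set.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Install and configure a NAGIOS host for remote and native probes.
# $1 = path to site-info.def (defaults to the path used in the slides).
install_nagios_host() {
    site_info=${1:-/root/site-info.def}
    # httpd is installed explicitly because the nagios RPM does not yet require it.
    run yum install -y httpd
    run yum install -y egee-NAGIOS lcg-CA
    run /opt/glite/yaim/bin/yaim -c -s "$site_info" -n glite-NAGIOS
}
```

Call install_nagios_host to preview the commands, and DRY_RUN=0 install_nagios_host as root to execute them.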

Comments

The egee-NAGIOS meta package brings in at least the following major packages:
  • nagios
  • httpd
  • mod_ssl: for X509 authentication.
  • nagios-plugins
  • msg-nagios-bridge: sends NAGIOS notifications out to the wider grid.
  • yaim
  • voms2htpasswd: maintains an htpasswd file of the DNs, taken from VOMS servers, that have access.
  • fetch-crl: maintains CRLs for Apache.
  • grid-monitoring-*: WLCG monitoring probes.
  • nagios-proxy-refresh: refreshes grid proxies, though it is not enabled in this non-"local" test version.

Slide 6: YAIM Configuration for NAGIOS Host with Remote and Native Probes.

| Variable | Example | Notes |
| INSTALL_ROOT | /opt | YAIM always needs this. |
| SITE_NAME | BIGMAN-LCG2 | Needed by NCG. |
| SITE_BDII_HOST | site-bdii.example.org | Needed by NCG. |
| PX_HOST | myproxy.example.org | MyProxy host. A bug: it should not be required yet. |
| VOS | "dteam ops" | List of VOs whose members have read access to the NAGIOS portal. |
| NAGIOS_HOST | my-nagios.example.org | Host where you are installing NAGIOS. |
| NAGIOS_ADMIN_DNS | /DC=ch/OU=Users/CN=Dr Kildare,/DC=ch/OU=User/CN=Dr Who | Persons with write access to the NAGIOS portal. |
| NAGIOS_HTTPD_ENABLE_CONFIG | true | Your /etc/httpd/conf.d/ssl.conf and nagios.conf will be clobbered. |
| NAGIOS_MYPROXY_NAME | myproxy_credential_name | MyProxy credential name (-k option). |
| NCG_PROBES_TYPE | remote,native | These are the probes we want. |
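
As a sketch, the table above corresponds to a site-info.def fragment like the following (site-info.def is read as shell variable assignments). The values are the example values from the table; the missing_vars helper is a hypothetical pre-flight check added here for illustration and is not part of YAIM.

```shell
# site-info.def fragment using the example values from the table.
INSTALL_ROOT=/opt
SITE_NAME=BIGMAN-LCG2
SITE_BDII_HOST=site-bdii.example.org
PX_HOST=myproxy.example.org
VOS="dteam ops"
NAGIOS_HOST=my-nagios.example.org
NAGIOS_ADMIN_DNS="/DC=ch/OU=Users/CN=Dr Kildare,/DC=ch/OU=User/CN=Dr Who"
NAGIOS_HTTPD_ENABLE_CONFIG=true
NAGIOS_MYPROXY_NAME=myproxy_credential_name
NCG_PROBES_TYPE=remote,native

# Hypothetical helper: print the name of every variable above that is unset or empty.
missing_vars() {
    for name in INSTALL_ROOT SITE_NAME SITE_BDII_HOST PX_HOST VOS NAGIOS_HOST \
                NAGIOS_ADMIN_DNS NAGIOS_HTTPD_ENABLE_CONFIG NAGIOS_MYPROXY_NAME \
                NCG_PROBES_TYPE; do
        # Look up the variable whose name is stored in $name.
        eval "value=\${$name:-}"
        [ -n "$value" ] || echo "$name"
    done
}
```

Values containing spaces or commas followed by spaces, such as VOS and NAGIOS_ADMIN_DNS, need quoting because the file is parsed as shell.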

  • /opt/glite/yaim/bin/yaim -c -s /root/site-info.def -n glite-NAGIOS
  • NCG collects its hosts and services from the BDII endpoint
    ldap://site-bdii.example.org:2170/mds-vo-name=BIGMAN-LCG2,o=grid.
  • The resulting NAGIOS Web interface is located at https://my-nagios.example.org/nagios
    • Visible read-only to a set of VO members, maintained by voms2htpasswd.
    • Write operations, e.g. rerunning a test, are restricted to the list of DNs you specify above.
    • That is, your list of local admins.
  • The configuration of the main Apache files, e.g. ssl.conf, is switched off by default.
    • We will extend the same concept to /etc/nagios/nagios.cfg as well shortly.
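
The BDII endpoint that NCG queries follows from SITE_BDII_HOST and SITE_NAME. A minimal sketch of that construction, assuming the port 2170 and the base DN form shown in the slide; the function name bdii_url is illustrative.

```shell
# Build the site BDII endpoint from the two YAIM values.
# $1 = SITE_BDII_HOST, $2 = SITE_NAME; 2170 is the port shown in the slide.
bdii_url() {
    echo "ldap://$1:2170/mds-vo-name=$2,o=grid"
}
```

For the example values, bdii_url site-bdii.example.org BIGMAN-LCG2 prints the endpoint quoted above, which is handy for checking by hand that the site BDII is populated before running YAIM.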

Comments

  • YAIM with the egee-NAGIOS target configures NCG, NAGIOS and Apache, creating or editing the following files:
    • /etc/nagios/nagios.cfg: used by NAGIOS. Warning: this file is always overwritten, which is most likely wrong.
    • /etc/ncg/ncg.conf: main NCG configuration file. Always overwritten.
    • /etc/httpd/conf.d/ssl.conf: overwritten if enabled. Sets up X509 client certificate authentication.
    • /etc/httpd/conf.d/nagios.conf: overwritten if enabled. Sets up X509 client certificate authentication.
    • /etc/voms2htpasswd.conf
    • /etc/nagios/cgi.cfg: edited by YAIM, which sets '*' for read operations and so relies on Apache's FakeBasicAuth. Write privileges are populated with the admin DNs above.
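
Since several of these files are overwritten, it may be worth saving copies before running YAIM on a host with an existing setup. The following is a sketch only: the function name and arguments are hypothetical, and the file list is a subset of the files named above that can be extended as needed.

```shell
# Copy YAIM-managed files that exist under ROOT into DEST, keeping their paths.
# $1 = filesystem root to read from (normally /), $2 = backup directory.
backup_yaim_managed_files() {
    root=${1%/}     # strip a trailing slash so "/" becomes the empty prefix
    dest=$2
    for rel in etc/nagios/nagios.cfg etc/ncg/ncg.conf \
               etc/httpd/conf.d/ssl.conf etc/httpd/conf.d/nagios.conf \
               etc/voms2htpasswd.conf; do
        if [ -f "$root/$rel" ]; then
            mkdir -p "$dest/$(dirname "$rel")"
            cp -p "$root/$rel" "$dest/$rel"
            echo "saved $rel"
        fi
    done
}
```

For example, backup_yaim_managed_files / /root/pre-yaim-backup saves whichever of the listed files are present.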

Slide 7: Live Demo for NAGIOS Host with Remote and Native Probes.

  • The site-info.def used is available as site-info-remote-native-nagios.def in nagios-tutorial.tgz.
configure-remote-native-nagios.png

Comments

  • The output has been removed from the above example only so that it fits into one screenshot.

Slide 8: Live Demo Results for NAGIOS Host with Remote and Native Probes

  • Native Probe: The org.Nagios.BDII-Check uses NAGIOS's standard check_ldap probe.
  • Remote Probe: The gather_npm and gather_sam probes query ENOC or SAM and submit the passive test results for all the SAM tests.

results-remote-native-nagios.png

Slide 9: HowTo Get Results for Multiple Sites.

  • Ask for access to the sites' SAM tests if you did not do so in your original request.
  • An extra YAIM variable, NCG_LDAP_FILTER, should be added to your site-info.def file on your NAGIOS host.
    • This is an LDAP filter that can be used to extract the GlueSite objects of interest to you from the information system.
  • Examples:

| NCG_LDAP_FILTER value | Result |
| GlueSiteName=UKI-NORTHGRID* | Captures all sites whose names are prefixed with UKI-NORTHGRID. |
| GlueSiteOtherInfo=EGEE_ROC=ITALY | Captures all sites hosted by the Italian ROC. |

  • This would be entered in the site-info.def file as: NCG_LDAP_FILTER="GlueSiteOtherInfo=EGEE_ROC=ITALY"
  • For ideas on how to match a GlueSite object see: How to publish my GlueSite.

Comments

  • To test your filter, run the following, substituting your own LDAP filter of choice:
        $ ldapsearch -x -H ldap://lcg-bdii.cern.ch:2170 -b 'Mds-vo-name=local,o=Grid' \
                   '(GlueSiteOtherInfo=EGEE_ROC=ITALY)' GlueSiteName
        
  • This will return a list of the sites that will be included.
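
To reduce the ldapsearch output to a plain list of site names, a small filter can be piped after the command above. This is a sketch: the function name site_names is illustrative, and it assumes the names come back as plain "GlueSiteName: value" LDIF lines (base64-encoded values, written with a double colon, are not handled).

```shell
# Read ldapsearch (LDIF) output on stdin and print each site name once, sorted.
site_names() {
    sed -n 's/^GlueSiteName: //p' | sort -u
}
```

Usage: append "| site_names" to the ldapsearch command shown above.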

Slide 10: MyProxy and Proxy Retrieval to Nagios.

  • Most of the "local" probes require a valid user proxy to run, e.g. to globus-url-copy a file to a CE.
  • Model chosen is to allow the NAGIOS host to be a trusted_retriever of your proxy.
    1. User uploads to MyProxy service with myproxy-init specifying exactly which DNs can retrieve the proxy.
    2. The NAGIOS service runs nagios-proxy-refresh to keep a valid proxy from MyProxy.
    3. The NAGIOS host certificate is used to identify to the MyProxy service.
  • The NAGIOS service never has access to the host certificate which is owned by root.
  • Assume PATCH:2184 has been released.
    • Configurations of MyProxy beyond what is needed by the WMS become possible.
    • Easy to work around if PATCH:2184 is not released.
  • Choice of MyProxy is yours.
    • Use one you trust in your region.
    • Set one up yourself. It's easy and can coexist with NAGIOS if you like.

Slide 11: MyProxy Configuration With YAIM.

  • yum install glite-PX lcg-CA
  • YAIM configuration
           SITE_NAME=YourSite
           PX_HOST=myproxy.example.ch
           GRID_TRUSTED_RETRIEVERS="'/DC=ch/DC=cern/OU=computers/CN=nrpe-ui.example.ch'"       
           GRID_AUTHORIZED_RETRIEVERS="'/DC=ch/DC=cern/OU=computers/CN=nrpe-ui.example.ch'"
           SITE_EMAIL="mybigsite@example.ch"
        
  • /opt/glite/yaim/bin/yaim -c -s /root/site-info.def -n glite-PX
  • This configuration allows a user to specify, at myproxy-init time, that nrpe-ui.example.ch MAY retrieve the proxy.
    • It is NOT the default that all uploaded proxies can be retrieved by nrpe-ui.example.ch
  • On a UI run the following to upload the proxy.
    • myproxy-init -c 336 -k NagiosRetrieve-nrpe-ui.example.org- -s myproxy.example.org -l nagios -x -Z "/DC=ch/OU=computers/CN=nrpe-ui.example.ch"
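
The -c option takes the stored credential's lifetime in hours, so 336 corresponds to 14 days. A minimal sketch that assembles the same command line from a lifetime in days; the function name and argument order are illustrative, and only the options already used above appear.

```shell
# Print a myproxy-init command line for the given lifetime in days.
# $1 = days, $2 = credential name (-k), $3 = MyProxy server (-s),
# $4 = DN allowed to retrieve the proxy (-Z).
myproxy_init_cmd() {
    days=$1
    cred_name=$2
    server=$3
    retriever_dn=$4
    hours=$((days * 24))    # -c expects hours
    echo "myproxy-init -c $hours -k $cred_name -s $server -l nagios -x -Z \"$retriever_dn\""
}
```

The function only prints the command, so it can be reviewed before being run on a UI.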

Comments

  • In this case the myproxy-init command uploads a proxy to myproxy.example.org that can be retrieved for 2 weeks using the host key of nrpe-ui.example.ch.
  • Use an existing MyProxy host if you have one and the admin is willing to alter the above. Otherwise just install one on your NAGIOS host.
  • It is the easiest gLite service to run.
  • An example site-info.def file for a myproxy service is available as site-info-myproxy.def in nagios-tutorial.tgz.

Slide 12: Enabling Local Probes and NRPE on a separate glite-UI.

DirectedGraphPlugin_2.png diagram

  • A glite-UI has been added that is called via NRPE from the NAGIOS service to run "local" probes.
  • NRPE requires configuration, which it gathers from the NAGIOS host (see later).

Slide 13: Setting Up the UI with NRPE Service for Local Probes.

  • A standard glite-UI is needed.
  • To it we add the egee-NRPE meta package and configure both with YAIM.
  • yum install glite-UI lcg-CA egee-NRPE
  • Additional YAIM variables.
| Variable | Example | Notes |
| NCG_PROBES_TYPE | remote,native,local | Now with the addition of local probes. |
| NCG_NRPE_UI | nrpe-ui.example.ch | The hostname of the NRPE UI node. |

  • Plus standard UserInterface variables.
| Variable | Example | Notes |
| BDII_HOST | lcg-bdii.cern.ch | Top level BDII. |
| MON_HOST | mon01.cern.ch | Local R-GMA MON box. |
| REG_HOST | lcgic01.gridpp.rl.ac.uk | Central R-GMA registry. |
| RB_HOST | wms103.cern.ch | Any old WMS. |
| VO_DTEAM_VOMSES | "'dteam lcg-voms.cern.ch 15004 /DC=ch/DC=cern/OU=computers/CN=lcg-voms.cern.ch dteam'" | VOMS file for dteam. |

  • /opt/glite/yaim/bin/yaim -c -s /root/site-info.def -n glite-UI -n glite-NRPE
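
As a sketch, the additional variables translate into site-info.def lines like these, using the example values from the tables above. The probe_type_enabled helper is a hypothetical addition that shows how the comma-separated NCG_PROBES_TYPE value can be tested; it is not part of YAIM or NCG.

```shell
# Extra site-info.def lines for the NRPE UI scenario (example values).
NCG_PROBES_TYPE=remote,native,local
NCG_NRPE_UI=nrpe-ui.example.ch
BDII_HOST=lcg-bdii.cern.ch
MON_HOST=mon01.cern.ch
REG_HOST=lcgic01.gridpp.rl.ac.uk
RB_HOST=wms103.cern.ch
VO_DTEAM_VOMSES="'dteam lcg-voms.cern.ch 15004 /DC=ch/DC=cern/OU=computers/CN=lcg-voms.cern.ch dteam'"

# Hypothetical helper: succeed if probe type $1 is listed in NCG_PROBES_TYPE.
probe_type_enabled() {
    # Wrapping both sides in commas avoids matching a substring of another type.
    case ",$NCG_PROBES_TYPE," in
        *",$1,"*) return 0 ;;
        *) return 1 ;;
    esac
}
```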

Slide 14: Live Demo of NRPE and UI Node.

configure-remote-native-local-nagios.png

  • YAIM sets up NRPE /etc/nagios/nrpe.cfg
  • Sets up proxy retrieval /etc/nagios-proxy-refresh.conf
  • Sets up NRPE configuration retrieval /etc/mirror-nrpe-conf.conf

Comments

  • Again, the lack of output above is just so that it fits on a single screen.
  • The file /etc/nagios/nrpe.cfg is edited to
    • Allow connections from the nagios host.
    • Change the default timeouts on commands.
    • Include all NCG created files located in /etc/nagios/nrpe

Slide 15: Nagios Proxy Refresh

  • The command nagios-proxy-refresh attempts to retrieve a proxy for running tests with.
  • The proxy is retrieved from the MyProxy service, of course.
  • A cron job runs every 4 hours calling nagios-proxy-refresh.
    • The log file is located at /var/log/nagios-proxy-refresh.conf, which is rotated.
    • /sbin/service nagios-proxy-refresh stop|start disables and enables the cron job.
    • A start also runs a proxy refresh there and then.
  • There are also nagios checks to:
    1. Check the validity of the retrieved proxy.
    2. Check the time left to expiry on the credential stored in MyProxy.
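
The logic of such a check can be sketched as follows. This is not the check shipped with the packages: the function name, message text and thresholds are hypothetical, and only the exit codes (0 OK, 1 WARNING, 2 CRITICAL) follow the standard NAGIOS plugin convention.

```shell
# Report proxy health from the seconds of lifetime left.
# $1 = seconds left, $2 = warning threshold, $3 = critical threshold (seconds).
check_time_left() {
    left=$1
    warn=$2
    crit=$3
    if [ "$left" -le "$crit" ]; then
        echo "CRITICAL - ${left}s left on proxy"
        return 2
    elif [ "$left" -le "$warn" ]; then
        echo "WARNING - ${left}s left on proxy"
        return 1
    fi
    echo "OK - ${left}s left on proxy"
    return 0
}
```

Because the refresh runs every 4 hours, a warning threshold somewhat above 4 hours (14400 seconds) would flag a missed refresh before the proxy actually expires.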

Slide 16: Enabling Local Tests on the NAGIOS Node.

  • Use the same YAIM configuration as on the NRPE node.
    • With a different target of course: glite-NAGIOS.
  • /opt/glite/yaim/bin/yaim -c -s /root/site-info.def -n glite-NAGIOS
  • NCG creates NAGIOS configuration as before.
  • NCG configures NAGIOS to call check_nrpe to run tests via NRPE node.
  • NCG also now creates actual NRPE configuration for the NRPE node.
    • Generated in /etc/nagios/nrpe on NAGIOS host!
    • NRPE configuration is needed however on the NRPE node.

Slide 17: Use of mirror-nrpe-config

  • The NAGIOS box webserver is now serving /etc/nagios/nrpe.
  • The mirror-nrpe-config command runs on the NRPE node.
    • Mirrors in the NRPE configuration from the NAGIOS host.
    • Again cron enabled/disabled with a
      /sbin/service mirror-nrpe-config start/stop
    • A start issues a mirror.

Slide 18: Live Enabling of Local Probes on NAGIOS box.

results-local-native-remote.png

Slide 19: Diagnosing Test Failures

  • Some tests have "more info" configured in NAGIOS, e.g:
    1. check_ldap, links the NAGIOS documentation for check_ldap
    2. ENOC, links through to the ENOC test results on their web page.
  • nagios-run-check
    • Requires a host and nagios service, look it up on web interface.
    • It provides the command line to run the test.
            #  nagios-run-check -v -d -s org.nagios.LocalLogger-PortCheck -H ce1.triumf.ca
            Executing command:
             su nagios -c '"/usr/lib64/nagios/plugins"/"check_tcp" -H "206.12.1.15" -t "60" "-p 9002"'
            # nagios-run-check -s org.nagios.LocalLogger-PortCheck -H ce1.triumf.ca
            CRITICAL - Socket timeout after 60 seconds
            

Slide 20: Customising Probes and Probe Documentation

  • We may need to disable or tune some tests, e.g.
    • We test ResourceBDIIs but these need not be visible from a ROC.
    • We test LocalLoggers on CEs but again these need only be visible from the WNs at a site.
  • The NCG configuration can be altered with an /etc/ncg/ncg.localdb file.
    • See ncg.localdb.example for some examples.
REMOVE_SERVICE!ce1.triumf.ca!org.glite.LocalLogger
REMOVE_SERVICE!ce2.triumf.ca!org.glite.LocalLogger
REMOVE_SERVICE!srm.triumf.ca!BDII
  • These lines remove the org.glite.LocalLogger service checks for the two CEs and the BDII check for srm.triumf.ca from NAGIOS.
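
When the same service has to be removed from many hosts, the lines can be generated. A minimal sketch that follows the REMOVE_SERVICE!host!service format of the example above; the function name is illustrative.

```shell
# Print one REMOVE_SERVICE line per host.
# $1 = service name, remaining arguments = host names.
remove_service_lines() {
    service=$1
    shift
    for host in "$@"; do
        echo "REMOVE_SERVICE!$host!$service"
    done
}
```

For example, remove_service_lines org.glite.LocalLogger ce1.triumf.ca ce2.triumf.ca prints the first two lines of the example, ready to be appended to /etc/ncg/ncg.localdb.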

Slide 21: NAGIOS Publication to ActiveMQ

  • NAGIOS probe results are being published via the messaging system.
  • You subscribe to them with a STOMP client,
    • e.g. perl(Net::Stomp).

Slide 22: Current Problems And Short Term Additions

  • Discovered in the last few days, will be fixed shortly.
    • Only voms.cern.ch VOs are supported by voms2htpasswd.
    • Two NAGIOS boxes cannot use the same MyProxy.
  • Add new YAIM options to:
    1. Disable editing of the main NAGIOS configuration files.
      • Easier integration into existing NAGIOS based sites.
    2. Disable generation of the ncg.conf file.
      • Direct NCG configuration will always be richer than YAIM.
      • Sites in all but the simplest case may well end up tuning this.

Slide 23: Support, Bug Reports and Contributions.

Topic attachments
| Attachment | Size | Date | Who | Comment |
| EGEE.png | 28.5 K | 2008-09-18 11:10 | SteveTraylen | |
| configure-remote-native-local-nagios.png | 48.8 K | 2008-09-20 20:58 | SteveTraylen | |
| configure-remote-native-nagios.png | 63.0 K | 2008-09-18 17:22 | SteveTraylen | Configure remote and native probes. |
| nagios-tutorial.tgz | 1.6 K | 2008-09-22 11:33 | SteveTraylen | Nagios tutorial files. |
| results-local-native-remote.png | 333.8 K | 2008-09-22 09:18 | SteveTraylen | Nagios results: local, native and remote. |
| results-remote-native-nagios.png | 277.7 K | 2008-09-20 15:46 | SteveTraylen | Results of remote and native only. |
Topic revision: r33 - 2013-08-22 - TWikiGuest
 