Slide 1: YAIM, NAGIOS and NCG Tutorial Outline

  • Introduction
  • Software Repositories
  • Site, Multi-Site or ROC.
  • Introduce deployment scenarios, increasing complexity.
    • One NAGIOS Host including a UI - example will be a site.
    • One NAGIOS Host + One NRPE Host - example will be a ROC.
  • MyProxy and Proxy Retrieval.
  • Troubleshooting probe failures.
  • Disabling/Tuning probes.

Slide 2: Introduction

  • gLite style meta packages egee-NAGIOS and egee-NRPE exist
  • YAIM, glite-yaim-nagios configures:
    • NAGIOS and NCG for a deployment against a single/multi site or ROC.
  • Prerequisites:
    • The DAG repository must be enabled for nagios, perl-Modules, ..
    • A vaguely working site.
      • You must have a siteBDII populated to some extent.
    • Small amount of hardware, SL4/5 - i386/x86_64 all possible.
      • Testing against the CERN ROC worked for me on a 512 MB virtual machine.
    • X509 host certificates.
    • lcg-CA Certificate Authority Set.
    • Access to the SAM tests database.
    • Access to the GOCDB PI; for ROCs, GOCDB PI level 2 access is required.

Slide 3: Repository Information

Slide 4: Remote and Local Probes.

Local Probes
These are standard nagios probes run locally from the nagios box.
  • Perl ones are mostly written with the standard CPAN Nagios::Plugin module.
  • Python ones are written with our own python-GridMon module.

Remote Probes
Collect results from tests executed elsewhere, e.g. from SAM (gather_sam) or from ENOC (gather_npm).
  • NCG queries central SAM or ENOC, e.g. for SAM:
    • NCG creates passive probe slots on NAGIOS for all SAM probes on the correct hosts.
    • gather_sam then runs as an active nagios check and drops a passive result on the nagios command pipe for each SAM result.
    • i.e. central SAM results appear almost immediately in your local nagios.
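
As a rough illustration of what "dropping a passive result on the command pipe" means, the sketch below formats one result in the Nagios external command syntax (PROCESS_SERVICE_CHECK_RESULT). It is not the gather_sam implementation: the function name, the host and service names, and the command file path in the comment are all assumptions made for the example.

```shell
#!/bin/sh
# Format one passive service result as a Nagios external command line.
# Usage: passive_result_line HOST SERVICE RETURN_CODE OUTPUT [EPOCH_SECONDS]
# RETURN_CODE follows the plugin convention: 0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN.
passive_result_line() {
    host=$1
    service=$2
    code=$3
    output=$4
    # Default to the current time when no timestamp is given.
    now=${5:-$(date +%s)}
    printf '[%s] PROCESS_SERVICE_CHECK_RESULT;%s;%s;%s;%s\n' \
        "$now" "$host" "$service" "$code" "$output"
}

# Hypothetical use (the command file location is an assumption, check the
# command_file setting in your nagios.cfg):
#   passive_result_line ce1.example.org SAM-CE-example 0 "OK - test passed" \
#       >> /var/nagios/rw/nagios.cmd
```

The host and service in the line have to match a passive slot that already exists in the NAGIOS configuration, which is why NCG creates the slots first.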

Probe Contribution
Probe contributions are welcome. We have SVN, build services and repositories.

Slide 5: NAGIOS Host with Remote and Local Probes.

DirectedGraphPlugin_1.png diagram

  • Suitable for a site starting from scratch, very simple.
  • Set up a gLite UI first.
  • NCG executed by YAIM queries SiteBDII and remote testers (e.g SAM, ENOC).
    • NCG populates NAGIOS configuration with hosts and probes to run against those hosts.
    • NCG also processes configurations from a ROC nagios via the msg-system - see later.
  • NCG creates nagios and/or NRPE configuration from this data.

Slide 6: YAIM Configuration for NAGIOS Host with Remote and Native Probes.

| Variable | Example | Notes |
| INSTALL_ROOT | /opt | YAIM always needs this. |
| SITE_NAME | BIGMAN-LCG2 | Needed by NCG. |
| SITE_BDII_HOST | site-bdii.example.org | Needed by NCG. |
| PX_HOST | myproxy.example.org | MyProxy host to contact for credentials. |
| VOS | "dteam ops" | List of VOs whose members have read access to the NAGIOS portal. |
| NAGIOS_HOST | my-nagios.example.org | Host where you are installing nagios. |
| NAGIOS_ADMIN_DNS | /DC=ch/OU=Users/CN=Dr Kildare,/DC=ch/OU=User/CN=Dr Who | Persons with write access to the NAGIOS portal. |
| NAGIOS_HTTPD_ENABLE_CONFIG | true | Your /etc/httpd/conf.d/ssl.conf and nagios.conf will be clobbered. |
| NAGIOS_NCG_ENABLE_CONFIG | true | Your /etc/ncg/ncg.conf configuration will be clobbered. |
| NAGIOS_SUDO_ENABLE_CONFIG | true | Your /etc/sudoers will be appended to. |
| NAGIOS_CGI_ENABLE_CONFIG | true | Your /etc/nagios/cgi.cfg will be edited. |
| NAGIOS_NAGIOS_ENABLE_CONFIG | true | Your /etc/nagios/nagios.cfg will be clobbered. |
| NCG_VO | "ops dteam" | Which VOs to run tests as; requires a friendly VO member. |
| NAGIOS_ROLE | roc or site | Is this a site level or ROC level nagios? |
| NCG_NRPE_UI | ui.example.org | Set if and only if you want an NRPE enabled UI. |
| NCG_GOCDB_ROC_NAME | CERN | GOCDB ROC name, if this is a ROC nagios. |

  • The NAGIOS_*_ENABLE_CONFIG variables allow YAIM to work with existing installs; all are false by default.
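
Put together, the slide's variables might look like the following site-info.def fragment for a site level Nagios without an NRPE UI. This is only a sketch built from the example values on this slide; the quoting of NAGIOS_ADMIN_DNS and the choice to enable every *_ENABLE_CONFIG switch are assumptions, so adjust them to your own install.

```shell
# site-info.def fragment (sketch), using the example values from this slide.
INSTALL_ROOT=/opt
SITE_NAME=BIGMAN-LCG2
SITE_BDII_HOST=site-bdii.example.org
PX_HOST=myproxy.example.org
VOS="dteam ops"
NAGIOS_HOST=my-nagios.example.org
# Comma separated DNs; quoted because the DNs contain spaces (assumed format).
NAGIOS_ADMIN_DNS="/DC=ch/OU=Users/CN=Dr Kildare,/DC=ch/OU=User/CN=Dr Who"
NAGIOS_ROLE=site
NCG_VO="ops dteam"
# Each switch lets YAIM rewrite the matching file; leave one unset (false by
# default) if you maintain that file by hand.
NAGIOS_HTTPD_ENABLE_CONFIG=true
NAGIOS_NCG_ENABLE_CONFIG=true
NAGIOS_SUDO_ENABLE_CONFIG=true
NAGIOS_CGI_ENABLE_CONFIG=true
NAGIOS_NAGIOS_ENABLE_CONFIG=true
```

For a ROC level Nagios the same fragment would instead set NAGIOS_ROLE=roc and add NCG_GOCDB_ROC_NAME.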

Slide 7: Site or ROC level and Multisite.

ROCSiteNagios.png
  • Role controls the location of the nagios box in the full architecture, e.g.:
    • Site A Nagios collects results from the ROC.
    • ROC nagios does not probe resource BDIIs.
  • Multisite - A site nagios can probe multiple sites, e.g. via the YAIM variable NCG_LDAP_FILTER

| NCG_LDAP_FILTER value | Result |
| GlueSiteName=UKI-NORTHGRID* | Will capture all sites prefixed with UKI-NORTHGRID. |
| GlueSiteOtherInfo=EGEE_ROC=ITALY | Will capture all sites hosted by the Italian ROC. |
  • Sites can be added/deleted easily with fine tuning of NCG - see later.
  • Possible deployment configurations:
    1. Two ROC Nagios covering half or all the sites.
    2. One Site Nagios covering more than one site - Site A and C.

Slide 8: Live Demo for NAGIOS Host with UI Pointing at a Site.

  • Commands to install a Nagios Host and UI Pointing at the site. install-site-terminal-2009.png

Slide 9: Installing a Site Nagios and UI together

  • Install httpd && gLite UI yumgroup && egee-NAGIOS && lcg-CA && dummy-ca-certs.
  • Configure site-info.def
  • /opt/glite/yaim/bin/yaim -c -s /root/site-info.def -n glite-NAGIOS -n glite-UI
  • NCG collects its hosts and services from the GOCDB and possibly a BDII endpoint
    ldap://site-bdii.example.org:2170/mds-vo-name=BIGMAN-LCG2,o=grid.
  • The resulting NAGIOS Web interface is located at https://nagioshost.example.org/nagios
    • Visible read-only to a set of VO members, maintained by voms2htpasswd.
      • A static list of extra DNs can also be specified.
    • Write operations, e.g rerun test, are restricted to:
      1. The list of DNs you specify above.
      2. The site owners as specified in the GOCDB.

Slide 10: Probes Running On the local Nagios Box

  • On installation verify probes running on the local nagios box first.
    • Probes check a valid grid proxy is available.
    • Probes check and send messages to the msg-system.
    • Probes contact the GOCDB to import service downtimes.
    • Remote probes like gather_sam are associated to the nagios host.

nagios-host-probe-results-2009.png

Slide 11: MyProxy and Proxy Retrieval to Nagios.

  • Grid probes require a valid user proxy to run, e.g. to globus-url-copy a file to a CE.
  • The NAGIOS host is permitted to be a trusted retriever of your proxy.
    1. User uploads to MyProxy service with myproxy-init specifying exactly which DNs can retrieve the proxy.
        myproxy-init -c 336 -k NagiosRetrieve-nagios.example.org-<VO> \
           -s myproxy.example.org \
           -l nagios  -x -Z "/DC=ch/OU=computers/CN=nagios.example.ch"
    2. The NAGIOS host uses its host certificate to authenticate to the MyProxy service.
      • Retrieval is now a nagios probe running on the nagios node.
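
For debugging, the retrieval step can be approximated by hand with myproxy-logon. The sketch below prints a candidate command line rather than running it, and it is not the retrieval probe itself: the helper name, the certificate and key paths, and the output file are assumptions, while the server, user name and credential name mirror the myproxy-init example above.

```shell
#!/bin/sh
# Print a myproxy-logon command for fetching the stored proxy of one VO.
# Usage: proxy_retrieve_cmd VO OUTPUT_FILE
proxy_retrieve_cmd() {
    vo=$1
    out=$2
    # -n: do not prompt for a passphrase, authenticate with the certificate
    # named by X509_USER_CERT / X509_USER_KEY (assumed host certificate paths).
    printf 'X509_USER_CERT=%s X509_USER_KEY=%s myproxy-logon -s %s -l nagios -k %s -n -o %s\n' \
        /etc/grid-security/hostcert.pem \
        /etc/grid-security/hostkey.pem \
        myproxy.example.org \
        "NagiosRetrieve-nagios.example.org-$vo" \
        "$out"
}

# Example:
#   proxy_retrieve_cmd ops /tmp/proxy-ops
```

If the printed command fails when run on the nagios node, check that the host certificate DN matches the one given to -Z at myproxy-init time.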

Comments

  • YAIM configuration for a MyProxy service.
           SITE_NAME=YourSite
           PX_HOST=myproxy.example.ch
           GRID_TRUSTED_RETRIEVERS="'/DC=ch/DC=cern/OU=computers/CN=nrpe-ui.example.ch'"       
           GRID_AUTHORIZED_RETRIEVERS="'/DC=ch/DC=cern/OU=computers/CN=nrpe-ui.example.ch'"
           SITE_EMAIL="mybigsite@example.ch"
        
  • With this configuration the user MAY specify, at myproxy-init time, that nrpe-ui.example.ch can retrieve the proxy.
    • It is NOT the default that all uploaded proxies can be retrieved by nrpe-ui.example.ch.
  • On a UI run the following to upload your own proxy.
          myproxy-init -c 336 -k NagiosRetrieve-nagios.example.org-<VO> -s myproxy.example.org -l nagios \
                  -x -Z "/DC=ch/OU=computers/CN=nagios.example.ch"
      

Slide 12: A ROC nagios calling glite-UI via NRPE.

  • One node with nagios, one node with gLite UI.
    • Allows less gLite on your precious Nagios node.
    • An existing UI can be used - tests your UI as well.
  • Suitable for ROC or site Nagios.
  • Other nodes, e.g. DPM, WN, ... can be tested the same way.
    • Planned for the future.
  • YAIM on UI just requires one extra variable NAGIOS_HOSTNAME
    • Install the egee-NRPE meta package and configure UI with:
    • /opt/glite/yaim/bin/yaim -c -s /root/site-info.def -n glite-UI -n glite-NRPE
  • YAIM on NAGIOS just requires one extra variable NCG_NRPE_UI
    • Install only egee-NAGIOS on your NAGIOS box and configure:
    • /opt/glite/yaim/bin/yaim -c -s /root/site-info.def -n NAGIOS

Slide 13: Demo of Installing ROC Nagios & gLite UI.

install-site-terminal-2009.png

Slide 14: Transmission of NRPE Configuration

  • Problem: the NRPE node requires probe configuration generated by NCG.
  • NCG places a tarball of NRPE configuration on the NAGIOS node.
  • NAGIOS calls a probe via NRPE which then, on the gLite UI:
    1. Pulls the tarball from the web server (certificate auth).
    2. Unpacks the tarball of NRPE configuration.
    3. Reloads (HUPs) the NRPE daemon if needed.
  • Probe Name - NRPE-Push
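
The unpack and "reload if needed" steps can be sketched as below. This is an illustration of the idea, not the NRPE-Push probe: the function name, the checksum stamp file and every path or URL in the comments are hypothetical, and comparing checksums is just one possible way to decide whether a reload is needed.

```shell
#!/bin/sh
# Unpack a configuration tarball only when it differs from the last one applied.
# Usage: apply_nrpe_config TARBALL DEST_DIR
# Prints "reload" when new configuration was unpacked, "unchanged" otherwise.
apply_nrpe_config() {
    tarball=$1
    dest=$2
    stamp="$dest/.last-applied.cksum"
    # Reading from stdin keeps the file name out of the checksum line.
    new_sum=$(cksum < "$tarball")
    if [ -f "$stamp" ] && [ "$(cat "$stamp")" = "$new_sum" ]; then
        echo unchanged
        return 0
    fi
    mkdir -p "$dest" || return 1
    tar -xzf "$tarball" -C "$dest" || return 1
    echo "$new_sum" > "$stamp"
    echo reload
}

# Hypothetical use on the UI (URL, certificate paths and pid file are assumptions):
#   curl -sf --cert /etc/grid-security/hostcert.pem \
#        --key /etc/grid-security/hostkey.pem \
#        -o /tmp/nrpe-config.tgz https://nagios.example.org/nrpe/config.tgz
#   [ "$(apply_nrpe_config /tmp/nrpe-config.tgz /etc/nagios/nrpe)" = reload ] &&
#       kill -HUP "$(cat /var/run/nrpe.pid)"
```

Skipping the HUP when nothing changed avoids restarting NRPE on every run of the probe.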

Slide 15: More Info, PNP4Nagios and Debugging.

  • Some tests have "more info" configured in NAGIOS, e.g:
    1. check_ldap links to the NAGIOS documentation for check_ldap.
    2. ENOC links through to the ENOC test results on their web page.
    3. When metrics are available PNP4Nagios plots them.
  • Debugging with nagios-run-check
    • Requires a host and nagios service, look them up on the web interface.
    • It provides the command line to run the test.

 #  nagios-run-check -v -d -s org.nagios.LocalLogger-PortCheck \
      -H ce1.triumf.ca
   Executing command:
  su nagios -c '"/usr/lib64/nagios/plugins"/"check_tcp" -H "206.12.1.15" -t "60" "-p 9002"'
 # nagios-run-check -s org.nagios.LocalLogger-PortCheck -H ce1.triumf.ca
      CRITICAL - Socket timeout after 60 seconds
    • You can then copy and paste the same command, adding a -v for instance.

Slide 16: ENOC Used to Populate Network Parents.

  • Site routers are collected from the ENOC, which knows this.
  • Site nodes are configured with the router as their parent. gstat.png

Slide 17: Tuning Probe Configuration

  • NCG supports configuration tweaks in /etc/ncg/ncg-localdb.d
  • Example: ADD_HOST_SERVICE!myproxy.example.org!MyProxy
    • Adds a new host as a MyProxy host.
  • Example: ADD_SITE!SecretSite and ADD_SITE_BDII!SecretSite!secretbdii.example.org
    • Adds a new site and specifies its BDII.
  • Other possibilities:
    • Remove a site.
    • Define a completely new probe.
    • Change a flag passed to a probe, e.g. change a timeout.
    • Remove a service or host.
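
As a minimal sketch, the example directives from this slide could be collected in one file under the tuning directory. The helper name and the file name local-tweaks are hypothetical; only the three directive lines come from the slide, and they are presumably picked up the next time NCG regenerates the configuration.

```shell
#!/bin/sh
# Write the slide's example directives into an NCG local tuning directory.
# Usage: write_localdb_rules DIR
# On a real Nagios host DIR would be /etc/ncg/ncg-localdb.d.
write_localdb_rules() {
    dir=$1
    mkdir -p "$dir" || return 1
    # The file name is an assumption; one directive per line, fields split by "!".
    cat > "$dir/local-tweaks" <<'EOF'
ADD_HOST_SERVICE!myproxy.example.org!MyProxy
ADD_SITE!SecretSite
ADD_SITE_BDII!SecretSite!secretbdii.example.org
EOF
}

# Example, writing into a scratch directory first to review the result:
#   write_localdb_rules /tmp/ncg-localdb.d && cat /tmp/ncg-localdb.d/local-tweaks
```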

Slide 18: Support, Bug Reports and Contributions.

-- SteveTraylen - 2009-09-17
