NagConf
---+ Introduction

This page documents the tools we use for configuring Nagios headnodes. There are two specific use cases:
   * Configuring Nagios for a single VO (i.e. those instances managed by IT-SDC-MI).
   * Configuring Nagios at a grid site (quite possibly for more than one VO).

---+ Vision

Here's how we think this will all work in the future.
   * If a machine is going to use the following approach to installing Nagios, it must have the following:
      * The necessary (vo-specific?) =ncg-metric-config= and =grid-monitoring-probes= RPMs installed.
      * A network connection to reach the POEM and VO feed URLs.
   * We (will) ship an RPM which contains the =parser.rb= script (see below) along with a few Ruby template (=.erb=) files.
   * The =parser.rb= script itself needs to be run every n hours. In our use case, we'll probably use Puppet to manage a cronjob to do this.

---+++ Puppet-specific notes
   * In addition to the RPMs described above, you will need to set the =nagconf_vo= parameter to the VO of your choice in Foreman to trigger the initial installation of NagConf. Currently it's installed simply by copying files from the Puppet files directory, but in the future this will be an RPM install.
   * It's possible to configure NagConf to only perform "hostcheck" tests against a node (i.e. is this node up?) rather than all of the grid-specific tests, by setting the Foreman parameter =nagconf_hostcheck= to =true=.

---+ Components

---++ =parser.rb=

This script does the main work of configuring a Nagios instance. In production environments it will most likely be executed by a cronjob, but it can also be called on the command line with the following parameters:
   * --vo: (Mandatory). Specify a single VO or a comma-separated list of VOs (e.g. atlas,cms).
   * --poem: (Optional). The POEM URL is hardcoded into =parser.rb=, but can be overridden with this command line option.
   * --site: (Optional). If provided, Nagios will only be configured for the specified site's nodes. Use the GOCDB-registered name for the site.
   * --confdir: (Optional). Location to store Nagios configuration. Defaults to /etc/nagios/nagconf.

---++ Execution logic
   1 From =getPOEM()=, get the POEM XML data and store it in a nested hash (=@poem_hash=). An MD5 checksum of the XML data is calculated (and the XML itself is stored in a temporary file) so we know whether it has changed since the last execution.
   1 For each VO that we want to configure:
      1 With =getVOfeed()=, get the VO feed (as before, calculating an MD5 checksum and storing the XML in a file) and create two lists and a hash:
         * The list =@all_sites= contains a simple list of all sites in the VO feed or, if the ==--site== parameter was passed on the command line, the single site that we're configuring Nagios for. This is only used by the =template_host.groups.erb= file to create the Nagios =host.groups.cfg= file.
         * The list =@all_flavours= contains all service flavours for the site(s) we're interested in. It is used to create hostgroups and servicegroups (i.e. by =template_host.groups.erb= and =template_service.groups.erb=).
         * =@atp_hash= is a nested hash. Each top-level key is a site name, which points to a sub-hash containing mappings between hostnames and their corresponding service flavours. This is used by all =.erb= files in one way or another.
      1 Run =buildNagios()=, which is responsible for reading in the configuration files provided by the =ncg-metric-config= and =grid-monitoring-probes= RPMs. This function is also responsible for running the =.erb= template files to generate the Nagios config.
   1 Once all config files have been generated, the temporary files containing XML are renamed, so that their MD5 checksums can be compared the next time =parser.rb= runs. This allows us to decide whether the POEM or VO feed data have changed since the last execution.

---++ =*.erb= files

These are Ruby template files. They're basically Ruby (with some additional, ugly, markup) that is executed to produce (many) Nagios configuration stanzas.
It's a very efficient, though not pretty, way of doing this.

---++ =*.templates.cfg= files

These are template configuration files, used by Nagios (not =parser.rb=) and stolen from an existing WLCG Nagios installation. They should be included in our own RPM.

---+ To do
   * Hard-code the cms.conf etc. locations, rather than use the current directory.
   * Write to nagconf.new and then validate before restarting Nagios. Not sure if this is possible (nagios.conf would need to have a nagios.new entry?). How about: move the old config to nagconf.old; write the new config in nagconf; if =nagios -v= succeeds, great! Otherwise move nagconf -> broken and move nagconf.old -> nagconf.
   * Recently implemented multiple VOs. This is probably going to result in duplicated configuration objects in the *.host.cfg files, but will likely only affect site-specific installations.
   * Ability to force a re-generation of the config. This can currently be achieved by deleting the files =ncg_poem.tmp= and =ncg_vofeed*.tmp=. In fact, these files should probably also be renamed to =nagconf_poem= etc.
   * There are some cases where the same node is defined to exist at two sites (with the same service). This causes problems because we end up creating duplicate host definitions in Nagios (i.e. multiple SITENAME directories contain the host definition in their host.cfg files). Nagios does not allow this. The current workaround is to only add the host into a host.cfg file once (the first time it's seen). Whether this will result in an unstable Nagios config is unclear at the time of writing. Either way, we need to fix this in the long term. This also means we may end up with some empty host.cfg files (such as for CERN-PROD-AI, CERN-PROD-HLT and CERN-PROD).
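
The "move old config aside, write new config, validate, roll back on failure" idea from the To do list could be sketched roughly as follows. This is only an illustration and not code from =parser.rb=: the method name =regenerate_with_rollback=, the =.old= / =.broken= suffixes and the =validator= callable are all hypothetical, and the path to the main Nagios config file in the commented-out usage is an assumption.

```ruby
require "fileutils"

# Hypothetical helper (not part of parser.rb): regenerate a config
# directory and roll back to the previous one if validation fails.
#
# confdir   - directory Nagios reads its generated config from
# validator - any callable returning true/false for the new config
# block     - writes the new config into the (empty) confdir
def regenerate_with_rollback(confdir, validator)
  old    = "#{confdir}.old"
  broken = "#{confdir}.broken"

  # Move the current config out of the way.
  FileUtils.rm_rf(old)
  FileUtils.mv(confdir, old) if Dir.exist?(confdir)
  FileUtils.mkdir_p(confdir)

  # Let the caller generate the new config in place.
  yield confdir

  return :ok if validator.call(confdir)

  # Validation failed: keep the bad config for inspection and
  # restore the previous one.
  FileUtils.rm_rf(broken)
  FileUtils.mv(confdir, broken)
  FileUtils.mv(old, confdir) if Dir.exist?(old)
  :rolled_back
end

# Possible production usage (the nagios.cfg path is an assumption):
#
#   check = ->(_dir) { system("nagios", "-v", "/etc/nagios/nagios.cfg") }
#   regenerate_with_rollback("/etc/nagios/nagconf", check) do |dir|
#     # ... run the .erb templates and write the results into dir ...
#   end
```

Because the new config is written directly into the directory Nagios already reads, nagios.conf would not need a separate entry for a staging directory; the price is a short window in which the live directory holds unvalidated config, so Nagios should only be restarted after the helper returns =:ok=.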
Topic revision: r5 - 2013-10-25 - MikeKenyon