Slide 1: Overview
- Introduction
- Messaging System
- Nagios - Messaging System Integration
- ROC->Site Configuration
- ROC->Site Metric Results
- SAM CE & WN Probes
- Tips & tricks
Slide 2: Introduction
- Interaction between monitoring instances
- Nagios and messaging system integration
- Data flow
- Messaging system topics
Slide 3: Messaging System
- Apache ActiveMQ
- FUSE Message Broker
- Status
- Networked brokers deployed at CERN and SRCE
Slide 4: Topics & Queues
- General rule
- topic per entity (e.g. site, roc/site, vo/site, ngi/site)
- topic per message type (e.g. metric results, configuration)
- Configuration
- /topic/grid.config.metricOutput.EGEE.role[.VO].sitename
- specific topic - keeps the last message
- JSON formatted configuration
- e.g. ROC publishing site's config
- /topic/grid.config.metricOutput.EGEE.roc.egee_srce_hr
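The naming rule on this slide can be sketched as a small helper. This is only an illustration of the pattern `grid.<type>.metricOutput.EGEE.role[.VO].sitename`: the function name `build_topic` and its parameters are hypothetical, and the site name is passed through exactly as given (the slides do not say how `egee_srce_hr` is derived).

```python
from typing import Optional


def build_topic(kind: str, role: str, sitename: str, vo: Optional[str] = None) -> str:
    """Compose a topic name following the pattern shown on the slides.

    kind     -- message type: "config" or "probe" (one topic per message type)
    role     -- publishing role, e.g. "roc"
    sitename -- site identifier, used verbatim
    vo       -- optional VO component, inserted between role and sitename
    """
    if kind not in ("config", "probe"):
        raise ValueError("kind must be 'config' or 'probe'")
    parts = ["grid", kind, "metricOutput", "EGEE", role]
    if vo:
        parts.append(vo)
    parts.append(sitename)
    return "/topic/" + ".".join(parts)


if __name__ == "__main__":
    # Reproduces the ROC configuration topic from the slide.
    print(build_topic("config", "roc", "egee_srce_hr"))
```

Passing `vo="ops"` (an illustrative value, not taken from the slides) would add the optional VO component before the site name.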
Slide 5: Topics & Queues
- Metric results
- /topic/grid.probe.metricOutput.EGEE.role[.VO].sitename
- e.g. ROC publishing site's results
- /topic/grid.probe.metricOutput.EGEE.roc.egee_srce_hr
- format of messages:
serviceURI: grid01.uibk.ac.at:2170
serviceType: BDII
siteName: HEPHY-UIBK
metricStatus: OK
metricName: org.bdii.Freshness
summaryData: OK: createTimestamp=Mon Sep 21 21:35:26 2009 UTC, diff=0 min
gatheredAt: ha3-egee.srce.hr
timestamp: 2009-09-21T21:35:48Z
nagiosName: org.bdii.Freshness
role: roc
EOT
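A message in the format above could be read with a few lines of Python. This is a minimal sketch, not the parser used by the actual tools: it assumes one `key: value` pair per line, splits on the first colon only (values such as `serviceURI` contain further colons), and treats `EOT` as the end-of-record marker. The name `parse_metric_message` is hypothetical.

```python
SAMPLE = """\
serviceURI: grid01.uibk.ac.at:2170
serviceType: BDII
siteName: HEPHY-UIBK
metricStatus: OK
metricName: org.bdii.Freshness
summaryData: OK: createTimestamp=Mon Sep 21 21:35:26 2009 UTC, diff=0 min
gatheredAt: ha3-egee.srce.hr
timestamp: 2009-09-21T21:35:48Z
nagiosName: org.bdii.Freshness
role: roc
EOT
"""


def parse_metric_message(text: str) -> dict:
    """Parse one 'key: value' record terminated by an EOT line."""
    fields = {}
    for raw in text.splitlines():
        line = raw.strip()
        if line == "EOT":
            break  # end of this record
        if not line:
            continue
        key, sep, value = line.partition(":")  # split on the first colon only
        if not sep:
            continue  # skip lines that are not key/value pairs
        fields[key.strip()] = value.strip()
    return fields


if __name__ == "__main__":
    record = parse_metric_message(SAMPLE)
    print(record["metricName"], record["metricStatus"])
```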
Slide 6: Topics & Queues
- Nagios instance specific queue
- grid.probe.metricOutput.EGEE.UUID
- used for results from SAM probes
- Notifications
- grid.probe.notification
- Nagios notifications
- used by Regional Dashboard
Slide 7: Nagios - Messaging System Integration
Slide 8: Nagios - Messaging System Integration
Slide 9: ROC->Site Configuration - Source
- NCG dumps config to SQLite-based ConfigCache
- Nagios probe org.egee.SendToMsg publishes config
- configuration is published on the defined topic
- /topic/grid.config.metricOutput.EGEE.roc.egee_srce_hr
Slide 10: ROC->Site Configuration - Destination
- Daemon msg-to-queue
- msg-to-queue is subscribed to a list of defined topics
- MSGAdapter module parses the configuration & stores it in the local ConfigCache
- configuration file example:
<MSGAdapter>
TOPIC="/topic/grid.config.metricOutput.EGEE.roc.egee_srce_hr"
<TOPIC_PARAMS>
activemq.retroactive=True
</TOPIC_PARAMS>
HANDLER="GridMon::MsgHandler::ConfigOutput"
<HANDLER_PARAMS>
CACHE_FILE=/var/cache/msg/config-cache/config.db
CACHE_TABLE=config_incoming
</HANDLER_PARAMS>
</MSGAdapter>
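To show what the `TOPIC_PARAMS` entry might translate to on the wire, here is a sketch that builds a STOMP `SUBSCRIBE` frame carrying `activemq.retroactive` as a header. It assumes the daemon talks STOMP to ActiveMQ, which the slides do not state; the helper names `stomp_frame` and `subscribe_frame` are hypothetical, and the header value is passed through exactly as written in the configuration example.

```python
TOPIC = "/topic/grid.config.metricOutput.EGEE.roc.egee_srce_hr"


def stomp_frame(command: str, headers: dict, body: str = "") -> bytes:
    """Serialize a STOMP frame: command, header lines, blank line, body, NUL."""
    lines = [command]
    lines.extend(f"{name}:{value}" for name, value in headers.items())
    return ("\n".join(lines) + "\n\n" + body + "\x00").encode("utf-8")


def subscribe_frame(topic: str, topic_params: dict) -> bytes:
    """Build a SUBSCRIBE frame; topic_params become extra subscription headers."""
    headers = {"destination": topic, "ack": "auto"}
    headers.update(topic_params)
    return stomp_frame("SUBSCRIBE", headers)


if __name__ == "__main__":
    frame = subscribe_frame(TOPIC, {"activemq.retroactive": "True"})
    print(frame.decode("utf-8").rstrip("\x00"))
```

A retroactive subscription fits the "keeps the last message" property of the configuration topic: a site instance that subscribes late can still receive the most recent configuration.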
Slide 11: ROC->Site Configuration - Destination
- Nagios probe org.egee.ConfigCheck
- associated with Nagios host
- alarms if new config is available
Slide 12: ROC->Site Configuration - Destination
- Rerunning NCG imports ROC checks
Slide 13: ROC->Site Metric Results - Source
- Nagios handler dumps result to DirQueue-based MsgCache
- handler is executed after each probe
- Nagios probe org.egee.SendToMsg publishes set of results
- results are published on the defined topic
- /topic/grid.probe.metricOutput.EGEE.roc.egee_srce_hr
Slide 14: ROC->Site Metric Results - Destination
- Daemon msg-to-queue
- MSGAdapter module parses results & stores them in the local MsgCache
- configuration file example:
<MSGAdapter>
TOPIC="/topic/grid.probe.metricOutput.EGEE.roc.egee_srce_hr"
HANDLER="GridMon::MsgHandler::MetricOutput"
<HANDLER_PARAMS>
SOURCE=remote
CACHE_DIR=/var/spool/msg-nagios-bridge/incoming
</HANDLER_PARAMS>
</MSGAdapter>
Slide 15: ROC->Site Metric Results - Destination
- Nagios probe org.egee.RecvFromQueue
- associated with Nagios host
- imports results from local MsgCache to Nagios
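The hand-off between msg-to-queue (writer) and org.egee.RecvFromQueue (reader) through a spool directory can be sketched as follows. This is a simplified illustration of a directory-based queue, not the actual DirQueue on-disk format: the file naming, the `.msg` suffix, and the functions `spool_message` and `drain_spool` are all hypothetical.

```python
import os
import tempfile
import uuid


def spool_message(cache_dir: str, text: str) -> str:
    """Write one message to the spool atomically and return its path."""
    os.makedirs(cache_dir, exist_ok=True)
    name = uuid.uuid4().hex
    tmp_path = os.path.join(cache_dir, "." + name + ".tmp")
    final_path = os.path.join(cache_dir, name + ".msg")
    with open(tmp_path, "w", encoding="utf-8") as handle:
        handle.write(text)
    # Rename within the same directory so readers never see partial files.
    os.replace(tmp_path, final_path)
    return final_path


def drain_spool(cache_dir: str) -> list:
    """Read and remove every complete message currently in the spool."""
    messages = []
    for name in sorted(os.listdir(cache_dir)):
        if not name.endswith(".msg"):
            continue  # ignore temporary files still being written
        path = os.path.join(cache_dir, name)
        with open(path, encoding="utf-8") as handle:
            messages.append(handle.read())
        os.remove(path)
    return messages


if __name__ == "__main__":
    spool = tempfile.mkdtemp()  # stands in for the CACHE_DIR from the config
    spool_message(spool, "metricStatus: OK\nEOT\n")
    print(drain_spool(spool))
```

Decoupling the two sides through a directory means the daemon can keep receiving messages even when Nagios runs the import probe only periodically.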
Slide 16: SAM CE & WN Probes
- Utilize specific queue for publishing results from WNs
- Nagios probe org.sam.CE-JobStatus
- associated with each CE service
- executes complex SAM WN probes via WMS
- WN probes communicate back via specific queue
- Nagios probe org.sam.CE-JobMonit
- associated with Nagios or NRPE UI server
- updates status of all org.sam.CE-JobStatus probes on Nagios
Slide 17: SAM CE & WN Probes
- Daemon msg-to-queue and Nagios probe org.egee.RecvFromQueue
- import WN results and org.sam.CE-JobSubmit results
- Nagios probe org.sam.CE-JobSubmit
- remote probe reported via the messaging system
- associated with each CE service
- contains the final state of the job started by org.sam.CE-JobStatus
- Nagios probes org.sam.WN-...
- remote probes reported via the messaging system
- associated with each CE service
- metric results from WN probes executed on WN boxes
Slide 18: Tips & tricks
- Speed things up by forcing internal checks
- force execution of a service's metric on Nagios
- force publishing with org.egee.SendToMsg
- force consuming with org.egee.RecvFromQueue
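Besides the web interface, a forced check can be requested through the Nagios external command `SCHEDULE_FORCED_SVC_CHECK`. The sketch below only formats the command line; the host name in the example is hypothetical, and the location of the Nagios command file is installation-specific, so writing the line to it is left out.

```python
import time
from typing import Optional


def format_forced_check(host: str, service: str, check_time: int,
                        now: Optional[int] = None) -> str:
    """Format a SCHEDULE_FORCED_SVC_CHECK external command line.

    host, service -- Nagios host name and service description
    check_time    -- epoch seconds at which the check should run
    now           -- epoch seconds used as the command timestamp
    """
    if now is None:
        now = int(time.time())
    return f"[{now}] SCHEDULE_FORCED_SVC_CHECK;{host};{service};{check_time}\n"


if __name__ == "__main__":
    # "nagios.example.org" is a placeholder host name.
    ts = int(time.time())
    for probe in ("org.egee.SendToMsg", "org.egee.RecvFromQueue"):
        print(format_forced_check("nagios.example.org", probe, ts, now=ts), end="")
```

Forcing the publishing probe on the source and then the consuming probe on the destination follows the order of the bullets above.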
Slide 19: Tips & tricks
- Use Extra Actions link
- see additional information
- force execution of a metric on a remote instance
- Results are not there?
- is the ROC executing checks?
- is the msg-to-queue daemon working?
--
PeterJones - 01-Jun-2010