Roger: alarm masking and state management


What is roger?

Roger is more or less a functional replacement for the old quattor "sms" tool. There are a few differences, but the main one is that alarm masking and application/drain state are now separate.

What is roger if I've never heard of sms?

It manages two important pieces of machine state: whether you want alarms switched on (or, more accurately, whether you want alarms to be masked), and what the current state of the machine is. Optionally, you can then use those state transitions to trigger actions, such as removing a machine from a load balancer.

How do I get it?

It's on aiadm. To install it on your own machines, which also gives you roger_actions, include the following in your manifest:

include teigi::roger::client

Who can use it?

If you have root on a machine, or own the machine in landb, then you will be able to make changes to the state of that machine in roger.

Can't I just use puppet for this?

Well, you could, but a lot of the time machine state and alarm masks need to be changed because a machine is having problems. If the machine is healthy enough to run puppet, then sometimes you wouldn't need to turn the alarms off...

Ok, but is it integrated with puppet?

Yes. Machines first add themselves to roger via puppet. Data from roger is added to the machine during puppet catalog compilation, so it can be used in puppet catalogs. Action scripts are executed when puppet runs, and they are put in place by puppet configuration. The machine itself, using its host keytab, can also set its own data in roger. It is therefore possible to use hostgroups to maintain drain state, if that is useful, and have puppet code update roger so that external machines can act on the data.

What are the commands?

The commands on aiadm should be up to date. Some man pages are linked from this page, but the versions on aiadm are more likely to be current.




Current foreman integration

The alarm masking was designed with GNI in mind, but in the meantime roger uses your kerberos credentials to switch alarms on and off in foreman for your machines. Because it uses LAS and foreman in the background, there is only a single on/off toggle for all alarms. It also requires that you have the correct permissions in foreman for the machine.

Future GNI integration

The alarms are being split into the following:

  • nc_alarms: no contact alarms, which are raised when your machine becomes unresponsive
  • hw_alarms: hardware alarms from physical hardware
  • os_alarms: operating system alarms
  • app_alarms: application alarms

The idea is that alarms will be routed to the right groups to deal with them.

App state

Available fields

Currently the following are available:

  • production
  • draining
  • quiesce

Roger itself does not do anything specific as a result of the different states; it is up to downstream consumers to decide what to do with them. However, as an example, the load balancer action has the following logic:

if new_state != old_state:
    if new_state == "production":
        /bin/rm -f /etc/iss.nologin
    else:
        /bin/touch /etc/iss.nologin
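
As a rough illustration, the logic above could be written as a standalone action script. This is only a sketch, not the shipped load balancer action. It assumes that returning to production removes /etc/iss.nologin and that moving to any other state creates it. The apply_state helper and its nologin_path parameter are hypothetical names, introduced here so the logic can be tried against a scratch file rather than the real path.

```python
#!/usr/bin/env python3
"""Sketch of an action script: old state as first argument, new state as second."""
import sys
import tempfile
from pathlib import Path

# Path taken from the logic above; pass another path when experimenting.
DEFAULT_NOLOGIN = Path("/etc/iss.nologin")


def apply_state(old_state, new_state, nologin_path=DEFAULT_NOLOGIN):
    """Hypothetical helper: return True if a state transition was acted on."""
    if new_state == old_state:
        return False  # no transition, nothing to do
    if new_state == "production":
        # Assumption: back in production means the marker file is removed.
        nologin_path.unlink(missing_ok=True)
    else:
        # Assumption: any other state (draining, quiesce) creates the marker.
        nologin_path.touch()
    return True


def main(argv):
    """Entry point in the style roger_actions uses: argv[1]=old, argv[2]=new."""
    if len(argv) != 3:
        print("usage: action OLD_STATE NEW_STATE", file=sys.stderr)
        return 2
    apply_state(argv[1], argv[2])
    return 0


if __name__ == "__main__":
    # Demo against a scratch directory so nothing under /etc is touched.
    with tempfile.TemporaryDirectory() as tmp:
        marker = Path(tmp) / "iss.nologin"
        apply_state("production", "draining", marker)
        print("marker present after draining:", marker.exists())
        apply_state("draining", "production", marker)
        print("marker present after production:", marker.exists())
```

A script installed under /etc/roger/actions would drop the demo and end with sys.exit(main(sys.argv)) instead.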


Host actions

roger_actions has two modes of running. Without options, it contacts the roger server, gets the current key/value pairs, and compares them with the contents of /etc/roger/current.yaml. For any deltas it executes the scripts in /etc/roger/actions/$key/* with the old state as $1 and the new state as $2. If run with --puppet, it does not contact roger and instead reads /etc/roger/puppet.yaml. It is the responsibility of puppet to populate that file.
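
The compare-and-dispatch step described above can be sketched roughly as follows. This is not the roger_actions source. The function names compute_deltas and run_actions are hypothetical, the key/value pairs are passed in as plain dictionaries (loading them from YAML or from the roger server is left out), and passing an empty string for a key that is missing on one side is an assumption.

```python
import os
import subprocess
from pathlib import Path


def compute_deltas(current, incoming):
    """Return a (key, old, new) tuple for every key whose value differs."""
    deltas = []
    for key in sorted(set(current) | set(incoming)):
        # Assumption: a key missing on one side is treated as an empty string.
        old = current.get(key, "")
        new = incoming.get(key, "")
        if old != new:
            deltas.append((key, old, new))
    return deltas


def run_actions(deltas, actions_dir="/etc/roger/actions"):
    """Run every executable in actions_dir/<key>/ with old and new state as arguments."""
    ran = []
    for key, old, new in deltas:
        key_dir = Path(actions_dir) / key
        if not key_dir.is_dir():
            continue  # no actions registered for this key
        for script in sorted(key_dir.iterdir()):
            if script.is_file() and os.access(script, os.X_OK):
                # The script sees the old state as $1 and the new state as $2.
                subprocess.run([str(script), str(old), str(new)], check=False)
                ran.append((script.name, str(old), str(new)))
    return ran


if __name__ == "__main__":
    # "appstate" is a hypothetical key name used only for this demo.
    current = {"appstate": "production"}
    incoming = {"appstate": "draining"}
    print(compute_deltas(current, incoming))
```

Running scripts in sorted order is a choice made for this sketch so that repeated runs behave the same way; the text above does not say which order roger_actions uses.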

Off host actions

roger writes to a messaging bus for all changes, and consumers can take action based on state changes. Please contact us if you'd like to use this facility.
Topic revision: r2 - 2014-01-10 - GavinMcCance