What is Roger?

Roger is a front end to a Riak key/value store, with authentication and authorization that match CERN's machine ownership model. It stores key/value pairs that are useful for monitoring. These can be updated or queried from anywhere at any time, and can be fed into Puppet via catalog compilations. On the machine side, they can be used to fire off actions when values change.
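
How machines react to value transitions isn't described on this page, so the following is only a hypothetical Python sketch of the idea: compare the previous and current entry for a host, and call a handler registered for each key whose value changed. The names `changed_keys` and `fire_actions`, and the handler signature, are made up for illustration and are not part of Roger.

```python
from typing import Any, Callable, Dict, Tuple

# Hypothetical handler signature: (hostname, key, old_value, new_value).
Handler = Callable[[str, str, Any, Any], None]


def changed_keys(old: Dict[str, Any], new: Dict[str, Any]) -> Dict[str, Tuple[Any, Any]]:
    """Return {key: (old_value, new_value)} for every key whose value differs.

    A key missing from one side is treated as having the value None there.
    """
    keys = set(old) | set(new)
    return {k: (old.get(k), new.get(k)) for k in keys if old.get(k) != new.get(k)}


def fire_actions(old: Dict[str, Any], new: Dict[str, Any], handlers: Dict[str, Handler]) -> None:
    """Call the handler registered for each changed key, in key order.

    Keys without a handler (e.g. update_time, which changes on every update)
    are silently ignored.
    """
    hostname = new.get("hostname", old.get("hostname", ""))
    for key, (before, after) in sorted(changed_keys(old, new).items()):
        handler = handlers.get(key)
        if handler is not None:
            handler(hostname, key, before, after)


if __name__ == "__main__":
    old = {"hostname": "aiteigi01.cern.ch", "appstate": "production", "hw_alarmed": True}
    new = {"hostname": "aiteigi01.cern.ch", "appstate": "draining", "hw_alarmed": True}
    # Prints: aiteigi01.cern.ch: appstate production -> draining
    fire_actions(old, new, {"appstate": lambda h, k, a, b: print(f"{h}: {k} {a} -> {b}")})
```

Keeping the diff separate from the dispatch makes it easy to decide per key which transitions matter.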

Which keys and values, exactly?

Here's an example entry:

$ curl --key ~/private/x509/priv.pem --cert ~/private/x509/bejones.pem -k https://aiteigi01.cern.ch:8202/roger/v1/state/aiteigi01.cern.ch/
{"app_alarmed": false, "appstate": "production", "expires": "", "hostname": "aiteigi01.cern.ch", "hw_alarmed": true, "message": "switch hw alarms on", "os_alarmed": false, "update_time": "1375889868", "updated_by": "bejones"}

| key | description | expected values |
| app_alarmed | toggles application alarms for the target | true or false |
| os_alarmed | toggles operating system alarms for the target | true or false |
| hw_alarmed | toggles hardware alarms for the target | true or false |
| appstate | whether the machine is in production or in a drain state | production, draining, quiesce |
| expires | the time at which the entry will expire | epoch time string, or "" for no expiry |
| hostname | name of the machine | FQDN |
| update_time | when the entry was created or last updated | epoch time string |
| updated_by | authenticated user who made the update | string |
| message | a few words about why | string |
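
For client code, it can help to turn an entry into a typed object, converting the epoch strings to integers. This is a minimal Python sketch assuming the field names and types shown in the table and the example output above; the class name `RogerState` is hypothetical.

```python
import json
from dataclasses import dataclass
from typing import Any, Dict, Optional


@dataclass
class RogerState:
    """One Roger entry (hypothetical client-side model of the documented fields)."""
    hostname: str
    appstate: str            # e.g. production, draining, quiesce
    app_alarmed: bool
    os_alarmed: bool
    hw_alarmed: bool
    message: str
    updated_by: str
    update_time: int         # epoch seconds (the server sends a string)
    expires: Optional[int]   # epoch seconds, or None when the string is empty

    @classmethod
    def from_dict(cls, d: Dict[str, Any]) -> "RogerState":
        """Build from one decoded JSON object (a single entry or a list item)."""
        return cls(
            hostname=d["hostname"],
            appstate=d["appstate"],
            app_alarmed=d["app_alarmed"],
            os_alarmed=d["os_alarmed"],
            hw_alarmed=d["hw_alarmed"],
            message=d["message"],
            updated_by=d["updated_by"],
            update_time=int(d["update_time"]),
            expires=int(d["expires"]) if d["expires"] else None,
        )

    @classmethod
    def from_json(cls, text: str) -> "RogerState":
        """Build from the raw body of a single-entry GET."""
        return cls.from_dict(json.loads(text))
```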

If an entry has expired, any "get" operation returns the previous entry from the history instead. The use case is people who want to add or remove alarms for a set period of time, as is currently done with sms.
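
The server-side implementation isn't described here, so this is only a Python sketch of the same rule, assuming the history for one host is available as a list of entries ordered oldest to newest. The names `is_expired` and `effective_entry` are hypothetical, and unlike the one-line description above, the sketch keeps walking back if the previous entry has also expired.

```python
import time
from typing import Any, Dict, List, Optional

Entry = Dict[str, Any]


def is_expired(entry: Entry, now: float) -> bool:
    """An entry is expired when its "expires" epoch string is set and not in the future."""
    expires = entry.get("expires", "")
    return bool(expires) and int(expires) <= now


def effective_entry(history: List[Entry], now: Optional[float] = None) -> Optional[Entry]:
    """Return the newest non-expired entry from history (ordered oldest -> newest)."""
    if now is None:
        now = time.time()
    for entry in reversed(history):
        if not is_expired(entry, now):
            return entry
    return None  # every entry has expired, or there is no history at all
```

For example, a temporary "hw alarms off" entry with an `expires` time is returned until that time passes, after which the earlier, non-expiring entry is returned again.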

What's the API?

It's a REST API, but for monitoring purposes I presume you will mainly be interested in queries. Note that the server name in the example URLs below is not the production one, but the rest of each URL should be the same in production.

There are two ports: 8201 for Kerberos (mod_auth_krb) and 8202 for SSL client certificates. You can use either. All connections must be authenticated, but beyond that anyone can read any entry. The exception is machines authenticating with their own certificate or keytab, which can only see their own entries.
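
A minimal Python sketch of picking the right port and setting up the SSL side with the standard library. `state_url` and `client_cert_context` are hypothetical helper names; the `/roger/v1/state/` path is taken from the example URLs on this page. The Kerberos port needs an HTTP client that speaks SPNEGO/Negotiate (for instance `curl --negotiate -u :`), which Python's standard library does not provide, so only the URL is built for it here.

```python
import ssl
from urllib.parse import quote

# Port per authentication method, as described above.
PORTS = {"kerberos": 8201, "ssl": 8202}


def state_url(server: str, hostname: str = "", auth: str = "ssl") -> str:
    """Build a Roger state URL: the whole list if hostname is empty, else one entry."""
    if auth not in PORTS:
        raise ValueError(f"auth must be one of {sorted(PORTS)}, got {auth!r}")
    path = "/roger/v1/state/"
    if hostname:
        path += quote(hostname) + "/"
    return f"https://{server}:{PORTS[auth]}{path}"


def client_cert_context(certfile: str, keyfile: str, verify: bool = True) -> ssl.SSLContext:
    """SSL context presenting a client certificate, like curl --cert/--key."""
    ctx = ssl.create_default_context()
    ctx.load_cert_chain(certfile=certfile, keyfile=keyfile)
    if not verify:
        # Equivalent of curl -k: skip server certificate checks (test servers only).
        ctx.check_hostname = False
        ctx.verify_mode = ssl.CERT_NONE
    return ctx
```

Fetching one entry then looks roughly like `urllib.request.urlopen(state_url("aiteigi01.cern.ch", "aiteigi01.cern.ch"), context=client_cert_context(cert, key)).read()`, with `cert` and `key` pointing at your own certificate and private key files.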

So, again, here is how to get a single entry:

$ curl --key ~/private/x509/priv.pem --cert ~/private/x509/bejones.pem -k https://aiteigi01.cern.ch:8202/roger/v1/state/aiteigi01.cern.ch/
{"app_alarmed": false, "appstate": "production", "expires": "", "hostname": "aiteigi01.cern.ch", "hw_alarmed": true, "message": "switch hw alarms on", "os_alarmed": false, "update_time": "1375889868", "updated_by": "bejones"}

You can also get every entry (the wiki doesn't help much with formatting this output):

$ curl --key ~/private/x509/priv.pem --cert ~/private/x509/bejones.pem -k https://aiteigi01.cern.ch:8202/roger/v1/state/
{"meta": {"limit": 20, "next": "/roger/v1/state/?limit=20&offset=20", "offset": 0, "previous": null, "total_count": 1562}, "objects": [{"app_alarmed": true, "appstate": "production", "expires": "", "hostname": "p05153026053325.cern.ch", "hw_alarmed": true, "message": "bulk update from puppetdb", "os_alarmed": true, "update_time": "1375912723", "updated_by": "bejones"}, {"app_alarmed": true, "appstate": "production", "expires": "", "hostname": "lxbrf18b04.cern.ch", "hw_alarmed": true, "message": "bulk update from puppetdb", "os_alarmed": true, "update_time": "1375912468", "updated_by": "bejones"}, {"app_alarmed": true, "appstate": "production", "expires": "", "hostname": "lxbsp2810.cern.ch", "hw_alarmed": true, "message": "bulk update from puppetdb", "os_alarmed": true, "update_time": "1375912579", "updated_by": "bejones"}, {"app_alarmed": true, "appstate": "production", "expires": "", "hostname": "lxbrf18b11.cern.ch", "hw_alarmed": true, "message": "bulk update from puppetdb", "os_alarmed": true, "update_time": "1375912471", "updated_by": "bejones"}, {"app_alarmed": true, "appstate": "production", "expires": "", "hostname": "lxbsq2302.cern.ch", "hw_alarmed": true, "message": "bulk update from puppetdb", "os_alarmed": true, "update_time": "1375912623", "updated_by": "bejones"}, {"app_alarmed": true, "appstate": "production", "expires": "", "hostname": "lxbsp2335.cern.ch", "hw_alarmed": true, "message": "bulk update from puppetdb", "os_alarmed": true, "update_time": "1375912551", "updated_by": "bejones"}, {"app_alarmed": true, "appstate": "production", "expires": "", "hostname": "b6502beeeb.cern.ch", "hw_alarmed": true, "message": "bulk update from puppetdb", "os_alarmed": true, "update_time": "1375912366", "updated_by": "bejones"}, {"app_alarmed": true, "appstate": "production", "expires": "", "hostname": "p05153026607938.cern.ch", "hw_alarmed": true, "message": "bulk update from puppetdb", "os_alarmed": true, "update_time": "1375912758", "updated_by": "bejones"}, 
{"app_alarmed": true, "appstate": "production", "expires": "", "hostname": "lxbsq2215.cern.ch", "hw_alarmed": true, "message": "bulk update from puppetdb", "os_alarmed": true, "update_time": "1375912618", "updated_by": "bejones"}, {"app_alarmed": true, "appstate": "production", "expires": "", "hostname": "c2repacksrv401.cern.ch", "hw_alarmed": true, "message": "bulk update from puppetdb", "os_alarmed": true, "update_time": "1375912447", "updated_by": "bejones"}, {"app_alarmed": true, "appstate": "production", "expires": "", "hostname": "b6f395192e.cern.ch", "hw_alarmed": true, "message": "bulk update from puppetdb", "os_alarmed": true, "update_time": "1375912414", "updated_by": "bejones"}, {"app_alarmed": true, "appstate": "production", "expires": "", "hostname": "p05153061403857.cern.ch", "hw_alarmed": true, "message": "bulk update from puppetdb", "os_alarmed": true, "update_time": "1375912823", "updated_by": "bejones"}, {"app_alarmed": true, "appstate": "production", "expires": "", "hostname": "b64a42d2e0.cern.ch", "hw_alarmed": true, "message": "bulk update from puppetdb", "os_alarmed": true, "update_time": "1375912364", "updated_by": "bejones"}, {"app_alarmed": true, "appstate": "production", "expires": "", "hostname": "lxbrf29b10.cern.ch", "hw_alarmed": true, "message": "bulk update from puppetdb", "os_alarmed": true, "update_time": "1375912509", "updated_by": "bejones"}, {"app_alarmed": true, "appstate": "production", "expires": "", "hostname": "p05151876953154.cern.ch", "hw_alarmed": true, "message": "bulk update from puppetdb", "os_alarmed": true, "update_time": "1375912720", "updated_by": "bejones"}, {"app_alarmed": true, "appstate": "production", "expires": "", "hostname": "b581fc2d4b.cern.ch", "hw_alarmed": true, "message": "bulk update from puppetdb", "os_alarmed": true, "update_time": "1375912343", "updated_by": "bejones"}, {"app_alarmed": true, "appstate": "production", "expires": "", "hostname": "p05153061053073.cern.ch", "hw_alarmed": true, 
"message": "bulk update from puppetdb", "os_alarmed": true, "update_time": "1375912791", "updated_by": "bejones"}, {"app_alarmed": true, "appstate": "production", "expires": "", "hostname": "b6db8ad68b.cern.ch", "hw_alarmed": true, "message": "bulk update from puppetdb", "os_alarmed": true, "update_time": "1375912408", "updated_by": "bejones"}, {"app_alarmed": true, "appstate": "production", "expires": "", "hostname": "p05151876753742.cern.ch", "hw_alarmed": true, "message": "bulk update from puppetdb", "os_alarmed": true, "update_time": "1375912716", "updated_by": "bejones"}, {"app_alarmed": true, "appstate": "production", "expires": "", "hostname": "p05153026904396.cern.ch", "hw_alarmed": true, "message": "bulk update from puppetdb", "os_alarmed": true, "update_time": "1375912780", "updated_by": "bejones"}]}

The main thing to note about list results is that they are always paginated, so clients must handle pagination. Each response carries a "meta" object giving the page limit, the offset (counted from 0), the total count, and the URIs of the "next" and "previous" pages, which are null when there is no such page.
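
A client therefore has to keep requesting pages until "next" is null. The Python sketch below assumes, as in the example output, that the "next" URIs are paths on the same server, and takes a `fetch` callable (URL in, response body out) so that any HTTP client and either authentication method can be plugged in. `iter_entries` is a hypothetical name.

```python
import json
from typing import Any, Callable, Dict, Iterator, Optional
from urllib.parse import urljoin

# fetch(url) -> response body as text; supplied by the caller.
Fetch = Callable[[str], str]


def iter_entries(base_url: str, first_path: str, fetch: Fetch) -> Iterator[Dict[str, Any]]:
    """Yield every object from a paginated Roger list, following meta["next"]."""
    url: Optional[str] = urljoin(base_url, first_path)
    while url:
        page = json.loads(fetch(url))
        yield from page["objects"]
        next_path = page["meta"]["next"]  # relative URI, or None on the last page
        url = urljoin(base_url, next_path) if next_path else None
```

With the client-certificate setup from the curl examples, `fetch` could be something like `lambda u: urllib.request.urlopen(u, context=ctx).read().decode()`, where `ctx` is an `ssl.SSLContext` loaded with your certificate and key. Because it is a generator, pages are only requested as the caller consumes entries.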

You can also search. In principle most fields are searchable, but update_time is probably the most useful, since it shows what has changed since a given time T:

$ curl --key ~/private/x509/priv.pem --cert ~/private/x509/bejones.pem -k 'https://aiteigi01.cern.ch:8202/roger/v1/state/?update_time__gt=1375912880'
{"meta": {"limit": 20, "next": null, "offset": 0, "previous": null, "total_count": 5}, "objects": [{"app_alarmed": true, "appstate": "production", "expires": "", "hostname": "siteargus02.cern.ch", "hw_alarmed": true, "message": "bulk update from puppetdb", "os_alarmed": true, "update_time": "1375912881", "updated_by": "bejones"}, {"app_alarmed": true, "appstate": "production", "expires": "", "hostname": "vmargus02.cern.ch", "hw_alarmed": true, "message": "bulk update from puppetdb", "os_alarmed": true, "update_time": "1375912883", "updated_by": "bejones"}, {"app_alarmed": true, "appstate": "production", "expires": "", "hostname": "vmargus01.cern.ch", "hw_alarmed": true, "message": "bulk update from puppetdb", "os_alarmed": true, "update_time": "1375912882", "updated_by": "bejones"}, {"app_alarmed": true, "appstate": "production", "expires": "", "hostname": "vmargus03.cern.ch", "hw_alarmed": true, "message": "bulk update from puppetdb", "os_alarmed": true, "update_time": "1375912883", "updated_by": "bejones"}, {"app_alarmed": true, "appstate": "production", "expires": "", "hostname": "siteargus03.cern.ch", "hw_alarmed": true, "message": "bulk update from puppetdb", "os_alarmed": true, "update_time": "1375912881", "updated_by": "bejones"}]}

Search uses Django-style lookups: the field name and an operator (gt, gte, lt or lte) joined by a double underscore, as in update_time__gt.
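
Such query strings can be built in Python with the standard library's `urlencode`. This is a sketch only: `lookup` and `search_path` are hypothetical helper names, and whether the server combines several filters with AND is an assumption not stated on this page.

```python
from typing import Any, Tuple
from urllib.parse import urlencode

# The comparison operators listed above.
OPERATORS = ("gt", "gte", "lt", "lte")


def lookup(field: str, op: str, value: Any) -> Tuple[str, str]:
    """Build one Django-style filter, e.g. ("update_time__gt", "1375912880")."""
    if op not in OPERATORS:
        raise ValueError(f"unsupported operator {op!r}; expected one of {OPERATORS}")
    return f"{field}__{op}", str(value)


def search_path(*lookups: Tuple[str, str]) -> str:
    """Return the state-search path combining the given filters."""
    return "/roger/v1/state/?" + urlencode(lookups)
```

For example, `search_path(lookup("update_time", "gt", int(time.time()) - 3600))` asks for entries changed in the last hour. When pasting such URLs into curl, quote them, since `?` and `&` are special to the shell.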

What will production look like?

The backend is Riak, so it scales horizontally. Every machine must be able to query its own data directly. Initially there will be three servers behind a DNS load balancer.

Topic revision: r1 - 2013-08-08 - BeJones
 