High Availability Implementation for VOMS

The CERN requirements for VOMS call for a highly available configuration. As discussed in VomsNotes, high availability for the critical VOMS functions is provided as standard by the application, using a shared reliable database behind an application front end. This does not cover the administration interface, which is rated as a Medium criticality service.

A voms-ping function is required so that the slave can monitor the status of the master. voms-ping should take the server name as an argument (like ping), open a VOMS application-level connection to the server and check that the application replies. While this does not guarantee that the entire application is working, it covers the most common failure cases (such as a core dump of the server or a machine motherboard failure).
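As a minimal sketch of what such a check could look like (the port 15000 below is only an example, the real VOMS port is defined per VO in the server configuration, and a complete check would also issue a VOMS request over the connection):

```bash
#!/bin/sh
# voms-ping (sketch): check that a remote VOMS server accepts connections.
# Usage: voms-ping <server> [port]
SERVER=$1
PORT=${2:-15000}   # example port only; the real port is VO-specific

if [ -z "$SERVER" ]; then
    echo "Usage: $0 <server> [port]" >&2
    exit 2
fi

# Open a TCP connection with a short timeout.  This does not prove the
# whole application works, but it catches the common cases (dead server
# process, dead machine).
if echo | nc -w 5 "$SERVER" "$PORT" > /dev/null 2>&1; then
    echo "$SERVER:$PORT is answering"
    exit 0
else
    echo "$SERVER:$PORT is NOT answering" >&2
    exit 1
fi
```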

The VOMS Ping script

This script has been provided here under the name 'voms-ping'. It must be run directly on the server to be tested and takes no parameters. Page LCGVomsCernSetup contains the relevant rpm.

Its return value is 0 if all the servers are up and running, and 1 otherwise. If the result is 1, the output of the script lists exactly which server had problems, and whether the problem was in the core server or in the admin components.
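A sketch of this behaviour, run locally on the server and assuming the core server and the admin component (Tomcat) can be interrogated through their init scripts (the service names below are examples; the script shipped in the rpm may differ):

```bash
#!/bin/sh
# voms-ping (local sketch): no parameters, run on the server itself.
# Exit 0 if all components are up, 1 otherwise, naming the failing part.
RC=0

if ! /sbin/service voms status > /dev/null 2>&1; then
    echo "`hostname`: VOMS core server is DOWN"
    RC=1
fi

if ! /sbin/service tomcat5 status > /dev/null 2>&1; then
    echo "`hostname`: VOMS Admin component is DOWN"
    RC=1
fi

[ $RC -eq 0 ] && echo "`hostname`: all VOMS components are up"
exit $RC
```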

The VOMRS Ping script

This is in preparation (see the table row in VomsServiceMonitor). It should be integrated into Linux-HA and run only on the host that is currently the master.
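One sketch of how the master-only constraint could be enforced is to test whether the node currently holds the HA service address before running the check (SERVICE_IP and the vomrs-ping path below are placeholders, not the production values):

```bash
#!/bin/sh
# Run the VOMRS check only on the node that currently owns the service IP.
SERVICE_IP="137.138.xx.xx"        # placeholder for the real service address

if /sbin/ip addr show | grep -q "$SERVICE_IP"; then
    exec /usr/sbin/vomrs-ping     # hypothetical location of the real check
else
    exit 0                        # slave node: nothing to verify
fi
```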

Configuration assuming Linux HA

To provide a full high-availability function for VOMS:

  • Master/slave setup using Linux-HA and a shared database containing all the state data
  • No high availability is provided as part of the VOMRS interface

Linux-HA together with a small voms resource script (start/stop/monitor/status) provides this function. The takeover time is estimated at around 30 seconds after a failure has been detected; there may, however, be a substantial delay between the occurrence of a failure and its detection.
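A minimal sketch of such a resource script, in heartbeat v1 style and assuming VOMRS is driven through an init script called 'vomrs' (the real /etc/ha.d/resource.d/gridvoms script may differ):

```bash
#!/bin/sh
# gridvoms (sketch): heartbeat v1 resource script.  Linux-HA calls it
# with start/stop/status when the resource group moves between nodes.
case "$1" in
  start)
        /sbin/service vomrs start
        ;;
  stop)
        /sbin/service vomrs stop
        ;;
  status)
        # heartbeat v1 checks the status output for "running"/"OK"
        if /sbin/service vomrs status > /dev/null 2>&1; then
            echo "running"
        else
            echo "stopped"
        fi
        ;;
  *)
        echo "Usage: $0 {start|stop|status}" >&2
        exit 1
        ;;
esac
exit 0
```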

The HA configuration has been implemented as follows:

[Drawing VomsWlcgHaNormal: normal HA configuration]
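The corresponding heartbeat v1 configuration can be sketched in two files (node names, the service address and the auto_failback policy below are examples, not the production settings):

```
# /etc/ha.d/ha.cf (fragment, sketch)
node voms102.cern.ch voms103.cern.ch
auto_failback off

# /etc/ha.d/haresources (sketch)
# <preferred node>   <service IP>            <resource script>
voms102.cern.ch      IPaddr::137.138.xx.xx   gridvoms
```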

In the event of a failure, or an operator-initiated switch for planned maintenance, the configuration is changed:

  • Service IP now points to slave server

[Drawing VomsWlcgHaFail: HA configuration after failover]
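For an operator-initiated switch, heartbeat ships helper scripts that hand the resources over without stopping the cluster; a sketch, assuming the heartbeat v1 layout (the installation path varies between packages, e.g. /usr/lib/heartbeat or /usr/share/heartbeat):

```bash
# On the current master: release the resources to the other node
/usr/lib/heartbeat/hb_standby

# Later, on the preferred node: take the resources back
/usr/lib/heartbeat/hb_takeover
```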

Resource Switching

The VOMS application consists of several components: the VOMS core server, VOMS Admin and VOMRS.

The proposed configuration is that VOMS and VOMS Admin always run on both the master and the slave; only VOMRS is stopped and restarted when the service switches servers. The monitoring would also reflect this choice.
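In practice, sketched below, the core and admin init scripts are enabled at boot on both nodes while VOMRS is left entirely to the Linux-HA gridvoms resource (the init-script names are examples and depend on the release):

```bash
# On both master and slave
chkconfig voms on          # core server always running
chkconfig tomcat5 on       # VOMS Admin always running
chkconfig vomrs off        # VOMRS started/stopped only by Linux-HA
```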

https://savannah.cern.ch/bugs/?func=detailitem&item_id=15788#comment2 requests a version-invariant vomrs service; this is needed so that the file /etc/ha.d/resource.d/gridvoms can be left untouched across vomrs releases. The Linux-HA setup is included in the rpm CERN-CC-gridvoms-1.1-3, in CDB and on the hosts. Linux-HA activity is logged in /var/log/messages on both voms102 and voms103.
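A takeover can therefore be followed in the syslog on both nodes, for example:

```bash
# On voms102 and voms103
grep -i heartbeat /var/log/messages | tail -20
```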

Conclusion

For the cost of two machines with small disk space and a highly available database backend, a highly available VOMS implementation can be built that is resilient to network, machine and storage failures.

-- TimBell - 19 Oct 2005
