Failover procedure for Grid (core) Services
This page is based on a presentation given at the COD-15 meeting at IN2P3 in Lyon, 6-8 February 2008.
It describes several failover schemes that make Grid core and site services highly available in the face of hardware failures.
List of Grid core services
Here is a list of typical Grid core services:
- Top level BDII
- Central LFC
- VOMS server
- WMS-LB/RB
- FTS
- Metadata servers (AMGA, 3d, etc.)
- MyProxy
Grid site services
Here is a list of typical Grid site services:
- CE
- SiteBDII
- Local LFC
- MON box
- UIs/VOBOX
- Local Metadata servers
Failover levels
Depending on the architecture of the individual service, we define the following service categories:
- Central service failover without shared data dependence (BDII, WMS, ...)
- Central service failover with shared data dependence (LFC, VOMS, ...)
- On-site service failover (can be combined with load balancing)
- All Grid services, and also non-Grid site services (PBS, DNS, etc.)
Independent central services
This form of failover requires that the Grid service client is able to switch automatically from one server to another. A typical example is the top-level BDII: the lcg-utils clients can be configured with a list of top-level BDIIs. This is possible because each top-level BDII gathers the information independently from the site BDIIs. This failover is very similar to DNS failover for TCP clients.
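As a minimal sketch: lcg-utils takes the top-level BDII endpoint from the LCG_GFAL_INFOSYS environment variable, and newer client versions accept a comma-separated list of endpoints which is tried in order (the hostnames below are placeholders):

    # Configure lcg-utils with a list of top-level BDIIs; the client
    # falls back to the next entry if the first one is unreachable.
    export LCG_GFAL_INFOSYS="bdii1.example.org:2170,bdii2.example.org:2170"

    # Any subsequent lcg-utils call, e.g. listing the replicas of a
    # file, now survives the failure of a single top-level BDII:
    lcg-lr --vo dteam lfn:/grid/dteam/somefile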
Services with shared database information
This failover scenario applies to two servers which share the same Grid service information, stored in a service database. One of them is active and the other is inactive; the database is synchronized between the servers. The client accesses the service through a virtual host or an alias which points to the currently active server. If the active server fails, this virtual host or alias is changed to point to the formerly passive server. An example is a gLite LFC service with a synchronized Oracle database. The LHC 3D service also uses database synchronization with Oracle Streams, although in that case it is read-only.
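As an illustration of switching the alias, assuming the site runs a BIND name server that accepts dynamic updates, and using placeholder names (lfc.example.org as the service alias, lfc2 as the standby host, and a placeholder key file), the switch could be done with nsupdate:

    # Repoint the service alias from the failed primary to the standby
    # server; the short TTL keeps client caches from going stale.
    nsupdate -k /etc/dns/Klfc-failover.key <<EOF
    update delete lfc.example.org. CNAME
    update add lfc.example.org. 60 CNAME lfc2.example.org.
    send
    EOF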
Services with shared information area
This case is similar to the previous scenario, but as in the first failover scenario (independent central services) the client has to be aware of the possible multiple servers, and both servers are in an active state. Some locking mechanism is needed in the database synchronization to avoid data inconsistency.
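One simple way to publish both active servers to the clients, as a sketch with placeholder names and addresses, is a DNS round-robin entry with two A records for the service name:

    ; Zone file excerpt (BIND syntax): two A records for the same name.
    ; Clients receive both addresses and can try them in turn; the short
    ; TTL limits how long a removed server lingers in client caches.
    lfc.example.org.   60   IN   A   192.168.1.11
    lfc.example.org.   60   IN   A   192.168.1.12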
Site service failover
Here we show a simple failover scenario for on-site Grid services using a classical HA (high availability) failover cluster. Both service nodes are able to access a shared disk area. One of the nodes is active, the other passive. The service is reached through a single cluster IP belonging to the active node. The servers exchange a heartbeat signal to inform each other about their state. If the active server fails, the second one takes over the cluster IP and starts the service using the service information on the shared storage. Before it starts the service, it makes sure that the other server cannot access the storage at the same time, for example by powering it off (normally done by STONITH: "shoot the other node in the head").
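As an illustration, a minimal Linux-HA heartbeat configuration (version 1 style) could look as follows; all node, device, and service names are placeholders, and the STONITH setup has to match the actual power-fencing hardware:

    # /etc/ha.d/ha.cf (identical on both nodes)
    node node1.example.org
    node node2.example.org
    bcast eth1            # dedicated heartbeat network link
    keepalive 2           # send a heartbeat every 2 seconds
    deadtime 30           # declare the peer dead after 30 seconds
    auto_failback off
    # a stonith_host line matching the local power-fencing hardware goes
    # here, so the survivor can power off the peer before taking over

    # /etc/ha.d/haresources: node1 is the preferred owner of the cluster
    # IP, the shared filesystem and the service init script (here the
    # LFC daemon as an example)
    node1.example.org IPaddr::192.168.1.50/24/eth0 Filesystem::/dev/sdb1::/srv/grid::ext3 lfcdaemon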
Site service failover using virtual machines (VM)
A more generic possibility to make site Grid services HA failover redundant, usable for almost all services, is to run them on virtual machines in combination with failover cluster software. In this scenario the two Grid service nodes run as virtual machines whose images and configuration files reside on the shared cluster storage. Using, for example, Red Hat's Global File System (GFS) and the Cluster Logical Volume Manager (CLVM), both nodes can see the virtual machine files at the same time. When both nodes are working correctly, the first Grid service VM runs on physical server node 1 and the other on physical server node 2. When one of the physical nodes fails, the affected virtual machine is automatically live-migrated to the other one by the cluster service, and the service is only interrupted for a short time (60-300 ms).
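For illustration, assuming Xen as the hypervisor, the migration which the cluster software triggers automatically corresponds to the following manual command, with placeholder domain and host names:

    # Live-migrate the VM hosting the Grid service to node2; the disk
    # image stays on the shared GFS volume, so only the memory state
    # travels over the network. This requires the xend relocation
    # server to be enabled on the target host.
    xm migrate --live grid-ce node2.example.org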
Load balancing with failover
The last failover scenario explained here uses the Linux Virtual Server (LVS). LVS allows load balancing between two or more service nodes. A service request reaches the LVS node and is forwarded to one of the real Grid service nodes. Different scheduling policies can be used for this load balancing, for example forwarding to the node with the lowest load or with the fewest open TCP/IP connections. The LVS setup can check whether the node it wants to forward a request to is really up and, if not, take it out of the list of possible recipients, so that a service failure is avoided.
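A minimal sketch of such a configuration with the ipvsadm tool, assuming a placeholder virtual service IP in front of two real BDII nodes and the weighted least-connection scheduler (in practice the health checking is usually done by a companion daemon such as ldirectord or keepalived):

    # Define the virtual service on the LVS node: TCP port 2170 (BDII),
    # scheduled with weighted least-connection (-s wlc).
    ipvsadm -A -t 192.168.1.100:2170 -s wlc

    # Add the two real Grid service nodes using direct routing (-g);
    # a failed node is removed from this list by the health check.
    ipvsadm -a -t 192.168.1.100:2170 -r 192.168.1.11:2170 -g -w 1
    ipvsadm -a -t 192.168.1.100:2170 -r 192.168.1.12:2170 -g -w 1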
To prevent the LVS node from becoming a single point of failure, an HA cluster can be built with a second LVS server, as explained above (site service failover). LVS can also be combined with some of the scenarios above to provide both failover and load balancing.
Conclusions
Some possible scenarios for implementing HA failover for central and local site Grid services have been shown above. To conclude, some of the pros and cons are listed:
- Service recovery should be implemented for all Grid services where possible
  - Failover is reached by installing a secondary service server
  - Not possible for all Grid services
- For some important VO services, decentralized hosting could be of interest (LFC, VOMS, ...)
  - No single-site dependence
  - Technically complicated
  - Higher costs (Oracle licenses, etc.)
- Site service clustering enables failover at the site
  - The service runs as on a single machine, but with failover
  - Higher costs, depending on the storage solution
  - Each Grid service has to be handled differently
  - Some Grid services cannot be clustered
- Service-independent failover with virtual machines
  - Theoretically, all services can be made failover-capable
  - No hardware dependency of the Grid middleware OS
  - Easy maintenance of the services (live migration)
  - Performance loss on all disk access
  - Higher hardware requirements to get the same performance
  - Higher costs, depending on the shared storage environment
- Service load balancing and failover
  - Enables load balancing with failover, depending on the service
  - Two additional clustered machines are needed
  - More complex network structure
--
KaiNeuffer - 17 Apr 2008