MSG Architecture

Introduction

The Messaging System for Grids (MSG) is an implementation of a messaging system initially targeted at improving integration between operational tools used in the EGEE grid project and other grid projects used by WLCG. We believe it can also be used in wider contexts, for example user-level monitoring. For more information on messaging systems, the relevant Wikipedia entries are useful background reading.

This document gives a high-level overview of the MSG Architecture, describing the various components and their responsibilities. We describe the physical characteristics and configuration of the MSG Server (Broker) here.

Architectural models

MSG implements both the Point-to-Point and Publish-Subscribe models.

Point-to-Point

Here a publisher sends a message to a destination (queue) and a consumer reads from the queue. This model has the following characteristics:

  • Only one consumer will get the message
  • A message is removed from the queue once it is successfully consumed
  • The whole interaction is asynchronous. The consumer does not have to be active when the message is published to a queue.
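
The point-to-point behaviour can be illustrated with a minimal in-memory sketch. This is not MSG or broker code: the class PointToPointQueue and its methods are hypothetical names chosen for illustration, and a real broker adds networking, acknowledgements and storage on top of this idea.

```python
from collections import deque


class PointToPointQueue:
    """In-memory stand-in for a broker queue (hypothetical, illustration only)."""

    def __init__(self):
        self._messages = deque()

    def send(self, body):
        # The publisher returns immediately; no consumer needs to be active.
        self._messages.append(body)

    def receive(self):
        # Consuming removes the message, so no other consumer can get it.
        return self._messages.popleft() if self._messages else None


queue = PointToPointQueue()
queue.send("heartbeat from SITE1")   # published before any consumer exists

first_consumer = queue.receive()     # gets the message
second_consumer = queue.receive()    # nothing left to read
print(first_consumer, second_consumer)
```

The second call returns nothing because the first consumer already removed the message from the queue.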

Publish-subscribe

In a publish-subscribe model a client can act in one of two roles:
  • Publish : Publishers create messages and publish them to a destination (topic)
  • Subscribe : Consumers create a subscription to a topic in order to receive messages that meet certain criteria.

This model has the following characteristics:

  • Multiple consumers can receive the same message
  • A consumer only receives messages created after its subscription
  • A consumer must stay active, unless it has created a durable subscription. If a subscription is durable, it will persist even if the consumer is not connected. At the next connection, all messages received while the consumer was disconnected are available for consumption.
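
The difference between durable and non-durable subscriptions can be sketched with a small in-memory model. The Topic class and its method names are hypothetical and exist only for this illustration; a real broker implements the same semantics across a network connection.

```python
class Topic:
    """In-memory stand-in for a broker topic (hypothetical, illustration only)."""

    def __init__(self):
        self._subscriptions = {}   # name -> {"durable", "connected", "inbox"}

    def subscribe(self, name, durable=False):
        self._subscriptions[name] = {"durable": durable, "connected": True, "inbox": []}

    def disconnect(self, name):
        sub = self._subscriptions[name]
        if sub["durable"]:
            sub["connected"] = False       # subscription persists while offline
        else:
            del self._subscriptions[name]  # non-durable subscription is dropped

    def reconnect(self, name):
        self._subscriptions[name]["connected"] = True

    def publish(self, body):
        # Every existing subscription receives its own copy of the message.
        for sub in self._subscriptions.values():
            sub["inbox"].append(body)

    def receive_all(self, name):
        sub = self._subscriptions.get(name)
        if sub is None or not sub["connected"]:
            return []
        messages, sub["inbox"] = sub["inbox"], []
        return messages


topic = Topic()
topic.publish("m0")                    # published before anyone subscribed
topic.subscribe("a")                   # non-durable
topic.subscribe("b", durable=True)     # durable
topic.publish("m1")
received_a = topic.receive_all("a")    # both consumers see m1, neither sees m0
received_b = topic.receive_all("b")

topic.disconnect("a")
topic.disconnect("b")
topic.publish("m2")                    # sent while both consumers are offline
topic.subscribe("a")                   # a returns with a fresh subscription
topic.reconnect("b")                   # b resumes its durable subscription
missed_a = topic.receive_all("a")      # a has lost m2
missed_b = topic.receive_all("b")      # b still gets m2
```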

Message Broker

In both models, we use a Broker to route messages from publishers to subscribers. Clients do not communicate directly; all communication goes through the broker as an intermediary. This leads to the asynchronous behavior inherent in the models.

0811-Messaging-models-v1.0.png

Among the advantages of using a broker are:

  • Simplicity - a client merely needs to know where a broker is, not where all possible clients might be
  • Reliable delivery - a broker can provide a guarantee to a producer that it will deliver the messages
  • Asynchronous behavior - not all publishers and consumers need to be connected at all times
  • Persistence - even if consumers are not currently connected, the broker can store messages for later delivery
  • Different communication models depending upon application - many different scenarios such as point-to-point and broadcast are supported by a single broker

There are also some drawbacks:

  • Need for an extra component in the form of the broker.
  • Since all communication goes through the broker, it needs to be highly available
  • The broker can also be a bottleneck since all communication routes through it.

Federations of brokers

Due to the large scale of the EGEE grid, we plan to use a network of brokers to reduce network latency for clients. Also, this will help improve robustness and reliability. We initially project that we'll have a backbone of three message broker clusters, with related projects (such as OSG) hosting their own broker for interoperation.

0811-Messaging-backbone-v1.0.png

Routing in a messaging network

A message consists of a Header and a Body. The body is opaque to the messaging system - the messaging system never looks inside it and hence no routing decisions are made based on it. A header consists of (Keyword, Value) pairs which can be used for routing decisions. A message is sent to a destination. Routing decisions are made based on the contents of the header and the destination. The same routing can be achieved in various ways using different features of the messaging system, such as:
  • Wildcard subscriptions - a subscription to a set of destinations that meet a certain wildcard e.g. sites.*
  • Selectors - select a subset of information on a destination based on features of a header e.g. site.NOTIFICATION where header 'Region' equals 'UKI'
Each solution has a different performance profile depending on the number of consumers and producers and on the complexity of the selectors. It is important to benchmark the various solutions at scale before choosing one!

The routing model is consumer-driven, i.e. the consumer decides what messages it wants to receive - the producer never knows the end client of its messages. To illustrate this, let us consider a simple 'heartbeat' application where a monitoring server at each site sends notifications that it is running. Each site sends a message every 20 minutes to say it is alive. A consumer at the ROC level monitors all sites within its ROC while a consumer at the EGEE project level monitors the heartbeat of all sites on the grid. The entities involved are shown below.

0811-Messaging-models-b-v1.0.png

There are two possible ways to model this in MSG:

  • Wildcard subscriptions. In this model, each site sends the notifications to a site-specific destination, e.g. site.<SITE_NAME>. The ROC-level consumer will create a subscription for each site it wants to monitor. The project-level consumer will create a single wildcard subscription for all sites. So we will have the following:
    • Producers:
      • Site 1 - send messages to site.SITE1
      • Site 2 - send messages to site.SITE2
      • Site 3 - send messages to site.SITE3
    • Consumers:
      • ROC Level consumer - 2 subscriptions to site.SITE1 and site.SITE2 respectively
      • Project level consumer - 1 subscription to site.*
  • Selectors. When using selectors, all the sites send their notifications to the same destination site.heartbeat. Each site provides a header called 'ROC' which is the name of the ROC they belong to. The ROC-level consumer subscribes to this destination with a selector of 'ROC' = 'MyROC'. The Project-level consumer simply subscribes to all messages on the destination. So we have the following configuration:
    • Producers:
      • Site 1 - send messages to site.heartbeat with header ROC: MyROC
      • Site 2 - send messages to site.heartbeat with header ROC: MyROC
      • Site 3 - send messages to site.heartbeat with header ROC: OtherROC
    • Consumers:
      • ROC Level consumer - 1 subscription to site.heartbeat with a selector 'ROC' equals 'MyROC'
      • Project level consumer - 1 subscription to site.heartbeat
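
The two routing models can be compared with a short sketch. It is an in-memory illustration under stated assumptions rather than broker code: the functions destination_matches and deliver are hypothetical, '*' is taken to match exactly one dot-separated name segment, and a selector is reduced to simple header equality (real selector languages are richer).

```python
def destination_matches(pattern, destination):
    """Match dot-separated names; '*' stands for exactly one name segment."""
    p, d = pattern.split("."), destination.split(".")
    return len(p) == len(d) and all(a == "*" or a == b for a, b in zip(p, d))


def deliver(messages, pattern, selector=None):
    """Return the bodies that a subscription with this pattern/selector receives."""
    selector = selector or {}
    return [
        m["body"]
        for m in messages
        if destination_matches(pattern, m["destination"])
        and all(m["headers"].get(k) == v for k, v in selector.items())
    ]


# Model 1: one destination per site, routing by destination name.
wildcard_traffic = [
    {"destination": "site.SITE1", "headers": {}, "body": "SITE1 alive"},
    {"destination": "site.SITE2", "headers": {}, "body": "SITE2 alive"},
    {"destination": "site.SITE3", "headers": {}, "body": "SITE3 alive"},
]
roc_view = deliver(wildcard_traffic, "site.SITE1") + deliver(wildcard_traffic, "site.SITE2")
project_view = deliver(wildcard_traffic, "site.*")

# Model 2: one shared destination, routing by the 'ROC' header.
selector_traffic = [
    {"destination": "site.heartbeat", "headers": {"ROC": "MyROC"}, "body": "SITE1 alive"},
    {"destination": "site.heartbeat", "headers": {"ROC": "MyROC"}, "body": "SITE2 alive"},
    {"destination": "site.heartbeat", "headers": {"ROC": "OtherROC"}, "body": "SITE3 alive"},
]
roc_view_sel = deliver(selector_traffic, "site.heartbeat", {"ROC": "MyROC"})
project_view_sel = deliver(selector_traffic, "site.heartbeat")
```

Both models give the ROC-level consumer the heartbeats of its two sites and the project-level consumer all three. They differ only in where the routing information lives: in the destination name or in a header.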

Other MSG features

There are other messaging features that could be of use in the heartbeat example above. For instance:
  • Persistence: If you need an absolute guarantee that a message will be delivered even if the brokers crash and restart, you can use persistent messages. These are saved to a disk-based message store at the broker in order to preserve them across broker restarts. Note that persistence comes at a cost - often an application using persistent messages will only run at 1/10 the rate of the same application using non-persistent messages.
  • Message Expiration: In our example, a heartbeat is sent every 20 minutes, so a heartbeat is only useful for 20 minutes; once the next one arrives, older heartbeats are no longer of interest. One could therefore set an expiration time of 20 minutes on the message. If the broker is still waiting to deliver the message after that time, it simply deletes it.
  • Last image caching: When a consumer subscribes to a destination, it normally will receive only messages sent after the time of subscription. Often it is useful to get a good initial value - most usually the last message received on the destination. This is called Last Image Caching. Note that in our heartbeat example it would only be useful in the 'wildcard subscriptions' model where each site sends the heartbeat to a separate destination. Some brokers provide other subscription recovery policies such as 'last n messages', 'last m bytes' or 'all messages in last x seconds/minutes'.

Note that some of the above features, such as persistence and message expiration, are standard on brokers that implement standards like JMS, while more advanced features are often specific to the particular broker implementation. All the above features (and many more!) are available in the broker we currently use - Apache ActiveMQ.
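
Message expiration and last image caching can be sketched together for the heartbeat case. This is a simplified in-memory model, not ActiveMQ configuration: the Destination class is hypothetical, and time is passed in explicitly as seconds so that the example is deterministic.

```python
class Destination:
    """In-memory stand-in for one broker destination (hypothetical, illustration only)."""

    def __init__(self):
        self._pending = []        # (expires_at, body) pairs waiting for delivery
        self.last_image = None    # most recent body, kept for new subscribers

    def send(self, body, now, ttl_seconds):
        self._pending.append((now + ttl_seconds, body))
        self.last_image = body

    def deliver(self, now):
        # Expired messages are dropped instead of being delivered.
        live = [body for expires_at, body in self._pending if expires_at > now]
        self._pending = []
        return live


TTL = 20 * 60                                    # one heartbeat interval, in seconds
dest = Destination()
dest.send("heartbeat 1", now=0, ttl_seconds=TTL)
dest.send("heartbeat 2", now=1200, ttl_seconds=TTL)

delivered = dest.deliver(now=1500)               # heartbeat 1 expired at t=1200
initial_value = dest.last_image                  # what a new subscriber would see first
```

Only the second heartbeat is delivered, and a consumer subscribing later would still obtain it as its initial value.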

Protocols and Interoperation

Different message brokers provide different client bindings and on-the-wire bindings, with the best known probably being JMS. There are also broker-specific bindings and protocols such as OpenWire for ActiveMQ. These typically provide higher performance and access to broker-specific features, at the cost of losing interoperability with other broker implementations.

For MSG, we require client bindings in a wide variety of languages, such as C, C++, Python and Perl. We also do not want to be tied to a specific vendor binding. Therefore we look at broker-neutral protocols. The two main competitors in this area are STOMP and AMQP.

STOMP

STOMP is a text-based protocol based on the design principles of protocols such as HTTP, and clients are very easy to write. For debugging purposes you can telnet to a STOMP server and directly enter commands. The commands sent to the server by a simple STOMP publisher might look like this (^@ denotes the NUL character that terminates each frame):
CONNECT
login:<username>
passcode:<passcode>

^@

SEND
destination:/queue/site.heartbeat

hello queue a
^@

DISCONNECT

^@
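
The frame layout used in the session above (command line, header lines, a blank line, the body and a terminating NUL character) can be sketched as a pair of helper functions. The names encode_frame and decode_frame are hypothetical and the sketch ignores details such as header escaping and content-length; it is meant to show the wire format, not to replace a STOMP client library.

```python
def encode_frame(command, headers=None, body=""):
    """Serialise one frame: command, headers, blank line, body, NUL."""
    lines = [command]
    lines += [f"{key}:{value}" for key, value in (headers or {}).items()]
    return ("\n".join(lines) + "\n\n" + body).encode("utf-8") + b"\x00"


def decode_frame(data):
    """Split one NUL-terminated frame back into (command, headers, body)."""
    text = data.rstrip(b"\x00").decode("utf-8")
    head, _, body = text.partition("\n\n")
    command, *header_lines = head.split("\n")
    headers = dict(line.split(":", 1) for line in header_lines)
    return command, headers, body


send_frame = encode_frame(
    "SEND", {"destination": "/queue/site.heartbeat"}, "hello queue a"
)
disconnect_frame = encode_frame("DISCONNECT")
parsed = decode_frame(send_frame)
```

Because the format is plain text, the bytes produced here are what one would type by hand in a telnet session, with the NUL entered as ^@.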

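A simple STOMP producer, written in Perl, would look like:

use strict;
use warnings;
use Net::Stomp;

# Connect to the broker's STOMP port (host and port are site-specific).
my $stomp = Net::Stomp->new( { hostname => 'stomp-host.example.com', port => '6163' } );
$stomp->connect();
# Send one message to the queue, then close the connection.
$stomp->send( { destination => '/queue/site.heartbeat', body => 'test message' } );
$stomp->disconnect;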

A simple STOMP consumer, written in Perl, would look like:

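use strict;
use warnings;
use Net::Stomp;

my $stomp = Net::Stomp->new( { hostname => 'stomp-host.example.com', port => '6163' } );
$stomp->connect();
# Subscribe with client acknowledgement. activemq.prefetchSize is an
# ActiveMQ-specific header asking for one message to be dispatched at a time.
$stomp->subscribe( {
    'destination'           => '/queue/site.heartbeat',
    'ack'                   => 'client',
    'activemq.prefetchSize' => 1,
} );
# Loop forever: block until a frame arrives, print it, then acknowledge it
# so that the broker can remove the message from the queue.
while (1) {
    my $frame = $stomp->receive_frame;
    warn $frame->body;
    print $frame->as_string;
    $stomp->ack( { frame => $frame } );
}
$stomp->disconnect;    # not reached while the loop above runs forever

There is a wide variety of client libraries in many languages providing STOMP bindings - more details are available on the STOMP website.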

AMQP

AMQP is a new messaging protocol, originally defined by organisations working on messaging solutions for the financial industry, that is designed to be interoperable between different broker implementations. It provides an efficient wire-level protocol, along with automatically generated client bindings. It also defines the semantics and behavior of the broker.

There are several AMQP brokers available including Apache Qpid, RabbitMQ and OpenAMQ. There is a bit of version fragmentation between the different implementations, which implement AMQP 0.8, 0.9 and 0.10. A 1.0 specification is currently under development.

Currently, we track the development of AMQP as a possible interoperability specification for MSG.

MSG Components

MSG Broker

We use Apache ActiveMQ as the message broker.

Client Bindings

Topic revision: r11 - 2008-12-09 - JamesCasey