EMI Messaging Guidelines

Introduction

Here are guidelines for potential users of messaging within EMI. The goal is to provide enough practical information so that EMI product teams can start investigating whether using messaging in their products can be beneficial or not.

The first usage of commodity messaging in the Grid is quite recent (end of EGEE 3) and uses mainly:

  • broker: ActiveMQ 5.x
  • protocol: STOMP (Python: stomp.py, Perl: Net::STOMP::Client)
  • message format: something very close to JSON

However, other solutions do exist:

It is very important to keep in mind our environment:

  • the messaging world is changing fast
  • there is no clear best-of-breed messaging technology yet

Information

You should get familiar with the messaging technology in order to understand what messaging can and cannot do for you:

General Recommendations

Keep in mind that what is currently being used may change in the future.

Stick to mainstream features (i.e. the ones available in all major implementations) to avoid vendor lock-in. Use the MIG Messaging Basic Block as it can be implemented with all major brokers.

Minimize the amount of code you write to use the messaging infrastructure:

  • factorize common code
  • re-use existing code when if it fits your needs
  • isolate technology independent code from the rest, especially the brokers and destinations to use, which should be easily configurable

Do not (only) trust the documentation, test that things indeed work as you expect.

The following things may happen so be prepared for them:

  • messages can get lost (so you may have to resend them if you really care)
  • messages can arrive out of order (so you may have to reorder them if order is important)
  • messages can be delivered multiple times (so you may have to filter duplicates)

Do take into account security requirements:

  • by default, do not trust the data you receive
  • if needed, use cryptography: encryption (for privacy) and signing (to verify the sender)

Messaging should be used to transfer small pieces of information and not for large data transfers.

Messaging Protocol

Based on the EGEE experience and on its technology agnostic nature, the recommended messaging protocol to use is STOMP. It should be good enough for most use cases and it is supported by most brokers.

For STOMP, we recommend the following client libraries:

If you decide to use another protocol, this may bind you to a specific broker technology. For instance, OpenWire is only supported by ActiveMQ.

Although it is not a messaging protocol, the best solution for Java programs is probably JMS since most brokers support it. When changing broker technology, you will not have to change your code (as the API is standard) but you will have to change the library (JMS provider) used.

Message Header

The message header should be used to put simple information about the message body (aka meta data). It contains a list of key/value pairs.

Since what is allowed in the header may vary with the protocol and broker used, it is safer to restrict yourself to the minimum that is available everywhere:

  • use descriptive keys that do not conflict with well-known ones such as persistent or destination; ideally: prefix them like in glite-role
  • for the keys, use only ASCII letters, digits, dots and dashes (like a host name)
  • for the values, use only printable ASCII characters
  • avoid using too many keys (a dozen max)
  • avoid using too long values (1KB in total for the header is reasonable)

Message Body

The message body should be used to put what you want to send (aka data).

For simple applications, JSON is recommended: it is human friendly, widely used and well supported in many programming languages. This is very important because messaging is used to link different software components together: even if all your code is in Java today, tomorrow another component in Python will want to read your messages...

If you really need something else than JSON, there is no obvious best format for all use cases. See the MIG Message Formats page for more information.

Message Size

Messages are copied many times (client library, socket buffer, network, memory, disk...) and therefore should be reasonably small.

For the total message size (i.e. header plus body), the 1KB - 10KB range is probably the best and 1MB should be seen as an absolute maximum.

Large messages are incompatible with a high message rate. For instance, a broker that has been measured as being able to handle 700k 20B msg/s could not handle more than 1.1k 256kB msg/s. These were local benchmarks, the network could very easily become a bottleneck.

On the other end of the scale, too small messages are not efficient as 1B of body requires around 100B of header. It is much better to send 1 message of 1KB rather than 1k messages of 1B.

Applications can easily adjust the message size. They could split large data chunks onto several messages or, conversely, merge small data chunks in order to use larger messages.

Note also that, in some cases, compression can be very beneficial. For instance text data can typically be reduced by 90% with gzip. This however has a CPU cost.

Message Rate

Caveat: the only reliable performance numbers are the ones measured in a realistic environment.

Establishing a session to a broker can be very expensive, especially when using X.509 authentication. Try to minimise the number of sessions by grouping messages before sending them. Long lived connections can be problematic too as they consume resources on the broker. All the message rates below are for an existing session.

What matters is the total number of messages that come in and go out of the broker. For a topic with 10 subscribers, each incoming message will be delivered 10 times so the total count would be 11 messages.

For small messages, on WAN and using STOMP, an application should stay below 1k msg/s. On LAN and with a binary protocol such as OpenWire, it should stay below 10k msg/s.

For persistent messages (that are copied to disk to survive a service interruption), these numbers should probably reduced by one order of magnitude.

For big messages (more than 1kB), throughput can easily become a bottleneck. For instance: 1kB messages at 1k msg/s represent 10Mbit/s at network level.

Edit | Attach | Watch | Print version | History: r1 | Backlinks | Raw View | WYSIWYG | More topic actions
Topic revision: r1 - 2011-02-03 - LionelCons
 
    • Cern Search Icon Cern Search
    • TWiki Search Icon TWiki Search
    • Google Search Icon Google Search

    EMI All webs login

This site is powered by the TWiki collaboration platform Powered by PerlCopyright & 2008-2019 by the contributing authors. All material on this collaboration platform is the property of the contributing authors.
Ideas, requests, problems regarding TWiki? Send feedback