Using ActiveMQ Store and MSG publisher/MSG consumer python scripts.
http://activemq.apache.org/amq-message-store.html
Test Scenarios:
-- Send messages in a loop for 2 minutes, pause 10 minutes, send a few random messages for 10 minutes, pause 10 minutes. Duration: 1 hour. Message size: random, one of 1, 2 or 5 KB.
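A minimal sketch of how the scenario above could be laid out as a schedule. This is not the actual publisher script: the names PAYLOAD_SIZES, CYCLE, make_payload and build_schedule are hypothetical, and it assumes that the 32-minute cycle simply repeats until the 1 hour is used up, which the description does not state explicitly.

```python
import random

# Assumed payload sizes in bytes for the "random 1, 2 or 5 KB" messages.
PAYLOAD_SIZES = (1024, 2048, 5 * 1024)

# One cycle of the scenario as (phase name, duration in minutes).
CYCLE = (
    ("burst", 2),     # send messages in a loop
    ("pause", 10),
    ("trickle", 10),  # send a few random messages
    ("pause", 10),
)


def make_payload(rng=random):
    """Return a dummy message body of a randomly chosen size."""
    return "x" * rng.choice(PAYLOAD_SIZES)


def build_schedule(total_minutes=60):
    """Repeat CYCLE until total_minutes is used up (assumption: the cycle
    repeats); the last phase is truncated to fit.
    Returns a list of (phase name, start minute, duration in minutes)."""
    schedule, elapsed = [], 0
    while elapsed < total_minutes:
        for name, minutes in CYCLE:
            minutes = min(minutes, total_minutes - elapsed)
            if minutes <= 0:
                break
            schedule.append((name, elapsed, minutes))
            elapsed += minutes
    return schedule


if __name__ == "__main__":
    for name, start, minutes in build_schedule():
        print("t+%02d min: %s for %d min" % (start, name, minutes))
```

A real driver would loop over the schedule, call the publisher during "burst" and "trickle" phases and sleep during "pause" phases.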
First observations:
Many messages were lost when running the stress cycle. The actual problem was the limit on open file descriptors: each script opened a new connection, and the server closed connections more slowly than new ones were created. After the ulimit was increased, the problem stabilized.
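As an illustration of the file descriptor limit mentioned above, the following sketch inspects and raises the soft limit from within a Python process on a Unix system, using the standard resource module. The function names fd_limits and raise_soft_fd_limit are hypothetical. Note that this only affects the calling process and its children; the limit for the broker itself would still have to be raised with ulimit in the shell that starts it.

```python
import resource


def fd_limits():
    """Return (soft, hard) limits on open file descriptors for this process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


def raise_soft_fd_limit(target):
    """Raise the soft limit towards target, never above the hard limit.
    Returns the soft limit in effect afterwards."""
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if hard != resource.RLIM_INFINITY:
        target = min(target, hard)
    # Only ever raise the limit; leave it alone if it is already high enough.
    if soft != resource.RLIM_INFINITY and target > soft:
        resource.setrlimit(resource.RLIMIT_NOFILE, (target, hard))
    return resource.getrlimit(resource.RLIMIT_NOFILE)[0]


if __name__ == "__main__":
    print("limits before:", fd_limits())
    # 4096 is an arbitrary example value, not the one used in these tests.
    print("soft limit after:", raise_soft_fd_limit(4096))
```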
Persistence seems to work as long as the server processes the messages into the message store (no messages are lost due to the connection failure described above).
When the first loop is increased to 10 minutes, a big degradation occurs as the number of open connections increases.
Sent messages: 30051 - Received messages: 29449 - Lost: ~2%
An error occurred for a few messages: " ERROR RecoveryListenerAdapter - Message id ID:lxb6118.cern.ch-51583-1204887427969-4:52708:-1:1:1 could not be recovered from the data store! "
Running 2 producers and 1 consumer, with messages in batches of 1000 x 1K, for 3 hours. Loop: send the maximum number of messages for 1 hour, sleep 10 minutes, send a few messages for 3 minutes, sleep 5 minutes, repeat. Producer 2 kicks in 40 minutes after the first producer.
A few messages failed. The only traceable errors were:
2008-03-11 18:23:00,139 [138.5.237:33191] ERROR Service - Async error occurred: java.lang.RuntimeException: org.apache.activemq.kaha.RuntimeStoreException: java.io.IOException: Could not locate data file data-topic-data-1
2008-03-11 18:23:02,840 [138.5.237:33191] ERROR DataManagerImpl - Looking for key 1 but not found in fileMap: {2=data-topic-data-2 number = 2 , length = 33554418 refCount = 7316, 3=data-topic-data-3 number = 3 , length = 4831686 refCount = 2322}
2008-03-11 18:23:02,840 [138.5.237:33191] ERROR MapContainerImpl - Failed to get value for offset=730779, key=(1, 3779446, 53), value=(1, 3779504, 69), previousItem=0, nextItem=-1
2008-03-11 18:23:02,941 [138.5.237:33191] ERROR TopicStorePrefetch - Failed to fill batch
2008-03-11 18:23:02,941 [138.5.237:33191] ERROR Service - Async error occurred: java.lang.RuntimeException: org.apache.activemq.kaha.RuntimeStoreException: java.io.IOException: Could not locate data file data-topic-data-1
2008-03-11 18:23:09,512 [42.131.89:33644] ERROR DataManagerImpl - Looking for key 1 but not found in fileMap: {2=data-topic-data-2 number = 2 , length = 33554418 refCount = 7316, 3=data-topic-data-3 number = 3 , length = 4886960 refCount = 2300}
2008-03-11 18:23:09,512 [42.131.89:33644] ERROR MapContainerImpl - Failed to get value for offset=730779, key=(1, 3779446, 53), value=(1, 3779504, 69), previousItem=0, nextItem=-1
2008-03-11 18:23:09,614 [42.131.89:33644] ERROR TopicStorePrefetch - Failed to fill batch
2008-03-11 18:23:09,617 [42.131.89:33644] ERROR StoreDurableSubscriberCursor - Failed to get current cursor
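To get a quick overview of which broker components report errors in a log like the one above, the lines can be tallied by logger name. This is only a sketch: the regular expression is an assumption derived from the lines shown here (timestamp, bracketed thread or address, level, logger, message), and count_errors_by_logger is a hypothetical helper, not part of ActiveMQ.

```python
import re
from collections import Counter

# Assumed layout: "<date> <time>,<ms> [<thread>] <LEVEL> <logger> - <message>"
LOG_LINE = re.compile(
    r"^(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) "
    r"\[(?P<thread>[^\]]*)\] "
    r"(?P<level>[A-Z]+)\s+(?P<logger>\S+) - (?P<message>.*)$"
)


def count_errors_by_logger(lines):
    """Count ERROR lines per logger name; lines that do not match are skipped."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.match(line.strip())
        if match and match.group("level") == "ERROR":
            counts[match.group("logger")] += 1
    return counts


# Three of the lines from the log above, used as sample input.
SAMPLE = [
    "2008-03-11 18:23:02,941 [138.5.237:33191] ERROR TopicStorePrefetch - Failed to fill batch",
    "2008-03-11 18:23:09,614 [42.131.89:33644] ERROR TopicStorePrefetch - Failed to fill batch",
    "2008-03-11 18:23:09,617 [42.131.89:33644] ERROR StoreDurableSubscriberCursor - Failed to get current cursor",
]

if __name__ == "__main__":
    for logger, count in count_errors_by_logger(SAMPLE).most_common():
        print(logger, count)
```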
I have already sent a message to the ActiveMQ users mailing list to ask whether this is a known issue. I will try to reproduce it in the meantime. The first messages lost on producer Plxplus225.cern.ch-570 were {179945,179946}:
in total, 523037 messages were sent and 520206 received (0.54% lost).
On producer Plxplus236-570, 519037 messages were sent and 516946 received (0.40% lost). First messages lost: {17543,17544}
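The loss figures above can be recomputed from the sent and received counts, and the lost messages identified by comparing sequence numbers. The sketch below assumes that each producer numbers its messages sequentially and that the consumer records the numbers it receives; loss_percent and missing_ids are hypothetical helper names.

```python
def loss_percent(sent, received):
    """Percentage of sent messages that were never received."""
    return 100.0 * (sent - received) / sent


def missing_ids(sent_ids, received_ids):
    """Sorted sequence numbers that were sent but not received."""
    return sorted(set(sent_ids) - set(received_ids))


if __name__ == "__main__":
    # Counts taken from the runs described above.
    print("%.2f%%" % loss_percent(523037, 520206))
    print("%.2f%%" % loss_percent(519037, 516946))
    # Toy example: messages 3 and 4 never arrived.
    print(missing_ids(range(1, 6), [1, 2, 5]))
```

The first entries of missing_ids correspond to the "first messages lost" reported for each producer.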