WSO2 Message Broker Tuning for Production

WSO2 Message Broker (MB) can be set up as a standalone node or as a cluster. In both cases, tuning the product to suit your messaging needs is important. This guide covers the tuning parameters that affect server performance, how they depend on each other, and what each parameter does in the background.




Deployment Tuning and Recommendations 

There are three main things to consider deployment-wise when using WSO2 MB.

1. You need to allocate enough memory for the MB server. By default the maximum heap is set to 1 GB in the /bin/wso2server.sh file (wso2server.bat on Windows):

-Xms256m -Xmx1024m -XX:MaxPermSize=256m 

We generally recommend allocating at least 2 GB of memory for production instances.

2. Although an Apache Cassandra server ships embedded in WSO2 MB, it is always recommended to run Apache Cassandra as a separate server instance and point WSO2 Message Broker to that external instance.

To point WSO2 MB to an externally running Cassandra server, use the following configurations.


  • Edit <MB_HOME>/repository/conf/advanced/qpid-config.xml file as follows.




...
<clustering>
    <externalCassandraServerRequired>true</externalCassandraServerRequired>



  • Edit the <MB_HOME>/repository/conf/advanced/qpid-virtualhosts.xml file. Edit the section relevant to the virtualhost you are using; by default this is “carbon”, so we edit the <store> section under that virtualhost. Use the proper username and password here to connect to the Cassandra server.



<store>
                <class>org.wso2.andes.server.store.CassandraMessageStore</class>
                <username>admin</username>
                <password>admin</password>
                <cluster>ClusterOne</cluster>
                <idGenerator>org.wso2.andes.server.cluster.coordination.TimeStampBasedMessageIdGenerator</idGenerator>
                <connectionString>localhost:9160</connectionString>


3. Although WSO2 MB is packaged with an Apache ZooKeeper server, it is recommended to use an external ZooKeeper server if you are deploying the product as a cluster. The relevant configurations are as follows.

In the qpid-config.xml file:
 <!--Are we running an External Zookeeper server ? true|false -->
         <externalZookeeperServerRequired>true</externalZookeeperServerRequired>
        <coordination>
            <!-- Apache Zookeeper Address -->
            <ZooKeeperConnection>127.0.0.1:2181</ZooKeeperConnection>
             <!-- Format yyyy-MM-dd HH:mm:ss -->

SASL security configurations are also needed to communicate with the ZooKeeper server. Please refer to them here.


Most of the performance-related tuning for WSO2 MB resides in qpid-config.xml under <tuning>. We will look at the most critical configurations.

Message Batch Sizes Tuning


For Queue Messages (and Durable Topic Messages 2.2.0 onwards)

1. <maxNumberOfUnackedMessages>1000</maxNumberOfUnackedMessages>

This is for flow control. When MB delivers messages to a particular subscriber, it keeps track of the acknowledgements received from that subscriber (channel). If the subscriber fails to ack fast enough and the unacked message count exceeds this limit, MB marks the subscriber as having no room to receive further messages. It skips sending messages to that subscriber until it acks some of the messages already sent.
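The flow-control rule described for this parameter can be sketched as below. This is only an illustration, not MB's actual implementation: the class and method names are hypothetical, and treating the channel as full once the count reaches the limit is an assumption.

```python
class SubscriberChannel:
    """Hypothetical model of per-subscriber unacked-message tracking."""

    def __init__(self, max_unacked: int = 1000):
        self.max_unacked = max_unacked  # plays the role of maxNumberOfUnackedMessages
        self.unacked = set()            # ids of messages sent but not yet acked

    def has_room(self) -> bool:
        # Assumption: the channel counts as full once the limit is reached.
        return len(self.unacked) < self.max_unacked

    def deliver(self, message_id: int) -> bool:
        """Send a message if there is room; return False if it was skipped."""
        if not self.has_room():
            return False
        self.unacked.add(message_id)
        return True

    def ack(self, message_id: int) -> None:
        """An ack frees one slot, so delivery can resume."""
        self.unacked.discard(message_id)


channel = SubscriberChannel(max_unacked=2)
channel.deliver(1)
channel.deliver(2)
skipped = not channel.deliver(3)  # the subscriber is skipped until it acks
channel.ack(1)
resumed = channel.deliver(3)      # room again after the ack
```

A slow subscriber therefore only stalls its own deliveries; the limit bounds how many in-flight messages the broker has to track for it.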

2. <maxNumberOfReadButUndeliveredMessages>1000</maxNumberOfReadButUndeliveredMessages>

When delivering messages to subscribers, we read messages from the Cassandra server and keep them in memory, ready to be sent. If enough messages are already scheduled for delivery (up to this limit), we do not read more messages into memory.

Increasing this value will increase message delivery speed if subscribers ack quickly. Consider the average message size when increasing this value, because with large messages even a small value can fill up the memory.
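To get a feel for the memory warning above, a rough upper bound can be computed as the limit multiplied by the average message size. This is a back-of-the-envelope sketch: the function name and the 1 MB message size are made up for illustration, and it ignores JVM object overhead and whether the limit applies per queue or per node.

```python
def buffered_bytes(max_read_but_undelivered: int, avg_message_bytes: int) -> int:
    """Worst-case bytes held in memory if the buffer is completely full."""
    return max_read_but_undelivered * avg_message_bytes


# Default limit of 1000 with hypothetical 1 MB messages.
estimate = buffered_bytes(1000, 1024 * 1024)
print(f"{estimate / 1024 ** 2:.0f} MB")  # prints "1000 MB"
```

With the default 1 GB heap, a full buffer of messages that size would not fit, which is why the limit and the heap size need to be considered together.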

3. <messageBatchSizeForSubscribersQueues>20</messageBatchSizeForSubscribersQueues>

Not used (deprecated).

4. <messageBatchSizeForSubscribers>

    4.1 <default>50</default>

When messages are read into memory for delivery to subscribers, MB reads them from the Cassandra store in iterations. By default (when the server starts) it reads this number of messages in one iteration. If an iteration manages to read the full number of messages it asked for, MB tries to read 100 more messages in the next iteration. If it fails to read that many, it reads 50 fewer messages in the next iteration.

     4.2 <max>300</max>

In the above process, the number of messages read from Cassandra in one iteration will not exceed this value.

     4.3 <min>20</min>

In the above process, the broker will always try to read at least this number of messages. The read count will not go below this value.
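The way default, max and min interact can be sketched as follows. This is a minimal model of the behaviour described above, not the broker's code: the function name and argument names are hypothetical, while the +100 and -50 steps and the 50/300/20 values come from the text.

```python
def next_batch_size(current: int, messages_read: int,
                    step_up: int = 100, step_down: int = 50,
                    minimum: int = 20, maximum: int = 300) -> int:
    """Return the batch size to request in the next read iteration."""
    if messages_read >= current:
        candidate = current + step_up    # the store kept up: ask for more
    else:
        candidate = current - step_down  # short read: back off
    # Clamp to the configured <min> and <max>.
    return max(minimum, min(maximum, candidate))


size = 50                          # <default>
size = next_batch_size(size, 50)   # full read  -> 150
size = next_batch_size(size, 150)  # full read  -> 250
size = next_batch_size(size, 250)  # full read  -> 300 (capped by <max>)
size = next_batch_size(size, 40)   # short read -> 250
```

Under this model a busy queue reaches the maximum within three iterations, while an idle one settles at the minimum.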

5. <globalQueueWorkerMessageBatchSize>700</globalQueueWorkerMessageBatchSize>

If you refer to the WSO2 MB messaging architecture here, we keep incoming messages in a common pool and distribute them among nodes. While doing that, we read this many messages into memory from the common message pool for distribution. Only message metadata is read here, so there is no need to consider message size when changing this parameter.

6. <messageBatchSizeForBrowserSubscriptions>200</messageBatchSizeForBrowserSubscriptions>

WSO2 Message Broker supports message browsing using JMS. When there are a lot of messages in the queues, we might not need to browse all of them. The number of browsable messages can be limited using this parameter.

For Both Topic and Queue Messages


7. <medataDatePublisherMessageBatchSize>200</medataDatePublisherMessageBatchSize>

Inside the MB server, the metadata of published messages is buffered before it is written to Cassandra storage. Message metadata is always in the KB size range. When configuring this, make sure it aligns with contentPublisherMessageBatchSize, as both metadata and content must be present to make a complete message during delivery. If content is written too far behind metadata, it can cause problems.

8. <contentPublisherMessageBatchSize>200</contentPublisherMessageBatchSize>

When messages are published to the message broker, we write their content to the Cassandra store. While this is done, messages are buffered in memory. When the internal buffer reaches this number of messages, we flush it. Thus, when the TPS of incoming messages is high, you can control the buffering via this configuration.
If a high volume is flushed to Cassandra at once, it might take some time to replicate the written data, or it might block other Cassandra readers and writers while the data is being written (Cassandra read timeouts can happen when other threads read concurrently in such a scenario). So many factors, along with Cassandra tuning, come into play.
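The buffer-then-flush behaviour described for the two publisher batch sizes can be sketched as below. This is a generic size-triggered batching writer, assumed for illustration only; the class name and the write_batch callback are hypothetical and stand in for the actual write to Cassandra.

```python
class BatchingWriter:
    """Hypothetical buffer that flushes once it holds batch_size items."""

    def __init__(self, batch_size, write_batch):
        self.batch_size = batch_size    # e.g. contentPublisherMessageBatchSize
        self.write_batch = write_batch  # callback standing in for the store write
        self.buffer = []

    def add(self, item):
        self.buffer.append(item)
        if len(self.buffer) >= self.batch_size:
            self.flush()

    def flush(self):
        if self.buffer:
            self.write_batch(list(self.buffer))  # hand over a copy of the batch
            self.buffer.clear()


written = []
writer = BatchingWriter(batch_size=3, write_batch=written.append)
for message_id in range(7):
    writer.add(message_id)
# written is now [[0, 1, 2], [3, 4, 5]]; message 6 is still buffered
```

A larger batch size means fewer but heavier writes, which is the trade-off against Cassandra replication and read timeouts mentioned above.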


Threading Tuning


1. <flusherPoolSize>10</flusherPoolSize>

This is the thread pool size used by the Sender Task to send messages asynchronously to subscribers for queue messages. You will have to consider the value of parameter and the number of local queue subscriptions for the node when configuring this pool size.

2. <subscriptionPoolSize>20</subscriptionPoolSize>

WSO2 MB has an “In Order Message Delivery Mechanism Across Broker Cluster”. It coordinates messages one by one and is hence very slow. This configuration sets the size of the thread pool to which these message delivery threads are submitted. Set this to a higher number if there are a lot of subscriptions to the system at a given instant with parameter set to true. (This mode of operation is discouraged because of the slowness of message delivery due to coordination across the cluster for each message.)

3. <internalSequentialThreadPoolSize>5</internalSequentialThreadPoolSize>

When message content and metadata are written to Cassandra storage, those jobs are submitted to a thread pool. This configuration sets the size of that thread pool. If the TPS of incoming messages is high, consider increasing this pool size so that incoming messages are accepted quickly for writing to Cassandra storage.

4. <andesInternalParallelThreadPoolSize>50</andesInternalParallelThreadPoolSize>

This is the thread pool size used by the Andes core to schedule its internal parallel tasks, such as acknowledgement handling and message content writing. Consider increasing it for faster message handling in general.

5. <publisherPoolSize>50</publisherPoolSize>

Both queue messages and topic messages are scheduled for delivery to the local subscriptions of the node using a thread pool of this size. If there are a lot of subscribers or a lot of messages to be delivered at a given instant, consider increasing this value.


Tuning of WaitTimes 


1. <maxAckWaitTime>10</maxAckWaitTime>

When delivering queue messages to subscriptions, we read messages from the Cassandra store, keep the metadata in memory for a set of messages, and deliver them. When an acknowledgement is received, the in-memory delivery information is cleared and the message is removed from the store. If the acknowledgement is not received within the above configured number of seconds, the internal delivery information is cleared anyway and the message is scheduled to be redelivered later.
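The ack timeout described for maxAckWaitTime can be modelled roughly as follows. This is a simplified sketch with hypothetical names: timestamps are passed in explicitly to keep it deterministic, and the actual removal from the store and the redelivery scheduling are left out.

```python
class DeliveryTracker:
    """Hypothetical in-memory record of messages awaiting an ack."""

    def __init__(self, max_ack_wait_seconds: float = 10):
        self.max_ack_wait = max_ack_wait_seconds  # plays the role of maxAckWaitTime
        self.sent_at = {}                         # message id -> time it was sent

    def on_sent(self, message_id, now: float) -> None:
        self.sent_at[message_id] = now

    def on_ack(self, message_id) -> None:
        # Ack received: clear the delivery information for this message.
        self.sent_at.pop(message_id, None)

    def expire(self, now: float) -> list:
        """Clear and return messages whose ack is overdue (to be redelivered)."""
        overdue = [m for m, t in self.sent_at.items() if now - t > self.max_ack_wait]
        for message_id in overdue:
            del self.sent_at[message_id]
        return overdue


tracker = DeliveryTracker(max_ack_wait_seconds=10)
tracker.on_sent("a", now=0)
tracker.on_sent("b", now=5)
tracker.on_ack("a")
to_redeliver = tracker.expire(now=16)  # "b" has waited 11 s without an ack
```

A value that is too low causes needless redelivery to slow subscribers; a value that is too high keeps delivery information in memory longer.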

2. <queueMsgDeliveryCurserResetTimeInterval>60000</queueMsgDeliveryCurserResetTimeInterval>

When queue messages are delivered to subscriptions, MB resets the message reading position (messages are ordered by arrival sequence) from time to time in order to be fair with message distribution across subscriptions. This reset also happens automatically every 50 read iterations. Older messages that failed to be delivered are attempted again only after a reset. Set this value in milliseconds as necessary.

3. <maxAckWaitTimeForBatch>120</maxAckWaitTimeForBatch>

Maximum wait time (in seconds) for an acknowledgement of a batch of messages sent from subscribers. Not used now.

4. <queueWorkerInterval>500</queueWorkerInterval>

When delivering messages to subscribers, messages are read from the Cassandra store in iterations. If there are no messages to be read from Cassandra, the reader waits for the above configured time before trying again. The broker also waits for this time before the next iteration if there are no subscriptions to receive messages, if the messages already read are piling up faster than they are processed, or after every 10 read iterations to give Cassandra a break.

5. <pubSubMessageRemovalTaskInterval>5000</pubSubMessageRemovalTaskInterval>

Messages for topics are removed automatically from Cassandra storage when eligible (decided by parameter). The removal task runs every above configured number of milliseconds. Consider the load on Cassandra when configuring this value.

6. <contentRemovalTaskInterval>4000</contentRemovalTaskInterval>

Messages for queues are removed automatically from Cassandra storage when eligible. The removal task runs every above configured number of milliseconds. Consider the load on Cassandra when configuring this value. In general, the broker checks periodically and removes message metadata and content at a leisurely pace.

7. <contentRemovalTimeDifference>120</contentRemovalTimeDifference>

Messages for topics are removed automatically from Cassandra storage when eligible. This eligibility is evaluated for all topic messages across the cluster using the above configured time interval in milliseconds. That is, every topic message is removed (whether delivered or not) once the configured time has elapsed after it was published to the broker. This is not applicable to durable topic messages from 2.2.0 onwards.

8. <topicPublisherTaskInterval>1000</topicPublisherTaskInterval>

Not used currently. 

9. <virtualHostSyncTaskInterval>3600</virtualHostSyncTaskInterval>

When configured in clustered mode, exchanges, queues, and bindings are synchronized across the cluster at the above configured interval, in seconds. This task exists only to rectify any synchronization problems.

Cassandra Tuning


As recommended above, Apache Cassandra should run on a dedicated server.

Memory for Cassandra is allocated as follows.

In cassandra-env.sh:

MAX_HEAP_SIZE="4G"
HEAP_NEWSIZE="800M"

In general, the HEAP_NEWSIZE parameter should be 1/4 of MAX_HEAP_SIZE.

Override these to set the amount of memory to allocate to the JVM at start-up. For production use you may wish to adjust this for your environment. MAX_HEAP_SIZE is the total amount of memory dedicated  to the Java heap; HEAP_NEWSIZE refers to the size of the young generation. Both MAX_HEAP_SIZE and HEAP_NEWSIZE should be either set or not (if you set one, set the other).

The main trade-off for the young generation is that the larger it is, the longer GC pause times will be. The shorter it is, the more expensive GC will be (usually). The example HEAP_NEWSIZE assumes a modern 8-core+ machine for decent pause  times. If in doubt, and if you do not particularly want to tweak, go with 100 MB per physical CPU core.
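The two rules of thumb above (a quarter of the heap, and 100 MB per physical core) can be combined in a small helper. Taking the smaller of the two values is an assumption made here to reconcile them, and the function name is hypothetical; treat the result as a starting point for tuning rather than a fixed rule.

```python
def heap_newsize_mb(max_heap_mb: int, cpu_cores: int) -> int:
    """Suggest HEAP_NEWSIZE in MB from the two rules of thumb."""
    quarter_of_heap = max_heap_mb // 4  # the "1/4 of MAX_HEAP_SIZE" rule
    per_core_cap = 100 * cpu_cores      # the "100 MB per physical core" rule
    # Assumption: prefer the smaller value to keep GC pause times down.
    return min(quarter_of_heap, per_core_cap)


print(heap_newsize_mb(4096, 8))  # prints 800
```

For a 4 GB heap on an 8-core machine this gives 800 MB, which matches the HEAP_NEWSIZE="800M" shown with MAX_HEAP_SIZE="4G" above.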


In cassandra.yaml we have the following settings for memtables:

flush_largest_memtables_at: 0.75
reduce_cache_sizes_at: 0.85
reduce_cache_capacity_to: 0.6


Consider reducing 'flush_largest_memtables_at' to around 0.45 under massive write load. The following configuration was identified as good for large messages:


Example configuration that worked for large messages

Changed the JVM memory allocations in the cassandra-env.sh file to the following values (the server has 8 GB of memory).

MAX_HEAP_SIZE="6G"
HEAP_NEWSIZE="1500M"

Adjusted the memtable flushing threshold in cassandra.yaml as below.

flush_largest_memtables_at: 0.5

Changed the commit log related parameters to overcome the OOM issues observed with larger message sizes.

commitlog_total_space_in_mb: 16
commitlog_segment_size_in_mb: 16


Tune Other Products for JMS


ESB JMS related tuning


Set the following system properties in the wso2server.sh script:
-Dsnd_t_core=200
-Dsnd_t_max=250

The JMS transport is only allowed to create 100 threads (the default value of snd_t_max); after that, consumer threads are no longer created. Increase the above parameters according to your requirements.










Hasitha Hiranya
