Load Balancing

In normal publish-subscribe operations, a message is sent to all RTclients that have subscribed to the subject the message is being published to. However, in some situations you may wish to have messages sent to only one subscribing RTclient. An example of this is a project where there is high message throughput and each message takes some time to process. In this case, you may wish to replicate a set of RTclients and have them take turns processing the messages to better keep up with message flow.

This is accomplished in SmartSockets using load balancing. Rather than have a single RTclient handle all the messages, you can use load balancing to process the messages across multiple RTclients. This is very useful when processing a heavy message load. A load-balanced message is routed to only a single RTclient, not to all RTclients subscribed to the destination subject. The RTclient to which the message is routed is selected based on the load balancing mode specified. Load balancing implies that there is a set of RTclients that are all equally capable of processing load-balanced messages.

For example, consider the simple example shown in Figure 18. There are three receivers, all subscribed to the same subject. Messages 1, 2, and 3 are published to that subject. On the left side of the figure, each message is routed to all receivers because there is no load balancing. The right side shows what happens when the messages are marked to be delivered using round-robin load balancing. The first message is delivered to Receiver 1, the second message to Receiver 2, and the third message to Receiver 3. Each message is delivered to only a single RTclient.

Load balancing can be specified per-message or per-message type using the load balancing mode message property (see Load Balancing Mode for more information). Setting the load balancing mode for each message takes precedence over per-message type. By default, messages are not load balanced and are distributed to all subscribers.

Load balancing is dynamic in that whenever an RTclient connects or disconnects to an RTserver, the load balancing calculations are updated in real time. When an RTclient publishes the first message using load balancing to a subject, RTserver starts collecting subject subscription information from the appropriate RTservers to accurately track load balancing accounting. This increases the scalability of load balancing due to the fact that only the relevant RTservers dynamically exchange load balancing information. The RTclient API function TipcSrvSubjectGmdInit can also be used to manually initialize GMD accounting for a subject to which messages will be published.

Overriding Load Balancing

It is important to note that any RTclient has the ability to override load balancing, on a subject basis, so an RTclient could subscribe to a subject that has load-balanced messages being sent to it, but receive all the messages, such as for an archive). A simple example would be if you are monitoring a project and wish to see all the messages sent to a specific subject. If you did a normal subscribe and the messages were being load balanced, you would not see each one if there was another RTclient subscribed to the same subject in the project. To override load balancing, the function TipcSrvSubjectSetSubscribeLb(subject_name, TRUE, FALSE) should be used instead of TipcSrvSubjectSetSubscribe(subject_name, TRUE). The first parameter specifies to which subject you are subscribing or unsubscribing. The second parameter specifies whether you are subscribing (TRUE) or unsubscribing (FALSE). The last parameter specifies whether you wish to receive all messages (FALSE) or be included in the load balancing calculations (TRUE). For example, an RTclient wishing to receive all messages published to the subject "/manual/chapter4", regardless of whether the messages are load balanced or not, would call the function as follows:

TipcSrvSubjectSetSubscribeLb("/manual/chapter4", TRUE, FALSE);

The subscribe command can also be used with the -load_balancing_off parameter to override load balancing:

MON> subscribe -load_balancing_off "/manual/chapter4"

Load Balancing Modes

Table 10 Load Balancing Modes
Mode	Description
T_IPC_LB_NONE	This is the default and specifies no load balancing. The message is sent to all subscribers.
T_IPC_LB_ROUND_ROBIN	The list of subscribing RTclients is held in a circular list, with each successive message simply sent to the next RTclient in the list. This mode is a good choice when the subscribers are all capable of receiving and processing a request with nearly equal speed. There is no additional overhead with this mode.
T_IPC_LB_WEIGHTED	The message is published to the RTclient that has the fewest pending requests. This mode is a good choice when the subscribers differ significantly in their ability to process a request promptly, such as hardware speed differences or network delays. This method can only be used with GMD and requires no additional overhead beyond what GMD requires.
T_IPC_LB_SORTED	The message is always sent to the first RTclient in the list. The list is formed by doing an alphabetical sort of the unique subject name of each RTclient. This mode is a good choice when you want a specific subscriber to process all messages until it fails, when a hot standby can take over. There is no additional overhead with this mode.

Weighted load balancing takes into account the receiver’s ability to process messages using acknowledgments. These acknowledgments are a result of a message being sent guaranteed. Weighted load balancing can only be used with GMD. When RTclient publishes a message that uses weighted load balancing, RTserver cycles through the list of receivers and sends the message to the first RTclient which has the fewest number of unacknowledged messages. Using the number of unacknowledged messages takes into account the speed of the receiver and the speed of the round trip to that receiver.

For example, with a message stream that requires significant CPU processing resources, a fast receiver only one RTserver hop away would be able to process and acknowledge messages faster then a slow receiver several RTserver hops away. As a result, the fast receiver which is closet would often be favored over the slower receiver which would not be able to acknowledge messages as quickly. So, the weighting of receivers by the number of acknowledgments takes into account not just the speed of the receiver’s system, but also the speed of systems running RTservers in between the publisher and subscriber, the networks used to connect the systems, and any overhead that could be introduced by the load balancing algorithm.

Load Balancing and GMD

A message that is being load balanced can also be guaranteed (have a delivery mode of SOME or ALL; see Chapter 4, Guaranteed Message Delivery for more information). If a message is to be both load balanced and guaranteed, message delivery failures are handled in the following ways.

Note that ROUND_ROBIN and SORTED can be used with or without GMD. WEIGHTED requires GMD to be used as RTserver uses the GMD acknowledgments in its load balancing calculations. A warm RTclient is not counted in the load balancing calculation unless all subscribing RTclients are warm RTclients (see Warm RTclient in RTserver).

Load Balancing Example

This example shows a publishing RTclient that creates and sends 20 messages. The first 10 messages are sent with a load balancing mode of NONE, so all subscribers receive them. The next 10 messages are sent with a load balancing mode of ROUND_ROBIN so they are evenly distributed across the subscribers.

$RTHOME/examples/smrtsock/manual

RTHOME:[EXAMPLES.SMRTSOCK.MANUAL]

%RTHOME%\examples\smrtsock\manual

The online source files have additional #ifdefs to provide C++ support; these #ifdefs are not shown to simplify the example.

The subscribers simply print out the contents of the messages. The source code for the subscribers is not shown, but the lbrecv.c file is located in the same directory as the sender program.

 
/* lbsend.c - send messages (some load balanced, some not) */ 
 
#include <rtworks/ipc.h> 
#define MSG_COUNT 1001 
 
int main(argc, argv) 
int argc; 
char **argv; 
{ 
  T_OPTION option; 
  T_IPC_MSG msg; 
  T_IPC_MT mt; 
  T_INT4 i; 
 
  /* Set the option Project to partition ourself. */ 
  option = TutOptionLookup("project"); 
  if (option == NULL) { 
    TutOut("Could not look up option named project: error <%s>.\n", 
           TutErrStrGet()); 
    TutExit(T_EXIT_FAILURE); 
  } 
  if (!TutOptionSetEnum(option, "smartsockets")) { 
    TutOut("Could not set option named project: error <%s>.\n", 
           TutErrStrGet()); 
    TutExit(T_EXIT_FAILURE); 
  } 
 
  /* Define a new message type */ 
  mt = TipcMtCreate("msg_count", MSG_COUNT, "int4"); 
  if (mt == NULL) { 
    TutOut("Could not create message type MSG_COUNT: error 
<%s>.\n", 
           TutErrStrGet()); 
    TutExit(T_EXIT_FAILURE); 
  } 
 
  /* Connect to RTserver */ 
  if (!TipcSrvCreate(T_IPC_SRV_CONN_FULL)) { 
    TutOut("Could not connect to RTserver: error <%s>.\n", 
           TutErrStrGet()); 
    TutExit(T_EXIT_FAILURE); 
  } 
 
  /* Create a message of type MSG_COUNT */ 
  msg = TipcMsgCreate(mt); 
  if (msg == NULL) { 
    TutOut("Could not create msg of type MSG_COUNT: error <%s>.\n", 
            TutErrStrGet()); 
          TutExit(T_EXIT_FAILURE); 
  } 
 
  /* Set the destination subject of the message */ 
  if (!TipcMsgSetDest(msg, "/manual/chapter4")) { 
    TutOut("Could not set subject of message: error <%s>.\n", 
            TutErrStrGet()); 
    TutExit(T_EXIT_FAILURE); 
  } 
 
  for (i = 0; i < 20; i++) { 
    /* Reset num of fields to 0 so we can reuse message */ 
    if (!TipcMsgSetNumFields(msg, 0)) { 
      TutOut("<%d> Could not clear message: error <%s>.\n", 
             i, TutErrStrGet()); 
      TutExit(T_EXIT_FAILURE); 
    } 
 
    /*  
     * First 10 messages, send to everyone, 
     * Second 10 messages, load balance using round robin 
     */    
    if (i < 10) { 
      /* This is the default behavior and not required */ 
        if (!TipcMsgSetLbMode(msg, T_IPC_LB_NONE)) { 
        TutOut("Could not set load balance to NONE: error <%s>.\n", 
               TutErrStrGet()); 
      } 
    } 
    else { 
      if (!TipcMsgSetLbMode(msg, T_IPC_LB_ROUND_ROBIN)) { 
        TutOut("Could not set load balance to ROUND_ROBIN\n"); 
            TutOut("  error <%s>.\n", TutErrStrGet()); 
      } 
    } 
 
    /* Build the data part of the message with 1 integer */ 
    if (!TipcMsgAppendInt4(msg, i)) { 
      TutOut("<%d> Could not build message: error <%s>.\n", 
             i, TutErrStrGet()); 
      TutExit(T_EXIT_FAILURE); 
    } 
     
    /* Publish the message */ 
    if (!TipcSrvMsgSend(msg, TRUE)) { 
      TutOut("<%d> Could not publish message: error <%s>.\n", 
             i, TutErrStrGet()); 
    } 
   
    /* Make sure message is flushed */   
    if (!TipcSrvFlush()) { 
      TutOut("<%d> Could not flush message: error <%s>.\n", 
             i, TutErrStrGet()); 
    } 
 
  } 
 
  /* Destroy the message */ 
  if (!TipcMsgDestroy(msg)) { 
    TutOut("<%d> Could not destroy message: error <%s>.\n", 
           i, TutErrStrGet()); 
  } 
 
  return T_EXIT_SUCCESS; /* all done */ 
} /* main */

Compiling, Linking, and Running

To compile, link, and run the example programs, first you must either copy the programs to your own directory or have write permission in these directories:

$RTHOME/examples/smrtsock/manual

RTHOME:[EXAMPLES.SMRTSOCK.MANUAL]

%RTHOME%\examples\smrtsock\manual

Step 1

$ rtlink -o lbsend.x lbsend.c 
$ rtlink -o lbrecv.x lbrecv.c

$ cc lbsend.c, lbrecv.c 
$ rtlink /exec=lbsend.exe lbsend.obj 
$ rtlink /exec=lbrecv.exe lbrecv.obj

$ nmake /f lbsdw32m.mak 
$ nmake /f lbrcw32m.mak

Step 2

Then you start two copies of lbrecv and start the publishing process, lbsend. To do this, execute the start command in two separate windows.

Step 3

Step 4

An example of the output from the window where the first lbrecv is running is shown:

Connecting to project <smartsockets> on <_node> RTserver 
Using local protocol 
Message from RTserver: Connection established. 
Start subscribing to subject </_workstation.talarian.com_24000> 
Message data = 0 
Message data = 1 
Message data = 2 
Message data = 3 
Message data = 4 
Message data = 5 
Message data = 6 
Message data = 7 
Message data = 8 
Message data = 9 
Message data = 11 
Message data = 13 
Message data = 15 
Message data = 17 
Message data = 19

An example of the output from the window where the second lbrecv is running is shown:

Connecting to project <smartsockets> on <_node> RTserver. 
Using local protocol. 
Message from RTserver: Connection established. 
Start subscribing to subject </_workstation.talarian.com_24003>. 
Message data = 0 
Message data = 1 
Message data = 2 
Message data = 3 
Message data = 4 
Message data = 5 
Message data = 6 
Message data = 7 
Message data = 8 
Message data = 9 
Message data = 10 
Message data = 12 
Message data = 14 
Message data = 16 
Message data = 18

Note that both programs processed the first ten messages. This is because the load balancing mode of these messages was set to NONE. The next ten messages were evenly distributed across the two processes because load balancing mode was set to ROUND_ROBIN for these messages. The first lbrecv processed messages 11, 13, 15, 17, and 19 and the second lbrecv processed messages 10, 12, 14, 16, and 18.

As an interesting exercise to see how an RTclient can still receive all messages, whether they are being load balanced or not, make a copy of the lbrecv program. Change the line in the main program from:

if (!TipcSrvSubjectSetSubscribe("/manual/chapter4", TRUE)) {

if (!TipcSrvSubjectSetSubscribeLb("/manual/chapter4", TRUE, 
FALSE)) {

Compile, link, and run the new program alongside two copies of lbrecv. You see that the new program processes all 20 messages, while the two copies of lbrecv behave just as they did before.