GMD Composition

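Guaranteed message delivery (GMD) is implemented with several components in messages and connections, including sequence numbers, the GMD area, delivery modes, GMD message types, and delivery timeouts.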

Most of these components are discussed in detail in Connection Composition and Message Composition. This section gives an overview of how these separate features are integrated in GMD.

Sequence Number

One of the simplest but most important parts of GMD is message sequence numbers. As described in Sequence Number, the sequence number uniquely identifies the message for GMD so that duplicate messages can be detected by the receiver. Each time a message is sent with GMD, a per-connection outgoing sequence number is incremented, copied to the message sequence number, and saved to the GMD area. Each GMD area also stores the highest sequence number that has been received and acknowledged by this process from each sending process. In the case of a peer-to-peer connection, there is only one sending process.

Because these sequence numbers are saved in the GMD area, a sender and receiver that need to recover can restart exactly where they left off instead of using incorrect sequence numbers. Processes performing recovery start with the old sequence numbers to avoid reprocessing messages they have already processed once. This is the main reason that file-based GMD is the recommended type of GMD. Memory-based GMD is still useful for small impromptu processes such as prototypes or a debugging session with RTmon.

Note that sequence numbers are not used or needed to detect gaps in streams of messages sent through connections. The underlying reliable network protocols, such as TCP/IP, used by connections already take care of preventing lost data. Connections only need to resend messages for GMD when a network failure occurs.

GMD Area

The GMD area property of a connection holds guaranteed message delivery information for both incoming and outgoing messages. There are two types of GMD:

file-based GMD

File-based GMD stores the GMD information in files for reliable operation even when network failures occur. Once the data is written to the GMD area files, GMD can recover from many failures of the process, the process’s node, or the network, provided the files are still available when recovery is attempted. For example, if a process crashes and is restarted, the restarted process can reopen the file-based GMD area and recover its GMD state, which consists of the messages that need to be resent and the record of which messages have already been processed.

memory-based GMD

Memory-based GMD stores GMD information in a GMD area that is held in memory and is faster than file-based GMD. It protects your messages against network failures and lost connections that do not affect memory. However, if a system failure wipes out memory, such as when a program crashes and restarts, the GMD messages stored in memory in the GMD area are lost.

The Ipc_Gmd_Type option sets whether file-based or memory-based GMD is attempted first.

Sender

When a message is sent with GMD through a connection, the message sequence number is set to an incremented counter, and then a copy of the message is saved in the sender’s connection GMD area. The copy is removed when acknowledgment of delivery is received by the sender from the receiving processes.

The sender stores complete messages in the GMD area, which can therefore consume large amounts of disk or memory if the receiving process falls behind. See Limiting GMD Resources for details on how to constrain GMD resources.

For recovery from network failures, the burden of recovery is on the sender. The sender can reopen the file-based GMD area and simply resend all messages in the GMD area. When messages are resent with GMD, their sequence numbers are not changed. The sender does not have to worry about deciding which message to resend because the receiver discards the duplicate messages that it has already processed.
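The sender-side bookkeeping described above can be sketched in a few lines of C. This is only an illustrative model, not the SmartSockets implementation: the types gmd_entry and gmd_sender, the gmd_sender_* functions, and the fixed-size in-memory array are all hypothetical names and choices made for this sketch. It shows the three behaviors the text describes: a send stamps the message with the incremented counter and saves a copy, an acknowledgment removes the copy, and a resend walks the saved copies without changing their sequence numbers.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define GMD_CAPACITY 8        /* hypothetical fixed size for this sketch */
#define GMD_PAYLOAD_MAX 32

/* One saved copy of an outgoing message (hypothetical layout). */
typedef struct {
    int in_use;
    unsigned long seq;
    char payload[GMD_PAYLOAD_MAX];
} gmd_entry;

/* Sender side of one connection: outgoing counter plus saved copies. */
typedef struct {
    unsigned long last_seq;   /* last sequence number handed out */
    gmd_entry area[GMD_CAPACITY];
} gmd_sender;

void gmd_sender_init(gmd_sender *s) { memset(s, 0, sizeof *s); }

/* Stamp the message with the incremented counter and keep a copy.
   Returns the sequence number, or 0 if the area is full. */
unsigned long gmd_sender_send(gmd_sender *s, const char *payload)
{
    assert(s != NULL && payload != NULL);
    for (size_t i = 0; i < GMD_CAPACITY; i++) {
        if (!s->area[i].in_use) {
            s->area[i].in_use = 1;
            s->area[i].seq = ++s->last_seq;
            strncpy(s->area[i].payload, payload, GMD_PAYLOAD_MAX - 1);
            s->area[i].payload[GMD_PAYLOAD_MAX - 1] = '\0';
            return s->area[i].seq;
        }
    }
    return 0;
}

/* Drop the saved copy once delivery is acknowledged. Returns 1 if found. */
int gmd_sender_ack(gmd_sender *s, unsigned long seq)
{
    for (size_t i = 0; i < GMD_CAPACITY; i++) {
        if (s->area[i].in_use && s->area[i].seq == seq) {
            s->area[i].in_use = 0;
            return 1;
        }
    }
    return 0;
}

/* Number of copies still awaiting acknowledgment. */
size_t gmd_sender_pending(const gmd_sender *s)
{
    size_t n = 0;
    for (size_t i = 0; i < GMD_CAPACITY; i++)
        n += s->area[i].in_use ? 1 : 0;
    return n;
}

/* Recovery: report every saved message for resending, in slot order,
   with its original sequence number unchanged. Returns the count. */
size_t gmd_sender_resend_all(const gmd_sender *s,
                             unsigned long *out, size_t out_len)
{
    size_t n = 0;
    for (size_t i = 0; i < GMD_CAPACITY && n < out_len; i++) {
        if (s->area[i].in_use)
            out[n++] = s->area[i].seq;
    }
    return n;
}
```

Note that the resend function makes no attempt to decide which messages the receiver already has. As the text explains, that decision is left to the receiver, which discards duplicates.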

Receiver

When a GMD message is acknowledged by the receiver, the sequence number of the message is saved in the receiver’s connection GMD area as the highest sequence number received. When a resent message is read from a connection, the message sequence number is checked against the highest sequence number in the receiver’s connection GMD area. This allows duplicate messages to be detected and discarded.

The receiver stores only highest sequence numbers in the GMD area, which usually does not require much disk or memory. Note, however, that an RTclient receiver process stores one highest sequence number for each sending RTclient process.
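The receiver-side check can be modeled as a small table that maps each sender to the highest sequence number acknowledged so far. The sketch below is a simplified illustration under stated assumptions: the gmd_receiver type, its functions, the string sender names, and the fixed table size are hypothetical and do not correspond to SmartSockets data structures. A message counts as a duplicate when its sequence number is not greater than the stored value for its sender.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_SENDERS 4         /* hypothetical limit for this sketch */
#define SENDER_NAME_MAX 32

/* Highest acknowledged sequence number for one sending process. */
typedef struct {
    char sender[SENDER_NAME_MAX];
    unsigned long highest_seq;
} gmd_highest;

typedef struct {
    gmd_highest table[MAX_SENDERS];
    size_t count;
} gmd_receiver;

void gmd_receiver_init(gmd_receiver *r) { memset(r, 0, sizeof *r); }

/* Look up a sender's entry; returns NULL if the sender is unknown. */
static const gmd_highest *gmd_lookup(const gmd_receiver *r, const char *sender)
{
    for (size_t i = 0; i < r->count; i++) {
        if (strcmp(r->table[i].sender, sender) == 0)
            return &r->table[i];
    }
    return NULL;
}

/* A message is a duplicate if its sequence number is not above the
   highest one already acknowledged from the same sender. */
int gmd_receiver_is_duplicate(const gmd_receiver *r,
                              const char *sender, unsigned long seq)
{
    const gmd_highest *h = gmd_lookup(r, sender);
    return h != NULL && seq <= h->highest_seq;
}

/* Record an acknowledgment. Returns 1 on success, 0 if the table is full. */
int gmd_receiver_ack(gmd_receiver *r, const char *sender, unsigned long seq)
{
    assert(r != NULL && sender != NULL);
    for (size_t i = 0; i < r->count; i++) {
        if (strcmp(r->table[i].sender, sender) == 0) {
            if (seq > r->table[i].highest_seq)
                r->table[i].highest_seq = seq;
            return 1;
        }
    }
    if (r->count == MAX_SENDERS)
        return 0;
    gmd_highest *h = &r->table[r->count++];
    strncpy(h->sender, sender, SENDER_NAME_MAX - 1);
    h->sender[SENDER_NAME_MAX - 1] = '\0';
    h->highest_seq = seq;
    return 1;
}
```

Because only one number is kept per sender, the storage cost grows with the number of sending processes rather than with the number of messages received.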

Asynchronous Operation For High Performance

Just as operating systems use data buffers and asynchronous techniques to ensure good performance, GMD is generally asynchronous in the sense that processes do not block waiting for GMD operations to complete. Sending processes do not wait for acknowledgment of successful delivery from receiving processes. Most failure notifications (through GMD_FAILURE messages) also occur asynchronously.

Accessing the GMD Area

The GMD area is not directly accessible. Instead, the following functions operate on it:

TipcConnMsgSend	Adds a message to a connection’s GMD area.
TipcConnRead	Removes a message from a connection’s GMD area when acknowledgment is received indicating successful delivery. Also checks for duplicate messages based on the highest sequence number information stored in the GMD area.
TipcConnGmdMsgDelete	Removes a message from a connection’s GMD area as a result of GMD failure.
TipcConnGetGmdNumPending	Gets the number of messages within the GMD area.
TipcConnGmdResend	Reads all messages from the GMD area and resends them.
TipcMsgAck	Updates the GMD area with highest sequence number information.

Creating the GMD Area

TipcConnGmdFileCreate creates the GMD area on disk for file-based GMD. It checks the Ipc_Gmd_Directory option to determine the directory in which to create the GMD area. Each particular GMD area is created once with TipcConnGmdFileCreate.

Once the GMD area is created, it cannot be changed or destroyed except by destroying the connection. The function TipcConnSetGmdMaxSize can be used to set the maximum size (in bytes) of a connection GMD area. See Limiting GMD Resources for more details.

Delivery Mode

As described in Delivery Mode, the delivery mode of a message controls what level of guarantee is used when the message is sent through a connection (always with TipcConnMsgSend). The available delivery modes are:


T_IPC_DELIVERY_BEST_EFFORT	In this mode, no special actions, such as ACKs, are taken to ensure delivery of sent messages. The message is delivered unless network failures or process failures cause the message to be lost. If the message is not delivered, there is no way for the sender to know that delivery failed. When there is a failure, it is possible for some messages to be lost or to be delivered in a different order than they were published. This is the default mode.
T_IPC_DELIVERY_ORDERED	In this mode, no special actions, such as ACKs, are taken to ensure delivery of sent messages. Messages can still be lost in the event of a failure, but this mode ensures that messages are delivered in the order in which they were published. This is useful for applications where order is critical, but the overhead required by GMD results in unacceptable performance degradation.
T_IPC_DELIVERY_SOME	In this mode, the sending process saves a copy of the message in the connection GMD area until the message is successfully delivered, and the sender can also resend the message if necessary. Delivery is considered successful if the sent message is acknowledged by at least one receiving process.
T_IPC_DELIVERY_ALL	In this mode, the sending process saves a copy of the message in the connection GMD area until the message is successfully delivered, and the sender can also resend the message if necessary. Delivery is not considered successful until all receiving processes acknowledge the sent message. For two processes communicating using a non-RTclient and non-RTserver T_IPC_CONN connection, T_IPC_DELIVERY_SOME and T_IPC_DELIVERY_ALL are identical, because there is only one process receiving the message. For RTclient processes, the two modes do differ if more than one RTclient process is subscribing to the subject in the destination of the message.
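The difference between T_IPC_DELIVERY_SOME and T_IPC_DELIVERY_ALL comes down to how many acknowledgments the sender needs before it can release its saved copy. The following sketch expresses that rule as a function. The enum delivery_mode and both function names are hypothetical stand-ins for this illustration and are not SmartSockets identifiers. The handling of a message with zero receivers under the ALL mode is an assumption of the sketch, since the text above does not cover that case.

```c
#include <assert.h>

/* Hypothetical stand-ins for the four T_IPC_DELIVERY_* modes. */
typedef enum {
    DELIVERY_BEST_EFFORT,
    DELIVERY_ORDERED,
    DELIVERY_SOME,
    DELIVERY_ALL
} delivery_mode;

/* Only the SOME and ALL modes save a copy in the GMD area. */
int delivery_mode_uses_gmd(delivery_mode mode)
{
    return mode == DELIVERY_SOME || mode == DELIVERY_ALL;
}

/* Returns 1 when delivery counts as successful, so the sender may
   release its saved copy. For non-GMD modes no copy is kept, so there
   is nothing to wait for and the function returns 1. */
int delivery_is_complete(delivery_mode mode,
                         unsigned receivers, unsigned acks)
{
    assert(acks <= receivers);
    switch (mode) {
    case DELIVERY_SOME:
        return acks >= 1;
    case DELIVERY_ALL:
        /* Assumption: with no receivers, ALL is never satisfied. */
        return receivers > 0 && acks == receivers;
    default:
        return 1;
    }
}
```

With a single receiver, as on a peer-to-peer connection, the two GMD modes give the same answer for every acknowledgment count, which matches the statement that they are identical in that case.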

GMD Message Types

As described in Acknowledgment of Delivery, GMD needs some form of acknowledgment to know when a message has been successfully delivered. Connections, RTclient, and RTserver use several different message types to implement GMD:


GMD_ACK	Sent by receiver to acknowledge successful GMD.
GMD_FAILURE	Notification of GMD failure (constructed and processed by sender).
GMD_DELETE	Sent by RTclient to notify RTserver to cancel GMD for a message.
GMD_NACK	Sent by RTserver to notify RTclient of certain types of GMD failure.
GMD_STATUS_CALL	Sent by RTclient to query RTserver for GMD status.
GMD_STATUS_RESULT	Sent by RTserver to RTclient with GMD status information.

The message types GMD_DELETE, GMD_NACK, GMD_STATUS_CALL, and GMD_STATUS_RESULT are not used by connection GMD, only by RTclient and RTserver GMD. These message types are discussed in detail in GMD Message Types.

GMD_ACK

GMD_ACK messages are sent by a receiving process to acknowledge successful delivery of a message with GMD. GMD_ACK messages are sent automatically when a message is destroyed, but can be sent manually instead. Connections process GMD_ACK messages automatically, so SmartSockets programs do not have to read and process one GMD_ACK message for each outgoing message sent with GMD.

GMD_FAILURE

GMD handles most network failures, but there are some failures that GMD cannot overcome on its own, such as a receiving process that goes into an infinite loop. Unlike sockets, which do not provide a way to tell how much data was lost, GMD explicitly notifies a sending process that a GMD failure has taken place. When most types of GMD failure happen, a GMD_FAILURE message is delivered back to the sender process. Each GMD_FAILURE message contains several fields, including the failed message and an error number indicating the type of failure.

For connection GMD, the only possible GMD_FAILURE error number is a delivery timeout. This occurs if a sender does not receive acknowledgment of successful delivery within a specified period of time. The period is configurable through the message, message type, and connection delivery timeout properties.

A GMD_FAILURE message indicates that the message could not be delivered successfully with GMD within the parameters, such as delivery timeout, set by the application. When a GMD failure occurs, it is up to the sender to decide what to do and then take some user-defined action if recovery is feasible. Unfortunately, this level of recovery is very application-specific, and SmartSockets cannot perform it on its own. Recovering from GMD failures is discussed further in Handling GMD Failures.
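To make the delivery timeout failure concrete, the sketch below scans a sender's unacknowledged messages and produces one failure record for each message whose timeout has elapsed. Everything here is hypothetical and simplified for illustration: the pending_msg and gmd_failure structures, the gmd_error codes, and the use of plain seconds as a clock. In particular, the record carries only the failed sequence number, whereas a real GMD_FAILURE message contains the failed message itself along with the error number.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical error codes for this sketch. */
typedef enum {
    GMD_ERR_NONE = 0,
    GMD_ERR_DELIVERY_TIMEOUT = 1
} gmd_error;

/* An unacknowledged message held by the sender (simplified). */
typedef struct {
    unsigned long seq;
    double sent_at;           /* seconds, on any monotonic clock */
    double delivery_timeout;  /* seconds allowed for acknowledgment */
} pending_msg;

/* Simplified stand-in for the contents of a failure notification. */
typedef struct {
    unsigned long failed_seq;
    gmd_error error;
} gmd_failure;

/* Write one failure record per message whose delivery timeout has
   elapsed at time `now`. Returns the number of records written. */
size_t gmd_check_timeouts(const pending_msg *pending, size_t n, double now,
                          gmd_failure *out, size_t out_len)
{
    assert(out != NULL || out_len == 0);
    size_t written = 0;
    for (size_t i = 0; i < n && written < out_len; i++) {
        if (now - pending[i].sent_at >= pending[i].delivery_timeout) {
            out[written].failed_seq = pending[i].seq;
            out[written].error = GMD_ERR_DELIVERY_TIMEOUT;
            written++;
        }
    }
    return written;
}
```

What the sender does with each record, such as resending, logging, or giving up, is the application-specific decision described above and is deliberately left out of the sketch.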

Delivery Timeout

The Delivery Timeout property specifies how long GMD has to deliver a message. It works together with the value set for the Server_Read_Timeout option. The connection delivery timeout property is used as a default for messages with no preset delivery timeout. The delivery timeout is specific to GMD.

Delivery timeouts are checked only when data is received from the RTserver. If no messages are being received, the RTclient uses the value set for Server_Read_Timeout as the interval for sending a keep alive message to the RTserver. When the RTserver replies, then the delivery timeouts are checked. If the delivery timeout is set to a value smaller than the value for Server_Read_Timeout, the actual timeout used is the Server_Read_Timeout because the delivery timeouts are not checked until after the Server_Read_Timeout interval has triggered a keep alive message.
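The interaction between the two settings can be summarized as a small calculation. The sketch below first picks the timeout that applies to a message (its own value if preset, otherwise the connection default) and then computes the timeout that takes effect for an RTclient that is otherwise receiving no data. The msg_timeout structure, the has_timeout flag, and both function names are hypothetical and exist only for this illustration. All values are in seconds.

```c
#include <assert.h>

/* Hypothetical representation of a message's delivery timeout:
   has_timeout is 0 when the message has no preset value. */
typedef struct {
    int has_timeout;
    double timeout;
} msg_timeout;

/* The connection delivery timeout is the default for messages that
   have no preset delivery timeout. */
double resolve_delivery_timeout(msg_timeout msg, double conn_default)
{
    return msg.has_timeout ? msg.timeout : conn_default;
}

/* On an idle RTclient, timeouts are checked only after a keep alive
   round trip, so a delivery timeout shorter than Server_Read_Timeout
   effectively becomes Server_Read_Timeout. */
double effective_idle_timeout(double delivery_timeout,
                              double server_read_timeout)
{
    assert(delivery_timeout >= 0.0 && server_read_timeout >= 0.0);
    return delivery_timeout < server_read_timeout
               ? server_read_timeout
               : delivery_timeout;
}
```

For example, a delivery timeout of 5 seconds combined with a Server_Read_Timeout of 30 seconds yields an effective timeout of 30 seconds on an idle RTclient, while a delivery timeout of 60 seconds is unaffected.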