Guaranteed message delivery is implemented with several components in messages and connections:
Most of these components are discussed in detail in Connection Composition and Message Composition. This section gives an overview of how these separate features are integrated in GMD.
One of the simplest but most important parts of GMD is message sequence numbers. As described in Sequence Number, the sequence number uniquely identifies the message for GMD so that duplicate messages can be detected by the receiver. Each time a message is sent with GMD, a per-connection outgoing sequence number is incremented, copied to the message sequence number, and saved to the GMD area. Each GMD area also stores the highest sequence number that has been received and acknowledged by this process from each sending process. In the case of a peer-to-peer connection, there is only one sending process.
Because these sequence numbers are saved in the GMD area, the sender and receiver can restart exactly where they left off if recovery is necessary, without using incorrect sequence numbers. Processes performing recovery start with the old sequence numbers to avoid reprocessing messages they have already processed once. This is the main reason that file-based GMD is the recommended type of GMD. Memory-based GMD is useful, though, for small impromptu processes such as prototypes or a debugging session with RTmon.
Note that sequence numbers are not used or needed to detect gaps in streams of messages sent through connections. The underlying reliable network protocols, such as TCP/IP, used by connections already take care of preventing lost data. Connections only need to resend messages for GMD when a network failure occurs.
The GMD area property of a connection holds guaranteed message delivery information for both incoming and outgoing messages. There are two types of GMD:
File-based GMD stores the GMD information in files for reliable operation even when network failures occur. Once the data is written to the GMD area files, GMD can recover from many failures to the process, the process’s node, or the network (but the files do need to be available for recovery to occur). For example, if a process crashes and is restarted, the restarted process can reopen the file-based GMD area and recover its GMD state, consisting of which messages need to be resent and which messages have already been processed.
Memory-based GMD stores GMD information in a GMD area that is held in memory and is faster than file-based GMD. It protects your messages against network failures and lost connections that do not affect memory. However, if a system failure wipes out memory, such as when a program crashes and restarts, the GMD messages stored in memory in the GMD area are lost.
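The difference between the two types can be illustrated with a small standalone model. This is only a sketch: the class names, the JSON file layout, and the simulate_crash_and_restart helper are hypothetical and are not part of SmartSockets, whose GMD area file format is not described here.

```python
import json
import os
import tempfile


class MemoryGmdArea:
    """Toy memory-based GMD area: state lives only in this object."""

    def __init__(self):
        self.pending = {}  # sequence number -> message awaiting acknowledgment

    def save(self, seq, payload):
        self.pending[seq] = payload


class FileGmdArea(MemoryGmdArea):
    """Toy file-based GMD area: every change is also written to disk."""

    def __init__(self, path):
        super().__init__()
        self.path = path
        if os.path.exists(path):
            # Reopening the file after a restart recovers the saved state.
            with open(path, "r", encoding="utf-8") as f:
                self.pending = {int(k): v for k, v in json.load(f).items()}

    def save(self, seq, payload):
        super().save(seq, payload)
        with open(self.path, "w", encoding="utf-8") as f:
            json.dump(self.pending, f)


def simulate_crash_and_restart():
    """Return the state recovered by a memory area and by a file area."""
    path = os.path.join(tempfile.mkdtemp(), "gmd_area.json")

    mem = MemoryGmdArea()
    mem.save(1, "order-1")
    disk = FileGmdArea(path)
    disk.save(1, "order-1")

    # "Crash": both objects are discarded; only the file survives.
    del mem, disk

    return MemoryGmdArea().pending, FileGmdArea(path).pending
```

In this model the restarted memory area comes back empty, while the restarted file area still holds the unacknowledged message, provided the file is still available.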
The Ipc_Gmd_Type option sets whether file-based or memory-based GMD is initially attempted.
When a message is sent with GMD through a connection, the message sequence number is set to an incremented counter, and then a copy of the message is saved in the sender’s connection GMD area. The copy is removed when acknowledgment of delivery is received by the sender from the receiving processes.
The sender stores complete messages into the GMD area, which therefore can use large amounts of disk or memory resources if the receiving process falls behind. See Limiting GMD Resources for details on how to constrain GMD resources.
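The sender-side bookkeeping described above can be sketched as a toy model. The SenderGmdArea class and its method names are hypothetical illustrations, not SmartSockets functions; payloads are assumed to be byte strings so that resource use can be measured.

```python
class SenderGmdArea:
    """Toy model of the sender side of a connection GMD area."""

    def __init__(self):
        self.next_seq = 0  # per-connection outgoing sequence counter
        self.pending = {}  # sequence number -> full message copy

    def send(self, payload):
        """Assign the next sequence number and keep a copy until acknowledged."""
        self.next_seq += 1
        self.pending[self.next_seq] = payload
        return self.next_seq

    def on_ack(self, seq):
        """Drop the saved copy once delivery has been acknowledged."""
        self.pending.pop(seq, None)

    def bytes_held(self):
        """Bytes currently tied up by unacknowledged messages."""
        return sum(len(p) for p in self.pending.values())
```

Because complete messages are held until they are acknowledged, bytes_held grows for as long as the receiver falls behind, which is the resource concern noted above.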
For recovery from network failures, the burden of recovery is on the sender. The sender can reopen the file-based GMD area and simply resend all messages in the GMD area. When messages are resent with GMD, their sequence numbers are not changed. The sender does not have to worry about deciding which message to resend because the receiver discards the duplicate messages that it has already processed.
When a GMD message is acknowledged by the receiver, the sequence number of the message is saved in the receiver’s connection GMD area as the highest sequence number received. When a resent message is read from a connection, the message sequence number is checked against the highest sequence number in the receiver’s connection GMD area. This allows duplicate messages to be detected and discarded.
The receiver stores only highest sequence numbers in the GMD area, which usually requires little disk or memory. RTclient receiver processes, however, store one highest sequence number for each sending RTclient process.
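The receiver-side duplicate check can be modeled in a few lines. This is a sketch under simplifying assumptions: the names ReceiverGmdArea and deliver are hypothetical, every message is acknowledged immediately after it is processed, and the check is applied to every incoming message rather than only to resent ones.

```python
class ReceiverGmdArea:
    """Toy model of the receiver side: one highest sequence number per sender."""

    def __init__(self):
        self.highest_acked = {}  # sender id -> highest sequence number acknowledged

    def is_duplicate(self, sender, seq):
        return seq <= self.highest_acked.get(sender, 0)

    def acknowledge(self, sender, seq):
        current = self.highest_acked.get(sender, 0)
        self.highest_acked[sender] = max(seq, current)


def deliver(receiver, sender, messages):
    """Process (seq, payload) pairs in order and return the payloads kept."""
    processed = []
    for seq, payload in messages:
        if receiver.is_duplicate(sender, seq):
            continue  # already processed once, so discard the resend
        processed.append(payload)
        receiver.acknowledge(sender, seq)
    return processed
```

If a sender resends messages 1 and 2 together with a new message 3, only message 3 is processed the second time, so the sender never has to decide which messages to resend.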
Just as operating systems use data buffers and asynchronous techniques to ensure good performance, GMD is generally asynchronous in the sense that processes do not block waiting for GMD operations to complete. Sending processes do not wait for acknowledgment of successful delivery from receiving processes. Most failure notifications (through GMD_FAILURE messages) also occur asynchronously.
The GMD area is not directly accessible. The function TipcConnMsgSend adds a message to a connection’s GMD area. The function TipcConnRead removes a message from a connection’s GMD area when acknowledgment is received indicating successful delivery. TipcConnRead also checks for duplicate messages based on the highest sequence number information stored in the GMD area. The function TipcConnGmdMsgDelete removes a message from a connection’s GMD area as a result of GMD failure. The function TipcConnGetGmdNumPending gets the number of messages within the GMD area. The function TipcConnGmdResend reads all messages from the GMD area and resends them. The function TipcMsgAck updates the GMD area with highest sequence number information.
TipcConnGmdFileCreate creates the GMD area on disk for file-based GMD. It checks the Ipc_Gmd_Directory option to determine in what directory to create the GMD area. Each particular GMD area is created once with TipcConnGmdFileCreate:
Once the GMD area is created, it cannot be changed or destroyed except by destroying the connection. The function TipcConnSetGmdMaxSize can be used to set the maximum size (in bytes) of a connection GMD area. See Limiting GMD Resources for more details.
As described in Delivery Mode, the delivery mode of a message controls what level of guarantee is used when the message is sent through a connection (always with TipcConnMsgSend). The available delivery modes are:
As described in Acknowledgment of Delivery, GMD needs some form of acknowledgment to know when a message has been successfully delivered. Connections, RTclient, and RTserver use several different message types to implement GMD:
The message types GMD_DELETE, GMD_NACK, GMD_STATUS_CALL, and GMD_STATUS_RESULT are not used by connection GMD, only by RTclient and RTserver GMD. These message types are discussed in detail in GMD Message Types.
GMD_ACK messages are sent by a receiving process to acknowledge successful delivery of a message with GMD. GMD_ACK messages are sent automatically when a message is destroyed, but can be sent manually instead. Connections process GMD_ACK messages automatically, so SmartSockets programs do not have to read and process one GMD_ACK message for each outgoing message sent with GMD.
GMD handles most network failures, but there are some that GMD cannot overcome on its own, such as a receiving process that goes into an infinite loop. Unlike sockets, which do not provide a way to tell how much data was lost, GMD explicitly notifies a sending process that a GMD failure has taken place. When most types of GMD failure happen, a GMD_FAILURE message is delivered back to the sender process. Each GMD_FAILURE message contains several fields, including the failed message and an error number indicating the type of failure.
For connection GMD, the only possible GMD_FAILURE error number is a delivery timeout. It occurs when a sender does not receive acknowledgment of successful delivery within a specified period of time. That period is configurable with the message, message type, and connection delivery timeout properties.
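A delivery timeout check of this kind might look like the following sketch. The PendingMessage type and the expired function are hypothetical; the model uses only a per-message timeout with a connection-level default, leaves out the message type property, and assumes a message counts as timed out once its age reaches the timeout.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PendingMessage:
    seq: int
    sent_at: float                     # seconds on any consistent clock
    delivery_timeout: Optional[float]  # None means "use the connection default"


def expired(pending, now, connection_timeout):
    """Return sequence numbers of messages whose delivery timeout has passed."""
    failures = []
    for msg in pending:
        timeout = msg.delivery_timeout
        if timeout is None:
            timeout = connection_timeout
        if now - msg.sent_at >= timeout:
            failures.append(msg.seq)
    return failures
```

Each sequence number returned here corresponds to a message for which the sender would be notified of a failure, with the failed message and the error number attached.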
A GMD_FAILURE message indicates the message could not be delivered with GMD successfully within the parameters, such as delivery timeout, set by the application. When a GMD failure occurs, it is up to the sender to decide what to do and then take some user-defined action if recovery is feasible. Unfortunately, this level of recovery is very application-specific, and SmartSockets cannot perform it on its own. Recovering from GMD failures is discussed further in Handling GMD Failures.
The delivery timeout property specifies how long GMD has to deliver a message, and it works together with the value set for the Server_Read_Timeout option. The connection delivery timeout property is used as a default for messages with no preset delivery timeout. The delivery timeout is specific to GMD.
Delivery timeouts are checked only when data is received from the RTserver. If no messages are being received, the RTclient uses the value set for Server_Read_Timeout as the interval for sending a keep-alive message to the RTserver, and the delivery timeouts are checked when the RTserver replies. As a result, if the delivery timeout is set to a value smaller than the value for Server_Read_Timeout, the actual timeout used is the Server_Read_Timeout, because the delivery timeouts are not checked until after the Server_Read_Timeout interval has triggered a keep-alive message.
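For an otherwise idle RTclient, the interaction between the two settings can be written as a one-line rule. The function name is hypothetical, and the result should be read as a lower bound on when a timeout can take effect, not as an exact firing time.

```python
def effective_delivery_timeout(delivery_timeout, server_read_timeout):
    """Earliest time (in seconds) at which an idle RTclient can notice a timeout."""
    return max(delivery_timeout, server_read_timeout)
```

For example, with a delivery timeout of 5 seconds and a Server_Read_Timeout of 30 seconds, the timeout cannot be noticed before 30 seconds have passed; with a delivery timeout of 60 seconds, the configured value applies.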
TIBCO SmartSockets™ User’s Guide Software Release 6.8, July 2006 Copyright © TIBCO Software Inc. All rights reserved www.tibco.com