Handover message content
The content format of all messages that are exchanged asynchronously shall be unified. A scenario where different applications use different formats, e.g. byte arrays and serialized strings, should be avoided.
Furthermore, the content should be self-contained: transport-level mechanisms such as JMS properties should not be used to carry application data.
The benefit of this combination is that both the content layer and the transport layer can be exchanged easily.
Pieces of content
An important goal of the design is that the client must be able to determine whether and how it can interpret the message content solely by inspecting the message's metadata, which is itself part of the content.
Each message must contain both control system information and application specific information.
Each message should include e.g. a sender, a receiver, a timestamp, etc. All this information together forms the control system header, which is mandatory and has a fixed structure.
In the message data a place must be reserved for clients to put their application specific data. This place will be called the message body. The body is optional and may be structured by the client as needed.
Illustration of the structure of a message:
+--------------------+
|                    |
| +----------------+ | \
| | transport data | |  > transport specific formatted data (e.g. for routing)
| +----------------+ | /
|                    |
| +----------------+ | \
| | header: fixed  | | |
| +----------------+ |  > standardized format for all control system applications
| | body: variable | | |
| +----------------+ | /
|                    |
+--------------------+
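The split between a mandatory, fixed header and an optional, client-defined body can be sketched in a few lines of Python. This is only an illustration under assumptions: the class names `Header` and `Message`, the helper `validate`, and the choice of milliseconds for the timestamp are hypothetical and not part of the design above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Header:
    """Control system header: mandatory, fixed structure."""
    sender: str
    receiver: str
    timestamp: int  # assumption: milliseconds since the Unix epoch


@dataclass(frozen=True)
class Message:
    """Standardized content; transport data lives outside of it."""
    header: Header
    body: Optional[bytes] = None  # optional, laid out by the client


def validate(msg: Message) -> None:
    """Reject messages whose mandatory header fields are missing."""
    if not msg.header.sender or not msg.header.receiver:
        raise ValueError("header needs both a sender and a receiver")
    if msg.header.timestamp < 0:
        raise ValueError("timestamp must not be negative")


# Hypothetical example values.
ping = Message(Header("client-1", "server-1", 1_700_000_000_000))
validate(ping)  # a header-only message is valid
```

A message without a body passes validation, while a message with an incomplete header is rejected before any application code looks at it.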
Content format
Several data serialization libraries exist, so there is no need to develop a new format. Our choice is Google's protocol buffers (see here for the reasons). In short, it
- is flexible,
- is popular,
- is open-source,
- is fast,
- is space efficient and
- provides libraries for Java, C++ and Python.
Body implementation options
While the header's structure is fixed and must simply be filled with the needed information, the body is to be structured according to the needs of the application. The definition of this structure may be given at compile time or only at runtime. A compile-time definition offers advantages such as compile-time error checking and centralized type definitions, while a runtime definition can speed up development.
In all cases the same data serialization as for the base message should be used.
The following options are based on protocol buffers. The base message for all options already includes the header. It looks like this:
// See below for full CsMessage type definitions
message CsMessage {
    // Header
    required string sender = 1;
    required string receiver = 2;
    required string type = 3;
    required string version = 4;
    required int64 timestamp = 5;
}
The first option, the byte blob, defines a variable-size byte array into which the client serializes another protocol buffers object. That object is structured according to the client's own definition and has to be serialized and deserialized independently of the actual control system message.
message CsMessage {
    // Header here
    optional bytes prbfBody = 6;
}
The advantage is that no language-dependent differences between the protocol buffers implementations need to be taken into account to extract the body (as is the case with the next option). Furthermore, protocol buffers offer the possibility to de/serialize messages using streams, so the body bytes need not be copied unnecessarily (although one scan over the body is still needed for serialization and for deserialization alike).
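The two-stage handling of a byte blob body can be sketched as follows. This is a minimal Python illustration, not the actual implementation: `struct` is used here as a stand-in for serializing the client's own protocol buffers object, and the registry `DECODERS` as well as the functions `encode_counter`, `decode_counter` and `read_body` are hypothetical names. The point is that the receiver inspects the header's type and version first and only then decides whether it can decode the blob.

```python
import struct
from typing import Callable, Dict, Optional, Tuple

# Hypothetical registry: (type, version) from the header -> body decoder.
Decoder = Callable[[bytes], dict]
DECODERS: Dict[Tuple[str, str], Decoder] = {}


def encode_counter(counter: int) -> bytes:
    # Stand-in for serializing the client's own protocol buffers object.
    return struct.pack("<q", counter)


def decode_counter(blob: bytes) -> dict:
    (counter,) = struct.unpack("<q", blob)
    return {"counter": counter}


DECODERS[("counter", "1")] = decode_counter


def read_body(header: dict, blob: Optional[bytes]) -> Optional[dict]:
    """Decode the blob only if the header names a known type and version."""
    if blob is None:
        return None  # the body is optional
    decoder = DECODERS.get((header["type"], header["version"]))
    if decoder is None:
        return None  # unknown content: the blob stays opaque
    return decoder(blob)


# Hypothetical example values.
header = {"sender": "client-1", "receiver": "server-1",
          "type": "counter", "version": "1", "timestamp": 0}
body = read_body(header, encode_counter(42))
```

Because the blob is opaque to the control system message, a receiver that does not know the type can still process the header and forward or ignore the body.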
Inheriting the base message type
Protocol buffers allow a client to extend a message definition. When de/serializing the extended message, the additional data is written as if it were part of the base definition.
message CsMessage {
    // Header here
    extensions 100 to max; // user may use ids 100 and up as needed
}
// include CsMessage base type
extend CsMessage {
    // the client puts its structure here, e.g.
    // (extension fields must not be declared required)
    optional int64 counter = 100;
}
The additional items are handled differently by e.g. the Java and C++ implementations. In Java they must be registered before deserialization in order to be recognized. This implies that a client would have to deserialize once, check the message type, register the extensions and deserialize again. In C++, on the other hand, extensions are always recognized.
In both languages extended items are not accessed the same way as header items.
Dynamic structure definition
Protocol buffers can be used to define arbitrarily deeply nested, typed structures only at runtime (similar to e.g. JSON or BSON).
message Obj {
    enum Type {
        LONG = 0;
        DOUBLE = 1;
        BOOL = 2;
        STRING = 3;
        OBJECT = 4;
    }
    required string name = 1;
    required Type type = 2;
    optional int64 longVal = 3;
    optional double doubleVal = 4;
    optional bool boolVal = 5;
    optional string stringVal = 6;
    repeated Obj members = 7;
}
message CsMessage {
    // Header here
    repeated Obj members = 6;
}
Each CsMessage now has a list of member objects of different types, where each member can itself contain a list of member objects. All in all, arbitrary typed object structures can be expressed.
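How a nested structure maps onto such a member list can be sketched in Python. This is only an illustration under assumptions: the dataclass `Obj` mirrors the message definition in plain Python (with a single `value` field instead of the four typed fields), and `to_obj`, `from_obj` and the sample data are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Any, List


@dataclass
class Obj:
    """Plain-Python mirror of the Obj message definition."""
    name: str
    type: str          # LONG, DOUBLE, BOOL, STRING or OBJECT
    value: Any = None  # used by the scalar types
    members: List["Obj"] = field(default_factory=list)  # used by OBJECT


def to_obj(name: str, value: Any) -> Obj:
    """Build a typed node from a Python value."""
    if isinstance(value, bool):  # must precede int: bool subclasses int
        return Obj(name, "BOOL", value)
    if isinstance(value, int):
        return Obj(name, "LONG", value)
    if isinstance(value, float):
        return Obj(name, "DOUBLE", value)
    if isinstance(value, str):
        return Obj(name, "STRING", value)
    if isinstance(value, dict):
        children = [to_obj(k, v) for k, v in value.items()]
        return Obj(name, "OBJECT", members=children)
    raise TypeError(f"unsupported type for {name!r}: {type(value).__name__}")


def from_obj(obj: Obj) -> Any:
    """Turn a typed node back into a Python value."""
    if obj.type == "OBJECT":
        return {m.name: from_obj(m) for m in obj.members}
    return obj.value


# Hypothetical example data.
data = {"device": "magnet-7", "setpoint": {"current": 12.5, "enabled": True}}
members = [to_obj(k, v) for k, v in data.items()]
restored = {m.name: from_obj(m) for m in members}
```

Every node carries its own name and type tag, which is what makes the structure self-describing at runtime and also what costs the extra space for member names.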
(Optional) properties
Convenience functionality could be added to a message in the form of a typed property list (similar to JMS properties). It would allow a developer to send simple application data without the need to maintain definition source code.
message Property {
    enum Type {
        LONG = 0;
        DOUBLE = 1;
        BOOL = 2;
        STRING = 3;
    }
    required string key = 1;
    required Type type = 2;
    optional int64 longVal = 3;
    optional double doubleVal = 4;
    optional bool boolVal = 5;
    optional string stringVal = 6;
}
message CsMessage {
    // Header here
    // tag 6, because tag 5 is already used by the header's timestamp
    repeated Property properties = 6;
}
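A typed property list of this kind could be filled and read as sketched below. This is a hedged Python illustration: a property is modelled as a `(key, type, value)` tuple instead of a generated protocol buffers class, and the helpers `to_properties` and `get_long` are hypothetical.

```python
from typing import Dict, List, Tuple, Union

Scalar = Union[int, float, bool, str]
Property = Tuple[str, str, Scalar]  # (key, type, value)


def to_properties(values: Dict[str, Scalar]) -> List[Property]:
    """Tag each value with one of the four property types."""
    props: List[Property] = []
    for key, value in values.items():
        if isinstance(value, bool):  # must precede int: bool subclasses int
            kind = "BOOL"
        elif isinstance(value, int):
            kind = "LONG"
        elif isinstance(value, float):
            kind = "DOUBLE"
        elif isinstance(value, str):
            kind = "STRING"
        else:
            raise TypeError(f"unsupported property type for {key!r}")
        props.append((key, kind, value))
    return props


def get_long(props: List[Property], key: str) -> int:
    """Typed accessor: fails if the key is absent or not a LONG."""
    for name, kind, value in props:
        if name == key:
            if kind != "LONG":
                raise TypeError(f"property {key!r} has type {kind}")
            return int(value)
    raise KeyError(key)


# Hypothetical example values.
props = to_properties({"retries": 3, "verbose": False})
retries = get_long(props, "retries")
```

Type errors surface only when a property is read, which is the runtime counterpart of the compile-time checks offered by the byte blob and extension options.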
Comparison of app specific data representations
| | byteblob | extensions | properties | dynamic object structures |
|---|---|---|---|---|
| compile-time check | yes | yes | no | no |
| runtime definition | no | no | yes | yes |
| space efficiency | good | good | bad (serialize key names) | bad (serialize member names) |
| time efficiency (overhead) | one scan over blob | one deserialization of header (Java) | key names | member names |
| language differences for implementation | no | yes | no | no |
Example application
In the test user's ~fasstest/workspaces/eclipseWorkspaces/csmsg workspace there is a small client (C++) / server (Java) test application that implements all four message content format alternatives (in one message type for the sake of simplicity). The client user can choose the content format of the message by providing the type description on the command line (see source code).
The source code is in:
- the Eclipse project ExampleJava: the server that extracts the different contents based on the message type
- the Eclipse project ExampleCpp: the client that chooses and sends a message content format
- ~fasstest/workspaces/eclipseWorkspaces/csmsg/CsMessageBackend.proto: the protobuf definition file that includes all described approaches
- When changing the definition, use ~fasstest/workspaces/eclipseWorkspaces/csmsg/generateProto.sh to generate the new source code directly into the Eclipse projects (which you must refresh from inside Eclipse to pick up the changes).
Note: The implementations do not yet take advantage of protocol buffers' ability to access all byte data through streams and are thus probably not as efficient as possible.