Handover message content

The content format of all messages that are exchanged asynchronously shall be unified. A scenario where different applications use different formats, e.g. byte arrays and serialized strings, should be avoided.

Furthermore, the content should be self-contained: no transport-level mechanisms, such as JMS properties, should be used to carry application data.

The benefit of this combination is the easy exchangeability of both the content and transport layers.
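
The separation can be sketched as follows. This is a minimal illustration in plain Python, not part of the original design: the Transport interface, the InMemoryTransport class and the publish function are hypothetical names. The point is only that the application hands over an already serialized, self-contained message, so the transport can be swapped without touching the content.

```python
from typing import List, Protocol


class Transport(Protocol):
    """Anything that can move an opaque payload (hypothetical interface)."""

    def send(self, payload: bytes) -> None: ...


class InMemoryTransport:
    """Toy transport that only records payloads; stands in for e.g. JMS."""

    def __init__(self) -> None:
        self.sent: List[bytes] = []

    def send(self, payload: bytes) -> None:
        self.sent.append(payload)


def publish(transport: Transport, serialized_message: bytes) -> None:
    # All application data is already inside serialized_message, so no
    # transport-specific properties are set here.
    transport.send(serialized_message)


transport = InMemoryTransport()
publish(transport, b"serialized CsMessage bytes")
```

Any other class with a matching send method could replace InMemoryTransport without changes to publish.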

Pieces of content

An important goal of the design is that a client must be able to determine whether and how it can interpret the message content solely by inspecting the message's metadata, which is itself part of the content.

Each message must contain both control system information and application specific information.

Each message should include, for example, a sender, a receiver and a timestamp. Together this information forms the control system header, which is mandatory and has a fixed structure.

In the message data a place must be reserved for clients to put their application specific data. This place will be called the message body. The body is optional and may be structured by the client as needed.

Illustration of the structure of a message:
+--------------------+
|                    |
| +----------------+ | ` 
| | transport data | |  > transport specific formatted data (e.g. for routing)
| +----------------+ | /
|                    |
| +----------------+ | `  
| | header: fixed  | |  |
| +----------------+ |  > standardized format for all control system applications
| | body: variable | |  |
| +----------------+ | /
|                    |
+--------------------+

Content format

Several data serialization libraries already exist, so there is no need to develop a new format. Our choice is Google's protocol buffers (the detailed reasoning is given on a separate page). In short, protocol buffers are
  • flexible,
  • popular,
  • open-source,
  • fast,
  • space efficient and
  • available with libraries for Java, C++ and Python.

Body implementation options

While the header's structure is fixed and simply has to be filled with the required information, the body is structured according to the needs of the application. The definition of this structure may be given at compile time or only at runtime. A compile-time definition offers, for example, compile-time error checking and centralized type definitions, while a runtime definition can speed up development.

In all cases the same data serialization as for the base message should be used.

The following options are based on protocol buffers. The base message for all options already includes the header. It looks like this:
// See below for full CsMessage type definitions
message CsMessage {
  // Header
  required string sender = 1;
  required string receiver = 2;
  required string type = 3;
  required string version = 4;
  required int64 timestamp = 5;
}
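
The design goal that a client decides from the header alone whether and how it can interpret a message can be sketched as a lookup keyed on the header's type and version fields. The sketch below is plain Python; CsHeader is a hand-written stand-in for the generated protocol buffers class, and the registry, the decoder function and the example type name "counter" are assumptions made for illustration only.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional, Tuple


@dataclass(frozen=True)
class CsHeader:
    """Stand-in for the fixed header fields of CsMessage (not generated code)."""

    sender: str
    receiver: str
    type: str
    version: str
    timestamp: int


# Hypothetical registry: (type, version) -> function that decodes the body bytes.
Decoder = Callable[[bytes], object]
DECODERS: Dict[Tuple[str, str], Decoder] = {}


def register(msg_type: str, version: str, decoder: Decoder) -> None:
    DECODERS[(msg_type, version)] = decoder


def find_decoder(header: CsHeader) -> Optional[Decoder]:
    """Return a decoder if this client understands the message, else None."""
    return DECODERS.get((header.type, header.version))


# Example: this client only understands version "1" of a "counter" message,
# whose body is assumed to be a big-endian integer.
register("counter", "1", lambda body: int.from_bytes(body, "big"))
```

A client would call find_decoder with the received header and skip or reject the message when the result is None, without ever touching the body.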

Byte blob in predefined format

This option defines a variable-size byte array into which the client serializes another protocol buffers object. That object is structured according to the client's own definition and has to be serialized and deserialized independently of the actual control system message.

message CsMessage {
  // Header here
  optional bytes prbfBody = 6;
}

The advantage is that no language-dependent differences between the protocol buffers implementations need to be taken into account to extract the body (as is the case with the next option). Furthermore, protocol buffers offer the possibility to serialize and deserialize messages using streams, so the body bytes need not be copied unnecessarily (although the body is still scanned once during serialization and once during deserialization).
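
The two-step nature of this option can be made visible at the byte level. The following sketch hand-encodes the protocol buffers wire format in plain Python purely for illustration; real clients would use the generated classes instead. The body message (a single int64 field named counter with id 1), the sender and receiver names and the timestamp value are hypothetical.

```python
def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a protocol buffers base-128 varint."""
    if value < 0:
        raise ValueError("this sketch only handles non-negative values")
    out = bytearray()
    while True:
        low7 = value & 0x7F
        value >>= 7
        if value:
            out.append(low7 | 0x80)  # continuation bit: more bytes follow
        else:
            out.append(low7)
            return bytes(out)


def length_delimited(field_number: int, payload: bytes) -> bytes:
    """Encode a string/bytes field: key (wire type 2), length, payload."""
    key = (field_number << 3) | 2
    return encode_varint(key) + encode_varint(len(payload)) + payload


def varint_field(field_number: int, value: int) -> bytes:
    """Encode a non-negative int64 field: key (wire type 0), varint value."""
    key = (field_number << 3) | 0
    return encode_varint(key) + encode_varint(value)


# Step 1: the client serializes its own body message independently.
# Hypothetical body definition: message Counter { required int64 counter = 1; }
body = varint_field(1, 300)

# Step 2: the finished body is embedded as opaque bytes in field 6 (prbfBody).
message = (
    length_delimited(1, b"clientA")    # sender
    + length_delimited(2, b"serverB")  # receiver
    + length_delimited(3, b"counter")  # type
    + length_delimited(4, b"1")        # version
    + varint_field(5, 1313625600)      # timestamp (arbitrary example value)
    + length_delimited(6, body)        # prbfBody
)
```

The receiver parses the outer message first and only then decodes the bytes of field 6 with the body definition it selected from the header.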

Inheriting the base message type

Protocol buffers allow a client to extend a message definition. When the extended message is serialized or deserialized, the additional data is written as if it were part of the base definition:
message CsMessage {
  // Header here
  extensions 100 to max; // user may use ids 100 and up as needed
}
// include CsMessage base type
extend CsMessage {
  // client puts its structure here, e.g.
  // (extension fields may be optional or repeated, but not required)
  optional int64 counter = 100;
}

The additional items are handled differently by, e.g., the Java and C++ implementations. In Java they must be registered before deserialization in order to be recognized. This implies that a client has to deserialize once, check the message type, register the extensions and deserialize again. In C++, on the other hand, extensions are always recognized.

In both languages extended items are not accessed the same way as header items.

Dynamic structure definition

Protocol buffers can also be used to define arbitrarily deeply nested, typed structures only at runtime (similar to JSON or BSON):
message Obj {
 enum Type {
  LONG = 0;
  DOUBLE = 1;
  BOOL = 2;
  STRING = 3;
  OBJECT = 4;
 }

 required string name = 1;
 required Type type = 2;

 optional int64 longVal = 3;
 optional double doubleVal = 4;
 optional bool boolVal = 5;
 optional string stringVal = 6;
 repeated Obj members = 7;
}

message CsMessage {
  // Header here
  repeated Obj members = 6;
}
Each CsMessage now has a list of member objects of different types, and each member can itself contain a list of member objects. In this way arbitrary typed object structures can be expressed.
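
How a runtime-defined structure maps onto this definition can be sketched by converting a nested dictionary into a tree of Obj values. The sketch is plain Python: the Obj dataclass only mirrors the message definition and is not the generated class, and the to_obj helper and the example data are hypothetical.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Any, List, Optional


class Type(Enum):
    LONG = 0
    DOUBLE = 1
    BOOL = 2
    STRING = 3
    OBJECT = 4


@dataclass
class Obj:
    """Plain-Python stand-in for the Obj message (not generated code)."""

    name: str
    type: Type
    longVal: Optional[int] = None
    doubleVal: Optional[float] = None
    boolVal: Optional[bool] = None
    stringVal: Optional[str] = None
    members: List["Obj"] = field(default_factory=list)


def to_obj(name: str, value: Any) -> Obj:
    """Convert a Python value into an Obj tree (hypothetical helper)."""
    # bool must be tested before int because bool is a subclass of int.
    if isinstance(value, bool):
        return Obj(name, Type.BOOL, boolVal=value)
    if isinstance(value, int):
        return Obj(name, Type.LONG, longVal=value)
    if isinstance(value, float):
        return Obj(name, Type.DOUBLE, doubleVal=value)
    if isinstance(value, str):
        return Obj(name, Type.STRING, stringVal=value)
    if isinstance(value, dict):
        children = [to_obj(k, v) for k, v in value.items()]
        return Obj(name, Type.OBJECT, members=children)
    raise TypeError(f"unsupported value type: {type(value).__name__}")


# The top-level entries correspond to the repeated members field of CsMessage.
data = {"device": "magnet1", "position": {"x": 1.5, "steps": 3}, "ok": True}
members = [to_obj(k, v) for k, v in data.items()]
```

Note that every member carries its name as a string, which is the source of the space overhead of this option.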

(Optional) properties

Convenience functionality could be added to a message in the form of a typed property list (similar to JMS properties). Such properties would allow a developer to send simple application data without having to maintain definition source code:
message Property {
 enum Type {
  LONG = 0;
  DOUBLE = 1;
  BOOL = 2;
  STRING = 3;
 }

 required string key = 1;
 required Type type = 2;

 optional int64 longVal = 3;
 optional double doubleVal = 4;
 optional bool boolVal = 5;
 optional string stringVal = 6;
}

message CsMessage {
 // Header here
 // id 5 is already used by the timestamp header field and id 6 by the body options above
 repeated Property properties = 7;
}
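
On the receiving side, a property is read by finding its key and then using the type tag to pick the field that holds the value. The following plain-Python sketch illustrates this; the Property dataclass mirrors the message definition but is not the generated class, and get_property and the example keys are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional, Union


class PropertyType(Enum):
    LONG = 0
    DOUBLE = 1
    BOOL = 2
    STRING = 3


@dataclass
class Property:
    """Plain-Python stand-in for the Property message (not generated code)."""

    key: str
    type: PropertyType
    longVal: Optional[int] = None
    doubleVal: Optional[float] = None
    boolVal: Optional[bool] = None
    stringVal: Optional[str] = None


def get_property(
    properties: List[Property], key: str
) -> Union[int, float, bool, str, None]:
    """Return the value of the first property with this key (hypothetical helper)."""
    for prop in properties:
        if prop.key == key:
            # The type tag tells the reader which of the optional fields is set.
            return {
                PropertyType.LONG: prop.longVal,
                PropertyType.DOUBLE: prop.doubleVal,
                PropertyType.BOOL: prop.boolVal,
                PropertyType.STRING: prop.stringVal,
            }[prop.type]
    raise KeyError(key)


properties = [
    Property("retries", PropertyType.LONG, longVal=3),
    Property("unit", PropertyType.STRING, stringVal="mm"),
]
```

The lookup is a linear search over the list, which is acceptable for the small number of simple values this convenience feature is intended for.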

Comparison of app specific data representations

|                                         | byte blob          | extensions                             | properties                | dynamic object structures    |
| compile-time check                      | yes                | yes                                    | no                        | no                           |
| runtime definition                      | no                 | no                                     | yes                       | yes                          |
| space efficiency                        | good               | good                                   | bad (serialize key names) | bad (serialize member names) |
| time efficiency (overhead)              | one scan over blob | one deserialization of header (Java)   | key names                 | member names                 |
| language differences for implementation | no                 | yes                                    | no                        | no                           |

Example application

In the test user's workspace ~fasstest/workspaces/eclipseWorkspaces/csmsg there is a small client (C++) / server (Java) test application that implements all four message content format alternatives (in one message type for the sake of simplicity). The client user chooses the content format of the message by providing the type description on the command line (see source code).

The source code is located in:
  • the Eclipse project ExampleJava: the server, which extracts the different contents based on the message type
  • the Eclipse project ExampleCpp: the client, which chooses a message content format and sends the message
  • ~fasstest/workspaces/eclipseWorkspaces/csmsg/CsMessageBackend.proto: the protobuf definition file that includes all described approaches
When changing the definition, use ~fasstest/workspaces/eclipseWorkspaces/csmsg/generateProto.sh to generate the new source code directly into the Eclipse projects (which you must refresh from inside Eclipse for the changes to be recognized).
Note: The implementations do not yet take advantage of the protocol buffers' ability to access all byte data through streams and are thus probably not as efficient as possible.
Topic revision: r7 - 18 Aug 2011, UnknownUser