Torture Report about GMT with Debian on PC and SL6/CentOS 7 on SCU3
Setup
A schedule containing three messages is iterated by the Data Master. The messages are sent via five layers of White Rabbit switches to three different nodes. The White Rabbit network was operated without "traffic noise" and without features like VLANs that are required for the operation of FAIR. The test was done with release candidates for control system release 8 (RC8). It examines how messages propagate through the GMT and are delivered on time to Linux userland applications. No forward error correction was applied.
Data Master
The DM used the following schedule:
<page>
  ...
  <plan>
    <!-- Plan A, normal operation -->
    ...
    <chain>
      <!-- Cycle A0, entry -->
      <meta>
        <rep>10</rep>                 <!-- number of iterations, here: 10 -->
        <period>5000000</period>      <!-- length of one iteration, here: 5000000 ns = 5 ms -->
        <branchpoint>yes</branchpoint>
      </meta>
      <!-- three messages in the schedule -->
      <msg>
        <!-- Msg A0A -->
        ...
      </msg>
      <msg>
        <!-- Msg A0B -->
        ...
      </msg>
      <msg>
        <!-- Msg A0C -->
        ...
      </msg>
    </chain>
    ...
  </plan>
</page>
In the above example, the schedule
- contains three messages
- has an iteration length of 5 ms
- is repeated 10 times
For the measurements described here, only two parameters have been varied (see the sketch after this list):
- LENGTH_ITER: the length of each iteration
- N_ITER: the number of iterations
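The "requested rate" and "requested bandwidth" columns of Table 1 (Measurement B) follow directly from these two parameters. Below is a minimal sketch of the arithmetic, assuming three messages per iteration and the 110-byte message size quoted with Table 1; the variable names are illustrative only, the values match the first row of Table 1.

#include <cstdio>

int main() {
    // Schedule parameters; values as in the first row of Table 1.
    const double lengthIterMs = 10.0;   // LENGTH_ITER: length of one iteration [ms]
    const int    nIter        = 2000;   // N_ITER: number of iterations
    const int    msgsPerIter  = 3;      // three messages per iteration (Msg A0A..A0C)
    const int    msgSizeByte  = 110;    // assumed message size, one message per Ethernet frame

    const double rateKHz      = msgsPerIter / lengthIterMs;                // requested rate [kHz]
    const double bandwidthMbs = rateKHz * 1e3 * msgSizeByte * 8.0 / 1e6;   // requested bandwidth [Mbit/s]

    printf("total messages      : %d\n", msgsPerIter * nIter);   // 6000
    printf("requested rate      : %.3f kHz\n", rateKHz);         // 0.300
    printf("requested bandwidth : %.3f Mbit/s\n", bandwidthMbs); // 0.264
    return 0;
}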
The data master's lm32 cores are clocked at 125 MHz. Only one of the lm32 cores was used to generate the messages. The data master aims at emitting messages to the White Rabbit network in advance of the planned execution time of the actions. This interval is specified by the parameter preptime, which was set to 150 us for the measurements reported here. Preptime takes into account all latencies from the data master through the timing network to the timing receiver, up to the point where the message is handed over as a so-called "timing event" to a receiving component connected to the ECA.
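The role of preptime and the notion of a "late" message (see "Late Errors" below) can be sketched as follows; the two helper functions are purely illustrative and not part of any data master or ECA API.

#include <cstdint>
#include <cstdio>

// Illustrative helper, not part of the data master firmware:
// the DM emits a message 'preptime' ahead of the execution time of its action.
uint64_t emissionDeadline(uint64_t executionTimeNs, uint64_t preptimeNs) {
    return executionTimeNs - preptimeNs;
}

// Illustrative helper, not part of the ECA gateware:
// a "late error" occurs if the message arrives at the input of the ECA
// after the scheduled execution time of its action.
bool isLate(uint64_t arrivalTimeNs, uint64_t executionTimeNs) {
    return arrivalTimeNs > executionTimeNs;
}

int main() {
    const uint64_t preptimeNs  = 150000;      // 150 us, as used in these measurements
    const uint64_t executionNs = 1000000000;  // some scheduled execution time (hypothetical)

    printf("emit no later than : %llu ns\n",
           (unsigned long long)emissionDeadline(executionNs, preptimeNs));
    printf("late if arrival >  : %llu ns\n", (unsigned long long)executionNs);
    return 0;
}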
Nodes
- SCU3 (scuxl0001) with SL6
- SCU3 (scuxl0001) with CentOS 7
- IPC (tsl0111) with Debian Jessie and PCIe TR
For the measurements reported here, the nodes ran only drivers, kernel modules, saftlib, and simple userland command line tools such as "saft-ctl" and "saft-gmt-check", which are provided together with saftlib. FESA was not involved.
Measurement A: Bursts of Messages
- LENGTH_ITER: 0.01 ms
- N_ITER: 80
Description: The schedule containing three messages was iterated 80 times, thus generating 240 messages at a rate of 300 kHz.
Expectation:
- The software action channel ("calendar") is configured with a capacity of 256 entries. Since the burst of 240 messages fits into these 256 entries, no messages should be lost in an ideal world.
- It is impossible for the host system to keep up with a rate of 300 kHz. This will create back-pressure from the host system to the ECA channel. Thus, all but the first message ("timing event") will be delivered "delayed".
Result:
- No loss of messages ("timing events") was observed, neither in the WR network nor in the nodes (independent of the form factor).
- Due to back-pressure from the host system, 239 of the 240 messages were delivered "delayed" by the ECA.
Conclusion:
- Under ideal conditions (no other network traffic) the system behaves as expected.
- However, the statistics are low and it is impossible to give a value for the overall robustness (in terms of "packet erasure" or "bit errors" of the overall system).
Measurement B: Determination of Maximum Rate (for a FEC)
It is expected that the rate is limited by the back-pressure of the host system to the ECA channel. In the following, this rate is determined by measuring the number of lost messages ("timing events") as a function of the message rate. The SCU runs CentOS 7.
- LENGTH_ITER: this is the parameter that was varied
- N_ITER: 2000 (i.e. 6000 "timing events" in total)
| LENGTH_ITER [ms] | requested rate [kHz] | requested bandwidth [Mbit/s] | late messages (IPC) | lost messages: overflow (IPC) | lost messages: DM-ECA (IPC) | late messages (SCU) | lost messages: overflow (SCU) | lost messages: DM-ECA (SCU) | remark |
| 10.0000 | 0.300 | 0.264 | 0 | 0 | 0 | 0 | 0 | 0 | |
| 9.0000 | 0.333 | 0.293 | 0 | 0 | 0 | 0 | 0 | 0 | |
| 8.0000 | 0.375 | 0.330 | 0 | 0 | 0 | 0 | 0 | 0 | |
| 7.0000 | 0.429 | 0.377 | 0 | 0 | 0 | 0 | 0 | 0 | |
| 6.0000 | 0.500 | 0.440 | 0 | 0 | 0 | 0 | 0 | 0 | |
| 5.0000 | 0.600 | 0.528 | 0 | 0 | 0 | 0 | 0 | 0 | |
| 4.0000 | 0.750 | 0.660 | 0 | 0 | 0 | 0 | 0 | 0 | |
| 3.0000 | 1.000 | 0.880 | 0 | 0 | 0 | 0 | 0 | 0 | safe rate SCU: 1 kHz |
| 2.5000 | 1.200 | 1.056 | 0 | 0 | 0 | 0 | 0 | 0 | |
| 2.0000 | 1.500 | 1.320 | 0 | 0 | 0 | 0 | 725 | 0 | |
| 1.8000 | 1.667 | 1.467 | 0 | 0 | 0 | 0 | 1012 | 0 | |
| 1.6000 | 1.875 | 1.650 | 0 | 0 | 0 | 0 | 1927 | 0 | |
| 1.4000 | 2.143 | 1.886 | 0 | 0 | 0 | 0 | 2097 | 0 | |
| 1.2000 | 2.500 | 2.200 | 0 | 0 | 0 | 0 | 2870 | 0 | |
| 1.0000 | 3.000 | 2.640 | 0 | 0 | 0 | 0 | 3407 | 0 | |
| 0.8000 | 3.750 | 3.300 | 0 | 0 | 0 | 0 | 3443 | 0 | |
| 0.7000 | 4.286 | 3.771 | 0 | 0 | 0 | 0 | 3682 | 0 | |
| 0.6000 | 5.000 | 4.400 | 0 | 0 | 0 | 0 | 3899 | 0 | |
| 0.5000 | 6.000 | 5.280 | 0 | 0 | 0 | 0 | 4271 | 0 | |
| 0.4000 | 7.500 | 6.600 | 0 | 0 | 0 | 0 | 4780 | 0 | safe rate IPC: 7.5 kHz |
| 0.3500 | 8.571 | 7.543 | 0 | 0 | 0 | 0 | 4952 | 0 | |
| 0.3000 | 10.000 | 8.800 | 0 | 510 | 0 | 0 | 5047 | 0 | |
| 0.2500 | 12.000 | 10.560 | 0 | 1466 | 0 | 0 | 5129 | 0 | |
| 0.2000 | 15.000 | 13.200 | 0 | 2684 | 0 | 0 | 5141 | 0 | |
| 0.1500 | 20.000 | 17.600 | 0 | 3679 | 0 | 0 | 5284 | 0 | |
| 0.1000 | 30.000 | 26.400 | 0 | 4466 | 0 | 0 | 5439 | 0 | |
| 0.0500 | 60.000 | 52.800 | 0 | 5278 | 0 | 0 | 5592 | 0 | |
| 0.0100 | 300.000 | 264.000 | 0 | 5684 | 0 | 0 | 5718 | 0 | saturation of GMT! |
| 0.0030 | 1000.000 | 880.000 | 267 | 5692 | 0 | 219 | 5719 | 0 | late events! |
| 0.0003 | 10000.000 | 8800.000 | 266 | 5696 | 0 | 234 | 5720 | 0 | beyond 1 Gbit/s! |
Table 1: Data. The "requested bandwidth" assumes a message size of 110 bytes (one message per Ethernet frame).
Results
- 6000 messages were sent by the DM at rates from 300 Hz up to 10 MHz, thus requesting a bandwidth of up to 8.8 Gbit/s, which is beyond the theoretical limit of 1 Gbit/s.
- No messages were lost between the DM and the ECA unit. Even at a requested rate of 10 MHz, all messages are delivered to the ECA: there is no difference between the number of timing messages sent by the data master and the number of messages processed by the ECA.
- Loss of messages ("loss of timing events") only happens as overflow errors of the ECA. Those errors appear if back-pressure from the receiving component (here: the host system) prevents draining the action channel quickly enough (see discussion below).
- In case of the SCU, a maximum rate of at most 1 kHz can be achieved for messages ("timing events").
- In case of the IPC, a maximum rate of at most 7.5 kHz can be achieved.
- If the bandwidth from the DM to the ECA were unlimited, the expected number of overflow errors would be 5744 (6000 - 256; see the sketch after this list). However, it can be observed that the system already saturates at a rate of 300 kHz. Probably, this is the maximum rate at which a single lm32 core of the data master can emit timing messages.
- If a rate of 1 MHz or higher is requested by the DM, some of the messages arrive late at the input of the ECA (see discussion on "late errors" below).
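A minimal sketch of the bookkeeping behind the expected number of overflow errors mentioned above; the 256-entry capacity and the 6000 messages are taken from the text, the variable names are illustrative.

#include <cstdio>

int main() {
    const int totalMessages    = 6000;  // N_ITER = 2000 iterations x 3 messages
    const int calendarCapacity = 256;   // capacity of the ECA software action channel

    // If the path DM -> WR network -> ECA had unlimited bandwidth and the host
    // fetched nothing during the burst, the calendar would absorb the first 256
    // messages and all remaining ones would be lost as overflow errors.
    const int expectedOverflow = totalMessages - calendarCapacity;

    printf("expected overflow errors (unlimited DM-ECA bandwidth): %d\n",
           expectedOverflow);  // 5744
    return 0;
}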
Measurement C: Determination of Robustness
Description:
Here, the system was operated at a safe rate of about 600 Hz for some time and the number of lost messages was determined (packet erasure channel). In total, about 5E7 messages were sent by the data master.
Results:
- For the IPC (Debian Jessie) and the SCU3 (CentOS 7), all messages were successfully received and transferred to a userland application on the host system as "timing events". With zero losses out of about 5E7 messages, this corresponds to a loss rate of 2E-8 or less.
- For the SCU3 on SL6, the measurement was not successful due to excessive RAM consumption by the saftlib daemon, probably caused by a memory leak. However, SL6 is not considered as a platform for the future. On the positive side, this memory leak was not observed on the SCU3 with CentOS 7.
Conclusion and Discussion
Robustness
Although the GMT does not yet include measures for robustness (such as forward error correction), the loss rate in the message erasure channel is 2E-8 or less. This may sound positive, but it has to be kept in mind that this result was obtained without additional "noise" in the White Rabbit network and without using features like VLANs. In addition, the GMT was throttled to a message rate of only 600 Hz. It is highly recommended to repeat these measurements with higher statistics in a more realistic scenario.
High Data Rates
The chain "data master -> White Rabbit network -> timing receiver" is capable of sustaining requested data rates of up to 300 kHz to the input of the ECA. At higher requested data rates, this chain is no longer able to maintain the requested rate (see discussion on "Late Errors" below). Remember that only 6000 consecutive messages were sent per entire schedule, that there was no "background noise" on the White Rabbit network, and that many features required for operation at FAIR were not activated (no VLANs, no forward error correction, ...). Thus, the result of 300 kHz most likely over-estimates what can be achieved later at FAIR.
Late Errors
It was possible to operate the GMT at rates of 1 MHz and higher. Although all messages were successfully generated and transmitted to the ECA without any loss, it is observed that messages arrive late at the ECA (a preptime of only 150000 ns was used at the data master). Such a late error occurs if a message arrives at the input of the ECA after its scheduled execution time. While it is expected that a message rate of 10 MHz is impossible to process due to the 1 Gbit/s limitation of the White Rabbit network, the reason for late errors at a requested rate of 1 MHz is not clear: Maybe it is a limit of the White Rabbit network. Maybe the lm32 clocked at 125 MHz in the data master is not quick enough to emit the messages in time. Maybe the limit is in the timing receivers themselves: an lm32 softcore embedded in the White Rabbit core is used to filter packets received from the network before forwarding them to the ECA. It is interesting that the number of late errors differs slightly depending on the form factor, although the SCU (ArriaII) and the PCIe TR (ArriaV) were connected to the same physical White Rabbit switch. Whether this is due to a difference between the timing receivers or due to the algorithm by which the switch forwards network packets to its different ports needs to be investigated.
Back-pressure: Limitation of maximum rates for FECs
Back-pressure from the host system manifests itself as an inability to drain the software action channel of the ECA quickly enough. As a consequence, two effects are observed (a small illustrative model follows the list below).
- First, the software action channel is no longer able to hand the "timing event" over to the host system at the scheduled time. The "timing event" becomes delayed. This happens if the host system is still busy fetching a previous "timing event" from the software action channel. Such a difference between "scheduled time" and "execution time" is called a delay error. The error is reported to the host system.
- Second, the fill state of the calendar in the software action channel might reach its maximum capacity of 256 entries. This happens if the software action channel is filled more quickly than the receiving component (here: the host system) is able to drain it. This is a fatal situation, as the ECA cannot add a new message to the software action channel (there is no free space left) and must discard further incoming messages to this channel. This is called an overflow error. All the ECA can do is record the error and report it to the host system.
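A small, purely illustrative model of this bookkeeping: a bounded calendar of 256 entries drained by the host one "timing event" at a time. The struct and the drain ratio are hypothetical and not taken from the ECA gateware; the sketch only mirrors the two error types described above.

#include <cstdio>

// Purely illustrative model of the ECA software action channel under back-pressure.
struct SoftwareActionChannel {
    static const int capacity = 256;  // calendar capacity of the software action channel
    int  fill     = 0;                // "timing events" currently waiting for the host
    long delayed  = 0;                // delay errors: handed over after the scheduled time
    long overflow = 0;                // overflow errors: discarded because the calendar was full

    // A "timing event" reaches its scheduled time and must be queued for the host.
    void onTimingEvent() {
        if (fill >= capacity) { ++overflow; return; }  // no free entry: discard and record
        if (fill > 0) ++delayed;  // host still busy with an earlier event: delivered delayed
        ++fill;
    }

    // The host system fetches one "timing event" from the channel.
    void hostFetch() { if (fill > 0) --fill; }
};

int main() {
    // Hypothetical burst: 6000 events arrive ten times faster than the host can fetch them.
    SoftwareActionChannel channel;
    for (int i = 0; i < 6000; ++i) {
        channel.onTimingEvent();
        if (i % 10 == 9) channel.hostFetch();
    }
    printf("delayed: %ld, overflow: %ld\n", channel.delayed, channel.overflow);
    return 0;
}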
Message Loss
Although the system was requested to operate at rates exceeding the physical limit of 1 GigE, and although it was tried to provoke message loss in various channels (the White Rabbit switches, the packet filter at the network interface of a timing receiver, the ECA, ...), all messages were delivered successfully to the ECA and the ECA was capable of processing all of them. Furthermore, no messages were lost within the stack of the host system, which indicates that the PCIe host bus bridge, the saftlib API, and the userland software are working reliably. During the measurements reported here, only "overflow errors" were observed; see the discussion below.
Discussion: Overflow Errors
On the one hand: The situation with overflow errors is not as bad as it sounds. First, the maximum event rate applies to one specific front-end system only and only limits the rate of messages ("timing events") delivered to the host system. Other receiving components, like I/Os and the SCUbus, are capable of sustaining much higher event rates. Second, the maximum rate is only relevant for a single timing group (a timing receiver may only be a member of a single timing group), not for the total of messages emitted by the data master for all timing groups.
On the other hand: Overflow errors could be the dominating loss channel for "timing events". There is nothing the GMT can do about this, neither in the data master, nor in the network, nor in any part of the timing receiver up to the output of the ECA. Loss of "timing events" due to overflow errors is a principal problem which only occurs if the settings management requests the GMT to deliver messages ("timing events") faster than the software can process them. Queuing in buffers or adding more RAM does not help, as it only jeopardizes the real-time capability, making the idea of having a hard real-time system such as the GMT pointless for software actions.
Even if the maximum event rate of the host system could be increased by more powerful computers or smarter software, this would only push the limit but not make the problem disappear. The value of the "maximum rate" must be determined and made known to the overall control system. The maximum rate must be considered by the settings management system when generating schedules for the data master. However, the requested rate for software actions does not only depend on the schedule generated by the settings management system, but also on the configuration of the front-ends by the front-end software (other processes requesting CPU load, number of ECA rules requested by one FESA class, number of FESA instances, functionality of FESA classes, ...).
Pushing the maximum rate
As can be observed, the maximum rate is higher for the IPC than for the SCU. This difference is not due to the timing receiver or the FPGA, but only due to the difference in CPU and host system hardware. Effectively, the CPUs of the host system became heavily loaded (not shown in the table). This is an indication that the maximum event rate could be pushed to higher values without changing the hardware, possibly by modifying software. Ideas exist, but this needs to be investigated further.
--
DietrichBeck - 30 January 2018