Gateway Data Master <-> UNILAC PZ (dm-unipz)
Introduction
dm-unipz
'dm-unipz' is the interface between the White Rabbit based Data Master and the MIL-based UNILAC 'Pulszentrale'. The task of dm-unipz is to synchronize the Data Master to the beams delivered by the UNILAC. Background information and further reading are available
- here, focus on UNIPZ
- 'Booster-Mode'
A dedicated How-To is available here.
dm-unidm
With the Injector Controls Upgrade, the legacy UNILAC control system is replaced by a new one based on the modern architecture used for FAIR. It is planned that both Data Masters become components of a so-called Data Master cluster, in which all Data Masters can communicate with each other by exchanging information via a White Rabbit network. Although implemented in principle, direct communication between Data Masters is not yet used. As a quick solution, the firmware 'dm-unidm' has been introduced. Basically, this is a hacky version of dm-unipz, stripped of all MIL Devicebus communication. There is no communication from the 'Ring' Data Master to the 'UNILAC' Data Master. dm-unidm only serves to trigger the 'Ring' Data Master when EVT_READY_TO_SIS has been played at UNILAC. Then, dm-unidm terminates the 'wait' in the main thread and starts the injection thread in the 'Ring' Data Master.
dm-unidm is part of the dm-unipz project. It is built and deployed together. It uses the same command line tool. It runs on the same SCU as dm-unipz.
Switching between dm-unipz and dm-unidm only requires exchanging the lm32 firmware on the relevant SCU. This is done by renaming one symbolic link at nfs-init and rebooting the SCU. Please check the how-to.
Overview
Figure: Overview of the interfaces of the gateway (see text).
An overview of the gateway is depicted in the figure above. The gateway is hosted by an SCU. Its "glue" is a firmware running in an lm32 softcore in the FPGA. The softcore communicates with three Wishbone (WB) devices: an Etherbone Master (EBM), the Event-Condition-Action unit (ECA) and the MIL-Macro. The ECA receives scheduled commands from the Data Master (DM) and executes on-time actions. The actions drive the activity of the firmware via events. The MIL-Macro provides two functionalities. First, it serves as a so-called MIL-Devicebus master to a bit I/O close to the UNILAC "Pulszentrale". Second, it receives events via the so-called MIL-Eventbus. Upon reception of the "UNI_READY" event, the MIL-Eventbus receiver generates a TTL pulse that is connected to a LEMO input and subsequently timestamped in the Timestamp Latch Unit (TLU) via the ECA. Finally, the EBM serves to transmit replies to the DM. The firmware provides a dual-port RAM (not shown), which allows the software on the host system to communicate with the firmware in the lm32.
Figure: Context of the gateway (see text).
The context of "dm-unipz" is given in the figure above. The gateway and a Timing Receiver (TR) are connected to the Data Master (DM) via a White Rabbit network. The gateway furthermore connects to the UNILAC Pulszentrale via Devicebus (as master) and Eventbus (as slave). An oscilloscope displays two digital pulses generated by a MIL based TIF and a White Rabbit based TR.
Timing Messages
Starting with beam-time 2022, the so-called booster mode shall be implemented. The relevant timing messages are listed in the table below.
| Event Name | Event Number | short description | Parameter Field | Remark |
| CMD_UNI_TCREQ | 0x15e | request TK | 63..32 (N/A), 31..0 (DM dynpar0) | dynpar0 contains the 32bit address of a block 'slow wait with timeout' |
| CMD_UNI_TCREL | 0x15f | release TK | 63..00 (N/A) | |
| CMD_UNI_BPREP | 0x161 | prepare beam | 63..00 (N/A) | the corresponding 'unprepare' is done, when beam from UNILAC has been received (or CMD_UNI_BREQ(_NOWAIT) failed) |
| CMD_UNI_BREQ | 0x160 | request beam | 63..32 (dynpar1), 31..16 (reserved), 15..8 (CPU Idx), 7..0 (thread Idx) | upon beam delivery by UNILAC, this will terminate the 'slow wait' at DM and start a corresponding thread; dynpar1 contains the 32bit address of the thread origin |
| CMD_UNI_BREQ_NOWAIT | 0x162 | request beam | 63..32 (dynpar1), 31..16 (reserved), 15..8 (CPU Idx), 7..0 (thread Idx) | upon beam delivery by UNILAC, this will start a corresponding thread; dynpar1 contains the 32bit address of the thread origin |
Table: Timing Messages used to control the DM-UNIPZ Gateway. The values CPU Idx and thread Idx are explicitly given by LSA as part of the schedule. The values dynpar0 and dynpar1 are indicated as edges in the LSA schedule, but the values are written to the timing message by the DM firmware on the fly at run-time.
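The bit layout of the parameter field can be illustrated with a short Python sketch. The function names and the example value are hypothetical and not taken from the dm-unipz sources; only the bit ranges and event numbers come from the table above.

```python
# Event numbers as listed in the table above.
CMD_UNI_TCREQ = 0x15E
CMD_UNI_BREQ = 0x160
CMD_UNI_BREQ_NOWAIT = 0x162


def decode_breq_param(param: int) -> dict:
    """Split the 64-bit parameter field of CMD_UNI_BREQ(_NOWAIT) into its parts."""
    if not 0 <= param < (1 << 64):
        raise ValueError("parameter field must fit into 64 bits")
    return {
        "dynpar1": (param >> 32) & 0xFFFFFFFF,  # bits 63..32: address of the thread origin
        "reserved": (param >> 16) & 0xFFFF,     # bits 31..16: reserved
        "cpu_idx": (param >> 8) & 0xFF,         # bits 15..8: CPU Idx
        "thread_idx": param & 0xFF,             # bits 7..0: thread Idx
    }


def decode_tcreq_param(param: int) -> int:
    """Return dynpar0 (bits 31..0) of CMD_UNI_TCREQ; bits 63..32 are not used."""
    return param & 0xFFFFFFFF


# Hypothetical example value, not taken from a real schedule.
example = (0x12345678 << 32) | (0x02 << 8) | 0x05
fields = decode_breq_param(example)
print(fields)
```

With this example value, dynpar1 is 0x12345678, CPU Idx is 2 and thread Idx is 5.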
Procedure
The following procedure is applied (somewhat simplified)
- the DM prepares the gateway via a Timing Message
- the DM tells the gateway to request the 'Transfer Kanal' (TK) via a Timing Message
- the gateway requests the TK from UNIPZ via Devicebus (Modulbus I/O)
- the gateway starts waiting for acknowledgement or timeout from UNIPZ
- UNIPZ signals an acknowledgement (or "not ok") after the TK has been prepared
- the gateway reads the acknowledgement from UNIPZ via Devicebus (Modulbus I/O); otherwise: timeout
- the gateway instructs the DM to continue with its schedule
- loop (1 or more iterations driven by the DM following a schedule provided by LSA)
- the DM tells the gateway to request beam via a Timing Message
- the gateway requests beam from UNIPZ via Devicebus (Modulbus I/O)
- the gateway starts waiting for the MIL Event "READY_TO_SIS" or timeout
- UNIPZ sends a MIL Event "READY_TO_SIS" 10ms prior to beam delivery (or "not ok")
- the MIL-Macro of the gateway receives the Event "READY_TO_SIS"
- the MIL-Macro generates a TTL pulse
- (via a Lemo cable, the TTL pulse is guided to bidirectional I/O)
- the time of the incoming TTL pulse is latched via the Timestamp Latch Unit (TLU) connected to the ECA
- the ECA generates an event and triggers an action towards the lm32, indicating the "READY_TO_SIS" event
- the lm32 receives the ECA event that includes the timestamp from the TLU, t_Evt
- the gateway adds an offset of exactly 1.5ms to the timestamp: t_flex = t_Evt + 1.5ms
- the gateway instructs the DM to continue the schedule exactly at t_flex
- the gateway releases the beam request at UNIPZ via Devicebus (Modulbus I/O)
- the DM continues scheduling events starting exactly at t_flex. NB: this part of the schedule is not aligned to BuTiS T0 ticks but starts exactly at t_flex
- the beam is transferred from UNILAC to SIS18 exactly 10ms after the MIL Event "READY_TO_SIS", which coincides with exactly 8.5ms after t_flex
- the DM tells the gateway to release the TK and continues with its 'normal' schedule. NB: from here on, the schedule is again aligned to BuTiS T0 ticks
- the gateway releases the TK at UNIPZ via Devicebus (Modulbus I/O)
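The timing relations in the procedure above can be checked with a few lines of arithmetic. The following Python sketch is only an illustration: the function names and the example timestamp are hypothetical, and timestamps are assumed to be integers in nanoseconds.

```python
NS_PER_MS = 1_000_000

OFFSET_FLEX_NS = 1_500_000             # 1.5 ms added to the TLU timestamp t_Evt
READY_TO_BEAM_NS = 10 * NS_PER_MS      # READY_TO_SIS is sent 10 ms prior to beam delivery


def flex_time(t_evt_ns: int) -> int:
    """t_flex = t_Evt + 1.5 ms: the time at which the DM continues its schedule."""
    return t_evt_ns + OFFSET_FLEX_NS


def beam_time(t_evt_ns: int) -> int:
    """Beam transfer happens 10 ms after the READY_TO_SIS event."""
    return t_evt_ns + READY_TO_BEAM_NS


# Hypothetical TLU timestamp in ns.
t_evt = 1_700_000_000_000_000_000
t_flex = flex_time(t_evt)
beam_after_flex_ns = beam_time(t_evt) - t_flex

print(f"t_flex - t_Evt  = {(t_flex - t_evt) / NS_PER_MS} ms")
print(f"t_beam - t_flex = {beam_after_flex_ns / NS_PER_MS} ms")
```

The difference between beam delivery and t_flex is 10 ms - 1.5 ms = 8.5 ms, consistent with the two statements in the list.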
Firmware
FSM
Figure: FSM of the firmware. Shown are states and transitions. Implicitly, all states may transit to the ERROR state (transitions not shown). Description see text.
The figure above depicts states and transitions of a Finite State Machine. As soon as the firmware is loaded in the lm32, it is in the initial S0 state and performs a basic initialization. The states and their transitions are described below. For details on Entry-, Do- and Exit actions please check the source code.
- S0: Initial State. Firmware performs basic initialization.
- Initialization successful: automatic transition -> IDLE
- Initialization failed: automatic transition -> FATAL
- FATAL: This state is entered whenever a non-recoverable error is detected. Examples of such errors are a missing ECA or a missing MIL-Macro. It is impossible to recover from such a situation; this is a final state.
- IDLE: Basic (unconfigured) state. In this state the firmware does not react to MIL events or ECA actions. The firmware can only be controlled by commands via the DP-RAM. This state is also safe for uploading new firmware to the lm32 softcore.
- command "configure" -> CONFIGURED
- CONFIGURED: After undergoing the process of configuration within the entry-action, the firmware is configured.
- command "configure" -> CONFIGURED
- command "idle" -> IDLE
- command "startop" -> OPREADY
- OPREADY: This should be the normal state for all operational situations including failed transfers from UNILAC to SIS (a failed transfer does not cause a transition to the ERROR state).
- command "stopop" -> STOPPING (-> CONFIGURED)
- STOPPING: This is an intermediate state handling a clean transition from OPREADY -> CONFIGURED. If in this state, there is an automatic transition to CONFIGURED.
- ERROR: This state is entered whenever a severe error is detected. An example of such an error is a physically disconnected MIL Devicebus to the Bit I/O close to UNILAC PZ.
- command "recover" -> IDLE
- autorecovery mode: if in state ERROR, the firmware attempts an automatic recovery ERROR -> IDLE -> CONFIGURED -> OPREADY
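The states and transitions listed above can be summarized as a transition table. The following Python sketch is a simplified model, not the firmware implementation: the function names are hypothetical, commands that are not listed for a state are assumed to be ignored, and the implicit transitions to ERROR are left out.

```python
# (state, command) -> next state, as listed in the FSM description above.
TRANSITIONS = {
    ("IDLE", "configure"): "CONFIGURED",
    ("CONFIGURED", "configure"): "CONFIGURED",
    ("CONFIGURED", "idle"): "IDLE",
    ("CONFIGURED", "startop"): "OPREADY",
    ("OPREADY", "stopop"): "STOPPING",
    ("ERROR", "recover"): "IDLE",
}

# Intermediate states that are left automatically.
AUTOMATIC = {"STOPPING": "CONFIGURED"}


def after_init(success: bool) -> str:
    """S0 performs the basic initialization and leaves automatically."""
    return "IDLE" if success else "FATAL"


def next_state(state: str, command: str) -> str:
    """Apply one command; unknown commands leave the state unchanged (assumption)."""
    new_state = TRANSITIONS.get((state, command), state)
    return AUTOMATIC.get(new_state, new_state)


def run(state: str, commands: list) -> str:
    """Apply a sequence of commands and return the final state."""
    for command in commands:
        state = next_state(state, command)
    return state


# Normal start-up, followed by the sequence used for autorecovery.
operational = run(after_init(True), ["configure", "startop"])
recovered = run("ERROR", ["recover", "configure", "startop"])
print(operational, recovered)
```

In this model, FATAL has no outgoing transitions, which reflects that it is a final state.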
Status
In December 2017, the gateway was deployed to the production system for the first time. The gateway SCU
- has a MIL Devicebus connection to the Modulbus I/O of the UNILAC Pulszentrale at LSB6.
- receives MIL Events from the UNILAC Pulszentrale at LSB6.
- receives Timing Messages from the Data Master at BG2
- communicates with the Data Master BG2 via the White Rabbit production network.
The gateway was operated between 19 December 2017 and 3 January 2018. About 492 thousand "dry injections" from the UNILAC Pulszentrale to the new White Rabbit based timing system were achieved successfully. However, there are a couple of issues that need to be addressed.
| issue | value | status | description | recommended action | updated status May 2018 |
| lm32 latency mean | 3us | ok | latency of lm32 to react on MIL events | already considered in configuration of firmware | irrelevant, use TLU timestamping |
| lm32 latency jitter | 170ns sdev | ok | standard deviation, required 1us | within specs, TLU would improve value | N/A |
| lm32 latency min | 2.7us | ok | shortest possible reaction time | not required | N/A |
| lm32 latency max | 3.3us | ok | longest reaction time; be aware of lm32 latency excess (see below) | not required | N/A |
| lm32 latency max-min | 0.6us | ok | max range of jitter, required 1us; be aware of lm32 latency excess (see below) | within specs, TLU would improve value | N/A |
| MIL Eventbus error | < 1E-7 | ok | failure to receive MIL event (not observed) | not required | same |
| GMT latency mean | 999.995us | ok | execution of first WR timing event after MIL event. Specified value 1000.0us | set to specified value via lm32 config | as specified |
| GMT latency jitter | 170ns sdev | ok | standard deviation, required 1us, ok | within specs, TLU would improve value | better than 100 ns (TLU timestamping) |
| GMT latency min | 999.7us | ok | shortest reaction time | not required | N/A |
| GMT latency max | 1000.3us | ok | longest reaction time; be aware of lm32 latency excess (see below) | not required | N/A |
| GMT latency max-min | 0.6us | ok | max range of jitter, required 1us; be aware of lm32 latency excess (see below) | within specs, TLU would improve value | better than 200 ns (TLU timestamping) |
| lm32 latency excess | 1E-4 | not ok | Wishbone bus blocked due to CPU access which causes a latency around 50us. Rate depends on CPU program. Results in partial or total beam loss during transfer. | TLU for latching time of MIL event | N/A |
| MIL Devicebus error | 3E-5 | not ok | error in communication with modulbus I/O. Results in failure of transfer (failed beam request, loss of beam, dry cycle...) | try MIL expander for device bus | MIL expander implemented |
| EB read error | 5E-5 | not ok | timeout error when reading from Data Master via timing network. Results in beam loss. Workaround: try 2nd read in case of timeout, avoids deadlock (but still beam loss) | Forward Error Correction | '2nd chance' implemented |
| EB write error | 2E-5 | fatal | error when writing to Data Master. Presently, this results in an unrecoverable "deadlock" halting the Data Master's thread. | Forward Error Correction. Workaround: timeout handling at DM | timeout handling in DM implemented |
| total error | 2E-4 | bad | Under the assumption "EB write errors" can be handled by timeout treatment, the present failure rate is about 2E-4: With injections at 1 Hz, one injection per hour fails. | see above | expected to have improved (needs to be remeasured) |
Table: Performance data and issues based on 492000 "dry" injections from UNILAC to SIS18 (for more details see text).
The table above presents the current performance and issues with respect to the transfer of beam from the UNILAC to the new White Rabbit based Data Master. For the data presented in the table, the DM-UNILAC gateway was operated for an extended period of time over Christmas and New Year 2017/2018. As expected, the latency numbers are identical to the ones measured with the integration setup in the Programmentwicklungsraum. In the following, relevant issues are discussed.
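The error rates in the table can be put into perspective with a simple budget calculation. The Python sketch below is an illustration only. It assumes that the individual failure modes are rare and independent, so that their rates can simply be added. Whether the total of 2E-4 in the table was derived this way is not stated. The variable and function names are hypothetical.

```python
# Per-injection failure rates quoted in the table above.
# EB write errors are left out, following the assumption in the table
# that they can be handled by timeout treatment.
RATES = {
    "lm32 latency excess": 1e-4,
    "MIL Devicebus error": 3e-5,
    "EB read error": 5e-5,
}

TOTAL_RATE = 2e-4           # 'total error' as quoted in the table
INJECTION_RATE_HZ = 1.0     # injections at 1 Hz
N_INJECTIONS = 492_000      # number of "dry" injections


def summed_rate(rates: dict) -> float:
    """Add up the rates of rare, independent failure modes."""
    return sum(rates.values())


def failures_per_hour(rate: float, injections_per_second: float) -> float:
    """Expected number of failed injections per hour."""
    return rate * injections_per_second * 3600.0


summed = summed_rate(RATES)
per_hour = failures_per_hour(TOTAL_RATE, INJECTION_RATE_HZ)
expected_failures = TOTAL_RATE * N_INJECTIONS

print(f"sum of listed rates        : {summed:.1e}")
print(f"failures per hour at 1 Hz  : {per_hour:.2f}")
print(f"expected failures in total : {expected_failures:.0f}")
```

Under these assumptions the listed rates add up to 1.8E-4, close to the quoted total of 2E-4, and a rate of 2E-4 at 1 Hz corresponds to about 0.7 failed injections per hour, which is of the order of the one failure per hour stated in the table.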
lm32 latency excess
The firmware polls the Wishbone device GSI_MIL_SCU for the incoming MIL event EVT_READY_TO_SIS. While the jitter values for unperturbed operation are ok, the upper latency is sometimes drastically increased to values of 50us and more. This happens if the Wishbone connection from the lm32 to GSI_MIL_SCU is blocked by a third party, for example if the status of the gateway lm32 is read from the host system while a transfer is in progress. A possible solution is timestamping of the MIL event using the TLU of a timing receiver. This would also further reduce the jitter and the spread of min-max values.
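Why hardware timestamping removes the sensitivity to polling latency can be illustrated with a small model. The Python sketch below is not the firmware logic: it assumes that a polling firmware compensates a fixed nominal latency of 3us (the mean value from the table) and that the deadline is the event time plus the specified 1 ms. All function names are hypothetical.

```python
NOMINAL_POLL_LATENCY_NS = 3_000   # mean lm32 latency from the table, assumed to be compensated
OFFSET_NS = 1_000_000             # specified latency of 1000.0 us


def deadline_from_polling(t_event_ns: int, poll_latency_ns: int) -> int:
    """Deadline derived from the time at which polling detects the event."""
    t_detect_ns = t_event_ns + poll_latency_ns
    # Only the nominal latency can be subtracted; any excess remains as an error.
    return t_detect_ns - NOMINAL_POLL_LATENCY_NS + OFFSET_NS


def deadline_from_tlu(t_tlu_ns: int) -> int:
    """Deadline derived from the hardware timestamp latched by the TLU."""
    return t_tlu_ns + OFFSET_NS


t_event = 5_000_000_000                # hypothetical event time in ns
ideal = t_event + OFFSET_NS

error_nominal = deadline_from_polling(t_event, 3_000) - ideal
error_blocked = deadline_from_polling(t_event, 50_000) - ideal   # Wishbone bus blocked
error_tlu = deadline_from_tlu(t_event) - ideal

print(error_nominal, error_blocked, error_tlu)
```

In this model, a blocked Wishbone bus with 50us latency shifts the polling-based deadline by 47us, whereas the TLU-based deadline is unaffected as long as the lm32 reacts before the deadline is reached.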
EB read/write errors
These errors are not observed in the integration system in the Programmentwicklungsraum, but only in the production system. The most likely cause is that WR switches occasionally drop the low priority packets between the gateway and the Data Master in favour of high priority traffic of the timing system itself. To some extent, the error rate might be increased by the present (December 2017) operation mode of the timing system.
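The '2nd chance' workaround for EB read timeouts mentioned in the table amounts to a bounded retry. The following Python sketch shows the idea only. The exception type, the function names and the simulated read are hypothetical and do not correspond to the Etherbone API or to the firmware code.

```python
class EbTimeout(Exception):
    """Hypothetical exception raised when an Etherbone read times out."""


def read_with_second_chance(read, attempts: int = 2):
    """Call read() and retry on timeout; re-raise after the last attempt."""
    last_error = None
    for _ in range(attempts):
        try:
            return read()
        except EbTimeout as err:
            last_error = err
    raise last_error


class FlakyReader:
    """Simulated read that times out a given number of times before succeeding."""

    def __init__(self, failures: int, value: int):
        self.failures = failures
        self.value = value
        self.calls = 0

    def __call__(self) -> int:
        self.calls += 1
        if self.calls <= self.failures:
            raise EbTimeout("no reply from Data Master")
        return self.value


reader = FlakyReader(failures=1, value=42)
result = read_with_second_chance(reader)
print(result, reader.calls)
```

If both attempts time out, the error is passed on to the caller, so a retry of this kind avoids a deadlock but cannot prevent the loss of beam for that injection.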
MIL Devicebus errors
These errors are not observed in the integration system in the Programmentwicklungsraum, but only in the production system. This needs to be investigated. The simplest explanation might be that the MIL cables between BG2 and LSB6 are too long. In this case, the use of MIL expanders might cure the problem.
Summary
From the point of view of the gateway, a transfer of beam from UNILAC to SIS18 can be achieved. The communication between the UNILAC Pulszentrale, the gateway and the Data Master can be achieved with a latency of 1000.0(6) us, as specified (specs: 1ms fixed latency and 1us uncertainty). The scenario has been tested in the production system with the real UNILAC Pulszentrale. To date (January 2018), the total error rate is about 2E-4 and there is good hope that this can be reduced further. Most critical are EB write errors, which can presently not be caught and result in a halt of the facility.
--
DietrichBeck - 25 January 2022