(Some) Basics of Networking and White Rabbit
The prime source of information on the White Rabbit project is the Open Hardware Repository. The aim of this page is to provide an overview of the requirements for the timing system at GSI and to present the main ideas of its design.
Requirements for the Timing System at FAIR
The key requirements of the Timing System are the following.
- Distribution of Timing Messages (when to do something) to the equipment. This implies that the equipment has been configured beforehand (what to do) with FESA via a different "slow" network. Timing Messages are transmitted via the dedicated timing network, where they have the highest priority.
- Timing accuracy of 1 ns or better.
- Broadcast timing messages a few hundred us ahead of execution time.
- Propagation time of a few hundred us through the network.
- Transport medium for different kinds of traffic.
  - timing messages
  - PTP (Realtime Ethernet, TAI time stamps). An accuracy of about 100 picoseconds per kilometer and a jitter in the low femtosecond range is achieved.
  - network management
  - for special cases: low priority traffic based on Ethernet (example: machine synchronization)
- Robustness: message loss < 10e-12 (less than one lost timing message per year).
- Scalability: about 2000 timing receivers.
- Link length: up to a couple of kilometers.
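The robustness requirement can be turned into a rough message-rate budget. The sketch below is only an illustration: it reads the quoted "10e-12" as a per-message loss probability of 10^-12 (an assumption; taken literally, the notation denotes 1e-11), and the function names are hypothetical.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600  # non-leap year


def expected_losses_per_year(loss_probability, messages_per_second):
    """Expected number of lost messages in one year of continuous operation."""
    return loss_probability * messages_per_second * SECONDS_PER_YEAR


def max_message_rate(loss_probability, allowed_losses_per_year=1.0):
    """Highest message rate that keeps the expected losses within the budget."""
    return allowed_losses_per_year / (loss_probability * SECONDS_PER_YEAR)


# Assumption: "10e-12" in the requirement is meant as 1e-12 per message.
LOSS_PROBABILITY = 1e-12

print(f"max rate: {max_message_rate(LOSS_PROBABILITY):.0f} messages/s")
```

Under these assumptions, a loss probability of 1e-12 is compatible with "less than one lost message per year" up to a rate of roughly 31,700 timing messages per second.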
Ingredients of White Rabbit
It has been decided that the timing system at FAIR will be based on the White Rabbit (WR) timing project, which is a shared development by CERN, GSI and others. WR employs the following techniques.
Precision Time Protocol
Today, the Precision Time Protocol (PTP) is defined in IEEE 1588-2008. PTP is a high-precision protocol for clock synchronization within a local area network. Clock synchronization is achieved via a master-slave architecture. Delays in the link medium are measured and compensated, as is the offset between the slave and master clocks. Typical implementations exchange PTP messages via UDP, but WR uses a different approach. WR also uses a link delay model to compensate for asymmetry of the link delays (delays for up-link and down-link may differ). According to NIST, sub-microsecond precision can be achieved with low-cost implementations.
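The delay and offset measurement can be illustrated with the four timestamps of a PTP exchange: t1 (master sends), t2 (slave receives), t3 (slave sends) and t4 (master receives). The following is a minimal sketch, not the WR implementation. The function name is hypothetical, and the asymmetry parameter `alpha`, defined here by delay_ms = (1 + alpha) * delay_sm, is a simplifying assumption standing in for a full link delay model. With `alpha = 0` the code reduces to the usual symmetric PTP calculation.

```python
def ptp_offset_and_delay(t1, t2, t3, t4, alpha=0.0):
    """Estimate the slave clock offset and the one-way link delays.

    t1, t4 are read from the master clock, t2, t3 from the slave clock.
    alpha is an assumed asymmetry coefficient:
        delay_master_to_slave = (1 + alpha) * delay_slave_to_master
    """
    # Time spent on the wire in both directions (slave turnaround removed).
    round_trip = (t4 - t1) - (t3 - t2)
    # Split the round trip according to the assumed asymmetry.
    delay_sm = round_trip / (2.0 + alpha)
    delay_ms = round_trip - delay_sm
    # Offset of the slave clock relative to the master clock.
    offset = (t2 - t1) - delay_ms
    return offset, delay_ms, delay_sm


# Example in arbitrary time units: true offset 50, delays 120 (down) and 100 (up).
offset, d_ms, d_sm = ptp_offset_and_delay(1000, 1170, 1300, 1350, alpha=0.2)
print(offset, d_ms, d_sm)
```

If the same timestamps are evaluated with `alpha = 0`, the link is treated as symmetric and the estimated offset is wrong by half the delay difference, which is why an asymmetric link needs a delay model.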
Clock and Phase
To reach nanosecond precision, WR combines PTP with Synchronous Ethernet (SyncE) and phase measurement and adjustment. The main idea behind SyncE is to use the carrier frequency of OSI layer 1 for clock synchronization. WR uses Gigabit Ethernet (GigE) with a carrier frequency of 125 MHz to adjust all nodes to the frequency of the same physical clock. In addition to the frequency, the phase of the Ethernet carrier signals is also precisely measured and adjusted. This allows for precision and jitter in the low two-digit picosecond range and requires dedicated WR switches.
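The role of the phase measurement can be seen from simple arithmetic: a counter running at 125 MHz advances once per 8 ns, so timestamps based on cycle counts alone cannot resolve anything finer than that. The sketch below shows how a coarse cycle count and a measured phase within one cycle could be combined into a finer timestamp. The function name and the picosecond representation are assumptions made for illustration only.

```python
CARRIER_HZ = 125_000_000            # carrier frequency quoted above
PERIOD_PS = 10**12 // CARRIER_HZ    # one clock cycle in picoseconds (8000 ps = 8 ns)


def refined_timestamp_ps(cycle_count, phase_ps):
    """Combine a coarse cycle count with a sub-cycle phase measurement."""
    if not 0 <= phase_ps < PERIOD_PS:
        raise ValueError("phase must lie within one clock period")
    return cycle_count * PERIOD_PS + phase_ps


# Three full cycles plus a measured phase of 2.5 ns.
print(refined_timestamp_ps(3, 2500))
```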
Managing Priorities
The Timing System should be capable of transmitting timing messages in a deterministic way and provide infrastructure for common services like the beam interlock system at GSI and FAIR. To guarantee the highest priority for timing messages, the following measures are applied.
- Priority encoding on the level of Ethernet frames. This is done with IEEE 802.1Q VLAN tagging. Here, a 32-bit field (VLAN tag) is inserted into Ethernet II frames. The VLAN tag contains a 3-bit Priority Code Point (PCP), which allows priority values from 0 (best effort) to 7 (highest).
- Broadcasting. Timing Messages are distributed via broadcast, either as payload of UDP packets or as raw Ethernet frames. Broadcasting by the Data Master does not require acknowledgement of messages by the timing receivers. This improves the real-time capabilities of the network, but additional measures for robustness need to be applied (see forward error correction below).
- Cut-through Switching. The Data Master sends all timing messages by broadcasting them with highest priority. Only the Data Master is allowed to broadcast Ethernet frames with highest priority. The special switches of a WR network employ cut-through switching to minimize the propagation time and achieve low latency for all Ethernet frames. No CRC check is performed on the switches and damaged frames are not dropped.
- Prioritizing timing messages is handled in the queues at the outgoing port. Timing messages are transmitted with shortest propagation time and highest priority. Network traffic with lower priority is also possible. By default, there is no preemption at the outgoing ports, so a high priority message is delayed if transmission of a preceding frame has already started.
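The priority encoding described in the first item above can be made concrete by building the 4-byte IEEE 802.1Q tag by hand. The tag consists of the 16-bit Tag Protocol Identifier 0x8100 followed by 16 bits of tag control information: 3 bits PCP, 1 bit Drop Eligible Indicator (DEI) and a 12-bit VLAN identifier (VID). The sketch below uses only the Python standard library. The function names and the VID value are chosen for illustration and are not taken from the WR software.

```python
import struct

TPID_8021Q = 0x8100  # Tag Protocol Identifier for IEEE 802.1Q


def build_vlan_tag(pcp, vid, dei=0):
    """Return the 4-byte VLAN tag for the given priority and VLAN ID."""
    if not 0 <= pcp <= 7:
        raise ValueError("PCP is a 3-bit field (0..7)")
    if dei not in (0, 1):
        raise ValueError("DEI is a 1-bit field")
    if not 0 <= vid <= 0xFFF:
        raise ValueError("VID is a 12-bit field")
    tci = (pcp << 13) | (dei << 12) | vid
    # Network byte order: two unsigned 16-bit values.
    return struct.pack("!HH", TPID_8021Q, tci)


def parse_pcp(tag):
    """Extract the Priority Code Point from a 4-byte VLAN tag."""
    tpid, tci = struct.unpack("!HH", tag)
    if tpid != TPID_8021Q:
        raise ValueError("not an IEEE 802.1Q tag")
    return tci >> 13


# Highest priority (7) on a hypothetical VLAN 42.
tag = build_vlan_tag(pcp=7, vid=42)
print(tag.hex(), parse_pcp(tag))
```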
Forward Error Correction
Timing messages are broadcast without explicit acknowledgement of their receipt by the timing receivers. To recover from bit errors or frame losses in the network, a Forward Error Correction (FEC) technique is applied. FEC encoding is done by the Data Master, whereas decoding is done by the timing receivers. As a requirement for the reliability of the transmission from the Data Master to all timing receivers, at most one timing message may be lost per year.
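The principle of FEC can be illustrated with the simplest possible erasure code: one XOR parity frame added to a group of equally long frames, which lets the receiver rebuild any single lost frame without a retransmission. This is only a toy example of the idea. This page does not specify the code used by WR, and the sketch should not be read as the actual scheme. All names are hypothetical.

```python
def add_parity(frames):
    """Sender side: append one XOR parity frame to equally long frames."""
    length = len(frames[0])
    if any(len(frame) != length for frame in frames):
        raise ValueError("all frames must have the same length")
    parity = bytearray(length)
    for frame in frames:
        for i, byte in enumerate(frame):
            parity[i] ^= byte
    return list(frames) + [bytes(parity)]


def recover(received):
    """Receiver side: rebuild at most one lost frame (marked as None).

    `received` holds the data frames followed by the parity frame.
    Returns the data frames only.
    """
    missing = [i for i, frame in enumerate(received) if frame is None]
    if len(missing) > 1:
        raise ValueError("a single parity frame can repair only one loss")
    if not missing:
        return list(received[:-1])
    present = [frame for frame in received if frame is not None]
    rebuilt = bytearray(len(present[0]))
    # XOR of all surviving frames (including parity) equals the lost frame.
    for frame in present:
        for i, byte in enumerate(frame):
            rebuilt[i] ^= byte
    repaired = list(received)
    repaired[missing[0]] = bytes(rebuilt)
    return repaired[:-1]


sent = add_parity([b"abcd", b"efgh", b"ijkl"])
sent[1] = None  # simulate one frame lost in the network
print(recover(sent))
```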
Network Topology
As an example, 2000 nodes could be connected to the Data Master using five layers of WR switches (including the grandmaster clock). For reliability and to reduce possible downtime of the timing network, it would be desirable to implement the network using redundant links between the WR switches. As a conservative estimate, an upper bound latency of 100 us per switch layer is assumed (500 us upper bound latency with five layers of switches).
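The latency estimate is a simple product of the number of switch layers and the assumed per-layer bound. The sketch below reproduces it and checks it against a lead time, that is, how far ahead of the execution time a message is broadcast. The function names and the lead-time values are hypothetical and serve only to illustrate the arithmetic.

```python
def upper_bound_latency_us(switch_layers, per_layer_us=100):
    """Conservative upper bound for the propagation time through the network."""
    return switch_layers * per_layer_us


def arrives_in_time(lead_time_us, switch_layers, per_layer_us=100):
    """True if a message sent lead_time_us ahead is guaranteed to arrive in time."""
    return upper_bound_latency_us(switch_layers, per_layer_us) <= lead_time_us


print(upper_bound_latency_us(5))    # five layers at 100 us each
print(arrives_in_time(500, 5))      # hypothetical lead time of 500 us
print(arrives_in_time(400, 5))      # hypothetical lead time of 400 us
```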
Background: Networking in the OSI Model
Many networking terms are defined within the Open Systems Interconnection model (OSI model), which describes networking in the form of seven layers.
| Layer | Description | Example |
| 7. Application | Network process to application | HTTP, SNMP, DHCP, DNS, ... |
| 6. Presentation | Data representation, encryption and decryption, convert machine dependent data to machine independent data | TLS/SSL, MIME, ... |
| 5. Session | Interhost communication | NetBIOS, ... |
| 4. Transport | End-to-end connections and reliability, flow control | TCP, UDP, ... |
| 3. Network | Path determination and logical addressing | IPv4, IPv6, ... |
| 2. Data Link | Physical addressing | MAC, LLC, (parts of) Ethernet frame, ... |
| 1. Physical | Media, signal and binary transmission | pins, voltages, cables, (parts of) Ethernet frame, ... |
Only a limited set of examples is given in the table above. Another widely used model is the so-called TCP/IP model, which does not strictly fit into the OSI model. Moreover, the TCP/IP model does not describe the hardware-related lower layers in detail and is therefore not useful for the description of a Timing System which is linked to specific hardware.
--
DietrichBeck - 10 February 2020