How-To: Check a FEC
This How-To describes how to check that a FEC works properly with respect to the features provided by the GMT.
This is just a start ...
- White Rabbit:
eb-mon -v dev/wbm0
saft-ctl bla -fijs
saft-ctl bla -fvx snoop 0x0 0x0 0 (if nothing happens, inject a message locally)
ssh to the FEC. Use eb-ls to check that the stack of the GMT (FPGA, driver, wishbone driver, Etherbone) is working properly up to Etherbone.
[ruth@scuxl0815 ~]# eb-ls dev/wbm0
If this throws an error at you, try to figure out what went wrong by checking the boot messages.
- check for drivers
- remotely with Diagnostic Logging using 'graylog' (try your FEC name as logsource).
ssh to the FEC. Use saft-ctl to check that the full stack of the GMT (FPGA, drivers, Etherbone, dbus, saftlib) is working properly.
[ruth@scuxl0815 ~]# saft-ctl bla -fijs
If this throws errors at you:
- check that the stack works up to Etherbone (see above).
- check the log file /tmp/saftd.log
- If things up to Etherbone work, try to figure out what the problem might be
- on the FEC, use ps to check that "dbus-daemon" and "saftd" are running
- if not, check boot messages locally with "dmesg" or remotely with Diagnostic Logging
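The process check in the list above can be scripted. The following is a minimal sketch, not part of the GMT tooling: the helper name missing_procs is hypothetical, and it assumes that the ps on the FEC accepts "-e -o comm=" and prints one command name per line.

```shell
#!/bin/sh
# missing_procs NAME...
# Hypothetical helper: reads command names (one per line) on stdin and
# prints every NAME that does not appear in that list.
missing_procs() {
    running=$(cat)
    for name in "$@"; do
        # -q: quiet, -x: match the whole line, -F: fixed string (no regex)
        printf '%s\n' "$running" | grep -qxF -- "$name" || printf '%s\n' "$name"
    done
}

# Usage on the FEC (assumes ps supports these options):
#   ps -e -o comm= | missing_procs dbus-daemon saftd
```

Empty output means both daemons were found. Any name that is printed is not running, and the boot messages are the next place to look.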
The versions of gateware, Etherbone (and drivers), and saftlib must match those of the relevant release.
eb-info dev/wbm0 or saft-ctl bla -fk : gateware
eb-mon -e dev/wbm0 : Etherbone (typically, drivers are rolled out together with Etherbone)
saft-ctl bla -fi : saftlib
Saftlib and ECA
Use saft-ctl to test the functionality of ECA and saftlib.
- open a 1st ssh session to the FEC and start snooping for all events
saft-ctl bla -fvx snoop 0x0 0x0 0
In case no timing messages are received from a remote data master, do the following:
- open a 2nd ssh session to the FEC and inject a timing message to the input of the ECA
saft-ctl bla -fp inject 0xffff000000000000 0x0 0
- verify an action has been triggered in the 1st session
ssh to the FEC. Use eb-mon to check a few things of White Rabbit.
[ruth@scuxl0815 ~]# eb-mon -v dev/wbm0
EB version / EB source: etherbone 2.1.0 (v2.1.0-4-g809617b): Apr 29 2016 02:26:04 / built by dbeck on Dec 16 2016 08:43:19 with asl742.acc.gsi.de running CentOS Linux release 7.2.1511 (Core)
WR_time - host_time [ms]: -1480581714573 // difference between timestamp of White Rabbit and host system. Should match the number of leap seconds * 1000.
Current TAI:1970-01-15 23:35:52 GMT // White Rabbit time stamp. Should match the current UTC time plus the number of leap seconds.
Sync Status: TRACKING // White Rabbit synchronization status, here "track phase"
MAC: 00267b0003b2 // MAC of White Rabbit interface
Link Status: LINK_UP // link status of White Rabbit interface
IP: 192.168.16.175 // IP of White Rabbit interface
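The two status fields in the output above can be checked automatically. This is only a sketch under assumptions: the helper names wr_field and wr_link_ok are hypothetical, and it assumes eb-mon -v prints the labels "Link Status:" and "Sync Status:" as shown here. The "// ..." annotations are explanations added in this How-To; the sketch strips them if present.

```shell
#!/bin/sh
# wr_field LABEL
# Hypothetical helper: reads "eb-mon -v" style output on stdin and prints
# the value after "LABEL:", without any trailing "// ..." annotation.
# LABEL is used inside a regex, so keep it to plain text such as 'Link Status'.
wr_field() {
    sed -n "s|^$1:[[:space:]]*||p" |
        sed -e 's|[[:space:]]*//.*$||' -e 's|[[:space:]]*$||' |
        head -n 1
}

# wr_link_ok
# Succeeds only if the link is up and the sync status is TRACKING.
wr_link_ok() {
    report=$(cat)
    [ "$(printf '%s\n' "$report" | wr_field 'Link Status')" = "LINK_UP" ] &&
        [ "$(printf '%s\n' "$report" | wr_field 'Sync Status')" = "TRACKING" ]
}

# Usage on the FEC:
#   eb-mon -v dev/wbm0 | wr_link_ok || echo "White Rabbit link not ready"
```

The check only compares text fields. It does not verify that the White Rabbit time itself is plausible, so the two time lines still need a look by eye.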
Why does saft-ctl only support format ID 0x0? (deprecated hack!)
Support for format ID 0x1 is done and will be rolled out with the next release (planned prior to Dry Run #2). As a hackish solution for DR #1, a tool
is now available on all SCUs (if not: reboot).
My SCU does not get timing events after recovering from a power cut.
For the past few weeks, SCUs recovering from a power cut have been observed to fail to establish a White Rabbit link. Check on the SCU using eb-mon -v dev/wbm0 (see above). If "Link Status" is not "LINK_UP", the FPGA must be restarted, either via a power cycle or via ssh (see below).
Can I reset the FPGA of a SCU remotely?
How to reset a SCU via ssh
[ruth@scuxl0815 ~]# eb-reset dev/wbm0
- 13 Feb 2018