How-To: Check a FEC
Introduction
This How-To describes how to check that a FEC works properly with respect to the features provided by the GMT.
Test Plan
This is just a start ...
- Etherbone:
eb-ls dev/wbm0
- White Rabbit:
eb-mon -v dev/wbm0
- saftlib:
saft-ctl bla -fijs
- ECA:
saft-ctl bla -fvx snoop 0x0 0x0 0
(if nothing happens, inject a message locally)
- socat:
eb-ls tcp/...
Check Installation
ssh to the FEC. Use eb-ls to check that the stack of the GMT (FPGA, driver, wishbone driver, Etherbone) is working properly up to Etherbone.
[ruth@scuxl0815 ~]# eb-ls dev/wbm0
In case eb-ls throws an error at you, try to figure out what went wrong by checking boot messages.
- locally:
- use dmesg
- check for the drivers wishbone and pcie_wb
- remotely with Diagnostic Logging using 'graylog' (try your FEC name as logsource).
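The local checks above can be sketched as a small shell helper. This is only a sketch under assumptions: the function name check_wb_drivers is hypothetical, and it assumes that the driver names wishbone and pcie_wb appear literally in the kernel log once the drivers have been loaded.

```shell
#!/bin/sh
# Hypothetical helper (not part of the GMT tools): reads a kernel log on
# stdin and reports whether the driver names mentioned above show up in it.
check_wb_drivers() {
    log=$(cat)
    rc=0
    for drv in wishbone pcie_wb; do
        if printf '%s\n' "$log" | grep -q "$drv"; then
            echo "$drv: found in boot messages"
        else
            echo "$drv: NOT found in boot messages"
            rc=1
        fi
    done
    return "$rc"
}

# Usage on the FEC (reading the kernel log may require root):
#   dmesg | check_wb_drivers
```

The function returns a non-zero status if either name is missing, so it can be chained with other checks.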
Saftlib
ssh to the FEC. Use a saft-ctl command to check that the full stack of the GMT (FPGA, drivers, Etherbone, dbus, saftlib) is working properly.
[ruth@scuxl0815 ~]# saft-ctl bla -fijs
In case saft-ctl throws errors at you:
- check that the stack works up to Etherbone (see above).
- check the log file /tmp/saftd.log
- if things work up to Etherbone, try to narrow down the problem:
- on the FEC, use ps to check that "dbus-daemon" and "saftd" are running
- if not, check boot messages locally with "dmesg" or remotely with Diagnostic Logging
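The process check above could be scripted roughly as follows. The function name missing_daemons is hypothetical, and the sketch assumes that the process names dbus-daemon and saftd appear as whole words in the ps listing.

```shell
#!/bin/sh
# Hypothetical helper: reads a process listing on stdin and prints the
# name of every required daemon that does not appear in it.
missing_daemons() {
    listing=$(cat)
    for name in dbus-daemon saftd; do
        if ! printf '%s\n' "$listing" | grep -qw -- "$name"; then
            echo "$name"
        fi
    done
}

# Usage on the FEC:
#   ps | missing_daemons        # use 'ps -e' where plain ps lists only your own processes
#   tail -n 20 /tmp/saftd.log   # most recent saftd log entries
```

An empty result means both daemons were found in the listing.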
Versioning
The versions of gateware, Etherbone (and drivers) and saftlib must match those of the relevant release.
- eb-info dev/wbm0 or saft-ctl bla -fk: gateware
- eb-mon -e dev/wbm0: Etherbone (typically, drivers are rolled out together with Etherbone)
- saft-ctl bla -fi: saftlib
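To collect the three version reports in one go, something like the following sketch may help. The helper name report is hypothetical. The commands are the ones listed above, and the comparison against the release notes still has to be done by hand.

```shell
#!/bin/sh
# Hypothetical helper: prints a label, then the output of the given command,
# or a note when the command is not installed on this host.
report() {
    label=$1
    shift
    echo "== $label =="
    if command -v "$1" >/dev/null 2>&1; then
        "$@" 2>&1
    else
        echo "($1 not found)"
    fi
}

report gateware  eb-info dev/wbm0
report Etherbone eb-mon -e dev/wbm0
report saftlib   saft-ctl bla -fi
```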
Check Functionality
Saftlib and ECA
Use saft-ctl to test the functionality of ECA and saftlib.
- open a 1st ssh session to the FEC and start snooping on all events: saft-ctl bla -fvx snoop 0x0 0x0 0
In case no timing messages are received from a remote data master, do the following:
- open a 2nd ssh session to the FEC and inject a timing message into the input of the ECA: saft-ctl bla -fp inject 0xffff000000000000 0x0 0
- verify that an action has been triggered in the 1st session
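The two-session procedure above can also be approximated in a single script by running the snoop in the background. This is a rough sketch under several assumptions: the helper wait_for_output is hypothetical, mktemp is assumed to be available on the FEC, and saft-ctl is assumed to flush its output when writing to a file and to print nothing while no event arrives. If a remote data master is already sending timing messages, the log fills regardless of the injection.

```shell
#!/bin/sh
# Hypothetical helper: waits up to $2 seconds for file $1 to become non-empty.
wait_for_output() {
    file=$1
    remaining=$2
    while [ "$remaining" -gt 0 ]; do
        [ -s "$file" ] && return 0
        sleep 1
        remaining=$((remaining - 1))
    done
    [ -s "$file" ]
}

# Only run the actual test where saft-ctl is installed.
if command -v saft-ctl >/dev/null 2>&1; then
    log=$(mktemp)
    saft-ctl bla -fvx snoop 0x0 0x0 0 > "$log" 2>&1 &   # stands in for the 1st session
    snoop_pid=$!
    sleep 1                                             # give the snoop time to start
    saft-ctl bla -fp inject 0xffff000000000000 0x0 0    # stands in for the 2nd session
    if wait_for_output "$log" 5; then
        echo "snoop output received:"
        cat "$log"
    else
        echo "no snoop output within 5 s"
    fi
    kill "$snoop_pid" 2>/dev/null
    rm -f "$log"
fi
```

The 5 second timeout is an arbitrary choice for illustration.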
White Rabbit
ssh to the FEC. Use eb-mon to check the status of White Rabbit.
[ruth@scuxl0815 ~]# eb-mon -v dev/wbm0
EB version / EB source: etherbone 2.1.0 (v2.1.0-4-g809617b): Apr 29 2016 02:26:04 / built by dbeck on Dec 16 2016 08:43:19 with asl742.acc.gsi.de running CentOS Linux release 7.2.1511 (Core)
WR_time - host_time [ms]: -1480581714573 // difference between timestamp of White Rabbit and host system. Should match the number of leap seconds * 1000.
Current TAI:1970-01-15 23:35:52 GMT // White Rabbit time stamp. Should match the current UTC time with the number of leap seconds added.
Sync Status: TRACKING // White Rabbit synchronization status, here "track phase"
MAC: 00267b0003b2 // MAC of White Rabbit interface
Link Status: LINK_UP // link status of White Rabbit interface
IP: 192.168.16.175 // IP of White Rabbit interface
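The two status fields in the listing above lend themselves to an automated check. The following is a minimal sketch: the function name wr_status_ok is hypothetical, and it assumes the field labels "Sync Status:" and "Link Status:" are printed exactly as in the listing above.

```shell
#!/bin/sh
# Hypothetical helper: reads 'eb-mon -v' output on stdin and checks the
# sync and link status fields.
wr_status_ok() {
    text=$(cat)
    rc=0
    if ! printf '%s\n' "$text" | grep -q 'Sync Status: *TRACKING'; then
        echo "White Rabbit is not in TRACKING state"
        rc=1
    fi
    if ! printf '%s\n' "$text" | grep -q 'Link Status: *LINK_UP'; then
        echo "White Rabbit link is not up"
        rc=1
    fi
    return "$rc"
}

# Usage on the FEC:
#   eb-mon -v dev/wbm0 | wr_status_ok && echo "White Rabbit looks healthy"
```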
FAQ
Support for format ID 0x1 is done and will be rolled out with the next release (planned prior to Dry Run #2). As a hackish solution for DR #1, a tool saft-ctl-dryRun1 is now available on all SCUs (if not: reboot).
My SCU does not get timing events after recovering from a power cut.
Over the past few weeks, SCUs recovering from a power cut have been observed to fail to establish a White Rabbit link. Check on the SCU using eb-mon (see above). In case "Link Status" is not "LINK_UP", the FPGA must be restarted. The FPGA can be restarted either via power cycle or via ssh (see below).
Can I reset the FPGA of a SCU remotely?
How to reset a SCU via ssh
[ruth@scuxl0815 ~]# eb-reset dev/wbm0
--
DietrichBeck - 13 Feb 2018