Richard Cochran
2014-11-25 11:32:08 UTC
//==================================
Once iperf is running, the stack may break after a few seconds.
The stack is not broken. A transmit time stamp has gone missing. The
real cause of the fault is a driver issue (or hardware limitation).
ptp4l[247.934]: timed out while polling for tx timestamp
ptp4l[247.934]: increasing tx_timestamp_timeout may correct this issue, but it is likely caused by a driver bug
ptp4l[247.935]: port 1: send sync failed
...
The LinuxPTP stack v1.4 break message shows a Tx time stamp timeout. The default tx_timestamp_timeout value is 1 ms. After increasing tx_timestamp_timeout
to 10 ms, the stack doesn't break, but the convergence value shows jitter of more than 1000 ns.
What do you mean by jitter?
Using hardware time stamps, there should be no jitter, even with lost
time stamps. Sounds like a FEC driver bug is mixing up the time stamps.
Why does the v1.0 stack work fine? Does the v1.4 stack have a potential bug?
The difference is in the way transmit time stamps are read. The PTP
stack has to read the Tx time stamp from the socket's error queue, but
because time stamps can go missing at the hardware level, it also has
to give up after a certain timeout.
In v1.0, we specified "number of tries" in calling recvmsg(). But it
was hard to explain to people how to correctly choose the number of
tries. So in v1.2 we changed to a time based parameter. This has the
advantage of being easy to understand, but it also has the disadvantage
that the stack must always wait the dialed time, even if the time stamp
becomes available right away. This disadvantage comes from a limitation
in Linux kernels before 3.10.
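
The two strategies can be contrasted with a small C sketch. Both functions
are hypothetical illustrations of the behavior described above, not the
actual sk.c code from either release: the first gives up after a fixed
number of attempts, the second always sleeps for the whole configured time
before reading once.

```c
#include <errno.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <time.h>

/* Sleep for the given number of microseconds (helper for both strategies). */
static void sleep_us(long us)
{
    struct timespec ts = { .tv_sec = us / 1000000L,
                           .tv_nsec = (us % 1000000L) * 1000L };
    nanosleep(&ts, NULL);
}

/*
 * "Number of tries" style: attempt the error-queue read up to `tries`
 * times, pausing briefly between attempts. Returns as soon as a message
 * is available, but the total wait depends on scheduling, which is what
 * makes a good value for `tries` hard to choose.
 */
ssize_t recv_errqueue_tries(int fd, struct msghdr *msg, int tries)
{
    ssize_t cnt = -1;

    for (; tries > 0; tries--) {
        cnt = recvmsg(fd, msg, MSG_ERRQUEUE | MSG_DONTWAIT);
        if (cnt >= 0)
            break;                      /* got the looped-back packet */
        if (errno != EAGAIN && errno != EWOULDBLOCK && errno != EINTR)
            break;                      /* real error, stop retrying */
        sleep_us(1);
    }
    return cnt;
}

/*
 * Time-based style without kernel help: wait the full timeout, then read
 * once. Easy to explain, but the caller always pays `timeout_ms`, even
 * when the time stamp was ready immediately.
 */
ssize_t recv_errqueue_fixed_wait(int fd, struct msghdr *msg, int timeout_ms)
{
    sleep_us((long)timeout_ms * 1000L);
    return recvmsg(fd, msg, MSG_ERRQUEUE | MSG_DONTWAIT);
}
```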
So to fix this, you have three options.
1. Hack the v1.4 code in sk.c back to using "number of tries". This
will at least give you the old behavior.
2. Find and fix the bug in the FEC that causes Tx time stamps to be
delayed or lost when under load.
3. Use the latest git version of ptp4l together with kernel v3.10 or
newer. This lets you specify the timeout in milliseconds, but it
avoids any unnecessary delays.
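
As a rough C sketch of how option 3 can avoid the fixed delay: the code
below assumes the kernel feature involved is the SO_SELECT_ERR_QUEUE socket
option, which appeared in Linux 3.10 and makes a pending error-queue
message wake poll() with POLLPRI. The function names are hypothetical and
this is not the actual ptp4l implementation.

```c
#include <errno.h>
#include <poll.h>
#include <sys/socket.h>

/*
 * Ask the kernel to wake poll()/select() when something lands on the
 * socket's error queue. Returns 0 on success, -1 with errno set if the
 * option is unavailable (older headers or older kernel).
 */
int enable_errqueue_wakeup(int fd)
{
#ifdef SO_SELECT_ERR_QUEUE
    int on = 1;
    return setsockopt(fd, SOL_SOCKET, SO_SELECT_ERR_QUEUE, &on, sizeof(on));
#else
    (void)fd;
    errno = ENOPROTOOPT;
    return -1;
#endif
}

/*
 * Wait at most `timeout_ms` for the error queue to become readable.
 * Returns 1 if a message is pending, 0 on timeout, -1 on error.
 * Unlike a fixed sleep, poll() returns as soon as the time stamp
 * arrives, so the timeout is only an upper bound.
 */
int wait_for_tx_timestamp(int fd, int timeout_ms)
{
    struct pollfd pfd = { .fd = fd, .events = POLLPRI, .revents = 0 };
    int res = poll(&pfd, 1, timeout_ms);

    if (res <= 0)
        return res;                     /* 0: timed out, -1: poll failed */
    return (pfd.revents & (POLLPRI | POLLERR)) ? 1 : -1;
}
```

With this arrangement, a generous timeout such as 10 ms only costs time
when a time stamp really is late or lost.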
HTH,
Richard