[Linuxptp-devel] Limited success with hardware PTP
Gary E. Miller
2015-02-26 00:07:05 UTC
Yo All!

Different day, new results.

I just got two of these:

Intel Corporation 82574L Gigabit Network Connection

They use the e1000e driver, but report fewer capabilities to ethtool
than my i217-LM:

kong ~ # ethtool -T eth2
Time stamping parameters for eth2:
Capabilities:
hardware-transmit (SOF_TIMESTAMPING_TX_HARDWARE)
software-transmit (SOF_TIMESTAMPING_TX_SOFTWARE)
hardware-receive (SOF_TIMESTAMPING_RX_HARDWARE)
software-receive (SOF_TIMESTAMPING_RX_SOFTWARE)
software-system-clock (SOF_TIMESTAMPING_SOFTWARE)
hardware-raw-clock (SOF_TIMESTAMPING_RAW_HARDWARE)
PTP Hardware Clock: 2
Hardware Transmit Timestamp Modes:
off (HWTSTAMP_TX_OFF)
on (HWTSTAMP_TX_ON)
Hardware Receive Filter Modes:
none (HWTSTAMP_FILTER_NONE)
all (HWTSTAMP_FILTER_ALL)

Hardware mode seems to work with ntpd:

# sh -v Do-ptp-test

# killall ptp4l phc2sys
ptp4l: no process found
phc2sys: no process found
# killall ptp4l phc2sys
ptp4l: no process found
phc2sys: no process found
# cat ptp.conf
[global]
clock_servo linreg
uds_address /var/run/ptp4l

# ptp4l -i eth2 -l 6 -m -f ptp.conf &
# sleep 3
ptp4l[1075.704]: selected /dev/ptp2 as PTP clock
ptp4l[1075.704]: driver changed our HWTSTAMP options
ptp4l[1075.705]: tx_type 1 not 1
ptp4l[1075.705]: rx_filter 1 not 12
ptp4l[1075.705]: port 1: INITIALIZING to LISTENING on INITIALIZE
ptp4l[1075.705]: port 0: INITIALIZING to LISTENING on INITIALIZE
ptp4l[1076.211]: port 1: new foreign master 003048.fffe.345fe2-1
# phc2sys -a -r -E ntpshm -m -M 2
phc2sys[1079.705]: reconfiguring after port state change
phc2sys[1079.705]: selecting eth2 for synchronization
phc2sys[1079.705]: nothing to synchronize
ptp4l[1080.211]: selected best master clock 003048.fffe.345fe2
ptp4l[1080.211]: foreign master not using PTP timescale
ptp4l[1080.211]: port 1: LISTENING to UNCALIBRATED on RS_SLAVE
phc2sys[1080.705]: port 002590.fffe.f355db-1 changed state
phc2sys[1080.705]: reconfiguring after port state change
phc2sys[1080.705]: master clock not ready, waiting...
ptp4l[1082.226]: master offset -18727 s0 freq +12157 path delay 37953
ptp4l[1083.226]: master offset -21606 s0 freq +12157 path delay 37953
ptp4l[1084.226]: master offset -30465 s0 freq +12157 path delay 43424
ptp4l[1085.226]: master offset -33206 s1 freq +6288 path delay 43424
ptp4l[1086.226]: master offset -4179 s2 freq +2178 path delay 43424
ptp4l[1086.226]: port 1: UNCALIBRATED to SLAVE on MASTER_CLOCK_SELECTED
phc2sys[1086.705]: port 002590.fffe.f355db-1 changed state
phc2sys[1086.705]: reconfiguring after port state change
phc2sys[1086.705]: selecting CLOCK_REALTIME for synchronization
phc2sys[1086.705]: selecting eth2 as the master clock
phc2sys[1086.706]: phc offset 2017 s0 freq +0 delay 6582
ptp4l[1087.226]: master offset 2814 s2 freq +8577 path delay 43424
phc2sys[1087.706]: phc offset -1975 s0 freq +0 delay 6513
ptp4l[1088.226]: master offset 8420 s2 freq +14060 path delay 38989
phc2sys[1088.706]: phc offset 48 s0 freq +0 delay 6556
ptp4l[1089.226]: master offset -6617 s2 freq +5060 path delay 48896
phc2sys[1089.706]: phc offset 558 s0 freq +0 delay 6527
ptp4l[1090.226]: master offset 1944 s2 freq +8547 path delay 44804
phc2sys[1090.706]: phc offset -2116 s0 freq +0 delay 6113
ptp4l[1091.226]: master offset 6543 s2 freq +12865 path delay 40713
phc2sys[1091.706]: phc offset -420 s0 freq +0 delay 6656
ptp4l[1092.226]: master offset 1050 s2 freq +14433 path delay 42385
phc2sys[1092.706]: phc offset 3724 s0 freq +0 delay 6120
phc2sys[1092.791]: phc offset -1882 s0 freq +0 delay 19735

So the times look good.
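The "driver changed our HWTSTAMP options" lines above compare the value the driver actually set with the value ptp4l asked for. Here is a minimal sketch for decoding them. It assumes the numeric values of the Linux hwtstamp_tx_types and hwtstamp_rx_filters enums from linux/net_tstamp.h (only the long-standing entries are listed) and assumes the "actual not requested" order of the two numbers. Read this way, "rx_filter 1 not 12" means ptp4l asked for HWTSTAMP_FILTER_PTP_V2_EVENT and the e1000e driver substituted HWTSTAMP_FILTER_ALL, the only non-NONE filter this NIC advertises.

```python
import re

# Numeric values of the Linux hwtstamp_tx_types / hwtstamp_rx_filters enums
# (linux/net_tstamp.h). Only the long-standing entries are listed here;
# newer kernels define additional values.
TX_TYPES = {
    0: "HWTSTAMP_TX_OFF",
    1: "HWTSTAMP_TX_ON",
    2: "HWTSTAMP_TX_ONESTEP_SYNC",
}
RX_FILTERS = {
    0: "HWTSTAMP_FILTER_NONE",
    1: "HWTSTAMP_FILTER_ALL",
    2: "HWTSTAMP_FILTER_SOME",
    3: "HWTSTAMP_FILTER_PTP_V1_L4_EVENT",
    4: "HWTSTAMP_FILTER_PTP_V1_L4_SYNC",
    5: "HWTSTAMP_FILTER_PTP_V1_L4_DELAY_REQ",
    6: "HWTSTAMP_FILTER_PTP_V2_L4_EVENT",
    7: "HWTSTAMP_FILTER_PTP_V2_L4_SYNC",
    8: "HWTSTAMP_FILTER_PTP_V2_L4_DELAY_REQ",
    9: "HWTSTAMP_FILTER_PTP_V2_L2_EVENT",
    10: "HWTSTAMP_FILTER_PTP_V2_L2_SYNC",
    11: "HWTSTAMP_FILTER_PTP_V2_L2_DELAY_REQ",
    12: "HWTSTAMP_FILTER_PTP_V2_EVENT",
    13: "HWTSTAMP_FILTER_PTP_V2_SYNC",
    14: "HWTSTAMP_FILTER_PTP_V2_DELAY_REQ",
}

# Matches "tx_type 1 not 1" / "rx_filter 1 not 12": value the driver
# actually set, then the value ptp4l asked for (assumed order).
PATTERN = re.compile(r"(tx_type|rx_filter)\s+(\d+)\s+not\s+(\d+)")


def decode(line):
    """Return (field, actual_name, requested_name), or None if no match."""
    m = PATTERN.search(line)
    if m is None:
        return None
    field = m.group(1)
    table = TX_TYPES if field == "tx_type" else RX_FILTERS

    def name(value):
        return table.get(value, f"unknown({value})")

    return field, name(int(m.group(2))), name(int(m.group(3)))


for log_line in ("ptp4l[1075.705]: tx_type 1 not 1",
                 "ptp4l[1075.705]: rx_filter 1 not 12"):
    print(decode(log_line))
```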

ntpmon shows ntpshm getting good times:

kong gpsd # ntpmon
sample NTP2 1424906561.648842149 1424906561.544090246 1424906561.544098174 0 -30
sample NTP2 1424906562.545280731 1424906562.544236791 1424906562.544243340 0 -30
sample NTP2 1424906563.545125632 1424906563.544377748 1424906563.544381591 0 -30
sample NTP2 1424906564.545161448 1424906564.544545477 1424906564.544546037 0 -30
sample NTP2 1424906564.965674501 1424906564.965073008 1424906564.965078572 0 -30
sample NTP2 1424906577.747901488 1424906577.746959372 1424906577.746957355 0 -30
sample NTP2 1424906578.747597601 1424906578.747086914 1424906578.747088889 0 -30
sample NTP2 1424906579.748152844 1424906579.747224639 1424906579.747224591 0 -30
sample NTP2 1424906580.747688039 1424906580.747367552 1424906580.747366994 0 -30
sample NTP2 1424906581.748464762 1424906581.747523310 1424906581.747525426 0 -30
sample NTP2 1424906582.748521691 1424906582.747689166 1424906582.747689586 0 -30
sample NTP2 1424906583.748467526 1424906583.747841769 1424906583.747838045 0 -30
sample NTP2 1424906583.833281041 1424906583.832562220 1424906583.832564102 0 -30

But nothing is showing up in chronyc???

Hmm:

# ipcs -m

------ Shared Memory Segments --------
key shmid owner perms bytes nattch status
0x4e545030 0 root 600 96 2
0x4e545031 32769 root 600 96 2
0x4e545032 65538 root 600 96 2
^^^

Ah. I started phc2sys before chronyd, and phc2sys created the ntpshm segments
with the wrong permissions! NTP2 should be perms 666, not 600. That should
be an easy bug to fix.
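The keys in the ipcs listing encode the SHM unit: 0x4e545030 is ASCII "NTP0", and unit N uses that base plus N. A minimal sketch that scans ipcs -m output for NTP SHM segments and flags unexpected permissions, assuming the ntpd convention described above (units 0 and 1 root-only at 600, units 2 and up at 666). The helper names and the upper bound on the unit number are hypothetical choices for this sketch.

```python
# ntpd-style SHM keys: 0x4E545030 is ASCII "NTP0"; unit N uses base + N.
NTP_SHM_BASE = 0x4E545030
MAX_UNIT = 255  # arbitrary upper bound chosen for this sketch


def expected_perms(unit):
    """Units 0 and 1 root-only (600); units 2 and up world-accessible (666)."""
    return "600" if unit < 2 else "666"


def check_ipcs(text):
    """Return (unit, perms, expected) for NTP SHM segments with odd perms."""
    problems = []
    for line in text.splitlines():
        fields = line.split()
        # Data rows look like: key shmid owner perms bytes nattch [status]
        if len(fields) < 4 or not fields[0].startswith("0x"):
            continue
        unit = int(fields[0], 16) - NTP_SHM_BASE
        if not 0 <= unit <= MAX_UNIT:
            continue  # not an NTP SHM segment
        perms = fields[3]
        if perms != expected_perms(unit):
            problems.append((unit, perms, expected_perms(unit)))
    return problems


IPCS_OUTPUT = """\
------ Shared Memory Segments --------
key shmid owner perms bytes nattch status
0x4e545030 0 root 600 96 2
0x4e545031 32769 root 600 96 2
0x4e545032 65538 root 600 96 2
"""

print(bytes.fromhex("4e545030"))  # b'NTP0'
print(check_ipcs(IPCS_OUTPUT))    # [(2, '600', '666')]
```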

So, stop everything, and change to a new ntpd that can read from a mode-600 SHM.

And now ntpd is reading, and syncing, to my SHM.

My host did not change: no software updates, no change to my test scripts,
and the same e1000e driver. The only change is the replacement of the
i217-LM with the 82574L NIC.

So, that's two bugs to fix: broken i217-LM, and broken SHM permissions.

Until the SHM permissions are fixed, working with chronyd is problematic
for SHM units 2 and up.

And until the i217-LM is fixed it should be documented in the README
as broken.

Too soon to tell, but the jitter seems pretty awful. Maybe 2 mSec?
Time will tell; maybe it will settle down. I had hoped hardware mode
would actually be better than software mode. Maybe it will converge
overnight? In general ntpd converges much worse than chronyd, so that
could be part of the issue.

RGDS
GARY
---------------------------------------------------------------------------
Gary E. Miller Rellim 109 NW Wilmington Ave., Suite E, Bend, OR 97701
***@rellim.com Tel:+1(541)382-8588
Keller, Jacob E
2015-02-26 00:32:27 UTC
Hi,
Post by Gary E. Miller
Yo All!
Different day, new results.
Looks like much better results. That is at least good for solving this
issue.
Post by Gary E. Miller
Intel Corporation 82574L Gigabit Network Connection
They use the e1000e driver, but report fewer capabilities to ethtool
kong ~ # ethtool -T eth2
hardware-transmit (SOF_TIMESTAMPING_TX_HARDWARE)
software-transmit (SOF_TIMESTAMPING_TX_SOFTWARE)
hardware-receive (SOF_TIMESTAMPING_RX_HARDWARE)
software-receive (SOF_TIMESTAMPING_RX_SOFTWARE)
software-system-clock (SOF_TIMESTAMPING_SOFTWARE)
hardware-raw-clock (SOF_TIMESTAMPING_RAW_HARDWARE)
PTP Hardware Clock: 2
off (HWTSTAMP_TX_OFF)
on (HWTSTAMP_TX_ON)
none (HWTSTAMP_FILTER_NONE)
all (HWTSTAMP_FILTER_ALL)
Yeah, in general all you really want is HWTSTAMP_FILTER_ALL; it's a much
better implementation.

At any rate, I would love to see the full output of just ptp4l running
on the i217-LM without phc2sys and so forth interfering. Mostly I would
like to get enough information to see if we can root-cause why that
particular hardware was failing.

You might try the out-of-tree SourceForge module for e1000e on the Intel
e1000e SourceForge page, because it may (or may not) have bug fixes
compared to upstream. I honestly don't know what state upstream e1000e
is in vs. the SourceForge driver.

Regards,
Jake
Gary E. Miller
2015-02-26 00:48:52 UTC
Yo Jacob E!

On Thu, 26 Feb 2015 00:32:27 +0000
Post by Keller, Jacob E
Yeah, in general all you really want is HWTSTAMP_FILTER_ALL; it's a
much better implementation.
So what is the minimum for hardware mode timestamping? Like this?

HWTSTAMP_TX_OFF
HWTSTAMP_TX_ON
HWTSTAMP_FILTER_ALL
Post by Keller, Jacob E
At any rate, I would love to see the full output of just ptp4l running
on the i217-LM without phc2sys and so forth interfering. Mostly I
would like to get enough information to see if we can root-cause why
that particular hardware was failing.
That was in my email to linuxptp and you yesterday at 14:54.

Want me to forward a copy?
Post by Keller, Jacob E
You might try the out-of-tree SourceForge module for e1000e on the
Intel e1000e SourceForge page, because it may (or may not) have bug
fixes compared to upstream. I honestly don't know what state upstream
e1000e is in vs. the SourceForge driver.
No commits to e1000e on SourceForge since June 2014. The 3.19 kernel
files are Dec 2014.

Many NIC choices in the world, I'm not gonna waste my time on any one of
them.

RGDS
GARY
Keller, Jacob E
2015-02-26 01:01:36 UTC
Hi,
Post by Gary E. Miller
Yo Jacob E!
On Thu, 26 Feb 2015 00:32:27 +0000
Post by Keller, Jacob E
Yeah, in general all you really want is HWTSTAMP_FILTER_ALL; it's a
much better implementation.
So what is the minimum for hardware mode timestamping? Like this?
HWTSTAMP_TX_OFF
HWTSTAMP_TX_ON
HWTSTAMP_FILTER_ALL
HWTSTAMP_TX_OFF is always supported. HWTSTAMP_TX_ON is required for
Hardware Tx timestamps.

HWTSTAMP_FILTER_ALL is best, otherwise you need the requisite modes for
your configuration.

Mostly, for V2 layer 4 with end-to-end delay, you need
HWTSTAMP_FILTER_PTP_V2_SYNC and
HWTSTAMP_FILTER_PTP_V2_DELAY_REQ

For P2P delay protocol you need to be able to timestamp PDELAY_REQ and
PDELAY_RESPONSE messages,

and for L2 mode you have to be able to timestamp L2 equivalents of
these. The most general non-timestamp-all mode is

HWTSTAMP_FILTER_PTP_V2_EVENT

ptp4l will try HWTSTAMP_FILTER_ALL if it's available and degrade to more
general filters until it finds either a working combination or exits
saying the required mode isn't supported.
Post by Gary E. Miller
Post by Keller, Jacob E
At any rate, I would love to see the full output of just ptp4l running
on the i217-LM without phc2sys and so forth interfering. Mostly I
would like to get enough information to see if we can root-cause why
that particular hardware was failing.
That was in my email to linuxptp and you yesterday at 14:54.
Want me to forward a copy?
The output you had there didn't showcase the actual failure with the
clockcheck showing a massive change in the clock. Either it didn't run
long enough or the failure case was triggered by phc2sys or some other
setup.
Post by Gary E. Miller
Post by Keller, Jacob E
You might try the out-of-tree SourceForge module for e1000e on the
Intel e1000e SourceForge page, because it may (or may not) have bug
fixes compared to upstream. I honestly don't know what state upstream
e1000e is in vs. the SourceForge driver.
No commits to e1000e on SourceForge since June 2014. The 3.19 kernel
files are Dec 2014.
Many NIC choices in the world, I'm not gonna waste my time on any one of
them.
Yep. Well, I'll try to forward what we do have to validation for that
part here. Thanks for the effort so far at least :) I understand that it
isn't worth too much effort on your end.

I am glad that you were able to get to at least a somewhat sane setup
finally.

Regards,
Jake
Gary E. Miller
2015-02-26 01:47:12 UTC
Yo Jacob E!

On Thu, 26 Feb 2015 01:01:36 +0000
Post by Keller, Jacob E
Post by Gary E. Miller
So what is the minimum for hardware mode timestamping? Like this?
HWTSTAMP_TX_OFF
HWTSTAMP_TX_ON
HWTSTAMP_FILTER_ALL
HWTSTAMP_TX_OFF is always supported. HWTSTAMP_TX_ON is required for
Hardware Tx timestamps.
HWTSTAMP_FILTER_ALL is best, otherwise you need the requisite modes
for your configuration.
Mostly, for V2 layer 4 with end-to-end delay, you need
HWTSTAMP_FILTER_PTP_V2_SYNC and
HWTSTAMP_FILTER_PTP_V2_DELAY_REQ
For P2P delay protocol you need to be able to timestamp PDELAY_REQ and
PDELAY_RESPONSE messages,
and for L2 mode you have to be able to timestamp L2 equivalents of
these. The most general non-timestamp-all mode is
HWTSTAMP_FILTER_PTP_V2_EVENT
ptp4l will try HWTSTAMP_FILTER_ALL if it's available and degrade to
more general filters until it finds either a working combination or
exits saying the required mode isn't supported.
I'm trying to make this real simple. :-)

So, if HWTSTAMP_TX_ON is present, can I know the NIC should be supported
for hardware time?
Post by Keller, Jacob E
The output you had there didn't showcase the actual failure with the
clockcheck showing a massive change in the clock. Either it didn't run
long enough or the failure case was triggered by phc2sys or some other
setup.
Agreed. But the phc2sys failure always happens in under 60 seconds, and
I could never get the failure to happen with just ptp4l. Since I could
never duplicate the failure with ptp4l alone, there was nothing to show.
Post by Keller, Jacob E
Post by Gary E. Miller
Many NIC choices in the world, I'm not gonna waste my time on any
one of them.
Yep. Well, I'll try to forward what we do have to validation for that
part here. Thanks for the effort so far at least :) I understand that
it isn't worth too much effort on your end.
Where can I send my consulting bill? :-)

If some engineer who could really fix something would look at it,
I would revisit the part.
Post by Keller, Jacob E
I am glad that you were able to get to at least a somewhat sane setup
finally.
I have two now. More on that in another email.

BTW, are you in Hillsboro?

RGDS
GARY
Richard Cochran
2015-02-26 07:29:08 UTC
Post by Gary E. Miller
I'm trying to make this real simple. :-)
So, if HWTSTAMP_TX_ON is present, can I know the NIC should be supported
for hardware time?
Simple answer: Pick a card that offers HWTSTAMP_TX_ON and
HWTSTAMP_FILTER_PTP_V2_EVENT.
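That rule of thumb can be checked mechanically against ethtool -T output, where each kernel constant name appears in parentheses and the generic PTPv2 event filter is spelled HWTSTAMP_FILTER_PTP_V2_EVENT. A minimal sketch, assuming that output layout; the helper name and the returned dictionary keys are hypothetical. Fed the 82574L output from the start of this thread, it reports a PHC and HWTSTAMP_FILTER_ALL but no PTPv2 event filter, so the card does not meet the rule.

```python
import re


def ptp_capabilities(ethtool_output):
    """Summarize `ethtool -T` output (hypothetical helper)."""
    # ethtool prints each kernel constant name in parentheses.
    names = set(re.findall(r"\(([A-Z0-9_]+)\)", ethtool_output))
    phc = re.search(r"PTP Hardware Clock:\s*(\d+)", ethtool_output)
    caps = {
        "phc_index": int(phc.group(1)) if phc else None,
        "tx_on": "HWTSTAMP_TX_ON" in names,
        "rx_v2_event": "HWTSTAMP_FILTER_PTP_V2_EVENT" in names,
        "rx_all": "HWTSTAMP_FILTER_ALL" in names,
    }
    # Richard's rule of thumb: TX_ON plus the generic PTPv2 event filter.
    caps["meets_rule"] = caps["tx_on"] and caps["rx_v2_event"]
    return caps


I82574L = """\
Time stamping parameters for eth2:
Capabilities:
hardware-transmit (SOF_TIMESTAMPING_TX_HARDWARE)
hardware-receive (SOF_TIMESTAMPING_RX_HARDWARE)
hardware-raw-clock (SOF_TIMESTAMPING_RAW_HARDWARE)
PTP Hardware Clock: 2
Hardware Transmit Timestamp Modes:
off (HWTSTAMP_TX_OFF)
on (HWTSTAMP_TX_ON)
Hardware Receive Filter Modes:
none (HWTSTAMP_FILTER_NONE)
all (HWTSTAMP_FILTER_ALL)
"""

print(ptp_capabilities(I82574L))
```

Matching whole parenthesized names, rather than substrings, keeps HWTSTAMP_TX_ON from also matching HWTSTAMP_TX_ONESTEP_SYNC.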
Post by Gary E. Miller
Agreed. But the phc2sys failure always happens in under 60 seconds, and
I could never get the failure to happen with just ptp4l. Since I could
never duplicate the failure with ptp4l alone, there was nothing to show.
@Jason: Got private email this week from another person using the
I217-LM. Here is what they wrote:

The offset all of a sudden jumps by 40000+ seconds. I would think
that if it was an issue with just reading the timer that the servo
would help throw out spurious values like that, so I suspect we are
somehow actually corrupting the timer. We wrote our own little test
program that only calls get time on the PTP timer, and when we call
it more frequently than 1us is when we really start corrupting the
timer with a vengeance.

So that gives a clear-cut test case that triggers the bug.
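A rough sketch of that kind of test, assuming a Linux PHC exposed as /dev/ptpN and read through a dynamic POSIX clock id built with the FD_TO_CLOCKID formula used by linuxptp and the kernel's testptp example. The helper names, the default device path and the jump threshold are hypothetical. Python's per-call overhead may keep it from reaching the sub-microsecond call rate described above, so a C version may be needed to actually reproduce the corruption. Only run hammer_phc on a test machine, since on an affected I217-LM it may corrupt the clock.

```python
import os
import time


def fd_to_clockid(fd):
    """Dynamic POSIX clock id for an open /dev/ptpN descriptor.

    Same formula as the FD_TO_CLOCKID macro: (~fd << 3) | 3.
    """
    return ((~fd) << 3) | 3


def find_jumps(read_ns, samples, threshold_ns):
    """Call read_ns() repeatedly and return (index, delta_ns) for every
    step that goes backwards or forward by more than threshold_ns."""
    jumps = []
    prev = read_ns()
    for i in range(1, samples):
        now = read_ns()
        delta = now - prev
        if delta < 0 or delta > threshold_ns:
            jumps.append((i, delta))
        prev = now
    return jumps


def hammer_phc(path="/dev/ptp0", samples=1_000_000, threshold_ns=1_000_000):
    """Read a PHC back-to-back and report suspicious jumps.

    The default device path is an assumption; pick the PHC index that
    `ethtool -T` reports for the NIC under test.
    """
    fd = os.open(path, os.O_RDONLY)
    try:
        clk = fd_to_clockid(fd)
        return find_jumps(lambda: time.clock_gettime_ns(clk),
                          samples, threshold_ns)
    finally:
        os.close(fd)
```

Usage on a test box would be something like print(hammer_phc("/dev/ptp2")); an empty list means no backwards steps or large forward jumps were seen.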

Thanks,
Richard
Gary E. Miller
2015-02-26 08:18:48 UTC
Yo Richard!

On Thu, 26 Feb 2015 08:29:08 +0100
Post by Richard Cochran
Post by Gary E. Miller
I'm trying to make this real simple. :-)
So, if HWTSTAMP_TX_ON is present, can I know the NIC should be
supported for hardware time?
Simple answer: Pick a card that offers HWTSTAMP_TX_ON and
HWTSTAMP_FILTER_PTP_V2_EVENT.
So the three I have on the recommended list are not good? Yeah, I
gotta agree.

Any idea what a suitable card may be?

RGDS
GARY
Richard Cochran
2015-02-26 09:08:05 UTC
Post by Gary E. Miller
So the three I have on the recommended list are not good? Yeah, I
gotta agree.
You have these three, I217-LM, 82574, and I210?

- I217-LM HW bug
- 82574 not a great ptp design
- I210 best PCIe card I have tried
Post by Gary E. Miller
Any idea what a suitable card may be?
In my experience, the two best Intel cards are the 82580 and the
i210. The i210 has HW inputs and outputs on a 6-pin header. That
opens up some interesting applications.

Cheers,
Richard
Gary E. Miller
2015-02-26 20:08:51 UTC
Yo Richard!

On Thu, 26 Feb 2015 10:08:05 +0100
Post by Richard Cochran
Post by Gary E. Miller
So the three I have on the recommended list are not good? Yeah, I
gotta agree.
You have these three, I217-LM, 82574, and I210?
- I217-LM HW bug
- 82574 not a great ptp design
- I210 best PCIe card I have tried
Yup. And my hardware timestamping experience:

- I217-LM HW bug
- 82574 6 mSec or worse jitter
- I210 300 to 900 mSec persistent offset
Post by Richard Cochran
Post by Gary E. Miller
Any idea what a suitable card may be?
In my experience, the two best Intel cards are the 82580 and the
i210.
Well, the i210 is not working so well for me.
Post by Richard Cochran
The i210 has HW inputs and outputs on a 6-pin header. That
opens up some interesting applications.
Hmm, I just checked the docs on my Supermicro X10SAE. No such header.

I guess I'll order an 82580...

RGDS
GARY
Vick, Matthew
2015-02-26 21:16:21 UTC
Post by Gary E. Miller
Yo Richard!
On Thu, 26 Feb 2015 10:08:05 +0100
Post by Richard Cochran
Post by Gary E. Miller
So the three I have on the recommended list are not good? Yeah, I
gotta agree.
You have these three, I217-LM, 82574, and I210?
- I217-LM HW bug
- 82574 not a great ptp design
- I210 best PCIe card I have tried
- I217-LM HW bug
- 82574 6 mSec or worse jitter
- I210 300 to 900 mSec persistent offset
Woah. I wouldn't have expected that (at least, I've never seen that with
I210). That's approaching "broken BIOS/platform" levels of bad. If you
could provide some more information (test environment, what you're
connecting to, driver version, kernel version, platform, yadda yadda) for
I210/igb we will forward it along internally.

One other tidbit is that I210 supports EEE, which can affect jitter,
although I wouldn't expect it on that level. You can try turning this off
via ethtool (ethtool --set-eee ethX eee off) to see if that helps.

Cheers,
Matthew
Gary E. Miller
2015-02-26 21:31:33 UTC
Yo Matthew!

On Thu, 26 Feb 2015 21:16:21 +0000
Post by Vick, Matthew
Post by Richard Cochran
- I217-LM HW bug
- 82574 6 mSec or worse jitter
- I210 300 to 900 mSec persistent offset
Woah. I wouldn't have expected that (at least, I've never seen that
with I210). That's approaching "broken BIOS/platform" levels of bad.
If you could provide some more information (test environment, what
you're connecting to, driver version, kernel version, platform, yadda
yadda) for I210/igb we will forward it along internally.
I'll be out for the weekend, but briefly, here is my config:

# killall ptp4l phc2sys
# killall ptp4l phc2sys
# cat ptp.conf
[global]
clock_servo linreg
uds_address /var/run/ptp4l

# ptp4l -i eth1 -l 6 -m -f ptp.conf &
# sleep 3
# phc2sys -a -r -E ntpshm -m -M 2

Here is what chrony sees after running for 18 hours.

This is the I210:

#x SHM2 0 4 377 8 -212ms[ -212ms] +/- 1000ns

This is a local reference clock over NTP

^* spidey.rellim.com 1 8 377 135 +10us[ +14us] +/- 182us

I do not have time now, but I have directly connected a PPS source
previously and the spidey time is correct. Spidey is PPS synced (jitter
~200 nSec) and my local grand master. Other timestamp software slaves
of spidey see about 6 uSec or less of jitter and small offset. So spidey
is rock solid in the uSec range.
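To put the two chronyc lines above on the same scale, the "last sample" column (adjusted offset, the measured offset in brackets, then the +/- error bound) can be converted to seconds. A minimal sketch, assuming chronyc only uses the ns, us, ms and s suffixes seen here; the helper name is hypothetical.

```python
import re

UNITS = {"ns": 1e-9, "us": 1e-6, "ms": 1e-3, "s": 1.0}
NUM = r"([+-]?\d+(?:\.\d+)?)(ns|us|ms|s)"
# "Last sample" column: adjusted offset [measured offset] +/- error
LAST_SAMPLE = re.compile(NUM + r"\[\s*" + NUM + r"\]\s*\+/-\s*" + NUM)


def parse_last_sample(line):
    """Return (adjusted, measured, error) in seconds from a sources line."""
    m = LAST_SAMPLE.search(line)
    if m is None:
        raise ValueError(f"no sample column in: {line!r}")
    g = m.groups()
    return tuple(float(g[i]) * UNITS[g[i + 1]] for i in (0, 2, 4))


shm2 = parse_last_sample("#x SHM2 0 4 377 8 -212ms[ -212ms] +/- 1000ns")
spidey = parse_last_sample(
    "^* spidey.rellim.com 1 8 377 135 +10us[ +14us] +/- 182us")
print(f"SHM2 minus spidey: {shm2[0] - spidey[0]:+.6f} s")
```

The difference is dominated by the SHM2 offset, which is far larger than its claimed +/- 1000ns error bound.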
Post by Vick, Matthew
One other tidbit is that I210 supports EEE, which can affect jitter,
although I wouldn't expect it on that level. You can try turning this
off via ethtool (ethtool --set-eee ethX eee off) to see if that helps.
I'll try that next week. But my problem is offset. Plus chronyd is doing
something wrong, as there is no way there is 1000 nSec jitter; I can see
the offset jumping around!

RGDS
GARY
Miroslav Lichvar
2015-02-27 06:48:09 UTC
Post by Gary E. Miller
#x SHM2 0 4 377 8 -212ms[ -212ms] +/- 1000ns
This is a local reference clock over NTP
^* spidey.rellim.com 1 8 377 135 +10us[ +14us] +/- 182us
I do not have time now, but I have directly connected a PPS source
previously and the spidey time is correct. Spidey is PPS synced (jitter
~200 nSec) and my local grand master. Other timestamp software slaves
of spidey see about 6 uSec or less of jitter and small offset. So spidey
is rock solid in the uSec range.
I don't see any problems like that on my testing system with i210
(kernel 3.17.8). The PTP time with HW timestamping agrees with an NTP
source (synchronized by GPS PPS) to within a few microseconds.
Post by Gary E. Miller
Post by Vick, Matthew
One other tidbit is that I210 supports EEE, which can affect jitter,
although I wouldn't expect it on that level. You can try turning this
off via ethtool (ethtool --set-eee ethX eee off) to see if that helps.
I'll try that next week. But my problem is offset. Plus chronyd is doing
something wrong, as there is no way there is 1000 nSec jitter; I can see
the offset jumping around!
The default precision of the SHM refclock in chrony is 1 microsecond,
it won't report jitter smaller than that. Add "precision 1e-9" to the
SHM line in your chrony.conf to fix that. The configuration files
generated by timemaster include that. Ideally chronyd would be using
the value from the SHM samples, but some clients didn't set this
correctly, so it is currently ignored.
--
Miroslav Lichvar
Gary E. Miller
2015-03-03 00:29:37 UTC
Yo Miroslav!

On Fri, 27 Feb 2015 07:48:09 +0100
Post by Miroslav Lichvar
Post by Gary E. Miller
Post by Vick, Matthew
One other tidbit is that I210 supports EEE, which can affect
jitter, although I wouldn't expect it on that level. You can try
turning this off via ethtool (ethtool --set-eee ethX eee off) to
see if that helps.
I'll try that next week. But my problem is offset. Plus chronyd is
doing something wrong, as there is no way there is 1000 nSec jitter; I
can see the offset jumping around!
The default precision of the SHM refclock in chrony is 1 microsecond,
it won't report jitter smaller than that. Add "precision 1e-9" to the
SHM line in your chrony.conf to fix that.
Ah, that explains a lot. Will that fix the jitter computation?
Post by Miroslav Lichvar
Ideally chronyd would be using
the value from the SHM samples, but some clients didn't set this
correctly, so it is currently ignored.
Easy to handle. ntpd sets the precision to -20 when it opens the
shm. If the client never updates the precision that is now OK. If
the client does update the precision even better.
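The SHM precision field is a power-of-two exponent in seconds: ntpd's default of -20 is roughly a microsecond, and the trailing -30 in the ntpmon samples earlier in the thread (presumably the precision column) is roughly a nanosecond. chrony's precision option, by contrast, takes seconds. A small sketch for converting between the two; the helper names are hypothetical.

```python
import math


def log2_to_seconds(exponent):
    """SHM/NTP precision is a power-of-two exponent in seconds."""
    return 2.0 ** exponent


def seconds_to_log2(seconds):
    """Nearest integer exponent for a precision given in seconds."""
    return round(math.log2(seconds))


for p in (-20, -30):
    print(p, log2_to_seconds(p))  # about 0.95 us and 0.93 ns
print(seconds_to_log2(1e-6), seconds_to_log2(1e-9))  # -20 -30
```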

RGDS
GARY
Miroslav Lichvar
2015-03-03 08:31:58 UTC
Post by Gary E. Miller
Post by Miroslav Lichvar
The default precision of the SHM refclock in chrony is 1 microsecond,
it won't report jitter smaller than that. Add "precision 1e-9" to the
SHM line in your chrony.conf to fix that.
Ah, that explains a lot. Will that fix the jitter computation?
Yes, the +/- value in the chronyc sources output should be smaller
than 1 us now. It's mostly a cosmetic issue, it likely won't have any
noticeable effect on the synchronization. The precision is the minimum
allowed value of dispersion to avoid zero dispersion with low
resolution refclocks, e.g. with microsecond SHM it is possible to get
several samples with 0 offset, the dispersion would be zero and it
would then break all kinds of things.
--
Miroslav Lichvar
Gary E. Miller
2015-03-03 08:38:41 UTC
Permalink
Yo Miroslav!

On Tue, 3 Mar 2015 09:31:58 +0100
Post by Miroslav Lichvar
Post by Gary E. Miller
Post by Miroslav Lichvar
The default precision of the SHM refclock in chrony is 1
microsecond, it won't report jitter smaller than that. Add
"precision 1e-9" to the SHM line in your chrony.conf to fix that.
Ah, that explains a lot. Will that fix the jitter computation?
Yes, the +/- value in the chronyc sources output should be smaller
than 1 us now. It's mostly a cosmetic issue, it likely won't have any
noticeable effect on the synchronization. The precision is the minimum
allowed value of dispersion to avoid zero dispersion with low
resolution refclocks, e.g. with microsecond SHM it is possible to get
several samples with 0 offset, the dispersion would be zero and it
would then break all kinds of things.
Oh, great.

First, why does chronyd not support uSec SHM? (I usually use the SOCK)

Two, so if I make it perfect (zero offset) things break?

RGDS
GARY
Miroslav Lichvar
2015-03-03 09:28:37 UTC
Post by Gary E. Miller
Post by Miroslav Lichvar
Post by Gary E. Miller
Ah, that explains a lot. Will that fix the jitter computation?
Yes, the +/- value in the chronyc sources output should be smaller
than 1 us now. It's mostly a cosmetic issue, it likely won't have any
noticeable effect on the synchronization. The precision is the minimum
allowed value of dispersion to avoid zero dispersion with low
resolution refclocks, e.g. with microsecond SHM it is possible to get
several samples with 0 offset, the dispersion would be zero and it
would then break all kinds of things.
Oh, great.
First, why does chronyd not support uSec SHM? (I usually use the SOCK)
It does support both microsecond and nanosecond resolution in SHM. The
default value of precision is set for microseconds, because using
nanosecond resolution with microsecond precision is much better than
using microsecond resolution with nanosecond precision, as some samples
could have three orders of magnitude smaller dispersion than others.
Post by Gary E. Miller
Two, so if I make it perfect (zero offset) things break?
No, I was just trying to explain that the refclock precision prevents
that problem. If several consecutive refclock samples have identical
offset, the calculated dispersion might be zero, but it will be set
to the precision before accumulating the sample to avoid division by
zero etc.
--
Miroslav Lichvar
Gary E. Miller
2015-03-03 19:44:47 UTC
Yo Miroslav!

On Tue, 3 Mar 2015 10:28:37 +0100
Post by Miroslav Lichvar
Post by Gary E. Miller
Post by Miroslav Lichvar
Post by Gary E. Miller
Ah, that explains a lot. Will that fix the jitter computation?
Yes, the +/- value in the chronyc sources output should be smaller
than 1 us now. It's mostly a cosmetic issue, it likely won't have
any noticeable effect on the synchronization. The precision is
the minimum allowed value of dispersion to avoid zero dispersion
with low resolution refclocks, e.g. with microsecond SHM it is
possible to get several samples with 0 offset, the dispersion
would be zero and it would then break all kinds of things.
Oh, great.
First, why does chronyd not support uSec SHM? (I usually use the SOCK)
It does support both microsecond and nanosecond resolution in SHM.
Very odd. That is not what I have been seeing, so instead of unimplemented
it must be broken.

Post by Miroslav Lichvar
The default value of precision is set for microseconds, because using
nanosecond resolution with microsecond precision is much better than
using microsecond resolution with nanosecond precision, as some samples
could have three orders of magnitude smaller dispersion than others.
Yeah, I already went back and fixed the howto and my configs. Still
no nSec on chronyd SHMs, but it works on ntpd.
Post by Miroslav Lichvar
Post by Gary E. Miller
Two, so if I make it perfect (zero offset) things break?
No, I was just trying to explain that the refclock precision prevents
that problem. If several consecutive refclock samples have identical
offset, the calculated dispersion might be zero, but it will be set
to the precision before accumulating the sample to avoid division by
zero etc.
So if the offset is zero it is set to the dispersion? That is not good.

Most people just set the precision approximately and get much better
jitter/offset than that. When they hit the nail on the head this
will add back a lot of 'jitter'. Hitting your thumb.

RGDS
GARY
Miroslav Lichvar
2015-03-04 08:24:33 UTC
Post by Gary E. Miller
Post by Miroslav Lichvar
Post by Gary E. Miller
First, why does chronyd not support uSec SHM? (I usually use the SOCK)
It does support both microsecond and nanosecond resolution in SHM.
Very odd. That is not what I have been seeing, so instead of unimplemented
it must be broken.
What exactly do you see?

When I run

# chronyd 'refclock SHM 10 poll 2' 'logdir logs' 'log refclocks'
# phc2sys -E ntpshm -M 10 -m -ar

I see in refclocks.log

2015-03-04 08:12:39.659890 SHM1 0 N 0 -9.000000e-09 1.296176e-08 1.000e-06
2015-03-04 08:12:40.660008 SHM1 1 N 0 1.200000e-08 1.197851e-08 1.000e-06
2015-03-04 08:12:41.660120 SHM1 2 N 0 -1.400000e-08 -1.402629e-08 1.000e-06
2015-03-04 08:12:42.660227 SHM1 3 N 0 2.500000e-08 2.496892e-08 1.000e-06

The raw offset values are nicely rounded to nanoseconds.

If I modify ntpshm.c in linuxptp to disable nanosecond resolution (set
the nanosecond fields to 0), I see in refclocks.log

2015-03-04 08:17:11.702451 SHM1 0 N 0 0.000000e+00 -4.236286e-08 1.000e-06
2015-03-04 08:17:12.702575 SHM1 1 N 0 1.000000e-06 9.634952e-07 1.000e-06
2015-03-04 08:17:13.702691 SHM1 2 N 0 0.000000e+00 -3.064677e-08 1.000e-06
2015-03-04 08:17:14.702817 SHM1 3 N 0 0.000000e+00 -2.478869e-08 1.000e-06
2015-03-04 08:17:14.202754 SHM1 - N - - -2.771773e-08 1.000e-06

Three raw offsets are 0 and one is exactly 1 microsecond, so it looks
to be working as expected.
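The effect of microsecond resolution in the second log can be illustrated by rounding the cooked offsets to a 1 us grid, which happens to reproduce the raw column (three zeros and one 1e-06). This is only an illustration under that assumption: chrony's raw offset comes from the SHM timestamps themselves, not from rounding the cooked value. The helper name is hypothetical.

```python
def quantize(offset, resolution):
    """Round an offset in seconds to a timestamp resolution in seconds."""
    return round(offset / resolution) * resolution


# Cooked offsets from the microsecond-resolution refclocks.log above.
cooked = [-4.236286e-08, 9.634952e-07, -3.064677e-08, -2.478869e-08]

print([quantize(x, 1e-6) for x in cooked])  # [0.0, 1e-06, 0.0, 0.0]
print([quantize(x, 1e-9) for x in cooked])  # nanosecond grid keeps detail
```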
Post by Gary E. Miller
Post by Miroslav Lichvar
No, I was just trying to explain that the refclock precision prevents
that problem. If several consecutive refclock samples have identical
offset, the calculated dispersion might be zero, but it will be set
to the precision before accumulating the sample to avoid division by
zero etc.
So if the offset is zero it is set to the dispersion? That is not good.
No, the offset will be still zero, but dispersion will be set to precision.
Here is an example.

2015-03-04 08:20:39.727811 SHM1 0 N 0 0.000000e+00 5.347800e-09 1.000e-06
2015-03-04 08:20:40.727928 SHM1 1 N 0 0.000000e+00 5.267678e-09 1.000e-06
2015-03-04 08:20:41.728044 SHM1 2 N 0 0.000000e+00 5.187555e-09 1.000e-06
2015-03-04 08:20:42.728155 SHM1 3 N 0 0.000000e+00 5.107433e-09 1.000e-06
2015-03-04 08:20:41.227986 SHM1 - N - - 5.227617e-09 1.000e-06

The filtered sample has 1 microsecond dispersion, even when all four
SHM samples were 0.
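The clamping idea can be shown with a toy model: take the spread of the samples in the filter and never let the resulting dispersion fall below the configured precision. This is not chrony's actual filter, just the floor Miroslav describes, applied here to the four zero samples above and to the nanosecond raw offsets from the first log. It also shows why the floor matters: a dispersion of exactly zero would make any 1/dispersion weighting divide by zero. The helper name is hypothetical.

```python
import statistics


def filtered_dispersion(offsets, precision):
    """Toy model: sample spread, floored at the refclock precision."""
    spread = statistics.pstdev(offsets) if len(offsets) > 1 else 0.0
    return max(spread, precision)


zeros = [0.0, 0.0, 0.0, 0.0]                    # four identical samples
ns_samples = [-9e-9, 1.2e-8, -1.4e-8, 2.5e-8]   # raw offsets, first log

for precision in (1e-6, 1e-9):
    d = filtered_dispersion(zeros, precision)
    # Without the floor d would be 0 and 1 / d would raise ZeroDivisionError.
    print(precision, d, 1.0 / d, filtered_dispersion(ns_samples, precision))
```

With the default 1e-6 precision, even the nanosecond samples report a 1 us dispersion, which matches the floor Gary later saw under the SHM jitter; with precision 1e-9 the real spread of about 16 ns shows through.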
--
Miroslav Lichvar
Gary E. Miller
2015-03-04 20:16:48 UTC
Yo Miroslav!

On Wed, 4 Mar 2015 09:24:33 +0100
Post by Miroslav Lichvar
Post by Gary E. Miller
Post by Miroslav Lichvar
Post by Gary E. Miller
First, why does chronyd not support uSec SHM? (I usually use the SOCK)
It does support both microsecond and nanosecond resolution in SHM.
Very odd. That is not what I have been seeing, so instead of
unimplemented it must be broken.
What exactly do you see?
I feed my PPS by way of gpsd to chronyd using SHM and SOCK at the same
time. The SOCK was running jitter around 200 nSec and the SHM around
1500 nSec.

When I set chrony.conf to use precision 1e-9 for SHM 1 the two clocks
started to basically agree. No long term testing, but short term it
looks like the precision was holding a floor under the jitter. Not
something I see on ntpd.

Maybe a good thing, maybe not, but could be documented better. It is
not how ntpd does it.
Post by Miroslav Lichvar
Post by Gary E. Miller
So if the offset is zero it is set to the dispersion? That is not good.
No, the offset will be still zero, but dispersion will be set to
precision. Here is an example.
Ah, makes sense now. Thank you.

Which still leaves me with slave PHC clocks that either go crazy or have
huge offsets...

I have two live test beds now and I'm getting a better feel for the failure
mechanisms.

It almost feels like the sign of the offset is wrong on the I210. As
soon as ntpd/chronyd steers to it the time blows up. Positive, not
negative feedback.

RGDS
GARY
---------------------------------------------------------------------------
Gary E. Miller Rellim 109 NW Wilmington Ave., Suite E, Bend, OR 97701
***@rellim.com Tel:+1(541)382-8588
Miroslav Lichvar
2015-03-05 06:42:09 UTC
Permalink
Post by Gary E. Miller
Which still leaves me with slave PHC clocks that either go crazy or have
huge offsets...
I have two live test beds now and I'm getting a better feel for the failure
mechanisms.
It almost feels like the sign of the offset is wrong on the I210. As
soon as ntpd/chronyd steers to it the time blows up. Positive, not
negative feedback.
Well, it works for me with i210 nicely. If you post your configuration
files, command lines and program outputs of ptp4l, phc2sys,
chronyd/ntpd for both master and slave systems, maybe we could better
see what's happening.

There is one test that might be worth trying. Add noselect option to
all servers and refclocks in your chronyd/ntpd configuration to just
monitor the offsets without actually synchronizing the system clock.
If the SHM refclock is marked as falseticker, the problem probably is
somewhere upstream from the slave chronyd/ntpd. If the problem shows
up only when the noselect option is removed, well, I'm not sure what
that could be :).
--
Miroslav Lichvar
Miroslav Lichvar
2015-03-06 11:45:50 UTC
Permalink
Post by Gary E. Miller
Which still leaves me with slave PHC clocks that either go crazy or have
huge offsets...
Another thought, is it possible that the PTP master is using HW
timestamping with a PTP clock that is not synchronized to anything? If
it was set on driver initialization from the system clock which is set
on boot from the RTC with one second resolution, it would explain the
offsets up to 1 second nicely.
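One way to test this hypothesis on a host with hardware timestamping is to read its PHC directly and compare it with the system clock. The sketch below is a minimal example under these assumptions: it opens a `/dev/ptpN` device and turns the descriptor into a dynamic POSIX clock id. The encoding matches the `FD_TO_CLOCKID` macro in the kernel's testptp example, but uses an unsigned shift. The function names are hypothetical. If ptp4l runs the PHC on the PTP (TAI) timescale, a constant whole-second offset from UTC system time is expected. A fractional-second or wandering offset is the more telling symptom.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stdint.h>
#include <time.h>
#include <unistd.h>

/* Dynamic POSIX clock id for an open /dev/ptpN descriptor (same encoding as
 * testptp's FD_TO_CLOCKID, shifted as unsigned to avoid signed-shift UB). */
#define CLOCKFD 3
#define FD_TO_CLOCKID(fd) ((clockid_t)((~(unsigned int)(fd) << 3) | CLOCKFD))

/* a - b in nanoseconds */
int64_t ts_diff_ns(struct timespec a, struct timespec b)
{
    return (int64_t)(a.tv_sec - b.tv_sec) * 1000000000LL
         + (int64_t)(a.tv_nsec - b.tv_nsec);
}

/* Read the PHC and CLOCK_REALTIME back to back and store PHC - system time
 * in *out_ns. Returns 0 on success, -1 if the device can't be opened/read. */
int phc_minus_sys_ns(const char *dev, int64_t *out_ns)
{
    struct timespec phc, sys;
    int fd = open(dev, O_RDONLY);

    if (fd < 0)
        return -1;

    int ok = clock_gettime(FD_TO_CLOCKID(fd), &phc) == 0 &&
             clock_gettime(CLOCK_REALTIME, &sys) == 0;
    close(fd);
    if (!ok)
        return -1;

    *out_ns = ts_diff_ns(phc, sys);
    return 0;
}
```

For example, `phc_minus_sys_ns("/dev/ptp1", &off)` on the I210 host, run a few times a minute apart, shows whether the PHC tracks the system clock.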
--
Miroslav Lichvar
Gary E. Miller
2015-03-06 18:22:33 UTC
Permalink
Yo Miroslav!

On Fri, 6 Mar 2015 12:45:50 +0100
Post by Miroslav Lichvar
Post by Gary E. Miller
Which still leaves me with slave PHC clocks that either go crazy or
have huge offsets...
Another thought, is it possible that the PTP master is using HW
timestamping with a PTP clock that is not synchronized to anything?
Nope. My master is PPS controlled, and only running in SW timestamping
mode.

I have other PPS NTP servers on the same net double checking each
other and I get Icinga alerts if any lose lock.
Post by Miroslav Lichvar
If
it was set on driver initialization from the system clock which is set
on boot from the RTC with one second resolution, it would explain the
offsets up to 1 second nicely.
I reboot my master about every 6 months.

My offset errors occasionally go way past 1 second.

On one test slave I have the PTP HW set to noselect, but it will still
get selected occasionally, then boom, time races to 30, 60, 90 seconds
off before chrony deselects the device.

On another PTP HW slave, different NIC type, the offset from PHC time
to PPS time wanders +/- 800 mSec.

RGDS
GARY
---------------------------------------------------------------------------
Gary E. Miller Rellim 109 NW Wilmington Ave., Suite E, Bend, OR 97701
***@rellim.com Tel:+1(541)382-8588

Gary E. Miller
2015-03-02 23:39:23 UTC
Permalink
Yo Matthew!

On Thu, 26 Feb 2015 21:16:21 +0000
Post by Vick, Matthew
One other tidbit is that I210 supports EEE, which can affect jitter,
although I wouldn't expect it on that level. You can try turning this
off via ethtool (ethtool --set-eee ethX eee off) to see if that helps.
Ding-ding-ding, I think we might have a winner. That instantly dropped
my offset from -400 mSec to around 1 mSec. That 1 mSec is well within
the confusion that the ntpd PLLs just took. Jitter went to 1 uSec.
Time to hook the PPS back up and get serious.

RGDS
GARY
---------------------------------------------------------------------------
Gary E. Miller Rellim 109 NW Wilmington Ave., Suite E, Bend, OR 97701
***@rellim.com Tel:+1(541)382-8588
Keller, Jacob E
2015-03-03 00:28:01 UTC
Permalink
Post by Gary E. Miller
Yo Matthew!
On Thu, 26 Feb 2015 21:16:21 +0000
Post by Vick, Matthew
One other tidbit is that I210 supports EEE, which can affect jitter,
although I wouldn't expect it on that level. You can try turning this
off via ethtool (ethtool --set-eee ethX eee off) to see if that helps.
Ding-ding-ding, I think we might have a winner. That instantly dropped
my offset from -400 mSec to around 1 mSec. That 1 mSec is well within
the confusion that the ntpd PLLs just took. Jitter went to 1 uSec.
Time to hook the PPS back up and get serious.
RGDS
GARY
Excellent news! :) I had forgotten about EEE

Thanks,
Jake
Gary E. Miller
2015-03-03 00:52:35 UTC
Permalink
Yo Jacob E!

On Tue, 3 Mar 2015 00:28:01 +0000
Post by Keller, Jacob E
Excellent news! :) I had forgotten about EEE
Which is why I am writing a HOWTO.

Now to figure out what the problems are with my other two NICs...

RGDS
GARY
---------------------------------------------------------------------------
Gary E. Miller Rellim 109 NW Wilmington Ave., Suite E, Bend, OR 97701
***@rellim.com Tel:+1(541)382-8588
Gary E. Miller
2015-03-03 02:54:50 UTC
Permalink
Yo Gary!

On Mon, 2 Mar 2015 15:39:23 -0800
Post by Gary E. Miller
On Thu, 26 Feb 2015 21:16:21 +0000
Post by Vick, Matthew
One other tidbit is that I210 supports EEE, which can affect jitter,
although I wouldn't expect it on that level. You can try turning
this off via ethtool (ethtool --set-eee ethX eee off) to see if
that helps.
Ding-ding-ding, I think we might have a winner. That instantly
dropped my offset from -400 mSec to around 1 mSec. That 1 mSec is
well within the confusion that the ntpd PLLs just took. Jitter went
to 1 uSec. Time to hook the PPS back up and get serious.
I hooked my PPS back up, let things settle. Checked 'chronyc sources'
some more. My hardware timestamp jitter is now down to around a
uSec. But I still have a (smaller) persistent offset. Now seems to
be running around 80 mSec.

My master and slave have similar PPS clocks, so the offset from PPS
to timestamp hardware is very real.

But, real progress, so I'll try an 82574L on another host. Sadly the
82574L does not support EEE, so that will not help.

RGDS
GARY
---------------------------------------------------------------------------
Gary E. Miller Rellim 109 NW Wilmington Ave., Suite E, Bend, OR 97701
***@rellim.com Tel:+1(541)382-8588
Dale Smith
2015-03-04 21:43:24 UTC
Permalink
Greetings,

Just a pure shot in the dark, but how wide is your PPS pulse? It wouldn't
be 80mS would it? Like you are syncing to the trailing edge instead of the
leading edge?

-Dale
Post by Gary E. Miller
Yo Gary!
On Mon, 2 Mar 2015 15:39:23 -0800
Post by Gary E. Miller
On Thu, 26 Feb 2015 21:16:21 +0000
Post by Vick, Matthew
One other tidbit is that I210 supports EEE, which can affect jitter,
although I wouldn't expect it on that level. You can try turning
this off via ethtool (ethtool --set-eee ethX eee off) to see if
that helps.
Ding-ding-ding, I think we might have a winner. That instantly
dropped my offset from -400 mSec to around 1 mSec. That 1 mSec is
well within the confusion that the ntpd PLLs just took. Jitter went
to 1 uSec. Time to hook the PPS back up and get serious.
I hooked my PPS back up, let things settle. Checked 'chronyc sources'
some more. My hardware timestamp jitter is now down to around a
uSec. But I still have a (smaller) persistent offset. Now seems to
be running around 80 mSec.
My master and slave have similar PPS clocks, so the offset from PPS
to timestamp hardware is very real.
But, real progress, so I'll try an 82574L on another host. Sadly the
82574L does not support EEE, so that will not help.
RGDS
GARY
---------------------------------------------------------------------------
Gary E. Miller Rellim 109 NW Wilmington Ave., Suite E, Bend, OR 97701
_______________________________________________
Linuxptp-devel mailing list
https://lists.sourceforge.net/lists/listinfo/linuxptp-devel
Gary E. Miller
2015-03-04 22:57:07 UTC
Permalink
Yo Dale!

On Wed, 4 Mar 2015 16:43:24 -0500
Post by Dale Smith
Just a pure shot in the dark, but how wide is your PPS pulse? It
wouldn't be 80mS would it? Like you are syncing to the trailing edge
instead of the leading edge?
I got one of each. My test bench is crowded and cross-calibrated.

In today's setup, the pulse is 1 mSec wide, 1 mSec NTP peered to the PTP
master, and the error I see varies from -400 mSec to 800 mSec.

So even if I am on the wrong edge of the master PPS, the two paths
should still be equal offset to UTC.

RGDS
GARY
---------------------------------------------------------------------------
Gary E. Miller Rellim 109 NW Wilmington Ave., Suite E, Bend, OR 97701
Post by Dale Smith
-Dale
Post by Gary E. Miller
Yo Gary!
On Mon, 2 Mar 2015 15:39:23 -0800
Post by Gary E. Miller
On Thu, 26 Feb 2015 21:16:21 +0000
Post by Vick, Matthew
One other tidbit is that I210 supports EEE, which can affect
jitter, although I wouldn't expect it on that level. You can
try turning this off via ethtool (ethtool --set-eee ethX eee
off) to see if that helps.
Ding-ding-ding, I think we might have a winner. That instantly
dropped my offset from -400 mSec to around 1 mSec. That 1 mSec is
well within the confusion that the ntpd PLLs just took. Jitter
went to 1 uSec. Time to hook the PPS back up and get serious.
I hooked my PPS back up, let things settle. Checked 'chronyc
sources' some more. My hardware timestamp jitter is now down to
around a uSec. But I still have a (smaller) persistent offset.
Now seems to be running around 80 mSec.
My master and slave have similar PPS clocks, so the offset from PPS
to timestamp hardware is very real.
But, real progress, so I'll try an 82574L on another host. Sadly
the 82574L does not support EEE, so that will not help.
RGDS
GARY
---------------------------------------------------------------------------
Gary E. Miller Rellim 109 NW Wilmington Ave., Suite E, Bend, OR
Gary E. Miller
2015-03-04 23:01:20 UTC
Permalink
Yo Dale!
Post by Gary E. Miller
On Wed, 4 Mar 2015 16:43:24 -0500
In today's setup, the pulse is 1 mSec, 1 mSec NTP peered to the PTP
master, and the error I see varies from -400 mSec to 800 mSec.
I could have worded that better. One PPS on the slave and one PPS on the
PTP master. The master and slave PTP and NTP connected. Plus others in
the local net for sanity checking.

RGDS
GARY
---------------------------------------------------------------------------
Gary E. Miller Rellim 109 NW Wilmington Ave., Suite E, Bend, OR 97701
***@rellim.com Tel:+1(541)382-8588
Richard Cochran
2015-02-26 08:52:07 UTC
Permalink
Post by Richard Cochran
@Jason: Got private email this week from another person using the
s/Jason/Jacob/

Still dizzy.

Sorry,
Richard
Keller, Jacob E
2015-02-26 17:46:40 UTC
Permalink
Post by Richard Cochran
@Jason: Got private email this week from another person using the
The offset all of a sudden jumps by 40000+ seconds. I would think
that if it was an issue with just reading the timer that the servo
would help throw out spurious values like that, so I suspect we are
somehow actually corrupting the timer. We wrote our own little test
program that only calls get time on the PTP timer, and when we call
it more frequently than 1us is when we really start corrupting the
timer with a vengeance.
So that gives a clear cut test case that triggers the bug.
Thanks,
Richard
Thanks Richard! I will pass this on.

Regards,
Jake
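The test program described above reads the PTP timer in a tight loop. One way to spot the corruption in such a loop is to take each PHC reading back to back with a reference clock such as CLOCK_MONOTONIC, using `clock_gettime` on the PHC's dynamic clock id. Between samples, both clocks should advance by almost the same amount. The checking step is sketched below. `first_phc_jump` is a hypothetical helper, and the threshold is left to the caller.

```c
#include <stddef.h>
#include <stdint.h>

/* phc_ns[i] and ref_ns[i] are readings (ns) of the PHC and of a reference
 * clock taken back to back. Return the index of the first sample where the
 * PHC advanced by more than max_err_ns more or less than the reference
 * since the previous sample, or -1 if no such jump is found. */
long first_phc_jump(const int64_t *phc_ns, const int64_t *ref_ns, size_t n,
                    int64_t max_err_ns)
{
    for (size_t i = 1; i < n; i++) {
        int64_t err = (phc_ns[i] - phc_ns[i - 1]) - (ref_ns[i] - ref_ns[i - 1]);
        if (err < 0)
            err = -err;
        if (err > max_err_ns)
            return (long)i;
    }
    return -1;
}
```

A jump of 40000+ seconds would be flagged by any reasonable threshold. A threshold around 1 ms also catches smaller corruption, while staying far above normal read-to-read jitter.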
Keller, Jacob E
2015-02-26 17:43:36 UTC
Permalink
Hi,
Post by Gary E. Miller
Yo Jacob E!
On Thu, 26 Feb 2015 01:01:36 +0000
Post by Keller, Jacob E
Post by Gary E. Miller
So what is the minimum for hardware mode timestamping? Like this?
HWTSTAMP_TX_OFF
HWTSTAMP_TX_ON
HWTSTAMP_FILTER_ALL
HWTSTAMP_TX_OFF is always supported. HWTSTAMP_TX_ON is required for
Hardware Tx timestamps.
HWTSTAMP_FILTER_ALL is best, otherwise you need the requisite modes
for your configuration.
Mostly, for V2 layer 4 with end-to-end delay, you need
HWTSTAMP_FILTER_V2_SYNC and
HWTSTAMP_FILTER_V2_DELAY_REQ
For P2P delay protocol you need to be able to timestamp PDELAY_REQ and
PDELAY_RESPONSE messages,
and for L2 mode you have to be able to timestamp L2 equivalents of
these. The most general non-timestamp-all mode is
HWTSTAMP_FILTER_V2_EVENT
ptp4l will try HWTSTAMP_FILTER_ALL if it's available and degrade to
more general filters until it finds either a working combination or
exits saying required mode isn't supported.
I'm trying to make this real simple. :-)
So, if HWTSTAMP_TX_ON is present, can I know the NIC should be supported
for hardware time?
If HWTSTAMP_TX_ON is present, you support Tx timestamps.

If any HWTSTAMP_FILTER_(type) is present you support some category of
filtered Rx timestamps. The most general is HWTSTAMP_FILTER_ALL, and
they get more specific from there.
Post by Gary E. Miller
Post by Keller, Jacob E
The output you had there didn't showcase the actual failure with the
clockcheck showing a massive change in the clock. Either it didn't run
long enough or the failure case was triggered by phc2sys or some other
setup.
Agreed. But the phc2sys failure always happens in under 60 seconds and
I could never get the failure to happen with just ptp4l. Since I could
never duplicate the failure in ptp4l-only mode, there is nothing to show.
Yes, so it is possible there is some related thing going on by accessing
the clock in phc2sys, that causes the failed hardware/driver state.
Post by Gary E. Miller
Post by Keller, Jacob E
Post by Gary E. Miller
Many NIC choices in the world, I'm not gonna waste my time on any
one of them.
Yep. Well, I'll try to forward what we do have to validation for that
part here. Thanks for the effort so far at least :) I understand that
it isn't worth too much effort on your end.
Where can I send my consulting bill? :-)
If an engineer who could actually fix something would look at it,
I would revisit the part.
Post by Keller, Jacob E
I am glad that you were able to get to at least a somewhat sane setup
finally.
I have two now. More on that in another email.
BTW, are you in Hillsboro?
Yep :)

Regards,
Jake
Post by Gary E. Miller
RGDS
GARY
---------------------------------------------------------------------------
Gary E. Miller Rellim 109 NW Wilmington Ave., Suite E, Bend, OR 97701
Gary E. Miller
2015-02-26 18:03:23 UTC
Permalink
Yo Jacob E!

On Thu, 26 Feb 2015 17:43:36 +0000
Post by Keller, Jacob E
Post by Gary E. Miller
Post by Keller, Jacob E
ptp4l will try HWTSTAMP_FILTER_ALL if it's available and degrade to
more general filters until it finds either a working combination
or exits saying required mode isn't supported.
I'm trying to make this real simple. :-)
So, if HWTSTAMP_TX_ON is present, can I know the NIC should be
supported for hardware time?
If HWTSTAMP_TX_ON is present, you support Tx timestamps.
Is supporting Tx timestamps necessary and sufficient for supporting
linuxptp timestamp hardware mode?

I'm just trying to get a minimum baseline here.
Post by Keller, Jacob E
If any HWTSTAMP_FILTER_(type) is present you support some category of
filtered Rx timestamps. The most general is HWTSTAMP_FILTER_ALL, and
they get more specific from there.
So the absolute minimum requirement would be HWTSTAMP_TX_ON and at
least one of HWTSTAMP_FILTER_*?
Post by Keller, Jacob E
Post by Gary E. Miller
Agreed. But the phc2sys failure always happens in under 60 seconds
and I could never get the failure to happen with just ptp4l. Since
I could never duplicate the failure in ptp4l-only mode, there is nothing to show.
Yes, so it is possible there is some related thing going on by
accessing the clock in phc2sys, that causes the failed
hardware/driver state.
Seemingly confirmed by other recent email on this list.
Post by Keller, Jacob E
Post by Gary E. Miller
BTW, are you in Hillsboro?
Yep :)
I'll buy you a beer or two if you get to Bend.

RGDS
GARY
---------------------------------------------------------------------------
Gary E. Miller Rellim 109 NW Wilmington Ave., Suite E, Bend, OR 97701
***@rellim.com Tel:+1(541)382-8588
Keller, Jacob E
2015-02-26 18:11:06 UTC
Permalink
Post by Gary E. Miller
Yo Jacob E!
On Thu, 26 Feb 2015 17:43:36 +0000
Post by Keller, Jacob E
Post by Gary E. Miller
Post by Keller, Jacob E
ptp4l will try HWTSTAMP_FILTER_ALL if it's available and degrade to
more general filters until it finds either a working combination
or exits saying required mode isn't supported.
I'm trying to make this real simple. :-)
So, if HWTSTAMP_TX_ON is present, can I know the NIC should be
supported for hardware time?
If HWTSTAMP_TX_ON is present, you support Tx timestamps.
Is supporting Tx timestamps necessary and sufficient for supporting
linuxptp timestamp hardware mode?
I'm just trying to get a minimum baseline here.
Post by Keller, Jacob E
If any HWTSTAMP_FILTER_(type) is present you support some category of
filtered Rx timestamps. The most general is HWTSTAMP_FILTER_ALL, and
they get more specific from there.
So the absolute minimum requirement would be HWTSTAMP_TX_ON and at
least one of HWTSTAMP_FILTER_*?
The absolute minimum

HWTSTAMP_TX_ON for Transmit timestamps

and

HWTSTAMP_FILTER_V2_L4_SYNC if you are a slave in L4 (ipv4 or ipv6) mode.
You would never be able to support master.

HWTSTAMP_FILTER_V2_L4_DELAY_REQ if you are a master, in L4, never able
to be a slave.

(swap those for L2 if you are in L2 mode)

This wouldn't work with P2P delay model, because those are yet another
type of packet you'd have to Rx timestamp.

The "best" minimal answer is "HWTSTAMP_FILTER_V2_EVENT"

Almost all hardware supports some form of V2_EVENT, and honestly I
wouldn't suggest hardware that does not support V2_EVENT, as it requires
ptp4l to reconfigure timestamp mode upon state change from master to
slave if you can't support the generic V2_EVENT.
Post by Gary E. Miller
Post by Keller, Jacob E
Post by Gary E. Miller
Agreed. But the phc2sys failure always happens in under 60 seconds
and I could never get the failure to happen with just ptp4l. Since
I could never duplicate the failure in ptp4l-only mode, there is nothing to show.
Yes, so it is possible there is some related thing going on by
accessing the clock in phc2sys, that causes the failed
hardware/driver state.
Seemingly confirmed by other recent email on this list.
Yep.

Regards,
Jake
Post by Gary E. Miller
Post by Keller, Jacob E
Post by Gary E. Miller
BTW, are you in Hillsboro?
Yep :)
I'll buy you a beer or two if you get to Bend.
RGDS
GARY
---------------------------------------------------------------------------
Gary E. Miller Rellim 109 NW Wilmington Ave., Suite E, Bend, OR 97701
Gary E. Miller
2015-02-26 18:36:46 UTC
Permalink
Yo Jacob E!

On Thu, 26 Feb 2015 18:11:06 +0000
Post by Keller, Jacob E
The absolute minimum
HWTSTAMP_TX_ON for Transmit timestamps
and
HWTSTAMP_FILTER_V2_L4_SYNC if you are a slave in L4 (ipv4 or ipv6)
mode. You would never be able to support master.
HWTSTAMP_FILTER_V2_L4_DELAY_REQ if you are a master, in L4, never able
to be a slave.
(swap those for L2 if you are in L2 mode)
This wouldn't work with P2P delay model, because those are yet another
type of packet you'd have to Rx timestamp.
The "best" minimal answer is "HWTSTAMP_FILTER_V2_EVENT"
Almost all hardware supports some form of V2_EVENT, and honestly I
wouldn't suggest hardware that does not support V2_EVENT, as it
requires ptp4l to reconfigure timestamp mode upon state change from
master to slave if you can't support the generic V2_EVENT.
Hmm, clearly a swamp. Somebody should be collecting a list of
known working cards...

And I am confused again. See below for my I210.

kong linuxptp # ethtool -T eth1
Time stamping parameters for eth1:
Capabilities:
hardware-transmit (SOF_TIMESTAMPING_TX_HARDWARE)
software-transmit (SOF_TIMESTAMPING_TX_SOFTWARE)
hardware-receive (SOF_TIMESTAMPING_RX_HARDWARE)
software-receive (SOF_TIMESTAMPING_RX_SOFTWARE)
software-system-clock (SOF_TIMESTAMPING_SOFTWARE)
hardware-raw-clock (SOF_TIMESTAMPING_RAW_HARDWARE)
PTP Hardware Clock: 1
Hardware Transmit Timestamp Modes:
off (HWTSTAMP_TX_OFF)
on (HWTSTAMP_TX_ON)
Hardware Receive Filter Modes:
none (HWTSTAMP_FILTER_NONE)
all (HWTSTAMP_FILTER_ALL)

Does HWTSTAMP_FILTER_ALL implicitly include HWTSTAMP_FILTER_V2_EVENT,
HWTSTAMP_FILTER_V2_L4_SYNC, HWTSTAMP_FILTER_V2_L4_DELAY_REQ?

RGDS
GARY
---------------------------------------------------------------------------
Gary E. Miller Rellim 109 NW Wilmington Ave., Suite E, Bend, OR 97701
***@rellim.com Tel:+1(541)382-8588
Keller, Jacob E
2015-02-26 19:21:06 UTC
Permalink
Post by Gary E. Miller
Yo Jacob E!
On Thu, 26 Feb 2015 18:11:06 +0000
Post by Keller, Jacob E
The absolute minimum
HWTSTAMP_TX_ON for Transmit timestamps
and
HWTSTAMP_FILTER_V2_L4_SYNC if you are a slave in L4 (ipv4 or ipv6)
mode. You would never be able to support master.
HWTSTAMP_FILTER_V2_L4_DELAY_REQ if you are a master, in L4, never able
to be a slave.
(swap those for L2 if you are in L2 mode)
This wouldn't work with P2P delay model, because those are yet another
type of packet you'd have to Rx timestamp.
The "best" minimal answer is "HWTSTAMP_FILTER_V2_EVENT"
Almost all hardware supports some form of V2_EVENT, and honestly I
wouldn't suggest hardware that does not support V2_EVENT, as it
requires ptp4l to reconfigure timestamp mode upon state change from
master to slave if you can't support the generic V2_EVENT.
Hmm, clearly a swamp. Somebody should be collecting a list of
known working cards...
And I am confused again. See below for my I210.
kong linuxptp # ethtool -T eth1
hardware-transmit (SOF_TIMESTAMPING_TX_HARDWARE)
software-transmit (SOF_TIMESTAMPING_TX_SOFTWARE)
hardware-receive (SOF_TIMESTAMPING_RX_HARDWARE)
software-receive (SOF_TIMESTAMPING_RX_SOFTWARE)
software-system-clock (SOF_TIMESTAMPING_SOFTWARE)
hardware-raw-clock (SOF_TIMESTAMPING_RAW_HARDWARE)
PTP Hardware Clock: 1
off (HWTSTAMP_TX_OFF)
on (HWTSTAMP_TX_ON)
none (HWTSTAMP_FILTER_NONE)
all (HWTSTAMP_FILTER_ALL)
Does HWTSTAMP_FILTER_ALL implicitly include HWTSTAMP_FILTER_V2_EVENT,
HWTSTAMP_FILTER_V2_L4_SYNC, HWTSTAMP_FILTER_V2_L4_DELAY_REQ?
Yes.. ish. HWTSTAMP_FILTER_ALL means "all packets will be timestamped"

FILTER_V2_EVENT means: "all V2 Event packets will be timestamped" This
is the most generic filter outside of just providing timestamps for all
packets.

Filters are what modes of "filtering" the hardware supports to enable
timestamps. When you enable a filter, it means the hardware will
timestamp *at least* the requisite filter.

ie:

Hardware that enables FILTER_ALL is guaranteed to timestamp all packets.

Hardware that enables FILTER_V2_EVENT timestamps all V2 event packets.
Hardware is allowed to timestamp more if it needs to (ie: be more
generic) but must timestamp at least what the filter represents.

Old hardware only supported one type at a time, i.e. V2_L4_SYNC
or V2_L4_DELAY_REQ.

Full PTP requires timestamps on receipt of Sync messages, on receipt
of Delay Request messages, and (for P2P mode) on receipt of Peer Delay
Request and Peer Delay Response messages.

It's complicated because of the variety of available hardware and the
attempt to support the various combinations.

Most "good" hardware supports at least V2_EVENT (or ALL)

Regards,
Jake
Post by Gary E. Miller
RGDS
GARY
---------------------------------------------------------------------------
Gary E. Miller Rellim 109 NW Wilmington Ave., Suite E, Bend, OR 97701
Miroslav Lichvar
2015-02-26 06:24:26 UTC
Permalink
Post by Gary E. Miller
# ipcs -m
------ Shared Memory Segments --------
key shmid owner perms bytes nattch status
0x4e545030 0 root 600 96 2
0x4e545031 32769 root 600 96 2
0x4e545032 65538 root 600 96 2
^^^
Ah. I started phc2sys before chronyd and phc2sys created the ntpshm with
the wrong permissions! NTP2 should be perms 666, not 600. That should
be an easy bug to fix.
phc2sys and chronyd both create all segments with 600 permissions. In
chronyd this is configurable with the mode refclock option, but the
default is always 600. As everything is started as root I'm not sure
why would 666 be needed.
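The keys in the ipcs listing above are just 0x4E545030 ("NTP0" in ASCII) plus the SHM unit number. So `phc2sys -E ntpshm -M 2` writes to key 0x4e545032. The permission bits come from whichever process creates the segment first, which is why start order matters. The sketch below makes both points concrete. `ntpshm_key` and `ntpshm_attach` are hypothetical helpers, and 96 bytes is the segment size shown by ipcs on that host, not a value taken from the phc2sys or chronyd source.

```c
#include <stddef.h>
#include <sys/ipc.h>
#include <sys/shm.h>
#include <sys/types.h>

/* NTP SHM refclock segments use key 0x4E545030 + unit ("NTP0", "NTP1", ...). */
#define NTPSHM_BASE_KEY 0x4E545030

key_t ntpshm_key(int unit)
{
    return (key_t)(NTPSHM_BASE_KEY + unit);
}

/* Create (if absent) and attach an NTP SHM segment. `mode` only takes effect
 * when this call creates the segment; an existing segment keeps the
 * permissions chosen by its creator. Returns the address or NULL. */
void *ntpshm_attach(int unit, size_t size, int mode)
{
    int id = shmget(ntpshm_key(unit), size, IPC_CREAT | mode);
    if (id < 0)
        return NULL;

    void *p = shmat(id, NULL, 0);
    return p == (void *)-1 ? NULL : p;
}
```

For example, `ntpshm_attach(2, 96, 0600)` matches the owner-only default described above. A process that created the segment first with 0666 would instead leave it world-accessible.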
Post by Gary E. Miller
So, stop everything, and change to a new ntpd that can read from a 600
Hm, why would ntpd or chronyd not be able to read from a 600 segment?
They are both started as root and open the segment before they drop
the root privileges.
--
Miroslav Lichvar
Gary E. Miller
2015-02-26 08:26:50 UTC
Permalink