[Linuxptp-devel] Peer delay regression in v1.6

Discussion:

Richard Cochran

2015-10-22 08:45:05 UTC

We figure the neighbor rate ratio in port_nrate_calculate():

delta t3 / delta t4 = peer_interval / local_interval

Originally the peer delay was figured like this:

pd = tmv_sub(tmv_sub(t4, t1), tmv_sub(t3, t2));

Since t4-t1 are local, and t3-t2 are from the peer, there will be an
error if the local and peer are not syntonized.

In commit bd001fde (in v1.5) Delio changed the peer delay calculation:

adj_t41 = p->nrate.ratio * tmv_dbl(tmv_sub(t4, t1));
pd = tmv_sub(dbl_tmv(adj_t41), tmv_sub(t3, t2));

This converts the local interval t4-t1 into the time base of the peer,
and so the peer delay is then expressed with the peer's frequency.
This follows clause 11.2.15.2.4 from IEEE 802.1-AS.

Now, with the tsproc in place, the peer delay figured like this:

/* delay = ((t2 - t3) * rr + (t4 - t1)) / 2 */

t23 = tmv_sub(tsp->t2, tsp->t3);
if (tsp->clock_rate_ratio != 1.0)
t23 = dbl_tmv(tmv_dbl(t23) * tsp->clock_rate_ratio);
t41 = tmv_sub(tsp->t4, tsp->t1);
delay = tmv_div(tmv_add(t23, t41), 2);

The ratio peer/local is applied to t3-t2, but these are peer time
stamps to begin with! That scaling is wrong, and even if the code did
(t2-t3)*(1/rr), it would still change the meaning of the peer delay,
since it would be then expressed in the local time base.

[ The tsproc code is correct for figuring E2E delay, assuming that
tsp->clock_rate_ratio is GM/local, since t3-t2 is a local time
interval. ]

We need to fix this and make a new release soon, since the error in
the peer delay will impact some configurations, especially gPTP with
free running local clocks. In order to fix this, I am thinking of
adding a e2e/p2p flag to tsproc_create. When set to p2p, the formula

delay = ((t2 - t3) + rr * (t4 - t1)) / 2

would be used. That would restore the behavior of v1.5.

Thoughts?

Thanks,
Richard

------------------------------------------------------------------------------

Delio Brignoli

2015-10-22 13:43:33 UTC

Permalink

Hello Richard,

Thank you for spotting this issue. We haven’t been hit by it (yet!) because we use an old version of linuxptp. I hope to be able to move to a more recent release at some point. The fix you propose seems good to me.

Thanks!
--
Delio Brignoli
AudioScience Inc

Post by Richard Cochran
delta t3 / delta t4 = peer_interval / local_interval
pd = tmv_sub(tmv_sub(t4, t1), tmv_sub(t3, t2));
Since t4-t1 are local, and t3-t2 are from the peer, there will be an
error if the local and peer are not syntonized.
adj_t41 = p->nrate.ratio * tmv_dbl(tmv_sub(t4, t1));
pd = tmv_sub(dbl_tmv(adj_t41), tmv_sub(t3, t2));
This converts the local interval t4-t1 into the time base of the peer,
and so the peer delay is then expressed with the peer's frequency.
This follows clause 11.2.15.2.4 from IEEE 802.1-AS.
/* delay = ((t2 - t3) * rr + (t4 - t1)) / 2 */
t23 = tmv_sub(tsp->t2, tsp->t3);
if (tsp->clock_rate_ratio != 1.0)
t23 = dbl_tmv(tmv_dbl(t23) * tsp->clock_rate_ratio);
t41 = tmv_sub(tsp->t4, tsp->t1);
delay = tmv_div(tmv_add(t23, t41), 2);
The ratio peer/local is applied to t3-t2, but these are peer time
stamps to begin with! That scaling is wrong, and even if the code did
(t2-t3)*(1/rr), it would still change the meaning of the peer delay,
since it would be then expressed in the local time base.
[ The tsproc code is correct for figuring E2E delay, assuming that
tsp->clock_rate_ratio is GM/local, since t3-t2 is a local time
interval. ]
We need to fix this and make a new release soon, since the error in
the peer delay will impact some configurations, especially gPTP with
free running local clocks. In order to fix this, I am thinking of
adding a e2e/p2p flag to tsproc_create. When set to p2p, the formula
delay = ((t2 - t3) + rr * (t4 - t1)) / 2
would be used. That would restore the behavior of v1.5.
Thoughts?
Thanks,
Richard
------------------------------------------------------------------------------
_______________________________________________
Linuxptp-devel mailing list
https://lists.sourceforge.net/lists/listinfo/linuxptp-devel

------------------------------------------------------------------------------

Miroslav Lichvar

2015-10-23 13:49:17 UTC

Permalink

Post by Richard Cochran
t23 = tmv_sub(tsp->t2, tsp->t3);
if (tsp->clock_rate_ratio != 1.0)
t23 = dbl_tmv(tmv_dbl(t23) * tsp->clock_rate_ratio);
t41 = tmv_sub(tsp->t4, tsp->t1);
delay = tmv_div(tmv_add(t23, t41), 2);
The ratio peer/local is applied to t3-t2, but these are peer time
stamps to begin with! That scaling is wrong, and even if the code did
(t2-t3)*(1/rr), it would still change the meaning of the peer delay,
since it would be then expressed in the local time base.

The math in the tsproc code was supposed to be identical to the code
it replaced. t2 and t3 in tsproc are local times. The mapping between
the port and tsproc timestamps is (t1, t2, t3, t4) -> (t3, t4, t1, t2).
Maybe it wasn't the best idea to use the same variable names in all
three (clock, port, tsproc) contexts?

Anyway, does this actually break something?

--
Miroslav Lichvar

------------------------------------------------------------------------------

Richard Cochran

2015-10-23 16:03:45 UTC

Permalink

Post by Miroslav Lichvar
The math in the tsproc code was supposed to be identical to the code
it replaced. t2 and t3 in tsproc are local times. The mapping between
the port and tsproc timestamps is (t1, t2, t3, t4) -> (t3, t4, t1, t2).

Oh, tricky! Now I vaguely recall that from the original review.

Post by Miroslav Lichvar
Maybe it wasn't the best idea to use the same variable names in all
three (clock, port, tsproc) contexts?

The tsproc unifies the e2e and p2p methods in a good way. But the
different meaning of t1-t4 could use a comment, in order keep people
like me from getting confused.

Post by Miroslav Lichvar
Anyway, does this actually break something?

I guess not then. I did try v1.5 an v1.6 on a test system, and I was
puzzled why I couldn't see the effect of this bug.

Thanks,
Richard

------------------------------------------------------------------------------