Discussion:
[Linuxptp-devel] [PATCH] Extend drift estimation interval in PI servo.
Miroslav Lichvar
2013-04-25 13:47:53 UTC
Permalink
Make the interval between the two samples that are used to estimate the
drift inversely proportional to the integral constant to improve the
accuracy of the estimated drift and to make the initial convergence
faster.

The interval is now longer than 1 with integral constants below 0.01.

Signed-off-by: Miroslav Lichvar <***@redhat.com>
---
pi.c | 30 ++++++++++++++++++++++++------
1 file changed, 24 insertions(+), 6 deletions(-)

diff --git a/pi.c b/pi.c
index 427cb3e..dd54100 100644
--- a/pi.c
+++ b/pi.c
@@ -31,6 +31,9 @@

#define NSEC_PER_SEC 1000000000

+#define DRIFT_EST_KI 0.01
+#define DRIFT_EST_MAX 1000.0
+
/* These take their values from the configuration file. (see ptp4l.c) */
double configured_pi_kp = 0.0;
double configured_pi_ki = 0.0;
@@ -46,7 +49,8 @@ struct pi_servo {
double kp;
double ki;
double max_offset;
- int count;
+ int state;
+ unsigned int count;
};

static void pi_destroy(struct servo *servo)
@@ -60,23 +64,37 @@ static double pi_sample(struct servo *servo,
uint64_t local_ts,
enum servo_state *state)
{
- double ki_term, ppb = 0.0;
+ double drift_est_interval, ki_term, ppb = 0.0;
struct pi_servo *s = container_of(servo, struct pi_servo, servo);

- switch (s->count) {
+ switch (s->state) {
case 0:
s->offset[0] = offset;
s->local[0] = local_ts;
*state = SERVO_UNLOCKED;
+ s->state = 1;
s->count = 1;
break;
case 1:
s->offset[1] = offset;
s->local[1] = local_ts;
+ s->count++;

/* Make sure the first sample is older than the second. */
if (s->local[0] >= s->local[1]) {
- s->count = 0;
+ *state = SERVO_UNLOCKED;
+ s->state = 0;
+ break;
+ }
+
+ /* The interval between the two samples used to estimate the
+ drift is inversely proportional to the integral constant to
+ make the initial convergence faster. */
+ drift_est_interval = DRIFT_EST_KI / s->ki;
+ if (drift_est_interval > DRIFT_EST_MAX)
+ drift_est_interval = DRIFT_EST_MAX;
+ if (s->count - 1 < drift_est_interval) {
+ *state = SERVO_UNLOCKED;
break;
}

@@ -89,7 +107,7 @@ static double pi_sample(struct servo *servo,

*state = SERVO_JUMP;
ppb = s->drift;
- s->count = 2;
+ s->state = 2;
break;
case 2:
/*
@@ -101,7 +119,7 @@ static double pi_sample(struct servo *servo,
*/
if (s->max_offset && (s->max_offset < fabs(offset))) {
*state = SERVO_UNLOCKED;
- s->count = 0;
+ s->state = 0;
break;
}
--
1.8.1.4
Richard Cochran
2013-05-10 07:58:21 UTC
Permalink
Post by Miroslav Lichvar
Make the interval between the two samples that are used to estimate the
drift inversely proportional to the integral constant to improve the
accuracy of the estimated drift and to make the initial convergence
faster.
The interval is now longer than 1 with integral constants below 0.01.
While I agree with the intent of this patch, I think the issue should
be solved differently, letting the user choose a time based interval
directly, instead of trying to guess based on the PI constants.

We already have a 'freq_est_interval' for the peer clock frequency
ratio, and we could also apply this to the servo (or add another
variable if you think it really need to be separate).

Thanks,
Richard
Miroslav Lichvar
2013-05-13 08:40:04 UTC
Permalink
Post by Richard Cochran
Post by Miroslav Lichvar
Make the interval between the two samples that are used to estimate the
drift inversely proportional to the integral constant to improve the
accuracy of the estimated drift and to make the initial convergence
faster.
The interval is now longer than 1 with integral constants below 0.01.
While I agree with the intent of this patch, I think the issue should
be solved differently, letting the user choose a time based interval
directly, instead of trying to guess based on the PI constants.
We already have a 'freq_est_interval' for the peer clock frequency
ratio, and we could also apply this to the servo (or add another
variable if you think it really need to be separate).
I considered reusing the freq_est_interval or adding a new option, but
then I thought how would I find a good setting for it. The convergence
time increases as the value of I decreases, so setting the estimation
interval as a fraction of that looked like an elegant solution to me.
I did some tests and it seemed to work so well that I thought it's not
necessary to have it configurable.

Thanks,
--
Miroslav Lichvar
Richard Cochran
2013-05-13 15:56:06 UTC
Permalink
Post by Miroslav Lichvar
I considered reusing the freq_est_interval or adding a new option, but
then I thought how would I find a good setting for it. The convergence
time increases as the value of I decreases, so setting the estimation
interval as a fraction of that looked like an elegant solution to me.
I did some tests and it seemed to work so well that I thought it's not
necessary to have it configurable.
We are talking about the initial frequency estimation, right?

In my experience, the time interval needed to produce this estimate is
about one or two seconds. I did some experimentation while developing
the gPTP code, and I found that less one second produces noticeably
poor results, but more than two seconds did not improve things.

I think we could even hard code the interval as MAX(2s, sync period).

Thanks,
Richard
Miroslav Lichvar
2013-05-13 16:30:10 UTC
Permalink
Post by Richard Cochran
We are talking about the initial frequency estimation, right?
Yes.
Post by Richard Cochran
In my experience, the time interval needed to produce this estimate is
about one or two seconds. I did some experimentation while developing
the gPTP code, and I found that less one second produces noticeably
poor results, but more than two seconds did not improve things.
Have you tried it with larger jitter/software time stamping?

The error in the estimated frequency is proportional to the jitter and
inversely proportional to the estimation interval. Larger jitter also
shortens the convergence time as the servo will reach the noise level
sooner. With larger I constant the servo will correct the error
faster, and the estimate doesn't need to be so accurate. The goal is
to reduce the sum of the estimation interval and the convergence time.
Post by Richard Cochran
I think we could even hard code the interval as MAX(2s, sync period).
In some of my tests that would be too short.
--
Miroslav Lichvar
Richard Cochran
2013-05-14 05:38:59 UTC
Permalink
Post by Miroslav Lichvar
Have you tried it with larger jitter/software time stamping?
With SW time stamping, the servo's convergence is very slow. I would
expect that the frequency estimate makes no measurable difference.
(I mean, in the real world, not in simulation).
Post by Miroslav Lichvar
The error in the estimated frequency is proportional to the jitter and
inversely proportional to the estimation interval. Larger jitter also
shortens the convergence time as the servo will reach the noise level
sooner. With larger I constant the servo will correct the error
faster, and the estimate doesn't need to be so accurate. The goal is
to reduce the sum of the estimation interval and the convergence time.
The only issue here is the time stamp jitter, not the actual clock
jitter, I think. For example, some MAC based time stamps jitter around
100 nanoseconds, times four, for both sender and receiver, and a so a
one second estimate is accurate to about one PPM.
Post by Miroslav Lichvar
Post by Richard Cochran
I think we could even hard code the interval as MAX(2s, sync period).
In some of my tests that would be too short.
You should try testing with real hardware. I honestly could not see a
big difference with and without the frequency esitmate, with HW time
stamping, when looking at the initial convergence.

Thanks,
Richard
Miroslav Lichvar
2013-05-14 09:06:26 UTC
Permalink
Post by Richard Cochran
With SW time stamping, the servo's convergence is very slow. I would
expect that the frequency estimate makes no measurable difference.
It does make a measurable difference. The convergence time is a log
function of the frequency error. It also reduces the maximum offset
reached during the initial convergence (when the servo is already in
the locked state), that can be easily seen.

If the servo adjusts the frequency slowly (small I constant), it's
better to use a longer estimation interval, so the initial frequency
will be more accurate and the servo will then converge faster. The
extra time used in the estimation, however, shouldn't be longer than
the time saved in the convergence.
Post by Richard Cochran
(I mean, in the real world, not in simulation).
Are you getting results from simulations which don't agree with real
world results? Is the number of tests large enough to get
statistically significant results?
Post by Richard Cochran
You should try testing with real hardware. I honestly could not see a
big difference with and without the frequency esitmate, with HW time
stamping, when looking at the initial convergence.
Yes, with HW time stamping (large I constant), the servo converges so
quickly that the estimation doesn't matter much.

But with SW time stamping it does matter. That's when I noticed the
problem and I implemented the patch.

Thanks,
--
Miroslav Lichvar
Richard Cochran
2013-05-14 18:37:58 UTC
Permalink
This post might be inappropriate. Click to display it.
Keller, Jacob E
2013-05-14 21:03:34 UTC
Permalink
-----Original Message-----
Sent: Tuesday, May 14, 2013 11:38 AM
To: Miroslav Lichvar
Subject: Re: [Linuxptp-devel] [PATCH] Extend drift estimation interval
in PI servo.
Post by Miroslav Lichvar
But with SW time stamping it does matter. That's when I noticed the
problem and I implemented the patch.
I tried a quick test with my laptop, and the standard 1x sync
estimation does not help very much, but hacking this to 16 seconds
made a huge difference.
Still, I think it would be much more straightforward to either
- let the user configure the estimation interval directly, or
- choose a reasonable interval automatically, like 1 second for HW and
16 for SW time stamping.
Thanks,
Richard
Why not both? Let it choose automatically unless set by the user?

The main reason I would prefer automatic setting is because it means we pick reasonable defaults if the user doesn't know what they need.

- Jake
Richard Cochran
2013-05-15 03:51:32 UTC
Permalink
Post by Keller, Jacob E
The main reason I would prefer automatic setting is because it means we pick reasonable defaults if the user doesn't know what they need.
Yes, that surely describes the majority use case. One of the selling
points of PTP is, "it just works," with auto-configuration via the BMC
and so on.

Thanks,
Richard
Miroslav Lichvar
2013-05-15 07:22:16 UTC
Permalink
Post by Richard Cochran
I tried a quick test with my laptop, and the standard 1x sync
estimation does not help very much, but hacking this to 16 seconds
made a huge difference.
Still, I think it would be much more straightforward to either
- let the user configure the estimation interval directly, or
- choose a reasonable interval automatically, like 1 second for HW and
16 for SW time stamping.
The patch I proposed does exactly that, or not? With HW time stamping
(ki=0.7) the estimation interval is 1 update, with SW time
stamping (ki=0.001) it's 10 updates. If you'd prefer it to be
16 updates, I have no problem with changing the constant slightly.

Thanks,
--
Miroslav Lichvar
Miroslav Lichvar
2013-05-16 08:11:55 UTC
Permalink
Post by Miroslav Lichvar
Post by Richard Cochran
- choose a reasonable interval automatically, like 1 second for HW and
16 for SW time stamping.
The patch I proposed does exactly that, or not? With HW time stamping
(ki=0.7) the estimation interval is 1 update, with SW time
stamping (ki=0.001) it's 10 updates. If you'd prefer it to be
16 updates, I have no problem with changing the constant slightly.
I've noticed that the patch doesn't work correctly with non-default
update intervals. With longer update intervals the estimation interval
is too short, and with short interval it's too long. I think the
number of updates used for the estimation should be calculated as
0.01 / ki / u instead of 0.01 / ki.

What do you think?

Thanks,
--
Miroslav Lichvar
Richard Cochran
2013-05-16 13:50:02 UTC
Permalink
Post by Miroslav Lichvar
What do you think?
How about this instead:

Use a consistent frequency estimation interval in the PI controller.

As it now stands, the frequency offset from the master clock is guessed
by using one sync interval. When the interval is shorter than one second,
this does not work any more when using hardware time stamping, and
software time stamping requires an even longer interval. This patch
fixes the issue by always using an interval at least one second long
(and 16 seconds in the case of SW time stamping).

Signed-off-by: Richard Cochran <***@gmail.com>

diff --git a/pi.c b/pi.c
index 427cb3e..1f076a0 100644
--- a/pi.c
+++ b/pi.c
@@ -47,6 +47,7 @@ struct pi_servo {
double ki;
double max_offset;
int count;
+ int freq_est;
};

static void pi_destroy(struct servo *servo)
@@ -79,6 +80,10 @@ static double pi_sample(struct servo *servo,
s->count = 0;
break;
}
+ /* Wait long enough before estimating the frequency offset. */
+ if ((s->local[1] - s->local[0]) / 1e9 - s->freq_est > 0.001) {
+ break;
+ }

s->drift += (s->offset[1] - s->offset[0]) * 1e9 /
(s->local[1] - s->local[0]);
@@ -133,6 +138,7 @@ struct servo *pi_servo_create(int fadj, int max_ppb, int sw_ts)
s->servo.sample = pi_sample;
s->drift = fadj;
s->maxppb = max_ppb;
+ s->freq_est = 1;

if (configured_pi_kp && configured_pi_ki) {
s->kp = configured_pi_kp;
@@ -140,6 +146,7 @@ struct servo *pi_servo_create(int fadj, int max_ppb, int sw_ts)
} else if (sw_ts) {
s->kp = SWTS_KP;
s->ki = SWTS_KI;
+ s->freq_est = 16;
} else {
s->kp = HWTS_KP;
s->ki = HWTS_KI;
Miroslav Lichvar
2013-05-17 09:54:15 UTC
Permalink
Post by Richard Cochran
Use a consistent frequency estimation interval in the PI controller.
As it now stands, the frequency offset from the master clock is guessed
by using one sync interval. When the interval is shorter than one second,
this does not work any more when using hardware time stamping, and
software time stamping requires an even longer interval. This patch
fixes the issue by always using an interval at least one second long
(and 16 seconds in the case of SW time stamping).
I really think the estimation interval should be based directly on the
I constant and the update interval instead of hardcoding it for the
default constants with SW and HW time stamping.

See these results from the simulator with different hardcoded
estimation intervals set in number of updates and with 2^-4, 2^0 and
2^4 update intervals. The jitter is modelled with 10us pulse signal to
get the maximum error in the estimated frequency and to get directly
comparable results. In the first example you can see 16 seconds would
be too short and in the third example 16 would be too long.

phc2sys -R 0.0625 -P 0.00625 -I 0.0000625 -N 1
Loading Image...

phc2sys -R 1.0 -P 0.1 -I 0.001 -N 1
Loading Image...

phc2sys -R 16.0 -P 1.6 -I 0.016 -N 1
Loading Image...

Please reconsider using the "0.01 / ki / u" formula.

Thanks,
--
Miroslav Lichvar
Richard Cochran
2013-05-18 15:20:46 UTC
Permalink
Post by Miroslav Lichvar
phc2sys -R 16.0 -P 1.6 -I 0.016 -N 1
http://mlichvar.fedorapeople.org/tmp/ptp/pi_est.3.png
Looks like the red line, "1 update", has used a whole second (and not
1/16) to make the estimation? The red curve begins at time 1 second.

Or does the simulation begin at time 1 second?

Thanks,
Richard
Miroslav Lichvar
2013-05-20 10:50:22 UTC
Permalink
Post by Richard Cochran
Post by Miroslav Lichvar
phc2sys -R 16.0 -P 1.6 -I 0.016 -N 1
http://mlichvar.fedorapeople.org/tmp/ptp/pi_est.3.png
Looks like the red line, "1 update", has used a whole second (and not
1/16) to make the estimation? The red curve begins at time 1 second.
Or does the simulation begin at time 1 second?
Yes, the start of the clients was delayed by 1.1 seconds, sorry for
the confusion. I forgot to remove the setting after I figured out why
some results are "upside down", the reason for that is sharing of the
refclock generator between the SHM refclock, which is updated once per
second, and the PHC.

Here is it without any delays in the start and also in a better
resolution (clknetsim now has an option to set the freq/log/stats
update interval):

Loading Image...

I've uploaded also the scripts and the phc2sys patch in case you want
to play with it:
http://mlichvar.fedorapeople.org/clknetsim/ptp/pi_freq_est/

BTW, with your patch, which measures the interval instead of counting
the updates, the approach I propose would be even simpler:

- fest_interval = s->hwts ? 1.0 : 16.0;
+ fest_interval = 0.016 / s->ki;
+ if (fest_interval > 1000.0)
+ fest_interval = 1000.0;

Thanks,
--
Miroslav Lichvar
Richard Cochran
2013-05-22 18:55:41 UTC
Permalink
Post by Miroslav Lichvar
Here is it without any delays in the start and also in a better
resolution (clknetsim now has an option to set the freq/log/stats
http://mlichvar.fedorapeople.org/clknetsim/ptp/pi_freq_est/pi_est.3.png
The graphs help a lot to make clear what you have been trying to
say: A slow servo (with smaller constants and lower sync rate) needs a
longer, better frequency estimation, but a higher sync rate hardly
needs any estimation at all, since it learns the frequency right away
in any case.
Post by Miroslav Lichvar
BTW, with your patch, which measures the interval instead of counting
- fest_interval = s->hwts ? 1.0 : 16.0;
+ fest_interval = 0.016 / s->ki;
+ if (fest_interval > 1000.0)
+ fest_interval = 1000.0;
I am leaning towards having an time interval as input to the PI code
(at the interface level), but using your formula in the calling
code. In this way, it "just works" for most people, but still we can
make it customizable in the future.

Thanks,
Richard
Miroslav Lichvar
2013-05-23 14:19:03 UTC
Permalink
Post by Richard Cochran
Post by Miroslav Lichvar
- fest_interval = s->hwts ? 1.0 : 16.0;
+ fest_interval = 0.016 / s->ki;
+ if (fest_interval > 1000.0)
+ fest_interval = 1000.0;
I am leaning towards having an time interval as input to the PI code
(at the interface level), but using your formula in the calling
code. In this way, it "just works" for most people, but still we can
make it customizable in the future.
Sounds great. Thanks.
--
Miroslav Lichvar
Miroslav Lichvar
2013-06-24 14:13:43 UTC
Permalink
Post by Richard Cochran
Post by Miroslav Lichvar
BTW, with your patch, which measures the interval instead of counting
- fest_interval = s->hwts ? 1.0 : 16.0;
+ fest_interval = 0.016 / s->ki;
+ if (fest_interval > 1000.0)
+ fest_interval = 1000.0;
I am leaning towards having an time interval as input to the PI code
(at the interface level), but using your formula in the calling
code. In this way, it "just works" for most people, but still we can
make it customizable in the future.
Any news on this? Should I try to prepare a patch?

Thanks,
--
Miroslav Lichvar
Richard Cochran
2013-06-24 15:32:56 UTC
Permalink
Post by Miroslav Lichvar
Any news on this? Should I try to prepare a patch?
I have something in the works. Will post soon...

Thanks,
Richard

Loading...