Discussion:
[Linuxptp-devel] Clock sanity checking?
Miroslav Lichvar
2013-10-09 13:30:46 UTC
Permalink
I was thinking about adding some code to the servo that would check if
the synchronized clock is advancing as expected by comparing it to the
system monotonic clock. It would allow us to see when something else
is touching the clock or the PHC/driver is broken.

We would probably need to move the clock_settime()/clock_adjtime()
calls to the servo first to allow the servo to make these assumptions,
but I was thinking something like this. What do you think?

diff --git a/servo.c b/servo.c
index 45c5832..f5daa9a 100644
--- a/servo.c
+++ b/servo.c
@@ -17,16 +17,26 @@
* 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.
*/
#include <string.h>
+#include <time.h>

#include "pi.h"
+#include "print.h"
#include "servo_private.h"

struct servo *servo_create(enum servo_type type, int fadj, int max_ppb, int sw_ts)
{
+ struct servo *servo;
+
if (type == CLOCK_SERVO_PI) {
- return pi_servo_create(fadj, max_ppb, sw_ts);
- }
- return NULL;
+ servo = pi_servo_create(fadj, max_ppb, sw_ts);
+ } else
+ return NULL;
+
+ servo->last_mono_ts = 0;
+ servo->last_local_ts = 0;
+ servo->last_fadj = fadj;
+
+ return servo;
}

void servo_destroy(struct servo *servo)
@@ -39,7 +49,42 @@ double servo_sample(struct servo *servo,
uint64_t local_ts,
enum servo_state *state)
{
- return servo->sample(servo, offset, local_ts, state);
+ double fadj;
+ struct timespec now;
+ uint64_t mono_ts;
+
+ clock_gettime(CLOCK_MONOTONIC, &now);
+ mono_ts = now.tv_sec * 1000000000 + now.tv_nsec;
+
+ if (servo->last_local_ts) {
+ if (servo->last_local_ts > local_ts) {
+ pr_warning("unexpected backward clock step!");
+ } else if ((mono_ts - servo->last_mono_ts) *
+ (1.1 - servo->last_fadj / 1e9) <
+ (local_ts - servo->last_local_ts)) {
+ pr_warning("clock running faster than expected!");
+ } else if ((mono_ts - servo->last_mono_ts) *
+ (0.9 + servo->last_fadj / 1e9) >
+ (local_ts - servo->last_local_ts)) {
+ pr_warning("clock running slower than expected!");
+ }
+ }
+
+ fadj = servo->sample(servo, offset, local_ts, state);
+
+ servo->last_mono_ts = mono_ts;
+ servo->last_local_ts = local_ts;
+ servo->last_fadj = fadj;
+
+ switch (*state) {
+ case SERVO_JUMP:
+ servo->last_local_ts -= offset;
+ break;
+ default:
+ break;
+ }
+
+ return fadj;
}

void servo_sync_interval(struct servo *servo, double interval)
diff --git a/servo_private.h b/servo_private.h
index 98c7db2..e6d3ae1 100644
--- a/servo_private.h
+++ b/servo_private.h
@@ -22,6 +22,9 @@
#include "contain.h"

struct servo {
+ uint64_t last_mono_ts;
+ uint64_t last_local_ts;
+ double last_fadj;

void (*destroy)(struct servo *servo);
--
Miroslav Lichvar
Miroslav Lichvar
2013-10-10 11:29:42 UTC
Permalink
(CCing linuxptp-devel back)
I really like this idea. Some nits below, but overall seems good. I do
agree that to properly make the adjustments, the servo would need some
form of callback from the clock.. might not have to totally pull those
tests in..
What if we just had a servo function like "servo_reset_sanity()" which
we called whenever the clock was set or adjusted, and would simply reset
the last_local_ts to 0? It seems to me that would be better overall vs
the alternative of having tighter coupling between the classes?
When not in the unlocked state, every servo call results in a clock
adjustment, so I was considering to give the servo the clock id in its
constructor and let it adjust the clock itself. Do you think we should
avoid that?
Post by Miroslav Lichvar
+ servo = pi_servo_create(fadj, max_ppb, sw_ts);
+ } else
+ return NULL;
a nit here, but I do prefer the kernel style where you add { } around
all blocks if any block requires them.
Ok.
Post by Miroslav Lichvar
+ if (servo->last_local_ts) {
+ if (servo->last_local_ts > local_ts) {
+ pr_warning("unexpected backward clock step!");
+ } else if ((mono_ts - servo->last_mono_ts) *
+ (1.1 - servo->last_fadj / 1e9) <
+ (local_ts - servo->last_local_ts)) {
+ pr_warning("clock running faster than expected!");
+ } else if ((mono_ts - servo->last_mono_ts) *
+ (0.9 + servo->last_fadj / 1e9) >
+ (local_ts - servo->last_local_ts)) {
+ pr_warning("clock running slower than expected!");
+ }
I think here we could make the messages a bit more clearly indicating
they came from the servo? That would be nice.
Ok, prefix the messages with "servo: "?
--
Miroslav Lichvar
Keller, Jacob E
2013-10-10 20:56:24 UTC
Permalink
-----Original Message-----
Sent: Thursday, October 10, 2013 4:30 AM
To: Keller, Jacob E
Subject: Re: [Linuxptp-devel] Clock sanity checking?
(CCing linuxptp-devel back)
Oops.. didn't mean to only email you..
I really like this idea. Some nits below, but overall seems good. I do
agree that to properly make the adjustments, the servo would need
some
form of callback from the clock.. might not have to totally pull those
tests in..
What if we just had a servo function like "servo_reset_sanity()" which
we called whenever the clock was set or adjusted, and would simply
reset
the last_local_ts to 0? It seems to me that would be better overall vs
the alternative of having tighter coupling between the classes?
When not in the unlocked state, every servo call results in a clock
adjustment, so I was considering to give the servo the clock id in its
constructor and let it adjust the clock itself. Do you think we should
avoid that?
I just don't like the idea of the servo doing what the clock is supposed
to do. That leads to tighter coupling which makes it harder to re-use code.

It's not really a bigger issue, I think Richard's idea of just doing this
inside the clock isn't a bad thing, either.
Post by Miroslav Lichvar
+ servo = pi_servo_create(fadj, max_ppb, sw_ts);
+ } else
+ return NULL;
a nit here, but I do prefer the kernel style where you add { } around
all blocks if any block requires them.
Ok.
Post by Miroslav Lichvar
+ if (servo->last_local_ts) {
+ if (servo->last_local_ts > local_ts) {
+ pr_warning("unexpected backward clock step!");
+ } else if ((mono_ts - servo->last_mono_ts) *
+ (1.1 - servo->last_fadj / 1e9) <
+ (local_ts - servo->last_local_ts)) {
+ pr_warning("clock running faster than
expected!");
Post by Miroslav Lichvar
+ } else if ((mono_ts - servo->last_mono_ts) *
+ (0.9 + servo->last_fadj / 1e9) >
+ (local_ts - servo->last_local_ts)) {
+ pr_warning("clock running slower than
expected!");
Post by Miroslav Lichvar
+ }
I think here we could make the messages a bit more clearly indicating
they came from the servo? That would be nice.
Ok, prefix the messages with "servo: "?
Yes, something like that, or "clock:" if this was moved into the clock.c file?

Regards,
Jake
--
Miroslav Lichvar
Richard Cochran
2013-10-10 18:25:12 UTC
Permalink
Post by Miroslav Lichvar
I was thinking about adding some code to the servo that would check if
the synchronized clock is advancing as expected by comparing it to the
system monotonic clock. It would allow us to see when something else
is touching the clock
Okay.
Post by Miroslav Lichvar
or the PHC/driver is broken.
IMHO, a better way to check for this would be to add a "no adjust"
option to phc2sys and just track the two clocks.
Post by Miroslav Lichvar
We would probably need to move the clock_settime()/clock_adjtime()
calls to the servo first to allow the servo to make these assumptions,
but I was thinking something like this. What do you think?
I don't see why this code cannot live in clock.c or in another module
call by clock_synchronize().
Post by Miroslav Lichvar
+ if (servo->last_local_ts) {
+ if (servo->last_local_ts > local_ts) {
+ pr_warning("unexpected backward clock step!");
Okay, maybe pr_err instead?
Post by Miroslav Lichvar
+ } else if ((mono_ts - servo->last_mono_ts) *
+ (1.1 - servo->last_fadj / 1e9) <
+ (local_ts - servo->last_local_ts)) {
Maybe let the test conditions be configurable?
Post by Miroslav Lichvar
+ pr_warning("clock running faster than expected!");
This will message repeat endlessly whenever it appears, right?

Maybe there is better response to this, like resetting the servo,
switching to no adjust mode, or just ending the program.
Post by Miroslav Lichvar
+ } else if ((mono_ts - servo->last_mono_ts) *
+ (0.9 + servo->last_fadj / 1e9) >
+ (local_ts - servo->last_local_ts)) {
+ pr_warning("clock running slower than expected!");
+ }
+ }
Thanks,
Richard
Miroslav Lichvar
2013-10-11 09:55:48 UTC
Permalink
Post by Richard Cochran
Post by Miroslav Lichvar
I was thinking about adding some code to the servo that would check if
the synchronized clock is advancing as expected by comparing it to the
system monotonic clock. It would allow us to see when something else
is touching the clock
Okay.
Post by Miroslav Lichvar
or the PHC/driver is broken.
IMHO, a better way to check for this would be to add a "no adjust"
option to phc2sys and just track the two clocks.
What if the driver bug is triggered by adjusting the clock frequency?

The check I'm proposing is always running. When someone reports a bug,
we can tell immediately from the log if there was something weird
going on with the clock. It can be a problem that happens rarely and
suspending the operation and running phc2sys in the tracking mode
might not be an option.
Post by Richard Cochran
Post by Miroslav Lichvar
We would probably need to move the clock_settime()/clock_adjtime()
calls to the servo first to allow the servo to make these assumptions,
but I was thinking something like this. What do you think?
I don't see why this code cannot live in clock.c or in another module
call by clock_synchronize().
What module do you suggest to move the clock_settime/adjtime calls to?

I'd like to avoid duplicating code in ptp4l and phc2sys. To me, it
seems the servo should be in control of the clock, let it do its job
and just report the changes it's doing. The new servo I'd like to
implement sometime will probably have its own timeout and will need to
be able to adjust the frequency adjustment at any time, not just when
the sync message is received.
Post by Richard Cochran
Post by Miroslav Lichvar
+ if (servo->last_local_ts) {
+ if (servo->last_local_ts > local_ts) {
+ pr_warning("unexpected backward clock step!");
Okay, maybe pr_err instead?
Hm, I'm not sure about this.
Post by Richard Cochran
Post by Miroslav Lichvar
+ } else if ((mono_ts - servo->last_mono_ts) *
+ (1.1 - servo->last_fadj / 1e9) <
+ (local_ts - servo->last_local_ts)) {
Maybe let the test conditions be configurable?
Agreed, the threshold for the frequency error should be configurable,
perhaps 10% by default.
Post by Richard Cochran
Post by Miroslav Lichvar
+ pr_warning("clock running faster than expected!");
This will message repeat endlessly whenever it appears, right?
Only if the clock is constantly running faster. If it is, I think the
user should see that in the log. If something stepped the clock
forward once, there will be just one message.
Post by Richard Cochran
Maybe there is better response to this, like resetting the servo,
switching to no adjust mode, or just ending the program.
Resetting the servo sounds good.
--
Miroslav Lichvar
Keller, Jacob E
2013-10-11 16:25:23 UTC
Permalink
Post by Miroslav Lichvar
Post by Richard Cochran
Post by Miroslav Lichvar
I was thinking about adding some code to the servo that would check if
the synchronized clock is advancing as expected by comparing it to the
system monotonic clock. It would allow us to see when something else
is touching the clock
Okay.
Post by Miroslav Lichvar
or the PHC/driver is broken.
IMHO, a better way to check for this would be to add a "no adjust"
option to phc2sys and just track the two clocks.
What if the driver bug is triggered by adjusting the clock frequency?
The check I'm proposing is always running. When someone reports a bug,
we can tell immediately from the log if there was something weird
going on with the clock. It can be a problem that happens rarely and
suspending the operation and running phc2sys in the tracking mode
might not be an option.
I agree, I think this makes sense to run directly in the ptp4l instance.
Post by Miroslav Lichvar
Post by Richard Cochran
Post by Miroslav Lichvar
We would probably need to move the clock_settime()/clock_adjtime()
calls to the servo first to allow the servo to make these assumptions,
but I was thinking something like this. What do you think?
I don't see why this code cannot live in clock.c or in another module
call by clock_synchronize().
What module do you suggest to move the clock_settime/adjtime calls to?
I'd like to avoid duplicating code in ptp4l and phc2sys. To me, it
seems the servo should be in control of the clock, let it do its job
and just report the changes it's doing. The new servo I'd like to
implement sometime will probably have its own timeout and will need to
be able to adjust the frequency adjustment at any time, not just when
the sync message is received.
In that case I think servo is the only option we have.. I don't really
mind pulling the clock stuff into servo, but I am just questioning,
then, whether we need clock.c at that point?
Post by Miroslav Lichvar
Post by Richard Cochran
Post by Miroslav Lichvar
+ if (servo->last_local_ts) {
+ if (servo->last_local_ts > local_ts) {
+ pr_warning("unexpected backward clock step!");
Okay, maybe pr_err instead?
Hm, I'm not sure about this.
Post by Richard Cochran
Post by Miroslav Lichvar
+ } else if ((mono_ts - servo->last_mono_ts) *
+ (1.1 - servo->last_fadj / 1e9) <
+ (local_ts - servo->last_local_ts)) {
Maybe let the test conditions be configurable?
Agreed, the threshold for the frequency error should be configurable,
perhaps 10% by default.
Most definitely.
Post by Miroslav Lichvar
Post by Richard Cochran
Post by Miroslav Lichvar
+ pr_warning("clock running faster than expected!");
This will message repeat endlessly whenever it appears, right?
Only if the clock is constantly running faster. If it is, I think the
user should see that in the log. If something stepped the clock
forward once, there will be just one message.
I agree. This isn't tracking frequency, just the total time in a single
pass. I think this won't display often, and when it does it *should*, as
there is a serious problem then.
Post by Miroslav Lichvar
Post by Richard Cochran
Maybe there is better response to this, like resetting the servo,
switching to no adjust mode, or just ending the program.
Resetting the servo sounds good.
I like resetting the servo, seems like the proper response here.

Regards,
Jake
Richard Cochran
2013-10-11 17:38:28 UTC
Permalink
Post by Miroslav Lichvar
Post by Richard Cochran
IMHO, a better way to check for this would be to add a "no adjust"
option to phc2sys and just track the two clocks.
What if the driver bug is triggered by adjusting the clock frequency?
The check I'm proposing is always running. When someone reports a bug,
we can tell immediately from the log if there was something weird
going on with the clock. It can be a problem that happens rarely and
suspending the operation and running phc2sys in the tracking mode
might not be an option.
Well, we shouldn't worry too much about debugging kernel
drivers. After all, network programs don't try to second guess
Ethernet drivers either.

However, I think detecting a PHC clock reset by an outside program is
reasonable for ptp4l, as is rejecting a really out of spec master.
Post by Miroslav Lichvar
Post by Richard Cochran
Post by Miroslav Lichvar
We would probably need to move the clock_settime()/clock_adjtime()
calls to the servo first to allow the servo to make these assumptions,
but I was thinking something like this. What do you think?
I don't see why this code cannot live in clock.c or in another module
call by clock_synchronize().
What module do you suggest to move the clock_settime/adjtime calls to?
I don't see why you need to move these calls at all. They are fine
where they are.
Post by Miroslav Lichvar
I'd like to avoid duplicating code in ptp4l and phc2sys.
Yes, and also among different servo implementations. This sanity
checking should be written just once and shared by multiple servos in
ptp4l and also between ptp4l and phc2sys.
Post by Miroslav Lichvar
To me, it
seems the servo should be in control of the clock, let it do its job
and just report the changes it's doing. The new servo I'd like to
implement sometime will probably have its own timeout and will need to
be able to adjust the frequency adjustment at any time, not just when
the sync message is received.
Okay, but let's see that new servo first. For now, the patch you wrote
doesn't need this, AFAICT.

Thanks,
Richard
Miroslav Lichvar
2013-10-15 10:53:05 UTC
Permalink
Post by Richard Cochran
Post by Miroslav Lichvar
Post by Richard Cochran
I don't see why this code cannot live in clock.c or in another module
call by clock_synchronize().
What module do you suggest to move the clock_settime/adjtime calls to?
I don't see why you need to move these calls at all. They are fine
where they are.
My reasoning is that in order to make the sanity checking work
correctly the servo needs to know when and how was the clock adjusted.
If the servo callers apply the result always and immediately (as the
current code does), then why not apply them right in the servo? Less
code in the callers and no broken assumptions.

I'll prepare a new patch addressing the issues from your and Jake's
review and we can make the move later if you change your mind.
--
Miroslav Lichvar
Keller, Jacob E
2013-10-15 17:49:54 UTC
Permalink
Post by Miroslav Lichvar
Post by Richard Cochran
Post by Miroslav Lichvar
Post by Richard Cochran
I don't see why this code cannot live in clock.c or in another module
call by clock_synchronize().
What module do you suggest to move the clock_settime/adjtime calls to?
I don't see why you need to move these calls at all. They are fine
where they are.
My reasoning is that in order to make the sanity checking work
correctly the servo needs to know when and how was the clock adjusted.
If the servo callers apply the result always and immediately (as the
current code does), then why not apply them right in the servo? Less
code in the callers and no broken assumptions.
I'll prepare a new patch addressing the issues from your and Jake's
review and we can make the move later if you change your mind.
That makes sense.

Rega
Richard Cochran
2013-10-16 11:42:19 UTC
Permalink
Post by Miroslav Lichvar
My reasoning is that in order to make the sanity checking work
correctly the servo needs to know when and how was the clock adjusted.
But why? The checking code in the patch works fine without this
information, doesn't it?
Post by Miroslav Lichvar
If the servo callers apply the result always and immediately (as the
current code does), then why not apply them right in the servo? Less
code in the callers and no broken assumptions.
A classical servo is a black box with one input, one output, and no
other dependencies. So it is natural to place it into its own
module. As far as the clock control goes, one of the best performing
solutions is to have a VCO (controlled by SPI or I2C or the like). The
present form of the code lends itself well to hacking this in (or
extending the clock control). In addition, it lets you play other
games (unit testing the servos, comparing multiple servos, voting over
multiple servos, etc).

So unless there is a really good reason to combine servo and clock
control, I prefer to keep them separate.

Thanks,
Richard
Miroslav Lichvar
2013-10-16 15:47:21 UTC
Permalink
Post by Richard Cochran
Post by Miroslav Lichvar
My reasoning is that in order to make the sanity checking work
correctly the servo needs to know when and how was the clock adjusted.
But why? The checking code in the patch works fine without this
information, doesn't it?
If the caller decided to ignore the result of the servo and not set
the clock frequency or step the clock, the sanity check would have
incorrect information about current clock frequency or the old local
time stamp, and the check would produce false positive.
Post by Richard Cochran
So unless there is a really good reason to combine servo and clock
control, I prefer to keep them separate.
Fair enough. Perhaps a cleaner aproach would be adding a new function
to the servo which would inform it when and how was the clock adjusted.
--
Miroslav Lichvar
Richard Cochran
2013-10-17 06:25:28 UTC
Permalink
Post by Miroslav Lichvar
Post by Richard Cochran
Post by Miroslav Lichvar
My reasoning is that in order to make the sanity checking work
correctly the servo needs to know when and how was the clock adjusted.
But why? The checking code in the patch works fine without this
information, doesn't it?
If the caller decided to ignore the result of the servo and not set
the clock frequency or step the clock, the sanity check would have
incorrect information about current clock frequency or the old local
time stamp, and the check would produce false positive.
But this information is readily available in the caller.

Thanks,
Richard

Loading...