[Linuxptp-devel] port_renew_transport() on FD_ANNOUNCE_TIMER

Delio Brignoli

2013-08-20 16:14:28 UTC

Hello Richard,

Please, can you remind me why port_renew_transport() needs to be called in port_event() when handling FD_ANNOUNCE_TIMER? I would like to suppress calls to port_renew_transport() depending on a configuration setting. Right now a slave-only port with no master sending announce messages keeps restarting the transport.

Thanks
--
Delio

Richard Cochran

2013-08-20 17:57:58 UTC

Post by Delio Brignoli
Hello Richard,
Please, can you remind me why port_renew_transport() needs to be
called in port_event() when handling FD_ANNOUNCE_TIMER? I would like
to suppress calls to port_renew_transport() depending on a
configuration setting. Right now a slave-only port with no master
sending announce messages keeps restarting the transport.

(Cool tip: use git blame. In emacs, you even get pretty colors.)

The commit message says it all:

commit 646bf8bc26ca32206852b312b2053891c5f45792
Author: Richard Cochran <***@gmail.com>
Date: Fri Jul 6 21:17:45 2012 +0200

Recover from lost link when running in slave only mode.

Under Linux, when the link goes down our multicast socket becomes stale.
We always poll(2) for events, but the link down does not trigger any event
to let us know that something is wrong. Once the port enters master mode
and starts announcing itself, the socket throws an error. This in turn
causes a fault, and we reopen the socket when clearing the fault.

However, in the case of slave only mode, if the port is listening then
it will never send, discover the link error, or repair the socket. This
patch fixes the issue by simply reopening the socket after an announce
timeout.

[ Another way would be to use a netlink socket, but that would add too
much complexity as it poorly matches our port/interface model. ]

Signed-off-by: Richard Cochran <***@gmail.com>
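
A minimal sketch of the idea in the commit message, not the linuxptp code itself: poll(2) stays silent on a dead link, so a timeout is the only trigger left, and on timeout the socket is simply closed and reopened. The names open_event_socket() and wait_or_renew() are hypothetical, and a real transport would also bind, join multicast groups and enable time stamping when reopening.

```c
#include <poll.h>
#include <unistd.h>
#include <netinet/in.h>
#include <sys/socket.h>

/* Hypothetical helper: open a fresh UDP socket. A real transport
 * would also bind it and set up multicast and time stamping. */
static int open_event_socket(void)
{
	return socket(AF_INET, SOCK_DGRAM, IPPROTO_UDP);
}

/*
 * Wait up to timeout_ms for an event on *fd. If nothing arrives,
 * treat it like an announce timeout: replace the possibly stale
 * socket with a new one.
 * Returns 1 if poll() reported an event, 0 if the socket was
 * renewed, -1 on error.
 */
static int wait_or_renew(int *fd, int timeout_ms)
{
	struct pollfd pfd = { .fd = *fd, .events = POLLIN };
	int fresh;
	int cnt = poll(&pfd, 1, timeout_ms);

	if (cnt < 0)
		return -1;
	if (cnt > 0)
		return 1;

	/* Timeout: a link-down would not have produced any event. */
	fresh = open_event_socket();
	if (fresh < 0)
		return -1;
	close(*fd);
	*fd = fresh;
	return 0;
}
```

The cost of this approach is that the socket is reopened on every timeout, whether or not the link actually went down.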

Delio Brignoli

2013-08-21 09:47:42 UTC

Post by Richard Cochran

(Cool tip: use git blame. In emacs, you even get pretty colors.)

Touché, I should have tried that before emailing ;-)
[...]

Post by Richard Cochran
Under Linux, when the link goes down our multicast socket becomes stale.

Does this apply only to the raw transport or to udp4 and udp6 as well?

[...]

Post by Richard Cochran
However, in the case of slave only mode, if the port is listening then
it will never send, discover the link error, or repair the socket. This
patch fixes the issue by simply reopening the socket after an announce
timeout.
[ Another way would be to use a netlink socket, but that would add too
much complexity as it poorly matches our port/interface model. ]

Did you try using getsockopt() with level set to SOL_SOCKET and optname set to SO_ERROR to detect a stale socket on announce timeout? Or can you think of a reason why it wouldn't work?

Thanks
--
Delio

Delio Brignoli

2013-08-21 11:10:12 UTC

On Aug 21, 2013, at 11:47 AM, Delio Brignoli wrote:

[...]

Post by Delio Brignoli
Did you try using getsockopt() with level set to SOL_SOCKET and optname set to SO_ERROR to detect a stale socket on announce timeout? Or can you think of a reason why it wouldn't work?

I went ahead and implemented a transport_check() function that uses getsockopt(). The only backend that implements the check() function is raw.c at the moment (I cannot test udp and udp6 right now). It seems to work as intended.
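
For illustration, the getsockopt() call in question could look roughly like the sketch below. This is not the transport_check() code from the patch (which is not shown in this thread); socket_pending_error() is a hypothetical name, and whether a link-down actually leaves a pending error on a given transport's socket is exactly what would need testing.

```c
#include <sys/socket.h>

/*
 * Hypothetical helper: read (and thereby clear) the pending error
 * on a socket.
 * Returns 0 if no error is pending, the pending errno value (> 0)
 * if there is one, or -1 if getsockopt() itself failed.
 */
static int socket_pending_error(int fd)
{
	int err = 0;
	socklen_t len = sizeof(err);

	if (getsockopt(fd, SOL_SOCKET, SO_ERROR, &err, &len) < 0)
		return -1;
	return err;
}
```

A caller would reopen the transport only when the result is nonzero, instead of unconditionally on every announce timeout.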

I also found out that when linuxptp runs slaveonly with the p2p delay mechanism, the pdelay tx failure triggers a fault when the link is down. In other words, when slaveonly && p2p, it is not strictly necessary to restart the transport on FD_ANNOUNCE_TIMER.

--
Delio

Richard Cochran

2013-08-21 14:07:39 UTC

Post by Delio Brignoli
[...]

Okay, I'll try this, as it would be ideal.
(Never heard of that before.)

Post by Delio Brignoli
I also found out that when linuxptp runs slaveonly with the p2p delay mechanism, the pdelay tx failure triggers a fault when the link is down. In other words, when slaveonly && p2p, it is not strictly necessary to restart the transport on FD_ANNOUNCE_TIMER.

Right, the check could also be limited to E2E mode.
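
Put as a predicate, the condition discussed here might be sketched as follows. The enum and function names are hypothetical and chosen for illustration only; linuxptp's own identifiers may differ.

```c
#include <stdbool.h>

/* Hypothetical names, for illustration only. */
enum delay_mech { MECH_E2E, MECH_P2P };

/*
 * A port that transmits discovers the dead link by itself: a master
 * sends announce messages, and a P2P port sends pdelay requests.
 * Only a slave-only E2E port that is just listening never sends, so
 * only that case needs the extra check on announce timeout.
 */
static bool needs_stale_socket_check(bool slave_only, enum delay_mech mech)
{
	return slave_only && mech == MECH_E2E;
}
```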

Thanks,
Richard

Richard Cochran

2013-08-26 09:06:52 UTC

Post by Delio Brignoli
Hello Richard,
Please, can you remind me why port_renew_transport() needs to be called in port_event() when handling FD_ANNOUNCE_TIMER? I would like to suppress calls to port_renew_transport() depending on a configuration setting. Right now a slave-only port with no master sending announce messages keeps restarting the transport.

Delio,

Can you please explain why restarting the transport is a problem for
you? Is it the driver calls to the SIOCSHWTSTAMP ioctl?

Thanks,
Richard

Delio Brignoli

2013-08-26 11:41:33 UTC

Post by Richard Cochran

Post by Delio Brignoli
Hello Richard,
Please, can you remind me why port_renew_transport() needs to be called in port_event() when handling FD_ANNOUNCE_TIMER? I would like to suppress calls to port_renew_transport() depending on a configuration setting. Right now a slave-only port with no master sending announce messages keeps restarting the transport.

Delio,
Can you please explain why restarting the transport is a problem for
you? Is it the driver calls to the SIOCSHWTSTAMP ioctl?

Yes, it's SIOCSHWTSTAMP. I'd rather avoid switching hardware timestamping off and on unnecessarily. However, it is not causing major issues as it is, just a burst of unnecessary log messages. I saw your test results and understand your preference towards playing it safe.

Thanks
--
Delio