[Linuxptp-devel] Wrong Interface
Amritpal Bains
2017-06-22 20:59:04 UTC
Hi all,

We had an issue where multiple devices running ptp4l were not synchronizing
their clocks (in fact, they were not talking to each other at all).

The problem did NOT occur when the IP addresses were static or we were
using a dhcp server. It only occurred when we were connected using ZeroConf.

The exact command that we were running on each device was "ptp4l -i eth0 -f
/etc/ptp.conf"

We had a look at the ptp packets through wireshark and noticed that the
source ip address was for the wrong interface (eth1).

The code change in the attached patch file resolved the problem. We simply
set the interface's IP address in the imr_address field, since the ip(7)
manpage says that leaving it as 0.0.0.0 leaves the system to determine the
appropriate interface.

Can you tell us whether our solution is correct, and whether it is a
problem in the code?

Thanks,
Amrit
Richard Cochran
2017-06-23 05:29:45 UTC
Post by Amritpal Bains
The problem did NOT occur when the IP addresses were static or we were
using a dhcp server. It only occurred when we were connected using ZeroConf.
Is multicast enabled on the interface, when using zeroconf?
Post by Amritpal Bains
The exact command that we were running on each device was "ptp4l -i eth0 -f
/etc/ptp.conf"
We had a look at the ptp packets through wireshark and noticed that the
source ip address was for the wrong interface (eth1).
Strange.
Post by Amritpal Bains
The code change in the attached patch file resolved the problem. We simply
declare the ip-address in the imr_address structure since the ip(7) manpage
says leaving it as 0.0.0.0 leaves the system to determine the appropriate
interface.
To quote the man page:

IP_ADD_MEMBERSHIP (since Linux 1.2)
Join a multicast group. Argument is an ip_mreqn structure.

...

imr_multiaddr contains the address of the multicast group the
application wants to join or leave. It must be a valid multicast
address (or setsockopt(2) fails with the error EINVAL).
imr_address is the address of the local interface with which the
system should join the multicast group; if it is equal to
INADDR_ANY, an appropriate interface is chosen by the system.
imr_ifindex is the interface index of the interface that should
join/leave the imr_multiaddr group, or 0 to indicate any interface.

We set 'imr_ifindex' to a non-zero value. That gives "the interface
index of the interface that should join/leave the imr_multiaddr group."

My expectation is that the kernel will choose the interface only when
'imr_address' and 'imr_ifindex' are both zero. There might be a
kernel bug here.
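
To make the three fields concrete, here is a minimal sketch of how an
ip_mreqn request can be filled in and passed to IP_ADD_MEMBERSHIP. The
helper names fill_mreqn() and join_group() are hypothetical and are not
taken from linuxptp or from the attached patch; the sketch only assumes
the ip(7) semantics quoted above.

```c
#define _GNU_SOURCE          /* exposes struct ip_mreqn in glibc */
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>

/*
 * Fill an ip_mreqn request (hypothetical helper, not linuxptp code).
 * group:      dotted-quad multicast group, e.g. "224.0.1.129"
 * ifindex:    interface index, or 0 to let the kernel pick
 * local_addr: dotted-quad address of the interface, or NULL for INADDR_ANY
 * Returns 0 on success, -1 on a malformed or non-multicast address.
 */
int fill_mreqn(struct ip_mreqn *req, const char *group,
               int ifindex, const char *local_addr)
{
    assert(req != NULL && group != NULL);
    memset(req, 0, sizeof(*req));

    /* imr_multiaddr must be a valid multicast (class D) address. */
    if (inet_pton(AF_INET, group, &req->imr_multiaddr) != 1)
        return -1;
    if (!IN_MULTICAST(ntohl(req->imr_multiaddr.s_addr)))
        return -1;

    /* imr_address stays INADDR_ANY unless the caller names an address. */
    req->imr_address.s_addr = htonl(INADDR_ANY);
    if (local_addr && inet_pton(AF_INET, local_addr, &req->imr_address) != 1)
        return -1;

    req->imr_ifindex = ifindex;
    return 0;
}

/* Join the group described by req on an existing UDP socket. */
int join_group(int fd, const struct ip_mreqn *req)
{
    return setsockopt(fd, IPPROTO_IP, IP_ADD_MEMBERSHIP, req, sizeof(*req));
}
```

A caller would typically obtain the index with if_nametoindex("eth0") and
pass NULL for local_addr to reproduce the "ifindex only" case discussed
here, or pass the interface's own address to reproduce the kind of change
the patch is described as making.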
Post by Amritpal Bains
Can you tell us whether our solution is correct, and whether it is a
problem in the code?
I am not sure what is going on here, but I will take a look...

Thanks,
Richard
Amritpal Bains
2017-06-23 15:04:38 UTC
Post by Richard Cochran
Is multicast enabled on the interface, when using zeroconf?
If I run ifconfig:
*eth0 reports:* UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
*eth1 reports:* UP BROADCAST MULTICAST MTU:1500 Metric:1
Post by Richard Cochran
My expectation is that the kernel will choose the interface only when
'imr_address' and 'imr_ifindex' are both zero.
That's also how I first interpreted the documentation.

Please let me know if you need any more info. Thanks for taking a look,
Richard!

- Amrit
Richard Cochran
2017-06-28 10:02:19 UTC
Post by Amritpal Bains
Post by Richard Cochran
My expectation is that the kernel will choose the interface only when
'imr_address' and 'imr_ifindex' are both zero.
That's also how I first interpreted the documentation.
So I see what is happening, but the correct remedy still needs
thought. When the MC source address in the kernel is zero, the kernel
chooses a source address. In the case of IGMP, it correctly chooses
the zeroconf address, but that is handled as a special case.
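
As an illustration of that point, the sketch below passes both the
interface index and the interface's own IPv4 address to IP_MULTICAST_IF,
which per ip(7) accepts an ip_mreqn. My understanding (an assumption, not
something verified in this thread) is that a non-zero imr_address here
gives the kernel an explicit multicast source address instead of leaving
the choice to it. The helpers lookup_ipv4() and set_mcast_source() are
hypothetical names, not linuxptp functions, and this is not a proposed
remedy.

```c
#define _GNU_SOURCE          /* exposes struct ip_mreqn in glibc */
#include <arpa/inet.h>
#include <assert.h>
#include <ifaddrs.h>
#include <net/if.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>

/* Find the first IPv4 address on ifname; returns 0 on success, -1 otherwise. */
int lookup_ipv4(const char *ifname, struct in_addr *out)
{
    struct ifaddrs *list, *i;
    int found = -1;

    assert(ifname != NULL && out != NULL);
    if (getifaddrs(&list) != 0)
        return -1;
    for (i = list; i != NULL; i = i->ifa_next) {
        /* Skip entries without an address or with a non-IPv4 address. */
        if (!i->ifa_addr || i->ifa_addr->sa_family != AF_INET)
            continue;
        if (strcmp(i->ifa_name, ifname) != 0)
            continue;
        *out = ((struct sockaddr_in *)i->ifa_addr)->sin_addr;
        found = 0;
        break;
    }
    freeifaddrs(list);
    return found;
}

/* Set outgoing multicast interface and local address for socket fd. */
int set_mcast_source(int fd, const char *ifname)
{
    struct ip_mreqn req;
    unsigned int index = if_nametoindex(ifname);

    if (index == 0)
        return -1;               /* no such interface */
    memset(&req, 0, sizeof(req));
    req.imr_ifindex = (int)index;
    if (lookup_ipv4(ifname, &req.imr_address) != 0)
        return -1;               /* interface has no IPv4 address */
    return setsockopt(fd, IPPROTO_IP, IP_MULTICAST_IF, &req, sizeof(req));
}
```

With a zeroconf setup, calling set_mcast_source(fd, "eth0") would hand the
kernel the 169.254.x.x address currently assigned to eth0. Note that the
lookup takes the first IPv4 address it finds, which may not be the wanted
one on an interface with several addresses.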
Post by Amritpal Bains
Please let me know if you need any more info.
Can you please confirm:

- The packets are sent on eth0 as expected (not on eth1), but with the
source address from eth1.

- The clients are not ptp4l, or they are ptp4l running hybrid_e2e
mode. (The source address is irrelevant, and we don't check it in
ptp4l.)

Thanks,
Richard
Amritpal Bains
2017-07-11 17:24:08 UTC
I apologize for the long delay in my response.
Post by Richard Cochran
- The packets are sent on eth0 as expected (not on eth1), but with the
source address from eth1.
Yes, this is what is happening.
Post by Richard Cochran
The clients are not ptp4l, or they are ptp4l running hybrid_e2e
mode.
How do I check if ptp4l is running in hybrid_e2e mode?
Post by Richard Cochran
(The source address is irrelevant, and we don't check it in
ptp4l.)
Yes, this is true, but I think the source address may be relevant to the
problem because of our setup. We have multiple devices with the address
127.0.0.1 on eth1 (see attached diagram).
Richard Cochran
2017-07-11 20:16:58 UTC
Post by Amritpal Bains
I apologize for the long delay in my response.
No problem.
Post by Amritpal Bains
How do I check if ptp4l is running in hybrid_e2e mode?
If your configuration file has

hybrid_e2e 1

then you are using that mode.
Post by Amritpal Bains
Post by Richard Cochran
(The source address is irrelevant, and we don't check it in
ptp4l.)
Yes, this is true but I think the source address may be relevant to the
problem because of our setup. We have multiple devices with the address
127.0.0.1 on eth1 (see attached diagram)
127.0.0.1 is a special address meaning "localhost". It should not
be assigned to a real interface.


Thanks,
Richard
Amritpal Bains
2017-07-12 13:32:05 UTC
Post by Richard Cochran
If your configuration file has
hybrid_e2e 1
then you are using that mode.
It appears not to be running in this mode.
Post by Richard Cochran
The 127.0.0.1 is a special address meaning "localhost". It should not
be assigned to a real interface.
My mistake. What I actually meant was that the IP address is set to
10.0.0.3. I have updated the diagram.