[Linuxptp-devel] pmc Problem
Rohrer Hansjoerg
2014-01-06 12:48:04 UTC
Hello

After the last set of patches I get errors when using pmc to read local information:
cmd:
pmc -4 -u -b 0 "GET PORT_DATA_SET"

answer:
sending: GET PORT_DATA_SET
pmc[976.628]: uds: sendto failed: Connection refused
pmc[976.632]: failed to send message

The problem seems to be the latest ptp4l binary:
ptp4l actual version: 1.3-00056-g6205c1d

With the former ptp4l version there was no problem
ptp4l former version: 1.3-00047-g847e49c

Output of pmc with the former ptp4l version:
pmc -4 -u -b 0 "GET PORT_DATA_SET"
sending: GET PORT_DATA_SET
000cc6.fffe.76f46f-1 seq 0 RESPONSE MANAGMENT PORT_DATA_SET
portIdentity 000cc6.fffe.76f46f-1
portState SLAVE
logMinDelayReqInterval 0
peerMeanPathDelay 738
logAnnounceInterval 1
announceReceiptTimeout 3
logSyncInterval 0
delayMechanism 2
logMinPdelayReqInterval 0
versionNumber 2

For pmc it makes no difference whether I use the former or the current version.

Did I miss something? Any idea where to start with debugging?

Best regards
Hansjörg Rohrer


__________________________________________________


Hansjoerg Rohrer
Deputy CTO
Engineering & Design
***@mobatime.com
Phone direct: +41 34 432 4635

Moser-Baer AG
Spitalstrasse 7
3454 Sumiswald
Switzerland

Phone: +41 34 432 4646
Fax: +41 34 432 4699

www.mobatime.com <http://www.mobatime.com>
www.mobatec.ch <http://www.mobatec.ch>
Richard Cochran
2014-01-06 14:47:04 UTC
Post by Rohrer Hansjoerg
ptp4l actual version: 1.3-00056-g6205c1d
With the former ptp4l version there was no problem
ptp4l former version: 1.3-00047-g847e49c
...
Post by Rohrer Hansjoerg
Did I miss something? Any idea where to start with debugging?
This is a classic case for git bisect. Please try it.

Thanks,
Richard
Miroslav Lichvar
2014-01-07 17:56:54 UTC
Post by Richard Cochran
Post by Rohrer Hansjoerg
ptp4l actual version: 1.3-00056-g6205c1d
With the former ptp4l version there was no problem
ptp4l former version: 1.3-00047-g847e49c
This is a classic case for git bisect. Please try it.
I'm seeing some strange problems with pmc and uds too. Bisecting
didn't really help me: it pointed to e2586a, which was a merge commit,
and neither of the merged branches showed the problem.

After more debugging, it seems the problem is that the UDS name in
struct port points to a local variable of clock_create(). The socket
is actually created later, in port_renew_transport(), which is not
called from clock_create().

It looks like an old bug, which became visible only when so much code
was added to ptp4l that the port name on the stack is overwritten before
the socket is created.

Should the interface name and pod variables be copied in port_open()?
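
To make the question concrete, here is a minimal sketch of the copying idea. Everything in it is hypothetical: struct demo_port, demo_port_open(), demo_clock_create() and the IFNAME_MAX buffer size are simplified stand-ins, not the linuxptp definitions, and the path is only an example value. The point is just that the port keeps its own copy of the name, so it stays valid after the caller's stack frame is gone.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define IFNAME_MAX 108 /* hypothetical buffer size, not taken from linuxptp */

/* Hypothetical, simplified stand-in for linuxptp's struct port. */
struct demo_port {
	char name[IFNAME_MAX]; /* owned copy, independent of the caller's stack */
};

/* Copy the caller's name into the port instead of storing the pointer. */
static int demo_port_open(struct demo_port *p, const char *name)
{
	assert(p != NULL && name != NULL);
	if (strlen(name) >= sizeof(p->name))
		return -1; /* name does not fit into the buffer */
	snprintf(p->name, sizeof(p->name), "%s", name);
	return 0;
}

/* Mimics the situation in clock_create(): the name is a stack local. */
static int demo_clock_create(struct demo_port *p)
{
	char local_name[] = "/var/run/ptp4l"; /* example value only */

	return demo_port_open(p, local_name);
}
```

After demo_clock_create() returns, p->name still holds the string, whereas a stored pointer to local_name would be dangling.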
--
Miroslav Lichvar
Richard Cochran
2014-01-08 07:17:51 UTC
Post by Miroslav Lichvar
It looks like an old bug, which became visible only when so much code
was added to ptp4l that the port name on the stack is overwritten before
the socket is created.
I agree that the name on the stack is a bug, but I don't think it is
the cause.

I see this issue appear when running ptp4l in slave only mode. I think
the new timer code causes this part of port_event() to trigger:

	if (clock_slave_only(p->clock) && p->delayMechanism != DM_P2P &&
	    port_renew_transport(p)) {
		return EV_FAULT_DETECTED;
	}

Then ptp4l starts closing and reopening the UDS socket in a rapid
sequence.

Here is a bit of hacked debugging code that shows the problem. Using
set_tmo_log hides the bug, and set_tmo_random exposes it.

static int port_set_announce_tmo(struct port *p)
{
	pr_debug("port_set_announce_tmo %hu announceReceiptTimeout=%d",
		 portnum(p), p->announceReceiptTimeout);
#if 1
	return set_tmo_random(p->fda.fd[FD_ANNOUNCE_TIMER],
			      p->announceReceiptTimeout, 1, p->logAnnounceInterval);
#else
	return set_tmo_log(p->fda.fd[FD_ANNOUNCE_TIMER],
			   p->announceReceiptTimeout, p->logAnnounceInterval);
#endif
}

Still need more time to figure this out...

Thanks,
Richard
Richard Cochran
2014-01-08 08:52:52 UTC
On Wed, Jan 08, 2014 at 08:17:51AM +0100, Richard Cochran wrote:

Okay, found it.
Post by Richard Cochran
Here is a bit of hacked debugging code that shows the problem. Using
set_tmo_log hides the bug, and set_tmo_random exposes it.
static int port_set_announce_tmo(struct port *p)
{
pr_debug("port_set_announce_tmo %hu announceReceiptTimeout=%d",
portnum(p), p->announceReceiptTimeout);
The announceReceiptTimeout is zero for the UDS port.
Post by Richard Cochran
#if 1
return set_tmo_random(p->fda.fd[FD_ANNOUNCE_TIMER],
p->announceReceiptTimeout, 1, p->logAnnounceInterval);
The new code sets a timer going...
Post by Richard Cochran
#else
return set_tmo_log(p->fda.fd[FD_ANNOUNCE_TIMER],
p->announceReceiptTimeout, p->logAnnounceInterval);
But the old code disables the timer by setting it to zero. That was on
purpose, but admittedly it was a bit opaque.
Post by Richard Cochran
#endif
}
I am not sure yet how to fix this, but I'll try to come up with
something this evening.
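
The difference between the two calls can be sketched with plain arithmetic. The helpers below are hypothetical simplifications, not the linuxptp functions: they only compute the timeout in nanoseconds for non-negative log intervals, assuming the result is later handed to a timerfd, where a zero value disarms the timer.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define NS_PER_SEC 1000000000ULL

/* Old style (sketch): scale * 2^log_seconds seconds, in nanoseconds. */
static uint64_t demo_tmo_log_ns(unsigned int scale, int log_seconds)
{
	assert(log_seconds >= 0 && log_seconds < 16);
	return ((uint64_t)scale * NS_PER_SEC) << log_seconds;
}

/*
 * New style (sketch): (min + r * span) * 2^log_seconds seconds,
 * with r drawn from (0, 1] in steps of 1/32768.
 */
static uint64_t demo_tmo_random_ns(unsigned int min, unsigned int span,
				   int log_seconds)
{
	uint64_t min_ns, span_ns, frac;

	assert(log_seconds >= 0 && log_seconds < 16);
	min_ns = ((uint64_t)min * NS_PER_SEC) << log_seconds;
	span_ns = ((uint64_t)span * NS_PER_SEC) << log_seconds;
	frac = (uint64_t)(rand() % (1 << 15)) + 1; /* 1..32768 */

	/* Divide first so the product cannot overflow 64 bits. */
	return min_ns + (span_ns >> 15) * frac;
}
```

With a scale of zero, demo_tmo_log_ns() always yields zero (timer off), while demo_tmo_random_ns(0, 1, ...) always yields a positive value (timer running), which matches the behavior described above for the UDS port.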

Thanks,
Richard
Miroslav Lichvar
2014-01-08 10:03:52 UTC
Post by Richard Cochran
Post by Richard Cochran
#if 1
return set_tmo_random(p->fda.fd[FD_ANNOUNCE_TIMER],
p->announceReceiptTimeout, 1, p->logAnnounceInterval);
The new code sets a timer going...
Post by Richard Cochran
#else
return set_tmo_log(p->fda.fd[FD_ANNOUNCE_TIMER],
p->announceReceiptTimeout, p->logAnnounceInterval);
But the old code disables the timer by setting it to zero. That was on
purpose, but admittedly it was a bit opaque.
I see. With set_tmo_random() the timer is disabled only when both min
and span are zero, so this special case will need to be handled.
Post by Richard Cochran
I am not sure yet how to fix this, but I'll try to come up with
something this evening.
Thanks.
--
Miroslav Lichvar
Richard Cochran
2014-01-06 15:40:24 UTC
Post by Rohrer Hansjoerg
ptp4l actual version: 1.3-00056-g6205c1d
With the former ptp4l version there was no problem
ptp4l former version: 1.3-00047-g847e49c
I don't have either of these SHA1 IDs in my tree. You must be carrying
some of your own patches?

The latest master is: v1.3-45-ge63a6ea

Please try this version first and see if it works.
(It works fine for me.)

Thanks,
Richard