[Linuxptp-devel] 82576 and PTP: poll tx timestamp timeout
Gert-Jan Roskam
2013-08-13 19:11:38 UTC
Since I think this is an issue with the 82576 chip, I am sending this to
both the ptp and e1000 lists.

A few months ago I had trouble with the INTEL 82576 and PTP. I got
errors like: recvmsg tx timestamp failed: Resource temporarily unavailable.
It was suggested to switch to a newer kernel, which I did.
I went to kernel 3.7.10 and ptp 1.1 (which were current at that time).
PTP worked great for port 0.

The 82576 has 2 ports. In our design the first port is connected to an
RJ45, the second to an SFP.
I never had ptp working with the SFP port. It constantly gives the
"recvmsg tx timestamp failed" errors.


At this moment I switched to ptp 1.3 and used the 4.3.0 igb driver from
sourceforge.
The SFP is still not working. The error message is:
poll tx timestamp timeout
port 1: send delay request failed

The RJ45 port is working, but every now and then the same error occurs.

On the errata sheet of the 82576 I found the following:

37. TimeSync: Missing Tx timestamps in SerDes mode
Problem: When transmitting a TimeSync packet in SerDes mode, there is a
probability that the timestamp will not be sampled in the Tx Timestamp
Value registers and thus TSYNCTXCTL.TXTT will not be set. There is no
issue when using 10/100/1000 BASE-T (Copper) mode.
Implication: Missing timestamps make it difficult for the software to
effectively implement the TimeSync functionality.



My questions are:

- Could this item on the errata sheet be the reason our SFP port is not
working with PTP?

- What could be causing the sporadic errors on the RJ45 port with
linuxptp 1.3 (linuxptp 1.1 works well)?

- Are there people who use ptp with the 82576 chip without problems?

Many thanks,
Gert-Jan Roskam
Vick, Matthew
2013-08-13 21:36:25 UTC
Post by Gert-Jan Roskam
Since I think this is an issue with the 82576 chip, I am sending this to
both the ptp and e1000 lists.
A few months ago I had trouble with the INTEL 82576 and PTP. I got
errors like: recvmsg tx timestamp failed: Resource temporarily
unavailable.
It was suggested to switch to a newer kernel, which I did.
I went to kernel 3.7.10 and ptp 1.1 (which were current at that time).
PTP worked great for port 0.
The 82576 has 2 ports. In our design the first port is connected to an
RJ45, the second to an SFP.
I never had ptp working with the SFP port. It constantly gives the
"recvmsg tx timestamp failed" errors.
At this moment I switched to ptp 1.3 and used the 4.3.0 igb driver from
sourceforge.
poll tx timestamp timeout
port 1: send delay request failed
The RJ45 port is working, but every now and then the same error occurs.
37. TimeSync: Missing Tx timestamps in SerDes mode
Problem: When transmitting a TimeSync packet in SerDes mode, there is a
probability that the timestamp will not be sampled in the Tx Timestamp
Value registers and thus TSYNCTXCTL.TXTT will not be set. There is no
issue when using 10/100/1000 BASE-T (Copper) mode.
Implication: Missing timestamps make it difficult for the software to
effectively implement the TimeSync functionality.
- Could this item on the errata sheet be the reason our SFP port is not
working with PTP?
- What could be causing the sporadic errors on the RJ45 port with
linuxptp 1.3 (linuxptp 1.1 works well)?
- Are there people who use ptp with the 82576 chip without problems?
Many thanks,
Gert-Jan Roskam
Gert-Jan,

Yes, that erratum would explain why the SFP port does not
work. For that port, the 82576 will be in SerDes mode, meaning you will
occasionally be dropping Tx timestamps.

For the occasional "hiccup" on the RJ45 side, if the 82576 tries to
timestamp two outbound packets too quickly, it will fail to return the
second one (since the hardware is busy timestamping the first one).

What options are you running with ptp4l? -P should help reduce the number
of failed timestamps on the RJ45 port.

Cheers,
Matthew

Matthew Vick
Linux Development
Networking Division
Intel Corporation
Richard Cochran
2013-08-14 07:31:22 UTC
Post by Vick, Matthew
For the occasional "hiccup" on the RJ45 side, if the 82576 tries to
timestamp two outbound packets too quickly, it will fail to return the
second one (since the hardware is busy timestamping the first one).
What options are you running with ptp4l? -P should help reduce the number
of failed timestamps on the RJ45 port.
I don't think changing ptp4l's options makes any difference. That
program is single threaded, and it always waits for the time stamp
after sending an event message. This behavior is quite on purpose,
knowing that some of the hardware out there can only handle one at a
time.
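
The one-at-a-time limitation described above can be pictured with a toy model. This is only a sketch under assumptions, not the 82576's actual register logic or ptp4l code: the class and function names are hypothetical, and the single "valid" flag is loosely inspired by the TSYNCTXCTL.TXTT bit named in the erratum. It shows why sending an event message and then waiting for its time stamp avoids losing the second one.

```python
from typing import List, Optional


class OneSlotTxTimestamper:
    """Toy model (hypothetical): one Tx timestamp register plus a 'valid' flag."""

    def __init__(self) -> None:
        self.valid = False               # set while an unread timestamp is latched
        self.value: Optional[int] = None
        self.clock = 0                   # fake clock, advances once per packet

    def transmit(self) -> None:
        """Send one event packet; latch a timestamp only if the slot is free."""
        self.clock += 1
        if not self.valid:
            self.value = self.clock
            self.valid = True
        # Otherwise the slot is still occupied and this timestamp is lost.

    def read_timestamp(self) -> Optional[int]:
        """Return the latched timestamp and free the slot, or None if empty."""
        if not self.valid:
            return None
        self.valid = False
        return self.value


def send_back_to_back(hw: OneSlotTxTimestamper, n: int) -> List[Optional[int]]:
    """Send n packets first, then try to read n timestamps."""
    for _ in range(n):
        hw.transmit()
    return [hw.read_timestamp() for _ in range(n)]


def send_and_wait(hw: OneSlotTxTimestamper, n: int) -> List[Optional[int]]:
    """Send one packet, read its timestamp, and only then send the next."""
    stamps = []
    for _ in range(n):
        hw.transmit()
        stamps.append(hw.read_timestamp())
    return stamps
```

In this model, sending two packets back to back yields a time stamp for the first only, while the send-then-wait pattern yields one for each.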

On the linuxptp-users list, Alexander also recently reported frequent
time stamp failures on peer delay request messages. It is beginning to
look like the 82576 has a bug in this regard.

Thoughts?

Thanks,
Richard
Gert-Jan Roskam
2013-08-14 15:22:22 UTC
Post by Richard Cochran
On the linuxptp-users list, Alexander also recently reported frequent
time stamp failures on peer delay request messages. It is beginning to
look like the 82576 has a bug in this regard.
Thoughts?
If it is a bug in the 82576, it is triggered by the 'poll' method used in linuxptp 1.3; with the older version 1.1 it runs for days without errors.

Gert-Jan.
Richard Cochran
2013-08-14 17:00:16 UTC
Post by Gert-Jan Roskam
If it is a bug in the 82576, it is triggered by the 'poll' method used
in linuxptp 1.3; with the older version 1.1 it runs for days without
errors.
Okay, if that is true, then increasing tx_timestamp_timeout (to 100 or
1000 for example) should allow 1.3 to work just as well.

The only difference using poll is that the timeout is expressed in
time (milliseconds) instead of "retries" as before.

Thanks,
Richard
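
A rough illustration of why a timeout expressed in milliseconds can be too short. This is a minimal Python sketch, not linuxptp code: a pipe stands in for the socket error queue, a timer stands in for the hardware delivering the time stamp, and the names wait_for_timestamp and simulate as well as the 50 ms latency are made up for the example.

```python
import os
import select
import threading


def wait_for_timestamp(fd: int, timeout_ms: int) -> bool:
    """Return True if fd becomes readable within timeout_ms milliseconds."""
    poller = select.poll()
    poller.register(fd, select.POLLIN)
    # poll() returns an empty list when the timeout expires with no event.
    return bool(poller.poll(timeout_ms))


def simulate(latency_s: float, timeout_ms: int) -> bool:
    """Hypothetical stand-in: the 'timestamp' shows up after latency_s seconds."""
    read_fd, write_fd = os.pipe()
    timer = threading.Timer(latency_s, os.write, args=(write_fd, b"ts"))
    timer.start()
    try:
        return wait_for_timestamp(read_fd, timeout_ms)
    finally:
        timer.join()          # let the delayed write finish before closing
        os.close(read_fd)
        os.close(write_fd)
```

With an assumed 50 ms delivery latency, simulate(0.05, 1) gives up before the data arrives, while simulate(0.05, 1000) succeeds. The same event is simply waited for longer.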
Gert-Jan Roskam
2013-08-14 19:35:09 UTC
Post by Richard Cochran
Okay, if that is true, then increasing tx_timestamp_timeout (to 100 or
1000 for example) should allow 1.3 to work just as well.
The only difference using poll is that the timeout is expressed in
time (milliseconds) instead of "retries" as before.
I changed tx_timestamp_timeout to 100; the default was 1.
This seems to work: I have seen no problems for the last 2 hours.
So now I can use the latest ptp code with the RJ45 port.
Thanks for helping me out.
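
For reference, the change described above would look something like this in a ptp4l configuration file. This is a minimal fragment, assuming the option sits in the [global] section and the file is passed to ptp4l with -f; the rest of the configuration used here is not given in this thread.

```
[global]
tx_timestamp_timeout 100
```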

BTW, I have patches for the igb driver that let the 82576 chip generate a 1 PPS signal. Is there interest in having this code submitted?

Gert-Jan.
Keller, Jacob E
2013-08-14 22:11:47 UTC
-----Original Message-----
Sent: Wednesday, August 14, 2013 12:35 PM
To: Richard Cochran; Vick, Matthew
Subject: Re: [Linuxptp-devel] [E1000-devel] 82576 and PTP: poll tx
timestamp timeout
Post by Richard Cochran
Okay, if that is true, then increasing tx_timestamp_timeout (to 100 or
1000 for example) should allow 1.3 to work just as well.
The only difference using poll is that the timeout is expressed in
time (milliseconds) instead of "retries" as before.
I changed the tx_timestamp_timeout to 100, the default was 1.
This seems to work, I saw no problems for the last 2 hours.
So now I can use the latest ptp code with the RJ45 port.
Thanks for helping me out.
BTW. I have patches for the igb driver to let the 82576 chip generate a
1pps signal. Is there interest to submit this code?
Gert-Jan.
I don't think anyone here at Intel would mind. We never did it because there wasn't much demand (since most of the physical cards don't have the SDP pins exposed to enable routing the PPS anywhere).

- Jake
Richard Cochran
2013-08-15 06:28:28 UTC
Post by Gert-Jan Roskam
BTW. I have patches for the igb driver to let the 82576 chip
generate a 1pps signal. Is there interest to submit this code?
Yes please do submit it.

If your work is based on the Intel driver tgz, then post it to
e1000-devel.

If your work is based on the vanilla Linux driver, please rebase it
onto the net-next branch and post it to the netdev list.

Thanks,
Richard

Keller, Jacob E
2013-08-14 22:05:30 UTC
-----Original Message-----
Sent: Wednesday, August 14, 2013 8:22 AM
Subject: Re: [Linuxptp-devel] [E1000-devel] 82576 and PTP: poll tx
timestamp timeout
Post by Richard Cochran
On the linuxptp-users list, Alexander also recently reported frequent
time stamp failures on peer delay request messages. It is beginning to
look like the 82576 has a bug in this regard.
Thoughts?
If it is a bug in the 82576, it is triggered by the 'poll' method used in
linuxptp 1.3; with the older version 1.1 it runs for days without errors.
Gert-Jan.
Did you ever change the tx timestamp timeout value? It may be the case that you had increased that value before. (If you had not, I would be very surprised...) You may try increasing the poll timeout to 5 or so and see if that helps the issue.

It is likely that you are simply not waiting long enough. That is a known issue with some Intel cards regarding how the Tx timestamp is returned and there is nothing that the driver can do to decrease the time required to get the timestamp. :(

- Jake
Vick, Matthew
2013-08-14 15:23:06 UTC
Post by Richard Cochran
Post by Vick, Matthew
For the occasional "hiccup" on the RJ45 side, if the 82576 tries to
timestamp two outbound packets too quickly, it will fail to return the
second one (since the hardware is busy timestamping the first one).
What options are you running with ptp4l? -P should help reduce the number
of failed timestamps on the RJ45 port.
I don't think changing ptp4l's options makes any difference. That
program is single threaded, and it always waits for the time stamp
after sending an event message. This behavior is quite on purpose,
knowing that some of the hardware out there can only handle one at a
time.
On the linuxptp-users list, Alexander also recently reported frequent
time stamp failures on peer delay request messages. It is beginning to
look like the 82576 has a bug in this regard.
Thoughts?
Thanks,
Richard
I believe changing options does make a difference for the 82576. At least,
I had issues with E2E in the past--I would need to re-test to confirm if
there is still an issue or not. I remember getting Rx packets to be
timestamped too close together, which the 82576 cannot support, but I did
grab the tip of the ptp4l tree and not a stable release at the time.

Adding Jake to the thread, since I'm fairly certain that Alexander was
using a modified driver or stack to do something non-standard, so it
wasn't a bug with the 82576. Jake, do you remember what the root cause was
on that one?

Cheers,
Matthew
Richard Cochran
2013-08-14 17:19:35 UTC
Post by Vick, Matthew
I believe changing options does make a difference for the 82576. At least,
I had issues with E2E in the past--I would need to re-test to confirm if
there is still an issue or not. I remember getting Rx packets to be
timestamped too close together, which the 82576 cannot support, but I did
grab the tip of the ptp4l tree and not a stable release at the time.
Wait a minute, "Rx packets"?

I thought the issues were with Tx packets (and the driver does try to
correctly work around this).

The issues reported here and in Alexander's thread are about missing
transmit time stamps. That much is clear from the logs. If incoming
packets can spoil transmit time stamps, then all bets are off, and
the card is just not usable.
Post by Vick, Matthew
Adding Jake to the thread, since I'm fairly certain that Alexander was
using a modified driver or stack to do something non-standard, so it
wasn't a bug with the 82576. Jake, do you remember what the root cause was
on that one?
This was never resolved, and Alex stopped responding to questions. But
he appears to me to have had the same symptoms.

Thanks,
Richard
Vick, Matthew
2013-08-14 17:46:35 UTC
Post by Richard Cochran
Post by Vick, Matthew
I believe changing options does make a difference for the 82576. At least,
I had issues with E2E in the past--I would need to re-test to confirm if
there is still an issue or not. I remember getting Rx packets to be
timestamped too close together, which the 82576 cannot support, but I did
grab the tip of the ptp4l tree and not a stable release at the time.
Wait a minute, "Rx packets"?
I thought the issues were with Tx packets (and the driver does try to
correctly work around this).
The issues reported here and in Alexander's thread are about missing
transmit time stamps. That much is clear from the logs. If incoming
packets can spoil transmit time stamps, then all bets are off, and
the card is just not usable.
Agreed, but the hardware can definitely timestamp Tx and Rx packets at the
same time--I just mentioned using P2P for a general improvement on 82576. In
this case, it does sound like something about the polling is likely
terminating too early (as per the other thread going on).
Post by Richard Cochran
Post by Vick, Matthew
Adding Jake to the thread, since I'm fairly certain that Alexander was
using a modified driver or stack to do something non-standard, so it
wasn't a bug with the 82576. Jake, do you remember what the root cause was
on that one?
This was never resolved, and Alex stopped responding to questions. But
he appears to me to have had the same symptoms.
Thanks,
Richard
I believe it was resolved off-list, but I look to Jake to confirm that one.

Cheers,
Matthew
Keller, Jacob E
2013-08-14 22:08:24 UTC
-----Original Message-----
From: Vick, Matthew
Sent: Wednesday, August 14, 2013 10:47 AM
To: Richard Cochran
Subject: Re: [E1000-devel] 82576 and PTP: poll tx timestamp timeout
Post by Richard Cochran
On 8/14/13 12:31 AM, "Richard Cochran"
I believe changing options does make a difference for the 82576. At least,
I had issues with E2E in the past--I would need to re-test to confirm if
there is still an issue or not. I remember getting Rx packets to be
timestamped too close together, which the 82576 cannot support, but I did
grab the tip of the ptp4l tree and not a stable release at the time.
Wait a minute, "Rx packets"?
I thought the issues were with Tx packets (and the driver does try to
correctly work around this).
The issues reported here and in Alexander's thread are about missing
transmit time stamps. That much is clear from the logs. If incoming
packets can spoil transmit time stamps, then all bets are off, and
the card is just not usable.
Agreed, but the hardware can definitely timestamp Tx and Rx packets at the
same time--I just mentioned using P2P for a general improvement on 82576. In
this case, it does sound like something about the polling is likely
terminating too early (as per the other thread going on).
Post by Richard Cochran
Adding Jake to the thread, since I'm fairly certain that Alexander was
using a modified driver or stack to do something non-standard, so it
wasn't a bug with the 82576. Jake, do you remember what the root cause was
on that one?
This was never resolved, and Alex stopped responding to questions. But
he appears to me to have had the same symptoms.
Thanks,
Richard
I believe it was resolved off-list, but I look to Jake to confirm that one.
Cheers,
Matthew
I never got a resolution to that. I thought Alex was having an issue that he wasn't actually having, because he claimed to be using ptp4l 1.3 when he was using ptp4l 1.1 (which didn't use the poll method).

I believe the issue you are seeing on the RJ45 port is simply a matter of not polling long enough. Did you ever change the value of the tx timestamp timeout configuration option in ptp4l 1.1? If so, could you try increasing the tx_timestamp_timeout option to 5 or so?

Thanks

- Jake