[Linuxptp-devel] Workaround for 'timed out while polling for tx timestamp' on IGB

Discussion:

David Mirabito

2017-04-27 05:41:03 UTC

Hello All,

I think I've found the reason why ptp4l times out on TX timestamps, despite
the packet being sent.

Please note, I'm not (yet) looking for specific comments on the fix/patch -
I'll take that up on a the appropriate Intel/IGB list but for now am
curious for feedback on how the overall semantics of tx timestamps are
supposed to work and whether my conclusions as to the actual problem are
sane.

* Software: Linux PTP Version 1.8
* Hardware:
- Intel Corporation Ethernet Connection I354 (PCI ID: 8086:1f41)
- IGB version 5.3.0-k

I'll spare the details (unless anyone really wants, I am happy to share)
but in short, prompted by 'timed out while polling for tx timestamp'
messages:

* a test program that would reliably loose around 3-10 timestamps out of
1000 when doing {send(); poll(POLLPRI); recvmsg(MSG_ERRQUEUE);} in a
reasonably tight loop.
* in all cases the packet did hit the wire, just the timestamp never
came back.
* setting larger timeouts didn't help, I stopped bumping it at 5
seconds, by which time I could easily see the packet arrive via tcpdump on
the attached machine

* What may be a race between the IGB driver and calling application:
1. The ndo_start_xmit callback grabs the single TX-Timestamp (aka
tx-ts) resource with a test_and_set_bit_lock()
2. Stashes the skb and sets the "needs timestamp" descriptor flag. The
continues on normal tx path to submit to the HW ring.
3. When the TXTS interrupt comes in, extracts the timestamp from HW and
populates a skb_shared_hwtstamps structure.
4. Notify userspace via skb_tstamp_tx() with the stashed skb
5. Frees the stashed skb
6. Finally, release the HW resource with clear_bit_unlock()

What appears to happen, is that #step 4 wakes the application, which grabs
the timestamps and immediately sends another packet. If the second packet
hits ndo_start_xmit before the previous packet's step #6 completes, the
test_and_set can fail, despite actually being free.
As written, when the test-and-set fails, IGB ignores the timestamp request
and our poor application is now waiting for a timestamp that will never
arrive.

* This seems to affect any all chips supported by this driver
* There does exist a 'tx_hwtstamp_timeouts' counter, but this only
increments if the polling goes on too long. No IRQ --> no polling --> no
timeout count.

* "Fixing" (if this is indeed a bug) was reasonably straight forward - more
or less reordering steps 4,5,6 so that we wake the app only *after* we've
unlocked the bit.
On the surface, this appeared to work, and I could then spam upwards of 10M
packets as quickly as the poll returns, without a single timestamp being
"lost".

So far so good, and I'll let this setup soak on a few boxes for a while
whilst I get the patch done properly.

But I do have a few questions from a higher perspective than one driver's
implementation:

Q0: Did I make any immediately bad assumptions in my quest to try stop
these tx_timeouts?

Q1: Is there any way for drivers to pass up a 'non-timestamp' to indicate
to applications that it's never going to come?
I know there are cases where the packet may be dropped beyond driver's
control so the timestamp won't arrive, but does it make sense for drivers
to indicate to applications that it knows the timestamp will never come,
particularly if the packet was sent?

Q2: Could it be within ptp4l's capabilities to detect such a 'no-timestamp
possible' message on the errqueue and do something not quite so drastic as
a full reset, especially if it's a transient -EAGAIN type response?
This is not there today, but would it be sensible/allowable to try again a
few times, with different sequence numbers, etc? Even if not the PTP
protocol should survive a the occasional missing packets, without a full
reset, just maybe the delay value gets a little out of date or whatnot, no?

Q3: This all assumes "well behaved" apps that send one packet, receiving
one timestamp before attempting to send another timestamped packet. Is this
mandated, or could an app reasonably expect to send a few packets in a row?
Sending a second packet will compete against the driver retrieving the
timestamp of the first, with no feedback to the app whether it won or not
and whether a timestamp may be expected. Does the API allow for more fancy
HW with deeper tx-timestamp queues to be fully utilised?

Q4: Is there some conceptual difference between "Packet was dropped
therefore no timestamp" and "Packet [maybe?] sent; wasn't able to get a TX
timestamp for it"?
I have an inkling it might matter for things like whether it's safe (or at
least; ideal) to try again, but thats just a gut feeling at the moment.

Thanks,
Dave

PS: I've also eyeballed igb-5.3.5.4 from intel sourceforge and it looks to
behave the same as 5.3.0 in this regard.

Richard Cochran

2017-04-27 18:24:59 UTC

Permalink

Post by David Mirabito
* "Fixing" (if this is indeed a bug) was reasonably straight forward - more
or less reordering steps 4,5,6 so that we wake the app only *after* we've
unlocked the bit.

If your analysis was correct, then yes, indeed this is a driver bug.
Please submit a patch on netdev.

Post by David Mirabito
Q0: Did I make any immediately bad assumptions in my quest to try stop
these tx_timeouts?

Hard to tell, but your explanation seed reasonable to me.

Post by David Mirabito
Q1: Is there any way for drivers to pass up a 'non-timestamp' to indicate
to applications that it's never going to come?

No.

Post by David Mirabito
I know there are cases where the packet may be dropped beyond driver's
control so the timestamp won't arrive, but does it make sense for drivers
to indicate to applications that it knows the timestamp will never come,
particularly if the packet was sent?

I can't imagine that a driver would know this.

Post by David Mirabito
Q2: Could it be within ptp4l's capabilities to detect such a 'no-timestamp
possible' message on the errqueue and do something not quite so drastic as
a full reset, especially if it's a transient -EAGAIN type response?

If you miss a Tx time stamp, then something is wrong. Probably the
link is down, but it hard to reliably know the cause. I am skeptical
that this can really be improved in a practical way.

Really, we should fix the drivers, as you have done, or choose
non-broken HW.

[ BTW, if you don't like the long fault interval, just use ASAP. ]

Post by David Mirabito
This is not there today, but would it be sensible/allowable to try again a
few times, with different sequence numbers, etc? Even if not the PTP
protocol should survive a the occasional missing packets, without a full
reset, just maybe the delay value gets a little out of date or whatnot, no?

I could imagine an option allowing the program to ignore a certain
number of missed Tx time stamps before throwing the fault.

Post by David Mirabito
Q3: This all assumes "well behaved" apps that send one packet, receiving
one timestamp before attempting to send another timestamped packet. Is this
mandated, or could an app reasonably expect to send a few packets in a row?

Many current HW designs do not support this.

Post by David Mirabito
Sending a second packet will compete against the driver retrieving the
timestamp of the first, with no feedback to the app whether it won or not
and whether a timestamp may be expected. Does the API allow for more fancy
HW with deeper tx-timestamp queues to be fully utilised?

The API allows fully asynchronous Tx time stamping. In theory, you
could send a packet, remember that it deserves a time stamp, then go
on to other things. Polling on the error queue would allow you to
later match CMSGs with the remembered transmitted packets.

We don't do that way because 1) this complicates the code for dubious
benefit* and 2) that would limit the HW you could use.

* The only benefit I can see would be when sending messages at a very
high rate. So far, I have yet to hear that anyone has run into this
limitation.

Post by David Mirabito
Q4: Is there some conceptual difference between "Packet was dropped
therefore no timestamp" and "Packet [maybe?] sent; wasn't able to get a TX
timestamp for it"?

Well, there is a difference, but the poor application will never know
about it.

Thanks,
Richard

Keller, Jacob E

2017-04-27 22:58:27 UTC

Permalink

-----Original Message-----
Sent: Thursday, April 27, 2017 11:25 AM
Subject: Re: [Linuxptp-devel] Workaround for 'timed out while polling for tx
timestamp' on IGB

Post by David Mirabito
* "Fixing" (if this is indeed a bug) was reasonably straight forward - more
or less reordering steps 4,5,6 so that we wake the app only *after* we've
unlocked the bit.

If your analysis was correct, then yes, indeed this is a driver bug.
Please submit a patch on netdev.

Can I clarify here, David, you're suggesting that you instead clear the bitlock earlier before you call the timestamp function? I think that's probably a good thing and minimizing the time that we hold a lock is good. I suspect that most of the Intel drivers are at fault here and can make some patches for them. Or you can if you wish.

There *is* a fundamental limit that the hardware assumes that only one transmit timestamp request at a time, so we can't actually do any better. But we can unlock as soon as we read the timestamp registers, which should help this race.

Thanks,
Jake

Post by David Mirabito
Q0: Did I make any immediately bad assumptions in my quest to try stop
these tx_timeouts?

Hard to tell, but your explanation seed reasonable to me.

Post by David Mirabito
Q1: Is there any way for drivers to pass up a 'non-timestamp' to indicate
to applications that it's never going to come?

No.

I can't imagine that a driver would know this.

If you miss a Tx time stamp, then something is wrong. Probably the
link is down, but it hard to reliably know the cause. I am skeptical
that this can really be improved in a practical way.
Really, we should fix the drivers, as you have done, or choose
non-broken HW.
[ BTW, if you don't like the long fault interval, just use ASAP. ]

I could imagine an option allowing the program to ignore a certain
number of missed Tx time stamps before throwing the fault.

Many current HW designs do not support this.

The API allows fully asynchronous Tx time stamping. In theory, you
could send a packet, remember that it deserves a time stamp, then go
on to other things. Polling on the error queue would allow you to
later match CMSGs with the remembered transmitted packets.
We don't do that way because 1) this complicates the code for dubious
benefit* and 2) that would limit the HW you could use.
* The only benefit I can see would be when sending messages at a very
high rate. So far, I have yet to hear that anyone has run into this
limitation.

Post by David Mirabito
Q4: Is there some conceptual difference between "Packet was dropped
therefore no timestamp" and "Packet [maybe?] sent; wasn't able to get a TX
timestamp for it"?

Well, there is a difference, but the poor application will never know
about it.
Thanks,
Richard
------------------------------------------------------------------------------
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot
_______________________________________________
Linuxptp-devel mailing list
https://lists.sourceforge.net/lists/listinfo/linuxptp-devel

Keller, Jacob E

2017-04-27 23:58:53 UTC

Permalink

-----Original Message-----
Sent: Thursday, April 27, 2017 3:58 PM
Subject: Re: [Linuxptp-devel] Workaround for 'timed out while polling for tx
timestamp' on IGB

-----Original Message-----
Sent: Thursday, April 27, 2017 11:25 AM
Subject: Re: [Linuxptp-devel] Workaround for 'timed out while polling for tx
timestamp' on IGB

Post by David Mirabito
* "Fixing" (if this is indeed a bug) was reasonably straight forward - more
or less reordering steps 4,5,6 so that we wake the app only *after* we've
unlocked the bit.

If your analysis was correct, then yes, indeed this is a driver bug.
Please submit a patch on netdev.

Can I clarify here, David, you're suggesting that you instead clear the bitlock
earlier before you call the timestamp function? I think that's probably a good
thing and minimizing the time that we hold a lock is good. I suspect that most of
the Intel drivers are at fault here and can make some patches for them. Or you
can if you wish.
There *is* a fundamental limit that the hardware assumes that only one transmit
timestamp request at a time, so we can't actually do any better. But we can
unlock as soon as we read the timestamp registers, which should help this race.
Thanks,
Jake

I sent a patch to Intel Wired LAN, and I Cc'd both of you on it. Feel free to comment. I fixed up e1000e, igb, ixgbe, and i40e. Please comment indicating if you think this fix is correct. I believe it is correct from your original comment.

One change we could make as well, is to increment the timestamp dropped counter for Tx whenever we get a request that we can't fulfill, but I didn't change that in my current patch.

Thanks,
Jake

David Mirabito

2017-04-28 02:32:39 UTC

Permalink

Thanks Jacob, and Richard for your comments.

I've replied re: details of the patch on the Wired LAN. Thanks for
generating + submitting the patch across the related drivers too!
I agree on bumping the counter when we can't grab the lock, more comments
over on the Intel list.

Cheers,
- DavidM

This email is subject to copyright and the information in it is
confidential. This e-mail, its content and any files transmitted with it
are intended solely for the addressee/s and may be legally privileged
and/or confidential. Access by any other person other than the addressee/s
is unauthorized without the express written permission of the sender. If
you have received this e-mail in error notify the sender immediately by
email, do not use the email or any attachment or disclose them to any
person, delete the email from your system and destroy all copies you may
have printed. Metamako LP does not guarantee that any email or attachment
is secure or free from viruses or other defects

Post by David Mirabito

-----Original Message-----
Sent: Thursday, April 27, 2017 3:58 PM
Subject: Re: [Linuxptp-devel] Workaround for 'timed out while polling

for tx

timestamp' on IGB

-----Original Message-----
Sent: Thursday, April 27, 2017 11:25 AM
Subject: Re: [Linuxptp-devel] Workaround for 'timed out while polling

for tx

timestamp' on IGB

Post by David Mirabito
* "Fixing" (if this is indeed a bug) was reasonably straight forward

- more

Post by David Mirabito
or less reordering steps 4,5,6 so that we wake the app only *after*

we've

Post by David Mirabito
unlocked the bit.

If your analysis was correct, then yes, indeed this is a driver bug.
Please submit a patch on netdev.

Can I clarify here, David, you're suggesting that you instead clear the

bitlock

earlier before you call the timestamp function? I think that's probably

a good

thing and minimizing the time that we hold a lock is good. I suspect

that most of

the Intel drivers are at fault here and can make some patches for them.

Or you

can if you wish.
There *is* a fundamental limit that the hardware assumes that only one

transmit

timestamp request at a time, so we can't actually do any better. But we

can

unlock as soon as we read the timestamp registers, which should help

this race.

Thanks,
Jake

I sent a patch to Intel Wired LAN, and I Cc'd both of you on it. Feel free
to comment. I fixed up e1000e, igb, ixgbe, and i40e. Please comment
indicating if you think this fix is correct. I believe it is correct from
your original comment.
One change we could make as well, is to increment the timestamp dropped
counter for Tx whenever we get a request that we can't fulfill, but I
didn't change that in my current patch.
Thanks,
Jake