David Mirabito
2017-04-27 05:41:03 UTC
Hello All,
I think I've found the reason why ptp4l times out on TX timestamps, despite
the packet being sent.
Please note, I'm not (yet) looking for specific comments on the fix/patch -
I'll take that up on a the appropriate Intel/IGB list but for now am
curious for feedback on how the overall semantics of tx timestamps are
supposed to work and whether my conclusions as to the actual problem are
sane.
* Software: Linux PTP Version 1.8
* Hardware:
- Intel Corporation Ethernet Connection I354 (PCI ID: 8086:1f41)
- IGB version 5.3.0-k
I'll spare the details (unless anyone really wants, I am happy to share)
but in short, prompted by 'timed out while polling for tx timestamp'
messages:
* a test program that would reliably loose around 3-10 timestamps out of
1000 when doing {send(); poll(POLLPRI); recvmsg(MSG_ERRQUEUE);} in a
reasonably tight loop.
* in all cases the packet did hit the wire, just the timestamp never
came back.
* setting larger timeouts didn't help, I stopped bumping it at 5
seconds, by which time I could easily see the packet arrive via tcpdump on
the attached machine
* What may be a race between the IGB driver and calling application:
1. The ndo_start_xmit callback grabs the single TX-Timestamp (aka
tx-ts) resource with a test_and_set_bit_lock()
2. Stashes the skb and sets the "needs timestamp" descriptor flag. The
continues on normal tx path to submit to the HW ring.
3. When the TXTS interrupt comes in, extracts the timestamp from HW and
populates a skb_shared_hwtstamps structure.
4. Notify userspace via skb_tstamp_tx() with the stashed skb
5. Frees the stashed skb
6. Finally, release the HW resource with clear_bit_unlock()
What appears to happen, is that #step 4 wakes the application, which grabs
the timestamps and immediately sends another packet. If the second packet
hits ndo_start_xmit before the previous packet's step #6 completes, the
test_and_set can fail, despite actually being free.
As written, when the test-and-set fails, IGB ignores the timestamp request
and our poor application is now waiting for a timestamp that will never
arrive.
* This seems to affect any all chips supported by this driver
* There does exist a 'tx_hwtstamp_timeouts' counter, but this only
increments if the polling goes on too long. No IRQ --> no polling --> no
timeout count.
* "Fixing" (if this is indeed a bug) was reasonably straight forward - more
or less reordering steps 4,5,6 so that we wake the app only *after* we've
unlocked the bit.
On the surface, this appeared to work, and I could then spam upwards of 10M
packets as quickly as the poll returns, without a single timestamp being
"lost".
So far so good, and I'll let this setup soak on a few boxes for a while
whilst I get the patch done properly.
But I do have a few questions from a higher perspective than one driver's
implementation:
Q0: Did I make any immediately bad assumptions in my quest to try stop
these tx_timeouts?
Q1: Is there any way for drivers to pass up a 'non-timestamp' to indicate
to applications that it's never going to come?
I know there are cases where the packet may be dropped beyond driver's
control so the timestamp won't arrive, but does it make sense for drivers
to indicate to applications that it knows the timestamp will never come,
particularly if the packet was sent?
Q2: Could it be within ptp4l's capabilities to detect such a 'no-timestamp
possible' message on the errqueue and do something not quite so drastic as
a full reset, especially if it's a transient -EAGAIN type response?
This is not there today, but would it be sensible/allowable to try again a
few times, with different sequence numbers, etc? Even if not the PTP
protocol should survive a the occasional missing packets, without a full
reset, just maybe the delay value gets a little out of date or whatnot, no?
Q3: This all assumes "well behaved" apps that send one packet, receiving
one timestamp before attempting to send another timestamped packet. Is this
mandated, or could an app reasonably expect to send a few packets in a row?
Sending a second packet will compete against the driver retrieving the
timestamp of the first, with no feedback to the app whether it won or not
and whether a timestamp may be expected. Does the API allow for more fancy
HW with deeper tx-timestamp queues to be fully utilised?
Q4: Is there some conceptual difference between "Packet was dropped
therefore no timestamp" and "Packet [maybe?] sent; wasn't able to get a TX
timestamp for it"?
I have an inkling it might matter for things like whether it's safe (or at
least; ideal) to try again, but thats just a gut feeling at the moment.
Thanks,
Dave
PS: I've also eyeballed igb-5.3.5.4 from intel sourceforge and it looks to
behave the same as 5.3.0 in this regard.
I think I've found the reason why ptp4l times out on TX timestamps, despite
the packet being sent.
Please note, I'm not (yet) looking for specific comments on the fix/patch -
I'll take that up on a the appropriate Intel/IGB list but for now am
curious for feedback on how the overall semantics of tx timestamps are
supposed to work and whether my conclusions as to the actual problem are
sane.
* Software: Linux PTP Version 1.8
* Hardware:
- Intel Corporation Ethernet Connection I354 (PCI ID: 8086:1f41)
- IGB version 5.3.0-k
I'll spare the details (unless anyone really wants, I am happy to share)
but in short, prompted by 'timed out while polling for tx timestamp'
messages:
* a test program that would reliably loose around 3-10 timestamps out of
1000 when doing {send(); poll(POLLPRI); recvmsg(MSG_ERRQUEUE);} in a
reasonably tight loop.
* in all cases the packet did hit the wire, just the timestamp never
came back.
* setting larger timeouts didn't help, I stopped bumping it at 5
seconds, by which time I could easily see the packet arrive via tcpdump on
the attached machine
* What may be a race between the IGB driver and calling application:
1. The ndo_start_xmit callback grabs the single TX-Timestamp (aka
tx-ts) resource with a test_and_set_bit_lock()
2. Stashes the skb and sets the "needs timestamp" descriptor flag. The
continues on normal tx path to submit to the HW ring.
3. When the TXTS interrupt comes in, extracts the timestamp from HW and
populates a skb_shared_hwtstamps structure.
4. Notify userspace via skb_tstamp_tx() with the stashed skb
5. Frees the stashed skb
6. Finally, release the HW resource with clear_bit_unlock()
What appears to happen, is that #step 4 wakes the application, which grabs
the timestamps and immediately sends another packet. If the second packet
hits ndo_start_xmit before the previous packet's step #6 completes, the
test_and_set can fail, despite actually being free.
As written, when the test-and-set fails, IGB ignores the timestamp request
and our poor application is now waiting for a timestamp that will never
arrive.
* This seems to affect any all chips supported by this driver
* There does exist a 'tx_hwtstamp_timeouts' counter, but this only
increments if the polling goes on too long. No IRQ --> no polling --> no
timeout count.
* "Fixing" (if this is indeed a bug) was reasonably straight forward - more
or less reordering steps 4,5,6 so that we wake the app only *after* we've
unlocked the bit.
On the surface, this appeared to work, and I could then spam upwards of 10M
packets as quickly as the poll returns, without a single timestamp being
"lost".
So far so good, and I'll let this setup soak on a few boxes for a while
whilst I get the patch done properly.
But I do have a few questions from a higher perspective than one driver's
implementation:
Q0: Did I make any immediately bad assumptions in my quest to try stop
these tx_timeouts?
Q1: Is there any way for drivers to pass up a 'non-timestamp' to indicate
to applications that it's never going to come?
I know there are cases where the packet may be dropped beyond driver's
control so the timestamp won't arrive, but does it make sense for drivers
to indicate to applications that it knows the timestamp will never come,
particularly if the packet was sent?
Q2: Could it be within ptp4l's capabilities to detect such a 'no-timestamp
possible' message on the errqueue and do something not quite so drastic as
a full reset, especially if it's a transient -EAGAIN type response?
This is not there today, but would it be sensible/allowable to try again a
few times, with different sequence numbers, etc? Even if not the PTP
protocol should survive a the occasional missing packets, without a full
reset, just maybe the delay value gets a little out of date or whatnot, no?
Q3: This all assumes "well behaved" apps that send one packet, receiving
one timestamp before attempting to send another timestamped packet. Is this
mandated, or could an app reasonably expect to send a few packets in a row?
Sending a second packet will compete against the driver retrieving the
timestamp of the first, with no feedback to the app whether it won or not
and whether a timestamp may be expected. Does the API allow for more fancy
HW with deeper tx-timestamp queues to be fully utilised?
Q4: Is there some conceptual difference between "Packet was dropped
therefore no timestamp" and "Packet [maybe?] sent; wasn't able to get a TX
timestamp for it"?
I have an inkling it might matter for things like whether it's safe (or at
least; ideal) to try again, but thats just a gut feeling at the moment.
Thanks,
Dave
PS: I've also eyeballed igb-5.3.5.4 from intel sourceforge and it looks to
behave the same as 5.3.0 in this regard.