Re: TCP fast retransmit issues

Neal Cardwell Wed, 26 Jul 2017 10:07:36 -0700

On Wed, Jul 26, 2017 at 12:43 PM, Neal Cardwell <ncardw...@google.com> wrote:
> (1) Because the connection negotiated SACK, the Linux TCP sender does
> not get to its tcp_add_reno_sack() code to count dupacks and enter
> fast recovery on the 3rd dupack. The sender keeps waiting for specific
> packets to be SACKed that would signal that something has probably
> been lost. We could probably mitigate this by having the sender turn
> off SACK once it sees SACKed ranges that are completely invalid (way
> out of window). Then it should use the old non-SACK "Recovery on 3rd
> dupack" path.
>
> (2) It looks like there is a bug in the sender code where it seems to
> be repeatedly using a TLP timer firing 211ms after every ACK is
> received to transmit another TLP probe (a new packet in this case).
> Somehow these weird invalid SACKs seem to be triggering a code path
> that makes us think we can send another TLP, when we probably should
> be firing an RTO. That's my interpretation, anyway. I will try to
> reproduce this with packetdrill.


Hmm. It looks like this might be a general issue, where any time we
get an ACK that doesn't ACK/SACK anything new (whether because it's
incoming data in a bi-directional flow, or a middlebox breaking the
SACKs), then we schedule a TLP timer further out in time. Probably we
should only push the TLP timer out if something is cumulatively ACKed.

But that's not a trivial thing to do, because by the time we are
deciding whether to schedule another TLP, we have already canceled the
previous TLP and reinstalled an RTO. Hmm.

neal

Re: TCP fast retransmit issues

Reply via email to