This updates tcp_acceptable_seq(), one of the oldest functions in the stack
(present since 2.4.0), so that a Linux sender no longer sends a left-shifted
sequence number when the peer's receive window only appears to shrink because
window scaling dropped its least significant bits.
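
As a rough illustration of that precision loss (a user-space sketch, not kernel
code; the function names and the numbers in the usage note are made up for the
example), the peer can only advertise its window in units of 1 << wscale, so
the right edge computed by the sender may move left even though the peer's real
right edge did not:

```c
#include <assert.h>
#include <stdint.h>

/* Window value as it survives the wire: the low `wscale` bits are dropped
 * when the peer shifts right, and come back as zeros when the sender
 * shifts left again. */
static uint32_t advertised_window(uint32_t real_window, unsigned int wscale)
{
	assert(wscale <= 14);	/* RFC 7323 caps the shift count at 14 */
	return (real_window >> wscale) << wscale;
}

/* Right edge of the send window as seen by the sender:
 * acknowledged sequence number plus the advertised window. */
static uint32_t right_edge(uint32_t ack, uint32_t real_window,
			   unsigned int wscale)
{
	return ack + advertised_window(real_window, wscale);
}
```

With wscale = 6, an ACK of 1000 with a real window of 65536 gives a right edge
of 66536. If the peer then buffers 100 more bytes without the application
reading any, the real right edge stays at 66536, but right_edge(1100, 65436, 6)
yields 66508: an apparent retraction of 28 bytes, which is less than the
64-byte scaling granule.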

RFC 7323, page 10 (Section 2.4, "Addressing Window Retraction"), specifies the
sender's responsibility for keeping sequence numbers within the window:
> On the sender side:
> 
>    3)  The initial transmission MUST be within the window announced by
>        the most recent <ACK>.
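
The decision this patch makes on top of that rule can be modelled in user space
roughly as follows. This is only a sketch: acceptable_seq() and seq_before()
are hypothetical stand-ins for tcp_acceptable_seq() and the kernel's before(),
and the tcp_sock fields are passed in as plain parameters (the patch itself
reads the shift count from tp->rx_opt.rcv_wscale).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Wrap-safe "a comes before b" test on 32-bit sequence numbers. */
static bool seq_before(uint32_t a, uint32_t b)
{
	return (int32_t)(a - b) < 0;
}

/* Model of the patched check.
 *   snd_nxt   next sequence number to send
 *   wnd_end   right edge of the window (SND.UNA + SND.WND)
 *   wscale_ok whether window scaling was negotiated
 *   wscale    shift count used for the tolerance
 */
static uint32_t acceptable_seq(uint32_t snd_nxt, uint32_t wnd_end,
			       bool wscale_ok, unsigned int wscale)
{
	assert(wscale <= 14);	/* RFC 7323 caps the shift count at 14 */

	/* Window not shrunk: SND.NXT is inside it. */
	if (!seq_before(wnd_end, snd_nxt))
		return snd_nxt;

	/* Shortfall smaller than one scaling granule: treat it as
	 * rounding loss rather than a real retraction. */
	if (wscale_ok && (snd_nxt - wnd_end) < (1u << wscale))
		return snd_nxt;

	/* Genuine retraction: fall back to the right edge. */
	return wnd_end;
}
```

For example, with a shift count of 6, acceptable_seq(66536, 66508, true, 6)
keeps SND.NXT because the 28-byte shortfall is below 64, while
acceptable_seq(66536, 66400, true, 6) falls back to the right edge.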

Some related discussion can be found at the IETF [tcpm] mailing list:
https://mailarchive.ietf.org/arch/msg/tcpm/pPO7cYxtky27Qto9b30eaHB_RQI

The issue was reproduced, and the patch verified, by using scp to copy a 20GB
file from a Linux box running kernel 4.4.48 to a FreeBSD 11.0 box.

[ I mainly want feedback to see if everyone is OK with the approach. ]

Cc: "David S. Miller" <da...@davemloft.net>
Cc: Alexey Kuznetsov <kuz...@ms2.inr.ac.ru>
Cc: James Morris <jmor...@namei.org>
Cc: Hideaki YOSHIFUJI <yoshf...@linux-ipv6.org>
Cc: Patrick McHardy <ka...@trash.net>
Signed-off-by: Cheng Cui <cheng....@netapp.com>
---
diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index 1d5331a..29d736d 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -85,7 +85,8 @@ static void tcp_event_new_data_sent(struct sock *sk, const struct sk_buff *skb)
                      tcp_skb_pcount(skb));
 }
 
-/* SND.NXT, if window was not shrunk.
+/* SND.NXT, if window was not shrunk, or if it shrank by less than one window
+ * scaling factor due to loss of precision.
  * If window has been shrunk, what should we make? It is not clear at all.
  * Using SND.UNA we will fail to open window, SND.NXT is out of window. :-(
  * Anything in between SND.UNA...SND.UNA+SND.WND also can be already
@@ -95,7 +96,8 @@ static inline __u32 tcp_acceptable_seq(const struct sock *sk)
 {
        const struct tcp_sock *tp = tcp_sk(sk);
 
-       if (!before(tcp_wnd_end(tp), tp->snd_nxt))
+       if ((!before(tcp_wnd_end(tp), tp->snd_nxt)) || (tp->rx_opt.wscale_ok &&
+           ((tp->snd_nxt - tcp_wnd_end(tp)) < (1 << tp->rx_opt.rcv_wscale))))
                return tp->snd_nxt;
        else
                return tcp_wnd_end(tp);
-- 
2.9.3 (Apple Git-75)


