Hello,

On Mon, May 25, 2020 at 3:19 PM Florian Westphal <f...@strlen.de> wrote:
>
> From: Paolo Abeni <pab...@redhat.com>
>
> Place receive window tuning in the recvmsg path.
> This makes sure the size is only increased when userspace consumes data.
>
> Previously we would grow the sk receive buffer towards tcp_rmem[2], now
> we do so only if userspace reads data.
>
> Simply adjust the msk rcvbuf size to the largest receive buffer of any
> of the existing subflows.
>
> Signed-off-by: Paolo Abeni <pab...@redhat.com>
> Signed-off-by: Florian Westphal <f...@strlen.de>
> ---
> This patch is new in v2.
>
>  net/mptcp/protocol.c | 32 ++++++++++++++++++++++++--------
>  1 file changed, 24 insertions(+), 8 deletions(-)
>
> diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
> index dbb86cbb9e77..89a35c3fc499 100644
> --- a/net/mptcp/protocol.c
> +++ b/net/mptcp/protocol.c
> @@ -190,13 +190,6 @@ static bool __mptcp_move_skbs_from_subflow(struct mptcp_sock *msk,
>                 return false;
>         }
>
> -       if (!(sk->sk_userlocks & SOCK_RCVBUF_LOCK)) {
> -               int rcvbuf = max(ssk->sk_rcvbuf, sk->sk_rcvbuf);
> -
> -               if (rcvbuf > sk->sk_rcvbuf)
> -                       sk->sk_rcvbuf = rcvbuf;
> -       }
> -
>         tp = tcp_sk(ssk);
>         do {
>                 u32 map_remaining, offset;
> @@ -933,6 +926,25 @@ static bool __mptcp_move_skbs(struct mptcp_sock *msk)
>         return moved > 0;
>  }
>
> +static void mptcp_rcv_space_adjust(struct mptcp_sock *msk)
> +{
> +       const struct mptcp_subflow_context *subflow;
> +       struct sock *sk = (struct sock *)msk;
> +       const struct sock *ssk;
> +       int rcvbuf = 0;
> +
> +       if (sk->sk_userlocks & SOCK_RCVBUF_LOCK)
> +               return;
> +
> +       mptcp_for_each_subflow(msk, subflow) {
> +               ssk = mptcp_subflow_tcp_sock(subflow);
> +               rcvbuf = max(ssk->sk_rcvbuf, rcvbuf);
tcp_rcv_space_adjust() is called even when the app is not yet reading, so
wouldn't this mean that we still end up with an ever-growing window?

E.g., imagine an app that does not read at all at the beginning. The call to
tcp_rcv_space_adjust() in patch 1/2 will make the subflow's window grow.
Now the app comes along and reads one byte; the window at the MPTCP level
will then jump, regardless of how much the app actually read.

I think what is needed is to size the MPTCP-level window to 2x the amount
of data read by the application within an RTT (the maximum RTT among all
the active subflows). That way, if an app reads 1 byte a second the window
will remain low, while for a bulk transfer it will allow all subflows to
receive at full speed [1]. I sketched the idea below my signature.

Or do you think that kind of tuning can be done in a follow-up patch?


Christoph

[1]: https://www.usenix.org/system/files/conference/nsdi12/nsdi12-final125.pdf
     -> Section 4.2
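
P.S.: For concreteness, here is roughly what I have in mind, modeled on
TCP's tcp_rcv_space_adjust(). This is completely untested; the
msk->rcvq_space fields do not exist yet (they would mirror TCP's
tp->rcvq_space), and the call site would be mptcp_recvmsg() with @copied
set to the number of bytes just handed to userspace:

/* Hypothetical: msk->rcvq_space.{copied,space,time} would need to be
 * added to struct mptcp_sock, mirroring tp->rcvq_space in TCP.
 */
static void mptcp_rcv_space_adjust(struct mptcp_sock *msk, int copied)
{
	const struct mptcp_subflow_context *subflow;
	struct sock *sk = (struct sock *)msk;
	u32 rtt_us = 0;
	u32 time;

	/* DRS must wait for the slowest path: use the largest
	 * receiver-side RTT estimate among the active subflows.
	 */
	mptcp_for_each_subflow(msk, subflow) {
		const struct tcp_sock *tp =
			tcp_sk(mptcp_subflow_tcp_sock(subflow));

		rtt_us = max(rtt_us, tp->rcv_rtt_est.rtt_us >> 3);
	}

	msk->rcvq_space.copied += copied;

	time = jiffies_to_usecs(tcp_jiffies32 - msk->rcvq_space.time);
	if (!rtt_us || time < rtt_us)
		return;

	/* Grow only if the application out-read the previous measurement:
	 * 2x the bytes consumed within one (maximum) RTT, so a 1-byte/s
	 * reader keeps a small window while a bulk reader can keep all
	 * subflows running at full speed.
	 */
	if (msk->rcvq_space.copied > msk->rcvq_space.space &&
	    !(sk->sk_userlocks & SOCK_RCVBUF_LOCK)) {
		int rcvbuf = min_t(u64, 2ULL * msk->rcvq_space.copied,
				   sock_net(sk)->ipv4.sysctl_tcp_rmem[2]);

		if (rcvbuf > sk->sk_rcvbuf)
			WRITE_ONCE(sk->sk_rcvbuf, rcvbuf);
	}

	/* start a new measurement round */
	msk->rcvq_space.space = msk->rcvq_space.copied;
	msk->rcvq_space.copied = 0;
	msk->rcvq_space.time = tcp_jiffies32;
}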