On Mon, Aug 26, 2019 at 12:19 PM Eric Dumazet <eduma...@google.com> wrote: > > Vladimir Rutsky reported stuck TCP sessions after memory pressure > events. Edge Trigger epoll() user would never receive an EPOLLOUT > notification allowing them to retry a sendmsg(). > > Jason tested the case of sk_stream_alloc_skb() returning NULL, > but there are other paths that could lead both sendmsg() and sendpage() > to return -1 (EAGAIN), with an empty skb queued on the write queue. > > This patch makes sure we remove this empty skb so that > Jason code can detect that the queue is empty, and > call sk->sk_write_space(sk) accordingly. > > Fixes: ce5ec440994b ("tcp: ensure epoll edge trigger wakeup when write queue > is empty") > Signed-off-by: Eric Dumazet <eduma...@google.com> > Cc: Jason Baron <jba...@akamai.com> > Reported-by: Vladimir Rutsky <rut...@google.com> > Cc: Soheil Hassas Yeganeh <soh...@google.com> > Cc: Neal Cardwell <ncardw...@google.com>
Acked-by: Soheil Hassas Yeganeh <soh...@google.com> Nice find! > --- > net/ipv4/tcp.c | 30 ++++++++++++++++++++---------- > 1 file changed, 20 insertions(+), 10 deletions(-) > > diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c > index > 77b485d60b9d0e00edc4e2f0d6c5bb3a9460b23b..61082065b26a068975c411b74eb46739ab0632ca > 100644 > --- a/net/ipv4/tcp.c > +++ b/net/ipv4/tcp.c > @@ -935,6 +935,22 @@ static int tcp_send_mss(struct sock *sk, int *size_goal, > int flags) > return mss_now; > } > > +/* In some cases, both sendpage() and sendmsg() could have added > + * an skb to the write queue, but failed adding payload on it. > + * We need to remove it to consume less memory, but more > + * importantly be able to generate EPOLLOUT for Edge Trigger epoll() > + * users. > + */ > +static void tcp_remove_empty_skb(struct sock *sk, struct sk_buff *skb) > +{ > + if (skb && !skb->len) { > + tcp_unlink_write_queue(skb, sk); > + if (tcp_write_queue_empty(sk)) > + tcp_chrono_stop(sk, TCP_CHRONO_BUSY); > + sk_wmem_free_skb(sk, skb); > + } > +} > + > ssize_t do_tcp_sendpages(struct sock *sk, struct page *page, int offset, > size_t size, int flags) > { > @@ -1064,6 +1080,7 @@ ssize_t do_tcp_sendpages(struct sock *sk, struct page > *page, int offset, > return copied; > > do_error: > + tcp_remove_empty_skb(sk, tcp_write_queue_tail(sk)); > if (copied) > goto out; > out_err: > @@ -1388,18 +1405,11 @@ int tcp_sendmsg_locked(struct sock *sk, struct msghdr > *msg, size_t size) > sock_zerocopy_put(uarg); > return copied + copied_syn; > > +do_error: > + skb = tcp_write_queue_tail(sk); > do_fault: > - if (!skb->len) { > - tcp_unlink_write_queue(skb, sk); > - /* It is the one place in all of TCP, except connection > - * reset, where we can be unlinking the send_head. > - */ > - if (tcp_write_queue_empty(sk)) > - tcp_chrono_stop(sk, TCP_CHRONO_BUSY); > - sk_wmem_free_skb(sk, skb); > - } > + tcp_remove_empty_skb(sk, skb); > > -do_error: > if (copied + copied_syn) > goto out; > out_err: > -- > 2.23.0.187.g17f5b7556c-goog >