On Tue, Sep 10, 2019 at 5:49 PM Eric Dumazet <eduma...@google.com> wrote: > > When tcp sends a TSO packet, adding a PSH flag on it > reduces the sojourn time of GRO packet in GRO receivers. > > This is particularly the case under pressure, since RX queues > receive packets for many concurrent flows. > > A sender can give a hint to GRO engines when it is > appropriate to flush a super-packet, especially when pacing > is in the picture, since next packet is probably delayed by > one ms. > > Having less packets in GRO engine reduces chance > of LRU eviction or inflated RTT, and reduces GRO cost. > > We found recently that we must not set the PSH flag on > individual full-size MSS segments [1] : > > Under pressure (CWR state), we better let the packet sit > for a small delay (depending on NAPI logic) so that the > ACK packet is delayed, and thus next packet we send is > also delayed a bit. Eventually the bottleneck queue can > be drained. DCTCP flows with CWND=1 have demonstrated > the issue. > > This patch allows to slowdown the aggregate traffic without > involving high resolution timers on senders and/or > receivers. > > It has been used at Google for about four years, > and has been discussed at various networking conferences. > > [1] segments smaller than MSS already have PSH flag set > by tcp_sendmsg() / tcp_mark_push(), unless MSG_MORE > has been requested by the user. > > Signed-off-by: Eric Dumazet <eduma...@google.com> > Cc: Soheil Hassas Yeganeh <soh...@google.com> > Cc: Neal Cardwell <ncardw...@google.com> > Cc: Yuchung Cheng <ych...@google.com> > Cc: Daniel Borkmann <dan...@iogearbox.net> > Cc: Tariq Toukan <tar...@mellanox.com>
Acked-by: Soheil Hassas Yeganeh <soh...@google.com> Eric, thank you for the patch and the very nice commit message! > --- > net/ipv4/tcp_output.c | 15 +++++++++++++-- > 1 file changed, 13 insertions(+), 2 deletions(-) > > diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c > index > 42abc9bd687a5fea627cbc7cfa750d022f393d84..fec6d67bfd146dc78f0f25173fd71b8b8cc752fe > 100644 > --- a/net/ipv4/tcp_output.c > +++ b/net/ipv4/tcp_output.c > @@ -1050,11 +1050,22 @@ static int __tcp_transmit_skb(struct sock *sk, struct > sk_buff *skb, > tcb = TCP_SKB_CB(skb); > memset(&opts, 0, sizeof(opts)); > > - if (unlikely(tcb->tcp_flags & TCPHDR_SYN)) > + if (unlikely(tcb->tcp_flags & TCPHDR_SYN)) { > tcp_options_size = tcp_syn_options(sk, skb, &opts, &md5); > - else > + } else { > tcp_options_size = tcp_established_options(sk, skb, &opts, > &md5); > + /* Force a PSH flag on all (GSO) packets to expedite GRO flush > + * at receiver : This slightly improve GRO performance. > + * Note that we do not force the PSH flag for non GSO packets, > + * because they might be sent under high congestion events, > + * and in this case it is better to delay the delivery of > 1-MSS > + * packets and thus the corresponding ACK packet that would > + * release the following packet. > + */ > + if (tcp_skb_pcount(skb) > 1) > + tcb->tcp_flags |= TCPHDR_PSH; > + } > tcp_header_size = tcp_options_size + sizeof(struct tcphdr); > > /* if no packet is in qdisc/device queue, then allow XPS to select > -- > 2.23.0.162.g0b9fbb3734-goog >