On Wed, Apr 27, 2016 at 10:25:46PM -0700, Eric Dumazet wrote:
> Most of TCP stack assumed it was running from BH handler.
>
> This is great for most things, as TCP behavior is very sensitive
> to scheduling artifacts.
>
> However, the prequeue and backlog processing are problematic,
> as they need to be flushed with BH being blocked.
>
> To cope with modern needs, TCP sockets have big sk_rcvbuf values,
> in the order of 16 MB.
> This means that backlog can hold thousands of packets, and things
> like TCP coalescing or collapsing on this amount of packets can
> lead to insane latency spikes, since BH are blocked for too long.

And due to that, it may potentially lead to packet drops on NIC ring
buffers. Great, thanks Eric.

> It is time to make UDP/TCP stacks preemptible.
>
> Note that fast path still runs from BH handler.
>
> Eric Dumazet (6):
>   tcp: do not assume TCP code is non preemptible
>   tcp: do not block bh during prequeue processing
>   dccp: do not assume DCCP code is non preemptible
>   udp: prepare for non BH masking at backlog processing
>   sctp: prepare for socket backlog behavior change
>   net: do not block BH while processing socket backlog
>
>  net/core/sock.c          |  22 +++------
>  net/dccp/input.c         |   2 +-
>  net/dccp/ipv4.c          |   4 +-
>  net/dccp/ipv6.c          |   4 +-
>  net/dccp/options.c       |   2 +-
>  net/ipv4/tcp.c           |   6 +--
>  net/ipv4/tcp_cdg.c       |  20 ++++----
>  net/ipv4/tcp_cubic.c     |  20 ++++----
>  net/ipv4/tcp_fastopen.c  |  12 ++---
>  net/ipv4/tcp_input.c     | 126 +++++++++++++++++++----------------------------
>  net/ipv4/tcp_ipv4.c      |  14 ++++--
>  net/ipv4/tcp_minisocks.c |   2 +-
>  net/ipv4/tcp_output.c    |   7 ++-
>  net/ipv4/tcp_recovery.c  |   4 +-
>  net/ipv4/tcp_timer.c     |  10 ++--
>  net/ipv4/udp.c           |   4 +-
>  net/ipv6/tcp_ipv6.c      |  12 ++---
>  net/ipv6/udp.c           |   4 +-
>  net/sctp/inqueue.c       |   2 +
>  19 files changed, 124 insertions(+), 153 deletions(-)
>
> --
> 2.8.0.rc3.226.g39d4020
>