(I accidentally dropped netdev on my earlier message... here is Eric's response, which also didn't go to the group)
---------- Forwarded message ---------
From: Eric Dumazet <eric.duma...@gmail.com>
Date: Mon, Sep 30, 2019 at 6:53 PM
Subject: Re: BUG: sk_backlog.len can overestimate
To: John Ousterhout <ous...@cs.stanford.edu>

On 9/30/19 5:41 PM, John Ousterhout wrote:
> On Mon, Sep 30, 2019 at 5:14 PM Eric Dumazet <eric.duma...@gmail.com> wrote:
>>
>> On 9/30/19 4:58 PM, John Ousterhout wrote:
>>> As of 4.16.10, it appears to me that sk->sk_backlog_len does not
>>> provide an accurate estimate of backlog length; this reduces the
>>> usefulness of the "limit" argument to sk_add_backlog.
>>>
>>> The problem is that, under heavy load, sk->sk_backlog_len can grow
>>> arbitrarily large, even though the actual amount of data in the
>>> backlog is small. This happens because __release_sock doesn't reset
>>> the backlog length until it gets completely caught up. Under heavy
>>> load, new packets can be arriving continuously into the backlog
>>> (which increases sk_backlog.len) while other packets are being
>>> serviced. This can go on forever, so sk_backlog.len never gets reset
>>> and it can become arbitrarily large.
>>
>> Certainly not.
>>
>> It can not grow arbitrarily large, unless a backport gone wrong maybe.
>
> Can you help me understand what would limit the growth of this value?
> Suppose that new packets are arriving as quickly as they are
> processed. Every time __release_sock calls sk_backlog_rcv, a new
> packet arrives during the call, which is added to the backlog,
> incrementing sk_backlog.len. However, sk_backlog.len doesn't get
> decreased when sk_backlog_rcv completes, since the backlog hasn't
> emptied (as you said, it's not "safe"). As a result, sk_backlog.len
> has increased, but the actual backlog length is unchanged (one packet
> was added, one was removed). Why can't this process repeat
> indefinitely, until eventually sk_backlog.len reaches whatever limit
> the transport specifies when it invokes sk_add_backlog? At this point
> packets will be dropped by the transport even though the backlog isn't
> actually very large.

The process is bounded by socket sk_rcvbuf + sk_sndbuf:

bool tcp_add_backlog(struct sock *sk, struct sk_buff *skb)
{
        u32 limit = sk->sk_rcvbuf + sk->sk_sndbuf;
        ...
        if (unlikely(sk_add_backlog(sk, skb, limit))) {
                ...
                __NET_INC_STATS(sock_net(sk), LINUX_MIB_TCPBACKLOGDROP);
                ...
        }

Once the limit is reached, sk_backlog.len won't be touched, unless
__release_sock() has processed the whole queue.

>
>>> Because of this, the "limit" argument to sk_add_backlog may not be
>>> useful, since it could result in packets being discarded even though
>>> the backlog is not very large.
>>>
>>
>> You will have to study git log/history for the details, the limit _is_ useful,
>> and we reset the limit in __release_sock() only when _safe_.
>>
>> Assuming you talk about TCP, then I suggest you use a more recent kernel.
>>
>> linux-5.0 got coalescing in the backlog queue, which helped quite a bit.
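
To make the arithmetic in the exchange above concrete, here is a minimal stand-alone
user-space sketch of the bookkeeping being debated. It is not kernel code: the names
(backlog_len, queued, LIMIT, PKT_CHARGE) are invented for the example, with LIMIT
standing in for the sk_rcvbuf + sk_sndbuf cap that tcp_add_backlog() passes to
sk_add_backlog(), and the one-in/one-out loop standing in for packets arriving while
__release_sock() drains the queue.

/*
 * Stand-alone sketch of the accounting discussed above.  All names are
 * made up for illustration; only the roles correspond to kernel fields:
 * backlog_len plays sk->sk_backlog.len, LIMIT plays the
 * sk_rcvbuf + sk_sndbuf cap used by tcp_add_backlog().
 */
#include <stdio.h>

#define LIMIT       64  /* stand-in for sk_rcvbuf + sk_sndbuf */
#define PKT_CHARGE   1  /* each packet charges one unit       */

int main(void)
{
        int backlog_len = PKT_CHARGE; /* accounted length (sk_backlog.len)   */
        int queued = 1;               /* packets actually sitting in backlog */
        int round;

        for (round = 0; ; round++) {
                /* A new packet arrives while the previous one is still
                 * being processed, i.e. while the drain loop is running. */
                if (backlog_len + PKT_CHARGE > LIMIT) {
                        printf("round %d: drop; accounted len = %d, real queue = %d\n",
                               round, backlog_len, queued);
                        break;
                }
                backlog_len += PKT_CHARGE;
                queued++;

                /* One packet finishes processing.  The accounted length is
                 * only reset when the queue is completely empty, which never
                 * happens under this steady one-in/one-out load. */
                queued--;
                if (queued == 0)
                        backlog_len = 0;
        }
        return 0;
}

Compiled and run, this prints a drop with accounted len = 64 but a real queue of a
single packet: the accounted length is indeed bounded by the limit, as Eric says, yet
the drop happens while the backlog itself is nearly empty, which is the overestimate
the original report describes.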