On Fri, Jun 24, 2016 at 6:22 PM, Willem de Bruijn
<willemdebruijn.ker...@gmail.com> wrote:
> On Fri, Jun 24, 2016 at 4:41 PM, Eric W. Biederman
> <ebied...@xmission.com> wrote:
>> Willem de Bruijn <willemdebruijn.ker...@gmail.com> writes:
>>
>>> From: Willem de Bruijn <will...@google.com>
>>>
>>> Socket destruction is only broadcast for a socket sk if a diag
>>> listener is registered and sk is not a kernel socket.
>>>
>>> Invert the test to not even check for listeners for kernel sockets.
>>>
>>> The sock_diag_has_destroy_listeners invocation dereferences
>>> sock_net(sk), which for kernel sockets can be invalid as they do not
>>> take a reference on the network namespace.
>>
>> No.  That isn't so.  A kernel socket for a network namespace must be
>> destroyed in the network namespace teardown.

I spent some more time looking at this.

inet_ctl_sock_destroy does not destroy the socket if there are still
skbuffs with a reference on it (or its sk_wmem_alloc). Skbs are
orphaned when they leave the namespace through dev_forward_skb, but
not when sent out a physical nic (correctly, that would break TSQ).

The bug happened with macvlan on top of bonding on top of a physical
nic. The macvlan lives in a temporary namespace. After the macvlan and
network namespace are destroyed, the physical device has a TCP RST skb
from net.ipv4->tcp_sk queued for tx completion.

I have not been able to reproduce this exact scenario, likely because tx
completion handling is on the order of microseconds and not easily
slowed sufficiently for testing. Using a tap device with skb_orphan
commented out, I can cause the issue. Commenting out skb_orphan is
clearly a gross hack. The point I wanted to verify is that the underlying
device is not stopped --and its queues cleaned of skbs-- when the
macvlan device is destroyed.

Network namespace teardown is complex. Am I missing a step that
prevents the above, or does this indeed sound feasible in principle
(if very unlikely in practice)?
