On Fri, Jun 24, 2016 at 6:22 PM, Willem de Bruijn
<willemdebruijn.ker...@gmail.com> wrote:
> On Fri, Jun 24, 2016 at 4:41 PM, Eric W. Biederman
> <ebied...@xmission.com> wrote:
>> Willem de Bruijn <willemdebruijn.ker...@gmail.com> writes:
>>
>>> From: Willem de Bruijn <will...@google.com>
>>>
>>> Socket destruction is only broadcast for a socket sk if a diag
>>> listener is registered and sk is not a kernel socket.
>>>
>>> Invert the test to not even check for listeners for kernel sockets.
>>>
>>> The sock_diag_has_destroy_listeners invocation dereferences
>>> sock_net(sk), which for kernel sockets can be invalid as they do not
>>> take a reference on the network namespace.
>>
>> No. That isn't so. A kernel socket for a network namespace must be
>> destroyed in the network namespace teardown.

I spent some more time looking at this. inet_ctl_sock_destroy does
not destroy the socket if there are still skbs holding a reference on
it (or its sk_wmem_alloc). Skbs are orphaned when they leave the
namespace through dev_forward_skb, but not when they are sent out a
physical nic (correctly, as that would break TSQ).

The bug happened with macvlan on top of bonding on top of a physical
nic. The macvlan lives in a temporary namespace. After the macvlan
and network namespace are destroyed, the physical device still has a
TCP RST skb from net.ipv4->tcp_sk queued for tx completion.

I have not been able to reproduce this exact scenario, likely because
tx completion handling is on the order of microseconds and not easily
slowed sufficiently for testing. Using a tap device with skb_orphan
commented out, I can cause the issue. Commenting out skb_orphan is
clearly a gross hack. The point I wanted to verify is that the
underlying device is not stopped, and its queues are not purged of
skbs, when the macvlan device is destroyed.

Network namespace teardown is complex. Am I missing a step that
prevents the above, or does this indeed sound feasible in principle
(if very unlikely in practice)?
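
To make the lifetime issue above concrete, here is a minimal userspace
sketch of how I understand the accounting. It is a model only, not
kernel code: the *_model names, the struct layouts and the "freed"
flag are all made up for illustration, and the real sk_free,
sock_wfree, skb_set_owner_w and skb_orphan do considerably more. The
one assumption it encodes is that the socket starts with a
write-memory count of 1 for itself and is only released once that
count reaches zero.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Userspace model only; names mirror the kernel's, bodies are simplified. */
struct sock {
    int wmem_alloc;  /* models sk_wmem_alloc; starts at 1 for the socket */
    bool *freed;     /* hypothetical hook to observe the final release */
};

struct sk_buff {
    struct sock *sk;
    int truesize;
    void (*destructor)(struct sk_buff *skb);
};

static struct sock *sk_alloc_model(bool *freed)
{
    struct sock *sk = malloc(sizeof(*sk));

    assert(sk != NULL);
    sk->wmem_alloc = 1;
    sk->freed = freed;
    return sk;
}

static void sk_destruct_model(struct sock *sk)
{
    if (sk->freed)
        *sk->freed = true;
    free(sk);
}

/* Models sk_free: drops the socket's own count, frees only at zero. */
static void sk_free_model(struct sock *sk)
{
    if (--sk->wmem_alloc == 0)
        sk_destruct_model(sk);
}

/* Models sock_wfree: the skb destructor returns its share of the count. */
static void sock_wfree_model(struct sk_buff *skb)
{
    struct sock *sk = skb->sk;

    sk->wmem_alloc -= skb->truesize;
    if (sk->wmem_alloc == 0)
        sk_destruct_model(sk);
}

/* Models skb_set_owner_w: charge the skb to the sending socket. */
static void skb_set_owner_w_model(struct sk_buff *skb, struct sock *sk)
{
    skb->sk = sk;
    skb->destructor = sock_wfree_model;
    sk->wmem_alloc += skb->truesize;
}

/* Models skb_orphan: run the destructor and detach from the socket. */
static void skb_orphan_model(struct sk_buff *skb)
{
    if (skb->destructor)
        skb->destructor(skb);
    skb->destructor = NULL;
    skb->sk = NULL;
}
```

In this model, calling sk_free_model while an skb is still charged to
the socket leaves the socket allocated, with skb->sk still pointing at
it. The release only happens when the skb is orphaned or its
destructor runs at tx completion, which is the window I am asking
about.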