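In some rare cases, inet_sk_rx_dst_set() may be called multiple times on the same socket with the same dst. Each call takes a new reference with dst_hold_safe() and overwrites sk->sk_rx_dst without releasing the reference it already held, so the reference count leaks. Eventually, the leak prevents the net_device from being destroyed. The bug manifests as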

  unregister_netdevice: waiting for lo to become free. Usage count = 1

in the kernel log, and new network namespaces can no longer be created.

This patch works around the issue by returning early from inet_sk_rx_dst_set() when sk->sk_rx_dst already points to the same dst, so no extra reference is taken.

Signed-off-by: Kevin Xu <kaiwen...@hulu.com>
---
 net/ipv4/tcp_ipv4.c | 11 ++++++++---
 1 file changed, 8 insertions(+), 3 deletions(-)

diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
index 575e19d..f425c14 100644
--- a/net/ipv4/tcp_ipv4.c
+++ b/net/ipv4/tcp_ipv4.c
@@ -1807,9 +1807,14 @@ void inet_sk_rx_dst_set(struct sock *sk, const struct sk_buff *skb)
 {
 	struct dst_entry *dst = skb_dst(skb);
 
-	if (dst && dst_hold_safe(dst)) {
-		sk->sk_rx_dst = dst;
-		inet_sk(sk)->rx_dst_ifindex = skb->skb_iif;
+	if (dst) {
+		if (unlikely(dst == sk->sk_rx_dst))
+			return;
+
+		if (dst_hold_safe(dst)) {
+			sk->sk_rx_dst = dst;
+			inet_sk(sk)->rx_dst_ifindex = skb->skb_iif;
+		}
 	}
 }
 EXPORT_SYMBOL(inet_sk_rx_dst_set);
-- 
1.9.1