Hi Wei and Eric,

Thanks for the replies.

This was reported to us on the 5.4.61 kernel during a customer regression suite, so we don't have an exact reproducer unfortunately. From the trace logs we've added it seems like this is happening during IPv6 transport mode XFRM data transfer and the device is unregistered in the middle of it, but we've been unable to reproduce it ourselves.. We're open to trying out and sharing debug patches if needed though.

rt6_uncached_list_flush_dev() actually tries to replace the inet6_dev
with loopback_dev, and release the reference to the previous inet6_dev
by calling in6_dev_put(), which is actually doing the same thing as
ip6_dst_ifdown(). I don't understand why you say " a reference to the
inet6_dev is simply dropped".

Fair. I was going off the semantics used by the dst_dev_put() function which calls dst_ops->ifdown() explicitly. At least in the case of xfrm6_dst_ifdown() this swap of the loopback device and putting the refcount seems like it could be missing a few things.

The additional refcount to the DST is also released by doing the following:
                        if (rt_dev == dev) {
                                rt->dst.dev = blackhole_netdev;
                                dev_hold(rt->dst.dev);
                                dev_put(rt_dev);
                        }
Am I missing something?

That dev_put() is on the actual netdevice struct, not the inet6_dev associated with it. We're seeing many calls to icmp6_dst_alloc() and xfrm6_fill_dst() here, both of which seem to associate a reference to the inet6_dev struct with the DST in addition to the standard dev_hold() on the netdevice during the dst_alloc()/dst_init().

Thanks,
Sean

Reply via email to