On 12/10/20 6:12 PM, stran...@codeaurora.org wrote:
>>> BTW, have you tried your previous proposed patch and confirmed it
>>> would fix the issue?
>>>
>
> Yes, we shared this with the customer and the refcount mismatch still
> occurred, so this doesn't seem sufficient either.
>
>>> Could we further distinguish between dst added to the uncached list by
>>> icmp6_dst_alloc() and xfrm6_fill_dst(), and confirm which ones are the
>>> ones leaking reference?
>>> I suspect it would be the xfrm ones, but I think it is worth verifying.
>>>
>
> After digging into the DST allocation/destroy a bit more, it seems that
> there are some cases where the DST's refcount does not hit zero, causing
> them to never be freed and release their references.
> One case comes from here on the IPv6 packet output path (these DST
> structs would hold references to both the inet6_dev and the netdevice)
> ip6_pol_route_output+0x20/0x2c -> ip6_pol_route+0x1dc/0x34c ->
> rt6_make_pcpu_route+0x18/0xf4 -> ip6_rt_pcpu_alloc+0xb4/0x19c

This is the normal data path, and this refers to a per-cpu dst cache.
Delete the route and the cached entries get removed.

>
> We also see two DSTs where they are stored as the xdst->rt entry on the
> XFRM path that do not get released. One is allocated by the same path as
> above, and the other like this
> xfrm6_esp_err+0x7c/0xd4 -> esp6_err+0xc8/0x100 ->
> ip6_update_pmtu+0xc8/0x100 -> __ip6_rt_update_pmtu+0x248/0x434 ->
> ip6_rt_cache_alloc+0xa0/0x1dc

This entry goes into an exception cache. I have lost track of kernel
versions and features. Try listing the route cache to see these:

    ip -6 ro ls cache