On Mon, Feb 6, 2017 at 6:32 PM, Kaiwen Xu <ke...@kevxu.net> wrote:
> Hi Cong,
>
> I did some more testing, and it seems your second assumption is correct.
> There is indeed something holding references to a particular dst,
> which prevents it from being gc'ed.

Excellent!

>
> I added logging to each dst_hold (or dst_hold_safe, or
> skb_dst_force_safe) and dst_release, formatted as follows:
>
> <dev name> (<protocol>) [<dst addr>]: dst_release / dst_hold ... <refcnt>
> <caller function>
>
> And inside dst_gc_task(), I added logging for when a gc delay occurs,
> formatted as:
>
> [dst_gc_task] <dev name> (<protocol>): delayed <refcnt>
>
> I have the log attached. The following line looks suspicious:
>
> Feb 6 16:27:24 <hostname> kernel: [63589.458067] [dst_gc_task] lodebug (2): delayed 19

Looks like you ended up with one dst whose refcnt is 19 in GC, and it
stayed that way for a rather long time for some reason. It is hard to
tell whether this is a refcnt leak even with your log, since 4K+ refcnt
operations happened on that dst...

Meanwhile, can you share the setup of your container? What network device
do you use in the container? How is it connected to the outside?

Thanks.
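
P.S. With that many hold/release events on one dst, a per-caller tally of holds minus releases may help narrow things down. Below is a minimal sketch in Python, not a definitive tool. It assumes each log line looks like `<dev> (<proto>) [<dst addr>]: dst_hold|dst_release <refcnt> <caller>`; that exact layout is a guess, since part of the format above is elided, so the regex will need adjusting to the real log. The function names `tally` and `unbalanced` are made up for illustration.

```python
import re
from collections import Counter, defaultdict

# ASSUMPTION: hypothetical line layout, adjust to the real log format:
#   <dev> (<proto>) [<dst addr>]: dst_hold|dst_release <refcnt> <caller>
# Variants such as dst_hold_safe would need to be added to the <op> group.
LINE_RE = re.compile(
    r"(?P<dev>\S+) \((?P<proto>\d+)\) \[(?P<dst>[0-9a-fA-Fx]+)\]: "
    r"(?P<op>dst_hold|dst_release) (?P<refcnt>-?\d+) (?P<caller>\S+)"
)


def tally(lines):
    """Return {dst addr: Counter({caller: holds - releases})}."""
    balance = defaultdict(Counter)
    for line in lines:
        m = LINE_RE.search(line)
        if m is None:
            continue  # skip unrelated kernel messages
        delta = 1 if m["op"] == "dst_hold" else -1
        balance[m["dst"]][m["caller"]] += delta
    return balance


def unbalanced(balance):
    """Keep only dsts whose logged holds and releases do not cancel out."""
    return {
        dst: {caller: n for caller, n in callers.items() if n != 0}
        for dst, callers in balance.items()
        if sum(callers.values()) != 0
    }
```

Feeding the log lines to `tally()` and then `unbalanced()` leaves only the dsts with a nonzero net count, broken down by caller. A nonzero per-caller figure is not a leak by itself, since the hold and the matching release often come from different functions; the net per dst is the number to look at, and the breakdown only hints at who is still holding.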