Sure, I will send a patch after testing.

On Thu, Dec 31, 2020 at 10:53:59AM -0800, Cong Wang wrote:
> On Tue, Dec 29, 2020 at 8:06 AM Chinmay Agarwal <china...@codeaurora.org>
> wrote:
> >
> > Hi All,
> >
> > We found a crash while performing some automated stress tests on a 5.4
> > kernel based device.
> >
> > We found out that it there is a freed neighbour address which was still
> > part of the gc_list and was leading to crash.
> > Upon adding some debugs and checking neigh_put/neigh_hold/neigh_destroy
> > calls stacks, looks like there is a possibility of a Race condition
> > happening in the code.
> [...]
> > The crash may have been due to out of order ARP replies.
> > As neighbour is marked dead should we go ahead with updating our ARP Tables?
>
> I think you are probably right, we should just do unlock and return
> in __neigh_update() when hitting if (neigh->dead) branch. Something
> like below:
>
> diff --git a/net/core/neighbour.c b/net/core/neighbour.c
> index 9500d28a43b0..0ce592f585c8 100644
> --- a/net/core/neighbour.c
> +++ b/net/core/neighbour.c
> @@ -1250,6 +1250,7 @@ static int __neigh_update(struct neighbour
> *neigh, const u8 *lladdr,
>                 goto out;
>         if (neigh->dead) {
>                 NL_SET_ERR_MSG(extack, "Neighbor entry is now dead");
> +               new = old;
>                 goto out;
>         }
>
> But given the old state probably contains NUD_PERMANENT, I guess
> you hit the following branch instead:
>
>         if (!(flags & NEIGH_UPDATE_F_ADMIN) &&
>             (old & (NUD_NOARP | NUD_PERMANENT)))
>                 goto out;
>
> So we may have to check ->dead before this. Please double check.
>
> This bug is probably introduced by commit 9c29a2f55ec05cc8b525ee.
> Can you make a patch and send it out formally after testing?
>
> Thanks!
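
For reference, the ordering suggested above (check ->dead before the NUD_NOARP | NUD_PERMANENT test) can be sketched as a small user-space model. This is not the kernel code and not the eventual patch: struct model_neigh, model_neigh_update(), the MODEL_* flag values and the enum of results are all hypothetical names invented for illustration, and locking, refcounting and the gc_list are left out entirely.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Illustrative flag values for this model only; not copied from kernel headers. */
#define MODEL_NUD_REACHABLE  0x02u
#define MODEL_NUD_NOARP      0x40u
#define MODEL_NUD_PERMANENT  0x80u
#define MODEL_UPDATE_F_ADMIN 0x01u

/* Hypothetical result codes so a caller can see which branch was taken. */
enum model_result {
    MODEL_UPDATED,
    MODEL_SKIPPED_DEAD,
    MODEL_SKIPPED_STATIC
};

/* Hypothetical stand-in for the two fields of the entry discussed above. */
struct model_neigh {
    bool dead;
    unsigned int nud_state;
};

/*
 * Assumed ordering: bail out on a dead entry first, and only then apply
 * the "non-admin update of a NOARP/PERMANENT entry" test. With the two
 * checks swapped, a dead PERMANENT entry would leave through the second
 * branch and never reach the dead check.
 */
static enum model_result model_neigh_update(struct model_neigh *n,
                                            unsigned int new_state,
                                            unsigned int flags)
{
    assert(n != NULL);

    unsigned int old = n->nud_state;

    if (n->dead)
        return MODEL_SKIPPED_DEAD;   /* state left untouched */

    if (!(flags & MODEL_UPDATE_F_ADMIN) &&
        (old & (MODEL_NUD_NOARP | MODEL_NUD_PERMANENT)))
        return MODEL_SKIPPED_STATIC; /* state left untouched */

    n->nud_state = new_state;
    return MODEL_UPDATED;
}
```

In this model a dead entry whose state contains MODEL_NUD_PERMANENT is rejected by the dead check, whatever flags are passed. Whether the real __neigh_update() needs exactly this ordering still has to be confirmed by testing.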