Hello, On Wed, 31 May 2017, Sowmini Varadhan wrote:
> On (06/01/17 00:41), Julian Anastasov wrote: > > > > So, we do not hold reference to neigh while accessing > > its fields. I suspect we need to move the table lock from > > neigh_remove_one here, for example: > > good point, let me think over your suggestion carefully (it sounds > right, I want to make sure I dont miss any other race-windows) > and post patch v4 tomorrow.. Another problem is that neigh_update() changes the state but before we go and unlink the entry another CPU can reactivate the entry, i.e. NUD_INCOMPLETE entered in __neigh_event_send(). So, there will be always some small window where surprises can happen and the entry is not really deleted. Checking for NUD_FAILED may not be needed, the atomic_read(&n->refcnt) == 1 check under lock ensures that we are in some state without timer, i.e. not in NUD_INCOMPLETE, so it is ok to remove the entry. > > Another solution to cause faster removal would be > > to cancel the gc_work and to schedule it after 1 jiffie. > > It helps when many entries are deleted at once but the > > work prefers to just sleep when gc_thresh1 is not reached, > > so such solution is not good enough. > > Right the other drawback of relying on gc for cleanup is > that it doesn't give higher preference to cleaning up FAILED > entries first- so it can end up cleaning up other useful entries > (as I was pointing out to David Ahern) I see, ok. Regards -- Julian Anastasov <j...@ssi.bg>