On Fri, 15 Mar 2019 14:26:10 -0700 Eric Dumazet <eric.duma...@gmail.com> wrote:
> On 03/15/2019 02:08 PM, Stefano Brivio wrote:
> > On Fri, 15 Mar 2019 11:56:01 -0700
> > Eric Dumazet <eric.duma...@gmail.com> wrote:
> >
> >> On 03/15/2019 11:02 AM, David Miller wrote:
> >>> From: Eric Dumazet <eric.duma...@gmail.com>
> >>> Date: Fri, 15 Mar 2019 09:06:25 -0700
> >>>
> >>>> On 03/15/2019 08:28 AM, Stefano Brivio wrote:
> >>>>> On Fri, 15 Mar 2019 23:18:52 +0800
> >>>>> Zhiqiang Liu <liuzhiqian...@huawei.com> wrote:
> >>>>>
> >>>>>> In vxlan_destroy_tunnels func, unregister_netdevice_queue is
> >>>>>> called after gro_cells_destroy func. However, in
> >>>>>> unregister_netdevice_queue func, the gro_cells_destroy func
> >>>>>> will also call the gro_cells_destroy func as the following
> >>>>>> routine:
> >>>>>> unregister_netdevice_many() -> rollback_registered_many()
> >>>>>> -> ndo_uninit() -> gro_cells_destroy()
> >>>>>>
> >>>>>> Signed-off-by: Suanming.Mou <mousuanm...@huawei.com>
> >>>>>> Reviewed-by: Zhiqiang Liu <liuzhiqian...@huawei.com>
> >>>>>> Reviewed-by: Stefano Brivio <sbri...@redhat.com>
> >>>>>
> >>>>> NACK, please read my and Eric's comments to v1 -- giving me
> >>>>> more than 23 minutes to answer would have been a nice touch
> >>>>> as well :)
> >>>>>
> >>>>
> >>>> Sorry for the confusion, I forgot to add the question marks to
> >>>> my sentences.
> >>>>
> >>>> In fact, this is a bug fix, that we missed in the previous fix.
> >>>>
> >>>> Technically the bug is older.
> >>>
> >>> Please elaborate.
> >>>
> >>
> >> Commit ad6c9986bcb62
> >> ("vxlan: Fix GRO cells race condition between receive and link delete")
> >>
> >> fixed a race condition for the typical case a vxlan device is
> >> dismantled from the current netns.
> >>
> >> But if a netns is dismantled, we call vxlan_destroy_tunnels()
> >> to schedule a unregister_netdevice_queue() of all the vxlan tunnels
> >> that are related to this netns.
> >
> > Won't that happen via ops_exit_list() only after synchronize_rcu() is
> > called by cleanup_net(), though? Is there another path I missed?
>
> Just look at vxlan_destroy_tunnels().
>
> The call to gro_cells_destroy(&vxlan->gro_cells);
> is done _before_
> unregister_netdevice_queue(vxlan->dev, head);
>
> So packets can still fly, the RCU grace period has not yet started.

Wait, what... :/ thanks for pointing that out, I guess it was too
obvious for me to notice.

Zhiqiang, could you maybe update the commit message with these two bits
of information (the real issue explained by Eric, and the different
Fixes: tag), and post v3?

This would be an actual fix and not a clean-up, so it doesn't need to
wait for net-next to re-open.

--
Stefano
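
For anyone following along, the ordering problem Eric describes can be
sketched with a toy userspace model. This is not kernel code: every
toy_* name below is hypothetical, and the model folds "unregister the
device and wait for the RCU grace period" into a single step, which is
a simplification. It only shows why freeing the GRO state while the
receive path can still reach the device leaves an unsafe window.

```c
#include <stdbool.h>

/* Toy model only; none of these names exist in the kernel. */
struct toy_tunnel {
    bool rx_reachable;    /* receive path can still find the device */
    bool gro_cells_alive; /* per-device GRO state is still allocated */
};

/* Stands in for gro_cells_destroy(): the GRO state is freed. */
static void toy_gro_cells_destroy(struct toy_tunnel *t)
{
    t->gro_cells_alive = false;
}

/*
 * Stands in for unregistering the device and waiting for the RCU
 * grace period (assumption: modeled as one step). Afterwards no
 * receiver can reach the device any more.
 */
static void toy_unregister_and_wait(struct toy_tunnel *t)
{
    t->rx_reachable = false;
}

/* True if a packet arriving right now would touch freed GRO state. */
static bool toy_rx_would_use_freed_state(const struct toy_tunnel *t)
{
    return t->rx_reachable && !t->gro_cells_alive;
}

/* Destroy first, unregister later: the order described as buggy. */
static bool toy_teardown_destroy_first(void)
{
    struct toy_tunnel t = { true, true };
    bool unsafe = false;

    toy_gro_cells_destroy(&t);
    if (toy_rx_would_use_freed_state(&t))
        unsafe = true; /* packets can still fly at this point */
    toy_unregister_and_wait(&t);
    if (toy_rx_would_use_freed_state(&t))
        unsafe = true;
    return unsafe;
}

/* Unregister first, destroy only once receivers are gone. */
static bool toy_teardown_unregister_first(void)
{
    struct toy_tunnel t = { true, true };
    bool unsafe = false;

    toy_unregister_and_wait(&t);
    if (toy_rx_would_use_freed_state(&t))
        unsafe = true;
    toy_gro_cells_destroy(&t);
    if (toy_rx_would_use_freed_state(&t))
        unsafe = true;
    return unsafe;
}
```

In this model toy_teardown_destroy_first() reports an unsafe window and
toy_teardown_unregister_first() does not. In the real driver, per the
quoted commit message, gro_cells_destroy() is already reached through
ndo_uninit() during unregistration, so the early call is the one in
question.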