On 03/15/2019 11:02 AM, David Miller wrote:
> From: Eric Dumazet <eric.duma...@gmail.com>
> Date: Fri, 15 Mar 2019 09:06:25 -0700
>
>>
>>
>> On 03/15/2019 08:28 AM, Stefano Brivio wrote:
>>> On Fri, 15 Mar 2019 23:18:52 +0800
>>> Zhiqiang Liu <liuzhiqian...@huawei.com> wrote:
>>>
>>>> In vxlan_destroy_tunnels func, unregister_netdevice_queue is called after
>>>> gro_cells_destroy func. However, the unregister_netdevice_queue func
>>>> will also call the gro_cells_destroy func through the following routine:
>>>> unregister_netdevice_many() -> rollback_registered_many()
>>>> -> ndo_uninit() -> gro_cells_destroy()
>>>>
>>>> Signed-off-by: Suanming.Mou <mousuanm...@huawei.com>
>>>> Reviewed-by: Zhiqiang Liu <liuzhiqian...@huawei.com>
>>>> Reviewed-by: Stefano Brivio <sbri...@redhat.com>
>>>
>>> NACK, please read my and Eric's comments to v1 -- giving me more than 23
>>> minutes to answer would have been a nice touch as well :)
>>>
>>
>> Sorry for the confusion, I forgot to add the question marks to my sentences.
>>
>> In fact, this is a bug fix, that we missed in the previous fix.
>>
>> Technically the bug is older.
>
> Please elaborate.
>
Commit ad6c9986bcb62
("vxlan: Fix GRO cells race condition between receive and link delete")
fixed a race condition for the typical case where a vxlan device is
dismantled from the current netns.
But if a netns is dismantled, we call vxlan_destroy_tunnels()
to schedule an unregister_netdevice_queue() of all the vxlan tunnels
that are related to this netns.
This means that the gro_cells_destroy() call is done too soon,
for the same reasons explained in the above commit.
We need to fully respect the RCU rules, and thus must remove the
gro_cells_destroy() call from vxlan_destroy_tunnels() or risk a
use-after-free.
The bug is day-0 I think.
commit 58ce31cca1ffe057f4744c3f671e3e84606d3d4a
Author: Tom Herbert <t...@herbertland.com>
Date:   Wed Aug 19 17:07:33 2015 -0700

    vxlan: GRO support at tunnel layer
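
To illustrate the ordering problem described above for the netns dismantle
case, here is a rough userspace toy model. It is not kernel code: every
toy_* name is hypothetical, and the "grace period" is reduced to a flag
that hides the device from the receive path. It assumes, as in the call
chain quoted earlier, that the unregister path ends up calling
gro_cells_destroy() from ndo_uninit(), and that this only happens once
readers can no longer reach the device.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for a vxlan device; all names here are hypothetical. */
struct toy_dev {
	bool cells_alive;    /* models the gro_cells state being valid */
	bool rx_visible;     /* receive path can still reach the device */
	int  use_after_free; /* times rx touched cells after destroy */
};

/* Models gro_cells_destroy(): the cells are gone after this. */
static void toy_cells_destroy(struct toy_dev *d)
{
	d->cells_alive = false;
}

/* Models a packet hitting the receive path. */
static void toy_rx(struct toy_dev *d)
{
	if (!d->rx_visible)
		return;              /* device no longer reachable */
	if (!d->cells_alive)
		d->use_after_free++; /* rx used cells that were destroyed */
}

/*
 * Models the queued unregister finally running: the device is hidden
 * from readers first (stand-in for the grace period), and only then
 * are the cells destroyed from the uninit step.
 */
static void toy_unregister_complete(struct toy_dev *d)
{
	d->rx_visible = false;
	toy_cells_destroy(d);
}

/* Buggy teardown: cells destroyed while unregister is only queued. */
int toy_teardown_buggy(void)
{
	struct toy_dev d = { .cells_alive = true, .rx_visible = true };

	toy_cells_destroy(&d);       /* too early */
	toy_rx(&d);                  /* packet arrives in the window */
	toy_unregister_complete(&d);
	return d.use_after_free;
}

/* Fixed teardown: no early destroy, uninit is the only destroy site. */
int toy_teardown_fixed(void)
{
	struct toy_dev d = { .cells_alive = true, .rx_visible = true };

	toy_rx(&d);                  /* packet arrives before unregister */
	toy_unregister_complete(&d);
	toy_rx(&d);                  /* late packet no longer reaches dev */
	return d.use_after_free;
}
```

In this model toy_teardown_buggy() records one access to destroyed cells,
while toy_teardown_fixed() records none. The real race depends on RCU and
per-cpu NAPI state that the sketch deliberately leaves out.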