Control: forwarded -1 https://patchwork.ozlabs.org/project/netdev/patch/20201127180651.80283-1-bl...@debian.org/
Control: tags -1 patch

On Sat, 18 Jan 2020 17:06:55 +0000 Etienne Dechamps <etie...@edechamps.fr> 
wrote:
> Package: iproute2
> Version: 4.20.0-2
> Severity: important
> Control: found -1 5.4.0-1
> 
> The ipnetns.c:netns_add() function sets up the /var/run/netns mount
> point in a way that is fragile to race conditions if the routine is
> entered by multiple processes at the same time.
> 
> If the race condition is triggered, some kind of mount point recursion
> explosion seems to happen, messing up the entire system in
> "interesting" ways. For example, /proc/self/mountinfo ends up with
> tons of duplicate entries, and the mountinfo file itself becomes so
> large that the entire system tends to slow down. Also, subsequent
> netns add commands might fail with the following error message (note
> that this doesn't always happen):
> 
>   mount --bind /run/netns /run/netns failed: No space left on device
> 
> Since it is a race condition, the issue is hard to reproduce on its
> own, but it is possible to force it to happen by using strace to
> inject an artificial delay in the mount() system call. See below.
> 
> I observed this race condition happen multiple times on a real
> production system during the boot process. This is because this
> particular system sets up network namespaces using systemd units.
> Because systemd is designed to start units in parallel, and due to
> cold caches, multiple units running "netns add" end up synchronizing
> with each other, making it quite likely the race condition will be
> triggered.
> 
> STEPS TO REPRODUCE
> 
> Do NOT follow this procedure on a system you care about. This
> procedure WILL mess up your system and likely require you to reboot!
> 
> 1. Start from a fresh system that never ran "netns add" since boot (or
> just unmount /var/run/netns manually). I can reproduce it on Debian
> Buster (iproute2 4.20.0-2) as well as latest Sid (5.4.0-1).
> 
> 2. Run the following bash script:
> ---
> for i in {0..9}
> do
>         strace -e trace=mount -e inject=mount:delay_exit=1000000 \
>                 ip netns add "testnetns$i" 2>&1 | tee "$i.log" &
> done
> wait
> ---
> 
> 3. Look at /proc/self/mountinfo. Hilarity ensues.
> 
> If you increase the count in the script you might even get to see some
> "mount failed: No space left on device" errors.
> 
> WORKAROUND
> 
> Make sure that the first "netns add" command that runs after boot
> cannot run in parallel with any other "netns add" command. flock(1)
> might be useful here. I guess setting up the /var/run/netns point
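
For anyone who needs the workaround in the meantime, here is a minimal
sketch of the flock(1) approach suggested above. The lock file path, the
IP override variable and the netns_add_locked name are hypothetical
choices for illustration, not something iproute2 provides:

```shell
#!/bin/sh
# Hypothetical wrapper: serialize "ip netns add" across processes with flock(1).
# LOCKFILE is an assumed path; any file writable by all callers would do.
LOCKFILE="${LOCKFILE:-/run/lock/netns-add.lock}"
# IP lets the binary be overridden (e.g. for a dry run); defaults to "ip".
IP="${IP:-ip}"

netns_add_locked() {
    # flock takes an exclusive lock on LOCKFILE, runs the command while
    # holding it, and releases the lock when the command exits.
    flock "$LOCKFILE" "$IP" netns add "$1"
}

# Example call (not executed here): netns_add_locked testnetns0
```

Each unit would then call netns_add_locked instead of running
"ip netns add" directly, so that no two callers sharing the same lock
file enter the /run/netns setup at the same time.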

I can reproduce the issue and have sent a patch that uses flock():

https://patchwork.ozlabs.org/project/netdev/patch/20201127180651.80283-1-bl...@debian.org/

Please test it as well; I can no longer reproduce the issue with the fix applied.
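
One rough way to check is to count the /run/netns entries in mountinfo
before and after running the reproducer. A minimal sketch, assuming the
mount point shows up as /run/netns in the fifth field of each mountinfo
line (count_netns_mounts is a hypothetical helper name):

```shell
#!/bin/sh
# Hypothetical helper: count mountinfo entries whose mount point is /run/netns.
count_netns_mounts() {
    # $1: mountinfo file to inspect; defaults to the current process's view.
    # Field 5 of each mountinfo line is the mount point.
    awk '$5 == "/run/netns" { n++ } END { print n + 0 }' \
        "${1:-/proc/self/mountinfo}"
}

# Example call (not executed here): count_netns_mounts
```

When the race is triggered I would expect this count to grow noticeably,
given the duplicate entries described in the report; with the fix applied
it should stay the same across repeated runs of the script.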

I'll also look into the new context-based mount API that was added
recently, although the flock() patch will still be needed for backward
compatibility anyway.

-- 
Kind regards,
Luca Boccassi
