Control: forwarded -1 https://patchwork.ozlabs.org/project/netdev/patch/20201127180651.80283-1-bl...@debian.org/
Control: tags -1 patch

On Sat, 18 Jan 2020 17:06:55 +0000 Etienne Dechamps <etie...@edechamps.fr> wrote:
> Package: iproute2
> Version: 4.20.0-2
> Severity: important
> Control: found -1 5.4.0-1
>
> The ipnetns.c:netns_add() function sets up the /var/run/netns mount
> point in a way that is fragile to race conditions if the routine is
> entered by multiple processes at the same time.
>
> If the race condition is triggered, some kind of mount point recursion
> explosion seems to happen, messing up the entire system in
> "interesting" ways. For example, /proc/self/mountinfo ends up with
> tons of duplicate entries, and the mountinfo file itself becomes so
> large that the entire system tends to slow down. Also, subsequent
> netns add commands might fail with the following error message (note
> that this doesn't always happen):
>
>   mount --bind /run/netns /run/netns failed: No space left on device
>
> Since it is a race condition, the issue is hard to reproduce on its
> own, but it is possible to force it to happen by using strace to
> inject an artificial delay in the mount() system call. See below.
>
> I observed this race condition happen multiple times on a real
> production system during the boot process. This is because this
> particular system sets up network namespaces using systemd units.
> Because systemd is designed to start units in parallel, and due to
> cold caches, multiple units running "netns add" end up synchronizing
> with each other, making it quite likely the race condition will be
> triggered.
>
> STEPS TO REPRODUCE
>
> Do NOT follow this procedure on a system you care about. This
> procedure WILL mess up your system and likely require you to reboot!
>
> 1. Start from a fresh system that never ran "netns add" since boot (or
> just unmount /var/run/netns manually). I can reproduce it on Debian
> Buster (iproute2 4.20.0-2) as well as latest Sid (5.4.0-1).
>
> 2. Run the following bash script:
> ---
> for i in {0..9}
> do
>     strace -e trace=mount -e inject=mount:delay_exit=1000000 \
>         ip netns add "testnetns$i" 2>&1 | tee "$i.log" &
> done
> wait
> ---
>
> 3. Look at /proc/self/mountinfo. Hilarity ensues.
>
> If you increase the count in the script you might even get to see some
> "mount failed: No space left on device" errors.
>
> WORKAROUND
>
> Make sure that the first "netns add" command that runs after boot
> cannot run in parallel with any other "netns add" command. flock(1)
> might be useful here. I guess setting up the /var/run/netns point

I can reproduce the issue and have sent a patch that uses flock():

https://patchwork.ozlabs.org/project/netdev/patch/20201127180651.80283-1-bl...@debian.org/

Please test it as well; I can no longer reproduce the issue with the
fix applied.

I'll also look into the new context-based mount API that was added
recently, although the flock() fix will still be needed for backward
compatibility.

-- 
Kind regards,
Luca Boccassi
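
For reference, the flock(1) workaround suggested in the quoted report
could look roughly like the following. This is only a minimal sketch:
the lock file path and the "serialized" helper name are assumptions
made for illustration, and neither is part of iproute2 or of the patch
linked above.

```shell
#!/bin/sh
# Hypothetical lock file path (an assumption); any path that every
# caller can open works, as long as all callers use the same one.
LOCKFILE="${LOCKFILE:-/run/lock/netns-add.lock}"

# Run a command while holding an exclusive lock on $LOCKFILE.
# flock(1) creates the lock file if it is missing and blocks until the
# lock is free, so concurrent callers run one at a time. The exit
# status of the wrapped command is passed through.
serialized() {
    flock "$LOCKFILE" "$@"
}

# Example (needs root, so it is left commented out):
#   serialized ip netns add testnetns0
```

Every unit or script that may run the first "netns add" after boot
would have to go through the same lock file for this to help.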