Hi,

I can currently crash my net/master kernel by executing the following script:

--- snip

modprobe dummy

#mkdir /var/run/netns
#touch /var/run/netns/init_net
#mount --bind /proc/1/ns/net /var/run/netns/init_net

while true
do
    mkdir /var/run/netns
    touch /var/run/netns/init_net
    mount --bind /proc/1/ns/net /var/run/netns/init_net

    ip netns add foo
    ip netns exec foo ip link add dummy0 type dummy
    ip netns delete foo
done

--- snap

After at most ~1 minute the kernel crashes.
With my hack of saving init_net outside the loop, it runs fine, so the
bind mount inside the loop seems to be necessary to trigger the crash.

The last message I see is:

BUG: stack guard page was hit at 00000000f0751759 (stack is 0000000069363195..0000000073ddc474)
kernel stack overflow (double-fault): 0000 [#1] SMP PTI
Modules linked in:
CPU: 0 PID: 13917 Comm: ip Not tainted 4.16.0-11878-gef9d066f6808 #32
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1 04/01/2014
RIP: 0010:validate_chain.isra.23+0x44/0xc40
RSP: 0018:ffffc900002cbff8 EFLAGS: 00010002
RAX: 0000000000040000 RBX: 0e58b88e1d4d15da RCX: 0e58b88e1d4d15da
RDX: 0000000000000000 RSI: ffff8802b25ee2a0 RDI: ffff8802b25edb00
RBP: 0e58b88e1d4d15da R08: 0000000000000000 R09: 0000000000000004
R10: ffffc900002cc050 R11: ffff8802b1054be8 R12: 0000000000000001
R13: ffff8802b25ee268 R14: ffff8802b25edb00 R15: 0000000000000000
FS:  0000000000000000(0000) GS:ffff8802bfc00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: ffffc900002cbfe8 CR3: 0000000002024000 CR4: 00000000000006f0
Call Trace:
 ? get_max_files+0x10/0x10
 __lock_acquire+0x332/0x710
 lock_acquire+0x67/0xb0
 ? lockref_put_or_lock+0x9/0x30
 ? dput.part.7+0x17/0x2d0
 _raw_spin_lock+0x2b/0x60
 ? lockref_put_or_lock+0x9/0x30
 lockref_put_or_lock+0x9/0x30
 dput.part.7+0x1ec/0x2d0
 drop_mountpoint+0x10/0x40
 pin_kill+0x9b/0x3a0
 ? wait_woken+0x90/0x90
 ? mnt_pin_kill+0x2d/0x100
 mnt_pin_kill+0x2d/0x100
 cleanup_mnt+0x66/0x70
 pin_kill+0x9b/0x3a0
 ? wait_woken+0x90/0x90
 ? mnt_pin_kill+0x2d/0x100
 mnt_pin_kill+0x2d/0x100
 cleanup_mnt+0x66/0x70
 ...
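
For comparison, the non-crashing variant (init_net saved once, outside the loop) might look roughly like the sketch below. It is a reconstruction from the commented-out lines in the script, not the exact script that was run. The `run` helper and the `DRY_RUN` and `ITERATIONS` variables are hypothetical additions, so the sketch only prints the commands by default and stops after a few rounds instead of looping forever. `mkdir -p` is used in place of plain `mkdir` so that a rerun does not fail on an existing directory.

```shell
#!/bin/sh
# Sketch of the workaround variant: bind-mount init_net once, before the loop.
# DRY_RUN and ITERATIONS are hypothetical knobs, not part of the original script.
DRY_RUN="${DRY_RUN:-1}"        # 1 = only print commands, 0 = execute (needs root)
ITERATIONS="${ITERATIONS:-3}"  # the original script loops forever

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

workaround() {
    run modprobe dummy

    # Save init_net once, outside the loop.
    run mkdir -p /var/run/netns
    run touch /var/run/netns/init_net
    run mount --bind /proc/1/ns/net /var/run/netns/init_net

    i=0
    while [ "$i" -lt "$ITERATIONS" ]; do
        run ip netns add foo
        run ip netns exec foo ip link add dummy0 type dummy
        run ip netns delete foo
        i=$((i + 1))
    done
}

workaround
```

Setting `DRY_RUN=0` as root would execute the commands for real. The only difference from the crashing script is that the mkdir/touch/mount sequence is no longer repeated on every iteration.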

I guess it may have something to do with the recent work to migrate
per-net ops to async.

- Alex