On Sat, Jul 18, 2020 at 12:14 AM Taehee Yoo <[email protected]> wrote:
>
> If register_netdevice() is failed, net_device should not be used
> because variables are uninitialized or freed.
> So, the routine should be stopped immediately.
> But, bond_create() doesn't check return value of register_netdevice()
> immediately. That will result in a panic because of using uninitialized
> or freed memory.
>
> Test commands:
> modprobe netdev-notifier-error-inject
> echo -22 > /sys/kernel/debug/notifier-error-inject/netdev/\
> actions/NETDEV_REGISTER/error
> modprobe bonding max_bonds=3
>
> Splat looks like:
> [ 375.028492][ T193] general protection fault, probably for non-canonical
> address 0x6b6b6b6b6b6b6b6b: 0000 [#1] SMP DEBUG_PAGEALLOC PTI
> [ 375.033207][ T193] CPU: 2 PID: 193 Comm: kworker/2:2 Not tainted
> 5.8.0-rc4+ #645
> [ 375.036068][ T193] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),
> BIOS 1.10.2-1ubuntu1 04/01/2014
> [ 375.039673][ T193] Workqueue: events linkwatch_event
> [ 375.041557][ T193] RIP: 0010:dev_activate+0x4a/0x340
> [ 375.043381][ T193] Code: 40 a8 04 0f 85 db 00 00 00 8b 83 08 04 00 00 85
> c0 0f 84 0d 01 00 00 31 d2 89 d0 48 8d 04 40 48 c1 e0 07 48 03 83 00 04 00 00
> <48> 8b 48 10 f6 41 10 01 75 08 f0 80 a1 a0 01 00 00 fd 48 89 48 08
> [ 375.050267][ T193] RSP: 0018:ffff9f8facfcfdd8 EFLAGS: 00010202
> [ 375.052410][ T193] RAX: 6b6b6b6b6b6b6b6b RBX: ffff9f8fae6ea000 RCX:
> 0000000000000006
> [ 375.055178][ T193] RDX: 0000000000000000 RSI: 0000000000000000 RDI:
> ffff9f8fae6ea000
> [ 375.057762][ T193] RBP: ffff9f8fae6ea000 R08: 0000000000000000 R09:
> 0000000000000000
> [ 375.059810][ T193] R10: 0000000000000001 R11: 0000000000000000 R12:
> ffff9f8facfcfe08
> [ 375.061892][ T193] R13: ffffffff883587e0 R14: 0000000000000000 R15:
> ffff9f8fae6ea580
> [ 375.063931][ T193] FS: 0000000000000000(0000) GS:ffff9f8fbae00000(0000)
> knlGS:0000000000000000
> [ 375.066239][ T193] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 375.067841][ T193] CR2: 00007f2f542167a0 CR3: 000000012cee6002 CR4:
> 00000000003606e0
> [ 375.069657][ T193] DR0: 0000000000000000 DR1: 0000000000000000 DR2:
> 0000000000000000
> [ 375.071471][ T193] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7:
> 0000000000000400
> [ 375.073269][ T193] Call Trace:
> [ 375.074005][ T193] linkwatch_do_dev+0x4d/0x50
> [ 375.075052][ T193] __linkwatch_run_queue+0x10b/0x200
> [ 375.076244][ T193] linkwatch_event+0x21/0x30
> [ 375.077274][ T193] process_one_work+0x252/0x600
> [ 375.078379][ T193] ? process_one_work+0x600/0x600
> [ 375.079518][ T193] worker_thread+0x3c/0x380
> [ 375.080534][ T193] ? process_one_work+0x600/0x600
> [ 375.081668][ T193] kthread+0x139/0x150
> [ 375.082567][ T193] ? kthread_park+0x90/0x90
> [ 375.083567][ T193] ret_from_fork+0x22/0x30
>
> Fixes: 9e2e61fbf8ad ("bonding: fix potential deadlock in bond_uninit()")
I doubt this is the first offending commit. At that time, the only
thing after register_netdevice() was rtnl_unlock().

I think it is commit e826eafa65c6f1f7c8db5a237556cebac57ebcc5 which
introduced the bug, as it is the first commit that puts something
between register_netdevice() and rtnl_unlock().

But this patch itself is obviously correct.

Thanks.
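
To illustrate the ordering problem, here is a minimal user-space sketch. It is not the kernel code and not the patch itself: struct fake_dev, fake_register(), fake_carrier_off() and fake_create() are hypothetical stand-ins made up for this example. It only shows the pattern of checking the registration result before any further use of the device.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical user-space stand-in for a net_device; not a kernel type. */
struct fake_dev {
	bool registered;
	bool carrier_on;
};

/*
 * Hypothetical stand-in for register_netdevice(). A negative inject_err
 * makes it fail with that value, loosely mimicking the notifier error
 * injection used in the test commands.
 */
static int fake_register(struct fake_dev *dev, int inject_err)
{
	if (inject_err < 0)
		return inject_err;
	dev->registered = true;
	return 0;
}

/* Post-registration work; only valid on a successfully registered device. */
static void fake_carrier_off(struct fake_dev *dev)
{
	assert(dev->registered);
	dev->carrier_on = false;
}

/* Returns 0 on success or a negative errno value on failure. */
static int fake_create(int inject_err)
{
	struct fake_dev *dev = calloc(1, sizeof(*dev));
	int res;

	if (!dev)
		return -ENOMEM;

	res = fake_register(dev, inject_err);
	if (res < 0) {
		/* Stop here: nothing may touch dev after a failed register. */
		free(dev);
		return res;
	}

	/* Only reached when registration succeeded. */
	fake_carrier_off(dev);

	free(dev);	/* the sketch does not keep the device around */
	return 0;
}
```

Calling fake_create(-EINVAL) returns the error without ever reaching fake_carrier_off(). If the check were moved after that call, the assert() would fire, which is the user-space analogue of the splat quoted above.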
