On 21/04/17 20:42, Linus Torvalds wrote:
> On Fri, Apr 21, 2017 at 10:25 AM, Linus Torvalds
> <torva...@linux-foundation.org> wrote:
>>
>> I'm assuming that the real cause is simply that "dev->reg_state" ends
>> up being NETREG_UNREGISTERING or something. Maybe the BUG_ON() could
>> be just removed, and replaced by the previous warning about
>> NETREG_UNINITIALIZED.
>>
>> Something like the attached (TOTALLY UNTESTED) patch.
>
> .. might as well test it.
>
> That patch doesn't fix the problem, but it does show that yes, it was
> NETREG_UNREGISTERING:
>
>     unregister_netdevice: device pim6reg/ffff962dc4606000 was not registered (2)
>
> but then immediately afterwards we get
>
>     general protection fault: 0000 [#1] SMP
>     Workqueue: netns cleanup_net
>     RIP: 0010:dev_shutdown+0xe/0xc0
>     Call Trace:
>      rollback_registered_many+0x2a5/0x440
>      unregister_netdevice_many+0x1e/0xb0
>      default_device_exit_batch+0x145/0x170
>
> which is due to a
>
>     mov 0x388(%rdi),%eax
>
> where %rdi is 0xdead000000000090. That is at the very beginning of
> dev_shutdown, it's "dev" itself that has that value, so it comes from
> (_another_) invocation of rollback_registered_many(), when it does
> that
>
>     list_for_each_entry(dev, head, unreg_list) {
>
> so it seems to be a case of another "list_del() leaves list in bad
> state", and it was the added test for "dev->reg_state !=
> NETREG_REGISTERED" that did that
>
>     list_del(&dev->unreg_list);
>
> and left random contents in the unreg_list.
>
> So that "handle error case" was almost certainly just buggy too.
>
> And the bug seems to be that we're trying to unregister a netdevice
> that has already been unregistered.
>
> Over to Eric and networking people. This oops is user-triggerable, and
> leaves the machine in a bad state (the original BUG_ON() and the new
> GP fault both happen while holding the RTNL, so networking is not
> healthy afterwards.
>
> Linus
>
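For anyone following along, here is a rough user-space sketch of why a
"dev" of 0xdead000000000090 points at list_del() poisoning. It is not
kernel code. The poison constants are what I believe
include/linux/poison.h used on x86_64 at the time (assuming
CONFIG_ILLEGAL_POINTER_VALUE is 0xdead000000000000), and the 0x70 offset
of unreg_list inside the hypothetical struct fake_netdev is only inferred
from the arithmetic 0x100 - 0x90, not read from the real struct
net_device layout.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed poison values (x86_64, CONFIG_ILLEGAL_POINTER_VALUE set). */
#define POISON_POINTER_DELTA 0xdead000000000000ULL
#define LIST_POISON1 (0x100 + POISON_POINTER_DELTA)
#define LIST_POISON2 (0x200 + POISON_POINTER_DELTA)

struct list_head {
	struct list_head *next, *prev;
};

/* Hypothetical stand-in for struct net_device: the padding places
 * unreg_list at offset 0x70, which is an assumption (see above). */
struct fake_netdev {
	char pad[0x70];
	struct list_head unreg_list;
};

static void list_init(struct list_head *head)
{
	head->next = head;
	head->prev = head;
}

static void list_add(struct list_head *new, struct list_head *head)
{
	new->next = head->next;
	new->prev = head;
	head->next->prev = new;
	head->next = new;
}

/* Unlink the entry, then poison its pointers as the kernel does. */
static void list_del(struct list_head *entry)
{
	entry->prev->next = entry->next;
	entry->next->prev = entry->prev;
	entry->next = (struct list_head *)(uintptr_t)LIST_POISON1;
	entry->prev = (struct list_head *)(uintptr_t)LIST_POISON2;
}

/* Address that a list_for_each_entry() step would compute for the next
 * "dev" when it follows ->next of an entry that was already deleted.
 * Only pointer arithmetic is done here; nothing is dereferenced. */
uintptr_t next_dev_after_list_del(void)
{
	struct list_head head;
	struct fake_netdev dev;

	assert(sizeof(uintptr_t) == 8); /* sketch assumes a 64-bit build */

	list_init(&head);
	list_add(&dev.unreg_list, &head);
	list_del(&dev.unreg_list);

	/* container_of(dev.unreg_list.next, struct fake_netdev, unreg_list) */
	return (uintptr_t)dev.unreg_list.next
		- offsetof(struct fake_netdev, unreg_list);
}
```

Under those assumptions next_dev_after_list_del() returns
0xdead000000000090, the same value seen in %rdi, which is consistent
with the walk stepping through an entry that list_del() had already
removed.
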
Right, I've already posted a patch for ip6mr that should fix the issue.
CCed you and LKML just now.

Thanks,
 Nik