The attached patch fixes the issue.

The oops is triggered from cleanup_net when the namespace is destroyed.
conntrack iterates through outstanding events and calls death_by_timeout
on each of them, which in turn produces a call to
ctnetlink_conntrack_event. This calls nf_netlink_has_listeners, which
oopses because net->nfnl is NULL.

I made the container through (essentially) 'unshare -n'; I didn't
explicitly set up netlink, but I presume it was set up, or else
net->nfnl would have been NULL earlier (i.e. when an earlier connection
timed out). This suggests that net->nfnl is set to NULL during the
destruction of the container, which I think is done by
nfnetlink_net_exit_batch.

I can see that the various subsystems are deinitialised in the reverse
of the order in which the relevant register_pernet_subsys calls were
made, and both nf_conntrack and nfnetlink_net_ops register their
relevant subsystems. If nfnetlink_net_ops registered later than
nf_conntrack, then its exit routine would be called first, which would
cause the oops described. I am not sure there is anything to prevent
this happening in a container environment.
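
To illustrate the ordering argument, here is a minimal userspace sketch
in C. It is a model only, not kernel code: every name in it
(model_net, model_pernet_ops, model_register, model_run and so on) is
hypothetical and merely imitates the register-in-order,
exit-in-reverse-order behaviour described above.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace model only: these types imitate, but are not, the kernel's. */
struct model_net {
    void *nfnl;          /* stands in for net->nfnl */
    int saw_null_nfnl;   /* set if conntrack teardown ran with nfnl == NULL */
};

struct model_pernet_ops {
    const char *name;
    void (*init)(struct model_net *net);
    void (*exit)(struct model_net *net);
};

#define MODEL_MAX_OPS 8
static struct model_pernet_ops *model_ops[MODEL_MAX_OPS];
static int model_nr_ops;

/* Append to the list, as registration order is what matters here. */
static void model_register(struct model_pernet_ops *ops)
{
    assert(model_nr_ops < MODEL_MAX_OPS);
    model_ops[model_nr_ops++] = ops;
}

/* Namespace creation: init routines run in registration order. */
static void model_setup_net(struct model_net *net)
{
    for (int i = 0; i < model_nr_ops; i++)
        model_ops[i]->init(net);
}

/* Namespace destruction: exit routines run in reverse order. */
static void model_cleanup_net(struct model_net *net)
{
    for (int i = model_nr_ops - 1; i >= 0; i--)
        model_ops[i]->exit(net);
}

static int model_socket;  /* dummy object for nfnl to point at */

static void model_nfnl_init(struct model_net *net) { net->nfnl = &model_socket; }
static void model_nfnl_exit(struct model_net *net) { net->nfnl = NULL; }
static void model_ct_init(struct model_net *net)   { (void)net; }

/* Conntrack teardown would emit events here; record whether nfnl is gone. */
static void model_ct_exit(struct model_net *net)
{
    if (net->nfnl == NULL)
        net->saw_null_nfnl = 1;
}

static struct model_pernet_ops model_nfnl_ops = {
    "nfnetlink", model_nfnl_init, model_nfnl_exit
};
static struct model_pernet_ops model_ct_ops = {
    "conntrack", model_ct_init, model_ct_exit
};

/* Returns 1 if conntrack teardown observed a NULL nfnl, else 0. */
static int model_run(int nfnl_registers_first)
{
    struct model_net net = { NULL, 0 };

    model_nr_ops = 0;
    if (nfnl_registers_first) {
        model_register(&model_nfnl_ops);
        model_register(&model_ct_ops);
    } else {
        model_register(&model_ct_ops);
        model_register(&model_nfnl_ops);
    }
    model_setup_net(&net);
    model_cleanup_net(&net);
    return net.saw_null_nfnl;
}
```

In this model, model_run(1) (nfnetlink registered first) tears conntrack
down while nfnl is still valid, whereas model_run(0) (nfnetlink
registered later) clears nfnl before conntrack teardown runs, which is
the situation that would lead to the oops.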

Whilst there is perhaps a more complex problem concerning the ordering
of subsystem deinit, it seems to me that missing a netlink event on a
container that is dying is not a disaster. An early check that net->nfnl
is non-NULL in ctnetlink_conntrack_event appears to fix this. There
may remain a potential race condition if it becomes NULL immediately
after being checked (I am not sure whether any lock is held at this
point, or how synchronisation for subsystem deinitialization works).
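
The shape of the early check can be sketched in userspace C as follows.
This is not the attached patch and not kernel code: sketch_net,
sketch_has_listeners and sketch_conntrack_event are hypothetical
stand-ins for struct net, nf_netlink_has_listeners and
ctnetlink_conntrack_event, and the assert models the NULL dereference
that oopses.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace stand-in for the kernel's struct net; hypothetical. */
struct sketch_net {
    void *nfnl;   /* NULL once the netlink side has been torn down */
};

static int sketch_events_delivered;

/* Models the listener check: it dereferences nfnl, so NULL is fatal. */
static int sketch_has_listeners(const struct sketch_net *net)
{
    assert(net->nfnl != NULL);   /* stands in for the oops */
    return 1;                    /* assume there is always a listener */
}

/* Models the event handler with the early NULL check added. */
static int sketch_conntrack_event(const struct sketch_net *net)
{
    /* Early check: the namespace is dying, so quietly drop the event. */
    if (net->nfnl == NULL)
        return 0;

    if (!sketch_has_listeners(net))
        return 0;

    sketch_events_delivered++;
    return 1;
}
```

As in the description above, the check only narrows the window: nothing
in this sketch (nor, as far as I know, in the real code path) stops nfnl
from becoming NULL between the test and the later dereference.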

This patch should apply to everything from 2.6.26 onwards (if not
earlier); the problem appears to affect all of those kernels. The patch
was made against Ubuntu-3.0.0-11.17. I have torture-tested it with the
above perl script for 15 minutes or so; without this patch, the perl
script hung the machine within 20 seconds.


** Patch added: "Patch to fix oops"
   
https://bugs.launchpad.net/ubuntu/+source/linux-lts-backport-natty/+bug/843892/+attachment/2382583/+files/0001-Check-net-nfnl-for-NULL-in-ctnetlink_conntrack_event.patch

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/843892

Title:
  Repeatable kernel oops on container delete
