Short summary: ipsec uses a struct dst_ops object per net-namespace (e.g. per container), but does not correctly initialize each dst_ops object's percpu counter. This results in incorrect values for each net namespace's dst_ops counter.
Full details: ipsec uses xfrm objects, which contain dst objects; the number of dst objects is tracked by the percpu counter in the xfrm dst_ops struct. ipsec creates a dst_ops object for every net namespace, not just the main net namespace: a dst_ops template is created, and its contents are copied to the dst_ops object for each new net namespace. However, ipsec initializes the percpu counter in the dst_ops object only once, for the template.

A percpu counter object has a main counter variable and a pointer to the percpu counter variables. Each percpu variable only counts up to a small "batch" size (32 or so), at which point its count is moved to the main counter variable. Because the copied dst_ops objects each have their own main counter variable but all point to the same percpu counter variables, the count accumulated in a percpu variable by one net namespace can be moved into the main counter of a different net namespace. The result is that one net namespace (i.e. container) may have its xfrm count decrease to below 0, while another net namespace may have its xfrm count increase forever, eventually causing complete ipsec failure once the xfrm4_gc_thresh limit is exceeded.
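The miscounting described above can be reproduced with a small userspace model. This is only a sketch, not kernel code: the names toy_counter, toy_dst_ops, toy_add, run_shared and run_separate are made up for illustration, NCPUS is arbitrary, and the batch size of 32 is taken from the "32 or so" figure above. It compares copying a template (so both copies share the per-CPU variables) with giving each copy its own per-CPU variables.

```c
#include <assert.h>
#include <string.h>

#define NCPUS 2   /* illustrative CPU count (assumption) */
#define BATCH 32  /* fold threshold, "32 or so" per the description */

/* Toy model of a percpu counter: one main count plus per-CPU deltas. */
struct toy_counter {
    long count;    /* main counter, private to each copy */
    long *percpu;  /* per-CPU deltas; a shallow copy shares this array */
};

/* Toy stand-in for a per-namespace dst_ops object (hypothetical). */
struct toy_dst_ops {
    struct toy_counter entries;
};

/* Add to one CPU's delta; fold it into the main count at the batch size. */
static void toy_add(struct toy_counter *c, int cpu, long amount)
{
    assert(cpu >= 0 && cpu < NCPUS);
    long v = c->percpu[cpu] + amount;
    if (v >= BATCH || v <= -BATCH) {
        c->count += v;
        c->percpu[cpu] = 0;
    } else {
        c->percpu[cpu] = v;
    }
}

/* Same workload for two namespaces, all on CPU 0.
 * Both namespaces end with zero live entries. */
static void toy_workload(struct toy_dst_ops *a, struct toy_dst_ops *b)
{
    for (int i = 0; i < 31; i++)
        toy_add(&a->entries, 0, 1);   /* namespace A allocates 31 */
    toy_add(&b->entries, 0, 1);       /* namespace B allocates 1 */
    toy_add(&b->entries, 0, -1);      /* namespace B frees 1 */
    for (int i = 0; i < 31; i++)
        toy_add(&a->entries, 0, -1);  /* namespace A frees 31 */
}

/* Buggy setup: the counter is set up once in the template, then copied. */
static void run_shared(long *a_count, long *b_count)
{
    long shared[NCPUS] = {0};
    struct toy_dst_ops tmpl = { .entries = { .count = 0, .percpu = shared } };
    struct toy_dst_ops a, b;

    memcpy(&a, &tmpl, sizeof(a));  /* copies the pointer, not the array */
    memcpy(&b, &tmpl, sizeof(b));
    toy_workload(&a, &b);
    *a_count = a.entries.count;
    *b_count = b.entries.count;
}

/* Corrected setup: each copy gets its own per-CPU array. */
static void run_separate(long *a_count, long *b_count)
{
    long pa[NCPUS] = {0}, pb[NCPUS] = {0};
    struct toy_dst_ops tmpl = { .entries = { .count = 0, .percpu = NULL } };
    struct toy_dst_ops a = tmpl, b = tmpl;

    a.entries.percpu = pa;  /* per-namespace initialization */
    b.entries.percpu = pb;
    toy_workload(&a, &b);
    *a_count = a.entries.count;
    *b_count = b.entries.count;
}
```

Calling run_shared(&a, &b) from a main() leaves a at -32 and b at 32 even though neither namespace has any live entries, which matches the "below 0" and "increases forever" symptoms. run_separate(&a, &b) leaves both at 0.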
https://bugs.launchpad.net/bugs/1486670

Title:
  using ipsec, many connections result in no buffer space error

Status in linux package in Ubuntu:
  In Progress
Status in linux source package in Precise:
  In Progress
Status in linux source package in Trusty:
  In Progress
Status in linux source package in Wily:
  In Progress

Bug description:
  Reproduction info:

  set up two LXC containers (although this probably isn't specific to
  LXC containers), and inside each set up ipsec with something similar to:

  conn nodeN
          aggressive=yes
          authby=secret
          auto=start
          closeaction=restart
          dpdaction=restart
          esp=aes256-aes256gmac-modp1024
          ike=aes256-sha512-modp1024
          keyexchange=ikev2
          left=10.0.3.145
          leftid=10.0.3.145
          lifetime=12h
          reauth=no
          right=10.0.3.199
          type=transport

  then repeatedly open connections to the peer, e.g.:

  while true; do ping -c1 10.0.3.199 ; sleep 0.1 ; done

  eventually, the connections will fail with:

  connect: No buffer space available

  the reproduction can be sped up by reducing the xfrm4_gc_thresh, e.g.:

  echo 5 > /proc/sys/net/ipv4/xfrm4_gc_thresh

  Once the error occurs, no more connections can be made to the peer
  (all fail with no buffer space available); however, after a long
  period (e.g. overnight) the buffers will be cleaned up and
  connections can be made again.

  this happens even on the latest net-next kernel.