On Linux 4.8-rc1 through 4.8-rc6 (the latest rc), lxc fails to start Ubuntu 16.04 and CentOS 7 containers [1] unless I first run "cgmanager -m name=systemd &" on the host. Unlike the containers, the host was not running systemd or cgmanager.

Git bisect revealed that this behavior began with a commit entitled "cgroupns: Only allow creation of hierarchies in the initial cgroup namespace" [2], which appears to be an attempt to protect against a possible denial of service attack. Reverting the commit also restores successful container startup without the need to run that cgmanager process. [Eric and Tejun, I have bcc'ed you so you can be aware of this discussion thread, as you apparently wrote and approved the commit, respectively.]

Running that cgmanager invocation is pretty simple, and seems to me to be well worth it to close a denial of service vulnerability, much as I dislike adding something systemd-specific to a non-systemd environment and adding a new dependency (lxc now requires cgmanager on the host to run, I guess, any container that runs systemd).

However, I am posting this message because I don't fully understand the problem and, most importantly, I am wondering if I have stumbled on an unintended consequence of this commit that might indicate other potential breakage.

If this new lxc behavior is completely acceptable, then I apologize for consuming people's time with it, and I hope that this message will help others experiencing the same problem find an answer when they search the web.

Adam Richter

[1] Here is an example of failing to start one of these containers.

$ sudo lxc-start --name ubuntu16.04_amd64 --foreground
Failed to mount cgroup at /sys/fs/cgroup/systemd: Operation not permitted
[!!!!!!] Failed to mount API filesystems, freezing.
Freezing execution.

[2] Here is the commit diff that triggers the new misbehavior.

commit 726a4994b05ff5b6f83d64b5b43c3251217366ce
Author: Eric W. Biederman <[email protected]>
Date:   Fri Jul 15 06:36:44 2016 -0500

    cgroupns: Only allow creation of hierarchies in the initial cgroup namespace

    Unprivileged users can't use hierarchies if they create them as they do not
    have privilieges to the root directory.

    Which means the only thing a hiearchy created by an unprivileged user
    is good for is expanding the number of cgroup links in every css_set,
    which is a DOS attack.

    We could allow hierarchies to be created in namespaces in the initial
    user namespace.  Unfortunately there is only a single namespace for
    the names of heirarchies, so that is likely to create more confusion
    than not.

    So do the simple thing and restrict hiearchy creation to the initial
    cgroup namespace.

    Cc: [email protected]
    Fixes: a79a908fd2b0 ("cgroup: introduce cgroup namespaces")
    Signed-off-by: "Eric W. Biederman" <[email protected]>
    Signed-off-by: Tejun Heo <[email protected]>

diff --git a/kernel/cgroup.c b/kernel/cgroup.c
index e75efa8..e0be49f 100644
--- a/kernel/cgroup.c
+++ b/kernel/cgroup.c
@@ -2215,12 +2215,8 @@ static struct dentry *cgroup_mount(struct file_system_type *fs_type,
 		goto out_unlock;
 	}
 
-	/*
-	 * We know this subsystem has not yet been bound.  Users in a non-init
-	 * user namespace may only mount hierarchies with no bound subsystems,
-	 * i.e. 'none,name=user1'
-	 */
-	if (!opts.none && !capable(CAP_SYS_ADMIN)) {
+	/* Hierarchies may only be created in the initial cgroup namespace. */
+	if (ns != &init_cgroup_ns) {
 		ret = -EPERM;
 		goto out_unlock;
 	}
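
For anyone who wants to check whether a host is in the situation described at the top of this message, here is a minimal sketch, not a definitive test. It assumes that a named cgroup v1 hierarchy such as name=systemd, once created on the host (for example by the cgmanager invocation above), shows up in the controller field of /proc/self/cgroup. The function name has_named_hierarchy and its optional second argument (a file to read instead of /proc/self/cgroup) are hypothetical and exist only for illustration.

```shell
#!/bin/sh
# Report whether a named cgroup v1 hierarchy (e.g. name=systemd) is
# visible to this process.  Each line of /proc/self/cgroup has the form
#   hierarchy-ID:controller-list:cgroup-path
# and a named hierarchy appears as "name=<name>" in the second field.
has_named_hierarchy() {
    name=$1
    # Hypothetical override so the function can be pointed at a sample file.
    file=${2:-/proc/self/cgroup}
    awk -F: -v want="name=$name" '
        {
            # The second field may list several comma-separated entries.
            n = split($2, parts, ",")
            for (i = 1; i <= n; i++)
                if (parts[i] == want)
                    found = 1
        }
        END { exit found ? 0 : 1 }
    ' "$file"
}

if has_named_hierarchy systemd; then
    echo "name=systemd hierarchy present"
else
    echo "name=systemd hierarchy missing; workaround: cgmanager -m name=systemd &"
fi
```

If the hierarchy is reported missing on a 4.8-rc host, that would be consistent with the container's systemd trying to create it from a non-initial cgroup namespace and getting the "Operation not permitted" error shown in [1], although I have not confirmed that this is the only way to hit the failure.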
