Hi :)

this mail discusses my recent attempts at creating a cgroupfs, along
with the related problems and issues I have encountered so far.

Problem statement
=================

Linux has a feature called cgroups. It collects processes (threads)
into groups; in addition, so-called controllers can be used to restrict
the use of various resources (like CPU time or memory) on a per-group
basis.

A notable feature of cgroups is that a process cannot escape its group,
and any children of a process are born into the same group as the
parent.

In order to create a cgroupfs on Hurd, one has to make the same
guarantee. Currently the parental relationship of processes is a
Hurd-only concept, established by the parent process calling
proc_child. This is not robust enough, as a process can create a new
task using task_create and never claim ownership of it.

cgroup documentation:
https://www.kernel.org/doc/Documentation/cgroups/cgroups.txt

Related problems
----------------

The proc server has to be privileged (it needs the host-priv port) so
that it can query the kernel for new tasks (see add_tasks in
proc/mgt.c). It does that mainly when a process triggers a lookup like
proc_task2proc for a newly created task.

Ironically, at that point the task port for the new process is already
known. Still, add_tasks requests *all* task ports from the kernel, only
to mach_port_deallocate all the ones it already knows and to add the
new tasks (in my tests it was always just one, the newly created task
whose task port was already known).

I believe that this is done in the lookup mainly so that it happens
periodically and any task will eventually get noticed (but see below).
I do not know how expensive this is in practice, but it seems very
wasteful and unnecessary to me.

So there are three related issues:

1. Non-root users cannot start sub-hurds:
   https://savannah.gnu.org/bugs/?17341

2. Because add_tasks discovers all task ports, any sub-hurd gets a
   handle to every task running on the system, so root users inside a
   sub-hurd can interfere with the operation of the parent hurd, which
   is undesirable from an isolation point of view.
   This is also related to issue 1 from a security point of view.

3. add_tasks is unnecessary and potentially wasteful.

/hurd/proc is a dark corner indeed
----------------------------------

The routine description of proc_child looks harmless enough:

    /* Declare that a task is a child of the caller.  The task's state
       will then inherit from the caller.  This call can be made only
       once per task. */
    routine proc_child (
            process: process_t;
            child: task_t);

But dragons are lurking here. sysdeps/mach/hurd/fork.c explains this
best:

    /* Register the child with the proc server.  It is important that
       this be that last thing we do before starting the child thread
       running.  Once proc_child has been done for the task, it appears
       as a POSIX.1 process.  Any errors we get must be detected before
       this point, and the child must have a message port so it
       responds to POSIX.1 signals.  */
    if (err = __USEPORT (PROC, __proc_child (port, newtask)))
      LOSE;

So proc_child not only declares that the newly created process is one's
child, it also indicates that the process is all set up and ready to
receive POSIX signals.

Sure enough, all hell broke loose when I tried to use my shiny new
notification system (see below) to supply the parental relations
instead of waiting for someone to call proc_child.

Proposed solution
=================

I propose a notification-based system to fix all of the above issues:

1. The topmost /hurd/proc server registers for notifications with the
   kernel. The notifications are sent when a new task is created and
   carry the task ports of both the parent and the newly created task.
   Only one process can register for these notifications, and it has
   to present the host_priv port to authenticate itself. This keeps
   the kernel implementation quite tiny and unobtrusive.

2. Anyone can register for process change notifications with the proc
   server. These notifications carry the PID and PPID of newly created
   processes (and of ones that died) and allow cgroupfs to implement
   the cgroup semantics.
   This information could also be obtained by polling the proc server
   and diffing the results, so it should not be necessary to restrict
   the use of this interface.

3. A proc server running in a sub-hurd can register with the topmost
   proc server for new task notifications, just like the topmost proc
   server registers with the kernel. The proc servers are just a
   little bit sub-hurd aware, and because of the robust parental
   relationship of the tasks (*not* processes) provided by the kernel,
   the topmost proc server can track which sub-hurd a task belongs to
   and notify the appropriate proc server.

Implementation
==============

I've created a proof-of-concept implementation for points 1 and 2.
I'll send it as follow-ups to this mail. It contains:

* A general-purpose notification library, libhurdnotify.
* A port of /hurd/init to libhurdnotify.
* The new process notifications.
* New task notifications in gnumach. The proc server registers for
  those and they arrive, though nothing useful is done with them at
  the moment.

The cgroupfs repository can be found here:
http://darnassus.sceen.net/gitweb/teythoon/cgroupfs.git/

The state of cgroupfs is described in my last blog post:
https://teythoon.cryptobitch.de/posts/cgroupfs-is-as-cgroupy-as-it-gets/

I appreciate your input,
Justus