On Mon, 2012-10-22 at 16:11 +0200, Lennart Poettering wrote:
> On Sun, 21.10.12 17:25, Michael H. Warfield ([email protected]) wrote:
>
> > Hello,
> >
> > This is being directed to the systemd-devel community but I'm cc'ing the
> > lxc-users community and the Fedora community on this for their input as
> > well.  I know it's not always good to cross post between multiple lists
> > but this is of interest to all three communities who may have valuable
> > input.
> >
> > I'm new to this particular list, just having joined after tracking a
> > problem down to some systemd internals...
> >
> > Several people over the last year or two on the lxc-users list have been
> > discussing trying to run certain distros (notably Fedora 16 and above,
> > recent Arch Linux and possibly others) in LXC containers, virtualizing
> > entire servers this way.  This is very similar to Virtuozzo / OpenVZ only
> > it's using the native Linux cgroups for the containers (primary reason I
> > dumped OpenVZ was to avoid their custom patched kernels).  These recent
> > distros have switched to systemd for the main init process and this has
> > proven to be disastrous for those of us using LXC and trying to install
> > or update our containers.
> Note that it is explicitly our intention to make running systemd inside
> of containers as smooth as possible. The notes Kay linked summarize what
> the container manager needs to do for best integration.
>
> > To summarize the problem...  The LXC startup binary sets up various
> > things for /dev and /dev/pts for the container to run properly and this
> > works perfectly fine for SystemV start-up scripts and/or Upstart.
> > Unfortunately, systemd has mounts of devtmpfs on /dev and devpts
> > on /dev/pts which then break things horribly.  This is because the
> > kernel currently lacks namespaces for devices and won't for some time to
> > come (in design).  When devtmpfs gets mounted over top of /dev in the
> > container, it then hijacks the host's console tty and several other
> > devices which had been set up through bind mounts by LXC and should have
> > been LEFT ALONE.
>
> Please initialize a minimal tmpfs on /dev. systemd will then work fine.

My containers have a reasonable /dev that works with Upstart just fine
but they are not on tmpfs.  Is mounting tmpfs on /dev and recreating
that minimal /dev required?

> > Yes!  I recognize that this problem with devtmpfs and lack of namespaces
> > is a potential security problem anyways that could (and does) cause
> > serious container-to-host problems.  We're just not going to get that
> > fixed right away in the linux cgroups and namespaces.
>
> No, devtmpfs really doesn't need updating, containers simply shouldn't
> use it.

Ok, yeah.  That seems to be at the heart of the problem we're trying to
solve.

> > How do we work around this problem in systemd where it has hard coded
> > mounts in the binary that we can't override or configure?  Or is it
> > there and I'm just missing it trying to examine the sources?  That's how
> > I found where the problem lay.
>
> systemd will make use of pre-existing mounts if they exist, and only
> mount something new if they don't exist.
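A minimal sketch of the "pre-existing mount" test described in the quoted paragraph above, for checking from inside a container what is already mounted on /dev and /dev/pts. It parses the text of /proc/self/mountinfo. The function names `mount_points` and `is_premounted` and the `SAMPLE` data are made up for illustration. This is not systemd's own code, and the thread does not show how systemd actually performs the check.

```python
# Hypothetical helpers (not from systemd or LXC): report which paths are
# already mount points, based on the text of /proc/self/mountinfo.

def mount_points(mountinfo_text):
    """Return {mount_point: fstype} parsed from mountinfo-formatted text."""
    result = {}
    for line in mountinfo_text.splitlines():
        fields = line.split()
        # A valid line has at least 6 fixed fields, a "-" separator,
        # and the filesystem type right after the separator.
        if len(fields) < 7 or "-" not in fields:
            continue
        sep = fields.index("-")
        if sep + 1 >= len(fields):
            continue
        # Field 4 is the mount point; later entries for the same path
        # shadow earlier ones, so the last one wins.
        result[fields[4]] = fields[sep + 1]
    return result


def is_premounted(path, mountinfo_text):
    """True if something is already mounted exactly at `path`."""
    return path in mount_points(mountinfo_text)


# Made-up sample: a container whose manager mounted tmpfs on /dev.
SAMPLE = """\
15 20 0:3 / /proc rw,relatime - proc proc rw
20 1 8:1 / / rw,relatime shared:1 - ext4 /dev/sda1 rw
25 20 0:21 / /dev rw,nosuid - tmpfs tmpfs rw,mode=755
26 25 0:22 / /dev/pts rw,relatime - devpts devpts rw,gid=5,mode=620
"""

if __name__ == "__main__":
    with open("/proc/self/mountinfo") as f:
        live = mount_points(f.read())
    for path in ("/dev", "/dev/pts"):
        print(path, "->", live.get(path, "not a mount point"))
```

Run inside the container before init starts, this shows whether /dev is a mount point at all and with which filesystem type. A /dev that is a plain directory on the container's root filesystem would show up as "not a mount point".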
So you're saying that, if we have something mounted on /dev, that's what
prevents systemd from mounting devtmpfs on /dev?  That could be
problematical.  Tested out a couple of options there that didn't work.
That's going to take some effort.

> Note that there are reports that LXC has issues with the fact that newer
> systemd enables shared mount propagation for all mounts by default (this
> should actually be beneficial for containers as this ensures that new
> mounts appear in the containers). LXC when run on such a system fails as
> soon as it tries to use pivot_root(), as that is incompatible with
> shared mount propagation. This needs fixing in LXC: it should use MS_MOVE
> or MS_BIND to place the new root dir in / instead. A short term
> work-around is to simply remount the root tree to private before
> invoking LXC.

But, I have systemd running on my host system (F17) and containers with
sysvinit or upstart inits are all starting just fine.  That sounds like
it should impact all containers as pivot_root() is issued before systemd
in the container is started.  Or am I missing something here?  That
sounds like a problem for Serge and others to investigate further.  I'll
see about trying that workaround though.

> Lennart
>
> --
> Lennart Poettering - Red Hat, Inc.

Regards,
Mike
-- 
Michael H. Warfield (AI4NB) | (770) 985-6132 | [email protected]
/\/\|=mhw=|\/\/ | (678) 463-0932 | http://www.wittsend.com/mhw/
NIC whois: MHW9 | An optimist believes we live in the best of all
PGP Key: 0x674627FF | possible worlds. A pessimist is sure of it!
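On the short-term work-around quoted above (remounting the root tree private before invoking LXC), here is a rough sketch, assuming a Linux host and root privileges. `is_shared` reads the optional `shared:N` tag from mountinfo text, and `make_rprivate` calls mount(2) with `MS_REC | MS_PRIVATE`, which is the same operation as `mount --make-rprivate /`. Both function names and the `SAMPLE` text are hypothetical. Note that this changes propagation for the whole mount namespace it runs in, not just for LXC.

```python
import ctypes
import ctypes.util
import os

# Flag values from <linux/mount.h> / <sys/mount.h>.
MS_REC = 0x4000        # apply recursively to the whole subtree
MS_PRIVATE = 1 << 18   # stop mount events propagating in or out


def is_shared(path, mountinfo_text):
    """True if the topmost mount at `path` carries a shared:N tag."""
    shared = False
    for line in mountinfo_text.splitlines():
        fields = line.split()
        if len(fields) < 7 or "-" not in fields or fields[4] != path:
            continue
        # Optional fields sit between the mount options and the "-".
        optional = fields[6:fields.index("-")]
        shared = any(tag.startswith("shared:") for tag in optional)
    return shared


def make_rprivate(path="/"):
    """Same effect as `mount --make-rprivate PATH`; needs CAP_SYS_ADMIN."""
    libc = ctypes.CDLL(ctypes.util.find_library("c") or None, use_errno=True)
    libc.mount.argtypes = [ctypes.c_char_p, ctypes.c_char_p,
                           ctypes.c_char_p, ctypes.c_ulong, ctypes.c_void_p]
    libc.mount.restype = ctypes.c_int
    if libc.mount(None, path.encode(), None, MS_REC | MS_PRIVATE, None) != 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err), path)


# Made-up sample: root is shared, /dev is not.
SAMPLE = """\
20 1 8:1 / / rw,relatime shared:1 - ext4 /dev/sda1 rw
25 20 0:21 / /dev rw,nosuid - tmpfs tmpfs rw,mode=755
"""

if __name__ == "__main__":
    with open("/proc/self/mountinfo") as f:
        root_shared = is_shared("/", f.read())
    print("root mount is shared:", root_shared)
    # To apply the work-around, call make_rprivate("/") as root here,
    # before starting the container.
```

The script only reports the current state when run directly. The remount itself is left as an explicit call so that nothing on the host changes by accident.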
_______________________________________________
systemd-devel mailing list
[email protected]
http://lists.freedesktop.org/mailman/listinfo/systemd-devel
