On Wed, Jul 20, 2016 at 03:29:30PM +0200, Lennart Poettering wrote:
> On Wed, 20.07.16 12:53, Daniel P. Berrange ([email protected]) wrote:
>
> > For virtualized hosts it is quite common to want to confine all host OS
> > processes to a subset of CPUs/RAM nodes, leaving the rest available for
> > exclusive use by QEMU/KVM. Historically people have used the "isolcpus"
> > kernel arg to do this, but last year its semantics changed, so that any
> > CPUs listed there also get excluded from load balancing by the
> > scheduler, making it quite useless in general non-real-time use cases
> > where you still want QEMU threads load-balanced across CPUs.
> >
> > So the only option is to use the cpuset cgroup controller to confine
> > processes. AFAIK, systemd does not have explicit support for the cpuset
> > controller at this time, so I'm trying to work out the "optimal" way to
> > achieve this behind systemd's back while minimising the risk that
> > future systemd releases will break things.
>
> Yes, we don't support this as of now, but we'd like to. The thing,
> though, is that the kernel interface for it is pretty borked right now,
> and until that's fixed we are unlikely to support it in systemd. (And
> as I understood Tejun, the mem vs. cpu split in cpuset is probably not
> going to stay the way it is either.)
>
> But note that the non-cgroup CPUAffinity= setting should be good
> enough for many use cases. Are you sure that isn't sufficient for you?
>
> Also note that systemd supports setting a system-wide CPUAffinity= for
> itself during early boot, thus leaving all unlisted CPUs free for
> specific services where you use CPUAffinity= to change this default.

Ah, interesting, I didn't notice you could set that globally.
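For the archives: if I'm reading systemd-system.conf(5) and
systemd.exec(5) correctly, the combination would look something like
the following (the service name and CPU numbers here are just examples):

  # /etc/systemd/system.conf - applies to PID 1 itself and is
  # inherited by every service that doesn't override it
  [Manager]
  CPUAffinity=0 1

  # Hypothetical drop-in for one service that should span the
  # remaining CPUs, e.g.
  # /etc/systemd/system/qemu-guest.service.d/affinity.conf
  [Service]
  CPUAffinity=2 3 4 5 6 7

That pins threads to CPUs, but it doesn't enforce which NUMA nodes
memory is allocated from the way cpuset.mems does, which is the gap
the cpuset controller fills.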
> > The key factor here is use of "Before" to ensure this gets run
> > immediately after systemd switches root out of the initrd, and before
> > /any/ long-lived services are run. This lets us set cpuset placement on
> > systemd (pid 1) itself and have that inherited by everything it spawns.
> > I felt this is better than trying to move processes after they have
> > already started, because it ensures that any memory allocations get
> > taken from the right NUMA node immediately.
> >
> > Empirically this approach seems to work on Fedora 23 (systemd 222) and
> > RHEL 7 (systemd 219), but I'm wondering if there are any pitfalls that
> > I've not anticipated here.
>
> Yes, PID 1 was moved to the special scope unit init.scope as mentioned
> above (in preparation for cgroupsv2, where inner cgroups can never
> contain PIDs). This is likely going to break then.

cgroupsv2 is likely to break many things once distros switch over, so I
assume the switch wouldn't be made in a minor update, only in a major
new distro release, so it's not so concerning.

> But again, I have the suspicion that CPUAffinity= might already
> suffice for you?

Yep, it looks like it should suffice for most people, unless they also
wish to have memory node restrictions enforced from boot.
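Since my original unit wasn't quoted in full above, the approach
amounts to roughly the following sketch (not the exact files: the
unit/script names, the "host-os" cgroup name, the sysinit.target
ordering and the CPU/node numbers are all placeholders, and as Lennart
notes it will fight with init.scope on newer systemd):

  # /etc/systemd/system/host-cpuset.service
  [Unit]
  Description=Confine PID 1 and all descendants to a host cpuset
  DefaultDependencies=no
  Before=sysinit.target

  [Service]
  Type=oneshot
  ExecStart=/usr/local/sbin/host-cpuset.sh

  [Install]
  WantedBy=sysinit.target

with host-cpuset.sh along these lines:

  #!/bin/sh
  # Assumes the v1 cpuset hierarchy is mounted at /sys/fs/cgroup/cpuset
  # (true on Fedora 23 / RHEL 7); keep CPUs 0-1 and NUMA node 0 for the
  # host OS, leaving everything else for QEMU/KVM.
  cg=/sys/fs/cgroup/cpuset/host-os
  mkdir -p "$cg"
  echo 0-1 > "$cg/cpuset.cpus"   # CPUs the host OS may run on
  echo 0   > "$cg/cpuset.mems"   # NUMA nodes it may allocate from
  echo 1   > "$cg/cgroup.procs"  # move systemd itself; everything it
                                 # spawns from here on inherits this

The ordering is the whole point: in v1 both cpuset.cpus and cpuset.mems
must be populated before PID 1 can be attached, and any service started
after this unit lands in the cpuset automatically.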
Regards,
Daniel

-- 
|: http://berrange.com      -o-    http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org              -o-             http://virt-manager.org :|
|: http://autobuild.org       -o-         http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org       -o-       http://live.gnome.org/gtk-vnc :|