On Fri, Jan 26, 2024 at 01:45:39PM +0300, Michael Tokarev wrote:
> 26.01.2024 12:06, Daniel P. Berrangé wrote:
> > On Fri, Jan 26, 2024 at 08:44:13AM +0100, Laurent Vivier wrote:
> > > Le 25/01/2024 à 23:29, Michael Tokarev a écrit :
> >
> > > I think the way using sysconf(_SC_OPEN_MAX) is more portable, simpler and
> > > cleaner than the one using /proc/self/fd.
> >
> > A fallback that uses _SC_OPEN_MAX is good for portability, but it is
> > should not be considered a replacement for iterating over /proc/self/fd,
> > rather an additional fallback for non-Linux, or when /proc is not mounted.
> > It is not uncommon for _SC_OPEN_MAX to be *exceedingly* high
> >
> >   $ podman run -it quay.io/centos/centos:stream9
> >   [root@4a440d62935c /]# ulimit -n
> >   524288
> >
> > Iterating over 1/2 a million FDs is a serious performance penalty that
> > we don't want to have, so _SC_OPEN_MAX should always be the last resort.
>
> From yesterday conversation in IRC which started this:
>
> <mmlb> open files (-n) 1073741816
>
> (it is a docker container)
> They weren't able to start qemu.. :)
>
> Sanity of such setting is questionable, but ok.
>
> Not only linux implement close_range(2) syscall, it is also
> available on some *BSDs.
>
> And the most important point is, - we should aim at using O_CLOEXEC
> everywhere, without this need to close each FD at exec time. I think
> qemu is the only software with such paranoid closing when just running
> an interface setup script..

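To make the ordering concrete, here is a rough sketch of the tiered
approach discussed above: try close_range(2) first, then walk
/proc/self/fd, and only fall back to an _SC_OPEN_MAX loop when neither
is available. The helper name close_fds_from() and the 1024 default are
made up for illustration and are not QEMU's actual code. The raw
syscall() form is Linux-specific; the BSDs that provide close_range
would need their own call.

```c
#define _GNU_SOURCE
#include <dirent.h>
#include <stdlib.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical helper (not QEMU's real function): close every fd >= lowfd. */
static void close_fds_from(int lowfd)
{
#ifdef SYS_close_range
    /* 1. Preferred: a single syscall closes the whole range. This fails
     *    (e.g. ENOSYS on older kernels) and we fall through. */
    if (syscall(SYS_close_range, (unsigned)lowfd, ~0U, 0U) == 0) {
        return;
    }
#endif

    /* 2. Linux with /proc mounted: visit only fds that are actually open. */
    DIR *dir = opendir("/proc/self/fd");
    if (dir) {
        int dfd = dirfd(dir);   /* the directory's own fd; closedir() handles it */
        struct dirent *de;

        while ((de = readdir(dir)) != NULL) {
            char *end;
            long fd = strtol(de->d_name, &end, 10);

            if (end == de->d_name || *end != '\0') {
                continue;       /* skips "." and ".." */
            }
            if (fd >= lowfd && fd != dfd) {
                close((int)fd);
            }
        }
        closedir(dir);
        return;
    }

    /* 3. Last resort: loop up to the limit. This is the slow path when
     *    the limit is huge, as in the container examples above. */
    long max = sysconf(_SC_OPEN_MAX);
    if (max < 0) {
        max = 1024;             /* arbitrary assumption if the limit is indeterminate */
    }
    for (long fd = lowfd; fd < max; fd++) {
        close((int)fd);
    }
}
```

A caller would run something like close_fds_from(3) in the child just
before exec, leaving stdin/stdout/stderr alone.
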
We should try to use O_CLOEXEC everywhere, but at the same time QEMU
links to a large number of libraries, and we can't assume that they've
reliably used O_CLOEXEC. Code not owned by QEMU that is mapped into the
process likely dwarfs QEMU's own code by a factor of 10.

With regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|
