On Wed, Nov 26, 2025 at 06:14:43PM +0000, Jon Kohler wrote:
> 
> 
> > On Nov 6, 2025, at 10:53 AM, Daniel P. Berrangé <[email protected]> wrote:
> > 
> > On Thu, Nov 06, 2025 at 09:31:43AM -0700, Jon Kohler wrote:
> >> Increase MAX_MEM_PREALLOC_THREAD_COUNT from 16 to 32. This was last
> >> touched in 2017 [1] and, since then, physical machine sizes and VMs
> >> therein have continued to get even bigger, both on average and on the
> >> extremes.
> >> 
> >> For very large VMs, using 16 threads to preallocate memory can be a
> >> non-trivial bottleneck during VM start-up and migration. Increasing
> >> this limit to 32 threads reduces the time taken for these operations.
> >> 
> >> Test results from quad socket Intel 8490H (4x 60 cores) show a fairly
> >> linear gain of 50% with the 2x thread count increase.
> >> 
> >> ---------------------------------------------
> >> Idle Guest w/ 2M HugePages   | Start-up time
> >> ---------------------------------------------
> >> 240 vCPU, 7.5TB (16 threads) | 2m41.955s
> >> ---------------------------------------------
> >> 240 vCPU, 7.5TB (32 threads) | 1m19.404s
> >> ---------------------------------------------
> >> 
> >> Note: Going above 32 threads appears to have diminishing returns at
> >> the point where the memory bandwidth and context switching costs
> >> appear to be a limiting factor to linear scaling. For posterity, on
> >> the same system as above:
> >> - 32 threads: 1m19s
> >> - 48 threads: 1m4s
> >> - 64 threads: 59s
> >> - 240 threads: 50s
> >> 
> >> Additional thread counts also get less interesting as the amount of
> >> memory to be preallocated is smaller. Putting that all together,
> >> 32 threads appears to be a sane number with a solid speedup on fairly
> >> modern hardware. To go faster, we'd either need to improve the hardware
> >> (CPU/memory) itself or improve clear_pages_*() on the kernel side to
> >> be more efficient.
> >> 
> >> [1] 1e356fc14bea ("mem-prealloc: reduce large guest start-up and migration
> >> time.")
> >> 
> >> Signed-off-by: Jon Kohler <[email protected]>
> >> ---
> >> util/oslib-posix.c | 2 +-
> >> 1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > Reviewed-by: Daniel P. Berrangé <[email protected]>
> 
> Thanks, Daniel !
> 
> Is there anything else we need on this one? Want to
> make sure it doesn’t get lost.

Paolo (CCd) is the primary maintainer for this code and should queue it.

> >> diff --git a/util/oslib-posix.c b/util/oslib-posix.c
> >> index 3c14b72665..dc001da66d 100644
> >> --- a/util/oslib-posix.c
> >> +++ b/util/oslib-posix.c
> >> @@ -61,7 +61,7 @@
> >>  #include "qemu/memalign.h"
> >>  #include "qemu/mmap-alloc.h"
> >> 
> >> -#define MAX_MEM_PREALLOC_THREAD_COUNT 16
> >> +#define MAX_MEM_PREALLOC_THREAD_COUNT 32
> >> 
> >>  struct MemsetThread;

With regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|
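
For anyone wanting to sanity-check the scaling figures quoted in the commit message, the rough C sketch below may help. It is illustrative only and is not the code in util/oslib-posix.c: clamp_prealloc_threads(), speedup() and efficiency() are hypothetical helper names, and the clamp merely assumes the cap is applied as a simple upper bound alongside the host CPU count. The timings you would feed in are the ones from the patch description (161.955s at 16 threads, 79.404s at 32, roughly 64s, 59s and 50s at 48, 64 and 240).

```c
#include <stdio.h>

/* Value proposed by the patch; it was 16 before. */
#define MAX_MEM_PREALLOC_THREAD_COUNT 32

/*
 * Hypothetical helper (not QEMU's function): bound a requested
 * preallocation thread count by the host CPU count and by the
 * compile-time cap, never returning fewer than one thread.
 */
static int clamp_prealloc_threads(int requested, int host_cpus)
{
    int n = requested;

    if (n > host_cpus) {
        n = host_cpus;
    }
    if (n > MAX_MEM_PREALLOC_THREAD_COUNT) {
        n = MAX_MEM_PREALLOC_THREAD_COUNT;
    }
    if (n < 1) {
        n = 1;
    }
    return n;
}

/* Speedup of a run taking t_s seconds over a baseline of base_s seconds. */
static double speedup(double base_s, double t_s)
{
    return base_s / t_s;
}

/*
 * Scaling efficiency relative to the baseline thread count:
 * 1.0 means perfectly linear scaling, lower means diminishing returns.
 */
static double efficiency(double base_s, int base_threads,
                         double t_s, int threads)
{
    return speedup(base_s, t_s) * base_threads / threads;
}

/* Print one row of a small scaling report for a measured run. */
static void print_scaling_row(double base_s, int base_threads,
                              double t_s, int threads)
{
    printf("%3d threads: %7.3fs  speedup %.2fx  efficiency %.2f\n",
           threads, t_s, speedup(base_s, t_s),
           efficiency(base_s, base_threads, t_s, threads));
}
```

Calling print_scaling_row(161.955, 16, 79.404, 32) shows the 16 to 32 step is close to linear (about a 2x speedup, i.e. the quoted ~50% reduction in start-up time), whereas the 240-thread run comes out at a little over 3x for 15x the threads, which matches the diminishing returns described in the note.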
