On Thu, Jan 23, 2025 at 04:19:40PM +0300, Daniil Tatianin wrote:
> Currently, passing mem-lock=on to QEMU causes memory usage to grow by
> huge amounts:
>
> no memlock:
>     $ ./qemu-system-x86_64 -overcommit mem-lock=off
>     $ ps -p $(pidof ./qemu-system-x86_64) -o rss=
>     45652
>
>     $ ./qemu-system-x86_64 -overcommit mem-lock=off -enable-kvm
>     $ ps -p $(pidof ./qemu-system-x86_64) -o rss=
>     39756
>
> memlock:
>     $ ./qemu-system-x86_64 -overcommit mem-lock=on
>     $ ps -p $(pidof ./qemu-system-x86_64) -o rss=
>     1309876
>
>     $ ./qemu-system-x86_64 -overcommit mem-lock=on -enable-kvm
>     $ ps -p $(pidof ./qemu-system-x86_64) -o rss=
>     259956
>
> This is caused by the fact that mlockall(2) automatically
> write-faults every existing and future anonymous mappings in the
> process right away.
>
> One of the reasons to enable mem-lock is to protect a QEMU process'
> pages from being compacted and migrated by kcompactd (which does so
> by messing with a live process page tables causing thousands of TLB
> flush IPIs per second) basically stealing all guest time while it's
> active.
>
> mem-lock=on helps against this (given compact_unevictable_allowed is 0),
> but the memory overhead it introduces is an undesirable side effect,
> which we can completely avoid by passing MCL_ONFAULT to mlockall, which
> is what this series allows to do with a new option for mem-lock called
> on-fault.
>
> memlock-onfault:
>     $ ./qemu-system-x86_64 -overcommit mem-lock=on-fault
>     $ ps -p $(pidof ./qemu-system-x86_64) -o rss=
>     54004
>
>     $ ./qemu-system-x86_64 -overcommit mem-lock=on-fault -enable-kvm
>     $ ps -p $(pidof ./qemu-system-x86_64) -o rss=
>     47772
>
> You may notice the memory usage is still slightly higher, in this case
> by a few megabytes over the mem-lock=off case. I was able to trace this
> down to a bug in the linux kernel with MCL_ONFAULT not being honored for
> the early process heap (with brk(2) etc.) so it is still write-faulted in
> this case, but it's still way less than it was with just the mem-lock=on.
>
> Changes since v1:
>   - Don't make a separate mem-lock-onfault, add an on-fault option to
>     mem-lock instead
>
> Changes since v2:
>   - Move overcommit option parsing out of line
>   - Make enable_mlock an enum instead
>
> Changes since v3:
>   - Rebase to latest master due to the recent sysemu -> system renames
>
> Daniil Tatianin (4):
>   os: add an ability to lock memory on_fault
>   system/vl: extract overcommit option parsing into a helper
>   system: introduce a new MlockState enum
>   overcommit: introduce mem-lock=on-fault
>
>  hw/virtio/virtio-mem.c    |  2 +-
>  include/system/os-posix.h |  2 +-
>  include/system/os-win32.h |  3 ++-
>  include/system/system.h   | 12 ++++++++-
>  migration/postcopy-ram.c  |  4 +--
>  os-posix.c                | 10 ++++++--
>  qemu-options.hx           | 14 +++++++----
>  system/globals.c          | 12 ++++++++-
>  system/vl.c               | 52 +++++++++++++++++++++++++++++++--------
>  9 files changed, 87 insertions(+), 24 deletions(-)
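
For anyone skimming the thread, here is a minimal standalone sketch of the
difference being described: plain mlockall(2) versus mlockall(2) with
MCL_ONFAULT. It is not the code from the series. The names
sketch_mlock_flags() and sketch_lock_memory() are made up for illustration,
and it assumes a Linux host whose headers define MCL_ONFAULT (kernel 4.4 or
newer).

```c
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>

/*
 * Hypothetical helper (not QEMU code): choose the flags for mlockall(2).
 * Without MCL_ONFAULT, every current and future mapping is faulted in
 * and locked immediately. With MCL_ONFAULT, pages are locked only once
 * they have been faulted in by normal use.
 * Returns -1 if on-fault locking was requested but the headers on this
 * host do not provide MCL_ONFAULT.
 */
int sketch_mlock_flags(bool on_fault)
{
    int flags = MCL_CURRENT | MCL_FUTURE;

    if (on_fault) {
#ifdef MCL_ONFAULT
        flags |= MCL_ONFAULT;
#else
        return -1;
#endif
    }
    return flags;
}

/*
 * Hypothetical helper (not QEMU code): lock the whole address space.
 * mlockall() can fail with ENOMEM or EPERM when RLIMIT_MEMLOCK is too
 * low and the process lacks CAP_IPC_LOCK, so the error is reported.
 */
int sketch_lock_memory(bool on_fault)
{
    int flags = sketch_mlock_flags(on_fault);

    if (flags < 0) {
        fprintf(stderr, "MCL_ONFAULT is not available on this host\n");
        return -1;
    }
    if (mlockall(flags) != 0) {
        fprintf(stderr, "mlockall failed: %s\n", strerror(errno));
        return -1;
    }
    return 0;
}
```

Calling sketch_lock_memory(false) early in a process corresponds roughly to
the mem-lock=on behaviour quoted above, and sketch_lock_memory(true) to the
proposed mem-lock=on-fault. Comparing RSS with ps after each call should show
the same kind of gap, though the exact numbers will depend on the process.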

Considering it's a very memory-relevant change and looks pretty benign, I can
pick this up if nobody disagrees (or beats me to it, which I'd appreciate).
I'll also leave at least one week for people to stop me.

Thanks,

--
Peter Xu