On Tue, May 04, 2021 at 12:21:25PM +0200, David Hildenbrand wrote:
> On 04.05.21 12:09, Daniel P. Berrangé wrote:
> > On Wed, Apr 28, 2021 at 03:37:48PM +0200, David Hildenbrand wrote:
> > > Let's support RAM_NORESERVE via MAP_NORESERVE on Linux. The flag has no
> > > effect on most shared mappings - except for hugetlbfs and anonymous
> > > memory.
> > >
> > > Linux man page:
> > >   "MAP_NORESERVE: Do not reserve swap space for this mapping. When swap
> > >   space is reserved, one has the guarantee that it is possible to modify
> > >   the mapping. When swap space is not reserved one might get SIGSEGV
> > >   upon a write if no physical memory is available. See also the
> > >   discussion of the file /proc/sys/vm/overcommit_memory in proc(5). In
> > >   kernels before 2.6, this flag had effect only for private writable
> > >   mappings."
> > >
> > > Note that the "guarantee" part is wrong with memory overcommit in Linux.
> > >
> > > Also, in Linux hugetlbfs is treated differently - we configure reservation
> > > of huge pages from the pool, not reservation of swap space (huge pages
> > > cannot be swapped).
> > >
> > > The rough behavior is [1]:
> > > a) !Hugetlbfs:
> > >
> > >   1) Without MAP_NORESERVE *or* with memory overcommit under Linux
> > >      disabled ("/proc/sys/vm/overcommit_memory == 2"), the following
> > >      accounting/reservation happens:
> > >        For a file backed map
> > >          SHARED or READ-only - 0 cost (the file is the map not swap)
> > >          PRIVATE WRITABLE    - size of mapping per instance
> > >
> > >        For an anonymous or /dev/zero map
> > >          SHARED            - size of mapping
> > >          PRIVATE READ-only - 0 cost (but of little use)
> > >          PRIVATE WRITABLE  - size of mapping per instance
> > >
> > >   2) With MAP_NORESERVE, no accounting/reservation happens.
> > >
> > > b) Hugetlbfs:
> > >
> > >   1) Without MAP_NORESERVE, huge pages are reserved.
> > >
> > >   2) With MAP_NORESERVE, no huge pages are reserved.
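
The rough behavior quoted above can be restated as a small decision
function. This is only an illustrative sketch of the rules as written in
the patch description, not kernel or QEMU code; the function name
reserved_bytes and all of its parameters are hypothetical, and the
hugetlbfs case returns a byte count as a stand-in for "huge pages reserved
from the pool".

```python
GiB = 1 << 30


def reserved_bytes(size, *, hugetlbfs=False, anonymous=False, shared=False,
                   writable=True, noreserve=False, overcommit_memory=0):
    """Toy model of how much is accounted/reserved for one mapping.

    Hypothetical helper: it mirrors the a)/b) rules quoted above and
    nothing more. `overcommit_memory` stands for the value of
    /proc/sys/vm/overcommit_memory.
    """
    if hugetlbfs:
        # b) Huge pages are reserved from the pool unless MAP_NORESERVE is set.
        return 0 if noreserve else size

    # a) 2) MAP_NORESERVE skips accounting, unless overcommit is disabled (2).
    if noreserve and overcommit_memory != 2:
        return 0

    # a) 1) Accounting happens; the cost depends on the mapping type.
    if anonymous:
        # Anonymous or /dev/zero: shared and private-writable cost `size`,
        # private read-only costs nothing.
        return size if (shared or writable) else 0

    # File-backed: only private writable mappings are charged.
    return size if (not shared and writable) else 0
```

For example, reserved_bytes(GiB, anonymous=True, noreserve=True) yields 0,
while the same call with overcommit_memory=2 yields the full mapping size,
which matches case a) 1) above.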
> > >
> > > Note: With "/proc/sys/vm/overcommit_memory == 0", we were already able
> > > to configure it for !hugetlbfs globally; this toggle now allows
> > > configuring it more fine-grained, not for the whole system.
> > >
> > > The target use case is virtio-mem, which dynamically exposes memory
> > > inside a large, sparse memory area to the VM.
> >
> > Can you explain this use case in more real world terms, as I'm not
> > understanding what a mgmt app would actually do with this in
> > practice ?
>
> Let's consider huge pages for simplicity. Assume you have 128 free huge
> pages in your hypervisor that you want to dynamically assign to VMs.
>
> Further assume you have two VMs running. A workflow could look like
>
> 1. Assign all huge pages to VM 0
> 2. Reassign 64 huge pages to VM 1
> 3. Reassign another 32 huge pages to VM 1
> 4. Reassign 16 huge pages to VM 0
> 5. ...
>
> Basically what we're used to doing with "ordinary" memory.
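
To check my reading of that workflow, here is a toy bookkeeping model of
the four steps. It is only a sketch under stated assumptions: HugePagePool,
assign, reassign and run_workflow are hypothetical names, nothing here
talks to QEMU, virtio-mem or the kernel, and it only tracks page counts.

```python
class HugePagePool:
    """Hypothetical model of a host pool of free huge pages."""

    def __init__(self, total_pages):
        self.free = total_pages
        self.assigned = {}  # VM name -> number of huge pages it holds

    def assign(self, vm, pages):
        """Hand `pages` free huge pages from the pool to `vm`."""
        if pages > self.free:
            raise ValueError("not enough free huge pages in the pool")
        self.free -= pages
        self.assigned[vm] = self.assigned.get(vm, 0) + pages

    def reassign(self, src, dst, pages):
        """Move `pages` huge pages from `src` to `dst`."""
        if pages > self.assigned.get(src, 0):
            raise ValueError(f"{src} does not hold {pages} huge pages")
        self.assigned[src] -= pages
        self.assigned[dst] = self.assigned.get(dst, 0) + pages


def run_workflow():
    """Replay steps 1-4 of the quoted workflow on a 128-page pool."""
    pool = HugePagePool(128)
    pool.assign("vm0", 128)          # 1. all huge pages go to VM 0
    pool.reassign("vm0", "vm1", 64)  # 2. 64 pages move to VM 1
    pool.reassign("vm0", "vm1", 32)  # 3. another 32 pages move to VM 1
    pool.reassign("vm1", "vm0", 16)  # 4. 16 pages move back to VM 0
    return pool
```

In this model neither VM can hold a fixed reservation up front, since
the same 128 pages are shared between both over time; that is the part I
take to be the motivation for reserve=off.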

What does this look like in terms of the memory backend configuration
when you boot VM 0 and VM 1 ?

Are you saying that we boot both VMs with

   -object hostmem-memfd,size=128G,hugetlb=yes,hugetlbsize=1G,reserve=off

and then we have another property set on 'virtio-mem' to tell it
how much/little of that 128G to actually give to the guest ?
How do we change that at runtime ?

> For that to work with virtio-mem, you'll have to disable reservation of huge
> pages for the virtio-mem managed memory region.
>
> (preallocation of huge pages in virtio-mem to protect from user mistakes is a
> separate work item)
>
> reserve=off will be the default for virtio-mem, and actual
> reservation/preallocation will be done within virtio-mem. There could be use
> for "reserve=off" for virtio-balloon use cases as well, but I'd like to
> exclude that from the discussion for now.

The hostmem backend defaults are independent of frontend usage, so when you
say reserve=off is the default for virtio-mem, are you expecting the
mgmt app like libvirt to specify that ?

Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|