On 07/04/2026 22:50, Vishal Annapurve wrote:
> On Tue, Apr 7, 2026 at 2:09 PM Michael Roth <[email protected]> wrote:
>>
>>> TLDR:
>>>
>>> + Think of populate ioctls not as KVM touching memory, but platform
>>>   handling population.
>>> + KVM code (kvm_gmem_populate) still doesn't touch memory contents.
>>> + post_populate is platform-specific code that handles loading into
>>>   private destination memory just to support legacy non-in-place
>>>   conversion.
>>> + Don't complicate populate ioctls by doing conversion just to support
>>>   legacy use-cases where platform-specific code has to do copying on
>>>   the host.
>>
>> That's a good point: these are only considerations in the context of
>> actually copying from src->dst, but with in-place conversion the
>> primary/more-performant approach will be for userspace to initialize
>> the contents directly. I.e. if we enforced that, then gmem could
>> rightly ascertain that it isn't even writing to private pages via
>> these hooks and any manipulation of that memory is purely on the part
>> of the trusted entity handling initial encryption/etc.
>>
>> I understand that we decided to keep the option of allowing separate
>> src/dst even with in-place conversion, but it doesn't seem worthwhile
>> if that necessarily means we need to glue population+conversion
>> together in 1 clumsy interface that needs to handle partial
>> return/error responses to userspace (or potentially get stuck forever
>> in the conversion path).
>
> I think ARM needs userspace to specify separate source and destination
> memory ranges for initial population, as ARM doesn't support in-place
> memory encryption. [1]
Indeed - CCA requires KVM to first "delegate" the page (effectively the
shared->private conversion), which destroys its contents. Then we can
populate the data (but that obviously has to come from elsewhere).

The closest CCA can get to an in-place conversion is for the kernel to
copy the data to a temporary buffer and have the firmware copy it back
after the delegation. An early version of the CCA Linux patches did
this (long before guest_memfd). However, this is slower than it needs
to be (two copies), and the temporary buffer is difficult to size: too
small and you round-trip to the firmware more than you need to; too
large and you waste memory. And, with increasing support for huge pages
in guest_memfd and the CCA firmware (aka RMM), it's also challenging to
preserve huge pages while doing this dance, so I want to avoid it if
possible.

> [1] https://lore.kernel.org/kvm/[email protected]/
>
>>
>> So I agree with Ackerley's proposal (which I guess is the same as
>> what's in this series).
>>
>> However, 1 other alternative would be to do what was suggested on the
>> call, but require userspace to subsequently handle the
>> shared->private conversion. I think that would be workable too.
>
> IIUC, converting memory ranges to private after they are essentially
> treated as private by the KVM CC backend will expose the
> implementation to the same risk of userspace being able to access
> private memory and compromise host safety, which guest_memfd was
> invented to address.

At least in the Arm CCA case the "exposure" of the private memory is
only in terms of allowing population - and only before the guest has
run. The host isn't able to access the memory in any direct way after
the memory has been delegated. But the RMM provides this populate
method to copy data into memory (in a measured/controlled manner).

From a CCA perspective the logical flow is to mark the memory as
private and then call the platform-specific function to populate the
memory.
But obviously we can fit in a KVM API which is different.

Note that CCA has a specific property called 'RIPAS' (Realm IPA State).
This is the guest's view of whether memory exists at a particular
(intermediate) physical address. My current series takes the view that
all guest_memfd memory is private RAM, and the guest will have to
specifically request that it be converted to shared. I'm hoping this
series might provide a way for the VMM to configure this (before the
guest starts executing).

Thanks,

Steve

>>
>> One other benefit to Ackerley's/current approach however is that it
>> allows us to potentially keep hugepages intact in the populate path,
>> since prep'ing/encrypting everything while it's in a shared state
>> means gmem will split the hugepage and all the firmware/RMP/etc. data
>> structures will only be able to handle individual 4K pages. I still
>> suspect doing things like encoding the initial 2MB OVMF image as a
>> single hugepage might yield enough benefit to explore this (at some
>> point). So there's some niceness in knowing that Ackerley's approach
>> would allow for that eventually and not require a complete rethink on
>> these same topics.
>>
>> Thanks,
>>
>> Mike
>>
>>>
>>>>>>
>>>>>> [...snip...]
>>>>>>
