Re: [PATCH RFC v4 10/44] KVM: guest_memfd: Add support for KVM_SET_MEMORY_ATTRIBUTES2

Sean Christopherson Wed, 08 Apr 2026 17:35:00 -0700

On Wed, Apr 08, 2026, Ackerley Tng wrote:
> Sean Christopherson <[email protected]> writes:
> > On Tue, Apr 07, 2026, Michael Roth wrote:
> >> On Tue, Apr 07, 2026 at 02:50:58PM -0700, Vishal Annapurve wrote:
> >> > > So I agree with Ackerley's proposal (which I guess is the same as 
> >> > > what's
> >> > > in this series).
> >> > >
> >> > > However, 1 other alternative would be to do what was suggested on the
> >> > > call, but require userspace to subsequently handle the shared->private
> >> > > conversion. I think that would be workable too.
> >> >
> >> > IIUC, Converting memory ranges to private after it essentially is
> >> > treated as private by the KVM CC backend will expose the
> >> > implementation to the same risk of userspace being able to access
> >> > private memory and compromise host safety which guest_memfd was
> >> > invented to address.
> >>
> >> Doh, fair point. Doing conversion as part of the populate call would allow
> >> us to use the filemap write-lock to avoid userspace being able to fault
> >> in private (as tracked by trusted entity) pages before they are
> >> transitioned to private (as tracked by KVM), so it's safer than having
> >> userspace drive it.
> >>
> >> But obviously I still think Ackerley's original proposal has more
> >> upsides than the alternatives mentioned so far.
> >
> > I'm a bit lost.  What exactly is/was Ackerley's original proposal?  If the 
> > answer
> > is "convert pages from shared=>private when populating via in-place 
> > conversion",
> > then I agree, because AFAICT, that's the only sane option.
> 
> Discussed this at PUCK today 2026-04-08.
> 
> The update is that the KVM_SET_MEMORY_ATTRIBUTES2 guest_memfd ioctl will
> now support the PRESERVE flag for TDX and SNP only if the setup for the
> VM in question hasn't yet been completed (KVM_TDX_FINALIZE_VM or
> KVM_SEV_SNP_LAUNCH_FINISH hasn't completed yet).
> 
> The populate flow will be
> 
> 1a. Get contents to be loaded in guest_memfd (src_addr: NULL) as shared
> OR
> 1b. Provide contents from some other userspace address (src_addr:
>     userspace address)
> 
> 2.  KVM_SET_MEMORY_ATTRIBUTES2(attribute: PRIVATE and flags: PRESERVE)
> 3.  KVM_SEV_SNP_LAUNCH_UPDATE() or KVM_TDX_INIT_MEM_REGION()
> ...
> 4.  KVM_SEV_SNP_LAUNCH_FINISH() or KVM_TDX_FINALIZE_VM()
> 
> This applies whether src_addr is some userspace address that is shared
> or NULL, so the non-in-place loading flow is not considered legacy. ARM
> CCA can still use that flow :)
> 
> Other than supporting PRESERVE only if the setup for the VM in question
> hasn't yet been completed, KVM's fault path will also not permit faults
> if the setup hasn't been completed. (Some exception setup will be used
> for TDX to be able to perform the required fault.)


Nit: as Mike (or Rick?) called out in PUCK, TDX's flow is now a separate path
thanks to commit 3ab3283dbb2c ("KVM: x86/mmu: Add dedicated API to map 
guest_memfd
pfn into TDP MMU").  I.e. it is NOT a fault in any way, shape, or form.

tdx_mem_page_add() already asserts pre_fault_allowed=false:

        if (KVM_BUG_ON(kvm->arch.pre_fault_allowed, kvm) ||
            KVM_BUG_ON(!kvm_tdx->page_add_src, kvm))
                return -EIO;

so I think we just to add similar checks in SEV and the MMU.  This can even be
done today as a hardening measure, as the rules aren't changing, we're just
doubling down on disallowing (pre-)faulting during pre-boot.

E.g.

diff --git a/arch/x86/kvm/mmu/mmu_internal.h b/arch/x86/kvm/mmu/mmu_internal.h
index 73cdcbccc89e..99f070cf2480 100644
--- a/arch/x86/kvm/mmu/mmu_internal.h
+++ b/arch/x86/kvm/mmu/mmu_internal.h
@@ -363,6 +363,9 @@ static inline int kvm_mmu_do_page_fault(struct kvm_vcpu 
*vcpu, gpa_t cr2_or_gpa,
        };
        int r;
 
+       if (KVM_BUG_ON(!vcpu->kvm->arch.pre_fault_allowed, vcpu->kvm))
+               return -EIO;
+
        if (vcpu->arch.mmu->root_role.direct) {
                /*
                 * Things like memslots don't understand the concept of a shared
diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 2010b157e288..f0bbbda6e9c4 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -2419,6 +2419,9 @@ static int snp_launch_update(struct kvm *kvm, struct 
kvm_sev_cmd *argp)
        if (!sev_snp_guest(kvm) || !sev->snp_context)
                return -EINVAL;
 
+       if (KVM_BUG_ON(kvm->arch.pre_fault_allowed, kvm)
+               return -EIO;
+
        if (copy_from_user(&params, u64_to_user_ptr(argp->data), 
sizeof(params)))
                return -EFAULT;

Re: [PATCH RFC v4 10/44] KVM: guest_memfd: Add support for KVM_SET_MEMORY_ATTRIBUTES2

Reply via email to