On Wed, Apr 08, 2026, Ackerley Tng wrote:
> Sean Christopherson <[email protected]> writes:
> > On Tue, Apr 07, 2026, Michael Roth wrote:
> >> On Tue, Apr 07, 2026 at 02:50:58PM -0700, Vishal Annapurve wrote:
> >> > > So I agree with Ackerley's proposal (which I guess is the same as
> >> > > what's
> >> > > in this series).
> >> > >
> >> > > However, 1 other alternative would be to do what was suggested on the
> >> > > call, but require userspace to subsequently handle the shared->private
> >> > > conversion. I think that would be workable too.
> >> >
> >> > IIUC, Converting memory ranges to private after it essentially is
> >> > treated as private by the KVM CC backend will expose the
> >> > implementation to the same risk of userspace being able to access
> >> > private memory and compromise host safety which guest_memfd was
> >> > invented to address.
> >>
> >> Doh, fair point. Doing conversion as part of the populate call would allow
> >> us to use the filemap write-lock to avoid userspace being able to fault
> >> in private (as tracked by trusted entity) pages before they are
> >> transitioned to private (as tracked by KVM), so it's safer than having
> >> userspace drive it.
> >>
> >> But obviously I still think Ackerley's original proposal has more
> >> upsides than the alternatives mentioned so far.
> >
> > I'm a bit lost. What exactly is/was Ackerley's original proposal? If the
> > answer
> > is "convert pages from shared=>private when populating via in-place
> > conversion",
> > then I agree, because AFAICT, that's the only sane option.
>
> Discussed this at PUCK today 2026-04-08.
>
> The update is that the KVM_SET_MEMORY_ATTRIBUTES2 guest_memfd ioctl will
> now support the PRESERVE flag for TDX and SNP only if the setup for the
> VM in question hasn't yet been completed (KVM_TDX_FINALIZE_VM or
> KVM_SEV_SNP_LAUNCH_FINISH hasn't completed yet).
>
> The populate flow will be
>
> 1a. Get contents to be loaded in guest_memfd (src_addr: NULL) as shared
> OR
> 1b. Provide contents from some other userspace address (src_addr:
> userspace address)
>
> 2. KVM_SET_MEMORY_ATTRIBUTES2(attribute: PRIVATE and flags: PRESERVE)
> 3. KVM_SEV_SNP_LAUNCH_UPDATE() or KVM_TDX_INIT_MEM_REGION()
> ...
> 4. KVM_SEV_SNP_LAUNCH_FINISH() or KVM_TDX_FINALIZE_VM()
>
> This applies whether src_addr is some userspace address that is shared
> or NULL, so the non-in-place loading flow is not considered legacy. ARM
> CCA can still use that flow :)
>
> Other than supporting PRESERVE only if the setup for the VM in question
> hasn't yet been completed, KVM's fault path will also not permit faults
> if the setup hasn't been completed. (Some exception setup will be used
> for TDX to be able to perform the required fault.)
Nit: as Mike (or Rick?) called out in PUCK, TDX's flow is now a separate path
thanks to commit 3ab3283dbb2c ("KVM: x86/mmu: Add dedicated API to map
guest_memfd
pfn into TDP MMU"). I.e. it is NOT a fault in any way, shape, or form.
tdx_mem_page_add() already asserts pre_fault_allowed=false:
if (KVM_BUG_ON(kvm->arch.pre_fault_allowed, kvm) ||
KVM_BUG_ON(!kvm_tdx->page_add_src, kvm))
return -EIO;
so I think we just to add similar checks in SEV and the MMU. This can even be
done today as a hardening measure, as the rules aren't changing, we're just
doubling down on disallowing (pre-)faulting during pre-boot.
E.g.
diff --git a/arch/x86/kvm/mmu/mmu_internal.h b/arch/x86/kvm/mmu/mmu_internal.h
index 73cdcbccc89e..99f070cf2480 100644
--- a/arch/x86/kvm/mmu/mmu_internal.h
+++ b/arch/x86/kvm/mmu/mmu_internal.h
@@ -363,6 +363,9 @@ static inline int kvm_mmu_do_page_fault(struct kvm_vcpu
*vcpu, gpa_t cr2_or_gpa,
};
int r;
+ if (KVM_BUG_ON(!vcpu->kvm->arch.pre_fault_allowed, vcpu->kvm))
+ return -EIO;
+
if (vcpu->arch.mmu->root_role.direct) {
/*
* Things like memslots don't understand the concept of a shared
diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 2010b157e288..f0bbbda6e9c4 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -2419,6 +2419,9 @@ static int snp_launch_update(struct kvm *kvm, struct
kvm_sev_cmd *argp)
if (!sev_snp_guest(kvm) || !sev->snp_context)
return -EINVAL;
+ if (KVM_BUG_ON(kvm->arch.pre_fault_allowed, kvm)
+ return -EIO;
+
if (copy_from_user(¶ms, u64_to_user_ptr(argp->data),
sizeof(params)))
return -EFAULT;