On Fri, Feb 6, 2026 at 11:12 AM Sean Christopherson <[email protected]> wrote:
>
> On Fri, Feb 06, 2026, Jim Mattson wrote:
> > On Fri, Feb 6, 2026 at 10:23 AM Yosry Ahmed <[email protected]> wrote:
> > >
> > > February 6, 2026 at 10:19 AM, "Sean Christopherson" <[email protected]> wrote:
> > >
> > > > On Thu, Feb 05, 2026, Jim Mattson wrote:
> > > >
> > > > >
> > > > > Cache g_pat from vmcb12 in svm->nested.gpat to avoid TOCTTOU issues,
> > > > > and add a validity check so that when nested paging is enabled for
> > > > > vmcb12, an invalid g_pat causes an immediate VMEXIT with exit code
> > > > > VMEXIT_INVALID, as specified in the APM, volume 2: "Nested Paging and
> > > > > VMRUN/VMEXIT."
> > > > >
> > > > > Fixes: 3d6368ef580a ("KVM: SVM: Add VMRUN handler")
> > > > > Signed-off-by: Jim Mattson <[email protected]>
> > > > > ---
> > > > >  arch/x86/kvm/svm/nested.c | 4 +++-
> > > > >  arch/x86/kvm/svm/svm.h    | 3 +++
> > > > >  2 files changed, 6 insertions(+), 1 deletion(-)
> > > > >
> > > > > diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
> > > > > index f72dbd10dcad..1d4ff6408b34 100644
> > > > > --- a/arch/x86/kvm/svm/nested.c
> > > > > +++ b/arch/x86/kvm/svm/nested.c
> > > > > @@ -1027,9 +1027,11 @@ int nested_svm_vmrun(struct kvm_vcpu *vcpu)
> > > > >
> > > > >          nested_copy_vmcb_control_to_cache(svm, &vmcb12->control);
> > > > >          nested_copy_vmcb_save_to_cache(svm, &vmcb12->save);
> > > > > +        svm->nested.gpat = vmcb12->save.g_pat;
> > > > >
> > > > >          if (!nested_vmcb_check_save(vcpu) ||
> > > > > -            !nested_vmcb_check_controls(vcpu)) {
> > > > > +            !nested_vmcb_check_controls(vcpu) ||
> > > > > +            (nested_npt_enabled(svm) && !kvm_pat_valid(svm->nested.gpat))) {
> > > > >                  vmcb12->control.exit_code = SVM_EXIT_ERR;
> > > > >                  vmcb12->control.exit_info_1 = 0;
> > > > >                  vmcb12->control.exit_info_2 = 0;
> > > > > diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
> > > > > index 986d90f2d4ca..42a4bf83b3aa 100644
> > > > > --- a/arch/x86/kvm/svm/svm.h
> > > > > +++ b/arch/x86/kvm/svm/svm.h
> > > > > @@ -208,6 +208,9 @@ struct svm_nested_state {
> > > > >           */
> > > > >          struct vmcb_save_area_cached save;
> > > > >
> > > > > +        /* Cached guest PAT from vmcb12.save.g_pat */
> > > > > +        u64 gpat;
> > > > > +
> > > > Shouldn't this go in vmcb_save_area_cached?
> > >
> > > I believe Jim changed it after this discussion on v2:
> > > https://lore.kernel.org/kvm/[email protected]/.
>
> LOL, oh the irony:
>
>     I'm going to cache it on its own to avoid confusion.
>
> > Right. The two issues with putting it in vmcb_save_area_cached were:
> >
> > 1. Checking all of vmcb_save_area_cached requires access to the
> >    corresponding control area (or at least the boolean, "NPT enabled").
>
> Checking the control area seems like the right answer (I went down that path
> before reading this).
>
> > 2. In the nested state serialization payload, everything else in the
> > vmcb_save_area_cached comes from L1 (host state to be restored at
> > emulated #VMEXIT.)
>
> Hmm, right, but *because* it's ignored, that gives us carte blanche to
> clobber it. More below.
>
> > The first issue was a little messy, but not that distasteful.
>
> I actually find it the opposite of distasteful. KVM definitely _should_ be
> checking the controls, not the vCPU state. If it weren't for needing to get
> at MAXPHYADDR in CPUID, I'd push to drop @vcpu entirely.
>
> > The second issue was really a mess.
>
> I'd rather have the mess contained and documented though. Caching g_pat
> outside of vmcb_save_area_cached bleeds the mess into all of the relevant
> nSVM code, and doesn't leave any breadcrumbs in the code/comments to explain
> that it "needs" to be kept separate.
>
> AFAICT, the only "problem" is that g_pat in the serialization payload will be
> garbage when restoring state from an older KVM. But that's totally fine,
> precisely because L1's PAT isn't restored from vmcb01 on nested #VMEXIT; it's
> always resident in vcpu->arch.pat. So can't we just do this to avoid a
> spurious -EINVAL?
>
>         /*
>          * Validate host state saved from before VMRUN (see
>          * nested_svm_check_permissions).
>          */
>         __nested_copy_vmcb_save_to_cache(&save_cached, save);
>
>         /*
>          * Stuff gPAT in L1's save state, as older KVM may not have saved L1's
>          * gPAT. L1's PAT, i.e. hPAT for the vCPU, is *always* tracked in
>          * vcpu->arch.pat, i.e. gPAT is a reflection of vcpu->arch.pat, not the
>          * other way around.
>          */
>         save_cached.g_pat = vcpu->arch.pat;
Your comment is a bit optimistic. QEMU, for instance, hasn't restored the
MSRs yet when it sets nested state, so vcpu->arch.pat will actually be the
current vCPU's PAT (or, in the case of snapshot restore, some future PAT).
But, in any case, it should be a valid PAT.
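
Roughly, the ordering during state restore looks like this (a sketch, not
actual QEMU code; vcpu_fd/nested/msrs are placeholders, the ioctls are real):

        /* Nested state is pushed before the MSRs... */
        ioctl(vcpu_fd, KVM_SET_NESTED_STATE, &nested);  /* sees the stale vcpu->arch.pat */
        ...
        /* ...so L1's PAT only lands in vcpu->arch.pat here. */
        ioctl(vcpu_fd, KVM_SET_MSRS, &msrs);            /* restores MSR_IA32_CR_PAT */
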
>         if (!(save->cr0 & X86_CR0_PG) ||
>             !(save->cr0 & X86_CR0_PE) ||
>             (save->rflags & X86_EFLAGS_VM) ||
>             !nested_vmcb_check_save(vcpu, &ctl_cached, &save_cached))
Wrong ctl_cached. Those are the vmcb02 controls, but we are checking the
vmcb01 save state.

I think it would be better to add a boolean argument, "check_gpat," which
would be false at this call site and nested_npt_enabled(svm) at the other
call site.
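
Something like this (untested sketch, just the shape of it):

        /* Make the gPAT check an explicit caller decision. */
        static bool nested_vmcb_check_save(struct kvm_vcpu *vcpu,
                                           struct vmcb_save_area_cached *save,
                                           bool check_gpat)
        {
                ...
                if (check_gpat && CC(!kvm_pat_valid(save->g_pat)))
                        return false;

                return true;
        }

        /* nested_svm_vmrun(): vmcb12 save state, gPAT matters iff NPT is enabled. */
        if (!nested_vmcb_check_save(vcpu, &svm->nested.save,
                                    nested_npt_enabled(svm)) || ...

        /* svm_set_nested_state(): vmcb01 save state, no gPAT to check. */
        if (!nested_vmcb_check_save(vcpu, &save_cached, false))
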
>                 goto out_free;
>
> Oh, and if we do plumb in @ctrl to __nested_vmcb_check_save(), I vote to
> opportunistically drop the useless single-use wrappers (probably in a
> standalone patch to plumb in @ctrl). E.g. (completely untested)
>
> ---
>  arch/x86/kvm/svm/nested.c | 71 ++++++++++++++++++---------------------
>  arch/x86/kvm/svm/svm.c    |  2 +-
>  arch/x86/kvm/svm/svm.h    |  6 ++--
>  3 files changed, 35 insertions(+), 44 deletions(-)
>
> diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
> index a7d6fc1382a7..a429947c8966 100644
> --- a/arch/x86/kvm/svm/nested.c
> +++ b/arch/x86/kvm/svm/nested.c
> @@ -339,8 +339,8 @@ static bool nested_svm_check_bitmap_pa(struct kvm_vcpu *vcpu, u64 pa, u32 size)
>                 kvm_vcpu_is_legal_gpa(vcpu, addr + size - 1);
>  }
>
> -static bool __nested_vmcb_check_controls(struct kvm_vcpu *vcpu,
> -                                         struct vmcb_ctrl_area_cached *control)
> +static bool nested_vmcb_check_controls(struct kvm_vcpu *vcpu,
> +                                       struct vmcb_ctrl_area_cached *control)
>  {
>          if (CC(!vmcb12_is_intercept(control, INTERCEPT_VMRUN)))
>                  return false;
> @@ -367,8 +367,9 @@ static bool __nested_vmcb_check_controls(struct kvm_vcpu *vcpu,
>
>  /* Common checks that apply to both L1 and L2 state. */
> -static bool __nested_vmcb_check_save(struct kvm_vcpu *vcpu,
> -                                     struct vmcb_save_area_cached *save)
> +static bool nested_vmcb_check_save(struct kvm_vcpu *vcpu,
> +                                   struct vmcb_ctrl_area_cached *ctrl,
> +                                   struct vmcb_save_area_cached *save)
>  {
>          if (CC(!(save->efer & EFER_SVME)))
>                  return false;
> @@ -399,25 +400,13 @@ static bool __nested_vmcb_check_save(struct kvm_vcpu *vcpu,
>          if (CC(!kvm_valid_efer(vcpu, save->efer)))
>                  return false;
>
> +        if ((ctrl->nested_ctl & SVM_NESTED_CTL_NP_ENABLE) &&
> +            CC(!kvm_pat_valid(save->g_pat)))
> +                return false;
> +
>          return true;
>  }
>
> -static bool nested_vmcb_check_save(struct kvm_vcpu *vcpu)
> -{
> -        struct vcpu_svm *svm = to_svm(vcpu);
> -        struct vmcb_save_area_cached *save = &svm->nested.save;
> -
> -        return __nested_vmcb_check_save(vcpu, save);
> -}
> -
> -static bool nested_vmcb_check_controls(struct kvm_vcpu *vcpu)
> -{
> -        struct vcpu_svm *svm = to_svm(vcpu);
> -        struct vmcb_ctrl_area_cached *ctl = &svm->nested.ctl;
> -
> -        return __nested_vmcb_check_controls(vcpu, ctl);
> -}
> -
>  /*
>   * If a feature is not advertised to L1, clear the corresponding vmcb12
>   * intercept.
> @@ -504,6 +493,8 @@ static void __nested_copy_vmcb_save_to_cache(struct vmcb_save_area_cached *to,
>
>          to->dr6 = from->dr6;
>          to->dr7 = from->dr7;
> +
> +        to->g_pat = from->g_pat;
>  }
>
>  void nested_copy_vmcb_save_to_cache(struct vcpu_svm *svm,
> @@ -644,17 +636,14 @@ static void nested_vmcb02_prepare_save(struct vcpu_svm *svm, struct vmcb *vmcb12
>                  svm->nested.force_msr_bitmap_recalc = true;
>          }
>
> -        if (npt_enabled) {
> -                if (nested_npt_enabled(svm)) {
> -                        if (unlikely(new_vmcb12 ||
> -                                     vmcb_is_dirty(vmcb12, VMCB_NPT))) {
> -                                vmcb02->save.g_pat = svm->nested.gpat;
> -                                vmcb_mark_dirty(vmcb02, VMCB_NPT);
> -                        }
> -                } else {
> -                        vmcb02->save.g_pat = vcpu->arch.pat;
> +        if (nested_npt_enabled(svm)) {
> +                if (unlikely(new_vmcb12 || vmcb_is_dirty(vmcb12, VMCB_NPT))) {
> +                        vmcb02->save.g_pat = svm->nested.save.g_pat;
>                          vmcb_mark_dirty(vmcb02, VMCB_NPT);
>                  }
> +        } else if (npt_enabled) {
> +                vmcb02->save.g_pat = vcpu->arch.pat;
> +                vmcb_mark_dirty(vmcb02, VMCB_NPT);
>          }
>
>          if (unlikely(new_vmcb12 || vmcb_is_dirty(vmcb12, VMCB_SEG))) {
> @@ -1028,11 +1017,9 @@ int nested_svm_vmrun(struct kvm_vcpu *vcpu)
>
>          nested_copy_vmcb_control_to_cache(svm, &vmcb12->control);
>          nested_copy_vmcb_save_to_cache(svm, &vmcb12->save);
> -        svm->nested.gpat = vmcb12->save.g_pat;
>
> -        if (!nested_vmcb_check_save(vcpu) ||
> -            !nested_vmcb_check_controls(vcpu) ||
> -            (nested_npt_enabled(svm) && !kvm_pat_valid(svm->nested.gpat))) {
> +        if (!nested_vmcb_check_save(vcpu, &svm->nested.ctl, &svm->nested.save) ||
> +            !nested_vmcb_check_controls(vcpu, &svm->nested.ctl)) {
>                  vmcb12->control.exit_code = SVM_EXIT_ERR;
>                  vmcb12->control.exit_info_1 = 0;
>                  vmcb12->control.exit_info_2 = 0;
> @@ -1766,7 +1753,7 @@ static int svm_get_nested_state(struct kvm_vcpu *vcpu,
>          kvm_state.hdr.svm.vmcb_pa = svm->nested.vmcb12_gpa;
>          if (nested_npt_enabled(svm)) {
>                  kvm_state.hdr.svm.flags |= KVM_STATE_SVM_VALID_GPAT;
> -                kvm_state.hdr.svm.gpat = svm->nested.gpat;
> +                kvm_state.hdr.svm.gpat = svm->nested.save.g_pat;
>          }
>          kvm_state.size += KVM_STATE_NESTED_SVM_VMCB_SIZE;
>          kvm_state.flags |= KVM_STATE_NESTED_GUEST_MODE;
> @@ -1871,7 +1858,7 @@ static int svm_set_nested_state(struct kvm_vcpu *vcpu,
>
>          ret = -EINVAL;
>          __nested_copy_vmcb_control_to_cache(vcpu, &ctl_cached, ctl);
> -        if (!__nested_vmcb_check_controls(vcpu, &ctl_cached))
> +        if (!nested_vmcb_check_controls(vcpu, &ctl_cached))
>                  goto out_free;
>
> @@ -1887,15 +1874,21 @@ static int svm_set_nested_state(struct kvm_vcpu *vcpu,
>           * nested_svm_check_permissions).
>           */
>          __nested_copy_vmcb_save_to_cache(&save_cached, save);
> +
> +        /*
> +         * Stuff gPAT in L1's save state, as older KVM may not have saved L1's
> +         * gPAT. L1's PAT, i.e. hPAT for the vCPU, is *always* tracked in
> +         * vcpu->arch.pat, i.e. gPAT is a reflection of vcpu->arch.pat, not the
> +         * other way around.
> +         */
> +        save_cached.g_pat = vcpu->arch.pat;
> +
>          if (!(save->cr0 & X86_CR0_PG) ||
>              !(save->cr0 & X86_CR0_PE) ||
>              (save->rflags & X86_EFLAGS_VM) ||
> -            !__nested_vmcb_check_save(vcpu, &save_cached))
> +            !nested_vmcb_check_save(vcpu, &ctl_cached, &save_cached))
>                  goto out_free;
>
> -        /*
> -         * Validate gPAT, if provided.
> -         */
>          if ((kvm_state->hdr.svm.flags & KVM_STATE_SVM_VALID_GPAT) &&
>              !kvm_pat_valid(kvm_state->hdr.svm.gpat))
>                  goto out_free;
> diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
> index a6a44deec82b..bf8562a5f655 100644
> --- a/arch/x86/kvm/svm/svm.c
> +++ b/arch/x86/kvm/svm/svm.c
> @@ -2862,7 +2862,7 @@ static int svm_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
>                  WARN_ON_ONCE(msr_info->host_initiated && vcpu->wants_to_run);
>                  if (!msr_info->host_initiated && is_guest_mode(vcpu) &&
>                      nested_npt_enabled(svm))
> -                        msr_info->data = svm->nested.gpat;
> +                        msr_info->data = svm->nested.save.g_pat;
>                  else
>                          msr_info->data = vcpu->arch.pat;
>                  break;
> diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
> index a559cd45c8a9..6f07d8e3f06e 100644
> --- a/arch/x86/kvm/svm/svm.h
> +++ b/arch/x86/kvm/svm/svm.h
> @@ -146,6 +146,7 @@ struct vmcb_save_area_cached {
>          u64 cr0;
>          u64 dr7;
>          u64 dr6;
> +        u64 g_pat;
>  };
>
>  struct vmcb_ctrl_area_cached {
> @@ -208,9 +209,6 @@ struct svm_nested_state {
>           */
>          struct vmcb_save_area_cached save;
>
> -        /* Cached guest PAT from vmcb12.save.g_pat */
> -        u64 gpat;
> -
>          bool initialized;
>
> @@ -599,7 +597,7 @@ static inline bool nested_npt_enabled(struct vcpu_svm *svm)
>
>  static inline void svm_set_gpat(struct vcpu_svm *svm, u64 data)
>  {
> -        svm->nested.gpat = data;
> +        svm->nested.save.g_pat = data;
>          svm_set_vmcb_gpat(svm->nested.vmcb02.ptr, data);
>  }
>
>
> base-commit: 6461c50e232d6f81d5b9604236f7ee3df870e932
> --