On Thu, Jan 13, 2022 at 04:38:33PM +0000, Andrew Cooper wrote:
> The logic was based on a mistaken understanding of how NMI blocking on vmexit
> works.  NMIs are only blocked for EXIT_REASON_NMI, and not for general exits.
> Therefore, an NMI can in general hit early in the vmx_asm_vmexit_handler path,
> and the guest's value will be clobbered before it is saved.
> 
> Switch to using MSR load/save lists.  This causes the guest value to be saved
> atomically with respect to NMIs/MCEs/etc.
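
(For readers less familiar with the MSR load/save lists, a rough sketch of
what "saved atomically" means in practice.  vmx_add_guest_msr() and
vmx_read_guest_msr() are the helpers the patch itself uses; the wrapper
function below is purely illustrative and not part of the patch:

    /*
     * With MSR_SPEC_CTRL on the guest MSR list, the CPU stores the guest
     * value into the VM-exit MSR-store area and reloads it from the
     * VM-entry MSR-load area itself, so there is no window in which an
     * NMI/MCE can clobber a half-saved value.
     */
    static uint64_t guest_spec_ctrl_example(struct vcpu *v)
    {
        uint64_t val = 0;

        /* One-off setup, done in vmx_cpuid_policy_changed() below. */
        if ( vmx_add_guest_msr(v, MSR_SPEC_CTRL, 0) )
            return 0;

        /* Read the guest's current value back out of the list. */
        vmx_read_guest_msr(v, MSR_SPEC_CTRL, &val);

        return val;
    }
)
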
> 
> First, update vmx_cpuid_policy_changed() to configure the load/save lists at
> the same time as configuring the intercepts.  This function is always used in
> remote context, so extend the vmx_vmcs_{enter,exit}() block to cover the whole
> function, rather than having multiple remote acquisitions of the same VMCS.
> 
> vmx_add_guest_msr() can fail, but only in ways which are entirely fatal to the
> guest, so handle failures using domain_crash().  vmx_del_msr() can fail with
> -ESRCH but we don't care, and this path will be taken during domain create
> on a system lacking IBRSB.
> 
> Second, update vmx_msr_{read,write}_intercept() to use the load/save lists
> rather than vcpu_msrs, and update the comment to describe the new state
> location.
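
(The write side isn't quoted in this reply, but presumably it is the mirror
image of the read side.  A hedged sketch rather than the exact patch text,
using vmx_write_guest_msr() as the existing counterpart helper:

    case MSR_SPEC_CTRL: /* guest_wrmsr() has already performed #GP checks. */
        if ( vmx_write_guest_msr(v, msr, msr_content) )
        {
            ASSERT_UNREACHABLE();
            goto gp_fault;
        }
        break;
)
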
> 
> Finally, adjust the entry/exit asm.  Drop DO_SPEC_CTRL_ENTRY_FROM_HVM
> entirely, and use a shorter code sequence to simply reload Xen's setting from
> the top-of-stack block.
> 
> Because the guest values are loaded/saved atomically, we do not need to use
> the shadowing logic to cope with late NMIs/etc, so we can omit
> DO_SPEC_CTRL_EXIT_TO_GUEST entirely and VMRESUME/VMLAUNCH with Xen's value in
> context.  Furthermore, we can drop the SPEC_CTRL_ENTRY_FROM_PV too, as there's
> no need to switch back to Xen's context on an early failure.
> 
> Signed-off-by: Andrew Cooper <[email protected]>
> ---
> CC: Jan Beulich <[email protected]>
> CC: Roger Pau Monné <[email protected]>
> CC: Wei Liu <[email protected]>
> CC: Jun Nakajima <[email protected]>
> CC: Kevin Tian <[email protected]>
> 
> Needs backporting as far as people can tolerate.
> 
> If the entry/exit logic were in C, I'd ASSERT() that shadow tracking is off,
> but this is awkward to arrange in asm.
> ---
>  xen/arch/x86/hvm/vmx/entry.S             | 19 ++++++++++-------
>  xen/arch/x86/hvm/vmx/vmx.c               | 36 +++++++++++++++++++++++++++-----
>  xen/arch/x86/include/asm/msr.h           | 10 ++++++++-
>  xen/arch/x86/include/asm/spec_ctrl_asm.h | 32 ++++------------------------
>  4 files changed, 56 insertions(+), 41 deletions(-)
> 
> diff --git a/xen/arch/x86/hvm/vmx/entry.S b/xen/arch/x86/hvm/vmx/entry.S
> index 30139ae58e9d..297ed8685126 100644
> --- a/xen/arch/x86/hvm/vmx/entry.S
> +++ b/xen/arch/x86/hvm/vmx/entry.S
> @@ -35,7 +35,14 @@ ENTRY(vmx_asm_vmexit_handler)
>  
>          /* SPEC_CTRL_ENTRY_FROM_VMX    Req: b=curr %rsp=regs/cpuinfo, Clob: acd */
>          ALTERNATIVE "", DO_OVERWRITE_RSB, X86_FEATURE_SC_RSB_HVM
> -        ALTERNATIVE "", DO_SPEC_CTRL_ENTRY_FROM_HVM, X86_FEATURE_SC_MSR_HVM
> +
> +        .macro restore_spec_ctrl
> +            mov    $MSR_SPEC_CTRL, %ecx
> +            movzbl CPUINFO_xen_spec_ctrl(%rsp), %eax
> +            xor    %edx, %edx
> +            wrmsr
> +        .endm
> +        ALTERNATIVE "", restore_spec_ctrl, X86_FEATURE_SC_MSR_HVM

I assume loading the host value into SPEC_CTRL strictly needs to
happen after the RSB overwrite, or else you could use a VM exit host
load MSR entry to handle MSR_SPEC_CTRL.

Also, I don't understand why SPEC_CTRL shadowing is no longer cleared
before loading Xen's value.
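
If the ordering doesn't matter, the alternative would be something along
the lines of the below (an untested sketch only, to illustrate the
suggestion; it assumes the existing vmx_add_host_load_msr() wrapper and
default_xen_spec_ctrl, the value the new asm reloads):

    /* In vmx_cpuid_policy_changed(), next to the guest list entry: */
    if ( cp->feat.ibrsb )
    {
        /*
         * Have the CPU itself reload Xen's choice of SPEC_CTRL from the
         * VM-exit MSR-load list, replacing the wrmsr in
         * vmx_asm_vmexit_handler; this runs before the RSB overwrite,
         * hence the ordering question above.
         */
        rc = vmx_add_host_load_msr(v, MSR_SPEC_CTRL, default_xen_spec_ctrl);
        if ( rc )
            goto err;
    }
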

>          /* WARNING! `ret`, `call *`, `jmp *` not safe before this point. */
>  
>          /* Hardware clears MSR_DEBUGCTL on VMExit.  Reinstate it if debugging Xen. */
> @@ -83,7 +90,6 @@ UNLIKELY_END(realmode)
>  
>          /* WARNING! `ret`, `call *`, `jmp *` not safe beyond this point. */
>          /* SPEC_CTRL_EXIT_TO_VMX   Req: a=spec_ctrl %rsp=regs/cpuinfo, Clob: cd */
> -        ALTERNATIVE "", DO_SPEC_CTRL_EXIT_TO_GUEST, X86_FEATURE_SC_MSR_HVM
>          ALTERNATIVE "", __stringify(verw CPUINFO_verw_sel(%rsp)), X86_FEATURE_SC_VERW_HVM
>  
>          mov  VCPU_hvm_guest_cr2(%rbx),%rax
> @@ -119,12 +125,11 @@ UNLIKELY_END(realmode)
>          SAVE_ALL
>  
>          /*
> -         * PV variant needed here as no guest code has executed (so
> -         * MSR_SPEC_CTRL can't have changed value), and NMIs/MCEs are liable
> -         * to hit (in which case the HVM variant might corrupt things).
> +         * SPEC_CTRL_ENTRY notes
> +         *
> +         * If we end up here, no guest code has executed.  We still have Xen's
> +         * choice of MSR_SPEC_CTRL in context, and the RSB is safe.
>           */
> -        SPEC_CTRL_ENTRY_FROM_PV /* Req: %rsp=regs/cpuinfo Clob: acd */
> -        /* WARNING! `ret`, `call *`, `jmp *` not safe before this point. */
>  
>          call vmx_vmentry_failure
>          jmp  .Lvmx_process_softirqs
> diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
> index 28ee6393f11e..d7feb5f9c455 100644
> --- a/xen/arch/x86/hvm/vmx/vmx.c
> +++ b/xen/arch/x86/hvm/vmx/vmx.c
> @@ -592,6 +592,7 @@ void vmx_update_exception_bitmap(struct vcpu *v)
>  static void vmx_cpuid_policy_changed(struct vcpu *v)
>  {
>      const struct cpuid_policy *cp = v->domain->arch.cpuid;
> +    int rc = 0;
>  
>      if ( opt_hvm_fep ||
>           (v->domain->arch.cpuid->x86_vendor != boot_cpu_data.x86_vendor) )
> @@ -601,17 +602,27 @@ static void vmx_cpuid_policy_changed(struct vcpu *v)
>  
>      vmx_vmcs_enter(v);
>      vmx_update_exception_bitmap(v);
> -    vmx_vmcs_exit(v);
>  
>      /*
>       * We can safely pass MSR_SPEC_CTRL through to the guest, even if STIBP
>       * isn't enumerated in hardware, as SPEC_CTRL_STIBP is ignored.
>       */
>      if ( cp->feat.ibrsb )
> +    {
>          vmx_clear_msr_intercept(v, MSR_SPEC_CTRL, VMX_MSR_RW);
> +
> +        rc = vmx_add_guest_msr(v, MSR_SPEC_CTRL, 0);
> +        if ( rc )
> +            goto err;
> +    }
>      else
> +    {
>          vmx_set_msr_intercept(v, MSR_SPEC_CTRL, VMX_MSR_RW);
>  
> +        /* Can only fail with -ESRCH, and we don't care. */
> +        vmx_del_msr(v, MSR_SPEC_CTRL, VMX_MSR_GUEST);
> +    }
> +
>      /* MSR_PRED_CMD is safe to pass through if the guest knows about it. */
>      if ( cp->feat.ibrsb || cp->extd.ibpb )
>          vmx_clear_msr_intercept(v, MSR_PRED_CMD,  VMX_MSR_RW);
> @@ -623,6 +634,15 @@ static void vmx_cpuid_policy_changed(struct vcpu *v)
>          vmx_clear_msr_intercept(v, MSR_FLUSH_CMD, VMX_MSR_RW);
>      else
>          vmx_set_msr_intercept(v, MSR_FLUSH_CMD, VMX_MSR_RW);
> +
> + err:

Nit: I would name this label 'out', since it's used by both the error and
the normal exit paths, but that's just my taste.

> +    vmx_vmcs_exit(v);
> +
> +    if ( rc )
> +    {
> +        printk(XENLOG_G_ERR "MSR load/save list error: %d\n", rc);
> +        domain_crash(v->domain);
> +    }
>  }
>  
>  int vmx_guest_x86_mode(struct vcpu *v)
> @@ -3128,7 +3148,6 @@ static int is_last_branch_msr(u32 ecx)
>  static int vmx_msr_read_intercept(unsigned int msr, uint64_t *msr_content)
>  {
>      struct vcpu *curr = current;
> -    const struct vcpu_msrs *msrs = curr->arch.msrs;
>      uint64_t tmp;
>  
>      HVM_DBG_LOG(DBG_LEVEL_MSR, "ecx=%#x", msr);
> @@ -3136,7 +3155,11 @@ static int vmx_msr_read_intercept(unsigned int msr, uint64_t *msr_content)
>      switch ( msr )
>      {
>      case MSR_SPEC_CTRL: /* guest_rdmsr() has already performed #GP checks. */
> -        *msr_content = msrs->spec_ctrl.raw;
> +        if ( vmx_read_guest_msr(curr, msr, msr_content) )
> +        {
> +            ASSERT_UNREACHABLE();
> +            goto gp_fault;
> +        }

AFAICT this is required just for Xen-internal callers, as a guest attempt
to access MSR_SPEC_CTRL will never be trapped, or if it is trapped that's
because !cp->feat.ibrsb, in which case the access won't get here anyway.

Thanks, Roger.
