On Fri, Nov 17, 2023 at 10:56:30AM +0100, Jan Beulich wrote:
> On 17.11.2023 10:18, James Dingwall wrote:
> > On Thu, Nov 16, 2023 at 04:32:47PM +0000, Andrew Cooper wrote:
> >> On 16/11/2023 4:15 pm, James Dingwall wrote:
> >>> Hi,
> >>>
> >>> Per the msr_relaxed documentation:
> >>>
> >>> "If using this option is necessary to fix an issue, please report a
> >>> bug."
> >>>
> >>> After recently upgrading an environment from Xen 4.14.5 to Xen 4.15.5 we
> >>> started experiencing a BSOD at boot with one of our Windows guests. We
> >>> found
> >>> that enabling `msr_relaxed = 1` in the guest configuration has resolved
> >>> the
> >>> problem. With a debug build of Xen and `hvm_debug=2048` on the command
> >>> line
> >>> the following messages were caught as the BSOD happened:
> >>>
> >>> (XEN) [HVM:11.0] <vmx_msr_read_intercept> ecx=0x1a2
> >>> (XEN) vmx.c:3298:d11v0 RDMSR 0x000001a2 unimplemented
> >>> (XEN) d11v0 VIRIDIAN CRASH: 1e ffffffffc0000096 fffff80b8de81eb5 0 0
> >>>
> >>> I found that MSR 0x1a2 is MSR_TEMPERATURE_TARGET and from that this patch
> >>> series from last month:
> >>>
> >>> https://patchwork.kernel.org/project/xen-devel/list/?series=796550
> >>>
> >>> Picking out just a small part of that fixes the problem for us. Although
> >>> the
> >>> the patch is against 4.15.5 I think it would be relevant to more recent
> >>> releases too.
> >>
> >> Which version of Windows, and what hardware?
> >>
> >> The Viridian Crash isn't about the RDMSR itself - it's presumably
> >> collateral damage shortly thereafter.
> >>
> >> Does filling in 0 for that MSR also resolve the issue? It's model
> >> specific and we absolutely cannot pass it through from real hardware
> >> like that.
> >>
> >
> > Hi Andrew,
> >
> > Thanks for your response. The guest is running Windows 10 and the crash
> > happens in a proprietary hardware driver. A little bit of knowledge as
> > they say was enough to stop the crash but I don't understand the impact
> > of what I've actually done...
> >
> > To rework the patch I'd need a bit of guidance, if I understand your
> > suggestion I set the MSR to 0 with this change in emul-priv-op.c:
>
> For the purpose of the experiment suggested by Andrew ...
>
> > diff --git a/xen/arch/x86/pv/emul-priv-op.c b/xen/arch/x86/pv/emul-priv-op.c
> > index ed97b1d6fcc..66f5e417df6 100644
> > --- a/xen/arch/x86/pv/emul-priv-op.c
> > +++ b/xen/arch/x86/pv/emul-priv-op.c
> > @@ -976,6 +976,10 @@ static int read_msr(unsigned int reg, uint64_t *val,
> > *val = 0;
> > return X86EMUL_OKAY;
> >
> > + case MSR_TEMPERATURE_TARGET:
> > + *val = 0;
> > + return X86EMUL_OKAY;
> > +
> > case MSR_P6_PERFCTR(0) ... MSR_P6_PERFCTR(7):
> > case MSR_P6_EVNTSEL(0) ... MSR_P6_EVNTSEL(3):
> > case MSR_CORE_PERF_FIXED_CTR0 ... MSR_CORE_PERF_FIXED_CTR2:
>
> ... you wouldn't need this (affects PV domains only), and ...
>
> > and this in vmx.c:
> >
> > diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
> > index 54023a92587..bbf37b7f272 100644
> > --- a/xen/arch/x86/hvm/vmx/vmx.c
> > +++ b/xen/arch/x86/hvm/vmx/vmx.c
> > @@ -3259,6 +3259,11 @@ static int vmx_msr_read_intercept(unsigned int msr,
> > uint64_t *msr_content)
> > if ( !nvmx_msr_read_intercept(msr, msr_content) )
> > goto gp_fault;
> > break;
> > +
> > + case MSR_TEMPERATURE_TARGET:
> > + *msr_content = 0;
> > + break;
> > +
> > case MSR_IA32_MISC_ENABLE:
> > rdmsrl(MSR_IA32_MISC_ENABLE, *msr_content);
> > /* Debug Trace Store is not supported. */
>
> ... indeed this ought to do. An eventual real patch may want to look
> different, though.
>
Thanks Jan, based on the information I've reduced the patch to what seems the
minimal necessary to workaround the BSOD. I assume simply not ending up at
X86EMUL_EXCEPTION is the resolution regardless of what value is set.
Regards,
James
diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
index 54023a92587..bbf37b7f272 100644
--- a/xen/arch/x86/hvm/vmx/vmx.c
+++ b/xen/arch/x86/hvm/vmx/vmx.c
@@ -3259,6 +3259,11 @@ static int vmx_msr_read_intercept(unsigned int msr, uint64_t *msr_content)
if ( !nvmx_msr_read_intercept(msr, msr_content) )
goto gp_fault;
break;
+
+ case MSR_TEMPERATURE_TARGET:
+ *msr_content = 0;
+ break;
+
case MSR_IA32_MISC_ENABLE:
rdmsrl(MSR_IA32_MISC_ENABLE, *msr_content);
/* Debug Trace Store is not supported. */
diff --git a/xen/include/asm-x86/msr-index.h b/xen/include/asm-x86/msr-index.h
index 8b3ad575dbc..34e800fdc01 100644
--- a/xen/include/asm-x86/msr-index.h
+++ b/xen/include/asm-x86/msr-index.h
@@ -498,6 +498,9 @@
#define MSR_IA32_MISC_ENABLE_XD_DISABLE (1ULL << 34)
#define MSR_IA32_TSC_DEADLINE 0x000006E0
+
+#define MSR_TEMPERATURE_TARGET 0x000001a2
+
#define MSR_IA32_ENERGY_PERF_BIAS 0x000001b0
/* Platform Shared Resource MSRs */