On 7/2/2025 3:47 AM, Konrad Rzeszutek Wilk wrote:
On Tue, Jul 01, 2025 at 05:47:06PM +0800, Xiaoyao Li wrote:
On 7/1/2025 5:22 PM, Alexandre Chartre wrote:

On 7/1/25 10:23, Xiaoyao Li wrote:
On 6/30/2025 9:30 PM, Alexandre Chartre wrote:
KVM emulates the ARCH_CAPABILITIES on x86 for both Intel and AMD
cpus, although the IA32_ARCH_CAPABILITIES MSR is an Intel-specific
MSR and it makes no sense to emulate it on AMD.

As a consequence, VMs created on AMD with qemu -cpu host and using
KVM will advertise the ARCH_CAPABILITIES feature and provide the
IA32_ARCH_CAPABILITIES MSR. This can cause issues (like Windows BSOD)
as the guest OS might not expect this MSR to exist on such cpus (the
AMD documentation specifies that ARCH_CAPABILITIES feature and MSR
are not defined on the AMD architecture).

A fix was proposed in KVM code, however KVM maintainers don't want to
change this behavior, which has existed for 6+ years, and suggest that
the change be made in qemu instead.

So this commit changes the behavior in qemu so that ARCH_CAPABILITIES
is not provided by default on AMD cpus when the hypervisor emulates it,
but it can still be provided by explicitly setting arch-capabilities=on.

Signed-off-by: Alexandre Chartre <alexandre.char...@oracle.com>
---
   target/i386/cpu.c | 14 ++++++++++++++
   1 file changed, 14 insertions(+)

diff --git a/target/i386/cpu.c b/target/i386/cpu.c
index 0d35e95430..7e136c48df 100644
--- a/target/i386/cpu.c
+++ b/target/i386/cpu.c
@@ -8324,6 +8324,20 @@ void x86_cpu_expand_features(X86CPU *cpu, Error **errp)
           }
       }
+    /*
+     * For years, KVM has inadvertently emulated the ARCH_CAPABILITIES
+     * MSR on AMD although this is an Intel-specific MSR; and KVM will
+     * continue doing so to not change its ABI for existing setups.
+     *
+     * So ensure that the ARCH_CAPABILITIES MSR is disabled on AMD cpus
+     * to prevent providing a cpu with an MSR which is not supposed to
+     * be there, unless it was explicitly requested by the user.
+     */
+    if (IS_AMD_CPU(env) &&
+        !(env->user_features[FEAT_7_0_EDX] & CPUID_7_0_EDX_ARCH_CAPABILITIES)) {
+        env->features[FEAT_7_0_EDX] &= ~CPUID_7_0_EDX_ARCH_CAPABILITIES;
+    }
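
For readers following the thread, the rule in the hunk above can be modelled
outside of QEMU as a small standalone function. This is only a sketch: the
helper name filter_arch_capabilities() and its parameters are hypothetical and
do not exist in QEMU. The bit position is the architectural one,
CPUID.(EAX=7,ECX=0):EDX bit 29.

```c
#include <stdbool.h>
#include <stdint.h>

/* CPUID.(EAX=7,ECX=0):EDX bit 29 enumerates IA32_ARCH_CAPABILITIES. */
#define ARCH_CAP_BIT (1u << 29)

/*
 * Hypothetical helper (not a QEMU function): returns the EDX feature word
 * after applying the rule from the patch.
 *
 *   features      - bits the CPU model would expose by default
 *   user_features - bits the user set explicitly (e.g. arch-capabilities=on)
 *   is_amd        - true when the guest CPU vendor is AMD
 */
static uint32_t filter_arch_capabilities(uint32_t features,
                                         uint32_t user_features,
                                         bool is_amd)
{
    /* Drop the bit on AMD unless the user asked for it explicitly. */
    if (is_amd && !(user_features & ARCH_CAP_BIT)) {
        features &= ~ARCH_CAP_BIT;
    }
    return features;
}
```

Under this model, an AMD guest only keeps the bit when it was requested
explicitly, and a non-AMD guest is left untouched.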

This changes the result for the existing usage of "-cpu host" on
AMD. So it will need a compat_prop to keep the old behavior for old
machine types.

Right, I will look at that.


But I would like to discuss whether we really want to do this in QEMU.
ARCH_CAPABILITIES is not the only feature KVM emulates unconditionally.
We have TSC_DEADLINE_TIMER as well. So why treat them differently?
Just because some Windows versions cannot boot? To me, it looks like
a Windows bug. So please fix Windows. And to run the buggy Windows,
we have the workaround: "-cpu host,-arch-capabilities"

Well, the Windows behavior is not that wrong as it conforms to the AMD
Manual which specifies that ARCH_CAPABILITIES feature and MSR are not
defined on AMD cpus; while QEMU/KVM are providing a hybrid kind of AMD
cpu with Intel feature/MSR.

It is currently a reserved bit in AMD's manual. But that doesn't mean it will
be reserved forever. Nothing prevents AMD from implementing it in the future.

And if it is implemented in the future (say in 100 years), then we
would expose it then by the virtue of -cpu host picking it up
automatically.

I wanted to talk about the impact on Windows implementation.

What if AMD implements it one year from now? At that point, Windows would fail to boot even on real AMD hardware. Do you think that is a correct implementation on Windows' part?


Software shouldn't set any expectation on the reserved bit.

Exactly. Which is why there is this fix which does not set those bits.
It should be done in KVM, but as you saw Sean agreed this is a bug, but
he did not want it in the kernel.

What about the TSC deadline MSR? That should not be exposed either as it is
not implemented on AMD.

Oh, no. That is not a rule of virtualization.

With virtualization, we don't need to present a vcpu that is 100% identical to real silicon. We can expose more (useful) features to the vcpu as long as the result is architecturally correct.

And with virtualization, people can tailor their own vcpu with different features/vendors/FMS as long as the configuration is architecturally correct.


Microsoft is fixing that behavior anyway and has provided a preview fix
(OS Build 26100.4484), so that's good news. But the goal here is also to
prevent such future misbehavior. So if other features (like
TSC_DEADLINE_TIMER) are exposed while they shouldn't then they should
probably be fixed as well.
"-cpu host,-arch-capabilities" is indeed a workaround, but it defeats the
purpose of the "-cpu host" option which is to provide a guest with the same
features as the host. And this workaround basically says: "provide a guest
with the same cpu as the host but disable this feature that the host doesn't
provide"; this doesn't make sense. Also this workaround doesn't integrate
well in heterogeneous environments (with Intel, AMD, ARM or other cpus)
where you just want to use "-cpu host" whatever the platform is, and not
have a special case for AMD cpus.

As I said, it's just the workaround for users who want to run a specific
version of Windows with "-cpu host" on AMD. That's why it's called
workaround.

No? It is making the -cpu host expose the real bits.

Not add extra ones.

The root-cause is the wrong behavior of the specific version of Windows. If
you don't use the buggy Windows, you don't need the workaround.

Windows probably does this.

if (cpuid(arch_capabilities))
        // do something sensible.

From the description, doesn't Windows do something like

  if (IS_AMD && CPUID(arch_capabilities))
        ERROR(UNSUPPORTED PROCESSOR)

The problem is software cannot assume CPUID(arch_capabilities) is 0 on AMD.

That is a correct behavior based on reading the Intel SDM.

The AMD SDM says that if you don't detect a CPUID being set, then don't mess
with that MSR that is associated with that - otherwise you will get undefined
behaviors.
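
The "check CPUID before touching the MSR" rule described above can be written
as a vendor-neutral sketch. The function names and the rdmsr callback are
hypothetical, introduced here for illustration only; the constants are the
architectural ones (CPUID.(EAX=7,ECX=0):EDX bit 29 and MSR index 0x10a).
Nothing here is taken from Windows or any other guest OS.

```c
#include <stdbool.h>
#include <stdint.h>

/* CPUID.(EAX=7,ECX=0):EDX bit 29 enumerates IA32_ARCH_CAPABILITIES. */
#define CPUID_7_0_EDX_ARCH_CAP      (1u << 29)
/* Index of the IA32_ARCH_CAPABILITIES MSR. */
#define MSR_IA32_ARCH_CAPABILITIES  0x10au

/*
 * Hypothetical helper: decide from the CPUID bit alone whether the MSR may
 * be read. The CPU vendor is deliberately not consulted.
 */
static bool may_read_arch_capabilities(uint32_t cpuid_7_0_edx)
{
    return (cpuid_7_0_edx & CPUID_7_0_EDX_ARCH_CAP) != 0;
}

/*
 * Hypothetical wrapper: 'rdmsr' stands in for whatever privileged MSR read
 * primitive the environment provides. Returns 0 when the feature is not
 * enumerated, so the MSR is never accessed in that case.
 */
static uint64_t arch_capabilities_or_zero(uint32_t cpuid_7_0_edx,
                                          uint64_t (*rdmsr)(uint32_t msr))
{
    if (!may_read_arch_capabilities(cpuid_7_0_edx)) {
        return 0;
    }
    return rdmsr(MSR_IA32_ARCH_CAPABILITIES);
}
```

A guest written this way behaves the same whether or not the hypervisor
exposes the bit on an AMD vcpu, which is the property both sides of the
discussion seem to want from guest software.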


I am really missing what your argument here is. Is it that the guest ABI got
screwed up 7 years ago (and the author of the patch agreed it was a
bug and so did the KVM maintainer) and we should just continue having this
bug because ... what?

No, it's not a bug of KVM, nor QEMU.
