On 7/2/2025 1:01 PM, Zhao Liu wrote:
Thanks Igor for looking here, and thanks Konrad for the explanation.
On 7/1/2025 6:26 PM, Zhao Liu wrote:
unless it was explicitly requested by the user.
But this could still break Windows, just like issue #3001, which enables
arch-capabilities for EPYC-Genoa. This shows that even explicitly
turning on arch-capabilities in an AMD guest and using KVM's emulated
value can break something.
So even for named CPUs, arch-capabilities=on doesn't reflect the fact
that it is purely emulated, and is (maybe?) harmful.
It is because Windows added wrong code. So it breaks itself, and it's just
a Windows regression.
Could you please tell me what Windows's wrong code is? And what's
wrong when someone is following the hardware spec?
The reason is that it's reserved on AMD, hence software shouldn't even try
to use it or make any decisions based on it.
PS:
On the contrary, doing such ad-hoc 'cleanups' for the sake of a misbehaving
guest would actually complicate QEMU for no big reason.
The guest is not misbehaving. It is following the spec.
(That's my thinking, and please feel free to correct me.)
I think we first need to align on what the behavior of the Windows that
hits "unsupported processor" actually is.
My understanding is that Windows is doing something like
    if (is_AMD && CPUID(arch_capabilities))
        error(unsupported processor)
And I think this behavior is not correct.
However, that does not seem to be the Windows behavior as you understand
it. So what behavior do you have in mind?
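To make the guessed check above concrete, here is a minimal C sketch. It is only a model of the hypothesis in this thread, not actual Windows code: the function names cpuid_vendor() and guest_rejects_cpu() are made up for illustration, and the register values are passed in as plain integers so the logic can be followed without executing CPUID. The one hard fact it relies on is that IA32_ARCH_CAPABILITIES is enumerated by CPUID.(EAX=07H,ECX=0):EDX bit 29.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* CPUID.(EAX=07H,ECX=0):EDX bit 29 enumerates IA32_ARCH_CAPABILITIES. */
#define CPUID_7_0_EDX_ARCH_CAPABILITIES (1u << 29)

/* Store a 32-bit register value as four bytes, lowest byte first. */
static void put_le32(char *dst, uint32_t v)
{
    for (int i = 0; i < 4; i++) {
        dst[i] = (char)((v >> (8 * i)) & 0xff);
    }
}

/*
 * Rebuild the 12-character vendor string from CPUID leaf 0.
 * The registers are concatenated in EBX, EDX, ECX order.
 */
static void cpuid_vendor(uint32_t ebx, uint32_t edx, uint32_t ecx,
                         char out[13])
{
    put_le32(out + 0, ebx);
    put_le32(out + 4, edx);
    put_le32(out + 8, ecx);
    out[12] = '\0';
}

/*
 * Hypothetical model of the guest-side check guessed at in this thread:
 * reject the CPU when the vendor is AMD and the ARCH_CAPABILITIES
 * CPUID bit is set. This is an assumption, not known Windows behavior.
 */
static bool guest_rejects_cpu(const char *vendor, uint32_t leaf7_edx)
{
    bool is_amd = strcmp(vendor, "AuthenticAMD") == 0;
    bool has_arch_caps =
        (leaf7_edx & CPUID_7_0_EDX_ARCH_CAPABILITIES) != 0;

    return is_amd && has_arch_caps;
}
```

Under this model an AMD vCPU with the bit clear is accepted, and so is a non-AMD vCPU with the bit set. Only the combination is rejected, and that combination is the one under debate here.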
I had the same thought. Windows guys could also say they didn't access
the reserved MSR unconditionally, and they followed the CPUID feature
bit to access that MSR. When the CPUID bit is set, it indicates that the
feature is implemented.
At least I think it makes sense to rely on the CPUID to access the MSR.
Just as an example, it's unlikely that after the software finds a CPUID
bit set to 1, it still needs to download the latest spec version to
confirm whether the feature is actually implemented or reserved.
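The "trust CPUID, then touch the MSR" pattern described above can be sketched as follows. This is an illustrative C model under stated assumptions: rdmsr_fn, fake_rdmsr() and read_arch_capabilities() are hypothetical names, and the stub returns a placeholder value rather than anything KVM really reports. The MSR index 0x10A for IA32_ARCH_CAPABILITIES and the CPUID bit position are the only architectural facts used.

```c
#include <stdbool.h>
#include <stdint.h>

/* CPUID.(EAX=07H,ECX=0):EDX bit 29 enumerates IA32_ARCH_CAPABILITIES. */
#define CPUID_7_0_EDX_ARCH_CAPABILITIES (1u << 29)
/* Architectural MSR index of IA32_ARCH_CAPABILITIES. */
#define MSR_IA32_ARCH_CAPABILITIES 0x10Au

/*
 * Hypothetical MSR accessor type. Real guest code would execute RDMSR
 * at ring 0; a callback keeps this sketch runnable in user space.
 */
typedef bool (*rdmsr_fn)(uint32_t index, uint64_t *value);

/* Counts how often the stub below was called. */
static int fake_rdmsr_calls;

/* Stand-in for an emulated MSR; the value 0 is a placeholder only. */
static bool fake_rdmsr(uint32_t index, uint64_t *value)
{
    fake_rdmsr_calls++;
    if (index != MSR_IA32_ARCH_CAPABILITIES) {
        return false;
    }
    *value = 0;
    return true;
}

/*
 * Read the MSR only when CPUID advertises it. If the bit is clear the
 * MSR is never touched and *value is left unchanged.
 */
static bool read_arch_capabilities(uint32_t leaf7_edx, rdmsr_fn rdmsr,
                                   uint64_t *value)
{
    if (!(leaf7_edx & CPUID_7_0_EDX_ARCH_CAPABILITIES)) {
        return false;
    }
    return rdmsr(MSR_IA32_ARCH_CAPABILITIES, value);
}
```

Passing fake_rdmsr as the accessor shows the point being argued: software written this way never consults the vendor or the spec revision, so whatever the hypervisor puts in the CPUID bit decides whether the MSR gets read.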
Based on the above point, this CPUID feature bit is set to 1 in KVM, and
KVM also adds emulation (as a fix) specifically for this MSR. This means
that the guest is considered to have valid access to this feature MSR.
If the guest then doesn't get what it wants, it is reasonable for the
guest to assume that the current (v)CPU lacks hardware support and to
mark it as "unsupported processor".
As Konrad mentioned, there's a previous explanation of why KVM
sets this feature bit (it started with a little accident):
https://lore.kernel.org/kvm/CALMp9eRjDczhSirSismObZnzimxq4m+3s6Ka7OxwPj5Qj6X=b...@mail.gmail.com/#t
So I think the question is where this fix should be applied (KVM or
QEMU) or if it should be applied at all, rather than whether Windows has
the bug.
If we agree it's a Windows bug, then no fix is needed in QEMU/KVM at all.
But I do agree that such "cleanups" would complicate QEMU. As I noted,
Eduardo did a similar workaround six years ago:
https://lore.kernel.org/qemu-devel/20190125220606.4864-1-ehabk...@redhat.com/
Complexity and technical debt are important considerations, and another
consideration is the impact of this issue. Luckily, newer versions of
Windows are actively compatible with KVM + QEMU:
https://blogs.windows.com/windows-insider/2025/06/23/announcing-windows-11-insider-preview-build-26120-4452-beta-channel/
But it's also hard to say if such a problem will happen again.
Especially if the software works fine on real hardware but fails with
"-cpu host" (which is supposed to be synchronized with the host as much
as possible).
Working fine on real hardware but having issues under virtualization
doesn't mean it is a virtualization problem, unless we figure out the
root cause and prove that the software/OS behavior is correct.
If the problem is due to wrong behavior of the guest OS, then it has
nothing to do with QEMU/KVM, and QEMU/KVM cannot avoid such problems.