> On 22. Jul 2025, at 13:06, Xiaoyao Li <xiaoyao...@intel.com> wrote:
>
> On 7/22/2025 6:35 PM, Mohamed Mediouni wrote:
>>> On 22. Jul 2025, at 12:27, Xiaoyao Li<xiaoyao...@intel.com> wrote:
>>>
>>> On 7/22/2025 5:21 PM, Mathias Krause wrote:
>>>> On 22.07.25 05:45, Xiaoyao Li wrote:
>>>>> On 6/20/2025 3:42 AM, Mathias Krause wrote:
>>>>>> KVM has a weird behaviour when a guest executes VMCALL on an AMD system
>>>>>> or VMMCALL on an Intel CPU. Both naturally generate an invalid opcode
>>>>>> exception (#UD) as they are just the wrong instruction for the CPU
>>>>>> given. But instead of forwarding the exception to the guest, KVM tries
>>>>>> to patch the guest instruction to match the host's actual hypercall
>>>>>> instruction. That is doomed to fail as read-only code is rather the
>>>>>> standard these days. But, instead of letting go the patching attempt and
>>>>>> falling back to #UD injection, KVM injects the page fault instead.
>>>>>>
>>>>>> That's wrong on multiple levels. Not only isn't that a valid exception
>>>>>> to be generated by these instructions, confusing attempts to handle
>>>>>> them. It also destroys guest state by doing so, namely the value of CR2.
>>>>>>
>>>>>> Sean attempted to fix that in KVM[1] but the patch was never applied.
>>>>>>
>>>>>> Later, Oliver added a quirk bit in [2] so the behaviour can, at least,
>>>>>> conceptually be disabled. Paolo even called out to add this very
>>>>>> functionality to disable the quirk in QEMU[3]. So lets just do it.
>>>>>>
>>>>>> A new property 'hypercall-patching=on|off' is added, for the very
>>>>>> unlikely case that there are setups that really need the patching.
>>>>>> However, these would be vulnerable to memory corruption attacks freely
>>>>>> overwriting code as they please. So, my guess is, there are exactly 0
>>>>>> systems out there requiring this quirk.
>>>>> The default behavior is patching the hypercall for many years.
>>>>>
>>>>> If you desire to change the default behavior, please at least keep it
>>>>> unchanged for old machine version. i.e., introduce compat_property,
>>>>> which sets KVMState->hypercall_patching_enabled to true.
>>>> Well, the thing is, KVM's patching is done with the effective
>>>> permissions of the guest which means, if the code in question isn't
>>>> writable from the guest's point of view, KVM's attempt to modify it will
>>>> fail. This failure isn't transparent for the guest as it sees a #PF
>>>> instead of a #UD, and that's what I'm trying to fix by disabling the quirk.
>>>> The hypercall patching was introduced in Linux commit 7aa81cc04781
>>>> ("KVM: Refactor hypercall infrastructure (v3)") in v2.6.25. Until then
>>>> it was based on a dedicated hypercall page that was handled by KVM to
>>>> use the proper instruction of the KVM module in use (VMX or SVM).
>>>> Patching code was fine back then, but the introduction of DEBUG_RO_DATA
>>>> made the patching attempts fail and, ultimately, lead to Paolo handle
>>>> this with commit c1118b3602c2 ("x86: kvm: use alternatives for VMCALL
>>>> vs. VMMCALL if kernel text is read-only").
>>>> However, his change still doesn't account for the cross-vendor live
>>>> migration case (Intel<->AMD), which will still be broken, causing the
>>>> before mentioned bogus #PF, which will just lead to misleading Oops
>>>> reports, confusing the poor souls, trying to make sense of it.
>>>> IMHO, there is no valid reason for still having the patching in place as
>>>> the .text of non-ancient kernel's will be write-protected, making
>>>> patching attempts fail. And, as they fail with a #PF instead of #UD, the
>>>> guest cannot even handle them appropriately, as there was no memory
>>>> write attempt from its point of view. Therefore the default should be to
>>>> disable it, IMO. This won't prevent guests making use of the wrong
>>>> instruction from trapping, but, at least, now they'll get the correct
>>>> exception vector and can handle it appropriately.
>>> But you don't accout for the case that guest kernel is built without
>>> CONFIG_STRICT_KERNEL_RWX enabled, or without CONFIG_DEBUG_RO_DATA, or for
>>> whatever reason the guest's text is not readonly, and the VM needs to be
>>> migrated among different vendors (Intel <-> AMD).
>>>
>>> Before this patch, the above usecase works well. But with this patch, the
>>> guest will gets #UD after migrated to different vendors.
>>>
>>> I heard from some small CSPs that they do want to the ability to live
>>> migrate VMs among Intel and AMD host.
>>>
>> Is there a particular reason to not handle that #UD as a hypercall to
>> support that scenario?
>
> do you mean KVM handles the first hardware #UD due to the wrong opcode of
> hypercall by directly emulate the hypercall instead of going to emulator to
> patch the guest memory with modifying the guest memory?
>
> If so, I agree with it. I thought the same solution and had no bandwidth and
> motivation to raise the idea to KVM community.
>
Yes, I think that’d be the best way to go in this case…
-Mohamed
>> Especially with some guest OS kernels having kernel patch protection with
>> periodic scrubbing of r/o memory (ie Windows), doesn’t sound great to
>> override anything in a way the guest OS kernel can notice…
>
>