>-----Original Message-----
>From: Steven Sistare <[email protected]>
>Subject: Re: [PATCH v2 6/6] accel/kvm: Fix SIGSEGV when execute
>"query-balloon" after CPR transfer
>
>On 10/8/2025 6:22 AM, Duan, Zhenzhong wrote:
>>
>>
>>> -----Original Message-----
>>> From: Steven Sistare <[email protected]>
>>> Subject: Re: [PATCH v2 6/6] accel/kvm: Fix SIGSEGV when execute
>>> "query-balloon" after CPR transfer
>>>
>>> On 9/30/2025 2:00 AM, Duan, Zhenzhong wrote:
>>>>> -----Original Message-----
>>>>> From: Steven Sistare <[email protected]>
>>>>> Subject: Re: [PATCH v2 6/6] accel/kvm: Fix SIGSEGV when execute
>>>>> "query-balloon" after CPR transfer
>>>>>
>>>>> On 9/28/2025 4:54 AM, Zhenzhong Duan wrote:
>>>>>> After CPR transfer, source QEMU closes kvm fd and sets kvm_state to
>>>>> NULL,
>>>>>> "query-balloon" will check kvm_state->sync_mmu and trigger NULL
>>> pointer
>>>>>> reference.
>>>>>>
>>>>>> We don't need to NULL kvm_state as all states in kvm_state aren't
>>> released
>>>>>> actually. Just closing kvm fd is enough so we could still query states
>>>>>> through "query_*" qmp command.
>>>>>
>>>>> IMO this does not make sense. Much of the state in kvm_state was
>>> derived
>>>> >from ioctl's on the descriptors, and closing them invalidates it. Asking
>>>>> historical questions about what used to be makes no sense.
>>>>
>>>> You also have your valid point.
>>>>
>>>>>
>>>>> Clearing kvm_state and setting kvm_allowed=false would be a safer fix.
>>>
>>> Setting kvm_allowed=false causes kvm_enabled() to return false which
>should
>>> prevent kvm_state from being dereferenced anywhere:
>>>
>>> #define kvm_enabled() (kvm_allowed)
>>>
>>> Eg for the balloon:
>>>
>>> static bool have_balloon(Error **errp)
>>> {
>>> if (kvm_enabled() && !kvm_has_sync_mmu()) {
>>
>> OK, will do, clearing kvm_allowed is a good idea.
>>
>> Currently there are two qmp commands "query-balloon" and
>"query-cpu-definitions"
>> causing SIGSEGV after CPR-transfer, clearing kvm_allowed helps for both.
>> But clearing both kvm_allowed and kvm_state cause SIGSEGV on
>"query-cpu-definitions".
>>
>> I'll send a patch to clearing only kvm_allowed if you are fine with it.
>
>I still don't love the idea. kvm_state is no longer valid.
OK, what about another idea to not call vfio_cpr_kvm_close_notifier() for
cpr-transfer, like below:
--- a/hw/vfio/cpr.c
+++ b/hw/vfio/cpr.c
@@ -197,7 +197,7 @@ void vfio_cpr_add_kvm_notifier(void)
if (!kvm_close_notifier.notify) {
migration_add_notifier_modes(&kvm_close_notifier,
vfio_cpr_kvm_close_notifier,
- MIG_MODE_CPR_TRANSFER, MIG_MODE_CPR_EXEC,
+ MIG_MODE_CPR_EXEC,
-1);
}
}
The close_kvm_after_cpr is only for cpr-exec, with this change, we are in same
situation as live migration and can run all qmp commands same as after live
migration.
>
>It sounds like "query-cpu-definitions" is missing a check for kvm_enabled().
Yes.
Thanks
Zhenzhong