On Fri, Oct 10, 2025 at 03:54:42AM +0000, Duan, Zhenzhong wrote: > > > >-----Original Message----- > >From: Daniel P. Berrangé <[email protected]> > >Subject: Re: [PATCH v2 6/6] accel/kvm: Fix SIGSEGV when execute > >"query-balloon" after CPR transfer > > > >On Thu, Oct 09, 2025 at 10:32:34AM -0400, Steven Sistare wrote: > >> On 10/9/2025 10:19 AM, Daniel P. Berrangé wrote: > >> > On Thu, Oct 09, 2025 at 10:11:17AM -0400, Steven Sistare wrote: > >> > > On 10/8/2025 6:22 AM, Duan, Zhenzhong wrote: > >> > > > > >> > > > > >> > > > > -----Original Message----- > >> > > > > From: Steven Sistare <[email protected]> > >> > > > > Subject: Re: [PATCH v2 6/6] accel/kvm: Fix SIGSEGV when execute > >> > > > > "query-balloon" after CPR transfer > >> > > > > > >> > > > > On 9/30/2025 2:00 AM, Duan, Zhenzhong wrote: > >> > > > > > > -----Original Message----- > >> > > > > > > From: Steven Sistare <[email protected]> > >> > > > > > > Subject: Re: [PATCH v2 6/6] accel/kvm: Fix SIGSEGV when > >execute > >> > > > > > > "query-balloon" after CPR transfer > >> > > > > > > > >> > > > > > > On 9/28/2025 4:54 AM, Zhenzhong Duan wrote: > >> > > > > > > > After CPR transfer, source QEMU closes kvm fd and sets > >kvm_state to > >> > > > > > > NULL, > >> > > > > > > > "query-balloon" will check kvm_state->sync_mmu and trigger > >NULL > >> > > > > pointer > >> > > > > > > > reference. > >> > > > > > > > > >> > > > > > > > We don't need to NULL kvm_state as all states in kvm_state > >aren't > >> > > > > released > >> > > > > > > > actually. Just closing kvm fd is enough so we could still > >> > > > > > > > query > >states > >> > > > > > > > through "query_*" qmp command. > >> > > > > > > > >> > > > > > > IMO this does not make sense. Much of the state in kvm_state > >was > >> > > > > derived > >> > > > > > >from ioctl's on the descriptors, and closing them invalidates > >> > > > > > >it. > >Asking > >> > > > > > > historical questions about what used to be makes no sense. > >> > > > > > > >> > > > > > You also have your valid point. > >> > > > > > > >> > > > > > > > >> > > > > > > Clearing kvm_state and setting kvm_allowed=false would be a > >safer fix. > >> > > > > > >> > > > > Setting kvm_allowed=false causes kvm_enabled() to return false > >which should > >> > > > > prevent kvm_state from being dereferenced anywhere: > >> > > > > > >> > > > > #define kvm_enabled() (kvm_allowed) > >> > > > > > >> > > > > Eg for the balloon: > >> > > > > > >> > > > > static bool have_balloon(Error **errp) > >> > > > > { > >> > > > > if (kvm_enabled() && !kvm_has_sync_mmu()) { > >> > > > > >> > > > OK, will do, clearing kvm_allowed is a good idea. > >> > > > > >> > > > Currently there are two qmp commands "query-balloon" and > >"query-cpu-definitions" > >> > > > causing SIGSEGV after CPR-transfer, clearing kvm_allowed helps for > >both. > >> > > > But clearing both kvm_allowed and kvm_state cause SIGSEGV on > >"query-cpu-definitions". > >> > > > > >> > > > I'll send a patch to clearing only kvm_allowed if you are fine with > >> > > > it. > >> > > > >> > > I still don't love the idea. kvm_state is no longer valid. > >> > > > >> > > It sounds like "query-cpu-definitions" is missing a check for > >kvm_enabled(). > >> > > >> > This patch / bug feels like it is side-stepping a general conceptual > >> > question. After cpr-transfer is complete what QMP commands are still > >> > valid to call in general ? This thread mentions two which have caused > >> > bugs, but beyond that it feels like a large subset of QMP comamnds > >> > are conceptually invalid to use. > >> > >> Agreed. I don't see why these commands should be issued to the dead > >instance. > >> How about migration status, quit, and nothing else? > >> Half serious. > > > >I was hoping you'd suggest such a short list :-) > > > >Essentially a case of calling qmp_for_each_command(), and in the callback > >run qmp_disable_command() for everything not in the allow-list. > > Thanks for your suggestion, I agree this is a perfect scheme, but is also > heavy on this corner case😊 > Just curious if we need to do same for live migration. E.g., send qmp command > on source qemu after live migration.
At the end of a normal migration, there is no functional reason to prevent use of any QMP command. Indeed we need to have the ability to carry on using the orignal source QEMU, in case the final steps on migration on the target fail and we need to restart the source. The cpr-transfer is unusual in the restrictions it has after completion. With regards, Daniel -- |: https://berrange.com -o- https://www.flickr.com/photos/dberrange :| |: https://libvirt.org -o- https://fstop138.berrange.com :| |: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|
