And Anushree, would you also agree that bug LP: 2067383 / Bugzilla: 206641 (https://bugs.launchpad.net/bugs/2067383) can be considered a duplicate of this one (LP: 2076587 / Bugzilla: 208538)? They seem suspiciously similar ...
** Description changed:

+ SRU Justification:
+
+ [ Impact ]
+
+  * While running a (nested) KVM guest on Power 10 (with PowerVM)
+    and performing a CPU hotplug, trying to set to 68 vCPUs,
+    the KVM guest crashes.
+
+  * In the failure case the KVM guest has maxvcpus 128,
+    and it starts fine with an initial value of 4 vCPUs,
+    but fails after a larger increase (here to 68 vCPUs).
+
+  * The error reported is:
+    [ 662.102542] KVM: Create Guest vcpu hcall failed, rc=-44
+    error: Unable to read from monitor: Connection reset by peer
+
+  * This especially seems to happen on memory-constrained systems.
+
+  * This can be avoided by pre-creating the vCPUs, parking them on success
+    and returning an error otherwise, which then leads to a graceful error
+    in case of a vCPU hotplug failure, while the guest keeps running.
+
+ [ Fix ]
+
+  * 08c3286822 ("accel/kvm: Extract common KVM vCPU {creation,parking}
+    code") [pre-req]
+
+  * c6a3d7bc9e ("accel/kvm: Introduce kvm_create_and_park_vcpu() helper")
+
+  * 18530e7c57 ("cpu-common.c: export cpu_get_free_index to be reused
+    later")
+
+  * cfb52d07f5 ("target/ppc: handle vcpu hotplug failure gracefully")
+
+ [ Test Plan ]
+
+  * Set up an IBM Power10 system (with firmware FW1060 or newer,
+    that comes with nested KVM support), running Ubuntu Server 24.04.
+
+  * Install and configure KVM on this system with a (higher)
+    maxvcpus value of 128, but have a (smaller) initial value of 4 vCPUs.
+    $ virsh define ubu2404.xml
+
+  * Now, after successful definition, start the VM:
+    $ virsh start ubu2404 --console
+
+  * Once the VM is up and running, increase the vCPUs to a larger value,
+    here 68:
+    $ virsh setvcpus ubu2404 68
+
+  * A system with an unpatched qemu will crash, showing:
+    [ 662.102542] KVM: Create Guest vcpu hcall failed, rc=-44
+    error: Unable to read from monitor: Connection reset by peer
+
+  * A patched environment will:
+    - either just successfully hotplug the new amount (68) of vCPUs
+      without further messages
+    - or (in case memory is very constrained) print a (graceful) error
+      message that the hotplug couldn't be performed,
+      but stay up and running:
+      error: internal error: unable to execute QEMU command 'device_add': \
+      kvmppc_cpu_realize: vcpu hotplug failed with -12
+
+  * Since certain firmware is required, IBM is doing the test and validation
+    (and already successfully verified based on the PPA test builds).
+
+ [ Where problems could occur ]
+
+  * All modifications were done in target/ppc/kvm.c
+    and are with that limited to the IBM Power platform,
+    and will not affect other architectures.
+
+  * The implementation of the pre-creation of vCPUs (init cpu_target_realize)
+    may lead to early failures when a user doesn't expect to have such an
+    amount of vCPUs yet.
+
+  * And the pre-creation and especially parking (kvm_create_and_park_vcpu)
+    will probably consume more resources than before.
+
+  * Hence a patched system might run with a reduced maximum number of vCPUs,
+    but instead of crashing hard it will fail gracefully on lack of resources.
+
+  * This case and the patch(es) are also discussed in more detail here:
+    https://lore.kernel.org/qemu-devel/20240516053211.145504-1-hars...@linux.ibm.com/T/#t
+    and here:
+    https://bugzilla.redhat.com/show_bug.cgi?id=2304078
+
+ [ Other Info ]
+
+  * The code was accepted upstream with qemu v9.1.0(-rc0),
+    and the upload to oracular was done,
+    so now only noble is affected.
+
+  * Ubuntu releases older than noble are not affected,
+    since (nested) KVM virtualization on P10
+    was introduced starting with noble.
+
+ __________
+
  == Comment: #0 - SEETEENA THOUFEEK <sthou...@in.ibm.com> - 2024-08-12 03:47:06 ==
  +++ This bug was initially created as a clone of Bug #205620 +++

  ---Problem Description---
  cpu hotplug crashes the guest!
-
+
  ---Steps to Reproduce---
-  I have been trying for the CPU hotplugging to the guest with maxvcpus as 128 and current value I am giving as 4! but when I try to hotplug 68 vcpus to the guest, it crashes and we get error message as:
+ I have been trying for the CPU hotplugging to the guest with maxvcpus as 128 and current value I am giving as 4! but when I try to hotplug 68 vcpus to the guest, it crashes and we get error message as:
  [ 303.808494] KVM: Create Guest vcpu hcall failed, rc=-44
  error: Unable to read from monitor: Connection reset by peer
-
  Steps to reproduce:
  1) virsh define bug.xml
  2) virsh start Fedora39 --console
  3) virsh setvcpus Fedora39 68
-  Output :
+ Output :
  [ 662.102542] KVM: Create Guest vcpu hcall failed, rc=-44
  error: Unable to read from monitor: Connection reset by peer
-
-  If resources are less, in my thinking it should fail gracefully!
+ If resources are less, in my thinking it should fail gracefully!
  Attaching the XML file that i have used and will post the observations on MDC system there i saw this same failure on higher number.
  fixed with upstream commit https://github.com/qemu/qemu/commit/cfb52d07f53aa916003d43f69c945c2b42bc6374
-
- Machine Type = na
-
+
+ Machine Type = na
+
  ---Debugger---
  A debugger is not configured
-
- Contact Information = sthou...@in.ibm.com
-
+
+ Contact Information = sthou...@in.ibm.com
+
  ---uname output---
  NA

** Changed in: ubuntu-power-systems
       Status: Triaged => In Progress

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/2076587

Title:
  cpu hotplug crashes the guest!

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu-power-systems/+bug/2076587/+subscriptions

-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs
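
The three outcomes named in the test plan (successful hotplug, graceful failure with the guest still running, hard crash) could be told apart with a small helper like the one below. This is only a sketch and not part of the SRU: the function name classify_hotplug is hypothetical, and it assumes that a crashed guest is no longer reported as "running" by `virsh domstate`.

```shell
#!/bin/sh
# Hypothetical helper (not part of the SRU): classify the result of the
# hotplug step from the test plan.
#   $1 = exit status of `virsh setvcpus`
#   $2 = domain state as printed by `virsh domstate`
classify_hotplug() {
    rc=$1
    state=$2
    if [ "$state" != "running" ]; then
        # Assumption: a crashed guest no longer shows up as "running".
        echo "crashed"
    elif [ "$rc" -eq 0 ]; then
        # setvcpus succeeded and the guest is still up.
        echo "hotplugged"
    else
        # setvcpus reported an error, but the guest kept running.
        echo "graceful-failure"
    fi
}

# Usage against a real guest (domain name taken from the test plan):
#   virsh setvcpus ubu2404 68; rc=$?
#   classify_hotplug "$rc" "$(virsh domstate ubu2404)"
```

On an unpatched qemu the expectation from the description above would be "crashed"; on a patched one either "hotplugged" or, on a memory-constrained host, "graceful-failure".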