Verification done with focal-proposed (now -updates). As discussed with Andreas last Thursday, we had positive, general feedback from Managed Solutions (formerly Bootstack) with the package still in focal-proposed, but wanted to do more detailed testing.
Here is a more detailed verification of the test plan, performed in Oracle Cloud with 2 bare-metal instances (type/shape: BM.Standard.E3, AMD EPYC 7742/"EPYC-Rome") [1].

[1] https://docs.oracle.com/en-us/iaas/Content/Compute/References/computeshapes.htm

I performed an OpenStack Ussuri (Ubuntu Focal) deployment with Juju on virtual machines for all other components; only the Nova Compute components run on the bare-metal instances. (This has many steps and details, not covered here.)

Summary:
-------

This test covered the test plan:

* Start a VM before/after the package upgrade (focal-proposed), checking the VM XML for that flag (e.g., policy change from require to disable)
* Ensure that nova is able to start *with* and *without* enable/disable cpu flag changes.
* Ensure live migration works both ways across the 2 hypervisors *with* and *without* enable/disable cpu flag changes.

Details:
-------

We'll be using hypervisors mfo-bm4 and mfo-bm5:

$ openstack hypervisor list
+----+---------------------+-----------------+-----------------+-------+
| ID | Hypervisor Hostname | Hypervisor Type | Host IP         | State |
+----+---------------------+-----------------+-----------------+-------+
| 1  | mfo-bm4             | QEMU            | 129.146.175.125 | up    |
| 2  | mfo-bm5             | QEMU            | 129.146.120.140 | up    |
+----+---------------------+-----------------+-----------------+-------+

nova-compute packages:

$ for HOST in mfo-bm{4,5}; do ssh $HOST 'hostname; dpkg -s nova-compute | grep -e Package -e Version; echo'; done
mfo-bm4
Package: nova-compute
Version: 2:21.2.4-0ubuntu2.13

mfo-bm5
Package: nova-compute
Version: 2:21.2.4-0ubuntu2.13

The processor has `xsaves` enabled until kernel version 5.15.0-1039-oracle (later kernel versions ship the patch to remove `xsaves` in some EPYC CPUs).
Note that `xsaves` is present in /proc/cpuinfo:

$ for HOST in mfo-bm{4,5}; do ssh $HOST 'hostname; uname -rv; echo; head -n5 /proc/cpuinfo; echo; grep -m1 -o "xsave\w\+" /proc/cpuinfo; echo'; done
mfo-bm4
5.15.0-1039-oracle #45~20.04.1-Ubuntu SMP Fri Jul 14 16:50:19 UTC 2023

processor : 0
vendor_id : AuthenticAMD
cpu family : 23
model : 49
model name : AMD EPYC 7742 64-Core Processor

xsaveopt
xsavec
xsaves
xsaveerptr

mfo-bm5
5.15.0-1039-oracle #45~20.04.1-Ubuntu SMP Fri Jul 14 16:50:19 UTC 2023

processor : 0
vendor_id : AuthenticAMD
cpu family : 23
model : 49
model name : AMD EPYC 7742 64-Core Processor

xsaveopt
xsavec
xsaves
xsaveerptr

Configure nova-compute for the EPYC-Rome CPU model, and restart nova-compute.service:

$ for HOST in mfo-bm{4,5}; do ssh $HOST 'hostname; sudo grep -w -e cpu_mode -e cpu_model -e cpu_model_extra_flags /etc/nova/nova.conf; echo'; done
mfo-bm4
cpu_mode = custom
cpu_model = EPYC-Rome

mfo-bm5
cpu_mode = custom
cpu_model = EPYC-Rome

For starters, let's create a VM and later live migrate it across the 2 hypervisors, back and forth.

$ openstack server create \
    --image cirros \
    --boot-from-volume 8 \
    --flavor flavor-all-one \
    --network test-network \
    --os-compute-api-version 2.74 \
    --host mfo-bm4 \
    vm1

$ openstack server show vm1 | grep -e :hypervisor_hostname -e :instance_name -e :vm_state
| OS-EXT-SRV-ATTR:hypervisor_hostname | mfo-bm4           |
| OS-EXT-SRV-ATTR:instance_name       | instance-0000000f |
| OS-EXT-STS:vm_state                 | active            |

$ for HOST in mfo-bm{4,5}; do ssh $HOST 'hostname; sudo virsh list --name'; done
mfo-bm4
instance-0000000f

mfo-bm5

Note its libvirt XML has the xsaves cpu flag as 'require':

$ ssh mfo-bm4 "sudo virsh dumpxml instance-0000000f | grep -e '<cpu mode=' -e '<model .*EPYC-Rome' -e '<feature .*xsaves'"
  <cpu mode='custom' match='exact' check='full'>
    <model fallback='forbid'>EPYC-Rome</model>
    <feature policy='require' name='xsaves'/>

Now live migrate to the other hypervisor:

$ openstack server migrate --wait --live-migration --host mfo-bm5 --os-compute-api-version 2.30 vm1
Complete

$ openstack server show vm1 | grep -e :hypervisor_hostname -e :instance_name -e :vm_state
| OS-EXT-SRV-ATTR:hypervisor_hostname | mfo-bm5           |
| OS-EXT-SRV-ATTR:instance_name       | instance-0000000f |
| OS-EXT-STS:vm_state                 | active            |

$ for HOST in mfo-bm{4,5}; do ssh $HOST 'hostname; sudo virsh list --name'; done
mfo-bm4

mfo-bm5
instance-0000000f

$ ssh mfo-bm5 "sudo virsh dumpxml instance-0000000f | grep -e '<cpu mode=' -e '<model .*EPYC-Rome' -e '<feature .*xsaves'"
  <cpu mode='custom' match='exact' check='full'>
    <model fallback='forbid'>EPYC-Rome</model>
    <feature policy='require' name='xsaves'/>

And back to the first hypervisor:

$ openstack server migrate --wait --live-migration --host mfo-bm4 --os-compute-api-version 2.30 vm1
Complete

$ openstack server show vm1 | grep -e :hypervisor_hostname -e :instance_name -e :vm_state
| OS-EXT-SRV-ATTR:hypervisor_hostname | mfo-bm4           |
| OS-EXT-SRV-ATTR:instance_name       | instance-0000000f |
| OS-EXT-STS:vm_state                 | active            |

$ for HOST in mfo-bm{4,5}; do ssh $HOST 'hostname; sudo virsh list --name'; done
mfo-bm4
instance-0000000f

mfo-bm5

$ ssh mfo-bm4 "sudo virsh dumpxml instance-0000000f | grep -e '<cpu mode=' -e '<model .*EPYC-Rome' -e '<feature .*xsaves'"
  <cpu mode='custom' match='exact' check='full'>
    <model fallback='forbid'>EPYC-Rome</model>
    <feature policy='require' name='xsaves'/>

All good, live migration is validated on both hypervisors.

Now, upgrade the kernel *only* on the *second* hypervisor (mfo-bm5). Note that `xsaves` is no longer listed in its /proc/cpuinfo.

$ for HOST in mfo-bm{4,5}; do ssh $HOST 'hostname; uname -rv; echo; head -n5 /proc/cpuinfo; echo; grep -m1 -o "xsave\w\+" /proc/cpuinfo; echo'; done
mfo-bm4
5.15.0-1039-oracle #45~20.04.1-Ubuntu SMP Fri Jul 14 16:50:19 UTC 2023

processor : 0
vendor_id : AuthenticAMD
cpu family : 23
model : 49
model name : AMD EPYC 7742 64-Core Processor

xsaveopt
xsavec
xsaves
xsaveerptr

mfo-bm5
5.15.0-1059-oracle #65~20.04.1-Ubuntu SMP Fri Apr 19 14:17:36 UTC 2024

processor : 0
vendor_id : AuthenticAMD
cpu family : 23
model : 49
model name : AMD EPYC 7742 64-Core Processor

xsaveopt
xsavec
xsaveerptr

Ok, now try to migrate vm1 to mfo-bm5. It does NOT migrate (the command still prints `Complete`, but vm1 remains on mfo-bm4).

$ openstack server migrate --wait --live-migration --host mfo-bm5 --os-compute-api-version 2.30 vm1; date
Complete
Sat Aug 24 20:46:29 UTC 2024

$ openstack server show vm1 | grep -e :hypervisor_hostname -e :instance_name -e :vm_state
| OS-EXT-SRV-ATTR:hypervisor_hostname | mfo-bm4           |
| OS-EXT-SRV-ATTR:instance_name       | instance-0000000f |
| OS-EXT-STS:vm_state                 | active            |

Checking the logs, nova complains about `xsaves`. This happens because the vm1 libvirt XML has `xsaves` as `require`, but the other hypervisor is no longer able to provide that flag.

mfo-bm4:/var/log/nova/nova-compute.log

2024-08-24 20:46:28.047 110400 ERROR nova.virt.libvirt.driver [-] [instance: 10ca54d7-1b51-47bc-9ace-13883b295d55] Live Migration failure: operation failed: guest CPU doesn't match specification: missing features: xsaves: libvirt.libvirtError: operation failed: guest CPU doesn't match specification: missing features: xsaves

Note that the other way around works: vm2 starts on mfo-bm5 without xsaves, and thus can migrate to mfo-bm4 (which has xsaves) and back. See that vm2's libvirt XML has `xsaves` as `disable`.

$ openstack server create \
    --image cirros \
    --boot-from-volume 8 \
    --flavor flavor-all-one \
    --network test-network \
    --os-compute-api-version 2.74 \
    --host mfo-bm5 \
    vm2

$ openstack server show vm2 | grep -e :hypervisor_hostname -e :instance_name -e :vm_state
| OS-EXT-SRV-ATTR:hypervisor_hostname | mfo-bm5           |
| OS-EXT-SRV-ATTR:instance_name       | instance-00000010 |
| OS-EXT-STS:vm_state                 | active            |

$ for HOST in mfo-bm{4,5}; do ssh $HOST 'hostname; sudo virsh list --name'; done
mfo-bm4
instance-0000000f

mfo-bm5
instance-00000010

$ ssh mfo-bm5 "sudo virsh dumpxml instance-00000010 | grep -e '<cpu mode=' -e '<model .*EPYC-Rome' -e '<feature .*xsaves'"
  <cpu mode='custom' match='exact' check='full'>
    <model fallback='forbid'>EPYC-Rome</model>
    <feature policy='disable' name='xsaves'/>

$ openstack server migrate --wait --live-migration --host mfo-bm4 --os-compute-api-version 2.30 vm2
Complete

$ openstack server show vm2 | grep -e :hypervisor_hostname -e :instance_name -e :vm_state
| OS-EXT-SRV-ATTR:hypervisor_hostname | mfo-bm4           |
| OS-EXT-SRV-ATTR:instance_name       | instance-00000010 |
| OS-EXT-STS:vm_state                 | active            |

$ for HOST in mfo-bm{4,5}; do ssh $HOST 'hostname; sudo virsh list --name'; done
mfo-bm4
instance-0000000f
instance-00000010

mfo-bm5

$ ssh mfo-bm4 "sudo virsh dumpxml instance-00000010 | grep -e '<cpu mode=' -e '<model .*EPYC-Rome' -e '<feature .*xsaves'"
  <cpu mode='custom' match='exact' check='full'>
    <model fallback='forbid'>EPYC-Rome</model>
    <feature policy='disable' name='xsaves'/>

$ openstack server migrate --wait --live-migration --host mfo-bm5 --os-compute-api-version 2.30 vm2
Complete

$ for HOST in mfo-bm{4,5}; do ssh $HOST 'hostname; sudo virsh list --name'; done
mfo-bm4
instance-0000000f

mfo-bm5
instance-00000010

$ ssh mfo-bm5 "sudo virsh dumpxml instance-00000010 | grep -e '<cpu mode=' -e '<model .*EPYC-Rome' -e '<feature .*xsaves'"
  <cpu mode='custom' match='exact' check='full'>
    <model fallback='forbid'>EPYC-Rome</model>
    <feature policy='disable' name='xsaves'/>

Now, enable the new option: 'cpu_model_extra_flags = -xsaves', and restart nova-compute.service.

$ for HOST in mfo-bm{4,5}; do ssh $HOST 'hostname; sudo grep -w -e cpu_mode -e cpu_model -e cpu_model_extra_flags /etc/nova/nova.conf; echo'; done
mfo-bm4
cpu_mode = custom
cpu_model = EPYC-Rome
cpu_model_extra_flags = -xsaves

mfo-bm5
cpu_mode = custom
cpu_model = EPYC-Rome
cpu_model_extra_flags = -xsaves

Nothing changes in the VM immediately, of course; `xsaves` is still `require`:

$ ssh mfo-bm4 "sudo virsh dumpxml instance-0000000f | grep -e '<cpu mode=' -e '<model .*EPYC-Rome' -e '<feature .*xsaves'"
  <cpu mode='custom' match='exact' check='full'>
    <model fallback='forbid'>EPYC-Rome</model>
    <feature policy='require' name='xsaves'/>

But it does change after a restart (which regenerates the libvirt VM XML and uses a new QEMU process):

$ openstack server stop vm1

$ openstack server show vm1 | grep -e :hypervisor_hostname -e :instance_name -e :vm_state
| OS-EXT-SRV-ATTR:hypervisor_hostname | mfo-bm4           |
| OS-EXT-SRV-ATTR:instance_name       | instance-0000000f |
| OS-EXT-STS:vm_state                 | stopped           |

$ openstack server start vm1

$ openstack server show vm1 | grep -e :hypervisor_hostname -e :instance_name -e :vm_state
| OS-EXT-SRV-ATTR:hypervisor_hostname | mfo-bm4           |
| OS-EXT-SRV-ATTR:instance_name       | instance-0000000f |
| OS-EXT-STS:vm_state                 | active            |

Note that xsaves is now *disabled* in the VM:

$ ssh mfo-bm4 "sudo virsh dumpxml instance-0000000f | grep -e '<cpu mode=' -e '<model .*EPYC-Rome' -e '<feature .*xsaves'"
  <cpu mode='custom' match='exact' check='full'>
    <model fallback='forbid'>EPYC-Rome</model>
    <feature policy='disable' name='xsaves'/>

Despite the host still having the flag:

$ ssh mfo-bm4 'hostname; uname -rv; echo; head -n5 /proc/cpuinfo; echo; grep -m1 -o "xsave\w\+" /proc/cpuinfo'
mfo-bm4
5.15.0-1039-oracle #45~20.04.1-Ubuntu SMP Fri Jul 14 16:50:19 UTC 2023

processor : 0
vendor_id : AuthenticAMD
cpu family : 23
model : 49
model name : AMD EPYC 7742 64-Core Processor

xsaveopt
xsavec
xsaves
xsaveerptr

And now it *CAN* migrate to the other hypervisor without the flag, as it is no longer `require`d.

$ openstack server migrate --wait --live-migration --host mfo-bm5 --os-compute-api-version 2.30 vm1
Complete

$ openstack server show vm1 | grep -e :hypervisor_hostname -e :instance_name -e :vm_state
| OS-EXT-SRV-ATTR:hypervisor_hostname | mfo-bm5           |
| OS-EXT-SRV-ATTR:instance_name       | instance-0000000f |
| OS-EXT-STS:vm_state                 | active            |

$ ssh mfo-bm5 'hostname; uname -rv; echo; head -n5 /proc/cpuinfo; echo; grep -m1 -o "xsave\w\+" /proc/cpuinfo'
mfo-bm5
5.15.0-1059-oracle #65~20.04.1-Ubuntu SMP Fri Apr 19 14:17:36 UTC 2024

processor : 0
vendor_id : AuthenticAMD
cpu family : 23
model : 49
model name : AMD EPYC 7742 64-Core Processor

xsaveopt
xsavec
xsaveerptr

And back:

$ openstack server migrate --wait --live-migration --host mfo-bm4 --os-compute-api-version 2.30 vm1
Complete

$ openstack server show vm1 | grep -e :hypervisor_hostname -e :instance_name -e :vm_state
| OS-EXT-SRV-ATTR:hypervisor_hostname | mfo-bm4           |
| OS-EXT-SRV-ATTR:instance_name       | instance-0000000f |
| OS-EXT-STS:vm_state                 | active            |

$ ssh mfo-bm4 "sudo virsh dumpxml instance-0000000f | grep -e '<cpu mode=' -e '<model .*EPYC-Rome' -e '<feature .*xsaves'"
  <cpu mode='custom' match='exact' check='full'>
    <model fallback='forbid'>EPYC-Rome</model>
    <feature policy='disable' name='xsaves'/>

Oh, and the previously running/other VM too:

$ openstack server show vm2 | grep -e :hypervisor_hostname -e :instance_name -e :vm_state
| OS-EXT-SRV-ATTR:hypervisor_hostname | mfo-bm5           |
| OS-EXT-SRV-ATTR:instance_name       | instance-00000010 |
| OS-EXT-STS:vm_state                 | active            |

$ openstack server migrate --wait --live-migration --host mfo-bm4 --os-compute-api-version 2.30 vm2
Complete

$ openstack server show vm2 | grep -e :hypervisor_hostname -e :instance_name -e :vm_state
| OS-EXT-SRV-ATTR:hypervisor_hostname | mfo-bm4           |
| OS-EXT-SRV-ATTR:instance_name       | instance-00000010 |
| OS-EXT-STS:vm_state                 | active            |

$ openstack server migrate --wait --live-migration --host mfo-bm5 --os-compute-api-version 2.30 vm2
Complete

$ openstack server show vm2 | grep -e :hypervisor_hostname -e :instance_name -e :vm_state
| OS-EXT-SRV-ATTR:hypervisor_hostname | mfo-bm5           |
| OS-EXT-SRV-ATTR:instance_name       | instance-00000010 |
| OS-EXT-STS:vm_state                 | active            |

Great. Now, the default, without the new option in nova.conf, to validate that nova itself works in that case. Just like above, we expect migration to fail for a VM created on mfo-bm4 and migrated to mfo-bm5, and to work the other way around.
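
The expectation above follows from comparing the guest's `xsaves` policy with what the target host offers. A minimal sketch of that reasoning, assuming only this one flag matters (libvirt checks the full CPU definition, so this is a simplification); the function names `xsaves_policy`, `host_has_xsaves` and `can_land` are hypothetical helpers, not part of nova or libvirt:

```shell
# Print the policy ('require', 'disable', ...) of the xsaves feature found in
# libvirt domain XML read from stdin; prints nothing if there is no such element.
xsaves_policy() {
    sed -n "s|.*<feature policy='\([a-z]*\)' name='xsaves'/>.*|\1|p"
}

# Succeed if the flag listing read from stdin contains xsaves as a whole word.
host_has_xsaves() {
    grep -qw xsaves
}

# can_land POLICY TARGET_HAS_FLAG(yes|no)
# Simplified rule seen in this test: only 'require' on a target that lacks
# the flag blocks the live migration.
can_land() {
    policy=$1
    target_has=$2
    if [ "$policy" = require ] && [ "$target_has" = no ]; then
        return 1
    fi
    return 0
}
```

For example, `ssh mfo-bm4 "sudo virsh dumpxml instance-0000000f" | xsaves_policy` would print the guest's policy, and `ssh mfo-bm5 cat /proc/cpuinfo | host_has_xsaves` would report whether the target still exposes the flag.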

Let's try migrations back and forth:

$ openstack server stop vm1

$ openstack server show vm1 | grep -e :hypervisor_hostname -e :instance_name -e :vm_state
| OS-EXT-SRV-ATTR:hypervisor_hostname | mfo-bm4           |
| OS-EXT-SRV-ATTR:instance_name       | instance-0000000f |
| OS-EXT-STS:vm_state                 | stopped           |

$ openstack server stop vm2

$ openstack server show vm2 | grep -e :hypervisor_hostname -e :instance_name -e :vm_state
| OS-EXT-SRV-ATTR:hypervisor_hostname | mfo-bm5           |
| OS-EXT-SRV-ATTR:instance_name       | instance-00000010 |
| OS-EXT-STS:vm_state                 | stopped           |

$ openstack server start vm1

$ openstack server show vm1 | grep -e :hypervisor_hostname -e :instance_name -e :vm_state
| OS-EXT-SRV-ATTR:hypervisor_hostname | mfo-bm4           |
| OS-EXT-SRV-ATTR:instance_name       | instance-0000000f |
| OS-EXT-STS:vm_state                 | active            |

$ openstack server start vm2

$ openstack server show vm2 | grep -e :hypervisor_hostname -e :instance_name -e :vm_state
| OS-EXT-SRV-ATTR:hypervisor_hostname | mfo-bm5           |
| OS-EXT-SRV-ATTR:instance_name       | instance-00000010 |
| OS-EXT-STS:vm_state                 | active            |

$ for HOST in mfo-bm{4,5}; do ssh $HOST 'hostname; sudo virsh list --name'; done
mfo-bm4
instance-0000000f

mfo-bm5
instance-00000010

$ ssh mfo-bm4 "sudo virsh dumpxml instance-0000000f | grep -e '<cpu mode=' -e '<model .*EPYC-Rome' -e '<feature .*xsaves'"
  <cpu mode='custom' match='exact' check='full'>
    <model fallback='forbid'>EPYC-Rome</model>
    <feature policy='require' name='xsaves'/>

$ ssh mfo-bm5 "sudo virsh dumpxml instance-00000010 | grep -e '<cpu mode=' -e '<model .*EPYC-Rome' -e '<feature .*xsaves'"
  <cpu mode='custom' match='exact' check='full'>
    <model fallback='forbid'>EPYC-Rome</model>
    <feature policy='disable' name='xsaves'/>

Indeed, only vm2 migrated, as vm1 on the first hypervisor no longer has the xsaves 'disable' configured.

$ openstack server show vm1 | grep -e :hypervisor_hostname -e :instance_name -e :vm_state
| OS-EXT-SRV-ATTR:hypervisor_hostname | mfo-bm4           |
| OS-EXT-SRV-ATTR:instance_name       | instance-0000000f |
| OS-EXT-STS:vm_state                 | active            |

$ openstack server show vm2 | grep -e :hypervisor_hostname -e :instance_name -e :vm_state
| OS-EXT-SRV-ATTR:hypervisor_hostname | mfo-bm4           |
| OS-EXT-SRV-ATTR:instance_name       | instance-00000010 |
| OS-EXT-STS:vm_state                 | active            |

Let's upgrade mfo-bm4's kernel (so xsaves is removed), and try again.

$ openstack server stop vm1
$ openstack server stop vm2

# Restart mfo-bm4; kernel upgrade removes xsaves.

Neither mfo-bm4 nor mfo-bm5 has xsaves now:

$ for HOST in mfo-bm{4,5}; do ssh $HOST 'hostname; uname -rv; echo; head -n5 /proc/cpuinfo; echo; grep -m1 -o "xsave\w\+" /proc/cpuinfo; echo'; done
mfo-bm4
5.15.0-1059-oracle #65~20.04.1-Ubuntu SMP Fri Apr 19 14:17:36 UTC 2024

processor : 0
vendor_id : AuthenticAMD
cpu family : 23
model : 49
model name : AMD EPYC 7742 64-Core Processor

xsaveopt
xsavec
xsaveerptr

mfo-bm5
5.15.0-1059-oracle #65~20.04.1-Ubuntu SMP Fri Apr 19 14:17:36 UTC 2024

processor : 0
vendor_id : AuthenticAMD
cpu family : 23
model : 49
model name : AMD EPYC 7742 64-Core Processor

xsaveopt
xsavec
xsaveerptr

Let's start and migrate the VMs with default settings:

$ openstack server show vm1 | grep -e :hypervisor_hostname -e :instance_name -e :vm_state
| OS-EXT-SRV-ATTR:hypervisor_hostname | mfo-bm4           |
| OS-EXT-SRV-ATTR:instance_name       | instance-0000000f |
| OS-EXT-STS:vm_state                 | active            |

$ openstack server show vm2 | grep -e :hypervisor_hostname -e :instance_name -e :vm_state
| OS-EXT-SRV-ATTR:hypervisor_hostname | mfo-bm4           |
| OS-EXT-SRV-ATTR:instance_name       | instance-00000010 |
| OS-EXT-STS:vm_state                 | active            |

$ ssh mfo-bm4 "sudo virsh dumpxml instance-0000000f | grep -e '<cpu mode=' -e '<model .*EPYC-Rome' -e '<feature .*xsaves'"
  <cpu mode='custom' match='exact' check='full'>
    <model fallback='forbid'>EPYC-Rome</model>
    <feature policy='disable' name='xsaves'/>

$ ssh mfo-bm4 "sudo virsh dumpxml instance-00000010 | grep -e '<cpu mode=' -e '<model .*EPYC-Rome' -e '<feature .*xsaves'"
  <cpu mode='custom' match='exact' check='full'>
    <model fallback='forbid'>EPYC-Rome</model>
    <feature policy='disable' name='xsaves'/>

Both VMs have xsaves disabled by default now, and can freely migrate too.

From the first hypervisor to the second:

$ openstack server migrate --wait --live-migration --host mfo-bm5 --os-compute-api-version 2.30 vm1
Complete
$ openstack server migrate --wait --live-migration --host mfo-bm5 --os-compute-api-version 2.30 vm2
Complete

$ openstack server show vm1 | grep -e :hypervisor_hostname -e :instance_name -e :vm_state
| OS-EXT-SRV-ATTR:hypervisor_hostname | mfo-bm5           |
| OS-EXT-SRV-ATTR:instance_name       | instance-0000000f |
| OS-EXT-STS:vm_state                 | active            |

$ openstack server show vm2 | grep -e :hypervisor_hostname -e :instance_name -e :vm_state
| OS-EXT-SRV-ATTR:hypervisor_hostname | mfo-bm5           |
| OS-EXT-SRV-ATTR:instance_name       | instance-00000010 |
| OS-EXT-STS:vm_state                 | active            |

And back to the first hypervisor:

$ openstack server migrate --wait --live-migration --host mfo-bm4 --os-compute-api-version 2.30 vm1
Complete
$ openstack server migrate --wait --live-migration --host mfo-bm4 --os-compute-api-version 2.30 vm2
Complete

$ openstack server show vm1 | grep -e :hypervisor_hostname -e :instance_name -e :vm_state
| OS-EXT-SRV-ATTR:hypervisor_hostname | mfo-bm4           |
| OS-EXT-SRV-ATTR:instance_name       | instance-0000000f |
| OS-EXT-STS:vm_state                 | active            |

$ openstack server show vm2 | grep -e :hypervisor_hostname -e :instance_name -e :vm_state
| OS-EXT-SRV-ATTR:hypervisor_hostname | mfo-bm4           |
| OS-EXT-SRV-ATTR:instance_name       | instance-00000010 |
| OS-EXT-STS:vm_state                 | active            |

All good!
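
Since the failed migration earlier in this test still printed `Complete`, each migration here was followed by a `server show` to confirm where the VM actually ended up. A minimal sketch that bundles both steps, assuming the `openstack` client is already configured for the cloud; `migrate_and_verify` is a hypothetical helper name, not an existing command:

```shell
# Hypothetical helper: live-migrate a server, then confirm where it really runs.
# Usage: migrate_and_verify VM TARGET_HOST
migrate_and_verify() {
    vm=$1
    target=$2
    # Same invocation as used throughout this verification.
    openstack server migrate --wait --live-migration --host "$target" \
        --os-compute-api-version 2.30 "$vm" || return 1
    # Ask for just the hypervisor hostname, as a bare value.
    actual=$(openstack server show "$vm" -f value \
        -c OS-EXT-SRV-ATTR:hypervisor_hostname)
    if [ "$actual" = "$target" ]; then
        echo "OK: $vm is on $target"
    else
        echo "FAIL: $vm is on $actual, expected $target"
        return 1
    fi
}
```

For example, `migrate_and_verify vm1 mfo-bm5 && migrate_and_verify vm1 mfo-bm4` would exercise one round trip and stop at the first migration that did not land on the requested host.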

** Tags removed: verification-needed verification-needed-focal
** Tags added: verification-done verification-done-focal

https://bugs.launchpad.net/bugs/2048517

Title:
  EPYC-Rome model without XSAVES may break live migration since the removal of the flag on the physical CPU
