** Description changed:

- Upstream patch: https://github.com/tianocore/edk2/pull/10856/commits/f8a8bb717c53c651750025aefaa5654f383bd02e
- (To be added to Plucky via Debian)
+ Upstream patch:
+ https://github.com/tianocore/edk2/pull/10856/commits/f8a8bb717c53c651750025aefaa5654f383bd02e

SRU Justification:

[ Impact ]

Due to an inefficiency in the way older host kernels manage pfnmaps for
guest VM memory ranges[0], guests with large-BAR GPUs passed through
have a very long (multiple minutes) initialization time when the MMIO
window advertised by OVMF is sufficiently sized for the passed-through
BARs (i.e., the correct OVMF behavior). However, in the past, users
have benefited from fast guest boot times when OVMF advertised an MMIO
window that was too small to accommodate the full BAR, since this
resulted in the long PCI initialization process being skipped (and
retried later in a way that omitted the slow path, if pci=realloc
pci=nocrs were set).

While the root cause is being fully addressed in the upstream
kernel[1], that solution relies on huge pfnmap support, which is not
expected to be backported into the 6.8 or 6.11 -generic kernels. As a
result, the only kernel improvement supported on those kernels is this
patch[2], which reduces the extra boot time by about half.
Unfortunately, that boot time is still an average of 1-3 minutes longer
per VM boot than what can be achieved when the host is running a
version of OVMF without PlatformDynamicMmioWindow (PDMW) support
(introduced in [3]), as was the case with Jammy's version of OVMF.

[ Test Plan ]

I have confirmed that this cleanly applies to the latest noble OVMF and
prepared a test PPA:
https://launchpad.net/~mitchellaugustin/+archive/ubuntu/edk2-honor-user-mmio-window

I have verified that this knob works as expected for values large
enough for the GPU MMIO windows (as supported by the original behavior)
and for values smaller than what PDMW computes (newly introduced by
this patch). On DGX H100, forcing a value of 1024 (lower than required
for the passed-through GPUs) results in the desired fast boot time,
with the GPUs still usable as long as pci=nocrs pci=realloc are set in
the guest, even on legacy kernels. I also observed no regressions and
no change in behavior when X-PciMmio64Mb is absent or above the
PDMW-calculated value.
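For reference, the same fw_cfg knob can also be forced on a raw QEMU
command line, outside of libvirt. Below is a minimal hypothetical
sketch; the machine type, memory size, OVMF pflash path, and VFIO host
address are illustrative placeholders, not the exact invocation used in
the testing above:

    # Hypothetical invocation: adjust the firmware path and the
    # passed-through GPU's host PCI address to match your system.
    qemu-system-x86_64 \
        -machine q35,accel=kvm -m 16G \
        -drive if=pflash,format=raw,readonly=on,file=/usr/share/OVMF/OVMF_CODE_4M.fd \
        -device vfio-pci,host=0000:01:00.0 \
        -fw_cfg name=opt/ovmf/X-PciMmio64Mb,string=1024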
[ Fix ]

Since there is no way to force the use of the classic MMIO window
size[4] in any version of OVMF after [3], and since we have a use case
for such functionality on legacy distro kernels that would yield
significant, recurring compute time savings across all impacted VMs,
apply this change to the knob's behavior to make this workaround
possible on Noble.

[ Where problems could occur ]

If there are user deployments on Noble in which X-PciMmio64Mb is
currently explicitly set to a value smaller than the PDMW-computed
value, those deployments are currently ignoring the X-PciMmio64Mb value
and instead using the one calculated by PDMW. If any such deployments
exist, *and* they specify values that are too small for their GPUs'
MMIO windows, *and* they do not have `pci=realloc pci=nocrs` set, their
passed-through GPUs will stop working until they either raise
X-PciMmio64Mb to be large enough for their MMIO windows, remove
X-PciMmio64Mb from their config (if PDMW's value is high enough), or
add `pci=nocrs pci=realloc` to their guest kernel config to obtain the
benefits of this patch.

However, from the perspective of OVMF, this change makes the
X-PciMmio64Mb behavior more consistent, so I do not believe the above
risk should block inclusion of this patch. (I also suspect those
circumstances are uncommon: anyone whose use of X-PciMmio64Mb has any
effect today must be specifying a value larger than PDMW's, and such
users will not be impacted by this change.) Additionally, this patch
only adds new opt-in functionality and does not affect anyone not using
X-PciMmio64Mb, so it should carry little regression risk beyond that.

[ Example of how to enable a faster VM boot time via libvirt XML ]

1. Install a version of OVMF which contains the patch detailed in this
   bug report on your *host*.
2. In the guest VM, set the following kernel command line options:
   `pci=nocrs pci=realloc` (in /etc/default/grub, or equivalent; see
   the sketch after this list).
3. On the host, run `virsh edit <your VM>`.
4. Adjust your top-level <domain> element to add the qemu xmlns:
   <domain type='kvm' xmlns:qemu='http://libvirt.org/schemas/domain/qemu/1.0'>
5. Add the following to your XML within <domain></domain>:
   <qemu:commandline>
     <qemu:arg value='-fw_cfg'/>
     <qemu:arg value='name=opt/ovmf/X-PciMmio64Mb,string=1024'/>
   </qemu:commandline>
   (Note: the "1024" is arbitrary; any value *too small* for your GPU
   aperture will work here.)
6. Start the VM. It should boot as fast as it would on Jammy, and the
   GPUs should be usable as long as step (2) was done properly.
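For step (2), one way to apply the guest kernel parameters on a
GRUB-based Ubuntu guest is sketched below; the pre-existing "quiet
splash" contents of GRUB_CMDLINE_LINUX_DEFAULT are illustrative:

    # In the guest, edit /etc/default/grub so the default command line
    # includes both options (existing contents shown are illustrative):
    GRUB_CMDLINE_LINUX_DEFAULT="quiet splash pci=nocrs pci=realloc"

    # Regenerate the GRUB configuration and reboot:
    sudo update-grub
    sudo reboot

    # After reboot, verify that the options took effect:
    cat /proc/cmdline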
[0]: https://lore.kernel.org/all/cahta-uyp07fgm6t1ozqkqadsa5jrzo0reneyzgqzub4mdrr...@mail.gmail.com/
[1]: https://lore.kernel.org/all/[email protected]/
[2]: https://lore.kernel.org/all/[email protected]/
[3]: https://github.com/tianocore/edk2/commit/ecb778d0ac62560aa172786ba19521f27bc3f650
[4]: https://edk2.groups.io/g/devel/topic/109651206?p=Created,,,20,1,0,0

+
+ Search terms: Slow VM boot time, GPU passthrough, QEMU, libvirt,
+ several minutes, slow virtual machine boot, ubuntu noble, ubuntu
+ 24.04
