** Description changed:

- Upstream patch: https://github.com/tianocore/edk2/pull/10856/commits/f8a8bb717c53c651750025aefaa5654f383bd02e
- (To be added to Plucky via Debian)
+ Upstream patch:
+ https://github.com/tianocore/edk2/pull/10856/commits/f8a8bb717c53c651750025aefaa5654f383bd02e

SRU Justification:

[ Impact ]

Due to an inefficiency in the way older host kernels manage pfnmaps for
guest VM memory ranges[0], guests with large-BAR GPUs passed through
have a very long (multiple minutes) initialization time when the MMIO
window advertised by OVMF is sufficiently sized for the passed-through
BARs (i.e., the correct OVMF behavior). However, in the past, users
have benefited from fast guest boot times when OVMF advertised an MMIO
window that was too small to accommodate the full BAR, since this
resulted in the long PCI initialization process being skipped (and
retried later in a way that omitted the slow path, if pci=realloc
pci=nocrs were set).

While the root cause is being fully addressed in the upstream
kernel[1], that solution relies on huge pfnmap support, which is not
expected to be backported into the 6.8 or 6.11 -generic kernels. As a
result, the only kernel improvement supported on those kernels is this
patch[2], which reduces the extra boot time by about half.
Unfortunately, that boot time is still an average of 1-3 minutes longer
per VM boot than what can be achieved when the host is running a
version of OVMF without PlatformDynamicMmioWindow (PDMW) support
(introduced in [3]), as was the case with Jammy's version of OVMF.

[ Test Plan ]

I have confirmed that this cleanly applies to the latest noble OVMF and
prepared a test PPA:
https://launchpad.net/~mitchellaugustin/+archive/ubuntu/edk2-honor-user-mmio-window

I have verified that this knob works as expected for values large
enough for the GPU MMIO windows (as supported by the original behavior)
and for values smaller than what PDMW computes (newly introduced by
this patch). On DGX H100, forcing a value of 1024 (lower than required
for the passed-through GPUs) results in the desired fast boot time,
with the GPUs still usable as long as pci=nocrs pci=realloc are set in
the guest, even on legacy kernels. I also observed no regressions and
no change in behavior when X-PciMmio64Mb is absent or above the
PDMW-calculated value.
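For reference, the same fw_cfg knob can also be forced on a raw QEMU
command line, outside of libvirt. Below is a minimal hypothetical
sketch; the machine type, memory size, OVMF pflash path, and VFIO host
address are illustrative placeholders, not the exact invocation used in
the testing above:

    # Hypothetical invocation: adjust the firmware path and the
    # passed-through GPU's host PCI address to match your system.
    qemu-system-x86_64 \
        -machine q35,accel=kvm -m 16G \
        -drive if=pflash,format=raw,readonly=on,file=/usr/share/OVMF/OVMF_CODE_4M.fd \
        -device vfio-pci,host=0000:01:00.0 \
        -fw_cfg name=opt/ovmf/X-PciMmio64Mb,string=1024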
[ Fix ]

Since there is no way to force the use of the classic MMIO window
size[4] in any version of OVMF after [3], and since we have a use case
for such functionality on legacy distro kernels that would yield
significant, recurring compute time savings across all impacted VMs,
apply this change to the knob's behavior to make this workaround
possible on Noble.

[ Where problems could occur ]

If there are user deployments on Noble in which X-PciMmio64Mb is
currently explicitly set to a value smaller than the PDMW-computed
value, those deployments are currently ignoring the X-PciMmio64Mb value
and instead using the one calculated by PDMW. If any such deployments
exist, *and* they specify values that are too small for their GPUs'
MMIO windows, *and* they do not have `pci=realloc pci=nocrs` set, their
passed-through GPUs will stop working until they either raise
X-PciMmio64Mb to be large enough for their MMIO windows, remove
X-PciMmio64Mb from their config (if PDMW's value is high enough), or
add `pci=nocrs pci=realloc` to their guest kernel config to obtain the
benefits of this patch.

However, from the perspective of OVMF, this change makes the
X-PciMmio64Mb behavior more consistent, so I do not believe the above
risk should block inclusion of this patch. (I also suspect those
circumstances are uncommon: anyone whose use of X-PciMmio64Mb has any
effect today must be specifying a value larger than PDMW's, and such
users will not be impacted by this change.) Additionally, this patch
only adds new opt-in functionality and does not affect anyone not using
X-PciMmio64Mb, so it should carry little regression risk beyond that.

[ Example of how to enable a faster VM boot time via libvirt XML ]

1. Install a version of OVMF which contains the patch detailed in this
   bug report on your *host*.
2. In the guest VM, set the following kernel command line options:
   `pci=nocrs pci=realloc` (in /etc/default/grub, or equivalent; see
   the sketch after this list).
3. On the host, run `virsh edit <your VM>`.
4. Adjust your top-level <domain> element to add the qemu xmlns:
   <domain type='kvm' xmlns:qemu='http://libvirt.org/schemas/domain/qemu/1.0'>
5. Add the following to your XML within <domain></domain>:
   <qemu:commandline>
     <qemu:arg value='-fw_cfg'/>
     <qemu:arg value='name=opt/ovmf/X-PciMmio64Mb,string=1024'/>
   </qemu:commandline>
   (Note: the "1024" is arbitrary; any value *too small* for your GPU
   aperture will work here.)
6. Start the VM. It should boot as fast as it would on Jammy, and the
   GPUs should be usable as long as step (2) was done properly.
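For step (2), one way to apply the guest kernel parameters on a
GRUB-based Ubuntu guest is sketched below; the pre-existing "quiet
splash" contents of GRUB_CMDLINE_LINUX_DEFAULT are illustrative:

    # In the guest, edit /etc/default/grub so the default command line
    # includes both options (existing contents shown are illustrative):
    GRUB_CMDLINE_LINUX_DEFAULT="quiet splash pci=nocrs pci=realloc"

    # Regenerate the GRUB configuration and reboot:
    sudo update-grub
    sudo reboot

    # After reboot, verify that the options took effect:
    cat /proc/cmdline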
[0]: https://lore.kernel.org/all/cahta-uyp07fgm6t1ozqkqadsa5jrzo0reneyzgqzub4mdrr...@mail.gmail.com/
[1]: https://lore.kernel.org/all/[email protected]/
[2]: https://lore.kernel.org/all/[email protected]/
[3]: https://github.com/tianocore/edk2/commit/ecb778d0ac62560aa172786ba19521f27bc3f650
[4]: https://edk2.groups.io/g/devel/topic/109651206?p=Created,,,20,1,0,0

+
+ Search terms: Slow VM boot time, GPU passthrough, QEMU, libvirt,
+ several minutes, slow virtual machine boot, ubuntu noble, ubuntu
+ 24.04
