I'll start by stating that I'm not familiar with edk2, nor with the many acronyms used in this SRU, nor with the internal details of how device pass-through works.
Second, let me emphasize that the outcome quoted below is awesome:

> boot-to-login time was reduced from ~1 minute to <13 seconds on a VM
> with 4 passed-through H100 GPUs.

So where do I begin. Let me state some assumptions I'm making based on what little I understood of this problem, and you let me know where I'm wrong (or right).

a) From the patch description:

> Prior to this change, OVMF considers opt/ovmf/X-PciMmio64Mb the minimum
> aperture size, allowing us to force the window to be larger but not
> smaller than what PlatformDynamicMmioWindow calculates. Adjust OVMF so
> that a smaller value for the aperture is honored.

That reads to me like a change in behavior that needs discussing. Some parameter was considered a lower limit before, and now it's no longer a lower limit. Is it documented and known that prior to this change opt/ovmf/X-PciMmio64Mb is a lower limit? Would users be surprised (negatively) that now it's no longer a limit of any kind?

b) Let's assume the patch lands and I can now set "the aperture" to a value lower than opt/ovmf/X-PciMmio64Mb. From what I understood of the patch description, this makes the "long PCI initialization process be skipped".

b1) what are the consequences of that skip?

b2) isn't there a less intrusive way to have that skip happen, instead of setting a lower aperture size? Sounds like we are just taking advantage of a side effect of setting the aperture to a value lower than the previous limit. Why not, let's say, a new option called "pci_initialization_skip=1"? Because that's a kernel change?

b3) if the "long PCI initialization process" happens later, what are the consequences to the host? Certain hardware will not be readily available after boot? Will the slowness that we are skipping at boot not just manifest itself later, and disturb the workload of the system, instead of the boot?
In other words, isn't this change just kicking the can down the road, and making something else slow now, something that used to be fast (like post-boot operations)?

c) Continuing on the consequences of (b), the PCI initialization skip. The patch says it will be retried later, AS LONG AS certain options are used in the, I presume, kernel command-line: "pci=realloc pci=nocrs".

c1) Are the mentioned options default?

c2) Are users expected to be using those options with this kind of hardware?

c3) In the "where problems could occur" section, you very helpfully outline some consequences (much appreciated!). Let me quote it here:

> their passed-through GPUs will stop working until they either raise
> X-PciMmio64Mb to be large enough for their MMIO windows, remove
> X-PciMmio64Mb from their config (if PDMW's value is high enough), or add
> `pci=nocrs pci=realloc` to their guest kernel config to obtain the
> benefits of this patch.

c3.1) Are such users expected to know these details?

c3.2) If the host is updated with this new edk2, running guests would only be impacted if they reboot, I presume?

d) From what I understood, impacted users would still have to make changes to their systems in order to take advantage of this update, right? Is that change "obvious"? Which value should they set their aperture size to? How can they find that out?

--
You received this bug notification because you are a member of Ubuntu Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/2101903

Title:
  Backport "OvmfPkg: Use user-specified opt/ovmf/X-PciMmio64Mb value unconditionally" to Noble

To manage notifications about this bug go to:
https://bugs.launchpad.net/edk2/+bug/2101903/+subscriptions

--
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs