** Description changed:

+ For Ubuntu 24.04 users facing this issue - please see this related bug
+ report for a workaround to achieve much faster VM boots with large GPU
+ passthrough: https://bugs.launchpad.net/ubuntu/+source/edk2/+bug/2101903
+ 
  SRU Justification:

  [ Impact ]

  VM guests that have large-BAR GPUs passed through to them will take 2x
  as long to initialize all device BARs without this patch.

  [ Test Plan ]

  I verified that this patch applies cleanly to the Noble kernel and
  resolves the bug on DGX H100 and DGX A100. I observed no regressions.

  This can be verified on any machine with a sufficiently large BAR and
  the capability to pass through to a VM using vfio.

  To verify no regressions, I applied this patch to the guest kernel, then
  rebooted and confirmed that:

  1. The measured PCI initialization time on boot was ~50% of the
  unmodified kernel

  2. Relevant parts of /proc/iomem mappings, the PCI init section of dmesg
  output, and lspci -vv output remained unchanged between the system with
  the unmodified kernel and with the patched kernel

  3. The Nvidia driver still successfully loaded and was shown via
  nvidia-smi after the patch was applied

  [ Fix ]

  Roughly half of the time-consuming device configuration operations
  invoked during the PCI probe function can be eliminated by rearranging
  the memory and I/O disable/enable calls such that they only occur
  per-device rather than per-BAR. This is what the upstream patch does,
  and it results in roughly half the excess initialization time being
  eliminated reliably during VM boot.

  [ Where problems could occur ]

  I do not expect any regressions. The only callers of ABIs changed by
  this patch are also adjusted within this patch, and the functional
  change only removes entirely redundant calls to disable/enable PCI
  memory/IO.

  [ Additional Context ]

  Upstream patch: https://lore.kernel.org/all/[email protected]/
  Upstream bug report: https://lore.kernel.org/all/cahta-uyp07fgm6t1ozqkqadsa5jrzo0reneyzgqzub4mdrr...@mail.gmail.com/
** Description changed:

  For Ubuntu 24.04 users facing this issue - please see this related bug
  report for a workaround to achieve much faster VM boots with large GPU
  passthrough: https://bugs.launchpad.net/ubuntu/+source/edk2/+bug/2101903
+ 
+ (The fix in *this* bug report reduced the slowness by half without
+ additional user intervention, but much faster speeds can be achieved as
+ described in the report linked above on Noble)

  SRU Justification:

  [ Impact ]

  VM guests that have large-BAR GPUs passed through to them will take 2x
  as long to initialize all device BARs without this patch.

  [ Test Plan ]

  I verified that this patch applies cleanly to the Noble kernel and
  resolves the bug on DGX H100 and DGX A100. I observed no regressions.

  This can be verified on any machine with a sufficiently large BAR and
  the capability to pass through to a VM using vfio.

  To verify no regressions, I applied this patch to the guest kernel, then
  rebooted and confirmed that:

  1. The measured PCI initialization time on boot was ~50% of the
  unmodified kernel

  2. Relevant parts of /proc/iomem mappings, the PCI init section of dmesg
  output, and lspci -vv output remained unchanged between the system with
  the unmodified kernel and with the patched kernel

  3. The Nvidia driver still successfully loaded and was shown via
  nvidia-smi after the patch was applied

  [ Fix ]

  Roughly half of the time-consuming device configuration operations
  invoked during the PCI probe function can be eliminated by rearranging
  the memory and I/O disable/enable calls such that they only occur
  per-device rather than per-BAR. This is what the upstream patch does,
  and it results in roughly half the excess initialization time being
  eliminated reliably during VM boot.

  [ Where problems could occur ]

  I do not expect any regressions. The only callers of ABIs changed by
  this patch are also adjusted within this patch, and the functional
  change only removes entirely redundant calls to disable/enable PCI
  memory/IO.

  [ Additional Context ]

  Upstream patch: https://lore.kernel.org/all/[email protected]/
  Upstream bug report: https://lore.kernel.org/all/cahta-uyp07fgm6t1ozqkqadsa5jrzo0reneyzgqzub4mdrr...@mail.gmail.com/

** Description changed:

  For Ubuntu 24.04 users facing this issue - please see this related bug
  report for a workaround to achieve much faster VM boots with large GPU
  passthrough: https://bugs.launchpad.net/ubuntu/+source/edk2/+bug/2101903

  (The fix in *this* bug report reduced the slowness by half without
- additional user intervention, but much faster speeds can be achieved as
- described in the report linked above on Noble)
+ additional user intervention, but much faster speeds can be achieved on
+ Noble with the fw_cfg changes described in the report linked above)

  SRU Justification:

  [ Impact ]

  VM guests that have large-BAR GPUs passed through to them will take 2x
  as long to initialize all device BARs without this patch.

  [ Test Plan ]

  I verified that this patch applies cleanly to the Noble kernel and
  resolves the bug on DGX H100 and DGX A100. I observed no regressions.

  This can be verified on any machine with a sufficiently large BAR and
  the capability to pass through to a VM using vfio.

  To verify no regressions, I applied this patch to the guest kernel, then
  rebooted and confirmed that:

  1. The measured PCI initialization time on boot was ~50% of the
  unmodified kernel

  2. Relevant parts of /proc/iomem mappings, the PCI init section of dmesg
  output, and lspci -vv output remained unchanged between the system with
  the unmodified kernel and with the patched kernel

  3. The Nvidia driver still successfully loaded and was shown via
  nvidia-smi after the patch was applied

  [ Fix ]

  Roughly half of the time-consuming device configuration operations
  invoked during the PCI probe function can be eliminated by rearranging
  the memory and I/O disable/enable calls such that they only occur
  per-device rather than per-BAR.
  This is what the upstream patch does, and it results in roughly half
  the excess initialization time being eliminated reliably during VM boot.

  [ Where problems could occur ]

  I do not expect any regressions. The only callers of ABIs changed by
  this patch are also adjusted within this patch, and the functional
  change only removes entirely redundant calls to disable/enable PCI
  memory/IO.

  [ Additional Context ]

  Upstream patch: https://lore.kernel.org/all/[email protected]/
  Upstream bug report: https://lore.kernel.org/all/cahta-uyp07fgm6t1ozqkqadsa5jrzo0reneyzgqzub4mdrr...@mail.gmail.com/

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/2097389

Title:
  VM boots slowly with large-BAR GPU Passthrough due to pci/probe.c
  redundancy

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2097389/+subscriptions
