Verified that the jammy-proposed kernel fixes the bug on DGX H100: the boot-time improvement shows up in 2/2 tests (the time between virsh start completing and the boot reaching "Host and Network Name Lookups" dropped from 2:30 on the old kernel to 1:30), the GPU driver works, and the timing of the console output where the BARs are mapped indicates that the expected 'batching' behavior is happening.
** Tags removed: verification-needed-jammy-linux
** Tags added: verification-done-jammy-linux

https://bugs.launchpad.net/bugs/2097389

Title:
  VM boots slowly with large-BAR GPU Passthrough due to pci/probe.c
  redundancy

Status in linux package in Ubuntu:
  Invalid
Status in linux source package in Jammy:
  Fix Committed
Status in linux source package in Noble:
  Fix Released
Status in linux source package in Oracular:
  Fix Released

Bug description:
  SRU Justification:

  [ Impact ]

  Without this patch, VM guests that have large-BAR GPUs passed through
  to them take roughly twice as long to initialize all device BARs.

  [ Test Plan ]

  I verified that this patch applies cleanly to the Noble kernel and
  resolves the bug on DGX H100 and DGX A100, with no regressions
  observed. The fix can be verified on any machine that has a device
  with a sufficiently large BAR and the ability to pass it through to a
  VM using vfio.

  To verify that there are no regressions, I applied this patch to the
  guest kernel, rebooted, and confirmed that:

  1. The measured PCI initialization time on boot was ~50% of that of
     the unmodified kernel.
  2. The relevant parts of the /proc/iomem mappings, the PCI init
     section of the dmesg output, and the lspci -vv output were
     unchanged between the unmodified kernel and the patched kernel.
  3. The NVIDIA driver still loaded successfully and the GPU was shown
     by nvidia-smi after the patch was applied.

  [ Fix ]

  Roughly half of the time-consuming device configuration operations
  invoked during the PCI probe function can be eliminated by
  rearranging the memory and I/O disable/enable calls so that they
  occur per-device rather than per-BAR. This is what the upstream patch
  does, and it reliably eliminates roughly half of the excess
  initialization time during VM boot.

  [ Where problems could occur ]

  I do not expect any regressions. The only callers of the interfaces
  changed by this patch are adjusted within the same patch, and the
  functional change only removes entirely redundant calls that disable
  and re-enable PCI memory/I/O decoding.

  [ Additional Context ]

  Upstream patch:
  https://lore.kernel.org/all/20250111210652.402845-1-alex.william...@redhat.com/

  Upstream bug report:
  https://lore.kernel.org/all/cahta-uyp07fgm6t1ozqkqadsa5jrzo0reneyzgqzub4mdrr...@mail.gmail.com/
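To illustrate the restructuring described under [ Fix ], here is a simplified before/after sketch. It is not the upstream diff (see the patch link above for that); the sketch_* function names are stand-ins for __pci_read_base()/pci_read_bases(), and the BAR sizing logic and mmio_always_on handling are omitted. The point is that the PCI_COMMAND decode disable/enable pair, which is costly when every config write traps to the hypervisor under vfio, moves out of the per-BAR helper so it wraps the per-device BAR loop instead:

        /*
         * Simplified sketch of the batching change; see the upstream
         * patch in [ Additional Context ] for the real diff.
         */
        #include <linux/pci.h>

        /* Before: every BAR sizing call toggles decode on its own. */
        static int sketch_read_base_old(struct pci_dev *dev)
        {
                u16 orig_cmd;

                pci_read_config_word(dev, PCI_COMMAND, &orig_cmd);
                if (orig_cmd & PCI_COMMAND_DECODE_ENABLE)
                        pci_write_config_word(dev, PCI_COMMAND,
                                              orig_cmd & ~PCI_COMMAND_DECODE_ENABLE);

                /* ... write all-ones to the BAR and read it back to size it ... */

                if (orig_cmd & PCI_COMMAND_DECODE_ENABLE)
                        pci_write_config_word(dev, PCI_COMMAND, orig_cmd);
                return 0;
        }

        /* After: decode is toggled once per device, around the whole BAR loop. */
        static void sketch_read_bases_new(struct pci_dev *dev, unsigned int howmany)
        {
                unsigned int pos;
                u16 orig_cmd;

                pci_read_config_word(dev, PCI_COMMAND, &orig_cmd);
                if (orig_cmd & PCI_COMMAND_DECODE_ENABLE)
                        pci_write_config_word(dev, PCI_COMMAND,
                                              orig_cmd & ~PCI_COMMAND_DECODE_ENABLE);

                for (pos = 0; pos < howmany; pos++) {
                        /* ... size BAR 'pos' without touching PCI_COMMAND ... */
                }

                if (orig_cmd & PCI_COMMAND_DECODE_ENABLE)
                        pci_write_config_word(dev, PCI_COMMAND, orig_cmd);
        }

With the old arrangement every BAR that gets sized pays the decode-toggle cost; with the batched arrangement a device pays it only once, which is consistent with the roughly 50% reduction in PCI initialization time noted in the test plan.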