I'm not sure if the problem is related to the amdgpu driver now. After reverting my changes back to the most recent firmware I ran the apport-collect command and it failed, hanging at the lspci command. I rebooted and retried apport-collect, which succeeded (those are the files posted above). Using the current amdgpu firmware wasn't the actual problem, because the lspci command worked with it and I was able to run a VM with GPU passthrough as well (the logs posted above from apport-collect may not be that valuable, since everything was working on that boot).

It must be an intermittent issue that I first noticed on 2025-04-12. I've reviewed my logs for each boot and thought the issue was related to timing, with the GPU on PCI 10:00.0 being initialized before the driverctl command applied the vfio-pci driver. But on the most recent reboot I saw the amdgpu driver initialize the GPU, then driverctl replace it while actually logging that it failed (which I've never seen before in the 30 boots I reviewed), yet the lspci command succeeds and the VM with GPU passthrough works. Here's an example of what I thought was the issue in the logs:

Apr 26 14:49:12 dark kernel: [drm] Initialized amdgpu 3.59.0 for 0000:10:00.0 on minor 2
Apr 26 14:49:12 dark kernel: amdgpu 0000:10:00.0: [drm] fb1: amdgpudrmfb frame buffer device
Apr 26 14:49:04 dark systemd[1]: Starting driverctl@pci-0000:10:00.1.service - Load the driverctl override for pci-0000:10:00.1...
Apr 26 14:49:04 dark (udev-worker)[880]: controlC1: /usr/lib/udev/rules.d/78-sound-card.rules:5 Failed to write ATTR{/sys/devices/pci0000:00/0000:00:03.1/0000:0e:00.0/0000:0f:00.0/0000:10:00.1/sound/card1/controlC1/../uevent}, ignoring: No such file or directory
Apr 26 14:49:12 dark driverctl[1940]: /usr/sbin/driverctl: line 72: /sys//devices/pci0000:00/0000:00:03.1/0000:0e:00.0/0000:0f:00.0/0000:10:00.0/driver/unbind: Permission denied
Apr 26 14:49:12 dark driverctl[1940]: driverctl: unbinding 0000:10:00.0 failed
Apr 26 14:49:12 dark kernel: amdgpu 0000:10:00.0: amdgpu: amdgpu: finishing device.
Apr 26 14:49:12 dark kernel: [drm] amdgpu: ttm finalized
Apr 26 14:49:12 dark kernel: vfio-pci 0000:10:00.0: vgaarb: VGA decodes changed: olddecodes=io+mem,decodes=io+mem:owns=none
Apr 26 14:49:06 dark systemd[1]: Starting qemu-kvm.service - QEMU KVM preparation - module, ksm, hugepages...
Apr 26 14:49:06 dark systemd[1]: Finished qemu-kvm.service - QEMU KVM preparation - module, ksm, hugepages.
Apr 26 14:49:06 dark systemd[1]: Finished driverctl@pci-0000:10:00.1.service - Load the driverctl override for pci-0000:10:00.1.
Apr 26 14:49:09 dark systemd[1]: Starting driverctl@pci-0000:10:00.0.service - Load the driverctl override for pci-0000:10:00.0...
Apr 26 14:49:09 dark systemd[1]: driverctl@pci-0000:10:00.0.service: Main process exited, code=exited, status=1/FAILURE
Apr 26 14:49:09 dark systemd[1]: driverctl@pci-0000:10:00.0.service: Failed with result 'exit-code'.
Apr 26 14:49:09 dark systemd[1]: Failed to start driverctl@pci-0000:10:00.0.service - Load the driverctl override for pci-0000:10:00.0.
Apr 26 14:49:12 dark systemd[1]: Starting driverctl@pci-0000:10:00.0.service - Load the driverctl override for pci-0000:10:00.0...
Apr 26 14:49:12 dark systemd[1]: Finished driverctl@pci-0000:10:00.0.service - Load the driverctl override for pci-0000:10:00.0.

> lspci -nn
...
10:00.0 VGA compatible controller [0300]: Advanced Micro Devices, Inc. [AMD/ATI] Navi 21 [Radeon RX 6800/6800 XT / 6900 XT] [1002:73bf] (rev c3) (prog-if 00 [VGA controller])
    Subsystem: Sapphire Technology Limited Pulse Radeon RX 6800 [1da2:e437]
    Flags: bus master, fast devsel, latency 0, IRQ 124, IOMMU group 34
    Memory at 1400000000 (64-bit, prefetchable) [size=16G]
    Memory at 1200000000 (64-bit, prefetchable) [size=2M]
    I/O ports at e000 [size=256]
    Memory at fcc00000 (32-bit, non-prefetchable) [size=1M]
    Expansion ROM at fcd00000 [disabled] [size=128K]
    Capabilities: <access denied>
    Kernel driver in use: vfio-pci
    Kernel modules: amdgpu

10:00.1 Audio device [0403]: Advanced Micro Devices, Inc. [AMD/ATI] Navi 21/23 HDMI/DP Audio Controller [1002:ab28]
    Subsystem: Advanced Micro Devices, Inc. [AMD/ATI] Navi 21/23 HDMI/DP Audio Controller [1002:ab28]
    Flags: bus master, fast devsel, latency 0, IRQ 125, IOMMU group 35
    Memory at fcd20000 (32-bit, non-prefetchable) [size=16K]
    Capabilities: <access denied>
    Kernel driver in use: vfio-pci
    Kernel modules: snd_hda_intel
...
...
12:00.3 USB controller [0c03]: Advanced Micro Devices, Inc. [AMD] Matisse USB 3.0 Host Controller [1022:149c] (prog-if 30 [XHCI])
    Subsystem: Gigabyte Technology Co., Ltd Matisse USB 3.0 Host Controller [1458:5007]
    Flags: bus master, fast devsel, latency 0, IRQ 122, IOMMU group 39
    Memory at fc900000 (64-bit, non-prefetchable) [size=1M]
    Capabilities: <access denied>
    Kernel driver in use: vfio-pci
    Kernel modules: xhci_pci

There's still an issue, but I can't confirm it's related to the firmware update.

--
You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux-firmware in Ubuntu.
https://bugs.launchpad.net/bugs/2107285

Title:
  KVM VM with GPU passthrough won't start

Status in linux-firmware package in Ubuntu:
  New

Bug description:
  Host OS: Ubuntu 24.04.2 LTS
  Kernel 6.11.0-21-generic
  CPU: AMD Ryzen 9 5900X
  Software Firmware version: F2
  GPU 1: AMD Radeon RX 6400 (Used by Host OS)
  GPU 2: AMD Radeon RX 6800 (Used by VMs via GPU passthrough, on PCI bus 10:00.0)

  $ apt-cache policy linux-firmware
  linux-firmware:
    Installed: 20240318.git3b128b60-0ubuntu2.11
    Candidate: 20240318.git3b128b60-0ubuntu2.11
    Version table:
   *** 20240318.git3b128b60-0ubuntu2.11 500
          500 http://us.archive.ubuntu.com/ubuntu noble-updates/main amd64 Packages
          500 http://security.ubuntu.com/ubuntu noble-security/main amd64 Packages
          100 /var/lib/dpkg/status
       20240318.git3b128b60-0ubuntu2 500
          500 http://us.archive.ubuntu.com/ubuntu noble/main amd64 Packages

  What should have happened:
  VM with GPU passthrough should start

  What happened instead:
  VM with GPU passthrough wouldn't start. I tried running 'lspci -nns 0000:10:00.0' but this hung the terminal. Virtual Machine Manager was now showing it couldn't connect to the KVM daemon. I rebooted the Host OS but running 'lspci -nns 0000:10:00.0' hung again and I still couldn't start the VM with GPU passthrough.

  Extra info:
  After installing updates to the Host OS on 2025-04-10, VMs without GPU passthrough worked fine. On 2025-04-12 I tried to start a VM with GPU passthrough but it wouldn't start.

  On 2025-04-10 one of the Host OS updates was linux-firmware:amd64 (20240318.git3b128b60-0ubuntu2.10 -> 20240318.git3b128b60-0ubuntu2.11). I wanted to test downgrading linux-firmware back to version 2.10 but that is no longer available. I was able to find, on Launchpad, the files that were in the 2.10 and 2.11 versions of linux-firmware, and I identified which of the amdgpu firmware files differed between the two versions.
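
For anyone wanting to repeat the comparison of the two firmware versions described above, here is a minimal sketch. It assumes both package versions have already been unpacked into two local directories and compares the files by SHA-256 digest. The function names are hypothetical and this is not necessarily how the comparison in the report was done.

```python
import hashlib
from pathlib import Path


def file_digests(directory):
    """Map each regular file name in `directory` to the SHA-256 of its contents."""
    digests = {}
    for path in sorted(Path(directory).iterdir()):
        if path.is_file():
            digests[path.name] = hashlib.sha256(path.read_bytes()).hexdigest()
    return digests


def compare_firmware_dirs(old_dir, new_dir):
    """Return (changed, only_in_old, only_in_new) as sorted lists of file names."""
    old = file_digests(old_dir)
    new = file_digests(new_dir)
    # Files present in both directories whose contents differ.
    changed = sorted(name for name in old.keys() & new.keys() if old[name] != new[name])
    only_in_old = sorted(old.keys() - new.keys())
    only_in_new = sorted(new.keys() - old.keys())
    return changed, only_in_old, only_in_new
```

Called with the amdgpu directories of the two unpacked versions, it returns three sorted lists: files whose contents differ, files present only in the first directory, and files present only in the second.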

  I overwrote the /lib/firmware/amdgpu files on my host OS with the files from 2.10 and rebooted - the VM with GPU passthrough was able to start (and the lspci command worked).

  The list of amdgpu firmware files I overwrote was:
  gc_11_5_1_imu.bin.zst
  gc_11_5_1_me.bin.zst
  gc_11_5_1_mec.bin.zst
  gc_11_5_1_mes1.bin.zst
  gc_11_5_1_mes_2.bin.zst
  gc_11_5_1_pfp.bin.zst
  gc_11_5_1_rlc.bin.zst
  isp_4_1_1.bin.zst
  psp_14_0_1_ta.bin.zst
  psp_14_0_1_toc.bin.zst
  sdma_6_1_1.bin.zst
  vcn_4_0_6_1.bin.zst
  vcn_4_0_6.bin.zst
  vpe_6_1_1.bin.zst
  ---
  ProblemType: Bug
  ApportVersion: 2.28.1-0ubuntu3.5
  Architecture: amd64
  CRDA: N/A
  CasperMD5CheckResult: pass
  CurrentDesktop: ubuntu:GNOME
  Dependencies: firmware-sof-signed 2023.12.1-1ubuntu1.4
  DistroRelease: Ubuntu 24.04
  InstallationDate: Installed on 2024-06-01 (326 days ago)
  InstallationMedia: Ubuntu 24.04 LTS "Noble Numbat" - Release amd64 (20240424)
  MachineType: Gigabyte Technology Co., Ltd. X570S AORUS PRO AX
  Package: linux-firmware 20240318.git3b128b60-0ubuntu2.11
  PackageArchitecture: amd64
  ProcEnviron:
   LANG=en_US.UTF-8
   PATH=(custom, no user)
   SHELL=/bin/bash
   TERM=xterm-256color
   XDG_RUNTIME_DIR=<set>
  ProcFB: 0 amdgpudrmfb
  ProcKernelCmdLine: BOOT_IMAGE=/vmlinuz-6.11.0-21-generic root=/dev/mapper/ubuntu--vg-ubuntu--lv ro quiet splash amd_iommu=on iommu=pt vt.handoff=7
  ProcVersionSignature: Ubuntu 6.11.0-21.21~24.04.1-generic 6.11.11
  RelatedPackageVersions:
   linux-restricted-modules-6.11.0-21-generic N/A
   linux-backports-modules-6.11.0-21-generic  N/A
   linux-firmware                             20240318.git3b128b60-0ubuntu2.11
  Tags: noble wayland-session
  Uname: Linux 6.11.0-21-generic x86_64
  UpgradeStatus: No upgrade log present (probably fresh install)
  UserGroups: adm cdrom dip kvm libvirt libvirt-dnsmasq lpadmin plugdev storage sudo users
  _MarkForUpload: True
  dmi.bios.date: 07/08/2021
  dmi.bios.release: 5.17
  dmi.bios.vendor: American Megatrends International, LLC.
  dmi.bios.version: F2
  dmi.board.asset.tag: Default string
  dmi.board.name: X570S AORUS PRO AX
  dmi.board.vendor: Gigabyte Technology Co., Ltd.
  dmi.board.version: x.x
  dmi.chassis.asset.tag: Default string
  dmi.chassis.type: 3
  dmi.chassis.vendor: Default string
  dmi.chassis.version: Default string
  dmi.modalias: dmi:bvnAmericanMegatrendsInternational,LLC.:bvrF2:bd07/08/2021:br5.17:svnGigabyteTechnologyCo.,Ltd.:pnX570SAORUSPROAX:pvr-CF:rvnGigabyteTechnologyCo.,Ltd.:rnX570SAORUSPROAX:rvrx.x:cvnDefaultstring:ct3:cvrDefaultstring:skuDefaultstring:
  dmi.product.family: X570 MB
  dmi.product.name: X570S AORUS PRO AX
  dmi.product.sku: Default string
  dmi.product.version: -CF
  dmi.sys.vendor: Gigabyte Technology Co., Ltd.

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux-firmware/+bug/2107285/+subscriptions

--
Mailing list: https://launchpad.net/~kernel-packages
Post to     : kernel-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~kernel-packages
More help   : https://help.launchpad.net/ListHelp