I'm no longer sure the problem is related to the amdgpu driver. After
reverting my changes back to the most recent firmware, I ran the
apport-collect command and it failed, hanging at the lspci command. I
rebooted and retried apport-collect, which succeeded (those are the files
posted above). The current firmware and amdgpu driver weren't the actual
problem, because on that boot the lspci command worked and I was also able
to run a VM with GPU passthrough. For the same reason, the apport-collect
logs posted above may not be that valuable, since everything was working
on that boot. It must be an intermittent issue, which I first noticed on
2025-04-12.

I've reviewed my logs for each boot and thought the issue was timing
related: the GPU at PCI 10:00.0 was being initialized by amdgpu before the
driverctl command applied the vfio-pci driver. However, on the most recent
reboot I saw the amdgpu driver initialize the GPU and then driverctl
replace it, even though driverctl logged that the unbind failed (which I
had never seen before in the 30 boots I reviewed). Yet the lspci command
succeeds and the VM with GPU passthrough works. Here's an example of what
I thought was the issue in the logs:
Apr 26 14:49:12 dark kernel: [drm] Initialized amdgpu 3.59.0 for 0000:10:00.0 on minor 2
Apr 26 14:49:12 dark kernel: amdgpu 0000:10:00.0: [drm] fb1: amdgpudrmfb frame buffer device
Apr 26 14:49:04 dark systemd[1]: Starting driverctl@pci-0000:10:00.1.service - Load the driverctl override for pci-0000:10:00.1...
Apr 26 14:49:04 dark (udev-worker)[880]: controlC1: /usr/lib/udev/rules.d/78-sound-card.rules:5 Failed to write ATTR{/sys/devices/pci0000:00/0000:00:03.1/0000:0e:00.0/0000:0f:00.0/0000:10:00.1/sound/card1/controlC1/../uevent}, ignoring: No such file or directory
Apr 26 14:49:12 dark driverctl[1940]: /usr/sbin/driverctl: line 72: /sys//devices/pci0000:00/0000:00:03.1/0000:0e:00.0/0000:0f:00.0/0000:10:00.0/driver/unbind: Permission denied
Apr 26 14:49:12 dark driverctl[1940]: driverctl: unbinding 0000:10:00.0 failed
Apr 26 14:49:12 dark kernel: amdgpu 0000:10:00.0: amdgpu: amdgpu: finishing device.
Apr 26 14:49:12 dark kernel: [drm] amdgpu: ttm finalized
Apr 26 14:49:12 dark kernel: vfio-pci 0000:10:00.0: vgaarb: VGA decodes changed: olddecodes=io+mem,decodes=io+mem:owns=none
Apr 26 14:49:06 dark systemd[1]: Starting qemu-kvm.service - QEMU KVM preparation - module, ksm, hugepages...
Apr 26 14:49:06 dark systemd[1]: Finished qemu-kvm.service - QEMU KVM preparation - module, ksm, hugepages.
Apr 26 14:49:06 dark systemd[1]: Finished driverctl@pci-0000:10:00.1.service - Load the driverctl override for pci-0000:10:00.1.
Apr 26 14:49:09 dark systemd[1]: Starting driverctl@pci-0000:10:00.0.service - Load the driverctl override for pci-0000:10:00.0...
Apr 26 14:49:09 dark systemd[1]: driverctl@pci-0000:10:00.0.service: Main process exited, code=exited, status=1/FAILURE
Apr 26 14:49:09 dark systemd[1]: driverctl@pci-0000:10:00.0.service: Failed with result 'exit-code'.
Apr 26 14:49:09 dark systemd[1]: Failed to start driverctl@pci-0000:10:00.0.service - Load the driverctl override for pci-0000:10:00.0.
Apr 26 14:49:12 dark systemd[1]: Starting driverctl@pci-0000:10:00.0.service - Load the driverctl override for pci-0000:10:00.0...
Apr 26 14:49:12 dark systemd[1]: Finished driverctl@pci-0000:10:00.0.service - Load the driverctl override for pci-0000:10:00.0.
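The journal excerpt above is not in chronological order (the 14:49:12 amdgpu lines come before the 14:49:04 systemd lines), which makes the amdgpu/driverctl ordering hard to follow. A minimal Python sketch for re-ordering a pasted excerpt like this one, assuming every line starts with a syslog-style `Mon DD HH:MM:SS` stamp and that the year (not present in the log) is 2025. The function name `sort_journal_lines` is hypothetical. Note the timestamps only have one-second resolution, so the key 14:49:12 lines (amdgpu init, the failed unbind, vfio-pci taking over) cannot be ordered among themselves this way; they simply keep their pasted order.

```python
from datetime import datetime


def sort_journal_lines(lines, year=2025):
    """Stable-sort syslog-style lines ("Apr 26 14:49:12 host ...") by timestamp.

    Assumption: each line begins with "Mon DD HH:MM:SS". The year is not in
    the log, so it is supplied explicitly. "%b" expects English month names
    (C/English locale). Lines sharing the same second keep their relative order.
    """
    def key(line):
        stamp = " ".join(line.split()[:3])  # e.g. "Apr 26 14:49:12"
        return datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M:%S")

    return sorted(lines, key=key)  # sorted() is stable


if __name__ == "__main__":
    excerpt = [
        "Apr 26 14:49:12 dark driverctl[1940]: driverctl: unbinding 0000:10:00.0 failed",
        "Apr 26 14:49:04 dark systemd[1]: Starting driverctl@pci-0000:10:00.1.service - Load the driverctl override for pci-0000:10:00.1...",
        "Apr 26 14:49:09 dark systemd[1]: driverctl@pci-0000:10:00.0.service: Failed with result 'exit-code'.",
    ]
    for line in sort_journal_lines(excerpt):
        print(line)
```

For sub-second ordering, the journal itself would need to be queried rather than a pasted excerpt.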
> lspci -nn
...
10:00.0 VGA compatible controller [0300]: Advanced Micro Devices, Inc. [AMD/ATI] Navi 21 [Radeon RX 6800/6800 XT / 6900 XT] [1002:73bf] (rev c3) (prog-if 00 [VGA controller])
	Subsystem: Sapphire Technology Limited Pulse Radeon RX 6800 [1da2:e437]
	Flags: bus master, fast devsel, latency 0, IRQ 124, IOMMU group 34
	Memory at 1400000000 (64-bit, prefetchable) [size=16G]
	Memory at 1200000000 (64-bit, prefetchable) [size=2M]
	I/O ports at e000 [size=256]
	Memory at fcc00000 (32-bit, non-prefetchable) [size=1M]
	Expansion ROM at fcd00000 [disabled] [size=128K]
	Capabilities: <access denied>
	Kernel driver in use: vfio-pci
	Kernel modules: amdgpu

10:00.1 Audio device [0403]: Advanced Micro Devices, Inc. [AMD/ATI] Navi 21/23 HDMI/DP Audio Controller [1002:ab28]
	Subsystem: Advanced Micro Devices, Inc. [AMD/ATI] Navi 21/23 HDMI/DP Audio Controller [1002:ab28]
	Flags: bus master, fast devsel, latency 0, IRQ 125, IOMMU group 35
	Memory at fcd20000 (32-bit, non-prefetchable) [size=16K]
	Capabilities: <access denied>
	Kernel driver in use: vfio-pci
	Kernel modules: snd_hda_intel
...
...
12:00.3 USB controller [0c03]: Advanced Micro Devices, Inc. [AMD] Matisse USB 3.0 Host Controller [1022:149c] (prog-if 30 [XHCI])
	Subsystem: Gigabyte Technology Co., Ltd Matisse USB 3.0 Host Controller [1458:5007]
	Flags: bus master, fast devsel, latency 0, IRQ 122, IOMMU group 39
	Memory at fc900000 (64-bit, non-prefetchable) [size=1M]
	Capabilities: <access denied>
	Kernel driver in use: vfio-pci
	Kernel modules: xhci_pci
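A quicker check than a full lspci run (useful on a boot where lspci or apport-collect hangs) is to read sysfs directly. As I understand it, for a PCI device `/sys/bus/pci/devices/<address>/driver` is a symlink to the bound driver and is absent when nothing is bound, and driverctl works through the kernel's `driver_override` file in the same directory. A minimal Python sketch, assuming that layout; the helper names `bound_driver` and `driver_override` and the `sysfs_root` parameter (so it can be pointed at a test directory) are hypothetical. If the override reads `vfio-pci` while the bound driver is still `amdgpu`, that would match the timing race suspected above.

```python
import os
from pathlib import Path


def bound_driver(addr, sysfs_root="/sys"):
    """Return the name of the driver bound to PCI device `addr`, or None if unbound."""
    link = Path(sysfs_root, "bus/pci/devices", addr, "driver")
    if not link.is_symlink():
        return None
    # The symlink points at .../bus/pci/drivers/<name>; the last component is the name.
    return os.path.basename(os.readlink(link))


def driver_override(addr, sysfs_root="/sys"):
    """Return the device's driver_override value, or None if unset or missing."""
    path = Path(sysfs_root, "bus/pci/devices", addr, "driver_override")
    try:
        value = path.read_text().strip()
    except FileNotFoundError:
        return None
    # Assumption: the kernel reports an unset override as "(null)".
    return None if value in ("", "(null)") else value


if __name__ == "__main__":
    # The two functions of the passed-through GPU from the lspci output above.
    for addr in ("0000:10:00.0", "0000:10:00.1"):
        print(addr, "driver:", bound_driver(addr), "override:", driver_override(addr))
```

Running it early in boot (or from a systemd unit ordered after the driverctl services) could show which driver actually held 10:00.0 on a failing boot.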
There's still an issue, but I can't confirm it's related to the firmware
update.
--
https://bugs.launchpad.net/bugs/2107285
Title:
KVM VM with GPU passthrough won't start
--
ubuntu-bugs mailing list
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs