I'm no longer sure the problem is related to the amdgpu driver. After
reverting my changes so that the most recent firmware was back in place,
I ran the apport-collect command and it failed, hanging at the lspci
command. I rebooted and retried apport-collect, which succeeded (those
are the files posted above). Using the current firmware with the amdgpu
driver wasn't the actual problem, because on that boot the lspci command
worked and I was able to run a VM with GPU passthrough as well. The logs
posted above from apport-collect may not be that valuable, since
everything was working on that boot.

It must be an intermittent issue that I first noticed on 2025-04-12.
I've reviewed my logs for each boot and thought the issue was related to
timing, with the amdgpu driver initializing the GPU on PCI 10:00.0
before the driverctl command applied the vfio-pci driver. On the most
recent reboot, however, I saw the amdgpu driver initialize the GPU and
then driverctl replace it, but driverctl logged that the unbind failed
(which I had never seen in the 30 boots I reviewed), yet the lspci
command succeeded and the VM with GPU passthrough worked. Here's an
example of what I thought was the issue in the logs:

Apr 26 14:49:12 dark kernel: [drm] Initialized amdgpu 3.59.0 for 0000:10:00.0 on minor 2
Apr 26 14:49:12 dark kernel: amdgpu 0000:10:00.0: [drm] fb1: amdgpudrmfb frame buffer device
Apr 26 14:49:04 dark systemd[1]: Starting driverctl@pci-0000:10:00.1.service - Load the driverctl override for pci-0000:10:00.1...
Apr 26 14:49:04 dark (udev-worker)[880]: controlC1: /usr/lib/udev/rules.d/78-sound-card.rules:5 Failed to write ATTR{/sys/devices/pci0000:00/0000:00:03.1/0000:0e:00.0/0000:0f:00.0/0000:10:00.1/sound/card1/controlC1/../uevent}, ignoring: No such file or directory
Apr 26 14:49:12 dark driverctl[1940]: /usr/sbin/driverctl: line 72: /sys//devices/pci0000:00/0000:00:03.1/0000:0e:00.0/0000:0f:00.0/0000:10:00.0/driver/unbind: Permission denied
Apr 26 14:49:12 dark driverctl[1940]: driverctl: unbinding 0000:10:00.0 failed
Apr 26 14:49:12 dark kernel: amdgpu 0000:10:00.0: amdgpu: amdgpu: finishing device.
Apr 26 14:49:12 dark kernel: [drm] amdgpu: ttm finalized
Apr 26 14:49:12 dark kernel: vfio-pci 0000:10:00.0: vgaarb: VGA decodes changed: olddecodes=io+mem,decodes=io+mem:owns=none
Apr 26 14:49:06 dark systemd[1]: Starting qemu-kvm.service - QEMU KVM preparation - module, ksm, hugepages...
Apr 26 14:49:06 dark systemd[1]: Finished qemu-kvm.service - QEMU KVM preparation - module, ksm, hugepages.
Apr 26 14:49:06 dark systemd[1]: Finished driverctl@pci-0000:10:00.1.service - Load the driverctl override for pci-0000:10:00.1.
Apr 26 14:49:09 dark systemd[1]: Starting driverctl@pci-0000:10:00.0.service - Load the driverctl override for pci-0000:10:00.0...
Apr 26 14:49:09 dark systemd[1]: driverctl@pci-0000:10:00.0.service: Main process exited, code=exited, status=1/FAILURE
Apr 26 14:49:09 dark systemd[1]: driverctl@pci-0000:10:00.0.service: Failed with result 'exit-code'.
Apr 26 14:49:09 dark systemd[1]: Failed to start driverctl@pci-0000:10:00.0.service - Load the driverctl override for pci-0000:10:00.0.
Apr 26 14:49:12 dark systemd[1]: Starting driverctl@pci-0000:10:00.0.service - Load the driverctl override for pci-0000:10:00.0...
Apr 26 14:49:12 dark systemd[1]: Finished driverctl@pci-0000:10:00.0.service - Load the driverctl override for pci-0000:10:00.0.
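
Since the excerpt above is not in chronological order, the timing question is easier to judge once the entries are sorted by their timestamps. The following is only a minimal sketch of one way to do that, not something that was run for this report. The function name `sort_log_lines` and the `year` parameter are hypothetical, the syslog stamps carry no year so one has to be supplied, and month names are assumed to be in English. Lines that do not begin with a timestamp are treated as wrapped continuations of the previous entry and joined with a single space, which is also an assumption.

```python
import re
from datetime import datetime

# Leading "Mon DD HH:MM:SS" stamp of a syslog-style line.
STAMP = re.compile(r"^([A-Z][a-z]{2} +\d{1,2} \d{2}:\d{2}:\d{2}) ")


def sort_log_lines(text, year=2025):
    """Return the log entries in `text` ordered by timestamp.

    The sort is stable, so entries that share a second keep the
    relative order they had in the input.
    """
    entries = []
    for raw in text.splitlines():
        match = STAMP.match(raw)
        if match:
            # Prepend the assumed year so the stamp parses unambiguously.
            when = datetime.strptime(
                f"{year} {match.group(1)}", "%Y %b %d %H:%M:%S"
            )
            entries.append([when, raw.rstrip()])
        elif entries and raw.strip():
            # No stamp: assume a wrapped continuation of the previous entry.
            entries[-1][1] += " " + raw.strip()
    entries.sort(key=lambda entry: entry[0])
    return [line for _, line in entries]


if __name__ == "__main__":
    import sys

    for line in sort_log_lines(sys.stdin.read()):
        print(line)
```

Because the stamps only have one-second resolution, entries logged within the same second (for example the amdgpu and driverctl lines at 14:49:12) cannot be ordered any more finely than the input already orders them.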


> lspci -nn
...
10:00.0 VGA compatible controller [0300]: Advanced Micro Devices, Inc. [AMD/ATI] Navi 21 [Radeon RX 6800/6800 XT / 6900 XT] [1002:73bf] (rev c3) (prog-if 00 [VGA controller])
        Subsystem: Sapphire Technology Limited Pulse Radeon RX 6800 [1da2:e437]
        Flags: bus master, fast devsel, latency 0, IRQ 124, IOMMU group 34
        Memory at 1400000000 (64-bit, prefetchable) [size=16G]
        Memory at 1200000000 (64-bit, prefetchable) [size=2M]
        I/O ports at e000 [size=256]
        Memory at fcc00000 (32-bit, non-prefetchable) [size=1M]
        Expansion ROM at fcd00000 [disabled] [size=128K]
        Capabilities: <access denied>
        Kernel driver in use: vfio-pci
        Kernel modules: amdgpu

10:00.1 Audio device [0403]: Advanced Micro Devices, Inc. [AMD/ATI] Navi 21/23 HDMI/DP Audio Controller [1002:ab28]
        Subsystem: Advanced Micro Devices, Inc. [AMD/ATI] Navi 21/23 HDMI/DP Audio Controller [1002:ab28]
        Flags: bus master, fast devsel, latency 0, IRQ 125, IOMMU group 35
        Memory at fcd20000 (32-bit, non-prefetchable) [size=16K]
        Capabilities: <access denied>
        Kernel driver in use: vfio-pci
        Kernel modules: snd_hda_intel
...
...
12:00.3 USB controller [0c03]: Advanced Micro Devices, Inc. [AMD] Matisse USB 3.0 Host Controller [1022:149c] (prog-if 30 [XHCI])
        Subsystem: Gigabyte Technology Co., Ltd Matisse USB 3.0 Host Controller [1458:5007]
        Flags: bus master, fast devsel, latency 0, IRQ 122, IOMMU group 39
        Memory at fc900000 (64-bit, non-prefetchable) [size=1M]
        Capabilities: <access denied>
        Kernel driver in use: vfio-pci
        Kernel modules: xhci_pci
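
The "Kernel driver in use" field above can also be read per boot straight from sysfs, where each PCI device directory has a `driver` symlink pointing at the bound driver. Below is a minimal sketch of that check. The function name `bound_driver` and the `sysfs_root` parameter are hypothetical and exist only so the function can be pointed at a test directory. I have not verified whether reading the symlink avoids the hang seen with lspci on the bad boots.

```python
import os


def bound_driver(address, sysfs_root="/sys"):
    """Return the name of the driver bound to a PCI device, or None.

    `address` is the full PCI address, e.g. "0000:10:00.0".
    """
    link = os.path.join(sysfs_root, "bus", "pci", "devices", address, "driver")
    if not os.path.islink(link):
        # Device missing, or present with no driver bound.
        return None
    # The symlink target ends in the driver name, e.g. ".../drivers/vfio-pci".
    return os.path.basename(os.readlink(link))


if __name__ == "__main__":
    for addr in ("0000:10:00.0", "0000:10:00.1"):
        print(addr, bound_driver(addr))
```

Logging this once after the driverctl services finish would show, for each boot, whether 0000:10:00.0 ended up on vfio-pci or stayed on amdgpu.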

There's still an issue, but I can't confirm it's related to the firmware
update.

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux-firmware in Ubuntu.
https://bugs.launchpad.net/bugs/2107285

Title:
  KVM VM with GPU passthrough won't start

Status in linux-firmware package in Ubuntu:
  New

Bug description:
  Host OS:
  Ubuntu 24.04.2 LTS
  Kernel 6.11.0-21-generic
  CPU: AMD Ryzen 9 5900X
  Motherboard firmware (BIOS) version: F2
  GPU 1: AMD Radeon RX 6400 (Used by Host OS)
  GPU 2: AMD Radeon RX 6800 (Used by VMs via GPU passthrough, at PCI
  address 10:00.0)

  $ apt-cache policy linux-firmware
  linux-firmware:
    Installed: 20240318.git3b128b60-0ubuntu2.11
    Candidate: 20240318.git3b128b60-0ubuntu2.11
    Version table:
   *** 20240318.git3b128b60-0ubuntu2.11 500
          500 http://us.archive.ubuntu.com/ubuntu noble-updates/main amd64 Packages
          500 http://security.ubuntu.com/ubuntu noble-security/main amd64 Packages
          100 /var/lib/dpkg/status
       20240318.git3b128b60-0ubuntu2 500
          500 http://us.archive.ubuntu.com/ubuntu noble/main amd64 Packages

  What should have happened:

  VM with GPU passthrough should start

  What happened instead:

  VM with GPU passthrough wouldn't start. I tried running 'lspci -nns
  0000:10:00.0' but this hung the terminal. Virtual Machine Manager was
  now showing it couldn't connect to the KVM daemon. I rebooted the Host
  OS but running 'lspci -nns 0000:10:00.0' again hung and I still
  couldn't start the VM with GPU passthrough.

  Extra info:

  After installing updates to the Host OS on 2025-04-10, VMs without GPU
  passthrough worked fine. On 2025-04-12 I tried to start a VM with GPU
  passthrough but it wouldn't start.

  On 2025-04-10 one of the Host OS updates was linux-firmware:amd64
  (20240318.git3b128b60-0ubuntu2.10 ->
  20240318.git3b128b60-0ubuntu2.11).

  I wanted to test downgrading linux-firmware back to version 2.10, but
  that version is no longer available. I was able to find, on Launchpad,
  the files that were in the 2.10 and 2.11 versions of linux-firmware,
  and I identified which amdgpu firmware files differed between them. I
  overwrote those files under /lib/firmware/amdgpu on my host OS with
  the copies from 2.10 and rebooted. The VM with GPU passthrough was
  then able to start (and the lspci command worked).

  The list of amdgpu firmware files I overwrote was:

  gc_11_5_1_imu.bin.zst
  gc_11_5_1_me.bin.zst
  gc_11_5_1_mec.bin.zst
  gc_11_5_1_mes1.bin.zst
  gc_11_5_1_mes_2.bin.zst
  gc_11_5_1_pfp.bin.zst
  gc_11_5_1_rlc.bin.zst
  isp_4_1_1.bin.zst
  psp_14_0_1_ta.bin.zst
  psp_14_0_1_toc.bin.zst
  sdma_6_1_1.bin.zst
  vcn_4_0_6_1.bin.zst
  vcn_4_0_6.bin.zst
  vpe_6_1_1.bin.zst
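
A comparison like the one that produced the list above could be sketched as follows. This is only an illustration under stated assumptions, not the procedure actually used for this report: `changed_files` and both directory arguments are hypothetical names, and the two directories are assumed to hold the unpacked amdgpu firmware files from the 2.10 and 2.11 packages.

```python
import hashlib
from pathlib import Path


def file_digest(path):
    """Return the SHA-256 hex digest of a file's contents."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def changed_files(old_dir, new_dir):
    """List file names that differ between two flat directories.

    A name is reported if it exists in only one directory or if the
    contents differ. Subdirectories are ignored.
    """
    old = {p.name: p for p in Path(old_dir).iterdir() if p.is_file()}
    new = {p.name: p for p in Path(new_dir).iterdir() if p.is_file()}
    changed = []
    for name in sorted(old.keys() | new.keys()):
        if name not in old or name not in new:
            changed.append(name)
        elif file_digest(old[name]) != file_digest(new[name]):
            changed.append(name)
    return changed
```

Comparing the compressed `.bin.zst` files byte for byte can report a difference even when only the compression changed, so a match is conclusive but a mismatch is not necessarily a firmware change.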
  --- 
  ProblemType: Bug
  ApportVersion: 2.28.1-0ubuntu3.5
  Architecture: amd64
  CRDA: N/A
  CasperMD5CheckResult: pass
  CurrentDesktop: ubuntu:GNOME
  Dependencies: firmware-sof-signed 2023.12.1-1ubuntu1.4
  DistroRelease: Ubuntu 24.04
  InstallationDate: Installed on 2024-06-01 (326 days ago)
  InstallationMedia: Ubuntu 24.04 LTS "Noble Numbat" - Release amd64 (20240424)
  MachineType: Gigabyte Technology Co., Ltd. X570S AORUS PRO AX
  Package: linux-firmware 20240318.git3b128b60-0ubuntu2.11
  PackageArchitecture: amd64
  ProcEnviron:
   LANG=en_US.UTF-8
   PATH=(custom, no user)
   SHELL=/bin/bash
   TERM=xterm-256color
   XDG_RUNTIME_DIR=<set>
  ProcFB: 0 amdgpudrmfb
  ProcKernelCmdLine: BOOT_IMAGE=/vmlinuz-6.11.0-21-generic root=/dev/mapper/ubuntu--vg-ubuntu--lv ro quiet splash amd_iommu=on iommu=pt vt.handoff=7
  ProcVersionSignature: Ubuntu 6.11.0-21.21~24.04.1-generic 6.11.11
  RelatedPackageVersions:
   linux-restricted-modules-6.11.0-21-generic N/A
   linux-backports-modules-6.11.0-21-generic  N/A
   linux-firmware                             20240318.git3b128b60-0ubuntu2.11
  Tags: noble wayland-session
  Uname: Linux 6.11.0-21-generic x86_64
  UpgradeStatus: No upgrade log present (probably fresh install)
  UserGroups: adm cdrom dip kvm libvirt libvirt-dnsmasq lpadmin plugdev storage sudo users
  _MarkForUpload: True
  dmi.bios.date: 07/08/2021
  dmi.bios.release: 5.17
  dmi.bios.vendor: American Megatrends International, LLC.
  dmi.bios.version: F2
  dmi.board.asset.tag: Default string
  dmi.board.name: X570S AORUS PRO AX
  dmi.board.vendor: Gigabyte Technology Co., Ltd.
  dmi.board.version: x.x
  dmi.chassis.asset.tag: Default string
  dmi.chassis.type: 3
  dmi.chassis.vendor: Default string
  dmi.chassis.version: Default string
  dmi.modalias: dmi:bvnAmericanMegatrendsInternational,LLC.:bvrF2:bd07/08/2021:br5.17:svnGigabyteTechnologyCo.,Ltd.:pnX570SAORUSPROAX:pvr-CF:rvnGigabyteTechnologyCo.,Ltd.:rnX570SAORUSPROAX:rvrx.x:cvnDefaultstring:ct3:cvrDefaultstring:skuDefaultstring:
  dmi.product.family: X570 MB
  dmi.product.name: X570S AORUS PRO AX
  dmi.product.sku: Default string
  dmi.product.version: -CF
  dmi.sys.vendor: Gigabyte Technology Co., Ltd.
