Public bug reported:

[ Impact ]

On Navi3x systems running the 6.14 generic kernel, the amdgpu driver
fails to load because PSP firmware loading fails, which produces the
following error messages:

[  627.871752] amdgpu 0000:03:00.0: amdgpu: PSP load kdb failed!
[  628.056253] [drm:psp_v13_0_ring_destroy [amdgpu]] *ERROR* Fail to stop psp ring
[  628.056543] amdgpu 0000:03:00.0: amdgpu: PSP firmware loading failed
[  628.056546] [drm:amdgpu_device_fw_loading [amdgpu]] *ERROR* hw_init of IP block <psp> failed -22
[  628.056777] amdgpu 0000:03:00.0: amdgpu: amdgpu_device_ip_init failed
[  628.056779] amdgpu 0000:03:00.0: amdgpu: Fatal error during GPU init
[  628.056781] amdgpu 0000:03:00.0: amdgpu: amdgpu: finishing device.
[  628.056866] ------------[ cut here ]------------
[  628.056867] WARNING: CPU: 8 PID: 2133 at drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c:631 amdgpu_irq_put+0x9c/0xb0 [amdgpu]
[  628.057093] Modules linked in: amdgpu(+) amdxcp gpu_sched 
drm_panel_backlight_quirks drm_buddy drm_ttm_helper ttm drm_exec 
drm_suballoc_helper drm_display_helper cec rc_core i2c_algo_bit nls_iso8859_1 
xt_conntrack nft_chain_nat xt_MASQUERADE nf_nat nf_conntrack_netlink 
nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 xfrm_user xfrm_algo xt_addrtype 
nft_compat nf_tables br_netfilter bridge stp llc overlay qrtr binfmt_misc 
snd_sof_pci_intel_tgl snd_sof_pci_intel_cnl snd_sof_intel_hda_generic 
soundwire_intel soundwire_cadence snd_sof_intel_hda_common snd_soc_hdac_hda 
snd_sof_intel_hda_mlink snd_sof_intel_hda snd_sof_pci snd_sof_xtensa_dsp 
snd_sof snd_sof_utils snd_soc_acpi_intel_match snd_soc_acpi_intel_sdca_quirks 
soundwire_generic_allocation snd_soc_acpi soundwire_bus snd_soc_sdca 
intel_rapl_msr snd_soc_avs intel_rapl_common snd_soc_hda_codec 
intel_uncore_frequency intel_uncore_frequency_common snd_hda_ext_core 
intel_tcc_cooling snd_soc_core snd_hda_codec_hdmi x86_pkg_temp_thermal
intel_powerclamp snd_compress ac97_bus
[  628.057124]  snd_pcm_dmaengine coretemp snd_hda_intel snd_intel_dspcfg 
snd_intel_sdw_acpi kvm_intel snd_usb_audio snd_hda_codec snd_hda_core 
snd_usbmidi_lib snd_hwdep snd_seq_midi kvm snd_ump snd_seq_midi_event 
snd_rawmidi snd_pcm irqbypass polyval_clmulni mc cmdlinepart polyval_generic 
snd_seq ghash_clmulni_intel sha256_ssse3 spi_nor sha1_ssse3 snd_seq_device 
aesni_intel snd_timer mei_hdcp spd5118 mtd mei_pxp crypto_simd cryptd mfd_aaeon 
rapl asus_nb_wmi eeepc_wmi asus_wmi snd i2c_i801 sparse_keymap intel_cstate 
mei_me platform_profile wmi_bmof i2c_smbus spi_intel_pci i2c_mux soundcore 
spi_intel mei intel_pmc_core pmt_telemetry pmt_class intel_vsec acpi_pad 
acpi_tad mac_hid sch_fq_codel msr parport_pc ppdev lp parport efi_pstore 
nfnetlink dmi_sysfs ip_tables x_tables autofs4 hid_generic usbhid hid nvme 
nvme_core igc nvme_auth ahci intel_lpss_pci intel_lpss libahci idma64 vmd 
ucsi_acpi typec_ucsi typec video pinctrl_alderlake wmi
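
To tell whether a given boot hit this failure, the kernel log can be
scanned for the PSP messages quoted above. A minimal sketch, assuming the
log is fed on stdin (for example from `dmesg` or `journalctl -k -b`); the
function name `has_psp_failure` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: returns 0 if the kernel log on stdin contains one of
# the PSP firmware-loading errors from this report, 1 otherwise.
has_psp_failure() {
    # -E: extended regex, -q: quiet (exit status only)
    grep -Eq 'PSP load kdb failed|PSP firmware loading failed|hw_init of IP block <psp> failed'
}

# Example (reading the ring buffer may require root):
#   if sudo dmesg | has_psp_failure; then echo "PSP load failure detected"; fi
```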

[ Fix ]

Update the following firmware files to their latest upstream versions:

- DMCUB:
amdgpu/dcn_3_2_0_dmcub.bin

- GC:
amdgpu/gc_11_0_0_me.bin
amdgpu/gc_11_0_0_mec.bin
amdgpu/gc_11_0_0_mes1.bin
amdgpu/gc_11_0_0_mes_2.bin
amdgpu/gc_11_0_0_pfp.bin

- PSP:
amdgpu/psp_13_0_0_sos.bin
amdgpu/psp_13_0_0_ta.bin

- SDMA:
amdgpu/sdma_6_0_0.bin

- SMU:
amdgpu/smu_13_0_0.bin

- VCN:
amdgpu/vcn_4_0_0.bin
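
Before and after the update, it can help to confirm that every file in
the list above is installed. A minimal sketch, assuming the files live
under /lib/firmware and may be shipped either uncompressed or compressed
(.bin.zst or .bin.xz), depending on the Ubuntu release. The function name
`check_navi3x_fw` is hypothetical, and it only checks presence, not which
firmware version is installed.

```shell
#!/bin/sh
# Hypothetical helper: report whether each firmware file from this bug is
# present under a firmware directory (default: /lib/firmware).
# Assumption: a file may exist as .bin, .bin.zst or .bin.xz.
check_navi3x_fw() {
    fw_dir=${1:-/lib/firmware}
    missing=0
    for f in \
        amdgpu/dcn_3_2_0_dmcub.bin \
        amdgpu/gc_11_0_0_me.bin \
        amdgpu/gc_11_0_0_mec.bin \
        amdgpu/gc_11_0_0_mes1.bin \
        amdgpu/gc_11_0_0_mes_2.bin \
        amdgpu/gc_11_0_0_pfp.bin \
        amdgpu/psp_13_0_0_sos.bin \
        amdgpu/psp_13_0_0_ta.bin \
        amdgpu/sdma_6_0_0.bin \
        amdgpu/smu_13_0_0.bin \
        amdgpu/vcn_4_0_0.bin
    do
        found=""
        # Try the uncompressed name first, then the compressed variants.
        for ext in "" .zst .xz; do
            if [ -e "$fw_dir/$f$ext" ]; then
                found="$fw_dir/$f$ext"
                break
            fi
        done
        if [ -n "$found" ]; then
            echo "OK      $found"
        else
            echo "MISSING $f"
            missing=$((missing + 1))
        fi
    done
    # Return status is the number of missing files (0 means all present).
    return "$missing"
}
```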

[ Test ]

On a Navi3x system, boot into the graphical environment, SSH into the
DUT, and load and unload the amdgpu module about 20 times:

1. sudo modprobe amdgpu
2. sudo modprobe -r amdgpu
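
The two steps above can be scripted as a loop that stops at the first
failing command. A minimal sketch, assuming a root shell on the DUT; the
function name `amdgpu_reload_loop` and the `MODPROBE` override (meant only
for dry runs with a stub such as `true`) are hypothetical. Note that
`modprobe` may return success even when the device probe itself fails, so
the kernel log should still be checked for the PSP errors quoted under
[ Impact ] after the run.

```shell
#!/bin/sh
# Hypothetical hook: command used to (un)load the module; defaults to modprobe.
MODPROBE=${MODPROBE:-modprobe}

# Load and unload amdgpu N times (default 20); return 1 on the first failure.
amdgpu_reload_loop() {
    iterations=${1:-20}
    i=1
    while [ "$i" -le "$iterations" ]; do
        if ! $MODPROBE amdgpu; then
            echo "iteration $i: modprobe amdgpu failed" >&2
            return 1
        fi
        if ! $MODPROBE -r amdgpu; then
            echo "iteration $i: modprobe -r amdgpu failed" >&2
            return 1
        fi
        i=$((i + 1))
    done
    echo "completed $iterations load/unload cycles"
}
```

For example, after sourcing the script in a root shell, run
`amdgpu_reload_loop 20`.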

[ Where the problem could occur ]

This change should affect only GPUs that use the IP block versions
listed above (DCN 3.2.0, GC 11.0.0, PSP 13.0.0, SDMA 6.0.0, SMU 13.0.0
and VCN 4.0.0).

[ Other Information ]

Relevant upstream commits:

- DMCUB
https://gitlab.com/kernel-firmware/linux-firmware/-/commit/a26e413e7481d12ab5a53f77e0cdde2d5be937d8

- GC
https://gitlab.com/kernel-firmware/linux-firmware/-/commit/7dea59d23b921a7218d2ca63167bf87ac160827b

- PSP
https://gitlab.com/kernel-firmware/linux-firmware/-/commit/154e8d1559f63e59ca18548450ce7ddb7943bf8d

- SDMA
https://gitlab.com/kernel-firmware/linux-firmware/-/commit/c7beb200e2f8f411f7aef78302f50afcece708a5

- SMU
https://gitlab.com/kernel-firmware/linux-firmware/-/commit/dfa8be4ec1bb839ecc5145f6386d0b6ad1cc36f7

- VCN
https://gitlab.com/kernel-firmware/linux-firmware/-/commit/85014781be88a15ae7f5fb60f736e697b87bcff6

** Affects: linux-firmware (Ubuntu)
     Importance: Undecided
     Assignee: Leo Lin (0xff07)
         Status: New


** Tags: originate-from-2122662

** Changed in: linux-firmware (Ubuntu)
     Assignee: (unassigned) => Leo Lin (0xff07)

** Tags added: originate-from-2122662

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux-firmware in Ubuntu.
https://bugs.launchpad.net/bugs/2125139

Title:
  [SRU] Fix amdgpu loading errors on Navi3x systems

Status in linux-firmware package in Ubuntu:
  New

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux-firmware/+bug/2125139/+subscriptions


-- 
Mailing list: https://launchpad.net/~kernel-packages
Post to     : [email protected]
Unsubscribe : https://launchpad.net/~kernel-packages
More help   : https://help.launchpad.net/ListHelp
