Public bug reported:

[ Impact ]
On the 6.14 generic kernel, the amdgpu driver fails to load on Navi3x systems due to errors in PSP firmware loading, producing the following error messages:

[ 627.871752] amdgpu 0000:03:00.0: amdgpu: PSP load kdb failed!
[ 628.056253] [drm:psp_v13_0_ring_destroy [amdgpu]] *ERROR* Fail to stop psp ring
[ 628.056543] amdgpu 0000:03:00.0: amdgpu: PSP firmware loading failed
[ 628.056546] [drm:amdgpu_device_fw_loading [amdgpu]] *ERROR* hw_init of IP block <psp> failed -22
[ 628.056777] amdgpu 0000:03:00.0: amdgpu: amdgpu_device_ip_init failed
[ 628.056779] amdgpu 0000:03:00.0: amdgpu: Fatal error during GPU init
[ 628.056781] amdgpu 0000:03:00.0: amdgpu: amdgpu: finishing device.
[ 628.056866] ------------[ cut here ]------------
[ 628.056867] WARNING: CPU: 8 PID: 2133 at drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c:631 amdgpu_irq_put+0x9c/0xb0 [amdgpu]
[ 628.057093] Modules linked in: amdgpu(+) amdxcp gpu_sched drm_panel_backlight_quirks drm_buddy drm_ttm_helper ttm drm_exec drm_suballoc_helper drm_display_helper cec rc_core i2c_algo_bit nls_iso8859_1 xt_conntrack nft_chain_nat xt_MASQUERADE nf_nat nf_conntrack_netlink nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 xfrm_user xfrm_algo xt_addrtype nft_compat nf_tables br_netfilter bridge stp llc overlay qrtr binfmt_misc snd_sof_pci_intel_tgl snd_sof_pci_intel_cnl snd_sof_intel_hda_generic soundwire_intel soundwire_cadence snd_sof_intel_hda_common snd_soc_hdac_hda snd_sof_intel_hda_mlink snd_sof_intel_hda snd_sof_pci snd_sof_xtensa_dsp snd_sof snd_sof_utils snd_soc_acpi_intel_match snd_soc_acpi_intel_sdca_quirks soundwire_generic_allocation snd_soc_acpi soundwire_bus snd_soc_sdca intel_rapl_msr snd_soc_avs intel_rapl_common snd_soc_hda_codec intel_uncore_frequency intel_uncore_frequency_common snd_hda_ext_core intel_tcc_cooling snd_soc_core snd_hda_codec_hdmi x86_pkg_temp_thermal intel_powerclamp snd_compress ac97_bus
[ 628.057124] snd_pcm_dmaengine coretemp snd_hda_intel snd_intel_dspcfg snd_intel_sdw_acpi kvm_intel snd_usb_audio snd_hda_codec snd_hda_core snd_usbmidi_lib snd_hwdep snd_seq_midi kvm snd_ump snd_seq_midi_event snd_rawmidi snd_pcm irqbypass polyval_clmulni mc cmdlinepart polyval_generic snd_seq ghash_clmulni_intel sha256_ssse3 spi_nor sha1_ssse3 snd_seq_device aesni_intel snd_timer mei_hdcp spd5118 mtd mei_pxp crypto_simd cryptd mfd_aaeon rapl asus_nb_wmi eeepc_wmi asus_wmi snd i2c_i801 sparse_keymap intel_cstate mei_me platform_profile wmi_bmof i2c_smbus spi_intel_pci i2c_mux soundcore spi_intel mei intel_pmc_core pmt_telemetry pmt_class intel_vsec acpi_pad acpi_tad mac_hid sch_fq_codel msr parport_pc ppdev lp parport efi_pstore nfnetlink dmi_sysfs ip_tables x_tables autofs4 hid_generic usbhid hid nvme nvme_core igc nvme_auth ahci intel_lpss_pci intel_lpss libahci idma64 vmd ucsi_acpi typec_ucsi typec video pinctrl_alderlake wmi

[ Fix ]
Update the following firmware to the latest version:
- DMCUB:
    amdgpu/dcn_3_2_0_dmcub.bin
- GC:
    amdgpu/gc_11_0_0_me.bin
    amdgpu/gc_11_0_0_mec.bin
    amdgpu/gc_11_0_0_mes1.bin
    amdgpu/gc_11_0_0_mes_2.bin
    amdgpu/gc_11_0_0_pfp.bin
- PSP:
    amdgpu/psp_13_0_0_sos.bin
    amdgpu/psp_13_0_0_ta.bin
- SDMA:
    amdgpu/sdma_6_0_0.bin
- SMU:
    amdgpu/smu_13_0_0.bin
- VCN:
    amdgpu/vcn_4_0_0.bin

[ Test ]
On a Navi3x system, boot into the graphical environment, SSH into the DUT, and repeatedly load and unload the module ~20 times:
1. sudo modprobe amdgpu
2. sudo modprobe -r amdgpu

[ Where the problem could occur ]
This should impact only the GPUs with those versions of IP blocks.
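
The load/unload procedure in [ Test ] could be scripted roughly as follows. This is only a sketch, not part of the original report: the function names cycle_module and log_has_psp_failure and the MODPROBE override variable are hypothetical helpers introduced for illustration, and the two grep patterns are the PSP failure messages quoted in [ Impact ].

```shell
#!/bin/sh
# Sketch of the [ Test ] procedure. MODPROBE is a hypothetical override hook
# (not from the bug report) so the loop can be dry-run without root.

# cycle_module MODULE COUNT
# Loads and then unloads MODULE, COUNT times, stopping at the first failure.
cycle_module() {
    module=$1
    count=$2
    i=1
    while [ "$i" -le "$count" ]; do
        # Left unquoted on purpose so "sudo modprobe" splits into two words.
        if ! ${MODPROBE:-sudo modprobe} "$module"; then
            echo "iteration $i: load of $module failed" >&2
            return 1
        fi
        if ! ${MODPROBE:-sudo modprobe} -r "$module"; then
            echo "iteration $i: unload of $module failed" >&2
            return 1
        fi
        i=$((i + 1))
    done
    echo "completed $count load/unload cycles of $module"
}

# log_has_psp_failure
# Reads kernel log text on stdin and succeeds if it contains one of the
# PSP failure messages quoted in the [ Impact ] section.
log_has_psp_failure() {
    grep -q -e 'PSP load kdb failed' -e 'PSP firmware loading failed'
}
```

On the DUT this might be run as `cycle_module amdgpu 20`, followed by `sudo dmesg | log_has_psp_failure` to check whether any of the cycles hit the PSP error. Note that a bare modprobe exit status may not reflect a failed GPU init, which is why the log check is kept as a separate step.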

[ Other Information ]
Relevant upstream commits:
- DMCUB
    https://gitlab.com/kernel-firmware/linux-firmware/-/commit/a26e413e7481d12ab5a53f77e0cdde2d5be937d8
- GC
    https://gitlab.com/kernel-firmware/linux-firmware/-/commit/7dea59d23b921a7218d2ca63167bf87ac160827b
- PSP
    https://gitlab.com/kernel-firmware/linux-firmware/-/commit/154e8d1559f63e59ca18548450ce7ddb7943bf8d
- SDMA
    https://gitlab.com/kernel-firmware/linux-firmware/-/commit/c7beb200e2f8f411f7aef78302f50afcece708a5
- SMU
    https://gitlab.com/kernel-firmware/linux-firmware/-/commit/dfa8be4ec1bb839ecc5145f6386d0b6ad1cc36f7
- VCN
    https://gitlab.com/kernel-firmware/linux-firmware/-/commit/85014781be88a15ae7f5fb60f736e697b87bcff6

** Affects: linux-firmware (Ubuntu)
     Importance: Undecided
     Assignee: Leo Lin (0xff07)
         Status: New

** Tags: originate-from-2122662

** Changed in: linux-firmware (Ubuntu)
     Assignee: (unassigned) => Leo Lin (0xff07)

** Tags added: originate-from-2122662

https://bugs.launchpad.net/bugs/2125139

Title:
  [SRU] Fix amdgpu loading errors on Navi3x systems

Status in linux-firmware package in Ubuntu:
  New

