On Tue, May 20, 2025 at 9:22 PM Mikhail Gavrilov <[email protected]> wrote: > > > Could you more details about your setup, and how you were able to repro it ? > >
Hi, Were you able to reproduce the issue? I’ve prepared a step-by-step guide that may help: 1. Set up a system with a Radeon 6900XT and an LG TV connected via HDMI. 2. Install Fedora Rawhide. 3. Build and install kernel 6.15-rc7 using my .config (attached in the first message). 4. Boot into the custom-built kernel. 5. Set the display resolution to 3840×2160 @ 120 Hz. (This step is optional but may help trigger the issue faster.) 6. Generate heavy system load. I use an infinite kernel rebuild loop: <fish shell> > for i in (seq 1 400000); make clean && make -j32 bzImage && make -j32 > modules; end </fish shell> Expected behavior: System remains stable during heavy load. Actual behavior: 1. First, the kernel log is filled with repeated messages: amdgpu 0000:03:00.0: amdgpu: [drm] DP AUX transfer fail:4 2. After a short while under load, more severe errors appear: amdgpu 0000:03:00.0: [drm] *ERROR* dc_dmub_srv_log_diagnostic_data: DMCUB error - collecting diagnostic data 3. Finally, the system completely freezes with a hard lockup: watchdog: CPU28: Watchdog detected hard LOCKUP on cpu 28 Modules linked in: uinput rfcomm snd_seq_dummy snd_hrtimer nft_queue nfnetlink_queue nf_conntrack_netbios_ns nf_conntrack_broadcast nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ip_set nf_tables qrtr bnep sunrpc binfmt_misc amd_atl intel_rapl_msr intel_rapl_common edac_mce_amd btusb snd_hda_codec_realtek btrtl mt7921e btintel mt7921_common snd_hda_codec_generic btbcm mt792x_lib btmtk snd_hda_scodec_component snd_hda_codec_hdmi snd_usb_audio mt76_connac_lib kvm_amd snd_hda_intel bluetooth mt76 snd_intel_dspcfg snd_usbmidi_lib snd_intel_sdw_acpi snd_hda_codec mc kvm spd5118 mac80211 snd_ump snd_hda_core snd_rawmidi snd_hwdep vfat irqbypass fat snd_seq snd_seq_device wmi_bmof libarc4 rapl r8169 pcspkr snd_pcm cfg80211 i2c_piix4 snd_timer k10temp i2c_smbus realtek snd rfkill joydev soundcore gpio_amdpt gpio_generic loop nfnetlink zram lz4hc_compress lz4_compress amdgpu amdxcp i2c_algo_bit drm_ttm_helper ttm drm_exec nvme polyval_clmulni gpu_sched polyval_generic ghash_clmulni_intel drm_suballoc_helper ucsi_ccg nvme_core sha512_ssse3 typec_ucsi drm_panel_backlight_quirks sha256_ssse3 drm_buddy nvme_keyring typec sha1_ssse3 sp5100_tco nvme_auth drm_display_helper cec video wmi fuse irq event stamp: 117172 hardirqs last enabled at (117171): [<ffffffff9e001566>] asm_common_interrupt+0x26/0x40 hardirqs last disabled at (117172): [<ffffffffa1c00f97>] irqentry_enter+0x57/0x60 softirqs last enabled at (117144): [<ffffffff9e614919>] handle_softirqs+0x579/0x840 softirqs last disabled at (117137): [<ffffffff9e614d16>] __irq_exit_rcu+0x126/0x240 CPU: 28 UID: 1000 PID: 1737394 Comm: as Tainted: G W L ------ --- 6.15.0-0.rc6.250515g088d13246a46.54.fc43.x86_64+debug #1 PREEMPT(lazy) Tainted: [W]=WARN, [L]=SOFTLOCKUP Hardware name: ASRock B650I Lightning WiFi/B650I Lightning WiFi, BIOS 3.08 09/18/2024 RIP: 0010:delay_halt_mwaitx+0x20/0x50 Code: 90 90 90 90 90 90 90 90 90 90 f3 0f 1e fa 0f 1f 44 00 00 53 65 48 8b 05 56 13 0d 04 31 d2 48 89 d1 48 05 00 00 ca a5 0f 01 fa <b8> ff ff ff ff b9 02 00 00 00 48 39 c6 48 0f 47 f0 b8 f0 00 00 00 RSP: 0000:ffffc9003b68f820 EFLAGS: 00000087 RAX: ffff888fda610000 RBX: 000000000000118c RCX: 0000000000000000 RDX: 0000000000000000 RSI: 000000000000118c RDI: 000023b4c02956f6 RBP: 000023b4c02956f6 R08: ffffffffc14b01a9 R09: fffffbfff49570d4 R10: 000000000000001c R11: 0000000000002000 R12: ffffed1040583d43 R13: ffffed1040583d17 R14: 00000000000186a0 R15: ffff888202c1e800 FS: 00007f1da07bcd00(0000) GS:ffff889034970000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f1da0a43930 CR3: 00000003e594a000 CR4: 0000000000f50ef0 PKRU: 55555554 Call Trace: <TASK> delay_halt.part.0+0x33/0x60 dmub_srv_wait_for_idle+0x12f/0x1d0 [amdgpu] dc_dmub_srv_cmd_run_list+0x99/0x2a0 [amdgpu] dc_dmub_srv_drr_update_cmd+0x158/0x340 [amdgpu] ? __lock_acquire+0x40f/0x1160 ? __pfx_dc_dmub_srv_drr_update_cmd+0x10/0x10 [amdgpu] ? lock_acquire.part.0+0xc8/0x270 ? local_clock_noinstr+0xf/0x130 optc1_set_drr+0x18b/0xf20 [amdgpu] ? rcu_is_watching+0x15/0xe0 set_drr_and_clear_adjust_pending+0xa6/0x180 [amdgpu] ? __lock_acquire+0x40f/0x1160 dcn10_set_drr+0x224/0x390 [amdgpu] ? __pfx_dcn10_set_drr+0x10/0x10 [amdgpu] ? local_clock+0x15/0x30 ? __lock_release.isra.0+0x1cb/0x340 ? rcu_is_watching+0x15/0xe0 dc_stream_adjust_vmin_vmax+0x4d9/0xd60 [amdgpu] ? __pfx_dc_stream_adjust_vmin_vmax+0x10/0x10 [amdgpu] ? dm_crtc_high_irq+0x4c8/0xb70 [amdgpu] ? __raw_spin_lock_irqsave+0x60/0x90 dm_crtc_high_irq+0x7b5/0xb70 [amdgpu] ? amdgpu_dm_irq_handler+0xf3/0x2a0 [amdgpu] amdgpu_dm_irq_handler+0x19a/0x2a0 [amdgpu] amdgpu_irq_dispatch+0x286/0x670 [amdgpu] ? find_held_lock+0x2b/0x80 ? __pfx_amdgpu_irq_dispatch+0x10/0x10 [amdgpu] ? __pfx___drm_dev_dbg+0x10/0x10 ? do_raw_spin_unlock+0x59/0x230 ? __wake_up+0x44/0x60 amdgpu_ih_process+0x1c4/0x3a0 [amdgpu] ? __pfx_amdgpu_irq_handler+0x10/0x10 [amdgpu] amdgpu_irq_handler+0x27/0xb0 [amdgpu] ? __pfx_amdgpu_irq_handler+0x10/0x10 [amdgpu] __handle_irq_event_percpu+0x1b5/0x510 handle_irq_event+0xab/0x1c0 handle_edge_irq+0x213/0xb50 __common_interrupt+0xad/0x1d0 ? irq_enter_rcu+0x26/0x190 common_interrupt+0x5a/0xe0 asm_common_interrupt+0x26/0x40 RIP: 0033:0x5639d9e740ee Code: 45 c8 85 d2 74 04 41 80 08 04 48 83 c4 58 4c 89 c8 5b 41 5c 41 5d 41 5e 41 5f 5d c3 48 8b 57 10 44 8b 15 fd bd 08 00 4c 03 0a <45> 85 d2 0f 84 33 ff ff ff 83 c8 04 4c 89 4f 20 88 07 4c 89 c8 c3 RSP: 002b:00007ffce2334e38 EFLAGS: 00000202 RAX: 0000000000000001 RBX: 00007f1d91572388 RCX: 0000000000000002 RDX: 00007f1d90e9c750 RSI: 0000000000000000 RDI: 00007f1d912e5d20 RBP: 00007ffce2334e40 R08: 00007f1d912e5d20 R09: 000000000000e119 R10: 0000000000000001 R11: 0000000000000002 R12: 00007f1d9157f008 R13: 0000000000000000 R14: 00005639d9e74f90 R15: 00007f1da06fe730 </TASK> INFO: NMI handler (perf_event_nmi_handler) took too long to run: 5.441 msecs Environment: GPU: AMD Radeon 6900XT Display: LG TV via HDMI Kernel: 6.15-rc7, built from source using provided config Distro: Fedora Rawhide Motherboard: ASRock B650I Lightning WiFi BIOS: 3.08 (2024-09-18) Additional diagnostic info: Full kernel log ending with stack trace from delay_halt_mwaitx() Series of dc_dmub_srv_drr_update_cmd() and dc_stream_adjust_vmin_vmax() calls in call trace System enters unrecoverable lock state after ~few minutes of heavy compilation -- Best Regards, Mike Gavrilov.
