Public bug reported:

## Summary
A Framework Laptop 16 with an AMD Phoenix APU (Radeon 780M, gfx1103, device 0x15bf) experiences intermittent MES (Micro Engine Scheduler) firmware timeouts over a period of hours. These eventually culminate in a gfxhub page fault triggered by Chrome/Chromium GPU workloads. The subsequent ring reset fails. A MODE2 GPU reset is then attempted and reports success, but the GPU never actually recovers and instead produces an endless stream of `wait_for_completion_timeout` errors. The display goes permanently black, and a hard reboot is required. Audio over USB continues to work, which indicates that the CPU is still running. This has been happening for approximately 3 months across OEM kernel versions (6.14 and 6.17 series).

## Hardware

- **Laptop:** Framework Laptop 16
- **APU (iGPU):** AMD Phoenix, Radeon 780M (PCI 0000:c4:00.0, device 0x15bf, DCN 3.1.4, gfx_v11_0)
- **dGPU:** AMD Navi 33, Radeon RX 7700S (PCI 0000:03:00.0, device 0x7480, DCN 3.2.1)
- **Display setup:** Internal panel + 2x external 4K monitors, one via a CalDigit TBT4 Element Hub (Thunderbolt 4) and one connected directly via USB-C
- **VRAM:** 8192M shared (iGPU), 8176M GDDR6 (dGPU)

## Software

- **OS:** Ubuntu 24.04 LTS
- **Kernel:** 6.17.0-1017-oem (linux-image-oem-24.04)
- **Mesa:** 25.2.8-0ubuntu0.24.04.1
- **RADV:** Mesa 25.2.8
- **Desktop:** GNOME on Wayland
- **Display Core:** v3.2.340

## Crash Pattern

The crash follows a consistent multi-stage pattern:

### Stage 1: Intermittent MES timeouts (hours before crash)

The iGPU's MES firmware intermittently fails to respond. These errors appear throughout the session, hours before the fatal crash, and are not associated with any specific userspace workload:

```
amdgpu 0000:c4:00.0: amdgpu: MES failed to respond to msg=MISC (WAIT_REG_MEM)
amdgpu 0000:c4:00.0: amdgpu: failed to reg_write_reg_wait
```

In the attached log, these occur at 14:55, 15:27, 16:30, and 21:02 on Apr 3. The fatal crash does not occur until 10:44 on Apr 6, which suggests a slowly degrading MES state.
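
The spacing of the Stage 1 timeouts relative to the crash can be made explicit with a little date arithmetic. The following is a minimal Python sketch that uses only the four MES timeout times and the crash time quoted above. The year 2026 is taken from the apport `Date:` field, all times are assumed to be in the same local time zone, and the helper names `gaps` and `hours` are hypothetical. The attached log may contain further MES timeouts that are not listed here.

```python
from datetime import datetime, timedelta
from typing import List

# MES timeout times quoted in the report (Apr 3). Year assumed from the
# apport "Date:" field; all times assumed to share one local time zone.
MES_TIMEOUTS = [
    datetime(2026, 4, 3, 14, 55),
    datetime(2026, 4, 3, 15, 27),
    datetime(2026, 4, 3, 16, 30),
    datetime(2026, 4, 3, 21, 2),
]
CRASH = datetime(2026, 4, 6, 10, 44)  # fatal gfxhub page fault / hang


def gaps(events: List[datetime]) -> List[timedelta]:
    """Return the time elapsed between each pair of consecutive events."""
    return [later - earlier for earlier, later in zip(events, events[1:])]


def hours(delta: timedelta) -> float:
    """Convert a timedelta to fractional hours."""
    return delta.total_seconds() / 3600


if __name__ == "__main__":
    # The last entry is the gap between the final quoted timeout and the crash.
    for delta in gaps(MES_TIMEOUTS + [CRASH]):
        print(f"{delta}  ({hours(delta):.2f} h)")
```

The intervals work out to 32 minutes, 63 minutes, and 4 hours 32 minutes between the quoted timeouts, followed by 61 hours 42 minutes from the last quoted timeout to the crash.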

### Stage 2: Page fault triggered by Chrome GPU workload

```
amdgpu 0000:c4:00.0: amdgpu: [gfxhub] page fault (src_id:0 ring:24 vmid:4 pasid:32773)
amdgpu 0000:c4:00.0: amdgpu: Process chrome pid 5545 thread chrome:cs0 pid 5579
amdgpu 0000:c4:00.0: amdgpu: in page starting at address 0x000000003f800000 from client 10
amdgpu 0000:c4:00.0: amdgpu: GCVM_L2_PROTECTION_FAULT_STATUS:0x00401430
amdgpu 0000:c4:00.0: amdgpu: Faulty UTCL2 client ID: SQC (data) (0xa)
amdgpu 0000:c4:00.0: amdgpu: PERMISSION_FAULTS: 0x3
amdgpu 0000:c4:00.0: amdgpu: MAPPING_ERROR: 0x0
```

### Stage 3: Ring timeout and failed ring reset

```
amdgpu 0000:c4:00.0: amdgpu: ring gfx_0.0.0 timeout, signaled seq=13348850, emitted seq=13348852
amdgpu 0000:c4:00.0: amdgpu: Starting gfx_0.0.0 ring reset
amdgpu 0000:c4:00.0: amdgpu: Ring gfx_0.0.0 reset failed
```

### Stage 4: GPU reset reports success but fails to recover

```
amdgpu 0000:c4:00.0: amdgpu: GPU reset begin!
amdgpu 0000:c4:00.0: amdgpu: MES failed to respond to msg=REMOVE_QUEUE
amdgpu 0000:c4:00.0: amdgpu: failed to unmap legacy queue
[drm:gfx_v11_0_cp_gfx_enable.isra.0 [amdgpu]] *ERROR* failed to halt cp gfx
amdgpu 0000:c4:00.0: amdgpu: MODE2 reset
amdgpu 0000:c4:00.0: amdgpu: GPU reset succeeded, trying to resume
```

### Stage 5: GPU never actually recovers

Despite the reported success, the GPU enters an unrecoverable state, and the following error repeats roughly every 10 seconds:

```
amdgpu 0000:c4:00.0: amdgpu: [drm] *ERROR* wait_for_completion_timeout timeout!
```

The display stays permanently black, and the system requires a hard reboot.

## Additional observations

- The dGPU (0000:03:00.0) shows `SMU driver if version not matched` on every resume but does not exhibit MES failures.
- Frequent `DMUB HPD IRQ callback: link_index=5` events (roughly every 2 to 10 minutes) suggest that the Thunderbolt dock's DisplayPort tunnel is intermittently renegotiating. This may be contributing to MES firmware stress.
- `gnome-shell: Cursor update failed: drmModeAtomicCommit: Invalid argument` appears occasionally throughout the session.
- The crash occurs on the iGPU, which drives the displays. The dGPU has no CRTC connected.
- The crash occurs with both ANGLE-on-OpenGL and ANGLE-on-Vulkan Chrome configurations.
- Setting the `amdgpu.dcdebugmask=0x410` kernel parameter did not resolve the issue.

## Attached files

1. `fw16-crash-log.txt`: Filtered journal from the crash boot (amdgpu/drm/MES/reset related lines)
2. `fw16-full-journal-crash-boot.txt`: Full unfiltered journal from the crash boot
3. `about-gpu.txt`: chrome://gpu output showing driver/rendering configuration

## Potentially related upstream issues

- https://gitlab.freedesktop.org/drm/amd/-/issues/4296 (amdgpu thread safety / MES issues)
- https://community.frame.work/t/amdgpu-gfxhub-page-fault-display-freezing-in-ubuntu-25-10/80712

ProblemType: Bug
DistroRelease: Ubuntu 24.04
Package: linux-image-6.17.0-1017-oem 6.17.0-1017.17
ProcVersionSignature: Ubuntu 6.17.0-1017.17-oem 6.17.13
Uname: Linux 6.17.0-1017-oem x86_64
ApportVersion: 2.28.1-0ubuntu3.8
Architecture: amd64
AudioDevicesInUse:
 USER PID ACCESS COMMAND
 /dev/snd/controlC2: scotty 4339 F.... wireplumber
 /dev/snd/controlC0: scotty 4339 F.... wireplumber
 /dev/snd/controlC1: scotty 4339 F.... wireplumber
 /dev/snd/seq: scotty 4333 F.... pipewire
CasperMD5CheckResult: pass
CurrentDesktop: ubuntu:GNOME
Date: Mon Apr 6 12:07:24 2026
InstallationDate: Installed on 2024-04-17 (720 days ago)
InstallationMedia: Ubuntu 22.04.4 LTS "Jammy Jellyfish" - Release amd64 (20240220)
MachineType: Framework Laptop 16 (AMD Ryzen 7040 Series)
ProcEnviron:
 LANG=en_US.UTF-8
 PATH=(custom, no user)
 SHELL=/usr/bin/zsh
 TERM=xterm-256color
 XDG_RUNTIME_DIR=<set>
ProcFB: 0 amdgpudrmfb
ProcKernelCmdLine: BOOT_IMAGE=/vmlinuz-6.17.0-1017-oem root=/dev/mapper/vgubuntu-root ro quiet splash vt.handoff=7
RelatedPackageVersions:
 linux-restricted-modules-6.17.0-1017-oem N/A
 linux-backports-modules-6.17.0-1017-oem N/A
 linux-firmware 20240318.git3b128b60-0ubuntu2.25
SourcePackage: linux-oem-6.17
UpgradeStatus: Upgraded to noble on 2024-09-13 (570 days ago)
dmi.bios.date: 12/22/2025
dmi.bios.release: 4.3
dmi.bios.vendor: INSYDE Corp.
dmi.bios.version: 04.03
dmi.board.asset.tag: *
dmi.board.name: FRANMZCP09
dmi.board.vendor: Framework
dmi.board.version: A9
dmi.chassis.asset.tag: FRAGACCPA94083000M
dmi.chassis.type: 10
dmi.chassis.vendor: Framework
dmi.chassis.version: A9
dmi.modalias: dmi:bvnINSYDECorp.:bvr04.03:bd12/22/2025:br4.3:svnFramework:pnLaptop16(AMDRyzen7040Series):pvrA9:rvnFramework:rnFRANMZCP09:rvrA9:cvnFramework:ct10:cvrA9:skuFRAGACCP09:
dmi.product.family: 16in Laptop
dmi.product.name: Laptop 16 (AMD Ryzen 7040 Series)
dmi.product.sku: FRAGACCP09
dmi.product.version: A9
dmi.sys.vendor: Framework

** Affects: linux-oem-6.17 (Ubuntu)
     Importance: Undecided
         Status: New

** Tags: amd64 apport-bug noble wayland-session

** Attachment added: "journalctl -b -2 | grep -E "(amdgpu|gfx|drm|page fault|reset|wedged|MES|timeout)" > ~/fw16-crash-log.txt"
   https://bugs.launchpad.net/bugs/2147367/+attachment/5959054/+files/fw16-crash-log.txt
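
To triage the filtered journal attachment, one option is a minimal Python sketch that tags each line with the crash stage it belongs to, assuming one log message per line. The marker strings are copied from the log excerpts in this report, and plain substring matching is a deliberate simplification. The bit layout used to decode `GCVM_L2_PROTECTION_FAULT_STATUS` is an assumption taken from the gfx11 register definitions in the kernel's amdgpu headers. For `0x00401430`, it reproduces the client ID (0xa), VMID (4), PERMISSION_FAULTS (0x3), and MAPPING_ERROR (0x0) printed in Stage 2. `stage_of`, `decode_fault_status`, and `triage` are hypothetical helpers, not part of any existing tool.

```python
import re
import sys
from dataclasses import dataclass
from typing import Iterable, Iterator, Optional, Tuple

# Message fragments copied from the log excerpts in this report, mapped to
# the crash stage they appear in. Illustrative, not exhaustive.
STAGE_MARKERS = [
    ("MES failed to respond to msg=MISC", 1),
    ("failed to reg_write_reg_wait", 1),
    ("[gfxhub] page fault", 2),
    ("ring gfx_0.0.0 timeout", 3),
    ("Starting gfx_0.0.0 ring reset", 3),
    ("Ring gfx_0.0.0 reset failed", 3),
    ("GPU reset begin!", 4),
    ("MES failed to respond to msg=REMOVE_QUEUE", 4),
    ("failed to unmap legacy queue", 4),
    ("failed to halt cp gfx", 4),
    ("MODE2 reset", 4),
    ("GPU reset succeeded, trying to resume", 4),
    ("wait_for_completion_timeout timeout!", 5),
]

FAULT_RE = re.compile(r"GCVM_L2_PROTECTION_FAULT_STATUS:(0x[0-9a-fA-F]+)")
RING_RE = re.compile(r"signaled seq=(\d+), emitted seq=(\d+)")


@dataclass
class FaultStatus:
    more_faults: int
    walker_error: int
    permission_faults: int
    mapping_error: int
    cid: int  # UTCL2 client ID, e.g. 0xa is printed as SQC (data)
    rw: int
    vmid: int


def decode_fault_status(value: int) -> FaultStatus:
    # ASSUMPTION: gfx11 bit layout of GCVM_L2_PROTECTION_FAULT_STATUS as
    # defined in the kernel's amdgpu register headers.
    return FaultStatus(
        more_faults=value & 0x1,
        walker_error=(value >> 1) & 0x7,
        permission_faults=(value >> 4) & 0xF,
        mapping_error=(value >> 8) & 0x1,
        cid=(value >> 9) & 0x1FF,
        rw=(value >> 18) & 0x1,
        vmid=(value >> 20) & 0xF,
    )


def stage_of(line: str) -> Optional[int]:
    """Return the crash stage a log line belongs to, or None."""
    for marker, stage in STAGE_MARKERS:
        if marker in line:
            return stage
    return None


def triage(lines: Iterable[str]) -> Iterator[Tuple[int, str, object]]:
    """Yield (stage, line, extra) for each line that matches a known stage."""
    for line in lines:
        stage = stage_of(line)
        extra: object = None
        fault = FAULT_RE.search(line)
        ring = RING_RE.search(line)
        if fault:
            # The status register is printed as part of the Stage 2 fault report.
            stage = 2
            extra = decode_fault_status(int(fault.group(1), 16))
        elif ring:
            # Fences emitted to the ring but not yet signaled by the GPU.
            extra = {"pending_fences": int(ring.group(2)) - int(ring.group(1))}
        if stage is not None:
            yield stage, line.strip(), extra


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], encoding="utf-8", errors="replace") as fh:
        for stage, line, extra in triage(fh):
            suffix = f"  -> {extra}" if extra is not None else ""
            print(f"stage {stage}: {line}{suffix}")
```

Saved under a hypothetical name such as `triage.py`, the script can be run as `python3 triage.py fw16-crash-log.txt`. Applied to the Stage 3 line above, it reports 2 pending fences (13348852 emitted, 13348850 signaled).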

https://bugs.launchpad.net/bugs/2147367

Title:
  amdgpu: MES firmware intermittently unresponsive on Phoenix APU (gfx1103), leading to unrecoverable GPU hang on Framework 16
