Control: tags -1 + moreinfo
Control: severity -1 important

Hi,

On Sun, Apr 27, 2025 at 09:23:09PM -0500, Abdurahman Elmawi wrote:
> Package: src:linux
> Version: 6.12.22-1
> Severity: critical
> Tags: upstream
> Justification: breaks unrelated software
>
> Dear Maintainer,
>
> While running Debian Trixie (Testing) with kernel 6.12.22-1, I encountered a
> serious amdgpu driver issue resulting in a full system lockup.
>
> What led up to the situation:
> - System is running Debian Trixie, with KDE Plasma desktop environment
> (installed via tasksel).
> - I had a large number of windows open and was heavily multitasking.
> - Suddenly, the system completely froze.
> - Attempting to switch to a TTY (Ctrl+Alt+F4) took a very long time before any
> text appeared.
> - I eventually killed the KDE session and re-logged in, after which the system
> returned to normal.
>
> What exactly did you do (or not do) that was effective (or ineffective):
> - Switching to TTY eventually worked (after long delay).
> - Killing the KDE session recovered the system without a reboot.
>
> What was the outcome of this action:
> - I lost all unsaved data
> - After re-logging in, system worked normally again.
> - The issue has not reoccurred so far.
>
> What outcome did you expect instead:
> - The system should not lock up during normal multitasking.
> - GPU/display updates should not cause hard stalls or flip timeouts.
>
> Relevant logs from dmesg:
> ```
> amdgpu 0000:0b:00.0: [drm] *ERROR* [CRTC:85:crtc-0] flip_done timed out
> amdgpu 0000:0b:00.0: [drm] *ERROR* [CRTC:85:crtc-0] commit wait timed out
> amdgpu 0000:0b:00.0: [drm] *ERROR* [PLANE:82:plane-7] commit wait timed out
> WARNING: CPU: 3 PID: 994 at
> drivers/gpu/drm/amd/amdgpu/../display/amdgpu_dm/amdgpu_dm.c:8646
> amdgpu_dm_atomic_commit_tail
> ```
>
> Tracebacks point to amdgpu_dm_atomic_commit_tail
>
> Hardware details:
> - Motherboard: ASUS TUF GAMING X570-PLUS (Wi-Fi)
> - GPU: AMD Radeon RX 6600
> - Kernel: 6.12.22-amd64
> - Firmware: Up-to-date firmware-amd-graphics package installed.
>
> Notes:
> - This problem may be related to other recent AMDGPU instability reports on
> 6.12.x, but this is distinct: no PCIe AER errors were present, and no VRAM
> leaks were observed prior to the crash.
> - The bug appears in normal operation, not just after suspend/resume.

We have recently uploaded 6.12.25-1 to unstable, which contains further
amdgpu-related fixes (and this kernel is targeted for trixie). Could you
please update to 6.12.25 and check whether you still encounter the
amdgpu instability?

The likely problem here is that you may not be able to reproduce the
issue. The behaviour you described, including the long switching time to
a TTY, might just as well indicate other components in userspace hogging
memory, with the system swapping and maybe finally invoking the OOM
killer.

If you still have the system running after this recovery, attaching the
full kernel log would be helpful. At least from the excerpt you posted,
it seems that the system came online after a suspend. But having the
logs from around when the lockup happened would (maybe) be helpful.

Regards,
Salvatore