Public bug reported:

Since the update:

 xserver-xorg-video-ati-hwe-18.04 (1:19.0.1-1ubuntu1~18.04.1) bionic;

which resulted from:

 https://bugs.launchpad.net/fedora/+source/xserver-xorg-video-ati/+bug/1841718

I've experienced GPU freezes in which all video output becomes
unresponsive (both Xorg and Ctrl+Alt terminal switching) and the GPU fan
runs at full speed. I am still able to access the system via SSH.

Sometimes dmesg ends up full of this message repeating over and over:

 radeon 0000:01:00.0: ring 0 stalled for more than 24040msec
 radeon 0000:01:00.0: GPU lockup (current fence id 0x0000000000009e44 last fence id 0x0000000000009e49 on ring 0)
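
For anyone triaging similar logs, here is a minimal Python sketch that
pulls the stall duration and fence ids out of the two messages above.
It is not part of any radeon tooling: the function name and the regular
expressions are my own assumptions, based only on the message format
quoted here. The "pending" value is simply last fence id minus current
fence id, which I read as the number of fences emitted but not yet
signalled; treat that interpretation as an assumption too.

```python
import re

# Patterns are assumptions based only on the two dmesg lines quoted above.
STALL_RE = re.compile(r"ring (\d+) stalled for more than (\d+)msec")
LOCKUP_RE = re.compile(
    r"GPU lockup \(current fence id (0x[0-9a-fA-F]+) "
    r"last\s+fence id (0x[0-9a-fA-F]+) on ring (\d+)\)"
)


def summarize_lockups(dmesg_text):
    """Return (stalls, lockups) found in dmesg_text.

    stalls  : list of (ring, stalled_msec)
    lockups : list of (ring, current_fence, last_fence, pending)
    """
    # Undo mail-style wrapping so a message split over two lines still matches.
    flat = " ".join(line.strip() for line in dmesg_text.splitlines())
    stalls = [(int(ring), int(ms)) for ring, ms in STALL_RE.findall(flat)]
    lockups = []
    for cur, last, ring in LOCKUP_RE.findall(flat):
        cur_id, last_id = int(cur, 16), int(last, 16)
        lockups.append((int(ring), cur_id, last_id, last_id - cur_id))
    return stalls, lockups
```

Fed the two lines above, this reports one stall on ring 0 and one lockup
with the two fence ids 5 apart. Run over a saved dmesg file, it gives a
quick count of how often the message repeats.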

I sometimes get a few GPU soft resets, which seem to fail in drm(?):

 radeon 0000:01:00.0: Saved 110839 dwords of commands on ring 0.
 radeon 0000:01:00.0: GPU softreset: 0x00000008
 ...
 radeon 0000:01:00.0: Wait for MC idle timedout !
 radeon 0000:01:00.0: Wait for MC idle timedout !
 [drm] PCIE GART of 1024M enabled (table at 0x0000000000162000).
 radeon 0000:01:00.0: WB enabled 
 radeon 0000:01:00.0: fence driver on ring 0 use gpu addr 0x0000000040000c00 and cpu addr 0x00000000725651ad
 radeon 0000:01:00.0: fence driver on ring 3 use gpu addr 0x0000000040000c0c and cpu addr 0x00000000c3678ed8
 radeon 0000:01:00.0: fence driver on ring 5 use gpu addr 0x0000000000072118 and cpu addr 0x00000000dbd9e01b
 [drm:r600_ring_test [radeon]] *ERROR* radeon: ring 0 test failed (scratch(0x8504)=0xCAFEDEAD)
 [drm:evergreen_resume [radeon]] *ERROR* evergreen startup failed on resume

Even if the above reset doesn't happen, this freeze always results in an
"unable to handle page fault" BUG in radeon_ring_backup, entered from
various call paths, e.g.:

 BUG: unable to handle page fault for address: ffffbc2d80574ffc
 ...
 Oops: 0000 [#1] SMP PTI 
 CPU: 2 PID: 11243 Comm: kworker/2:1H Not tainted 5.5.0-050500-generic #202001262030
 Workqueue: radeon-crtc radeon_flip_work_func [radeon]
 RIP: 0010:radeon_ring_backup+0xc9/0x140 [radeon]
 Call Trace:
  radeon_gpu_reset+0xc3/0x2f0 [radeon]
  radeon_flip_work_func+0x1f3/0x250 [radeon]
  ? __schedule+0x2e0/0x760
  process_one_work+0x1b5/0x370
  worker_thread+0x50/0x3d0
  kthread+0x104/0x140
  ? process_one_work+0x370/0x370
  ? kthread_park+0x90/0x90
  ret_from_fork+0x35/0x40

or:

 BUG: unable to handle page fault for address: ffffc03901000ffc
 ...
 Oops: 0000 [#1] SMP PTI

 CPU: 3 PID: 2227 Comm: compton Not tainted 5.3.0-28-generic #30~18.04.1-Ubuntu
 RIP: 0010:radeon_ring_backup+0xd3/0x140 [radeon]
 Call Trace:
  radeon_gpu_reset+0xb9/0x340 [radeon]
  ? dma_fence_wait_timeout+0x48/0x110
  ? reservation_object_wait_timeout_rcu+0x19d/0x340
  radeon_gem_handle_lockup.part.4+0xe/0x20 [radeon]
  radeon_gem_wait_idle_ioctl+0xa6/0x110 [radeon]
  ? radeon_gem_busy_ioctl+0x80/0x80 [radeon]
  drm_ioctl_kernel+0xb0/0x100 [drm]
  drm_ioctl+0x389/0x450 [drm]
  ? radeon_gem_busy_ioctl+0x80/0x80 [radeon]
  ? __switch_to_asm+0x40/0x70
  ? __switch_to_asm+0x34/0x70
  ? __switch_to_asm+0x40/0x70
  ? __switch_to_asm+0x40/0x70
  ? __switch_to_asm+0x34/0x70
  ? __switch_to_asm+0x40/0x70
  ? __switch_to_asm+0x34/0x70
  ? __switch_to_asm+0x40/0x70
  radeon_drm_ioctl+0x4f/0x80 [radeon]
  do_vfs_ioctl+0xa9/0x640
  ? __schedule+0x2b0/0x670
  ksys_ioctl+0x75/0x80
  __x64_sys_ioctl+0x1a/0x20
  do_syscall_64+0x5a/0x130
  entry_SYSCALL_64_after_hwframe+0x44/0xa9

I've tried both 5.3.0-28-generic and 5.5.0-050500-generic from
kernel-ppa, but that made no difference. It appears to be a bug in radeon.

Nothing specific triggers this; it happens during regular usage with a
compositing window manager. I'm not playing games or particularly
exercising the GPU. The last two times I was just reading in a web
browser. It's also happened in the middle of the night while I was
asleep. Sometimes I get a few days of uptime; sometimes it happens less
than 24 hours after boot.

This never happened before the radeon update mentioned on the first
line.

I'll attach two files of dmesg output. As per
https://wiki.ubuntu.com/X/Troubleshooting/Freeze I've installed and
started apport for the next time it happens.

** Affects: xserver-xorg-video-ati (Ubuntu)
     Importance: Undecided
         Status: New

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1863390

Title:
  GPU lockup ring 0 stalled for more than X msec

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/xserver-xorg-video-ati/+bug/1863390/+subscriptions

-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs
