apport information

** Attachment added: "ProcInterrupts.txt"
   https://bugs.launchpad.net/bugs/1990323/+attachment/5617876/+files/ProcInterrupts.txt
-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1990323

Title:
  amdgpu driver hangs periodically, causes display to permanently crash

Status in linux package in Ubuntu:
  Incomplete

Bug description:
  Apologies in advance if this isn't the right place to file this bug.
  Please let me know if I should report this elsewhere or if there's any
  other info I can add.

  What I suspect is an amdgpu driver issue has been causing display
  issues and machine crashes. Once the issue starts, the display won't
  come back from being blank, and turning the machine off takes five
  minutes or longer. Additionally, executing `sensors` hangs on reading
  data from the GPU and can even cause 100% CPU utilization for multiple
  minutes. I believe this causes significant delays to other system
  calls, as the entire machine will start to behave sluggishly in spurts
  where every program hangs and then many things happen all at once.

  I can't reliably trigger the issue, although repeatedly running
  `sensors` in a loop seems to be one method that eventually works.

  The main symptom other than the behavior described above is a cycle of
  messages in the journal of the form

  kernel: amdgpu: last message was failed ret is 0
  kernel: amdgpu: failed to send message 145 ret is 0
  kernel: amdgpu: last message was failed ret is 0
  kernel: amdgpu: failed to send message 146 ret is 0

  The messages sent are 145, 146, 5e, and 148.

  I've had this GPU for 5 years without any of these problems. However,
  I only recently upgraded from an older Intel CPU to a Ryzen 5600 and
  an ASRock B550M Pro4. I have no idea if the issues are related to the
  upgrade or how they could be.

  Some misc info.
  Distro: Ubuntu 22.04.1 LTS x86_64
  Kernel: 5.15.0-48-generic
  Graphics card: Radeon R9 380X
  Motherboard: ASRock B550M Pro4 (P2.30 BIOS)
  CPU: Ryzen 5600
  Desktop: i3 (with picom compositor)

  I'm attaching the relevant logs from my most recent boot, during which
  I was able to use the machine for several hours. I left the machine to
  blank its screen and when I returned, I was unable to unblank the
  screen. The only thing I could do was press the power button and leave
  the machine to shut down over the course of the next 5 or 10 minutes.

  The only other thing that is interesting is that on startup I'm seeing
  the following warning about undefined behavior.

  UBSAN: shift-out-of-bounds in /build/linux-kQ6jNR/linux-5.15.0/drivers/gpu/drm/amd/amdgpu/../amdkfd/kfd_device_queue_manager.c:997:32
  shift exponent 64 is too large for 64-bit type 'long long unsigned int'
  CPU: 10 PID: 483 Comm: systemd-udevd Not tainted 5.15.0-48-generic #54-Ubuntu
  Hardware name: To Be Filled By O.E.M. B550M Pro4/B550M Pro4, BIOS P2.30 02/24/2022
  Call Trace:
   <TASK>
   show_stack+0x52/0x5c
   dump_stack_lvl+0x4a/0x63
   dump_stack+0x10/0x16
   ubsan_epilogue+0x9/0x49
   __ubsan_handle_shift_out_of_bounds.cold+0x61/0xef
   initialize_nocpsch.cold+0x15/0x59 [amdgpu]
   device_queue_manager_init+0x20b/0x3b0 [amdgpu]
   kgd2kfd_device_init.cold+0x1af/0x483 [amdgpu]
   amdgpu_amdkfd_device_init+0x135/0x170 [amdgpu]
   amdgpu_device_ip_init+0x681/0x6a4 [amdgpu]
  loop33: detected capacity change from 0 to 8
   amdgpu_device_init.cold+0x25b/0x7db [amdgpu]
   ? do_pci_enable_device+0xdb/0x110
   amdgpu_driver_load_kms+0x1e/0x270 [amdgpu]
   amdgpu_pci_probe+0x1ce/0x260 [amdgpu]
   local_pci_probe+0x4b/0x90
   pci_device_probe+0x119/0x1f0
   really_probe+0x222/0x420
   __driver_probe_device+0x119/0x190
   driver_probe_device+0x23/0xc0
   __driver_attach+0xbd/0x1e0
   ? __device_attach_driver+0x120/0x120
   bus_for_each_dev+0x7e/0xd0
   driver_attach+0x1e/0x30
   bus_add_driver+0x148/0x220
   driver_register+0x95/0x100
   __pci_register_driver+0x68/0x70
   amdgpu_init+0x7c/0x1000 [amdgpu]
   ? 0xffffffffc1a40000
   do_one_initcall+0x48/0x1e0
   ? kmem_cache_alloc_trace+0x19e/0x2e0
   do_init_module+0x52/0x260
   load_module+0xacd/0xbc0
   __do_sys_finit_module+0xbf/0x120
   __x64_sys_finit_module+0x18/0x20
   do_syscall_64+0x5c/0xc0
   ? syscall_exit_to_user_mode+0x27/0x50
   ? __x64_sys_newfstatat+0x1c/0x30
   ? do_syscall_64+0x69/0xc0
   ? __x64_sys_mmap+0x33/0x50
   ? do_syscall_64+0x69/0xc0
   ? do_syscall_64+0x69/0xc0
   entry_SYSCALL_64_after_hwframe+0x61/0xcb
  RIP: 0033:0x7f06f3fb9a3d
  Code: 5b 41 5c c3 66 0f 1f 84 00 00 00 00 00 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d c3 a3 0f 00 f7 d8 64 89 01 48
  RSP: 002b:00007ffc7ce54ae8 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
  RAX: ffffffffffffffda RBX: 0000556c9ab3e3d0 RCX: 00007f06f3fb9a3d
  RDX: 0000000000000000 RSI: 00007f06f4150441 RDI: 000000000000001a
  RBP: 0000000000020000 R08: 0000000000000000 R09: 0000000000000002
  R10: 000000000000001a R11: 0000000000000246 R12: 00007f06f4150441
  R13: 0000556c9aa05fb0 R14: 0000556c9ab40460 R15: 0000556c9ab35150
   </TASK>

  It's even visible on screen before the splash screen appears. I don't
  remember seeing this before the motherboard/CPU upgrade.

  I haven't tried to trigger the issue from a bootable USB, but I can
  confirm that the warning about undefined behavior is present there as
  well.
  --- 
  ProblemType: Bug
  ApportVersion: 2.20.11-0ubuntu82.1
  Architecture: amd64
  AudioDevicesInUse:
   USER        PID ACCESS COMMAND
   /dev/snd/controlC1:  emichael   2373 F.... pulseaudio
   /dev/snd/controlC2:  emichael   2373 F.... pulseaudio
   /dev/snd/controlC0:  emichael   2373 F.... pulseaudio
  CasperMD5CheckResult: unknown
  CurrentDesktop: i3
  DistroRelease: Ubuntu 22.04
  HibernationDevice: RESUME=UUID=4154f3bc-32d5-44ad-8af7-193b3f9c6483
  InstallationDate: Installed on 2016-01-18 (2438 days ago)
  InstallationMedia: Ubuntu 14.04.3 LTS "Trusty Tahr" - Beta amd64 (20150805)
  IwConfig:
   lo        no wireless extensions.
   enp5s0    no wireless extensions.
  MachineType: To Be Filled By O.E.M. B550M Pro4
  Package: linux (not installed)
  ProcFB: 0 amdgpudrmfb
  ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-5.15.0-48-generic root=UUID=91431a5f-fc77-4987-9f34-3e61da41a3b4 ro
  ProcVersionSignature: Ubuntu 5.15.0-48.54-generic 5.15.53
  RelatedPackageVersions:
   linux-restricted-modules-5.15.0-48-generic N/A
   linux-backports-modules-5.15.0-48-generic  N/A
   linux-firmware                             20220329.git681281e4-0ubuntu3.5
  RfKill:
   0: hci0: Bluetooth
    Soft blocked: no
    Hard blocked: no
  Tags: jammy
  Uname: Linux 5.15.0-48-generic x86_64
  UpgradeStatus: Upgraded to jammy on 2022-05-06 (137 days ago)
  UserGroups: adm cdrom dip docker fuse lpadmin plugdev sambashare sudo video
  _MarkForUpload: True
  dmi.bios.date: 02/24/2022
  dmi.bios.release: 5.17
  dmi.bios.vendor: American Megatrends International, LLC.
  dmi.bios.version: P2.30
  dmi.board.name: B550M Pro4
  dmi.board.vendor: ASRock
  dmi.chassis.asset.tag: To Be Filled By O.E.M.
  dmi.chassis.type: 3
  dmi.chassis.vendor: To Be Filled By O.E.M.
  dmi.chassis.version: To Be Filled By O.E.M.
  dmi.modalias: dmi:bvnAmericanMegatrendsInternational,LLC.:bvrP2.30:bd02/24/2022:br5.17:svnToBeFilledByO.E.M.:pnB550MPro4:pvrToBeFilledByO.E.M.:rvnASRock:rnB550MPro4:rvr:cvnToBeFilledByO.E.M.:ct3:cvrToBeFilledByO.E.M.:skuToBeFilledByO.E.M.:
  dmi.product.family: To Be Filled By O.E.M.
  dmi.product.name: B550M Pro4
  dmi.product.sku: To Be Filled By O.E.M.
  dmi.product.version: To Be Filled By O.E.M.
  dmi.sys.vendor: To Be Filled By O.E.M.
  modified.conffile..etc.default.apport: [modified]
  mtime.conffile..etc.default.apport: 2022-09-17T22:35:32.791110
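
A minimal sketch, not part of the original report: the repeating "failed to send message" lines quoted in the bug description could be tallied per message ID to see which IDs dominate a given boot. The function name `count_failed_messages` and the regular expression are illustrative assumptions based only on the sample lines in the report. The ID is matched as a hex-style token because the report lists 5e alongside 145, 146, and 148.

```python
import re
from collections import Counter

# Assumed pattern, derived only from the sample journal lines in the report.
# The ID is captured as a hex-style token since "5e" is one of the listed IDs.
FAILED_SEND = re.compile(
    r"amdgpu: failed to send message ([0-9a-fA-F]+) ret is (-?\d+)"
)


def count_failed_messages(lines):
    """Return a Counter mapping each logged message ID to its occurrence count."""
    counts = Counter()
    for line in lines:
        match = FAILED_SEND.search(line)
        if match:
            counts[match.group(1).lower()] += 1
    return counts


# Sample input: the four journal lines quoted in the bug description.
sample = [
    "kernel: amdgpu: last message was failed ret is 0",
    "kernel: amdgpu: failed to send message 145 ret is 0",
    "kernel: amdgpu: last message was failed ret is 0",
    "kernel: amdgpu: failed to send message 146 ret is 0",
]

counts = count_failed_messages(sample)
for msg_id, n in sorted(counts.items()):
    print(f"message {msg_id}: {n}")
```

To run this against a real boot, the sample list could be replaced with lines read from the kernel journal, for example the output of `journalctl -k -b`.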