This bug is missing log files that will aid in diagnosing the problem. While running an Ubuntu kernel (not a mainline or third-party kernel) please enter the following command in a terminal window:
apport-collect 1782716

and then change the status of the bug to 'Confirmed'.

If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.

** Changed in: linux (Ubuntu)
       Status: New => Incomplete

** Tags added: cosmic

-- 
You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1782716

Title:
  [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout

Status in linux package in Ubuntu:
  Incomplete

Bug description:
  Running the 4.17.0-5-generic kernel on a ppc64le machine with a Radeon R9 Fury GPU

  0033:01:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Fiji [Radeon R9 FURY / NANO Series] (rev ff)

  [ 2361.958847] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, last signaled seq=8777, last emitted seq=8778
  [ 2362.080397] EEH: Frozen PHB#33-PE#0 detected
  [ 2362.080470] EEH: PE location: CPU2 Slot1 (16x), PHB location: N/A
  [ 2362.080568] CPU: 53 PID: 874 Comm: kworker/53:1 Not tainted 4.17.0-5-generic #6-Ubuntu
  [ 2362.080575] Workqueue: events drm_sched_job_timedout [gpu_sched]
  [ 2362.080577] Call Trace:
  [ 2362.080584] [c0000000fb7078f0] [c000000000d275ac] dump_stack+0xb0/0xf4 (unreliable)
  [ 2362.080590] [c0000000fb707930] [c00000000003ba0c] eeh_dev_check_failure+0x5bc/0x5e0
  [ 2362.080593] [c0000000fb7079e0] [c00000000003babc] eeh_check_failure+0x8c/0xd0
  [ 2362.080628] [c0000000fb707a20] [c00800000cfa1b88] amdgpu_mm_rreg+0x280/0x2a0 [amdgpu]
  [ 2362.080676] [c0000000fb707a70] [c00800000d04cf68] gmc_v8_0_check_soft_reset+0x30/0xe0 [amdgpu]
  [ 2362.080711] [c0000000fb707aa0] [c00800000cfa1194] amdgpu_device_ip_check_soft_reset.part.1+0x8c/0x140 [amdgpu]
  [ 2362.080745] [c0000000fb707b30] [c00800000cfa649c] amdgpu_device_gpu_recover+0x854/0xa40 [amdgpu]
  [ 2362.080799] [c0000000fb707c00] [c00800000d0b97a4] amdgpu_job_timedout+0x5c/0x80 [amdgpu]
  [ 2362.080805] [c0000000fb707c70] [c00800000c8f0040] drm_sched_job_timedout+0x38/0x60 [gpu_sched]
  [ 2362.080810] [c0000000fb707c90] [c000000000137928] process_one_work+0x298/0x580
  [ 2362.080813] [c0000000fb707d20] [c000000000137c98] worker_thread+0x88/0x610
  [ 2362.080817] [c0000000fb707dc0] [c000000000140958] kthread+0x1a8/0x1b0
  [ 2362.080822] [c0000000fb707e30] [c00000000000b658] ret_from_kernel_thread+0x5c/0x84
  [ 2362.080827] [drm] IP block:gmc_v8_0 is hung!
  [ 2362.080832] [drm] IP block:tonga_ih is hung!
  [ 2362.080843] [drm] IP block:gfx_v8_0 is hung!
  [ 2362.080845] EEH: Detected PCI bus error on PHB#33-PE#0
  [ 2362.080847] EEH: This PCI device has failed 1 times in the last hour
  [ 2362.080849] EEH: Notify device drivers to shutdown
  [ 2362.080850] [drm] IP block:sdma_v3_0 is hung!
  [ 2362.080856] [drm] IP block:uvd_v6_0 is hung!
  [ 2362.080858] EEH: Collect temporary log
  [ 2362.080866] [drm] IP block:vce_v3_0 is hung!
  [ 2362.080867] [drm] GPU recovery disabled.
  [ 2362.080903] EEH: of node=0033:01:00.1
  [ 2362.080905] EEH: PCI device/vendor: ffffffff
  [ 2362.080907] EEH: PCI cmd/status register: ffffffff
  [ 2362.080908] EEH: PCI-E capabilities and status follow:
  [ 2362.080915] EEH: PCI-E 00: ffffffff ffffffff ffffffff ffffffff
  [ 2362.080920] EEH: PCI-E 10: ffffffff ffffffff ffffffff ffffffff
  [ 2362.080921] EEH: PCI-E 20: ffffffff
  [ 2362.080922] EEH: PCI-E AER capability register set follows:
  [ 2362.080928] EEH: PCI-E AER 00: ffffffff ffffffff ffffffff ffffffff
  [ 2362.080933] EEH: PCI-E AER 10: ffffffff ffffffff ffffffff ffffffff
  [ 2362.080938] EEH: PCI-E AER 20: ffffffff ffffffff ffffffff ffffffff
  [ 2362.080940] EEH: PCI-E AER 30: ffffffff ffffffff
  [ 2362.080941] EEH: of node=0033:01:00.0
  [ 2362.080943] EEH: PCI device/vendor: ffffffff
  [ 2362.080945] EEH: PCI cmd/status register: ffffffff
  [ 2362.080945] EEH: PCI-E capabilities and status follow:
  [ 2362.080951] EEH: PCI-E 00: ffffffff ffffffff ffffffff ffffffff
  [ 2362.080956] EEH: PCI-E 10: ffffffff ffffffff ffffffff ffffffff
  [ 2362.080957] EEH: PCI-E 20: ffffffff
  [ 2362.080958] EEH: PCI-E AER capability register set follows:
  [ 2362.080964] EEH: PCI-E AER 00: ffffffff ffffffff ffffffff ffffffff
  [ 2362.080969] EEH: PCI-E AER 10: ffffffff ffffffff ffffffff ffffffff
  [ 2362.080974] EEH: PCI-E AER 20: ffffffff ffffffff ffffffff ffffffff
  [ 2362.080975] EEH: PCI-E AER 30: ffffffff ffffffff
  [ 2362.080977] PHB4 PHB#51 Diag-data (Version: 1)
  [ 2362.080978] brdgCtl: 00000002
  [ 2362.080979] RootSts: 00060020 00402000 c1010008 00100107 00000000
  [ 2362.080980] RootErrSts: 00000000 00000020 00000000
  [ 2362.080981] PhbSts: 0000001c00000000 0000001c00000000
  [ 2362.080982] Lem: 0000000100000000 0000000000000000 0000000100000000
  [ 2362.080983] PhbErr: 000000c000000000 0000008000000000 2148000098000240 a008400000000000
  [ 2362.080984] RegbErr: 0090000000000000 0010000000000000 4800003c00000000 0000000000000200
  [ 2362.080985] PE[000] A/B: 8000000000000000 8000000000000000
  [ 2362.080987] PE[..1fe] A/B: as above
  [ 2362.080988] PE[1ff] A/B: b740002a01000000 8000000000000000
  [ 2362.080988] EEH: Reset with hotplug activity
  [ 2362.579139] iommu: Removing device 0033:01:00.1 from group 3
  [ 2362.579206] pci 0033:01:00.1: Dropping the link to 0033:01:00.0
  [ 2362.579665] [drm] amdgpu: finishing device.
  [ 2363.495059] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring sdma1 timeout, last signaled seq=8052, last emitted seq=8054
  [ 2363.495192] [drm] IP block:gmc_v8_0 is hung!
  [ 2363.495197] [drm] IP block:tonga_ih is hung!
  [ 2363.495208] [drm] IP block:gfx_v8_0 is hung!
  [ 2363.495212] [drm] IP block:sdma_v3_0 is hung!
  [ 2363.495217] [drm] IP block:uvd_v6_0 is hung!
  [ 2363.495225] [drm] IP block:vce_v3_0 is hung!
  [ 2363.495226] [drm] GPU recovery disabled.
  [ 2372.712463] [drm:amdgpu_dm_atomic_check [amdgpu]] *ERROR* [CRTC:43:crtc-0] hw_done or flip_done timed out

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1782716/+subscriptions

-- 
Mailing list: https://launchpad.net/~kernel-packages
Post to     : kernel-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~kernel-packages
More help   : https://help.launchpad.net/ListHelp