With upstream kernels I get this (and a frozen desktop):

[ 2604.488694] [drm] dce_get_required_clocks_state: clocks unsupported disp_clk 681000 pix_clk 154000
[ 2634.551719] cfg80211: Loading compiled-in X.509 certificates for regulatory database
[ 2634.554170] cfg80211: Loaded X.509 cert 'sforshee: 00b28ddf47aef9cea7'
[ 3060.974388] [drm] dce_get_required_clocks_state: clocks unsupported disp_clk 681000 pix_clk 154000
[ 3510.632708] [drm] dce_get_required_clocks_state: clocks unsupported disp_clk 681000 pix_clk 154000
[ 3527.956089] [drm] dce_get_required_clocks_state: clocks unsupported disp_clk 681000 pix_clk 154000
[ 4992.501324] [drm] dce_get_required_clocks_state: clocks unsupported disp_clk 681000 pix_clk 154000
[ 5015.179529] [drm] dce_get_required_clocks_state: clocks unsupported disp_clk 681000 pix_clk 154000
[ 5189.342133] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, last signaled seq=4657, last emitted seq=4658
[ 5189.342233] [drm] GPU recovery disabled.
[ 5317.867388] INFO: task kworker/u257:3:54387 blocked for more than 120 seconds.
[ 5317.867471] Not tainted 4.18.0-041800rc6-generic #201807221830
[ 5317.867548] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 5317.867656] kworker/u257:3 D 0 54387 2 0x00000808
[ 5317.867675] Workqueue: events_unbound commit_work [drm_kms_helper]
[ 5317.867677] Call Trace:
[ 5317.867680] [c000000fe3447460] [0000002a00000000] 0x2a00000000 (unreliable)
[ 5317.867688] [c000000fe3447630] [c00000000001c430] __switch_to+0x260/0x4c0
[ 5317.867694] [c000000fe3447690] [c000000000d67b44] __schedule+0x304/0xad0
[ 5317.867697] [c000000fe3447760] [c000000000d68358] schedule+0x48/0xc0
[ 5317.867701] [c000000fe3447780] [c000000000d6d1b8] schedule_timeout+0x348/0x510
[ 5317.867707] [c000000fe3447880] [c000000000928b60] dma_fence_default_wait+0x2b0/0x350
[ 5317.867710] [c000000fe34478f0] [c00000000092780c] dma_fence_wait_timeout+0x6c/0x1b0
[ 5317.867714] [c000000fe3447930] [c00000000092aeb0] reservation_object_wait_timeout_rcu+0x320/0x3d0
[ 5317.867774] [c000000fe34479b0] [c00800000d5fc220] amdgpu_dm_do_flip+0x138/0x3b0 [amdgpu]
[ 5317.867831] [c000000fe3447b00] [c00800000d6001a0] amdgpu_dm_atomic_commit_tail+0x7f8/0xf20 [amdgpu]
[ 5317.867840] [c000000fe3447c60] [c00800000cb72da4] commit_tail+0x6c/0xe0 [drm_kms_helper]
[ 5317.867846] [c000000fe3447c90] [c000000000138720] process_one_work+0x2b0/0x560
[ 5317.867850] [c000000fe3447d20] [c000000000138a58] worker_thread+0x88/0x610
[ 5317.867854] [c000000fe3447dc0] [c0000000001416fc] kthread+0x1ac/0x1c0
[ 5317.867859] [c000000fe3447e30] [c00000000000b65c] ret_from_kernel_thread+0x5c/0x80
[ 5438.711397] INFO: task kworker/u257:3:54387 blocked for more than 120 seconds.
[ 5438.711473] Not tainted 4.18.0-041800rc6-generic #201807221830
[ 5438.711552] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
If I kill the wayland session:

[ 7012.419912] EEH: Frozen PHB#33-PE#0 detected
[ 7012.419919] EEH: PE location: CPU2 Slot1 (16x), PHB location: N/A
[ 7012.419923] CPU: 74 PID: 126541 Comm: pulseaudio Not tainted 4.18.0-041800rc6-generic #201807221830
[ 7012.419924] Call Trace:
[ 7012.419932] [c000200b36333300] [c000000000d4ce3c] dump_stack+0xb0/0xf4 (unreliable)
[ 7012.419936] [c000200b36333340] [c00000000003b0ac] eeh_dev_check_failure+0x4ac/0x5e0
[ 7012.419938] [c000200b363333e0] [c00000000003b26c] eeh_check_failure+0x8c/0xd0
[ 7012.419945] [c000200b36333420] [c008000016342ae8] pci_azx_readw+0x80/0xb0 [snd_hda_intel]
[ 7012.419950] [c000200b36333450] [c0080000161c5790] snd_hdac_bus_send_cmd+0x78/0x210 [snd_hda_core]
[ 7012.419956] [c000200b363334a0] [c0080000162a20ec] azx_send_cmd+0x34/0x390 [snd_hda_codec]
[ 7012.419959] [c000200b36333530] [c0080000161c0274] snd_hdac_bus_exec_verb_unlocked+0x7c/0x280 [snd_hda_core]
[ 7012.419964] [c000200b36333590] [c00800001629240c] codec_exec_verb+0xb4/0x1f0 [snd_hda_codec]
[ 7012.419967] [c000200b36333630] [c0080000161c1a10] snd_hdac_exec_verb+0x38/0x90 [snd_hda_core]
[ 7012.419971] [c000200b36333650] [c0080000161c4158] hda_reg_write+0x120/0x3b0 [snd_hda_core]
[ 7012.419974] [c000200b363336c0] [c0000000008c87e8] _regmap_write+0x98/0x190
[ 7012.419977] [c000200b36333710] [c0000000008ca5b4] regmap_write+0x74/0xc0
[ 7012.419981] [c000200b36333750] [c0080000161c47e4] snd_hdac_regmap_write_raw+0x4c/0x130 [snd_hda_core]
[ 7012.419985] [c000200b36333790] [c008000016485d80] hdmi_pcm_open+0x168/0x4a0 [snd_hda_codec_hdmi]
[ 7012.419989] [c000200b36333820] [c0080000162a12e8] azx_pcm_open+0x1b0/0x3d0 [snd_hda_codec]
[ 7012.419995] [c000200b36333890] [c0080000160ab3dc] snd_pcm_open_substream+0xb4/0x1a0 [snd_pcm]
[ 7012.419998] [c000200b36333920] [c0080000160ab5d4] snd_pcm_open+0x10c/0x2e0 [snd_pcm]
[ 7012.420002] [c000200b363339b0] [c0080000160ab8c4] snd_pcm_playback_open+0x6c/0xa8 [snd_pcm]
[ 7012.420008] [c000200b363339f0] [c00800000f9c0750] snd_open+0x108/0x240 [snd]
[ 7012.420011] [c000200b36333a90] [c000000000401ee8] chrdev_open+0x128/0x270
[ 7012.420015] [c000200b36333af0] [c0000000003f4f10] do_dentry_open+0x1e0/0x450
[ 7012.420017] [c000200b36333b50] [c0000000004123e8] do_last+0x318/0xa40
[ 7012.420018] [c000200b36333c00] [c000000000412c04] path_openat+0xf4/0x3f0
[ 7012.420020] [c000200b36333c80] [c0000000004147b0] do_filp_open+0x80/0x100
[ 7012.420022] [c000200b36333db0] [c0000000003f7268] do_sys_open+0x228/0x2f0
[ 7012.420025] [c000200b36333e30] [c00000000000b288] system_call+0x5c/0x70
[ 7012.420055] EEH: Detected PCI bus error on PHB#33-PE#0
[ 7012.420059] EEH: This PCI device has failed 1 times in the last hour and will be permanently disabled after 5 failures.
[ 7012.420063] EEH: Notify device drivers to shutdown
[ 7012.420072] EEH: Beginning: 'error_detected(IO frozen)'
[ 7012.420102] EEH: PE#0 (PCI 0033:01:00.1): driver not EEH aware
[ 7012.420104] EEH: PE#0 (PCI 0033:01:00.0): driver not EEH aware
[ 7012.420106] EEH: Finished:'error_detected(IO frozen)' with aggregate recovery state:'none'
[ 7012.420116] EEH: Collect temporary log
[ 7012.420163] EEH: of node=0033:01:00.1
[ 7012.420166] EEH: PCI device/vendor: ffffffff
[ 7012.420168] EEH: PCI cmd/status register: ffffffff
[ 7012.420170] EEH: PCI-E capabilities and status follow:
[ 7012.420179] EEH: PCI-E 00: ffffffff ffffffff ffffffff ffffffff
[ 7012.420187] EEH: PCI-E 10: ffffffff ffffffff ffffffff ffffffff
[ 7012.420188] EEH: PCI-E 20: ffffffff
[ 7012.420189] EEH: PCI-E AER capability register set follows:
[ 7012.420197] EEH: PCI-E AER 00: ffffffff ffffffff ffffffff ffffffff
[ 7012.420204] EEH: PCI-E AER 10: ffffffff ffffffff ffffffff ffffffff
[ 7012.420211] EEH: PCI-E AER 20: ffffffff ffffffff ffffffff ffffffff
[ 7012.420214] EEH: PCI-E AER 30: ffffffff ffffffff
[ 7012.420216] EEH: of node=0033:01:00.0
[ 7012.420218] EEH: PCI device/vendor: ffffffff
[ 7012.420220] EEH: PCI cmd/status register: ffffffff
[ 7012.420221] EEH: PCI-E capabilities and status follow:
[ 7012.420229] EEH: PCI-E 00: ffffffff ffffffff ffffffff ffffffff
[ 7012.420236] EEH: PCI-E 10: ffffffff ffffffff ffffffff ffffffff
[ 7012.420237] EEH: PCI-E 20: ffffffff
[ 7012.420238] EEH: PCI-E AER capability register set follows:
[ 7012.420246] EEH: PCI-E AER 00: ffffffff ffffffff ffffffff ffffffff
[ 7012.420253] EEH: PCI-E AER 10: ffffffff ffffffff ffffffff ffffffff
[ 7012.420261] EEH: PCI-E AER 20: ffffffff ffffffff ffffffff ffffffff
[ 7012.420267] EEH: PCI-E AER 30: ffffffff ffffffff
[ 7012.420270] PHB4 PHB#51 Diag-data (Version: 1)
[ 7012.420271] brdgCtl: 00000002
[ 7012.420273] RootSts: 00060020 00402000 c1010008 00100107 00000000
[ 7012.420274] RootErrSts: 00000000 00000020 00000000
[ 7012.420276] PhbSts: 0000001c00000000 0000001c00000000
[ 7012.420277] Lem: 0000000100000000 0000000000000000 0000000100000000
[ 7012.420278] PhbErr: 000000c000000000 0000008000000000 2148000098000240 a008400000000000
[ 7012.420280] RegbErr: 0090000000000000 0010000000000000 4800003c00000000 0000000000000200
[ 7012.420282] PE[000] A/B: 8000000000000000 8000000000000000
[ 7012.420285] PE[..1fe] A/B: as above
[ 7012.420286] PE[1ff] A/B: b740002a01000000 8000000000000000
[ 7012.420287] EEH: Reset with hotplug activity
[ 7012.817635] iommu: Removing device 0033:01:00.1 from group 3
[ 7012.817682] pci 0033:01:00.1: Dropping the link to 0033:01:00.0
[ 7012.818009] [drm] amdgpu: finishing device.

--
You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1782716

Title:
[drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout

Status in linux package in Ubuntu:
Incomplete

Bug description:
Running the 4.17.0-5-generic kernel on a ppc64le machine with a Radeon R9 Fury GPU

0033:01:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Fiji [Radeon R9 FURY / NANO Series] (rev ff)

[ 2361.958847] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, last signaled seq=8777, last emitted seq=8778
[ 2362.080397] EEH: Frozen PHB#33-PE#0 detected
[ 2362.080470] EEH: PE location: CPU2 Slot1 (16x), PHB location: N/A
[ 2362.080568] CPU: 53 PID: 874 Comm: kworker/53:1 Not tainted 4.17.0-5-generic #6-Ubuntu
[ 2362.080575] Workqueue: events drm_sched_job_timedout [gpu_sched]
[ 2362.080577] Call Trace:
[ 2362.080584] [c0000000fb7078f0] [c000000000d275ac] dump_stack+0xb0/0xf4 (unreliable)
[ 2362.080590] [c0000000fb707930] [c00000000003ba0c] eeh_dev_check_failure+0x5bc/0x5e0
[ 2362.080593] [c0000000fb7079e0] [c00000000003babc] eeh_check_failure+0x8c/0xd0
[ 2362.080628] [c0000000fb707a20] [c00800000cfa1b88] amdgpu_mm_rreg+0x280/0x2a0 [amdgpu]
[ 2362.080676] [c0000000fb707a70] [c00800000d04cf68] gmc_v8_0_check_soft_reset+0x30/0xe0 [amdgpu]
[ 2362.080711] [c0000000fb707aa0] [c00800000cfa1194] amdgpu_device_ip_check_soft_reset.part.1+0x8c/0x140 [amdgpu]
[ 2362.080745] [c0000000fb707b30] [c00800000cfa649c] amdgpu_device_gpu_recover+0x854/0xa40 [amdgpu]
[ 2362.080799] [c0000000fb707c00] [c00800000d0b97a4] amdgpu_job_timedout+0x5c/0x80 [amdgpu]
[ 2362.080805] [c0000000fb707c70] [c00800000c8f0040] drm_sched_job_timedout+0x38/0x60 [gpu_sched]
[ 2362.080810] [c0000000fb707c90] [c000000000137928] process_one_work+0x298/0x580
[ 2362.080813] [c0000000fb707d20] [c000000000137c98] worker_thread+0x88/0x610
[ 2362.080817] [c0000000fb707dc0] [c000000000140958] kthread+0x1a8/0x1b0
[ 2362.080822] [c0000000fb707e30] [c00000000000b658] ret_from_kernel_thread+0x5c/0x84
[ 2362.080827] [drm] IP block:gmc_v8_0 is hung!
[ 2362.080832] [drm] IP block:tonga_ih is hung!
[ 2362.080843] [drm] IP block:gfx_v8_0 is hung!
[ 2362.080845] EEH: Detected PCI bus error on PHB#33-PE#0
[ 2362.080847] EEH: This PCI device has failed 1 times in the last hour
[ 2362.080849] EEH: Notify device drivers to shutdown
[ 2362.080850] [drm] IP block:sdma_v3_0 is hung!
[ 2362.080856] [drm] IP block:uvd_v6_0 is hung!
[ 2362.080858] EEH: Collect temporary log
[ 2362.080866] [drm] IP block:vce_v3_0 is hung!
[ 2362.080867] [drm] GPU recovery disabled.
[ 2362.080903] EEH: of node=0033:01:00.1
[ 2362.080905] EEH: PCI device/vendor: ffffffff
[ 2362.080907] EEH: PCI cmd/status register: ffffffff
[ 2362.080908] EEH: PCI-E capabilities and status follow:
[ 2362.080915] EEH: PCI-E 00: ffffffff ffffffff ffffffff ffffffff
[ 2362.080920] EEH: PCI-E 10: ffffffff ffffffff ffffffff ffffffff
[ 2362.080921] EEH: PCI-E 20: ffffffff
[ 2362.080922] EEH: PCI-E AER capability register set follows:
[ 2362.080928] EEH: PCI-E AER 00: ffffffff ffffffff ffffffff ffffffff
[ 2362.080933] EEH: PCI-E AER 10: ffffffff ffffffff ffffffff ffffffff
[ 2362.080938] EEH: PCI-E AER 20: ffffffff ffffffff ffffffff ffffffff
[ 2362.080940] EEH: PCI-E AER 30: ffffffff ffffffff
[ 2362.080941] EEH: of node=0033:01:00.0
[ 2362.080943] EEH: PCI device/vendor: ffffffff
[ 2362.080945] EEH: PCI cmd/status register: ffffffff
[ 2362.080945] EEH: PCI-E capabilities and status follow:
[ 2362.080951] EEH: PCI-E 00: ffffffff ffffffff ffffffff ffffffff
[ 2362.080956] EEH: PCI-E 10: ffffffff ffffffff ffffffff ffffffff
[ 2362.080957] EEH: PCI-E 20: ffffffff
[ 2362.080958] EEH: PCI-E AER capability register set follows:
[ 2362.080964] EEH: PCI-E AER 00: ffffffff ffffffff ffffffff ffffffff
[ 2362.080969] EEH: PCI-E AER 10: ffffffff ffffffff ffffffff ffffffff
[ 2362.080974] EEH: PCI-E AER 20: ffffffff ffffffff ffffffff ffffffff
[ 2362.080975] EEH: PCI-E AER 30: ffffffff ffffffff
[ 2362.080977] PHB4 PHB#51 Diag-data (Version: 1)
[ 2362.080978] brdgCtl: 00000002
[ 2362.080979] RootSts: 00060020 00402000 c1010008 00100107 00000000
[ 2362.080980] RootErrSts: 00000000 00000020 00000000
[ 2362.080981] PhbSts: 0000001c00000000 0000001c00000000
[ 2362.080982] Lem: 0000000100000000 0000000000000000 0000000100000000
[ 2362.080983] PhbErr: 000000c000000000 0000008000000000 2148000098000240 a008400000000000
[ 2362.080984] RegbErr: 0090000000000000 0010000000000000 4800003c00000000 0000000000000200
[ 2362.080985] PE[000] A/B: 8000000000000000 8000000000000000
[ 2362.080987] PE[..1fe] A/B: as above
[ 2362.080988] PE[1ff] A/B: b740002a01000000 8000000000000000
[ 2362.080988] EEH: Reset with hotplug activity
[ 2362.579139] iommu: Removing device 0033:01:00.1 from group 3
[ 2362.579206] pci 0033:01:00.1: Dropping the link to 0033:01:00.0
[ 2362.579665] [drm] amdgpu: finishing device.
[ 2363.495059] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring sdma1 timeout, last signaled seq=8052, last emitted seq=8054
[ 2363.495192] [drm] IP block:gmc_v8_0 is hung!
[ 2363.495197] [drm] IP block:tonga_ih is hung!
[ 2363.495208] [drm] IP block:gfx_v8_0 is hung!
[ 2363.495212] [drm] IP block:sdma_v3_0 is hung!
[ 2363.495217] [drm] IP block:uvd_v6_0 is hung!
[ 2363.495225] [drm] IP block:vce_v3_0 is hung!
[ 2363.495226] [drm] GPU recovery disabled.
[ 2372.712463] [drm:amdgpu_dm_atomic_check [amdgpu]] *ERROR* [CRTC:43:crtc-0] hw_done or flip_done timed out

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1782716/+subscriptions

--
Mailing list: https://launchpad.net/~kernel-packages
Post to : kernel-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~kernel-packages
More help : https://help.launchpad.net/ListHelp