[Kernel-packages] [Bug 2036742] Re: amdgpu crash on Mantic

Mario Limonciello Wed, 20 Sep 2023 11:11:11 -0700

> [    5.134271] kernel: [drm:detect_link_and_local_sink [amdgpu]] *ERROR* No EDID read.
> [    5.322247] kernel: [drm:detect_link_and_local_sink [amdgpu]] *ERROR* No EDID read.
> [    5.510230] kernel: [drm:detect_link_and_local_sink [amdgpu]] *ERROR* No EDID read.


Is the display connected through a KVM switch?  The failure to read the
EDID is concerning.

> UBSAN warnings could be a red herring. They've added a compiler flag
> that complains about flexible arrays if they're declared incorrectly
> (false positive). Will take a look tomorrow.

Yeah, I agree they're probably a red herring.  The actual issue is that
the UVD IP block fails to initialize because of a timeout (-110 is
-ETIMEDOUT):

[    6.025262] kernel: amdgpu 0000:01:00.0: [drm:amdgpu_ring_test_helper [amdgpu]] *ERROR* ring uvd test failed (-110)
[    6.025511] kernel: [drm:amdgpu_device_ip_init [amdgpu]] *ERROR* hw_init of IP block <uvd_v6_0> failed -110
[    6.025661] kernel: amdgpu 0000:01:00.0: amdgpu: amdgpu_device_ip_init failed
[    6.025663] kernel: amdgpu 0000:01:00.0: amdgpu: Fatal error during GPU init
[    6.025737] kernel: amdgpu 0000:01:00.0: amdgpu: amdgpu: finishing device.


As a potential workaround (not a fix), you might be able to skip
initializing the uvd_v6_0 IP block.
To do this, look up its IP block number in your logs:

[    4.836457] kernel: [drm] add ip block number 0 <vi_common>
[    4.836458] kernel: [drm] add ip block number 1 <gmc_v8_0>
[    4.836459] kernel: [drm] add ip block number 2 <tonga_ih>
[    4.836459] kernel: [drm] add ip block number 3 <gfx_v8_0>
[    4.836460] kernel: [drm] add ip block number 4 <sdma_v3_0>
[    4.836461] kernel: [drm] add ip block number 5 <powerplay>
[    4.836462] kernel: [drm] add ip block number 6 <dm>
[    4.836462] kernel: [drm] add ip block number 7 <uvd_v6_0>
[    4.836463] kernel: [drm] add ip block number 8 <vce_v3_0>

Then you can add "amdgpu.ip_block_mask=0xffffff7f" to your kernel
command line to skip IP block 7 (uvd_v6_0).
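
For anyone who wants to double-check the mask value, here is a rough
Python sketch of the arithmetic.  It assumes the mask is a 32-bit value
in which bit N enables IP block N, which matches the value above (bit 7
cleared).  The helper names parse_ip_blocks and mask_without are made up
for illustration and are not part of any amdgpu tooling.

```python
import re

# Log lines copied from the dmesg excerpt quoted in this thread.
DMESG_LINES = """\
[    4.836457] kernel: [drm] add ip block number 0 <vi_common>
[    4.836458] kernel: [drm] add ip block number 1 <gmc_v8_0>
[    4.836459] kernel: [drm] add ip block number 2 <tonga_ih>
[    4.836459] kernel: [drm] add ip block number 3 <gfx_v8_0>
[    4.836460] kernel: [drm] add ip block number 4 <sdma_v3_0>
[    4.836461] kernel: [drm] add ip block number 5 <powerplay>
[    4.836462] kernel: [drm] add ip block number 6 <dm>
[    4.836462] kernel: [drm] add ip block number 7 <uvd_v6_0>
[    4.836463] kernel: [drm] add ip block number 8 <vce_v3_0>
"""

IP_BLOCK_RE = re.compile(r"add ip block number (\d+) <(\w+)>")


def parse_ip_blocks(text):
    """Return {block name: block number} from 'add ip block number' lines."""
    return {name: int(num) for num, name in IP_BLOCK_RE.findall(text)}


def mask_without(block_numbers, width=32):
    """Start from an all-ones mask (assumed 32 bits wide) and clear one
    bit per IP block that should be skipped."""
    mask = (1 << width) - 1
    for n in block_numbers:
        mask &= ~(1 << n)
    return mask


blocks = parse_ip_blocks(DMESG_LINES)
mask = mask_without([blocks["uvd_v6_0"]])
print(f"amdgpu.ip_block_mask={mask:#010x}")
```

With the log above this prints amdgpu.ip_block_mask=0xffffff7f.  Block
numbering can differ between GPUs, so the number should always be taken
from the logs of the affected machine rather than reused from here.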

If that helps, it confirms that the out-of-bounds check is a red
herring and that the real issue is the UVD initialization failure.  I'd
still like to see data points for the other kernels I suggested, to
narrow down when this problem started.

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/2036742

Title:
  amdgpu crash on Mantic

Status in linux package in Ubuntu:
  Incomplete

Bug description:
  [Impact]

  When booting the latest Mantic Desktop daily image (2023-09-20) from
  USB, nothing is displayed on screen after the initial logs. The
  system is still alive, since _autoinstall_ works as intended. The
  problem persists once the system is provisioned.

  It seems related to
  https://bugs.launchpad.net/ubuntu/+source/linux-firmware/+bug/2029396

  dmesg attached.

  [Test Case]

  Live boot Ubuntu Mantic Desktop canary (2023-09-19)

  [Where Problems Could Occur]

  Dell Optiplex 5090
  - Intel Core(TM) i7-11700
  - Advanced Micro Devices, Inc. [AMD/ATI] - 1002:699f

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2036742/+subscriptions


-- 
Mailing list: https://launchpad.net/~kernel-packages
Post to     : kernel-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~kernel-packages
More help   : https://help.launchpad.net/ListHelp
