A system hang occurs, with a possible kernel BUG at
arch/x86/mm/init_64.c:154, during the memmap_init_zone_device
initialization call in the AMDGPU init sequence.
For reference, the log below shows the expected (good) result after the
[drm] JPEG decode line: memmap_init_zone_device should execute, followed
by the amdgpu HMM device-memory registration. In failing boots, the
kernel BUG occurs at this point.
=========================
Aug 09 00:07:09.659512 host-ruby-942e kernel: [drm] JPEG decode initialized successfully.
Aug 09 00:07:09.659521 host-ruby-942e kernel: memmap_init_zone_device initialised 16777216 pages in 136ms
Aug 09 00:07:09.659531 host-ruby-942e kernel: amdgpum HMM registered 65520MB device memory
Aug 09 00:07:09.659694 host-ruby-942e kernel: kfd kfd: amdgpu: Allocated 3989536 bytes on gart
Aug 09 00:07:09.659838 host-ruby-942e kernel: kfd kfd: amdgpu: Total number of KFD nodes to be created: 1
Aug 09 00:07:09.659849 host-ruby-942e kernel: amdgpu: Virtual CRAT table created for GPU
Aug 09 00:07:09.659858 host-ruby-942e kernel: amdgpu: Topology: Add dGPU node [0x740f:0x1002]
Aug 09 00:07:09.659985 host-ruby-942e kernel: kfd kfd: amdgpu: added device 1002:740f
====================
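To tell good boots from failing ones across hundreds of cycles, the
kernel log of each boot can be checked for the sequence shown above. The
sketch below is a hypothetical helper, not part of any existing tool: it
assumes the kernel log for one boot has been captured as a string (for
example with `journalctl -k -b`) and searches for the markers from the
good-boot log in order. It matches `HMM registered` rather than the full
prefix of that line.

```python
import re

# Markers taken from the good-boot log, in the order they should appear.
EXPECTED_SEQUENCE = [
    r"\[drm\] JPEG decode initialized successfully",
    r"memmap_init_zone_device initialised \d+ pages",
    r"HMM registered \d+MB device memory",
    r"kfd kfd: amdgpu: added device",
]


def boot_reached_kfd(log_text: str) -> bool:
    """Return True if every marker appears in log_text, in order."""
    pos = 0
    for pattern in EXPECTED_SEQUENCE:
        match = re.search(pattern, log_text[pos:])
        if match is None:
            return False  # sequence stopped early, e.g. hang after memmap init
        pos += match.end()  # continue searching after this marker
    return True


def hit_init64_bug(log_text: str) -> bool:
    """Return True if the log contains a BUG report from init_64.c."""
    return "kernel BUG at arch/x86/mm/init_64.c" in log_text
```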
The issue is a timing-related race condition that occurs while setting
up the CPU page tables during AMDGPU driver initialization. Because this
is a 5-level page table error, the underlying problem may fall under
Linux memory management.
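As background for the 5-level page table angle, the following sketch
shows how x86-64 5-level paging (LA57) splits a 57-bit virtual address
into five 9-bit table indices and a 12-bit page offset. The level names
follow the Linux convention (pgd, p4d, pud, pmd, pte); the function is an
illustration only and is not taken from arch/x86/mm/init_64.c.

```python
# x86-64 5-level paging (LA57): a 57-bit linear address is split into five
# 9-bit table indices (512 entries per table) plus a 12-bit offset into a
# 4 KiB page.
LEVELS = ["pgd", "p4d", "pud", "pmd", "pte"]  # Linux names, top level first


def split_la57(vaddr: int) -> dict:
    """Return the per-level table indices and page offset for vaddr."""
    vaddr &= (1 << 57) - 1  # drop sign-extension bits 63:57
    parts = {"offset": vaddr & 0xFFF}
    for i, name in enumerate(reversed(LEVELS)):
        shift = 12 + 9 * i  # pte at bit 12, pmd at 21, ..., pgd at 48
        parts[name] = (vaddr >> shift) & 0x1FF
    return parts
```

For example, `split_la57((1 << 48) | (2 << 39) | (3 << 30) | (4 << 21) | (5 << 12) | 6)`
returns `{"offset": 6, "pte": 5, "pmd": 4, "pud": 3, "p4d": 2, "pgd": 1}`.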
The issue occurs during server reboot stress testing. The server should
have at least one AMD MI210 GPU with the amdgpu driver installed and
enabled. Use ipmitool to drive chassis cold boots in a loop, with the
loop count set to 1000. We can reliably reproduce the issue once the
loop passes 500 boot cycles.
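The reproduction loop can be sketched in Python as follows. This is a
minimal sketch, assuming a BMC reachable over the `lanplus` interface;
the host name, credentials, wait times, and the `check_boot` callback are
hypothetical placeholders. Only the `chassis power off` and
`chassis power on` commands and the `-I`, `-H`, `-U`, `-P` options come
from ipmitool itself.

```python
import subprocess
import time

# Hypothetical BMC connection details: replace with your own.
BMC_ARGS = ["-I", "lanplus", "-H", "bmc.example.com", "-U", "admin", "-P", "changeme"]


def ipmi(args, run=subprocess.run):
    """Run one ipmitool command against the BMC; raise if it fails."""
    return run(["ipmitool", *BMC_ARGS, *args], check=True)


def cold_boot_loop(count=1000, off_wait=30, boot_wait=300,
                   check_boot=lambda cycle: True,
                   run=subprocess.run, sleep=time.sleep):
    """Power the chassis off and on `count` times.

    Returns the first cycle number for which check_boot() reports a bad
    boot, or None if every cycle passed.
    """
    for cycle in range(1, count + 1):
        ipmi(["chassis", "power", "off"], run)
        sleep(off_wait)   # assumed time for a full power-down (cold boot)
        ipmi(["chassis", "power", "on"], run)
        sleep(boot_wait)  # assumed time for the OS to finish booting
        if not check_boot(cycle):
            return cycle  # stop so the failing system can be inspected
    return None
```

`check_boot` is where a per-boot check (for example, fetching and
inspecting the kernel log) would go; stopping on the first bad boot
leaves the hung system in place for debugging instead of power-cycling
it away.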
** Affects: linux (Ubuntu)
Importance: Undecided
Status: New
** Tags: bot-comment
--
lvl 5 pagetable system hang
https://bugs.launchpad.net/bugs/2096860