Public bug reported:
[Summary]
A kernel NULL pointer dereference occurs in the AMDGPU driver when running
compute workloads (e.g., ROCm applications). The fault crashes the kernel
and points to a missing NULL check during GPU VM fault handling.
[Reproduce steps]
1. Run ROCm unit tests
2. Observe the NULL pointer dereference in the kernel log (dmesg)
```
[ +1.413182] amdgpu 0000:c4:00.0: amdgpu: [gfxhub] page fault (src_id:0 ring:40 vmid:8 pasid:32774)
[ +0.000019] amdgpu 0000:c4:00.0: amdgpu: in process precise-memory- pid 6661 thread precise-memory- pid 6661)
[ +0.000004] amdgpu 0000:c4:00.0: amdgpu: in page starting at address 0x0000000000000000 from client 10
[ +0.000004] amdgpu 0000:c4:00.0: amdgpu: GCVM_L2_PROTECTION_FAULT_STATUS:0x00841050
[ +0.000003] amdgpu 0000:c4:00.0: amdgpu: Faulty UTCL2 client ID: TCP (0x8)
[ +0.000002] amdgpu 0000:c4:00.0: amdgpu: MORE_FAULTS: 0x0
[ +0.000002] amdgpu 0000:c4:00.0: amdgpu: WALKER_ERROR: 0x0
[ +0.000001] amdgpu 0000:c4:00.0: amdgpu: PERMISSION_FAULTS: 0x5
[ +0.000001] amdgpu 0000:c4:00.0: amdgpu: MAPPING_ERROR: 0x0
[ +0.000002] amdgpu 0000:c4:00.0: amdgpu: RW: 0x1
[ +0.024753] amdgpu: Freeing queue vital buffer 0x7ffed9c00000, queue evicted
[ +0.000005] amdgpu: Freeing queue vital buffer 0x7ffedb400000, queue evicted
[ +3.642612] ptrace attach of "/home/master/user/1064/temp/4763/can_spawn_for_attach.x"[6788] was attempted by "/opt/rocm-7.1.0/bin/rocgdb -nw -nx -q -iex set height 0 -iex set width 0 -iex set interactive-mode on"[6793]
[ +5.961307] BUG: kernel NULL pointer dereference, address: 0000000000000018
[ +0.000008] #PF: supervisor read access in kernel mode
[ +0.000002] #PF: error_code(0x0000) - not-present page
[ +0.000001] PGD 1130a6067 P4D 1130a6067 PUD 167bbd067 PMD 0
[ +0.000003] Oops: Oops: 0000 [#1] PREEMPT SMP NOPTI
[ +0.000004] CPU: 1 UID: 1000 PID: 7169 Comm: fork-exec-gpu-t Tainted: G OE 6.14.0-29-generic #29~24.04.1-Ubuntu
```
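The Oops reports a read at address 0x18 rather than 0x0. That usually means a structure member at byte offset 0x18 was read through a NULL base pointer. The excerpt above ends before the call trace, so the faulting function and structure are not identified here. The C sketch below only illustrates that failure pattern and the kind of NULL check the summary refers to. Everything in it is hypothetical: `struct hypothetical_task_info`, `hypothetical_lookup_task()` and `hypothetical_report_fault_pid()` are invented for illustration and are not amdgpu code, and PASID 32774 and PID 6661 are reused from the log purely as sample values.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* HYPOTHETICAL layout, not a real amdgpu structure: three 8-byte members
 * come before `pid`, so `pid` sits at byte offset 0x18. */
struct hypothetical_task_info {
    uint64_t fields_before_pid[3];
    int32_t pid;
};

/* Compile-time check: reading ti->pid with ti == NULL would touch
 * address 0x0 + 0x18, matching the address in the Oops above. */
static_assert(offsetof(struct hypothetical_task_info, pid) == 0x18,
              "pid is expected at offset 0x18");

/* HYPOTHETICAL lookup: returns task info for the one PASID it knows
 * and NULL for any other PASID. */
static struct hypothetical_task_info *hypothetical_lookup_task(uint32_t pasid)
{
    static struct hypothetical_task_info known_task = { { 0, 0, 0 }, 6661 };

    return pasid == 32774u ? &known_task : NULL;
}

/* Returns the PID to report for a VM fault on `pasid`, or -1 when no task
 * info is available. Without the NULL check, a fault on an unknown PASID
 * would dereference NULL + 0x18. */
static int32_t hypothetical_report_fault_pid(uint32_t pasid)
{
    const struct hypothetical_task_info *ti = hypothetical_lookup_task(pasid);

    if (ti == NULL)
        return -1;
    return ti->pid;
}
```

Where the real check belongs depends on the call trace, which this report does not include; the sketch shows only the guard-before-dereference pattern.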
** Affects: linux-oem-6.14 (Ubuntu)
Importance: Undecided
Status: New
** Tags: originate-from-2131135
--
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/2131157
Title:
NULL pointer dereference in VM fault handling