Public bug reported:

[Summary]
A kernel NULL pointer dereference occurs in the AMDGPU driver when running
compute workloads (e.g., a ROCm application). The fault crashes the kernel and
points to a missing NULL check in GPU VM fault handling.

[Reproduce steps]
1. Run ROCm unit tests
2. Observe NULL pointer dereference

```
[ +1.413182] amdgpu 0000:c4:00.0: amdgpu: [gfxhub] page fault (src_id:0 ring:40 vmid:8 pasid:32774)
[ +0.000019] amdgpu 0000:c4:00.0: amdgpu: in process precise-memory- pid 6661 thread precise-memory- pid 6661)
[ +0.000004] amdgpu 0000:c4:00.0: amdgpu: in page starting at address 0x0000000000000000 from client 10
[ +0.000004] amdgpu 0000:c4:00.0: amdgpu: GCVM_L2_PROTECTION_FAULT_STATUS:0x00841050
[ +0.000003] amdgpu 0000:c4:00.0: amdgpu: Faulty UTCL2 client ID: TCP (0x8)
[ +0.000002] amdgpu 0000:c4:00.0: amdgpu: MORE_FAULTS: 0x0
[ +0.000002] amdgpu 0000:c4:00.0: amdgpu: WALKER_ERROR: 0x0
[ +0.000001] amdgpu 0000:c4:00.0: amdgpu: PERMISSION_FAULTS: 0x5
[ +0.000001] amdgpu 0000:c4:00.0: amdgpu: MAPPING_ERROR: 0x0
[ +0.000002] amdgpu 0000:c4:00.0: amdgpu: RW: 0x1
[ +0.024753] amdgpu: Freeing queue vital buffer 0x7ffed9c00000, queue evicted
[ +0.000005] amdgpu: Freeing queue vital buffer 0x7ffedb400000, queue evicted
[ +3.642612] ptrace attach of "/home/master/user/1064/temp/4763/can_spawn_for_attach.x"[6788] was attempted by "/opt/rocm-7.1.0/bin/rocgdb -nw -nx -q -iex set height 0 -iex set width 0 -iex set interactive-mode on"[6793]
[ +5.961307] BUG: kernel NULL pointer dereference, address: 0000000000000018
[ +0.000008] #PF: supervisor read access in kernel mode
[ +0.000002] #PF: error_code(0x0000) - not-present page
[ +0.000001] PGD 1130a6067 P4D 1130a6067 PUD 167bbd067 PMD 0
[ +0.000003] Oops: Oops: 0000 [#1] PREEMPT SMP NOPTI
[ +0.000004] CPU: 1 UID: 1000 PID: 7169 Comm: fork-exec-gpu-t Tainted: G OE 6.14.0-29-generic #29~24.04.1-Ubuntu
```
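
For anyone cross-checking the fault dump, the decoded fields in the log can be reproduced from the raw GCVM_L2_PROTECTION_FAULT_STATUS value. The sketch below is only an illustration: the bit layout in `FIELDS` is an assumption (it is not stated in this report) chosen because it yields the same values the driver printed, and `decode_fault_status` is a hypothetical helper, not a driver function.

```python
# Assumed bit layout of GCVM_L2_PROTECTION_FAULT_STATUS as (shift, width).
# This layout is NOT taken from the report; it is an assumption that happens
# to reproduce the field values printed in the log above.
FIELDS = {
    "MORE_FAULTS": (0, 1),
    "WALKER_ERROR": (1, 3),
    "PERMISSION_FAULTS": (4, 4),
    "MAPPING_ERROR": (8, 1),
    "CID": (9, 9),   # printed in the log as "Faulty UTCL2 client ID"
    "RW": (18, 1),
}


def decode_fault_status(status: int) -> dict:
    """Extract each bitfield from the raw 32-bit status word."""
    return {
        name: (status >> shift) & ((1 << width) - 1)
        for name, (shift, width) in FIELDS.items()
    }


# Raw value copied from the log line in this report.
decoded = decode_fault_status(0x00841050)
for name, value in decoded.items():
    print(f"{name}: {value:#x}")
```

Under that assumed layout, the result matches the log: PERMISSION_FAULTS is 0x5, CID is 0x8 (TCP), RW is 0x1, and the remaining fields are 0x0.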

** Affects: linux-oem-6.14 (Ubuntu)
     Importance: Undecided
         Status: New


** Tags: originate-from-2131135

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/2131157

Title:
  NULL pointer dereference in VM fault handling

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux-oem-6.14/+bug/2131157/+subscriptions

