Mateusz,

Thanks for the verification. Here is the SRU submission for the questing
kernel (6.17):
https://lists.ubuntu.com/archives/kernel-team/2026-April/166872.html

The resolute kernel (7.0) already includes those fixes, so I have set it to
Fix Released.

** Description changed:

+ SRU Justification
+ 
+ [Impact]
+ System freezes during boot on machines with AMD Southern Islands (SI) GPUs
+ using the amdgpu driver.
+ 
+ The amdgpu driver calls flush_gpu_tlb_pasid() in a workqueue, but on SI
+ hardware this function pointer is NULL. The kernel hits a NULL pointer
+ dereference in amdgpu_gmc_flush_gpu_tlb_pasid() and crashes.
+ 
+ Error log:
+ kernel: BUG: kernel NULL pointer dereference, address: 0000000000000000
+ kernel: Workqueue: events amdgpu_tlb_fence_work [amdgpu]
+ kernel: RIP: 0010:0x0
+ kernel: Call Trace:
+ kernel:  amdgpu_gmc_flush_gpu_tlb_pasid+0xfd/0x480 [amdgpu]
+ kernel:  amdgpu_tlb_fence_work+0x77/0x110 [amdgpu]
+ 
+ Hits every boot on affected hardware. Regression from 6.17.0-14 to
+ 6.17.0-19.
+ 
+ [Fix]
+ Two patches fix this together:
+ 1. f4db9913e4d3 ("drm/amdgpu: validate the flush_gpu_tlb_pasid()")
+    Adds a NULL check for flush_gpu_tlb_pasid before calling it.
+    Upstream in v7.0-rc1.
+ 2. e3a6eff92bbd ("drm/amdgpu: Fix validating flush_gpu_tlb_pasid()")
+    Fixes the first patch — the early return skipped the unlock, causing
+    a deadlock. Changes the bare return to a goto that unlocks first.
+    Upstream in v7.0-rc1.
+    Fixes: f4db9913e4d3
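
The combined effect of the two patches can be illustrated with a small
userspace sketch. This is not the driver code: the struct names, the toy lock,
the counting_flush() helper and the return value of 0 on the skip path are all
assumptions made for illustration. It only shows the pattern described above,
a NULL check on the callback followed by a goto that releases the lock rather
than a bare return.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the reset-domain lock; only tracks "held". */
struct toy_lock { int held; };

void toy_lock_acquire(struct toy_lock *l) { l->held = 1; }
void toy_lock_release(struct toy_lock *l) { l->held = 0; }

/* Hypothetical stand-in for the per-ASIC callback table. */
struct gmc_funcs {
    /* NULL on hardware that lacks this callback (SI, per the report). */
    int (*flush_gpu_tlb_pasid)(unsigned int pasid);
};

struct toy_device {
    const struct gmc_funcs *funcs;
    struct toy_lock reset_lock;
};

/* Illustrative callback for hardware that does implement the flush. */
int flush_calls;
int counting_flush(unsigned int pasid)
{
    (void)pasid;
    flush_calls++;
    return 0;
}

int flush_tlb_pasid(struct toy_device *dev, unsigned int pasid)
{
    int ret = 0; /* assumed: skipping the flush is reported as success */

    assert(dev != NULL && dev->funcs != NULL);
    toy_lock_acquire(&dev->reset_lock);

    /* Patch 1: never call through a NULL function pointer. */
    if (!dev->funcs->flush_gpu_tlb_pasid)
        goto out_unlock; /* Patch 2: a bare return here would keep the lock held. */

    ret = dev->funcs->flush_gpu_tlb_pasid(pasid);

out_unlock:
    toy_lock_release(&dev->reset_lock);
    return ret;
}
```

With a NULL callback, flush_tlb_pasid() returns without calling anything and
leaves the lock released. With a callback present it behaves as before, which
is the property the "Where problems could occur" section is concerned with.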
+ 
+ [Test Plan]
+ On a machine with an AMD SI GPU (Tahiti, Pitcairn, Verde, Oland, Hainan)
+ booted with amdgpu.si_support=1:
+ 
+ $ sudo reboot
+ 
+ Without patches: kernel NULL pointer dereference during boot, system freezes.
+ With patches: system boots normally, no crash or error in dmesg.
+ 
+ Check dmesg after boot:
+ $ dmesg | grep -i "BUG\|NULL pointer\|amdgpu"
+ 
+ Without patches: "BUG: kernel NULL pointer dereference" present.
+ With patches: no BUG or NULL pointer lines.
+ 
+ [Where problems could occur]
+ Could break TLB flushing on amdgpu.
+ 
+ If the NULL check gates too broadly, TLB flushes could be skipped on GPUs
+ that do have flush_gpu_tlb_pasid. This would cause stale TLB entries and
+ GPU page faults or rendering corruption.
+ 
+ The unlock path change in the second patch touches the reset/lock logic in
+ amdgpu_gmc_flush_gpu_tlb_pasid(). A wrong goto target could leave the
+ reset domain lock held, deadlocking the GPU.
+ 
+ [Other Info]
+ Both patches are upstream in v7.0-rc1.
+ 
+ ===========================================================
+ 
  Ubuntu 25.10 with kernel 6.17.0-19-generic doesn't boot on my PC. It
  freezes on the boot screen, and the kernel logs show a bug:
  
  kernel: Linux version 6.17.0-19-generic (buildd@lcy02-amd64-084) 
(x86_64-linux-gnu-gcc (Ubuntu 15.2.0-4ubuntu4) 15.2.0, GNU ld (GNU Binutils for 
Ubuntu) 2.45) #19-Ubuntu SMP PREEMPT_DYNAMIC Fri Mar  6 14:02:58 UTC 2026 
(Ubuntu 6.17.0-19.19-generic 6.17.13)
  kernel: Command line: BOOT_IMAGE=/boot/vmlinuz-6.17.0-19-generic 
root=UUID=354e3c09-bfde-4e47-850f-fe872a882ae5 ro quiet splash 
radeon.si_support=0 amdgpu.si_support=1 
crashkernel=2G-4G:320M,4G-32G:512M,32G-64G:1024M,64G-128G:2048M,128G-:4096M 
vt.handoff=7
  # ...
  kernel: [drm] Initialized amdgpu 3.64.0 for 0000:01:00.0 on minor 1
  kernel: BUG: kernel NULL pointer dereference, address: 0000000000000000
  kernel: #PF: supervisor instruction fetch in kernel mode
  kernel: #PF: error_code(0x0010) - not-present page
- kernel: PGD 0 P4D 0 
+ kernel: PGD 0 P4D 0
  kernel: Oops: Oops: 0010 [#1] SMP PTI
- kernel: CPU: 3 UID: 0 PID: 109 Comm: kworker/3:1 Not tainted 
6.17.0-19-generic #19-Ubuntu PREEMPT(voluntary) 
+ kernel: CPU: 3 UID: 0 PID: 109 Comm: kworker/3:1 Not tainted 
6.17.0-19-generic #19-Ubuntu PREEMPT(voluntary)
  kernel: Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./Z77 
Pro3, BIOS P1.10 04/10/2012
  kernel: Workqueue: events amdgpu_tlb_fence_work [amdgpu]
  kernel: RIP: 0010:0x0
  kernel: Code: Unable to access opcode bytes at 0xffffffffffffffd6.
  kernel: RSP: 0018:ffffce560061fdb0 EFLAGS: 00010246
  kernel: RAX: 0000000000000000 RBX: 0000000000008000 RCX: 0000000000000001
  kernel: RDX: 0000000000000002 RSI: 0000000000008000 RDI: ffff8a4a6d180000
  kernel: RBP: ffffce560061fe08 R08: 0000000000000000 R09: 0000000000000000
  kernel: R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000001
  kernel: R13: 0000000000000000 R14: ffff8a4a6d180000 R15: 0000000000000000
  kernel: FS:  0000000000000000(0000) GS:ffff8a4da87ff000(0000) 
knlGS:0000000000000000
  kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  kernel: CR2: ffffffffffffffd6 CR3: 00000003de040002 CR4: 00000000001726f0
  kernel: Call Trace:
  kernel:  <TASK>
  kernel:  amdgpu_gmc_flush_gpu_tlb_pasid+0xfd/0x480 [amdgpu]
  kernel:  amdgpu_tlb_fence_work+0x77/0x110 [amdgpu]
  kernel:  process_one_work+0x18e/0x370
  kernel:  worker_thread+0x317/0x450
  kernel:  ? _raw_spin_lock_irqsave+0xe/0x20
  kernel:  ? __pfx_worker_thread+0x10/0x10
  kernel:  kthread+0x10b/0x220
  kernel:  ? __pfx_kthread+0x10/0x10
  kernel:  ret_from_fork+0x134/0x150
  kernel:  ? __pfx_kthread+0x10/0x10
  kernel:  ret_from_fork_asm+0x1a/0x30
  kernel:  </TASK>
  kernel: Modules linked in: rfcomm cmac algif_hash algif_skcipher af_alg bnep 
ip6t_REJECT nf_reject_ipv6 xt_hl ip6t_rt ipt_REJECT nf_reject_ipv4 xt_LOG 
nf_log_syslog nft_limit xt_limit xt_addrtype xt_mac xt_tcpudp xt_conntrack 
nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nft_compat binfmt_misc nf_tables 
amdgpu(+) usblp intel_rapl_msr intel_rapl_common x86_pkg_temp_thermal 
intel_powerclamp coretemp kvm_intel amdxcp at24 mei_hdcp mei_pxp kvm 
snd_hda_codec_atihdmi drm_panel_backlight_quirks gpu_sched irqbypass 
snd_hda_codec_hdmi drm_buddy snd_hda_codec_alc662 rapl btusb 
snd_hda_codec_realtek_lib intel_cstate snd_hda_codec_generic radeon btrtl 
snd_hda_intel btintel i2c_i801 btbcm snd_hda_codec btmtk i2c_smbus 
drm_ttm_helper i2c_mux ttm bluetooth snd_seq_midi snd_hda_core 
snd_seq_midi_event drm_exec snd_intel_dspcfg snd_rawmidi drm_suballoc_helper 
snd_intel_sdw_acpi drm_display_helper lpc_ich snd_hwdep snd_seq snd_pcm 
snd_seq_device cec snd_timer rc_core snd i2c_algo_bit soundcore mei_me mei 
intel_smartconnect joydev
  kernel:  input_leds mac_hid sch_fq_codel msr parport_pc ppdev lp parport 
efi_pstore nfnetlink dmi_sysfs ip_tables x_tables autofs4 dm_crypt wacom uas 
usb_storage hid_generic usbhid hid r8169 polyval_clmulni ghash_clmulni_intel 
psmouse ahci realtek serio_raw libahci video wmi aesni_intel
  kernel: CR2: 0000000000000000
  kernel: ---[ end trace 0000000000000000 ]---
  kernel: RIP: 0010:0x0
  kernel: Code: Unable to access opcode bytes at 0xffffffffffffffd6.
  kernel: RSP: 0018:ffffce560061fdb0 EFLAGS: 00010246
  kernel: RAX: 0000000000000000 RBX: 0000000000008000 RCX: 0000000000000001
  kernel: RDX: 0000000000000002 RSI: 0000000000008000 RDI: ffff8a4a6d180000
  kernel: RBP: ffffce560061fe08 R08: 0000000000000000 R09: 0000000000000000
  kernel: R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000001
  kernel: R13: 0000000000000000 R14: ffff8a4a6d180000 R15: 0000000000000000
  kernel: FS:  0000000000000000(0000) GS:ffff8a4da87ff000(0000) 
knlGS:0000000000000000
  kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  kernel: CR2: ffffffffffffffd6 CR3: 00000003de040002 CR4: 00000000001726f0
  kernel: note: kworker/3:1[109] exited with irqs disabled
  kernel: loop50: detected capacity change from 0 to 8
  kernel: fbcon: amdgpudrmfb (fb0) is primary device
  kernel: fbcon: Deferring console take-over
  kernel: amdgpu 0000:01:00.0: [drm] fb0: amdgpudrmfb frame buffer device
  kernel: NET: Registered PF_QIPCRTR protocol family
  kernel:  sdb: sdb1 sdb2 sdb3 sdb4 < sdb5 sdb6 sdb7 >
  kernel: BUG: kernel NULL pointer dereference, address: 0000000000000000
  kernel: #PF: supervisor instruction fetch in kernel mode
  kernel: #PF: error_code(0x0010) - not-present page
- kernel: PGD 0 P4D 0 
+ kernel: PGD 0 P4D 0
  kernel: Oops: Oops: 0010 [#2] SMP PTI
- kernel: CPU: 1 UID: 0 PID: 91 Comm: kworker/1:1 Tainted: G      D             
6.17.0-19-generic #19-Ubuntu PREEMPT(voluntary) 
+ kernel: CPU: 1 UID: 0 PID: 91 Comm: kworker/1:1 Tainted: G      D             
6.17.0-19-generic #19-Ubuntu PREEMPT(voluntary)
  kernel: Tainted: [D]=DIE
  kernel: Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./Z77 
Pro3, BIOS P1.10 04/10/2012
  kernel: Workqueue: events amdgpu_tlb_fence_work [amdgpu]
  kernel: RIP: 0010:0x0
  kernel: Code: Unable to access opcode bytes at 0xffffffffffffffd6.
  kernel: RSP: 0000:ffffce5600477db0 EFLAGS: 00010246
  kernel: RAX: 0000000000000000 RBX: 0000000000008001 RCX: 0000000000000001
  kernel: RDX: 0000000000000002 RSI: 0000000000008001 RDI: ffff8a4a6d180000
  kernel: RBP: ffffce5600477e08 R08: 0000000000000000 R09: 0000000000000000
  kernel: R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000001
  kernel: R13: 0000000000000000 R14: ffff8a4a6d180000 R15: 0000000000000000
  kernel: FS:  0000000000000000(0000) GS:ffff8a4da86ff000(0000) 
knlGS:0000000000000000
  kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  kernel: CR2: ffffffffffffffd6 CR3: 0000000101242006 CR4: 00000000001726f0
  kernel: Call Trace:
  kernel:  <TASK>
  kernel:  amdgpu_gmc_flush_gpu_tlb_pasid+0xfd/0x480 [amdgpu]
  kernel:  amdgpu_tlb_fence_work+0x77/0x110 [amdgpu]
  kernel:  process_one_work+0x18e/0x370
  kernel:  worker_thread+0x317/0x450
  kernel:  ? _raw_spin_lock_irqsave+0xe/0x20
  kernel:  ? __pfx_worker_thread+0x10/0x10
  kernel:  kthread+0x10b/0x220
  kernel:  ? __pfx_kthread+0x10/0x10
  kernel:  ret_from_fork+0x134/0x150
  kernel:  ? __pfx_kthread+0x10/0x10
  kernel:  ret_from_fork_asm+0x1a/0x30
  kernel:  </TASK>
  kernel: Modules linked in: qrtr rfcomm cmac algif_hash algif_skcipher af_alg 
bnep ip6t_REJECT nf_reject_ipv6 xt_hl ip6t_rt ipt_REJECT nf_reject_ipv4 xt_LOG 
nf_log_syslog nft_limit xt_limit xt_addrtype xt_mac xt_tcpudp xt_conntrack 
nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nft_compat binfmt_misc nf_tables 
amdgpu usblp intel_rapl_msr intel_rapl_common x86_pkg_temp_thermal 
intel_powerclamp coretemp kvm_intel amdxcp at24 mei_hdcp mei_pxp kvm 
snd_hda_codec_atihdmi drm_panel_backlight_quirks gpu_sched irqbypass 
snd_hda_codec_hdmi drm_buddy snd_hda_codec_alc662 rapl btusb 
snd_hda_codec_realtek_lib intel_cstate snd_hda_codec_generic radeon btrtl 
snd_hda_intel btintel i2c_i801 btbcm snd_hda_codec btmtk i2c_smbus 
drm_ttm_helper i2c_mux ttm bluetooth snd_seq_midi snd_hda_core 
snd_seq_midi_event drm_exec snd_intel_dspcfg snd_rawmidi drm_suballoc_helper 
snd_intel_sdw_acpi drm_display_helper lpc_ich snd_hwdep snd_seq snd_pcm 
snd_seq_device cec snd_timer rc_core snd i2c_algo_bit soundcore mei_me mei 
intel_smartconnect joydev
  kernel:  input_leds mac_hid sch_fq_codel msr parport_pc ppdev lp parport 
efi_pstore nfnetlink dmi_sysfs ip_tables x_tables autofs4 dm_crypt wacom uas 
usb_storage hid_generic usbhid hid r8169 polyval_clmulni ghash_clmulni_intel 
psmouse ahci realtek serio_raw libahci video wmi aesni_intel
  kernel: CR2: 0000000000000000
  kernel: ---[ end trace 0000000000000000 ]---
  kernel: RIP: 0010:0x0
  kernel: Code: Unable to access opcode bytes at 0xffffffffffffffd6.
  kernel: RSP: 0018:ffffce560061fdb0 EFLAGS: 00010246
  kernel: RAX: 0000000000000000 RBX: 0000000000008000 RCX: 0000000000000001
  kernel: RDX: 0000000000000002 RSI: 0000000000008000 RDI: ffff8a4a6d180000
  kernel: RBP: ffffce560061fe08 R08: 0000000000000000 R09: 0000000000000000
  kernel: R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000001
  kernel: R13: 0000000000000000 R14: ffff8a4a6d180000 R15: 0000000000000000
  kernel: FS:  0000000000000000(0000) GS:ffff8a4da86ff000(0000) 
knlGS:0000000000000000
  kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  kernel: CR2: ffffffffffffffd6 CR3: 0000000101242006 CR4: 00000000001726f0
  kernel: note: kworker/1:1[91] exited with irqs disabled
  
  The previous kernel 6.17.0-14-generic boots without any issues.
  
  I'll try to attach the required information using `apport-collect -p linux 
BUG#`, but it'll be collected after successfully booting with 6.17.0-14, 
whereas the bug occurs with 6.17.0-19.
- --- 
+ ---
  ProblemType: Bug
  ApportVersion: 2.33.1-0ubuntu3
  Architecture: amd64
  AudioDevicesInUse:
-  USER        PID ACCESS COMMAND
-  /dev/snd/controlC0:  mateusz    3017 F.... wireplumber
-  /dev/snd/controlC1:  mateusz    3017 F.... wireplumber
-  /dev/snd/seq:        mateusz    2999 F.... pipewire
+  USER        PID ACCESS COMMAND
+  /dev/snd/controlC0:  mateusz    3017 F.... wireplumber
+  /dev/snd/controlC1:  mateusz    3017 F.... wireplumber
+  /dev/snd/seq:        mateusz    2999 F.... pipewire
  CasperMD5CheckResult: unknown
  CurrentDesktop: ubuntu:GNOME
  DistroRelease: Ubuntu 25.10
  InstallationDate: Installed on 2020-10-14 (1979 days ago)
  InstallationMedia: Ubuntu 20.04.1 LTS "Focal Fossa" - Release amd64 (20200731)
  MachineType: To Be Filled By O.E.M. To Be Filled By O.E.M.
  Package: linux (not installed)
  ProcEnviron:
-  LANG=pl_PL.UTF-8
-  PATH=(custom, no user)
-  SHELL=/bin/bash
-  TERM=xterm-256color
+  LANG=pl_PL.UTF-8
+  PATH=(custom, no user)
+  SHELL=/bin/bash
+  TERM=xterm-256color
  ProcFB: 0 amdgpudrmfb
  ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-6.17.0-14-generic 
root=UUID=354e3c09-bfde-4e47-850f-fe872a882ae5 ro quiet splash 
radeon.si_support=0 amdgpu.si_support=1 
crashkernel=2G-4G:320M,4G-32G:512M,32G-64G:1024M,64G-128G:2048M,128G-:4096M 
vt.handoff=7
  ProcVersionSignature: Ubuntu 6.17.0-14.14-generic 6.17.9
  RelatedPackageVersions:
-  firmware-sof   N/A
-  linux-firmware 20250901.git993ff19b-0ubuntu1.9
+  firmware-sof   N/A
+  linux-firmware 20250901.git993ff19b-0ubuntu1.9
  RfKill:
-  0: hci0: Bluetooth
-       Soft blocked: yes
-       Hard blocked: no
+  0: hci0: Bluetooth
+   Soft blocked: yes
+   Hard blocked: no
  Tags: questing
  Uname: Linux 6.17.0-14-generic x86_64
  UpgradeStatus: Upgraded to questing on 2026-01-10 (65 days ago)
  UserGroups: N/A
  _MarkForUpload: True
  dmi.bios.date: 04/10/2012
  dmi.bios.release: 4.6
  dmi.bios.vendor: American Megatrends Inc.
  dmi.bios.version: P1.10
  dmi.board.name: Z77 Pro3
  dmi.board.vendor: ASRock
  dmi.chassis.asset.tag: To Be Filled By O.E.M.
  dmi.chassis.type: 3
  dmi.chassis.vendor: To Be Filled By O.E.M.
  dmi.chassis.version: To Be Filled By O.E.M.
  dmi.modalias: 
dmi:bvnAmericanMegatrendsInc.:bvrP1.10:bd04/10/2012:br4.6:svnToBeFilledByO.E.M.:pnToBeFilledByO.E.M.:pvrToBeFilledByO.E.M.:rvnASRock:rnZ77Pro3:rvr:cvnToBeFilledByO.E.M.:ct3:cvrToBeFilledByO.E.M.:skuToBeFilledByO.E.M.:
  dmi.product.family: To Be Filled By O.E.M.
  dmi.product.name: To Be Filled By O.E.M.
  dmi.product.sku: To Be Filled By O.E.M.
  dmi.product.version: To Be Filled By O.E.M.
  dmi.sys.vendor: To Be Filled By O.E.M.

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/2144577

Title:
  BUG: kernel NULL pointer dereference in amdgpu

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2144577/+subscriptions


-- 
ubuntu-bugs mailing list
[email protected]
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs
