Hello Salvatore,
Thanks for the guidance. I have bisected the regression and arrived
at the first commit where it is seen; please see below.
I should note that the kernel oops is not fatal right away, i.e. while
it appears to crash a kworker (repeatedly), it doesn't immediately make
the system unresponsive. That does seem to occur eventually, however.
Amit Gurdasani
git bisect output:
# amitg @ athena in …/linux-stable (23316ed) (BISECTING) [new;] [01:00:00]
$ git bisect bad
23316ed02c228b52f871050f98a155f3d566c450 is the first bad commit
commit 23316ed02c228b52f871050f98a155f3d566c450 (HEAD)
Author: Prike Liang <[email protected]>
Date: Fri Oct 31 17:02:51 2025 +0800
drm/amdgpu: attach tlb fence to the PTs update
commit b4a7f4e7ad2b120a94f3111f92a11520052c762d upstream.
Ensure the userq TLB flush is emitted only after
the VM update finishes and the PT BOs have been
annotated with bookkeeping fences.
Suggested-by: Christian König <[email protected]>
Signed-off-by: Prike Liang <[email protected]>
Reviewed-by: Christian König <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
(cherry picked from commit f3854e04b708d73276c4488231a8bd66d30b4671)
Cc: [email protected]
Signed-off-by: Greg Kroah-Hartman <[email protected]>
drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
# amitg @ athena in …/linux-stable (23316ed) (BISECTING) [new;] [01:00:04]
$ git bisect log
git bisect start
# status: waiting for both good and bad commits
# good: [1bfd0faa78d09eb41b81b002e0292db0f3e75de0] Linux 6.17.9
git bisect good 1bfd0faa78d09eb41b81b002e0292db0f3e75de0
# status: waiting for bad commit, 1 good commit known
# bad: [5439375ca6987ed27eba246a3b9e036357fd6ba2] Linux 6.17.11
git bisect bad 5439375ca6987ed27eba246a3b9e036357fd6ba2
# good: [cd1aa3e40297c783f69c6e44f0d7aafa10fb131f] drm/i915/psr: Check
drm_dp_dpcd_read return value on PSR dpcd init
git bisect good cd1aa3e40297c783f69c6e44f0d7aafa10fb131f
# good: [9d0adde1319591a70cf827ce6f2cc18541b75ada] iio: adc: ad7124: fix
temperature channel
git bisect good 9d0adde1319591a70cf827ce6f2cc18541b75ada
# good: [88163f85d59b4164884df900ee171720fd26686b] mptcp: Initialise
rcv_mss before calling tcp_send_active_reset() in mptcp_do_fastclose().
git bisect good 88163f85d59b4164884df900ee171720fd26686b
# good: [b4f97ed17917c50e5232083c0dd60f655e12a341] drm: sti: fix device
leaks at component probe
git bisect good b4f97ed17917c50e5232083c0dd60f655e12a341
# bad: [32abbcf4379a0f851d7eb9d4389e7bf5c64bf6c0] net: dsa: microchip:
Don't free uninitialized ksz_irq
git bisect bad 32abbcf4379a0f851d7eb9d4389e7bf5c64bf6c0
# bad: [62150f1e7ec707da76ff353fb7db51fef9cd6557] drm/amd/display: Check
NULL before accessing
git bisect bad 62150f1e7ec707da76ff353fb7db51fef9cd6557
# good: [1966838d1c82149cbf4a652322d26a6e5aae9c4e] drm/xe/guc: Fix
stack_depot usage
git bisect good 1966838d1c82149cbf4a652322d26a6e5aae9c4e
# bad: [418ec6670bc2e44c100ae9709e4ea261de5de198] drm/amd/amdgpu:
reserve vm invalidation engine for uni_mes
git bisect bad 418ec6670bc2e44c100ae9709e4ea261de5de198
# bad: [23316ed02c228b52f871050f98a155f3d566c450] drm/amdgpu: attach tlb
fence to the PTs update
git bisect bad 23316ed02c228b52f871050f98a155f3d566c450
# first bad commit: [23316ed02c228b52f871050f98a155f3d566c450]
drm/amdgpu: attach tlb fence to the PTs update
The first kernel oops seen with the bisected kernel was:
2025-12-17T00:57:38.831219+00:00 athena kernel: BUG: kernel NULL pointer
dereference, address: 0000000000000000
2025-12-17T00:57:38.831224+00:00 athena kernel: #PF: supervisor
instruction fetch in kernel mode
2025-12-17T00:57:38.831225+00:00 athena kernel: #PF: error_code(0x0010)
- not-present page
2025-12-17T00:57:38.831225+00:00 athena kernel: PGD 0 P4D 0
2025-12-17T00:57:38.831226+00:00 athena kernel: Oops: Oops: 0010 [#4]
SMP NOPTI
2025-12-17T00:57:38.831226+00:00 athena kernel: CPU: 12 UID: 0 PID: 212
Comm: kworker/12:1 Tainted: G S UD OE 6.17.10+ #12 PREEMPT(lazy)
2025-12-17T00:57:38.831227+00:00 athena kernel: Tainted:
[S]=CPU_OUT_OF_SPEC, [U]=USER, [D]=DIE, [O]=OOT_MODULE, [E]=UNSIGNED_MODULE
2025-12-17T00:57:38.831228+00:00 athena kernel: Hardware name: Gigabyte
Technology Co., Ltd. Z690 UD DDR4/Z690 UD DDR4, BIOS F29 09/27/2024
2025-12-17T00:57:38.831228+00:00 athena kernel: Workqueue: events
amdgpu_tlb_fence_work [amdgpu]
2025-12-17T00:57:38.831230+00:00 athena kernel: RIP: 0010:0x0
2025-12-17T00:57:38.831230+00:00 athena kernel: Code: Unable to access
opcode bytes at 0xffffffffffffffd6.
2025-12-17T00:57:38.831231+00:00 athena kernel: RSP:
0018:ffffd0218092fde0 EFLAGS: 00010246
2025-12-17T00:57:38.831231+00:00 athena kernel: RAX: 0000000000000000
RBX: 0000000000008003 RCX: 0000000000000001
2025-12-17T00:57:38.831231+00:00 athena kernel: RDX: 0000000000000002
RSI: 0000000000008003 RDI: ffff8b11cde00000
2025-12-17T00:57:38.831231+00:00 athena kernel: RBP: 0000000000000001
R08: 0000000000000000 R09: 0000000000000001
2025-12-17T00:57:38.831232+00:00 athena kernel: R10: 000000000000000c
R11: 0000000000000000 R12: 0000000000000000
2025-12-17T00:57:38.831232+00:00 athena kernel: R13: 0000000000000002
R14: 0000000000000000 R15: ffff8b11cde00000
2025-12-17T00:57:38.831232+00:00 athena kernel: FS:
0000000000000000(0000) GS:ffff8b19b4f39000(0000) knlGS:0000000000000000
2025-12-17T00:57:38.831232+00:00 athena kernel: CS: 0010 DS: 0000 ES:
0000 CR0: 0000000080050033
2025-12-17T00:57:38.831232+00:00 athena kernel: CR2: ffffffffffffffd6
CR3: 000000016c280000 CR4: 0000000000f52ef0
2025-12-17T00:57:38.831233+00:00 athena kernel: PKRU: 55555554
2025-12-17T00:57:38.831233+00:00 athena kernel: Call Trace:
2025-12-17T00:57:38.831233+00:00 athena kernel: <TASK>
2025-12-17T00:57:38.831233+00:00 athena kernel:
amdgpu_gmc_flush_gpu_tlb_pasid+0xd6/0x400 [amdgpu]
2025-12-17T00:57:38.831234+00:00 athena kernel:
amdgpu_tlb_fence_work+0x6e/0xe0 [amdgpu]
2025-12-17T00:57:38.831234+00:00 athena kernel: process_one_work+0x18f/0x350
2025-12-17T00:57:38.831234+00:00 athena kernel: worker_thread+0x25a/0x3a0
2025-12-17T00:57:38.831234+00:00 athena kernel: ?
__pfx_worker_thread+0x10/0x10
2025-12-17T00:57:38.831234+00:00 athena kernel: kthread+0xf9/0x240
2025-12-17T00:57:38.831235+00:00 athena kernel: ? __pfx_kthread+0x10/0x10
2025-12-17T00:57:38.831235+00:00 athena kernel: ? __pfx_kthread+0x10/0x10
2025-12-17T00:57:38.831235+00:00 athena kernel: ret_from_fork+0x194/0x1c0
2025-12-17T00:57:38.831235+00:00 athena kernel: ? __pfx_kthread+0x10/0x10
2025-12-17T00:57:38.831236+00:00 athena kernel: ret_from_fork_asm+0x1a/0x30
2025-12-17T00:57:38.831236+00:00 athena kernel: </TASK>
2025-12-17T00:57:38.831236+00:00 athena kernel: Modules linked in:
snd_seq_dummy snd_hrtimer snd_seq_midi snd_seq_midi_event snd_seq
xt_conntrack xt_MASQUERADE xt_set ip_set xt_CHECKSUM xt_addrtype
ipt_REJECT nf_reject_ipv4 xt_tcpudp xfrm_user nft_compat x_tables
xfrm_algo nft_chain_nat nf_nat nf_conntrack tls nf_defrag_ipv6
nf_defrag_ipv4 nf_tables bridge stp appletalk psnap llc overlay cfg80211
qrtr rfkill uinput sunrpc binfmt_misc nls_ascii nls_cp437 vfat fat
crc32c_cryptoapi dm_integrity snd_sof_pci_intel_tgl
snd_sof_pci_intel_cnl snd_sof_intel_hda_generic mei_hdcp snd_sof_pci
mei_pxp intel_rapl_msr snd_sof_xtensa_dsp snd_sof_intel_hda_common
intel_uncore_frequency intel_uncore_frequency_common snd_soc_hdac_hda
x86_pkg_temp_thermal snd_sof_intel_hda intel_powerclamp
snd_hda_codec_intelhdmi snd_sof snd_sof_utils snd_soc_acpi_intel_match
kvm_intel snd_soc_acpi snd_soc_acpi_intel_sdca_quirks
snd_sof_intel_hda_mlink snd_hda_codec_alc662 snd_soc_sdca
snd_hda_codec_realtek_lib snd_hda_codec_atihdmi snd_hda_codec_generic
snd_hda_codec_hdmi snd_soc_avs kvm
2025-12-17T00:57:38.831237+00:00 athena kernel: snd_hda_intel
snd_soc_hda_codec snd_usb_audio snd_hda_ext_core snd_hda_codec
snd_soc_core snd_intel_dspcfg uvcvideo snd_hwdep snd_usbmidi_lib
snd_hda_core processor_thermal_device_pci snd_rawmidi irqbypass
videobuf2_vmalloc processor_thermal_device snd_compress
ghash_clmulni_intel snd_seq_device videobuf2_memops
processor_thermal_wt_hint snd_pcm_oss aesni_intel uvc
platform_temperature_control rapl processor_thermal_rfim videobuf2_v4l2
snd_mixer_oss intel_cstate processor_thermal_rapl snd_pcm videodev
intel_rapl_common mei_me snd_timer intel_uncore mxm_wmi ee1004 wmi_bmof
pcspkr gigabyte_wmi processor_thermal_wt_req mei videobuf2_common snd
processor_thermal_power_floor mc processor_thermal_mbox joydev soundcore
int340x_thermal_zone intel_pmc_core pmt_telemetry pmt_discovery
pmt_class int3400_thermal intel_pmc_ssram_telemetry acpi_pad acpi_tad
acpi_thermal_rel evdev button sg ppdev lp parport_pc it87(OE) bfq
parport kyber_iosched coretemp drivetemp netconsole msr i2c_dev
hwmon_vid efi_pstore nfnetlink
2025-12-17T00:57:38.831237+00:00 athena kernel: autofs4 xfs raid10
raid0 dm_mirror dm_region_hash dm_log wacom dm_thin_pool
dm_persistent_data dm_bufio dm_bio_prison raid1 dm_raid raid456
async_raid6_recov async_memcpy async_pq async_xor xor async_tx
hid_generic usbhid hid raid6_pq md_mod dm_mod xe uas intel_vsec
usb_storage amdgpu sr_mod drm_gpuvm configfs sd_mod cdrom
drm_gpusvm_helper amdxcp gpu_sched drm_panel_backlight_quirks crc16 i915
radeon drm_ttm_helper drm_buddy ttm drm_client_lib drm_exec i2c_algo_bit
drm_suballoc_helper drm_display_helper xhci_pci_renesas drm_kms_helper
ahci r8169 iTCO_wdt libahci intel_pmc_bxt xhci_pci iTCO_vendor_support
watchdog xhci_hcd drm libata nvme nvme_core realtek usbcore mdio_devres
scsi_mod nvme_keyring libphy cec nvme_auth video mdio_bus intel_lpss_pci
rc_core i2c_i801 pinctrl_alderlake intel_lpss wmi scsi_common fan
i2c_smbus usb_common idma64 efivarfs
2025-12-17T00:57:38.831237+00:00 athena kernel: CR2: 0000000000000000
2025-12-17T00:57:38.831238+00:00 athena kernel: ---[ end trace
0000000000000000 ]---
On 16/12/2025 12:22, Salvatore Bonaccorso wrote:
Control: tags -1 + upstream moreinfo
Hi Amit,
On Tue, Dec 16, 2025 at 11:26:26AM +0000, Amit Gurdasani wrote:
Package: linux-image-6.17.11+deb14-amd64
Version: 6.17.11-1
Severity: important
X-Debbugs-Cc: [email protected]
User: [email protected]
Usertags: amd64
Dear Maintainer,
I had my Debian testing desktop reboot after an unattended upgrade where the
kernel was updated from linux-image-6.17.9+deb14-amd64 to
linux-image-6.17.11+deb14-amd64. After reboot overnight, I found the machine
unresponsive in the morning. Rebooting revealed many successive kernel oops
in amdgpu. Oops text below.
The GPU in use is an AMD Radeon R9 270X ("Curacao XT"), from the GCN 1.0
generation. I _am_ using the following argument on the kernel command line
to gain some performance:
amdgpu.ppfeaturemask=0xffffffff
This kernel oops was not occurring in kernels up to and including 6.17.9
(Debian-packaged).
I have not tried to boot 6.17.11 without the amdgpu.ppfeaturemask=0xffffffff
kernel command-line argument to see if the oops still occurs.
I did find that there was some work done on amdgpu in November:
https://lists.freedesktop.org/archives/amd-gfx/2025-November/133356.html
I don't know enough to know whether that work could cause this kernel oops.
Downgrading back to 6.17.9 has eliminated the kernel oops.
Thanks,
Amit Gurdasani
Oops text:
2025-12-16T10:12:43.249625+00:00 athena kernel: amdgpu 0000:01:00.0:
[drm:amdgpu_ib_ring_tests [amdgpu]] *ERROR* IB test failed on uvd (-110).
2025-12-16T10:12:43.249625+00:00 athena kernel: amdgpu 0000:01:00.0: amdgpu: ib
ring test failed (-110).
2025-12-16T10:12:43.249627+00:00 athena kernel: BUG: kernel NULL pointer
dereference, address: 0000000000000000
2025-12-16T10:12:43.249627+00:00 athena kernel: #PF: supervisor instruction
fetch in kernel mode
2025-12-16T10:12:43.249627+00:00 athena kernel: #PF: error_code(0x0010) -
not-present page
2025-12-16T10:12:43.249627+00:00 athena kernel: PGD 0 P4D 0
2025-12-16T10:12:43.249627+00:00 athena kernel: Oops: Oops: 0010 [#1] SMP NOPTI
2025-12-16T10:12:43.249627+00:00 athena kernel: CPU: 2 UID: 0 PID: 564 Comm:
kworker/2:2 Tainted: G S U 6.17.11+deb14-amd64 #1 PREEMPT(lazy)
Debian 6.17.11-1
2025-12-16T10:12:43.249628+00:00 athena kernel: Tainted: [S]=CPU_OUT_OF_SPEC,
[U]=USER
2025-12-16T10:12:43.249629+00:00 athena kernel: Hardware name: Gigabyte
Technology Co., Ltd. Z690 UD DDR4/Z690 UD DDR4, BIOS F29 09/27/2024
2025-12-16T10:12:43.249629+00:00 athena kernel: Workqueue: events
amdgpu_tlb_fence_work [amdgpu]
2025-12-16T10:12:43.249629+00:00 athena kernel: RIP: 0010:0x0
2025-12-16T10:12:43.249630+00:00 athena kernel: Code: Unable to access opcode
bytes at 0xffffffffffffffd6.
2025-12-16T10:12:43.249630+00:00 athena kernel: RSP: 0018:ffffca9902dbfde0
EFLAGS: 00010246
2025-12-16T10:12:43.249630+00:00 athena kernel: RAX: 0000000000000000 RBX:
0000000000008000 RCX: 0000000000000001
2025-12-16T10:12:43.249632+00:00 athena kernel: RDX: 0000000000000002 RSI:
0000000000008000 RDI: ffff8a34a8a00000
2025-12-16T10:12:43.249633+00:00 athena kernel: RBP: 0000000000000001 R08:
0000000000000000 R09: 0000000000000001
2025-12-16T10:12:43.249633+00:00 athena kernel: R10: 0000000000000002 R11:
0000000000000000 R12: 0000000000000000
2025-12-16T10:12:43.249633+00:00 athena kernel: R13: 0000000000000002 R14:
0000000000000000 R15: ffff8a34a8a00000
2025-12-16T10:12:43.249633+00:00 athena kernel: FS: 0000000000000000(0000)
GS:ffff8a3c8a288000(0000) knlGS:0000000000000000
2025-12-16T10:12:43.249633+00:00 athena kernel: CS: 0010 DS: 0000 ES: 0000
CR0: 0000000080050033
2025-12-16T10:12:43.249635+00:00 athena kernel: CR2: ffffffffffffffd6 CR3:
000000018082c000 CR4: 0000000000f50ef0
2025-12-16T10:12:43.249635+00:00 athena kernel: PKRU: 55555554
2025-12-16T10:12:43.249635+00:00 athena kernel: Call Trace:
2025-12-16T10:12:43.249635+00:00 athena kernel: <TASK>
2025-12-16T10:12:43.249635+00:00 athena kernel:
amdgpu_gmc_flush_gpu_tlb_pasid+0xd6/0x400 [amdgpu]
2025-12-16T10:12:43.249635+00:00 athena kernel:
amdgpu_tlb_fence_work+0x6e/0xe0 [amdgpu]
2025-12-16T10:12:43.249636+00:00 athena kernel: process_one_work+0x18f/0x350
2025-12-16T10:12:43.249638+00:00 athena kernel: worker_thread+0x25a/0x3a0
2025-12-16T10:12:43.249638+00:00 athena kernel: ? __pfx_worker_thread+0x10/0x10
2025-12-16T10:12:43.249639+00:00 athena kernel: kthread+0xf9/0x240
2025-12-16T10:12:43.249639+00:00 athena kernel: ? __pfx_kthread+0x10/0x10
2025-12-16T10:12:43.249639+00:00 athena kernel: ? __pfx_kthread+0x10/0x10
2025-12-16T10:12:43.249639+00:00 athena kernel: ret_from_fork+0x194/0x1c0
2025-12-16T10:12:43.249640+00:00 athena kernel: ? __pfx_kthread+0x10/0x10
2025-12-16T10:12:43.249641+00:00 athena kernel: ret_from_fork_asm+0x1a/0x30
2025-12-16T10:12:43.249641+00:00 athena kernel: </TASK>
2025-12-16T10:12:43.249641+00:00 athena kernel: Modules linked in: dm_thin_pool
dm_persistent_data dm_bio_prison dm_bufio raid1 dm_raid raid456
async_raid6_recov async_memcpy async_pq async_xor async_tx md_mod xor
hid_generic usbhid hid raid6_pq amdgpu(+) dm_mod xe uas usb_storage intel_vsec
sr_mod drm_gpuvm configfs amdxcp cdrom sd_mod drm_panel_backlight_quirks
drm_gpusvm_helper gpu_sched crc16 i915 radeon drm_ttm_helper drm_buddy ttm
drm_exec drm_suballoc_helper i2c_algo_bit drm_display_helper cec rc_core
drm_client_lib ahci drm_kms_helper iTCO_wdt intel_pmc_bxt libahci
xhci_pci_renesas iTCO_vendor_support drm xhci_pci libata watchdog mxm_wmi nvme
xhci_hcd r8169 nvme_core realtek mdio_devres scsi_mod usbcore libphy video
nvme_keyring intel_lpss_pci i2c_i801 mdio_bus intel_lpss nvme_auth wmi fan
i2c_smbus scsi_common button pinctrl_alderlake usb_common idma64 efivarfs
2025-12-16T10:12:43.249643+00:00 athena kernel: CR2: 0000000000000000
2025-12-16T10:12:43.249646+00:00 athena kernel: ---[ end trace 0000000000000000
]---
2025-12-16T10:12:43.249646+00:00 athena kernel: RIP: 0010:0x0
2025-12-16T10:12:43.249647+00:00 athena kernel: Code: Unable to access opcode
bytes at 0xffffffffffffffd6.
2025-12-16T10:12:43.249647+00:00 athena kernel: RSP: 0018:ffffca9902dbfde0
EFLAGS: 00010246
2025-12-16T10:12:43.249647+00:00 athena kernel: RAX: 0000000000000000 RBX:
0000000000008000 RCX: 0000000000000001
2025-12-16T10:12:43.249647+00:00 athena kernel: RDX: 0000000000000002 RSI:
0000000000008000 RDI: ffff8a34a8a00000
2025-12-16T10:12:43.249647+00:00 athena kernel: RBP: 0000000000000001 R08:
0000000000000000 R09: 0000000000000001
2025-12-16T10:12:43.249649+00:00 athena kernel: R10: 0000000000000002 R11:
0000000000000000 R12: 0000000000000000
2025-12-16T10:12:43.249649+00:00 athena kernel: R13: 0000000000000002 R14:
0000000000000000 R15: ffff8a34a8a00000
2025-12-16T10:12:43.249649+00:00 athena kernel: FS: 0000000000000000(0000)
GS:ffff8a3c8a288000(0000) knlGS:0000000000000000
2025-12-16T10:12:43.249649+00:00 athena kernel: CS: 0010 DS: 0000 ES: 0000
CR0: 0000000080050033
2025-12-16T10:12:43.249649+00:00 athena kernel: CR2: ffffffffffffffd6 CR3:
000000018082c000 CR4: 0000000000f50ef0
2025-12-16T10:12:43.249649+00:00 athena kernel: PKRU: 55555554
2025-12-16T10:12:43.249651+00:00 athena kernel: note: kworker/2:2[564] exited
with irqs disabled
Thanks for the report. As this is a regression between the two
versions, can you please bisect the changes to identify which commit
introduced the breakage? That would involve compiling and testing a few kernels:
git clone --single-branch -b linux-6.17.y \
    https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git
cd linux-stable
git checkout v6.17.9
cp /boot/config-$(uname -r) .config
yes '' | make localmodconfig
make savedefconfig
mv defconfig arch/x86/configs/my_defconfig
# test 6.17.9 to ensure this is "good"
make my_defconfig
make -j $(nproc) bindeb-pkg
... install the resulting .deb package and confirm it successfully boots /
problem does not exist
# test 6.17.11 to ensure this is "bad"
git checkout v6.17.11
make my_defconfig
make -j $(nproc) bindeb-pkg
... install the resulting .deb package and confirm it fails to boot /
problem exists
With that confirmed, the bisection can start:
git bisect start
git bisect good v6.17.9
git bisect bad v6.17.11
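As a rough guide to how many kernels will need building, the number of bisection steps grows with the base-2 logarithm of the number of commits in the range. The sketch below is only an estimate under that assumption; `bisect_steps` is a hypothetical helper, not part of git, and git's own "roughly N steps" figure may differ by one.

```shell
# Sketch: estimate how many bisection steps a range of N commits needs.
# bisect_steps is a hypothetical helper name, not a git command.
bisect_steps() {
    n=$1
    steps=0
    # Each step roughly halves the number of remaining candidate commits.
    while [ "$n" -gt 1 ]; do
        n=$(( (n + 1) / 2 ))
        steps=$(( steps + 1 ))
    done
    echo "$steps"
}

# Usage inside the linux-stable checkout (tags as used in this report):
#   bisect_steps "$(git rev-list --count v6.17.9..v6.17.11)"
```

Each step means one build, install and reboot, so the estimate gives a feel for the time involved before starting.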
In each bisection step git checks out a state between the oldest
known-bad and the newest known-good commit. In each step test using:
make my_defconfig
make -j $(nproc) bindeb-pkg
... install, try to boot / verify if problem exists
and if the problem is hit run:
git bisect bad
and if the problem doesn't trigger run:
git bisect good
Please take care to always select the just-built kernel for
booting; it won't always be the default kernel picked up by grub.
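One possible way to double-check this after each reboot is to compare the running kernel release with the release string of the tree that was just built. This is a minimal sketch: `running_expected_kernel` is a hypothetical helper name, and it assumes the source tree is still configured so that `make -s kernelrelease` can print the release string.

```shell
# Sketch: confirm the booted kernel is the one that was just built.
# running_expected_kernel is a hypothetical helper name.
# $1 = expected release string, $2 = actual release (defaults to `uname -r`).
running_expected_kernel() {
    expected=$1
    actual=${2:-$(uname -r)}
    if [ "$expected" = "$actual" ]; then
        echo "OK: running $actual"
    else
        echo "MISMATCH: running $actual, expected $expected"
        return 1
    fi
}

# Usage after rebooting, from the linux-stable checkout:
#   running_expected_kernel "$(make -s kernelrelease)"
```

Only mark the step good or bad once the check reports a match; otherwise the result belongs to a different kernel.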
Iterate until git announces that it has identified the first bad commit.
Then provide the output of
git bisect log
In the course of the bisection you might have to uninstall previously
built kernels so as not to exhaust the disk space in /boot. At the end,
uninstall all self-built kernels as well.
Regards,
Salvatore