Public bug reported:

Two kernel crashes occurred on the same machine (Dell Pro Max Tower T2
FCT2250, RTX 4090, Ubuntu 24.04.4 LTS) running kernel 6.17.0-1023-oem
with the NVIDIA 595.71.05 open kernel module.

=== Crash 1 — 2026-05-30 ===

Kernel panic - not syncing: stack-protector: Kernel stack is corrupted in: __schedule+0x763/0x7a0
CPU: 21 UID: 1001 PID: 104912 Comm: HeapHelper
Tainted: [O]=OOT_MODULE, [E]=UNSIGNED_MODULE

The crashing process "HeapHelper" is a CUDA Unified Memory (UVM)
management helper thread. The kernel stack was corrupted in the
scheduler, detected by the stack-protector canary check.
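
The "symbol+offset/size" notation in the panic line can be decoded
mechanically. The sketch below is only an illustration (the helper name
parse_symbol_offset is hypothetical and not part of any kernel tooling).
It shows that the reported location is 61 bytes before the end of
__schedule, i.e. close to the function's exit path. That says where the
corruption was detected, not where it originated.

```python
import re

def parse_symbol_offset(entry):
    """Split 'symbol+0xOFF/0xSIZE' into (symbol, offset, size)."""
    m = re.fullmatch(r"([\w.]+)\+0x([0-9a-f]+)/0x([0-9a-f]+)", entry.strip())
    if m is None:
        raise ValueError(f"unrecognized entry: {entry!r}")
    return m.group(1), int(m.group(2), 16), int(m.group(3), 16)

# Value copied from the crash 1 panic line.
symbol, offset, size = parse_symbol_offset("__schedule+0x763/0x7a0")
bytes_before_end = size - offset
print(symbol, offset, size, bytes_before_end)  # __schedule 1891 1952 61
```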

=== Crash 2 — 2026-06-04 ===

kernel tried to execute NX-protected page - exploit attempt? (uid: 1001)
BUG: unable to handle page fault for address: ffff8e4081580000
#PF: supervisor instruction fetch in kernel mode
#PF: error_code(0x0011) - permissions violation
CPU: 21 UID: 1001 PID: 365452 Comm: iou-sqp-365444
Tainted: [O]=OOT_MODULE, [E]=UNSIGNED_MODULE
RIP: 0010:0xffff8e4081580000
Call Trace:
  ? io_sq_thread+0x3cc/0x810
  ? ret_from_fork+0x121/0x140

The crashing thread "iou-sqp-365444" is an io_uring SQPOLL kernel
thread. RIP equals the faulting address (ffff8e4081580000), which lies
in an NX-protected page (all zeros). This indicates that the thread
called through a corrupted function pointer.
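
The error code 0x0011 can be decoded bit by bit to confirm what the
"#PF" lines say. The sketch below uses the architectural x86 page-fault
error-code bits 0 through 4; the table and function names are made up
for this illustration.

```python
# x86 page-fault error-code bits 0-4: (meaning if clear, meaning if set).
PF_BITS = {
    0: ("page not present", "protection violation on a present page"),
    1: ("read access", "write access"),
    2: ("supervisor mode", "user mode"),
    3: ("no reserved-bit violation", "reserved bit set in a page-table entry"),
    4: ("data access", "instruction fetch"),
}

def decode_pf_error(code):
    """Return one description per bit, lowest bit first."""
    return [PF_BITS[bit][(code >> bit) & 1] for bit in sorted(PF_BITS)]

# Value copied from the crash 2 log: error_code(0x0011).
for description in decode_pf_error(0x0011):
    print(description)
```

For 0x0011 only bits 0 and 4 are set: a supervisor-mode instruction
fetch from a page that is present but not executable, which matches the
"NX-protected page" message.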

=== Common Characteristics ===

Both crashes share:
- Same physical CPU core: CPU 21
- Same user: UID 1001 (running CUDA + io_uring workloads)
- Same kernel taint: [O]=OOT_MODULE, [E]=UNSIGNED_MODULE (nvidia open kernel module)
- Same kernel version: 6.17.0-1023-oem
- Both are consistent with kernel memory corruption: a clobbered stack
  canary in crash 1 and a corrupted function pointer in crash 2
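
The overlap between the two crash headers can be checked mechanically
rather than by eye. This is a minimal sketch; the regular expression
assumes the "CPU: ... UID: ... PID: ... Comm: ..." layout quoted above,
and the function name parse_header is hypothetical.

```python
import re

HEADER = re.compile(r"CPU: (\d+) UID: (\d+) PID: (\d+) Comm: (\S+)")

def parse_header(line):
    """Extract cpu, uid, pid and comm from an oops/panic header line."""
    m = HEADER.search(line)
    if m is None:
        raise ValueError(f"unrecognized header: {line!r}")
    cpu, uid, pid = (int(g) for g in m.group(1, 2, 3))
    return {"cpu": cpu, "uid": uid, "pid": pid, "comm": m.group(4)}

# Header lines copied from the two crash logs.
crash1 = parse_header("CPU: 21 UID: 1001 PID: 104912 Comm: HeapHelper")
crash2 = parse_header("CPU: 21 UID: 1001 PID: 365452 Comm: iou-sqp-365444")

# Keep only the fields that are identical in both crashes.
shared = {key: crash1[key] for key in crash1 if crash1[key] == crash2[key]}
print(shared)  # {'cpu': 21, 'uid': 1001}
```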

=== System Information ===

Machine: Dell Pro Max Tower T2 FCT2250
BIOS: 1.14.1 (2026-04-02)
CPU: Intel Core Ultra 9 285K (24 cores)
GPU: NVIDIA GeForce RTX 4090
Memory: 128 GB
OS: Ubuntu 24.04.4 LTS
Kernel (crashed): 6.17.0-1023-oem
Kernel (current): 6.17.0-1024-oem
NVIDIA driver: 595.71.05 (open kernel module, DKMS)

=== Hypothesis ===

The nvidia_uvm driver (open kernel module) corrupts kernel memory when managing
GPU page tables for processes that simultaneously use CUDA Unified Memory and
io_uring asynchronous I/O. The corrupted memory is later used by unrelated
kernel code paths (io_sq_thread, __schedule), which is where the crashes surface.

The fact that both crashes hit CPU 21 suggests a possible per-CPU data
structure issue.

=== Mitigation ===

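Setting kernel.io_uring_disabled=1 has stabilized the system. Note that
this setting does not target SQPOLL mode specifically: it blocks creation
of new io_uring instances for unprivileged processes that are not members
of the group set in kernel.io_uring_group.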
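
To confirm which mode is active on a given machine, the sysctl can be
read back from procfs. The sketch below is illustrative only: the value
descriptions paraphrase the kernel's sysctl documentation, and the
function names are hypothetical. On kernels without this sysctl the
file does not exist, which the reader function reports as None.

```python
from pathlib import Path

SYSCTL_PATH = Path("/proc/sys/kernel/io_uring_disabled")

# Paraphrased from the kernel sysctl documentation.
MEANINGS = {
    0: "all processes may create io_uring instances (default)",
    1: "creation restricted to privileged processes and members of "
       "kernel.io_uring_group",
    2: "creation disabled for all processes",
}

def describe_io_uring_disabled(value):
    """Map a sysctl value to a short human-readable description."""
    return MEANINGS.get(value, f"unknown value {value}")

def read_io_uring_disabled(path=SYSCTL_PATH):
    """Return the current setting, or None if the sysctl is unavailable."""
    try:
        return int(path.read_text().strip())
    except (OSError, ValueError):
        return None

current = read_io_uring_disabled()
if current is None:
    print("kernel.io_uring_disabled is not available on this system")
else:
    print(f"kernel.io_uring_disabled={current}: "
          f"{describe_io_uring_disabled(current)}")
```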

=== Attachments ===

Crash dump files available on the affected machine:
- /var/crash/linux-image-6.17.0-1024-oem-202606041723.crash (June 4)
- /var/crash/linux-image-6.17.0-1023-oem-202605301822.crash (May 30)

Full vmcore dmesg extracts are also available upon request.

** Affects: linux-oem-6.17 (Ubuntu)
     Importance: Undecided
         Status: New


** Tags: kernel-crash memory-corruption nvidia oem-kernel regression

** Attachment added: "linux-image-6.17.0-1024-oem-crash.zip"
   https://bugs.launchpad.net/bugs/2155623/+attachment/5975676/+files/linux-image-6.17.0-1024-oem-crash.zip

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/2155623

Title:
  Kernel memory corruption: io_sq_thread executes NX page on CPU 21 —
  likely nvidia_uvm interaction (6.17.0-1023-oem)

