Public bug reported:
**Package:** linux (Ubuntu Noble 24.04 LTS)
**Kernel:** 6.8.0-124-generic #124-Ubuntu SMP PREEMPT_DYNAMIC (Ubuntu
6.8.0-124.124, base 6.8.12)
**Severity:** High — repeatable hard crash (kernel panic / reboot) on
production hosts, ~every 2–6 days
---
## Summary
On heavily loaded LXD container hosts running 6.8.0-124-generic, the
kernel panics repeatedly with memory corruption localized to the
**dentry slab cache**. A captured kdump vmcore shows a
**use-after-free**: the dentry slab freelist is overwritten with
non-pointer garbage, and concurrent threads doing `/proc` path lookups
and process-exit dentry teardown fault on the corrupted objects.
The corruption is confirmed by `crash`'s own slab validator (`kmem -s`
reports `invalid freepointer` on dentry slabs) and by three CPUs caught
mid-fault in the same dentry alloc/free paths in a single core.
The same workload on **6.8.0-90-generic does not exhibit the crash**
(see A/B test below), which points to a regression in the dentry/procfs
path between -90 and -124, or to a latent race newly exposed by changes
in that range.
---
## Environment
- **Hardware:** Supermicro SYS-611C-TN4R / X13DDW-A, BIOS 2.7 (07/23/2025),
dual-socket, 64 logical CPUs, 256 GB RAM
- **Root/storage:** OpenZFS (zfs 2.2.2-0ubuntu9.4), kernel tainted `PO`
(`P` = proprietary-licensed module, `O` = out-of-tree module; both set by ZFS)
- **Workload:** ~90 LXD system containers (managed WordPress hosting); very
high concurrent fork/exit and `/proc` scanning from per-container nginx /
php-fpm / mariadbd / redis plus host-side monitoring (`ps`), backups (`tar`)
- **Crash cadence:** every ~2–6 days; uptime at this capture was 8 days
- **EDAC/MCE:** clean (`ras-mc-ctl --summary/--errors` show no memory or PCIe
errors; IPMI SEL clean apart from PSU/chassis noise), so there is no evidence
of a hardware memory fault
---
## Impact
Each event is a hard kernel panic. With `panic_on_oops=1` / `panic=10`
the host self-reboots, but every crash is a full outage of ~90 tenant
containers. The corruption surfaces in unrelated subsystems (dentry
teardown, dentry alloc, socket/pid allocation) because it is a slab
freelist UAF: the faulting site is rarely the site of the bug, so the
crashes look like random instability until the dump is examined.
---
## Crash analysis (from kdump vmcore, full matching dbgsym)
Panic task and primary oops:
```
PANIC: "Oops: 0000 [#1] PREEMPT SMP NOPTI"
COMMAND: "ps" CPU: 37
[exception RIP: dentry_unlink_inode+251] (NULL deref; RAX/RDX/RSI/RDI = 0)
#8 dentry_unlink_inode
#9 __dentry_kill
#10 shrink_dentry_list
#11 shrink_dcache_parent
#12 d_invalidate
#13 lookup_fast
#14 walk_component
#15 path_lookupat
#16 filename_lookup
#17 vfs_statx
#18 vfs_fstatat
#19 __do_sys_newfstatat
```
The corrupted dentry is a procfs pid entry — `/proc/<pid>/cmdline`:
```
struct dentry {
d_name.name = "cmdline"
d_iname = "cmdline"
d_inode = 0x0 <-- already unlinked
d_op = pid_dentry_operations
d_lockref.count = -128 (0xffffff80) <-- lockref dead marker (set by lockref_mark_dead): dentry was already killed
}
```
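The `d_lockref.count` value above is easier to read once the raw 32-bit field is decoded. The sketch below is illustrative only: the helper names are hypothetical, and the one kernel fact it relies on is that `lockref_mark_dead()` in `lib/lockref.c` writes `-128` into the count. Under that reading, a count of exactly `-128` together with `d_inode = 0x0` suggests the dentry had already been through `__dentry_kill` once before CPU 37 reached it again.

```python
LOCKREF_DEAD = -128  # value written by lockref_mark_dead() in lib/lockref.c


def as_signed32(raw):
    """Interpret a raw 32-bit field (as printed in hex by crash) as a signed int."""
    raw &= 0xFFFFFFFF
    return raw - (1 << 32) if raw & 0x80000000 else raw


def describe_lockref_count(raw):
    """Classify a d_lockref.count value (hypothetical helper for triage notes)."""
    count = as_signed32(raw)
    if count == LOCKREF_DEAD:
        return "marked dead"
    if count < 0:
        return "negative (unexpected)"
    return "live"


print(describe_lockref_count(0xFFFFFF80))
```

A count that is negative but not `-128` would point at a different failure (for example a reference taken or dropped after the dentry was marked dead), which is why the two cases are kept separate here.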
`crash`'s slab validator independently flags the dentry cache as corrupt
(no `slub_debug` was active at capture — this is structural freelist
validation):
```
kmem: dentry: slab: ffd8d770cc2fe300 invalid freepointer: 7d6cf1f4997700d6
kmem: dentry: slab: ffd8d770cc1abe00 invalid freepointer: 7d6cf1f494205b56
kmem: kmalloc-rcl-64: slab: ffd8d770cc26a700 invalid freepointer: 55ab8f7b3288b69a
```
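To compare captures, it can help to tally which caches the validator flags. The following is a minimal sketch that assumes each `kmem -s` warning sits on a single line in the format shown above; the function name and regular expression are illustrative, not part of `crash`.

```python
import re
from collections import Counter

# Matches warnings of the form:
#   kmem: <cache>: slab: <slab addr> invalid freepointer: <value>
LINE_RE = re.compile(
    r"^kmem: (?P<cache>\S+): slab: (?P<slab>[0-9a-f]+) "
    r"invalid freepointer: (?P<ptr>[0-9a-f]+)$"
)


def tally_invalid_freepointers(text):
    """Count 'invalid freepointer' warnings per slab cache name."""
    counts = Counter()
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            counts[match.group("cache")] += 1
    return counts


SAMPLE = """\
kmem: dentry: slab: ffd8d770cc2fe300 invalid freepointer: 7d6cf1f4997700d6
kmem: dentry: slab: ffd8d770cc1abe00 invalid freepointer: 7d6cf1f494205b56
kmem: kmalloc-rcl-64: slab: ffd8d770cc26a700 invalid freepointer: 55ab8f7b3288b69a
"""

print(dict(tally_invalid_freepointers(SAMPLE)))
```

Running the same tally over the saved `kmem -s` output of each capture gives a quick check of whether the affected caches stay the same from crash to crash.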
Three CPUs were simultaneously in dentry alloc/free paths at panic — the
race, in one snapshot:
| CPU | Task | Operation | Fault |
|-----|------|-----------|-------|
| 37 | ps | dentry teardown: `dentry_unlink_inode ← __dentry_kill ← shrink_dentry_list ← d_invalidate ← lookup_fast` (`/proc` stat walk) | NULL deref on already-freed dentry (panicked first) |
| 4 | ps | dentry teardown: `dentry_unlink_inode ← __dentry_kill ← dput ← lookup_fast ← open_last_lookups ← openat` | same fault site; spinning in `native_queued_spin_lock_slowpath` |
| 45 | tar | dentry **allocation**: `kmem_cache_alloc_lru ← __d_alloc ← d_alloc_parallel ← __lookup_slow` (stat walk) | GPF on poisoned freelist pointer; R14 = dentry cache addr |
The `tar` GPF register state shows the poisoned pointer being consumed
from the dentry slab:
```
[exception RIP: kmem_cache_alloc_lru+221]
general protection fault (non-canonical address)
RAX: 627117ed820fc609 RDI: 627117ed820fc5a9 <-- garbage freelist pointer
R14: ff1a80bec01f6800 <-- dentry kmem_cache
```
This matches earlier pstore-only captures of the same host, where the
first event was consistently a GPF in `kmem_cache_alloc_lru` on a
non-canonical freelist pointer reached via `__d_alloc` / `alloc_pid` /
`sock_alloc_file`, all of them slab allocations on the fork/exit hot
paths.
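The "non-canonical" classification can be checked mechanically. The sketch below tests whether the upper bits of a 64-bit address are all equal, which is the x86-64 canonical-form rule. The 57-bit default is an assumption: it is inferred from kernel addresses in this dump such as `ff1a80bec01f6800`, which are only canonical with 5-level paging. The function name is illustrative.

```python
def is_canonical(addr, va_bits=57):
    """True if bits 63..(va_bits-1) of a 64-bit address are all equal.

    va_bits=57 assumes 5-level paging; pass 48 for 4-level paging.
    """
    upper = (addr & 0xFFFFFFFFFFFFFFFF) >> (va_bits - 1)
    all_ones = (1 << (64 - va_bits + 1)) - 1
    return upper == 0 or upper == all_ones


# Values copied from the crash output in this report.
REPORTED = {
    "dentry freepointer": 0x7D6CF1F4997700D6,
    "kmalloc-rcl-64 freepointer": 0x55AB8F7B3288B69A,
    "tar GPF RAX": 0x627117ED820FC609,
    "dentry kmem_cache (R14)": 0xFF1A80BEC01F6800,
}

for name, value in REPORTED.items():
    print(f"{name}: {value:#018x} canonical={is_canonical(value)}")
```

Under that assumption, the three freelist values fail the check while the `kmem_cache` address in R14 passes, consistent with a general protection fault on the first dereference.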
---
## What is ruled out
- **Not ZFS.** All ZFS caches (`zfs_znode_cache`, `dnode_t`, `dmu_buf_impl_t`,
`arc_buf_*`) are intact in `kmem -s` — no `invalid freepointer` — despite
millions of live objects. ZFS appears only as a passing frame on the clone
path. (Kernel is ZFS-tainted; noted for completeness, but the corrupted cache
is core VFS `dentry`, not any ZFS slab.)
- **Not AppArmor notification CVEs (USN-8373-1 / CVE-2026-47326..47328).**
`apparmor_auditcache` is clean/empty; the AppArmor notification interface is
not in active use on these hosts (no `aa-notify` consumer,
`features/policy/notify` empty). The fault is in core procfs/VFS dentry
handling (`pid_dentry_operations`), unrelated to AppArmor.
- **Not hardware.** EDAC/MCE/SEL clean; corruption is structurally consistent
(always dentry slab, always teardown/alloc paths), not the random scatter of
failing DIMMs.
---
## A/B test (kernel version isolation)
Two near-identical heavily loaded hosts that both crashed on -124:
- **Host A (vps232):** kept on **6.8.0-124**, kdump-armed, used to capture this
vmcore.
- **Host B (vps193):** rolled back to **6.8.0-90-generic**, same workload (~90
containers), as control.
Expected discriminator within one crash interval: if Host B on -90 stays
up while Host A on -124 keeps crashing, the regression is localized to
the -90→-124 range. (Result will be added as a follow-up comment.)
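How long Host B must stay up before the result is convincing depends on the crash statistics. As a rough guide only, the sketch below assumes crashes arrive as a memoryless process with a fixed mean interval. That assumption is not established by the data in this report, and the function names are illustrative.

```python
import math


def survival_by_chance(days_up, mean_days_between_crashes):
    """P(no crash in days_up) if the bug is still present and crashes are memoryless."""
    return math.exp(-days_up / mean_days_between_crashes)


def days_needed(mean_days_between_crashes, max_chance=0.05):
    """Uptime after which a crash-free run happens by luck with probability <= max_chance."""
    return mean_days_between_crashes * math.log(1.0 / max_chance)


# Pessimistic end of the observed 2-6 day cadence.
print(round(survival_by_chance(6, 6), 3))
print(round(days_needed(6), 1))
```

With a 6-day mean, a host that still had the bug would survive a single 6-day interval by luck about 37% of the time, so under this model roughly 18 crash-free days on -90 are needed to bring that chance below 5%.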
Note: 6.8.0-124.124 is the newest generic kernel currently published for
Noble, so there is no forward kernel to test against — rollback to -90
is the only available containment.
---
## Reproduction conditions
Not yet reduced to a minimal reproducer, but reliably reproduced in
production by:
- High logical-CPU-count host (64) with high process density (~90 LXD
containers)
- Sustained concurrent `/proc` traversal (host monitoring running `ps`/stat
loops) **plus** continuous process churn (per-container php-fpm/nginx
fork+exit) **plus** filesystem tree walks (`tar` backups)
- i.e. heavy concurrent `__d_alloc` (lookup) and
`__dentry_kill`/`proc_flush_pid` (exit + invalidate) against the shared dentry
cache
Mean time to corruption: ~2–6 days of normal production load.
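As a starting point for a reduced reproducer, the load pattern above can be approximated with a small stress harness: several threads repeatedly read `/proc/<pid>/cmdline` while short-lived children are spawned and reaped. This is an untested sketch of the workload shape only. It has not been confirmed to trigger the bug, and all function names and parameters are hypothetical.

```python
import glob
import subprocess
import sys
import threading


def scan_proc_cmdlines():
    """Read /proc/<pid>/cmdline for every visible pid; return the number of successful reads."""
    ok = 0
    for path in glob.glob("/proc/[0-9]*/cmdline"):
        try:
            with open(path, "rb") as f:
                f.read()
            ok += 1
        except OSError:
            # The process exited between listing and open; this is the
            # lookup-versus-exit window the report is concerned with.
            pass
    return ok


def churn(children):
    """Spawn and reap short-lived children so their /proc entries appear and vanish."""
    for _ in range(children):
        subprocess.run([sys.executable, "-c", "pass"], check=True)
    return children


def stress(children=200, scanners=4):
    """Run /proc scanners in background threads while process churn runs in the foreground."""
    stop = threading.Event()

    def scan_loop():
        while not stop.is_set():
            scan_proc_cmdlines()

    threads = [threading.Thread(target=scan_loop) for _ in range(scanners)]
    for t in threads:
        t.start()
    try:
        return churn(children)
    finally:
        stop.set()
        for t in threads:
            t.join()
```

Calling `stress()` inside many containers at once, alongside a `tar` tree walk, would come closer to the production mix. A compiled fork/exit loop would generate far more churn per CPU than this Python version.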
---
## Artifacts available on request
- Full kdump vmcore (`/var/crash/...`, ~17 GB, PARTIAL DUMP via makedumpfile)
captured against `linux-image-unsigned-6.8.0-124-generic-dbgsym` 6.8.0-124.124
(matching build-id)
- `crash` session output: `bt`, `bt -a` (all 64 CPUs), `kmem -s`, `kmem -S
dentry`, `struct dentry` of the corrupted object, `log`
- Five prior pstore dmesg captures from the same host showing the recurring
signature
- apport-collected host/config data (will attach via `ubuntu-bug linux`)
## Planned follow-up
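Host A is being rebooted with `slub_debug=FZP` to catch the corruption
**at the bad free** (red-zone/poison validation), which should help
name the freeing path. Adding the `U` (user tracking) flag, for example
`slub_debug=FZPU,dentry`, would additionally record the last alloc and
free call sites of each object, which helps if the fault turns out to
be a write after free rather than a double free. The trace will be
attached as a follow-up comment once the next event is captured.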
The full 17 GB kdump vmcore (PARTIAL DUMP, makedumpfile) is retained on
the affected host, captured against
linux-image-unsigned-6.8.0-124-generic-dbgsym 6.8.0-124.124 (matching
build-id). It is available to the assigned engineer on request.
ProblemType: Bug
DistroRelease: Ubuntu 24.04
Package: linux-image-6.8.0-124-generic 6.8.0-124.124
ProcVersionSignature: Ubuntu 6.8.0-124.124-generic 6.8.12
Uname: Linux 6.8.0-124-generic x86_64
NonfreeKernelModules: zfs
AlsaDevices:
total 0
crw-rw---- 1 root audio 116, 1 Jun 21 19:29 seq
crw-rw---- 1 root audio 116, 33 Jun 21 19:29 timer
AplayDevices: Error: [Errno 2] No such file or directory: 'aplay'
ApportVersion: 2.28.1-0ubuntu3.8
Architecture: amd64
ArecordDevices: Error: [Errno 2] No such file or directory: 'arecord'
AudioDevicesInUse: Error: command ['fuser', '-v', '/dev/snd/seq',
'/dev/snd/timer'] failed with exit code 1:
CRDA: N/A
CasperMD5CheckResult: pass
Date: Sun Jun 21 20:43:01 2026
InstallationDate: Installed on 2025-12-01 (202 days ago)
InstallationMedia: Ubuntu-Server 24.04.3 LTS "Noble Numbat" - Release amd64
(20250805.1)
IwConfig: Error: [Errno 2] No such file or directory: 'iwconfig'
Lsusb:
Bus 001 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
Bus 001 Device 002: ID 1d6b:0107 Linux Foundation USB Virtual Hub
Bus 001 Device 003: ID 0557:9241 ATEN International Co., Ltd SMCI HID KM
Bus 001 Device 004: ID 0b1f:03ee Insyde Software Corp. RNDIS/Ethernet Gadget
Bus 002 Device 001: ID 1d6b:0003 Linux Foundation 3.0 root hub
MachineType: Supermicro SYS-611C-TN4R
PciMultimedia:
ProcEnviron:
LANG=en_US.UTF-8
PATH=(custom, no user)
SHELL=/bin/bash
TERM=xterm
XDG_RUNTIME_DIR=<set>
ProcFB: 0 astdrmfb
ProcKernelCmdLine: BOOT_IMAGE=/vmlinuz-6.8.0-124-generic
root=UUID=3e867032-21c4-416e-b45f-a17d1dae6788 ro
crashkernel=2G-4G:320M,4G-32G:512M,32G-64G:1024M,64G-128G:2048M,128G-:4096M
panic_on_oops=1 panic=10
RelatedPackageVersions:
linux-restricted-modules-6.8.0-124-generic N/A
linux-backports-modules-6.8.0-124-generic N/A
linux-firmware 20240318.git3b128b60-0ubuntu2.26
RfKill: Error: [Errno 2] No such file or directory: 'rfkill'
SourcePackage: linux
UpgradeStatus: No upgrade log present (probably fresh install)
dmi.bios.date: 07/23/2025
dmi.bios.release: 5.32
dmi.bios.vendor: American Megatrends International, LLC.
dmi.bios.version: 2.7
dmi.board.asset.tag: Base Board Asset Tag
dmi.board.name: X13DDW-A
dmi.board.vendor: Supermicro
dmi.board.version: 1.01
dmi.chassis.asset.tag: Chassis Asset Tag
dmi.chassis.type: 1
dmi.chassis.vendor: Supermicro
dmi.chassis.version: 0123456789
dmi.modalias:
dmi:bvnAmericanMegatrendsInternational,LLC.:bvr2.7:bd07/23/2025:br5.32:svnSupermicro:pnSYS-611C-TN4R:pvr0123456789:rvnSupermicro:rnX13DDW-A:rvr1.01:cvnSupermicro:ct1:cvr0123456789:skuTobefilledbyO.E.M.:
dmi.product.family: Family
dmi.product.name: SYS-611C-TN4R
dmi.product.sku: To be filled by O.E.M.
dmi.product.version: 0123456789
dmi.sys.vendor: Supermicro
** Affects: linux (Ubuntu)
Importance: Undecided
Status: New
** Tags: amd64 apport-bug noble
** Attachment added: "vps232-panic-dmesg.txt.gz"
https://bugs.launchpad.net/bugs/2157755/+attachment/5978295/+files/vps232-panic-dmesg.txt.gz
https://bugs.launchpad.net/bugs/2157755
Title:
[linux 6.8.0-124-generic] Dentry-cache slab use-after-free under
concurrent /proc lookup + process exit on high-density LXD hosts