On Wed, 10 May 2023 22:55:31 +0200
Alexander Bluhm wrote:
> ddb{1}> show panic
> *cpu1: kernel diagnostic assertion "pm == pted->pted_pmap" failed: file
> "/usr/src/sys/arch/powerpc64/powerpc64/pmap.c", line 865

I have seen other panics in powerpc64/pmap.c, but not this one.
powerpc64-1.ports.openbsd.org, with 16 cores, panics about every
20 hours while building packages, with one of 'assertion "slbd"
failed' (pmap_vp_enter), 'unable to allocate L2' (pmap_vp_enter), or
'failed to allocate pted' (pmap_enter). All of these are failures to
allocate from a pool when uvm_km_pages.free == 0.

When a panic happens, ddb disables the locks in the kernel; other
cpus might then bypass locks and panic too. The multiple panics can
mess up the console, but "show panic" in ddb reveals them. I know from
"show struct uvm_km_pages uvm_km_pages" that my most recent panics
have uvm_km_pages.free == 0. For example:
ddb{10}> show panic
cpu1: pool_do_get: pted free list modified: page 0xc00000003d544000; item addr 0xc00000003d5442e0; offset 0x0=0xc000000135a0fd20 != 0xaecb1f31ca612a68
*cpu2: kernel diagnostic assertion "slbd" failed: file "/sys/arch/powerpc64/powerpc64/pmap.c", line 608
cpu5: kernel diagnostic assertion "((flags & PGO_LOCKED) != 0 && rw_lock_held(uobj->vmobjlock)) || (flags & PGO_LOCKED) == 0" failed: file "/sys/uvm/uvm_vnode.c", line 953
cpu7: pool_do_get: pted free list modified: page 0xc00000003d544000; item addr 0xc00000003d544740; offset 0x0=0x0 != 0xaecb1f31ca612fc8
cpu10: uvm_mapent_clone: no space in map for entry in empty map
cpu11: pool_do_get: pted free list modified: page 0xc00000003d544000; item addr 0xc00000003d544158; offset 0x0=0x0 != 0xaecb1f31ca6129d0
cpu12: pool_do_get: pted free list modified: page 0xc00000003d544000; item addr 0xc00000003d544158; offset 0x0=0x0 != 0xaecb1f31ca6129d0
ddb{2}> show struct uvm_km_pages uvm_km_pages
struct uvm_km_pages at 0xfb61c0 (65592 bytes) {mtx = {mtx_owner =
(volatile void *)0x0, mtx_wantipl = 0x7, mtx_oldipl = 0x7}, lowat =
0x200, hiwat = 0x2000, free = 0x0, page = 13835058060324941824,
freelist = (struct uvm_km_free_page *)0x0, freelistlen = 0x0, km_proc
= (struct proc *)0xc00000011426f340}

The cpus can bypass locks before they enter ddb, because they can see
events in a different order. The first cpu sends IPI_DDB to the others
before it does db_active++ (in db_ktrap in powerpc64/db_interface.c).
The other cpus can see db_active++ (which disables the locks) before
they receive the IPI.

Bluhm, your "show panic" has only 1 panic, the 'assertion "pm ==
pted->pted_pmap" failed' on cpu1. This suggests that cpu1 wasn't
bypassing locks (because no other cpu had done db_active++ before
cpu1 did), but if cpu1 bypassed neither the vp lock nor the pool
locks, then I don't know how 'pm == pted->pted_pmap' can fail.

Your cpu1 was in mprotect(2), cpu0 was handling an instruction page
fault (0x400), cpu2 was handling a data page fault (0x300), and we
can't see what cpu3 was doing. I suspect that db_active++ on cpu1
(after the panic) caused a problem on cpu3, but I'm not sure.

This panic might be difficult to reproduce. Your powerpc64 has
only 4 cores; my 4-core powerpc64 at home built a release without
panicking, and the 16-core powerpc64-1.ports gets different panics.
--George