On Fri, Nov 21, 2025 at 08:12:55AM +0100, Michal Meloun wrote:
> I have confirmed that the jemalloc assertions are caused by an mmap()
> failure. It can return non-zeroed page(s) for mmap(MAP_ANON), which is
> clearly a bug.
>
> I have confirmed this on native ARMv7, and according to Mark, it is also
> reproducible in ARM32 and i386 jails. I think I also saw it on a
> memory-constrained (4 GB) aarch64, but I cannot reproduce it there yet.
>
> Does anybody have an idea how to identify vm faults associated with anon
> mmap, to trigger detection of this failure in the kernel? Or any other
> hint?
I think it would be much more visible if freshly allocated anonymous pages
were corrupted. A similar mechanism to get zeroed pages is used to get
fresh page table pages, and corruption there would cause a lot of kernel
page faults with 'invalid PTE bit' hw reports. But of course everything
is possible.

VM has an optimization where we track known-to-be-zeroed free pages
separately, by marking them with the PG_ZERO flag. If an allocation needs
a zeroed page and the flag is set, we skip calling pmap_zero_page() on it.
Also, in vm_page_free_prep(), when we are told that the page is zeroed,
with DIAGNOSTIC enabled, on amd64 and arm64, we do check for that.

So let's add a slow check to the vm_fault code that a supposedly zeroed
page is indeed zeroed. Can you try to catch the issue with the patch
applied and DIAGNOSTIC enabled? The patch is arch-agnostic and I believe
should work on armv7, although it obviously causes a slowdown.

commit 1a9e20dc8f7faadeb839ea6a04c83a4bf2652925
Author: Konstantin Belousov <[email protected]>
Date:   Fri Nov 21 10:34:51 2025 +0200

    vm_fault: under DIAGNOSTIC, verify that PG_ZERO page is indeed zeroed

diff --git a/sys/vm/vm_fault.c b/sys/vm/vm_fault.c
index 2e150b368d71..32bec33502fb 100644
--- a/sys/vm/vm_fault.c
+++ b/sys/vm/vm_fault.c
@@ -85,6 +85,8 @@
 #include <sys/refcount.h>
 #include <sys/resourcevar.h>
 #include <sys/rwlock.h>
+#include <sys/sched.h>
+#include <sys/sf_buf.h>
 #include <sys/signalvar.h>
 #include <sys/sysctl.h>
 #include <sys/sysent.h>
@@ -1220,6 +1222,20 @@ vm_fault_zerofill(struct faultstate *fs)
 	if ((fs->m->flags & PG_ZERO) == 0) {
 		pmap_zero_page(fs->m);
 	} else {
+#ifdef DIAGNOSTIC
+		struct sf_buf *sf;
+		unsigned long *p;
+		int i;
+
+		sched_pin();
+		sf = sf_buf_alloc(fs->m, SFB_CPUPRIVATE);
+		p = (unsigned long *)sf_buf_kva(sf);
+		for (i = 0; i < PAGE_SIZE / sizeof(*p); i++, p++)
+			KASSERT(*p == 0, ("zerocheck failed page %p PG_ZERO %d %jx",
+			    fs->m, i, (uintmax_t)*p));
+		sf_buf_free(sf);
+		sched_unpin();
+#endif
 		VM_CNT_INC(v_ozfod);
 	}
 	VM_CNT_INC(v_zfod);
