On Wed, Mar 18, 2026 at 09:17:41AM +0100, David Hildenbrand (Arm) wrote:
> On 3/17/26 16:08, Audra Mitchell wrote:
> > Sorry! I missed this email so never responded!
> >
> > On Tue, Feb 24, 2026 at 05:15:14PM +0100, David Hildenbrand (Arm) wrote:
> >> On 2/18/26 19:42, Audra Mitchell wrote:
> >>> On architectures with separate user address space, such as s390 or
> >>> those without an MMU, the call to __access_ok will return true.
> >>
> >> Where is this __access_ok() you mention here? Somewhere in
> >> fs/proc/task_mmu.c?
> >>
> >> Where in the soft-dirty test is that triggered?
> >>
> >> I'm wondering whether the soft-dirty test should be adjusted, but I did
> >> not yet understand from where this behavior is triggered.
> >
> > The problem arises when we are checking to see what features/categories are
> > supported. The call chain for the soft-dirty program goes:
> >
> > main()
> > ->test_simple()
> > ->pagemap_is_softdirty()
> > ->page_entry_is()
> > ->pagemap_scan_supported()
> > ->__pagemap_scan_get_categories()
> > ->ioctl()
> >
> > We enter the kernel with an ioctl, expecting EFAULT to be returned (see
> > the comment in pagemap_scan_supported()):
> >
> > /* Provide an invalid address in order to trigger EFAULT. */
> > ret = __pagemap_scan_get_categories(fd, start, (struct page_region *) ~0UL);
> >
> > Once we enter the kernel, we check the arguments passed, which includes the
> > call to access_ok():
> >
> > do_pagemap_cmd()
> > ->do_pagemap_scan()
> > ->pagemap_scan_get_args()
> > ->access_ok()
> >
> > Here is the path within pagemap_scan_get_args() where we expect to fail and
> > return -EFAULT:
> >
> > if (arg->vec && !access_ok((void __user *)(long)arg->vec,
> > size_mul(arg->vec_len, sizeof(struct page_region))))
> > return -EFAULT;
> >
> > However, if CONFIG_ALTERNATE_USER_ADDRESS_SPACE is enabled or if CONFIG_MMU
> > is NOT enabled, then we just return true:
> >
> > if (IS_ENABLED(CONFIG_ALTERNATE_USER_ADDRESS_SPACE) ||
> > !IS_ENABLED(CONFIG_MMU))
> > return true;
> >
> > The intent appears to be just getting the categories available to us and
> > verifying that we have the feature available for testing. However, this
> > corner case means the soft-dirty test will fail with the following:
> >
>
> Thanks for the information, we should clarify that in the patch description.
>
> > # --------------------
> > # running ./soft-dirty
> > # --------------------
> > # TAP version 13
> > # 1..15
> > # Bail out! PAGEMAP_SCAN succeeded unexpectedly
> > # # Totals: pass:0 fail:0 xfail:0 xpass:0 skip:0 error:0
> > # [FAIL]
> > not ok 1 soft-dirty # exit=1
> > # SUMMARY: PASS=0 SKIP=0 FAIL=1
> > 1..1
> >
> > Since the intent is just to validate that the features are available to us
> > for testing, I think we can just modify the check so that we don't fail if
> > we return 0.
> >
> > Let me know what you think, or if you have more questions!
>
> What about simply testing for success on a test area, wouldn't that be more
> reliable and clearer?
>
> diff --git a/tools/testing/selftests/mm/vm_util.c b/tools/testing/selftests/mm/vm_util.c
> index a6d4ff7dfdc0..489a8d4d915d 100644
> --- a/tools/testing/selftests/mm/vm_util.c
> +++ b/tools/testing/selftests/mm/vm_util.c
> @@ -67,21 +67,26 @@ static uint64_t pagemap_scan_get_categories(int fd, char *start)
> }
>
> /* `start` is any valid address. */
> -static bool pagemap_scan_supported(int fd, char *start)
> +static bool pagemap_scan_supported(int fd)
> {
> + const size_t pagesize = getpagesize();
> static int supported = -1;
> - int ret;
> + struct page_region r;
> + void *test_area;
>
> if (supported != -1)
> return supported;
>
> - /* Provide an invalid address in order to trigger EFAULT. */
> - ret = __pagemap_scan_get_categories(fd, start, (struct page_region *) ~0UL);
> - if (ret == 0)
> - ksft_exit_fail_msg("PAGEMAP_SCAN succeeded unexpectedly\n");
> -
> - supported = errno == EFAULT;
> -
> + test_area = mmap(0, pagesize, PROT_READ | PROT_WRITE,
> + MAP_ANONYMOUS | MAP_PRIVATE, 0, 0);
> + if (test_area == MAP_FAILED) {
> + ksft_print_msg("WARN: mmap() failed: %s\n", strerror(errno));
> + supported = 0;
> + } else {
> + supported = __pagemap_scan_get_categories(fd, test_area, &r) >= 0;
> + ksft_print_msg("errno: %d\n", errno);
> + munmap(test_area, pagesize);
> + }
> return supported;
> }
>
> @@ -90,7 +95,7 @@ static bool page_entry_is(int fd, char *start, char *desc,
> {
> bool m = pagemap_get_entry(fd, start) & pagemap_flags;
>
> - if (pagemap_scan_supported(fd, start)) {
> + if (pagemap_scan_supported(fd)) {
> bool s = pagemap_scan_get_categories(fd, start) & pagescan_flags;
>
> if (m == s)
> --
> 2.43.0
>
>
> >
> >> Do we have a Fixes: tag?
> >
> > I always hesitate to add a Fixes tag in situations like this since this is a
> > corner case that was not considered by the original author. If we need a
> > Fixes tag, then it would be:
> >
> > Fixes: 600bca580579 ("selftests/mm: check that PAGEMAP_SCAN returns correct categories")
>
> Yes, please add that. We nowadays also add proper Fixes tags for tests.
>
> --
> Cheers,
>
> David
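
For anyone following along, the access_ok() behaviour described in the quoted
text can be modelled in userspace. This is only a rough sketch, not the kernel
code: model_access_ok(), model_probe_sees_efault(), MODEL_TASK_SIZE_MAX and
MODEL_REGION_SIZE are hypothetical names, the limit is an arbitrary stand-in
for the per-architecture TASK_SIZE_MAX, and the 24-byte region size assumes
struct page_region is three 64-bit fields with a vec_len of 1.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for TASK_SIZE_MAX; the real value is per-arch. */
#define MODEL_TASK_SIZE_MAX ((UINT64_C(1) << 47) - 4096)

/* Assumed size of one struct page_region (three 64-bit fields). */
#define MODEL_REGION_SIZE UINT64_C(24)

/*
 * Userspace model of the range check. skip_range_check stands in for
 * "CONFIG_ALTERNATE_USER_ADDRESS_SPACE is enabled or CONFIG_MMU is not",
 * in which case every pointer is accepted.
 */
static bool model_access_ok(uint64_t addr, uint64_t size, bool skip_range_check)
{
	const uint64_t limit = MODEL_TASK_SIZE_MAX;

	if (skip_range_check)
		return true;

	/* Written to avoid overflow when computing addr + size. */
	return size <= limit && addr <= limit - size;
}

/*
 * Models the old selftest probe: pass ~0UL as the vector address and
 * report whether the argument check would reject it with -EFAULT.
 */
static bool model_probe_sees_efault(bool skip_range_check)
{
	return !model_access_ok(UINT64_MAX, MODEL_REGION_SIZE, skip_range_check);
}
```

With the range check active the ~0UL probe is rejected, which is what the old
pagemap_scan_supported() relied on. With the check skipped the pointer is
accepted, so the probe can no longer count on EFAULT from this path.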

Audra - to be clear, this is a discussion about mm process, not your patch
specifically.

OK, again I'm starting to think we just shouldn't support fix-patches at all
any more.

This is an example of a change being done in a fix-patch that's _really_
causing issues.

Because this has now caused mayhem in mm-unstable, and the 'kinda stable-ish'
branch now won't compile any self tests.

The fix in [0] on Chris Down's test series was for too many args to this
function (the patch changing this should have been rebased on mm-unstable and
changed Chris's caller there).

But now, since the patch above got yanked, that 'fix' has stayed in place and
now no mm self tests compile.

And now we see [1], hilariously.

[0]: https://lore.kernel.org/linux-mm/[email protected]/
[1]: https://lore.kernel.org/linux-mm/[email protected]/

This kind of massive confusion, and 'I am just trying to run some self tests
on what-should-be-for-next', is just not helpful...

I think we need a for-next branch that actually consists of stuff we genuinely
mean to take (i.e. review has settled) instead of 'literally everything because
we move stuff from mm-new unconditionally'.

Anyway, we should revert the fix in [0] because it's broken now.

Cheers, Lorenzo