On 04.06.25 16:19, Lorenzo Stoakes wrote:
The walk_page_range_novma() function is rather confusing - it supports two
modes, one used often, the other used only for debugging.
The first mode is the common case of traversal of kernel page tables, which
is what nearly all callers use this for.
Secondly it provides an unusual debugging interface that allows for the
traversal of page tables in a userland range of memory even for that memory
which is not described by a VMA.
It is far from certain that such page tables should even exist, but perhaps
this is precisely why it is useful as a debugging mechanism.
As a result, this is utilised by ptdump only. Historically, things were
reversed - ptdump was the only user, and other parts of the kernel evolved
to use the kernel page table walking here.
Since we have some complicated and confusing locking rules for the novma
case, it makes sense to separate the two usages into their own functions.
Doing this also provides self-documentation as to the intent of the caller -
are they doing something rather unusual or are they simply doing a standard
kernel page table walk?
We therefore establish two separate functions - walk_page_range_debug() for
this single usage, and walk_kernel_page_table_range() for general kernel
page table walking.
We additionally make walk_page_range_debug() internal to mm.
Note that ptdump uses the precise same function for kernel walking as a
convenience, so we permit this but make it very explicit by having
walk_page_range_debug() invoke walk_kernel_page_table_range() in this case.
Signed-off-by: Lorenzo Stoakes <[email protected]>
Acked-by: Mike Rapoport (Microsoft) <[email protected]>
---
[...]
bool try_get_and_clear_pmd(struct mm_struct *mm, pmd_t *pmd, pmd_t *pmdval);
diff --git a/mm/pagewalk.c b/mm/pagewalk.c
index e478777c86e1..057a125c3bc0 100644
--- a/mm/pagewalk.c
+++ b/mm/pagewalk.c
@@ -584,9 +584,28 @@ int walk_page_range(struct mm_struct *mm, unsigned long start,
return walk_page_range_mm(mm, start, end, ops, private);
}
+static int __walk_page_range_novma(struct mm_struct *mm, unsigned long start,
+ unsigned long end, const struct mm_walk_ops *ops,
+ pgd_t *pgd, void *private)
+{
+ struct mm_walk walk = {
+ .ops = ops,
+ .mm = mm,
+ .pgd = pgd,
+ .private = private,
+ .no_vma = true
+ };
+
+ if (start >= end || !walk.mm)
+ return -EINVAL;
+ if (!check_ops_valid(ops))
+ return -EINVAL;
I'm wondering if that could be moved into walk_pgd_range().
+
+ return walk_pgd_range(start, end, &walk);
+}
+
I would inline that into both functions and finally get rid of that
"novma" ... beauty of a function.
Well, we still have the "no_vma" parameter, but that's a different thing.
E.g., there is no need to check for walk.mm in the
walk_kernel_page_table_range() case.
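
Something like the following is what I have in mind. This is a rough,
untested sketch, not a replacement hunk. The signatures of
walk_kernel_page_table_range() and walk_page_range_debug() are my
assumption of what the patch ends up with, and I am assuming only the
check_ops_valid() call moves into walk_pgd_range(). The struct
definitions and the bodies of check_ops_valid() and walk_pgd_range() are
userspace stand-ins so that the sketch compiles outside the kernel; the
locking assertions are left out entirely.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace stand-ins for kernel types (hypothetical, for illustration). */
struct mm_struct { int unused; };
typedef struct { unsigned long pgd; } pgd_t;
struct mm_walk;
struct mm_walk_ops {
	int (*pte_entry)(unsigned long addr, struct mm_walk *walk);
};
struct mm_walk {
	const struct mm_walk_ops *ops;
	struct mm_struct *mm;
	pgd_t *pgd;
	void *private;
	bool no_vma;
};

static struct mm_struct init_mm;

/* Stand-in: the real check lives in mm/pagewalk.c and does more. */
static bool check_ops_valid(const struct mm_walk_ops *ops)
{
	return ops != NULL;
}

/* Stand-in walker with the ops validation moved in (assumption). */
static int walk_pgd_range(unsigned long start, unsigned long end,
			  struct mm_walk *walk)
{
	assert(walk != NULL);
	if (!check_ops_valid(walk->ops))
		return -EINVAL;
	(void)start;
	(void)end;
	return 0;	/* the real function walks the page tables here */
}

int walk_kernel_page_table_range(unsigned long start, unsigned long end,
				 const struct mm_walk_ops *ops, pgd_t *pgd,
				 void *private)
{
	struct mm_walk walk = {
		.ops = ops,
		.mm = &init_mm,	/* never NULL, so no walk.mm check */
		.pgd = pgd,
		.private = private,
		.no_vma = true
	};

	if (start >= end)
		return -EINVAL;

	return walk_pgd_range(start, end, &walk);
}

int walk_page_range_debug(struct mm_struct *mm, unsigned long start,
			  unsigned long end, const struct mm_walk_ops *ops,
			  pgd_t *pgd, void *private)
{
	struct mm_walk walk = {
		.ops = ops,
		.mm = mm,
		.pgd = pgd,
		.private = private,
		.no_vma = true
	};

	/* ptdump also walks kernel page tables through this entry point. */
	if (mm == &init_mm)
		return walk_kernel_page_table_range(start, end, ops, pgd,
						    private);
	if (start >= end || !mm)
		return -EINVAL;

	return walk_pgd_range(start, end, &walk);
}
```

With that, the "novma" helper is gone and each entry point only carries
the checks it actually needs.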
--
Cheers,
David / dhildenb