On 11/24/25 14:22, Kevin Brodsky wrote:
Despite recent efforts to prevent lazy_mmu sections from nesting, it
remains difficult to ensure that it never occurs - and in fact it
does occur on arm64 in certain situations (CONFIG_DEBUG_PAGEALLOC).
Commit 1ef3095b1405 ("arm64/mm: Permit lazy_mmu_mode to be nested")
made nesting tolerable on arm64, but without truly supporting it:
the inner call to leave() disables the batching optimisation before
the outer section ends.

This patch actually enables lazy_mmu sections to nest by tracking
the nesting level in task_struct, in a similar fashion to e.g.
pagefault_{enable,disable}(). This is fully handled by the generic
lazy_mmu helpers that were recently introduced.

lazy_mmu sections were not initially intended to nest, so we need to
clarify the semantics w.r.t. the arch_*_lazy_mmu_mode() callbacks.
This patch takes the following approach:

* The outermost calls to lazy_mmu_mode_{enable,disable}() trigger
   calls to arch_{enter,leave}_lazy_mmu_mode() - this is unchanged.

* Nested calls to lazy_mmu_mode_{enable,disable}() are not forwarded
   to the arch via arch_{enter,leave} - lazy MMU remains enabled so
   the assumption is that these callbacks are not relevant. However,
   existing code may rely on a call to disable() to flush any batched
   state, regardless of nesting. arch_flush_lazy_mmu_mode() is
   therefore called in that situation.

A separate interface was recently introduced to temporarily pause
the lazy MMU mode: lazy_mmu_mode_{pause,resume}(). pause() fully
exits the mode *regardless of the nesting level*, and resume()
restores the mode at the same nesting level.

pause()/resume() are themselves allowed to nest, so we actually
store two nesting levels in task_struct: enable_count and
pause_count. A new helper in_lazy_mmu_mode() is introduced to
determine whether we are currently in lazy MMU mode; this will be
used in subsequent patches to replace the various ways architectures
currently track whether the mode is enabled.

In summary (enable/pause represent the values *after* the call):

lazy_mmu_mode_enable()          -> arch_enter()    enable=1 pause=0
    lazy_mmu_mode_enable()      -> ø               enable=2 pause=0
        lazy_mmu_mode_pause()   -> arch_leave()    enable=2 pause=1
        lazy_mmu_mode_resume()  -> arch_enter()    enable=2 pause=0
    lazy_mmu_mode_disable()     -> arch_flush()    enable=1 pause=0
lazy_mmu_mode_disable()         -> arch_leave()    enable=0 pause=0
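
The sequence above can be replayed with a small userspace model. This is
only a sketch of the counting rules described in this message, not the
kernel implementation: struct lazy_mmu_state, the trace buffer, the
shortened arch_enter()/arch_leave()/arch_flush() stubs and
run_summary_sequence() are hypothetical names made up for illustration,
and calling enable()/disable() while paused is deliberately not modelled.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-in for the two counters kept in task_struct. */
struct lazy_mmu_state {
	unsigned int enable_count;
	unsigned int pause_count;
};

static struct lazy_mmu_state state;
static char trace[64];	/* arch callbacks seen: E=enter, L=leave, F=flush */

/* Stubs standing in for the arch_*_lazy_mmu_mode() callbacks. */
static void arch_enter(void) { strcat(trace, "E"); }
static void arch_leave(void) { strcat(trace, "L"); }
static void arch_flush(void) { strcat(trace, "F"); }

/* In lazy MMU mode iff enabled at least once and not currently paused. */
static int in_lazy_mmu_mode(void)
{
	return state.enable_count > 0 && state.pause_count == 0;
}

static void lazy_mmu_mode_enable(void)
{
	/* Only the outermost enable() reaches the arch. */
	if (state.enable_count++ == 0)
		arch_enter();
}

static void lazy_mmu_mode_disable(void)
{
	assert(state.enable_count > 0);
	if (--state.enable_count == 0)
		arch_leave();	/* outermost: really leave the mode */
	else
		arch_flush();	/* nested: stay in the mode, flush batched state */
}

static void lazy_mmu_mode_pause(void)
{
	/* Only the outermost pause() exits the mode, at any enable depth. */
	if (state.pause_count++ == 0 && state.enable_count > 0)
		arch_leave();
}

static void lazy_mmu_mode_resume(void)
{
	assert(state.pause_count > 0);
	/* Only the outermost resume() re-enters; enable_count is untouched. */
	if (--state.pause_count == 0 && state.enable_count > 0)
		arch_enter();
}

/* Replays the six calls from the summary and returns the callback trace. */
static const char *run_summary_sequence(void)
{
	memset(&state, 0, sizeof(state));
	trace[0] = '\0';

	lazy_mmu_mode_enable();
	lazy_mmu_mode_enable();
	lazy_mmu_mode_pause();
	lazy_mmu_mode_resume();
	lazy_mmu_mode_disable();
	lazy_mmu_mode_disable();
	return trace;
}
```

In this model run_summary_sequence() records enter, leave, enter, flush,
leave, in that order, and both counters are back to zero at the end, so
in_lazy_mmu_mode() returns false.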

Note: in_lazy_mmu_mode() is added to <linux/sched.h> to allow arch
headers included by <linux/pgtable.h> to use it.

Signed-off-by: Kevin Brodsky <[email protected]>

Nothing jumped out at me, so

Acked-by: David Hildenbrand (Red Hat) <[email protected]>

Hoping we can get some more eyes to have a look.

--
Cheers

David
