Hi Harry,

On Tue, Mar 31, 2026 at 04:03:13PM +0900, Harry Yoo (Oracle) wrote:
> On Mon, Mar 30, 2026 at 01:11:03PM +0300, Mike Rapoport wrote:
> > From: "Mike Rapoport (Microsoft)" <[email protected]>
> > 
> > mfill_atomic() passes a lot of parameters down to its callees.
> > 
> > Aggregate them all into mfill_state structure and pass this structure to
> > functions that implement various UFFDIO_ commands.
> > 
> > Tracking the state in a structure will allow moving the code that retries
> > copying of data for UFFDIO_COPY into mfill_atomic_pte_copy() and make the
> > loop in mfill_atomic() identical for all UFFDIO operations on PTE-mapped
> > memory.
> > 
> > The mfill_state definition is deliberately local to mm/userfaultfd.c,
> > hence shmem_mfill_atomic_pte() is not updated.
> > 
> > [[email protected]: properly initialize mfill_state.len to fix
> >                        folio_add_new_anon_rmap() WARN]
> > Link: https://lkml.kernel.org/r/abehBY7QakYF9bK4@hyeyoo
> > Signed-off-by: Mike Rapoport (Microsoft) <[email protected]>
> > Signed-off-by: Harry Yoo <[email protected]>
> > Acked-by: David Hildenbrand (Arm) <[email protected]>
> > ---
> >  mm/userfaultfd.c | 148 ++++++++++++++++++++++++++---------------------
> >  1 file changed, 82 insertions(+), 66 deletions(-)
> > 
> > @@ -790,12 +804,14 @@ static __always_inline ssize_t mfill_atomic(struct 
> > userfaultfd_ctx *ctx,
> >         uffd_flags_mode_is(flags, MFILL_ATOMIC_CONTINUE))
> >             goto out_unlock;
> >  
> > -   while (src_addr < src_start + len) {
> > -           pmd_t dst_pmdval;
> > +   state.vma = dst_vma;
> 
> Oh wait, the lock leak was introduced in patch 2.

Lock leak was introduced in patch 4 that moved getting the vma.
Patch 2 missed the assignment of state.len and introduced an issue with
bound checks.

> If there's an error between uffd_mfill_lock() and `state.vma = dst_vma`,
> it remains unlocked.
> 
> Probably should have been fixed in 2, not patch 4...
> Sorry didn't realize it earlier.
> 
> > -           VM_WARN_ON_ONCE(dst_addr >= dst_start + len);
> > +   while (state.src_addr < src_start + len) {
> > +           VM_WARN_ON_ONCE(state.dst_addr >= dst_start + len);
> > +
> > +           pmd_t dst_pmdval;
> >  
> > -           dst_pmd = mm_alloc_pmd(dst_mm, dst_addr);
> > +           dst_pmd = mm_alloc_pmd(dst_mm, state.dst_addr);
> >             if (unlikely(!dst_pmd)) {
> >                     err = -ENOMEM;
> >                     break;
> > @@ -866,10 +882,10 @@ static __always_inline ssize_t mfill_atomic(struct 
> > userfaultfd_ctx *ctx,
> >  
> >  out_unlock:
> >     up_read(&ctx->map_changing_lock);
> > -   uffd_mfill_unlock(dst_vma);
> > +   uffd_mfill_unlock(state.vma);
> >  out:
> > -   if (folio)
> > -           folio_put(folio);
> > +   if (state.folio)
> > +           folio_put(state.folio);
> 
> Sashiko raised a concern [2] that it the VMA might be unmapped and
> a new mapping created as a uffd hugetlb vma and leak the folio by
> going through
> 
> `if (is_vm_hugetlb_page(dst_vma))
>         return mfill_atomic_hugetlb(ctx, dst_vma, dst_start,
>                                     src_start, len, flags);`
> 
> but it appears to be a false positive (to me) because
> 
> `if (atomic_read(&ctx->mmap_changing))` check should have detected unmapping
> and free the folio?

I think it's real, and it's there more or less from the beginning, although
nobody hit it yet :)

Before retrying the copy we drop all the locks, so if the copy is really
long the old mapping can be wiped and a new mapping can be created instead.

There's already a v4 of a patch that attempts to solve this:

https://lore.kernel.org/all/[email protected]
 
> [2] 
> https://sashiko.dev/#/patchset/20260330101116.1117699-1-rppt%40kernel.org?patch=13671
> 
> >     VM_WARN_ON_ONCE(copied < 0);
> >     VM_WARN_ON_ONCE(err > 0);
> >     VM_WARN_ON_ONCE(!copied && !err);
> 
> Otherwise looks correct to me.
> 
> -- 
> Cheers,
> Harry / Hyeonggon

-- 
Sincerely yours,
Mike.

Reply via email to