On Fri, Apr 17, 2026 at 10:44:19PM -0400, Zi Yan wrote:
>This check ensures the correctness of collapsing read-only THPs for FSes
>after READ_ONLY_THP_FOR_FS is enabled by default for all FSes supporting
>PMD THP pagecache.
>
>READ_ONLY_THP_FOR_FS only supports read-only fd and uses mapping->nr_thps
>and inode->i_writecount to prevent any write to read-only to-be-collapsed
>folios. In upcoming commits, READ_ONLY_THP_FOR_FS will be removed and the
>aforementioned mechanism will go away too. To ensure khugepaged functions
>as expected after the changes, skip the collapse if any folio is dirty
>after try_to_unmap(): a dirty folio means this read-only folio received
>writes via mmap, and further writes can happen between try_to_unmap()
>and try_to_unmap_flush() via cached TLB entries, while khugepaged does
>not support collapsing writable pagecache folios yet.
>
>Signed-off-by: Zi Yan <[email protected]>
>---
> mm/khugepaged.c | 25 +++++++++++++++++++++----
> 1 file changed, 21 insertions(+), 4 deletions(-)
>
>diff --git a/mm/khugepaged.c b/mm/khugepaged.c
>index 3eb5d982d3d3..1c0fdc81d276 100644
>--- a/mm/khugepaged.c
>+++ b/mm/khugepaged.c
>@@ -1979,8 +1979,7 @@ static enum scan_result collapse_file(struct mm_struct *mm, unsigned long addr,
>                               }
>                       } else if (folio_test_dirty(folio)) {
>                               /*
>-                               * khugepaged only works on read-only fd,
>-                               * so this page is dirty because it hasn't
>+                               * This page is dirty because it hasn't
>                                * been flushed since first write. There
>                                * won't be new dirty pages.
>                                *
>@@ -2038,8 +2037,8 @@ static enum scan_result collapse_file(struct mm_struct *mm, unsigned long addr,
>               if (!is_shmem && (folio_test_dirty(folio) ||
>                                 folio_test_writeback(folio))) {
>                       /*
>-                       * khugepaged only works on read-only fd, so this
>-                       * folio is dirty because it hasn't been flushed
>+                       * khugepaged only works on clean file-backed folios,
>+                       * so this folio is dirty because it hasn't been flushed
>                        * since first write.
>                        */
>                       result = SCAN_PAGE_DIRTY_OR_WRITEBACK;
>@@ -2083,6 +2082,24 @@ static enum scan_result collapse_file(struct mm_struct *mm, unsigned long addr,
>                       goto out_unlock;
>               }
> 
>+              /*
>+               * At this point, the folio is locked and unmapped. Make
>+               * sure the folio is clean, so that no one else is able to
>+               * write to it, since that would require taking the folio
>+               * lock first. Otherwise, the folio was pointed to by a
>+               * dirty PTE, and some CPU might still have a valid TLB
>+               * entry with the dirty bit set pointing to this folio, so
>+               * writes can happen without a page table walk and folio
>+               * lock acquisition before the try_to_unmap_flush() below
>+               * is done. After the collapse, the file-backed folio is
>+               * not marked dirty and can be discarded before any new
>+               * write marks the folio dirty, causing data corruption.
>+               */
>+              if (!is_shmem && folio_test_dirty(folio)) {
>+                      result = SCAN_PAGE_DIRTY_OR_WRITEBACK;
>+                      goto out_unlock;

Looks buggy :)

This runs after folio_isolate_lru() and after xas_lock_irq(&xas) ...

If I'm not missing something, "goto out_unlock" would leave the xarray
lock held and the folio off the LRU :)

Note that the block right above does call xas_unlock_irq(&xas), and it
also does call folio_putback_lru(folio):

---8<---
                if (folio_ref_count(folio) != 2 + folio_nr_pages(folio)) {
                        result = SCAN_PAGE_COUNT;
                        xas_unlock_irq(&xas);     <-
                        folio_putback_lru(folio); <-
                        goto out_unlock;
                }
---

So we should follow the same cleanup as that block here, right?
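I.e., something like this (untested sketch, just mirroring the cleanup
order of the SCAN_PAGE_COUNT path above):

---8<---
		if (!is_shmem && folio_test_dirty(folio)) {
			result = SCAN_PAGE_DIRTY_OR_WRITEBACK;
			xas_unlock_irq(&xas);     /* drop the xarray lock */
			folio_putback_lru(folio); /* undo folio_isolate_lru() */
			goto out_unlock;
		}
---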
