On 23 Apr 2026, at 4:30, Lance Yang wrote:

> On Fri, Apr 17, 2026 at 10:44:19PM -0400, Zi Yan wrote:
>> This check ensures the correctness of collapse read-only THPs for FSes
>> after READ_ONLY_THP_FOR_FS is enabled by default for all FSes supporting
>> PMD THP pagecache.
>>
>> READ_ONLY_THP_FOR_FS only supports read-only fd and uses mapping->nr_thps
>> and inode->i_writecount to prevent any write to read-only to-be-collapsed
>> folios. In upcoming commits, READ_ONLY_THP_FOR_FS will be removed and the
>> aforementioned mechanism will go away too. To ensure khugepaged functions
>> as expected after the changes, skip the collapse if any folio is dirty
>> after try_to_unmap(): a dirty folio means this supposedly read-only folio
>> received writes via mmap, and further writes can still happen between
>> try_to_unmap() and try_to_unmap_flush() through cached TLB entries, while
>> khugepaged does not support collapsing writable pagecache folios yet.
>>
>> Signed-off-by: Zi Yan <[email protected]>
>> ---
>> mm/khugepaged.c | 25 +++++++++++++++++++++----
>> 1 file changed, 21 insertions(+), 4 deletions(-)
>>
>> diff --git a/mm/khugepaged.c b/mm/khugepaged.c
>> index 3eb5d982d3d3..1c0fdc81d276 100644
>> --- a/mm/khugepaged.c
>> +++ b/mm/khugepaged.c
>> @@ -1979,8 +1979,7 @@ static enum scan_result collapse_file(struct mm_struct *mm, unsigned long addr,
>>                              }
>>                      } else if (folio_test_dirty(folio)) {
>>                              /*
>> -                             * khugepaged only works on read-only fd,
>> -                             * so this page is dirty because it hasn't
>> +                             * This page is dirty because it hasn't
>>                               * been flushed since first write. There
>>                               * won't be new dirty pages.
>>                               *
>> @@ -2038,8 +2037,8 @@ static enum scan_result collapse_file(struct mm_struct *mm, unsigned long addr,
>>              if (!is_shmem && (folio_test_dirty(folio) ||
>>                                folio_test_writeback(folio))) {
>>                      /*
>> -                     * khugepaged only works on read-only fd, so this
>> -                     * folio is dirty because it hasn't been flushed
>> +                     * khugepaged only works on clean file-backed folios,
>> +                     * so this folio is dirty because it hasn't been flushed
>>                       * since first write.
>>                       */
>>                      result = SCAN_PAGE_DIRTY_OR_WRITEBACK;
>> @@ -2083,6 +2082,24 @@ static enum scan_result collapse_file(struct mm_struct *mm, unsigned long addr,
>>                      goto out_unlock;
>>              }
>>
>> +            /*
>> +             * At this point, the folio is locked and unmapped. Make sure
>> +             * the folio is clean, so that no one else can write to it,
>> +             * since doing so would require taking the folio lock first.
>> +             * Otherwise, the folio was mapped by a dirty PTE, and some CPU
>> +             * might still hold a valid TLB entry with the dirty bit set
>> +             * pointing to this folio, allowing writes without a page table
>> +             * walk or folio lock acquisition until try_to_unmap_flush()
>> +             * below is done. After the collapse, the file-backed folio is
>> +             * not marked dirty and could be discarded before any new write
>> +             * marks it dirty, causing data corruption.
>> +             */
>> +            if (!is_shmem && folio_test_dirty(folio)) {
>> +                    result = SCAN_PAGE_DIRTY_OR_WRITEBACK;
>> +                    goto out_unlock;
>
> Looks buggy :)
>
> This runs after folio_isolate_lru() and after xas_lock_irq(&xas) ...
>
> If not missing something, "goto out_unlock" would leave the xarray lock
> held and the folio off the LRU :)
>
> Note that the block right above does call xas_unlock_irq(&xas), and it
> also does call folio_putback_lru(folio):
>
> ---8<---
>               if (folio_ref_count(folio) != 2 + folio_nr_pages(folio)) {
>                       result = SCAN_PAGE_COUNT;
>                       xas_unlock_irq(&xas);     <-
>                       folio_putback_lru(folio); <-
>                       goto out_unlock;
>               }
> ---
>
> So we should follow the same cleanup as that block here, right?
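>
> Something along these lines, perhaps (untested sketch, mirroring the
> cleanup in the ref-count check above):
>
> ---8<---
>		if (!is_shmem && folio_test_dirty(folio)) {
>			result = SCAN_PAGE_DIRTY_OR_WRITEBACK;
>			xas_unlock_irq(&xas);
>			folio_putback_lru(folio);
>			goto out_unlock;
>		}
> ---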

You are right. Will fix it. Thank you.

Best Regards,
Yan, Zi