On 23 Apr 2026, at 4:30, Lance Yang wrote:

> On Fri, Apr 17, 2026 at 10:44:19PM -0400, Zi Yan wrote:
>> This check ensures the correctness of collapsing read-only THPs for FSes
>> after READ_ONLY_THP_FOR_FS is enabled by default for all FSes supporting
>> PMD THP pagecache.
>>
>> READ_ONLY_THP_FOR_FS only supports read-only fds and uses mapping->nr_thps
>> and inode->i_writecount to prevent any write to read-only to-be-collapsed
>> folios. In upcoming commits, READ_ONLY_THP_FOR_FS will be removed and the
>> aforementioned mechanism will go away too. To ensure khugepaged functions
>> as expected after the changes, skip the collapse if any folio is dirty
>> after try_to_unmap(): a dirty folio means writes to this read-only folio
>> via mmap could have happened between try_to_unmap() and
>> try_to_unmap_flush() through cached TLB entries, and khugepaged does not
>> support collapsing writable pagecache folios yet.
>>
>> Signed-off-by: Zi Yan <[email protected]>
>> ---
>>  mm/khugepaged.c | 25 +++++++++++++++++++++----
>>  1 file changed, 21 insertions(+), 4 deletions(-)
>>
>> diff --git a/mm/khugepaged.c b/mm/khugepaged.c
>> index 3eb5d982d3d3..1c0fdc81d276 100644
>> --- a/mm/khugepaged.c
>> +++ b/mm/khugepaged.c
>> @@ -1979,8 +1979,7 @@ static enum scan_result collapse_file(struct mm_struct *mm, unsigned long addr,
>>  			}
>>  		} else if (folio_test_dirty(folio)) {
>>  			/*
>> -			 * khugepaged only works on read-only fd,
>> -			 * so this page is dirty because it hasn't
>> +			 * This page is dirty because it hasn't
>>  			 * been flushed since first write. There
>>  			 * won't be new dirty pages.
>>  			 *
>> @@ -2038,8 +2037,8 @@ static enum scan_result collapse_file(struct mm_struct *mm, unsigned long addr,
>>  		if (!is_shmem && (folio_test_dirty(folio) ||
>>  				  folio_test_writeback(folio))) {
>>  			/*
>> -			 * khugepaged only works on read-only fd, so this
>> -			 * folio is dirty because it hasn't been flushed
>> +			 * khugepaged only works on clean file-backed folios,
>> +			 * so this folio is dirty because it hasn't been flushed
>>  			 * since first write.
>>  			 */
>>  			result = SCAN_PAGE_DIRTY_OR_WRITEBACK;
>> @@ -2083,6 +2082,24 @@ static enum scan_result collapse_file(struct mm_struct *mm, unsigned long addr,
>>  		goto out_unlock;
>>  	}
>>
>> +	/*
>> +	 * At this point, the folio is locked and unmapped. Make sure the
>> +	 * folio is clean, so that no one else is able to write to it,
>> +	 * since that would require taking the folio lock first.
>> +	 * Otherwise, the folio was pointed to by a dirty PTE, and some
>> +	 * CPU might still have a valid TLB entry with the dirty bit set
>> +	 * pointing to this folio, so writes can happen without
>> +	 * causing a page table walk and folio lock acquisition before
>> +	 * the try_to_unmap_flush() below is done. After the collapse, the
>> +	 * file-backed folio is not set as dirty and can be discarded
>> +	 * before any new write marks the folio dirty, causing data
>> +	 * corruption.
>> +	 */
>> +	if (!is_shmem && folio_test_dirty(folio)) {
>> +		result = SCAN_PAGE_DIRTY_OR_WRITEBACK;
>> +		goto out_unlock;
>
> Looks buggy :)
>
> This runs after folio_isolate_lru() and after xas_lock_irq(&xas) ...
>
> If I'm not missing something, "goto out_unlock" would leave the xarray
> lock held and the folio off the LRU :)
>
> Note that the block right above does call xas_unlock_irq(&xas), and it
> also does call folio_putback_lru(folio):
>
> ---8<---
> 	if (folio_ref_count(folio) != 2 + folio_nr_pages(folio)) {
> 		result = SCAN_PAGE_COUNT;
> 		xas_unlock_irq(&xas);		<-
> 		folio_putback_lru(folio);	<-
> 		goto out_unlock;
> 	}
> ---
>
> So we should follow the same cleanup as that block here, right?
You are right. Will fix it. Thank you.

Best Regards,
Yan, Zi
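For reference, a sketch of what the fixed check might look like, mirroring the cleanup in the ref-count block Lance quoted. This is only an illustration of the suggested fix, not compile-tested, and assumes the surrounding collapse_file() context from the patch:

```c
	/*
	 * Sketch only: mirror the cleanup done by the ref-count check
	 * above, since at this point the xarray lock is held and the
	 * folio has been isolated from the LRU.
	 */
	if (!is_shmem && folio_test_dirty(folio)) {
		result = SCAN_PAGE_DIRTY_OR_WRITEBACK;
		xas_unlock_irq(&xas);		/* drop the xarray lock */
		folio_putback_lru(folio);	/* put the folio back on the LRU */
		goto out_unlock;
	}
```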

