On 3/16/26 22:40, Konstantin Khorenko wrote:
> tcache_attach_page() inserts a page into the per-node radix tree under
> tree_lock, then releases the lock and calls tcache_lru_add(). Between
> releasing tree_lock and completing tcache_lru_add(), the page is visible
> in the radix tree but not yet on the tcache LRU.
>
> During this window a concurrent tcache_detach_page() on another CPU can:
> 1. Find the page via radix_tree_lookup (RCU)
> 2. page_cache_get_speculative(page): refcount 1 -> 2
> 3. page_ref_freeze(page, 2): refcount 2 -> 0
> 4. Remove the page from the radix tree
> 5. tcache_lru_del(): page not on LRU yet, skipped
> 6. tcache_put_page() -> free_hot_cold_page(): page freed to PCP list
>
> Now page->lru links into a PCP free list. When the original CPU then
> executes tcache_lru_add() -> list_add_tail(&page->lru, &pni->lru), it
> overwrites page->lru destroying the PCP list linkage. Subsequent PCP
> allocations follow the stale pointer and hit a poisoned or cross-linked
> lru, causing "list_del corruption" warnings and eventually a hard lockup
> when free_pcppages_bulk() holds zone->lock forever.
>
> Fix by taking an extra page reference before releasing tree_lock. This
> makes page_ref_freeze(page, 2) fail on any concurrent detach (refcount
> will be 3, not the expected 2), forcing the detach to retry after the
> page is fully set up (in tree AND on LRU). The extra reference is
> dropped after tcache_lru_add() completes.
>
> Note: moving tcache_lru_add() inside the tree_lock critical section would
> cause a lock ordering inversion (tree_lock -> pni->lock vs the shrinker's
> pni->lock -> tree_lock path), so the extra-reference approach is used.
>
> https: //virtuozzo.atlassian.net/browse/PSBM-161840
nit: ^ excess space

> Reviewed-by: Pavel Tikhomirov <[email protected]>
> Signed-off-by: Konstantin Khorenko <[email protected]>
> ---
>  mm/tcache.c | 19 ++++++++++++++++++-
>  1 file changed, 18 insertions(+), 1 deletion(-)
>
> diff --git a/mm/tcache.c b/mm/tcache.c
> index e8ba7ee26cbef..f95b5ed6cb0bc 100644
> --- a/mm/tcache.c
> +++ b/mm/tcache.c
> @@ -810,9 +810,26 @@ tcache_attach_page(struct tcache_node *node, pgoff_t index, struct page *page)
>  	 */
>  	spin_lock_irqsave(&node->tree_lock, flags);
>  	err = tcache_page_tree_insert(node, index, page);
> +	if (!err) {
> +		/*
> +		 * Take an extra reference while the page is visible in
> +		 * the tree but not yet on the LRU. Without this,
> +		 * a concurrent tcache_detach_page() on another CPU can
> +		 * find the page via radix_tree_lookup, succeed with
> +		 * page_ref_freeze(page, 2) and free the page to PCP.
> +		 * When we then call tcache_lru_add() below, we overwrite
> +		 * page->lru which now links into a PCP free list,
> +		 * corrupting that list. The extra reference makes the
> +		 * freeze fail (refcount will be 3, not 2), so the
> +		 * concurrent detach retries after we finish setup.
> +		 */
> +		get_page(page);
> +	}
>  	spin_unlock(&node->tree_lock);
> -	if (!err)
> +	if (!err) {
>  		tcache_lru_add(node->pool, page);
> +		put_page(page);
> +	}
>  	local_irq_restore(flags);	/* Implies rcu_read_lock_sched() */
>  	return err;
> }

--
Best regards,
Pavel Tikhomirov

Senior Software Developer, Virtuozzo.
_______________________________________________
Devel mailing list
[email protected]
https://lists.openvz.org/mailman/listinfo/devel
