On 7/19/19 12:29 PM, Ralph Campbell wrote: > When a ZONE_DEVICE private page is freed, the page->mapping field can be > set. If this page is reused as an anonymous page, the previous value can > prevent the page from being inserted into the CPU's anon rmap table. > For example, when migrating a pte_none() page to device memory: > migrate_vma(ops, vma, start, end, src, dst, private) > migrate_vma_collect() > src[] = MIGRATE_PFN_MIGRATE > migrate_vma_prepare() > /* no page to lock or isolate so OK */ > migrate_vma_unmap() > /* no page to unmap so OK */ > ops->alloc_and_copy() > /* driver allocates ZONE_DEVICE page for dst[] */ > migrate_vma_pages() > migrate_vma_insert_page() > page_add_new_anon_rmap() > __page_set_anon_rmap() > /* This check sees the page's stale mapping field */ > if (PageAnon(page)) > return > /* page->mapping is not updated */ > > The result is that the migration appears to succeed but a subsequent CPU > fault will be unable to migrate the page back to system memory or worse. > > Clear the page->mapping field when freeing the ZONE_DEVICE page so stale > pointer data doesn't affect future page use.
Reviewed-by: John Hubbard <[email protected]> thanks, -- John Hubbard NVIDIA > > Fixes: b7a523109fb5c9d2d6dd ("mm: don't clear ->mapping in hmm_devmem_free") > Cc: [email protected] > Signed-off-by: Ralph Campbell <[email protected]> > Cc: Christoph Hellwig <[email protected]> > Cc: Dan Williams <[email protected]> > Cc: Andrew Morton <[email protected]> > Cc: Jason Gunthorpe <[email protected]> > Cc: Logan Gunthorpe <[email protected]> > Cc: Ira Weiny <[email protected]> > Cc: Matthew Wilcox <[email protected]> > Cc: Mel Gorman <[email protected]> > Cc: Jan Kara <[email protected]> > Cc: "Kirill A. Shutemov" <[email protected]> > Cc: Michal Hocko <[email protected]> > Cc: Andrea Arcangeli <[email protected]> > Cc: Mike Kravetz <[email protected]> > Cc: "Jérôme Glisse" <[email protected]> > --- > kernel/memremap.c | 24 ++++++++++++++++++++++++ > 1 file changed, 24 insertions(+) > > diff --git a/kernel/memremap.c b/kernel/memremap.c > index bea6f887adad..98d04466dcde 100644 > --- a/kernel/memremap.c > +++ b/kernel/memremap.c > @@ -408,6 +408,30 @@ void __put_devmap_managed_page(struct page *page) > > mem_cgroup_uncharge(page); > > + /* > + * When a device_private page is freed, the page->mapping field > + * may still contain a (stale) mapping value. For example, the > + * lower bits of page->mapping may still identify the page as > + * an anonymous page. Ultimately, this entire field is just > + * stale and wrong, and it will cause errors if not cleared. > + * One example is: > + * > + * migrate_vma_pages() > + * migrate_vma_insert_page() > + * page_add_new_anon_rmap() > + * __page_set_anon_rmap() > + * ...checks page->mapping, via PageAnon(page) call, > + * and incorrectly concludes that the page is an > + * anonymous page. Therefore, it incorrectly, > + * silently fails to set up the new anon rmap. > + * > + * For other types of ZONE_DEVICE pages, migration is either > + * handled differently or not done at all, so there is no need > + * to clear page->mapping. > + */ > + if (is_device_private_page(page)) > + page->mapping = NULL; > + > page->pgmap->ops->page_free(page); > } else if (!count) > __put_page(page); >

