On 9/24/25 01:56, Karim Manaouil wrote:
> On Tue, Sep 23, 2025 at 01:44:20PM +1000, Balbir Singh wrote:
>> On 9/23/25 12:23, Zi Yan wrote:
>>> On 16 Sep 2025, at 8:21, Balbir Singh wrote:
>>>
>>>> Extend migrate_vma_collect_pmd() to handle partially mapped large folios
>>>> that require splitting before migration can proceed.
>>>>
>>>> During PTE walk in the collection phase, if a large folio is only
>>>> partially mapped in the migration range, it must be split to ensure the
>>>> folio is correctly migrated.
>>>>
>>>> Signed-off-by: Balbir Singh <[email protected]>
>>>> Cc: David Hildenbrand <[email protected]>
>>>> Cc: Zi Yan <[email protected]>
>>>> Cc: Joshua Hahn <[email protected]>
>>>> Cc: Rakie Kim <[email protected]>
>>>> Cc: Byungchul Park <[email protected]>
>>>> Cc: Gregory Price <[email protected]>
>>>> Cc: Ying Huang <[email protected]>
>>>> Cc: Alistair Popple <[email protected]>
>>>> Cc: Oscar Salvador <[email protected]>
>>>> Cc: Lorenzo Stoakes <[email protected]>
>>>> Cc: Baolin Wang <[email protected]>
>>>> Cc: "Liam R. Howlett" <[email protected]>
>>>> Cc: Nico Pache <[email protected]>
>>>> Cc: Ryan Roberts <[email protected]>
>>>> Cc: Dev Jain <[email protected]>
>>>> Cc: Barry Song <[email protected]>
>>>> Cc: Lyude Paul <[email protected]>
>>>> Cc: Danilo Krummrich <[email protected]>
>>>> Cc: David Airlie <[email protected]>
>>>> Cc: Simona Vetter <[email protected]>
>>>> Cc: Ralph Campbell <[email protected]>
>>>> Cc: Mika Penttilä <[email protected]>
>>>> Cc: Matthew Brost <[email protected]>
>>>> Cc: Francois Dugast <[email protected]>
>>>> ---
>>>>  mm/migrate_device.c | 82 +++++++++++++++++++++++++++++++++++++++++++++
>>>>  1 file changed, 82 insertions(+)
>>>>
>>>> diff --git a/mm/migrate_device.c b/mm/migrate_device.c
>>>> index abd9f6850db6..70c0601f70ea 100644
>>>> --- a/mm/migrate_device.c
>>>> +++ b/mm/migrate_device.c
>>>> @@ -54,6 +54,53 @@ static int migrate_vma_collect_hole(unsigned long start,
>>>>    return 0;
>>>>  }
>>>>
>>>> +/**
>>>> + * migrate_vma_split_folio() - Helper function to split a THP folio
>>>> + * @folio: the folio to split
>>>> + * @fault_page: struct page associated with the fault if any
>>>> + *
>>>> + * Returns 0 on success
>>>> + */
>>>> +static int migrate_vma_split_folio(struct folio *folio,
>>>> +                             struct page *fault_page)
>>>> +{
>>>> +  int ret;
>>>> +  struct folio *fault_folio = fault_page ? page_folio(fault_page) : NULL;
>>>> +  struct folio *new_fault_folio = NULL;
>>>> +
>>>> +  if (folio != fault_folio) {
>>>> +          folio_get(folio);
>>>> +          folio_lock(folio);
>>>> +  }
>>>> +
>>>> +  ret = split_folio(folio);
>>>> +  if (ret) {
>>>> +          if (folio != fault_folio) {
>>>> +                  folio_unlock(folio);
>>>> +                  folio_put(folio);
>>>> +          }
>>>> +          return ret;
>>>> +  }
>>>> +
>>>> +  new_fault_folio = fault_page ? page_folio(fault_page) : NULL;
>>>> +
>>>> +  /*
>>>> +   * Ensure the lock is held on the correct
>>>> +   * folio after the split
>>>> +   */
>>>> +  if (!new_fault_folio) {
>>>> +          folio_unlock(folio);
>>>> +          folio_put(folio);
>>>> +  } else if (folio != new_fault_folio) {
>>>> +          folio_get(new_fault_folio);
>>>> +          folio_lock(new_fault_folio);
>>>> +          folio_unlock(folio);
>>>> +          folio_put(folio);
>>>> +  }
>>>> +
>>>> +  return 0;
>>>> +}
>>>> +
>>>>  static int migrate_vma_collect_pmd(pmd_t *pmdp,
>>>>                               unsigned long start,
>>>>                               unsigned long end,
>>>> @@ -136,6 +183,8 @@ static int migrate_vma_collect_pmd(pmd_t *pmdp,
>>>>                     * page table entry. Other special swap entries are not
>>>>                     * migratable, and we ignore regular swapped page.
>>>>                     */
>>>> +                  struct folio *folio;
>>>> +
>>>>                    entry = pte_to_swp_entry(pte);
>>>>                    if (!is_device_private_entry(entry))
>>>>                            goto next;
>>>> @@ -147,6 +196,23 @@ static int migrate_vma_collect_pmd(pmd_t *pmdp,
>>>>                        pgmap->owner != migrate->pgmap_owner)
>>>>                            goto next;
>>>>
>>>> +                  folio = page_folio(page);
>>>> +                  if (folio_test_large(folio)) {
>>>> +                          int ret;
>>>> +
>>>> +                          pte_unmap_unlock(ptep, ptl);
>>>> +                          ret = migrate_vma_split_folio(folio,
>>>> +                                                    migrate->fault_page);
>>>> +
>>>> +                          if (ret) {
>>>> +                                  ptep = pte_offset_map_lock(mm, pmdp, 
>>>> addr, &ptl);
>>>> +                                  goto next;
>>>> +                          }
>>>> +
>>>> +                          addr = start;
>>>> +                          goto again;
>>>> +                  }
>>>
>>> This does not look right to me.
>>>
>>> The folio here is device private, but migrate_vma_split_folio()
>>> calls split_folio(), which cannot handle device private folios yet.
>>> Your change to split_folio() is in Patch 10 and should be moved
>>> before this patch.
>>>
>>
>> Patch 10 is to split the folio in the middle of migration (when we have
>> converted the entries to migration entries). This patch relies on the
>> changes in patch 4. I agree the names are confusing, I'll reword the
>> functions
> 
> Hi Balbir,
> 
> I am still reviewing the patches, but I think I agree with Zi here.
> 
> split_folio() will replace the PMD mappings of the huge folio with PTE
> mappings, but will also split the folio into smaller folios. The former
> is ok with this patch, but the latter is probably not correct if the folio
> is a zone device folio. The driver needs to know about the change, as
> usually the driver will have some sort of mapping between GPU physical
> memory chunks and their corresponding zone device pages.
> 

Yes, at this point there is no support for split folio callback. I can move this
bit to a later patch or move the entire patch to after patch 10. I suspect
this is a theoretical bisection concern for a future driver using large folios?

Thanks,
Balbir

Reply via email to