On 2025-11-12 at 03:43 +1100, Thomas Hellström <[email protected]> wrote...
> This series aims at providing an initial implementation of multi-device
> SVM, where communication with peers (migration and direct execution out
> of peer memory) uses some form of fast interconnect. In this series
> we're using pcie p2p.
>
> In a multi-device environment, the struct pages for device-private memory
> (the dev_pagemap) may take up a significant amount of system memory. We
> therefore want to provide a means of revoking / removing the dev_pagemaps
> not in use. In particular when a device is offlined, we want to block
> migrating *to* the device memory and migrate data already existing in the
> device's memory to system. The dev_pagemap then becomes unused and can be
> removed.
>
> Removing and setting up a large dev_pagemap is also quite time-consuming,
> so removal of unused dev_pagemaps only happens on system memory pressure
> using a shrinker.

Agreed, it is quite time-consuming; we have run into this problem as well,
including with the pcie p2p dma pages. On the mm side I've started looking at
if/how we can remove the need for struct pages at all for supporting this.
That doesn't help you now of course, but hopefully one day we can avoid the
need for this. I will be discussing this at LPC if you happen to be there.

 - Alistair

> Patch 1 is a small debug printout fix.
> Patches 2-7 deal with dynamic drm_pagemaps as described above.
> Patches 8-12 add infrastructure to handle remote drm_pagemaps with
> fast interconnects.
> Patch 13 extends the xe madvise() UAPI to handle remote drm_pagemaps.
> Patch 14 adds a pcie-p2p dma SVM interconnect to the xe driver.
> Patch 15 adds some SVM-related debug printouts for xe.
> Patch 16 adds direct interconnect migration.
> Patch 17 adds some documentation.
>
> What's still missing is implementation of migration policies.
> That will be implemented in follow-up series.
>
> v2:
> - Address review comments from Matt Brost.
> - Fix compilation issues reported by automated testing
> - Add patch 1, 17.
> - What's now patch 16 was extended to support p2p migration.
>
> Thomas Hellström (17):
>   drm/xe/svm: Fix a debug printout
>   drm/pagemap, drm/xe: Add refcounting to struct drm_pagemap
>   drm/pagemap: Add a refcounted drm_pagemap backpointer to struct
>     drm_pagemap_zdd
>   drm/pagemap, drm/xe: Manage drm_pagemap provider lifetimes
>   drm/pagemap: Add a drm_pagemap cache and shrinker
>   drm/xe: Use the drm_pagemap cache and shrinker
>   drm/pagemap: Remove the drm_pagemap_create() interface
>   drm/pagemap_util: Add a utility to assign an owner to a set of
>     interconnected gpus
>   drm/xe: Use the drm_pagemap_util helper to get a svm pagemap owner
>   drm/xe: Pass a drm_pagemap pointer around with the memory advise
>     attributes
>   drm/xe: Use the vma attibute drm_pagemap to select where to migrate
>   drm/xe: Simplify madvise_preferred_mem_loc()
>   drm/xe/uapi: Extend the madvise functionality to support foreign
>     pagemap placement for svm
>   drm/xe: Support pcie p2p dma as a fast interconnect
>   drm/xe/vm: Add a couple of VM debug printouts
>   drm/pagemap, drm/xe: Support migration over interconnect
>   drm/xe/svm: Document how xe keeps drm_pagemap references
>
>  drivers/gpu/drm/Makefile             |   3 +-
>  drivers/gpu/drm/drm_gpusvm.c         |   4 +-
>  drivers/gpu/drm/drm_pagemap.c        | 354 ++++++++++++---
>  drivers/gpu/drm/drm_pagemap_util.c   | 568 ++++++++++++++++++++++++
>  drivers/gpu/drm/xe/xe_device.c       |  20 +
>  drivers/gpu/drm/xe/xe_device.h       |   2 +
>  drivers/gpu/drm/xe/xe_device_types.h |   5 +
>  drivers/gpu/drm/xe/xe_svm.c          | 631 ++++++++++++++++++++++-----
>  drivers/gpu/drm/xe/xe_svm.h          |  82 +++-
>  drivers/gpu/drm/xe/xe_tile.c         |  34 +-
>  drivers/gpu/drm/xe/xe_tile.h         |  21 +
>  drivers/gpu/drm/xe/xe_userptr.c      |   2 +-
>  drivers/gpu/drm/xe/xe_vm.c           |  65 ++-
>  drivers/gpu/drm/xe/xe_vm.h           |   1 +
>  drivers/gpu/drm/xe/xe_vm_madvise.c   | 106 ++++-
>  drivers/gpu/drm/xe/xe_vm_types.h     |  21 +-
>  drivers/gpu/drm/xe/xe_vram_types.h   |  15 +-
>  include/drm/drm_pagemap.h            |  91 +++-
>  include/drm/drm_pagemap_util.h       |  92 ++++
>  include/uapi/drm/xe_drm.h            |  18 +-
>  20 files changed, 1898 insertions(+), 237 deletions(-)
>  create mode 100644 drivers/gpu/drm/drm_pagemap_util.c
>  create mode 100644 include/drm/drm_pagemap_util.h
>
> --
> 2.51.1
>
