Hi,

On 11/12/25 08:29, Honglei Huang wrote:
> Hi all,
>
> This RFC patch series introduces a new mechanism for batch registration of
> multiple non-contiguous SVM (Shared Virtual Memory) ranges in a single ioctl
> call. The primary goal of this series is to start a discussion about the best
> approach to handle scattered user memory allocations in GPU workloads.
>
> Background and Motivation
> ==========================
>
> Current applications using ROCm/HSA often need to register many scattered
> memory buffers (e.g., multiple malloc() allocations) for GPU access. With the
> existing AMDKFD_IOC_SVM ioctl, each range must be registered individually,
> leading to:
> - Blocking issue in some special use cases with many memory ranges
> - High system call overhead when dealing with dozens or hundreds of ranges
> - Inefficient resource management
> - Complexity in userspace applications
>
> Use Case Example
> ================
>
> Consider a typical ML/HPC workload that allocates 100+ small buffers across
> different parts of the address space. Currently, this requires 100+ separate
> ioctl calls. The proposed batch interface reduces this to a single call.

Yeah, that's an intentional limitation.

In an IOCTL interface you usually need to guarantee that the operation either completes or fails in a transactional manner.

It is possible to implement this, but it is usually rather tricky if you do multiple operations in a single IOCTL. So you really need a good use case to justify the added complexity.

> Paravirtualized environments exacerbate this issue, as KVM's memory backing
> is often non-contiguous at the host level. In virtualized environments, guest
> physical memory appears contiguous to the VM but is actually scattered across
> host memory pages. This fragmentation means that what appears as a single
> large allocation in the guest may require multiple discrete SVM registrations
> to properly handle the underlying host memory layout, further multiplying the
> number of required ioctl calls.

SVM with dynamic migration under KVM is most likely a dead end to begin with.

The only possibility to implement it is with memory pinning, which is basically userptr.

Or a rather slow client-side IOMMU emulation to catch concurrent DMA transfers to get the necessary information onto the host side. Intel calls this approach coIOMMU: https://www.usenix.org/system/files/atc20-paper236-slides-tian.pdf

> Current Implementation - A Workaround Approach
> ===============================================
>
> This patch series implements a WORKAROUND solution that pins user pages in
> memory to enable batch registration. While functional, this approach has
> several significant limitations:
>
> **Major Concern: Memory Pinning**
> - The implementation uses pin_user_pages_fast() to lock pages in RAM
> - This defeats the purpose of SVM's on-demand paging mechanism
> - Prevents memory oversubscription and dynamic migration
> - May cause memory pressure on systems with limited RAM
> - Goes against the fundamental design philosophy of HMM-based SVM

That again is perfectly intentional. Any other mode doesn't really make sense with KVM.

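Coming back to the transactional requirement for a multi-operation IOCTL mentioned further up: a minimal userspace sketch of the all-or-nothing pattern could look roughly like the following. Every name here (demo_range, demo_register_one, demo_register_batch, DEMO_PAGE_SIZE) is hypothetical and made up for illustration; none of it is taken from the patch series or from kfd_ioctl.h, and the per-range work is reduced to a simple alignment check.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical page size and range descriptor, for illustration only. */
#define DEMO_PAGE_SIZE 4096ULL

struct demo_range {
	uint64_t addr;   /* start address, expected to be page aligned */
	uint64_t size;   /* length in bytes, non-zero multiple of the page size */
	int registered;  /* set to 1 once the range has been registered */
};

/* Stand-in for the per-range work (validation, pinning, mapping). */
static int demo_register_one(struct demo_range *r)
{
	if (r->size == 0 || (r->addr % DEMO_PAGE_SIZE) ||
	    (r->size % DEMO_PAGE_SIZE))
		return -EINVAL;

	r->registered = 1;
	return 0;
}

/* Stand-in for undoing the per-range work. */
static void demo_unregister_one(struct demo_range *r)
{
	r->registered = 0;
}

/*
 * Register all ranges or none of them: on the first failure, undo the
 * ranges that were already registered (in reverse order) and return
 * the error of the failing range.
 */
int demo_register_batch(struct demo_range *ranges, size_t count)
{
	size_t i;
	int ret;

	assert(ranges != NULL || count == 0);

	for (i = 0; i < count; i++) {
		ret = demo_register_one(&ranges[i]);
		if (ret)
			goto unwind;
	}
	return 0;

unwind:
	/* i is the index of the failed range; roll back i-1 down to 0. */
	while (i-- > 0)
		demo_unregister_one(&ranges[i]);
	return ret;
}
```

In this toy version the undo step is trivial. In a real driver the undo path has to cope with partially pinned pages, concurrent VMA changes and so on, which is the part that tends to get tricky.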
> **Known Limitations:**
> 1. Increased memory footprint due to pinned pages
> 2. Potential for memory fragmentation
> 3. No support for transparent huge pages in pinned regions
> 4. Limited interaction with memory cgroups and resource controls
> 5. Complexity in handling VMA operations and lifecycle management
> 6. May interfere with NUMA optimization and page migration
>
> Why Submit This RFC?
> ====================
>
> Despite the limitations above, I am submitting this series to:
>
> 1. **Start the Discussion**: I want community feedback on whether batch
> registration is a useful feature worth pursuing.
>
> 2. **Explore Better Alternatives**: Is there a way to achieve batch
> registration without pinning? Could I extend HMM to better support
> this use case?

There is an ongoing unification project between KFD and KGD; we are currently looking into the SVM part on a weekly basis.

That said, we probably need a really good justification to add new features to the KFD interfaces, because this is going to delay the unification.

Regards,
Christian.

>
> 3. **Understand Trade-offs**: For some workloads, the performance benefit
> of batch registration might outweigh the drawbacks of pinning. I'd
> like to understand where the balance lies.
>
> Questions for the Community
> ============================
>
> 1. Are there existing mechanisms in HMM or mm that could support batch
> operations without pinning?
>
> 2. Would a different approach (e.g., async registration, delayed validation)
> be more acceptable?
>
> Alternative Approaches Considered
> ==================================
>
> I've considered several alternatives:
>
> A) **Pure HMM approach**: Register ranges without pinning, rely entirely on
>
> B) **Userspace batching library**: Hide multiple ioctls behind a library.
>
> Patch Series Overview
> =====================
>
> Patch 1: Add KFD_IOCTL_SVM_ATTR_MAPPED attribute type
> Patch 2: Define data structures for batch SVM range registration
> Patch 3: Add new AMDKFD_IOC_SVM_RANGES ioctl command
> Patch 4: Implement page pinning mechanism for scattered ranges
> Patch 5: Wire up the ioctl handler and attribute processing
>
> Testing
> =======
>
> The series has been tested with:
> - Multiple scattered malloc() allocations (2-2000+ ranges)
> - Various allocation sizes (4KB to 1G+)
> - GPU compute workloads using the registered ranges
> - Memory pressure scenarios
> - OpenCL CTS in KVM guest environment
> - HIP catch tests in KVM guest environment
> - Some AI applications like Stable Diffusion, ComfyUI, 3B LLM models based
>   on HuggingFace transformers
>
> I understand this approach is not ideal and am committed to working on a
> better solution based on community feedback. This RFC is the starting point
> for that discussion.
>
> Thank you for your time and consideration.
>
> Best regards,
> Honglei Huang
>
> ---
>
> Honglei Huang (5):
>   drm/amdkfd: Add KFD_IOCTL_SVM_ATTR_MAPPED attribute
>   drm/amdkfd: Add SVM ranges data structures
>   drm/amdkfd: Add AMDKFD_IOC_SVM_RANGES ioctl command
>   drm/amdkfd: Add support for pinned user pages in SVM ranges
>   drm/amdkfd: Wire up SVM ranges ioctl handler
>
>  drivers/gpu/drm/amd/amdkfd/kfd_chardev.c |  67 +++++++++++
>  drivers/gpu/drm/amd/amdkfd/kfd_svm.c     | 232 +++++++++++++++++++++++++++++--
>  drivers/gpu/drm/amd/amdkfd/kfd_svm.h     |   3 +
>  include/uapi/linux/kfd_ioctl.h           |  52 +++++++-
>  4 files changed, 348 insertions(+), 6 deletions(-)
