NAK to this or any version of this. This series is insane and the idea is insane.
On Thu, Jun 25, 2026 at 01:47:25PM +0200, David Hildenbrand (Arm) wrote:
> On 6/25/26 12:59, Yitao Jiang wrote:
> > Hi,
> >
> > This series fixes a THP policy problem I found while debugging
> > frequent ROCm GPU failures on an AMD Radeon 780M system during ML
> > training.
> >
> > Some AMDGPU/KFD user mappings are registered through interval
> > notifiers and cannot safely tolerate the backing VMA changing from base
> > pages to a transparent huge page after registration. Userspace can
> > still apply MADV_HUGEPAGE or MADV_COLLAPSE, and khugepaged can also
> > collapse the range, after the GPU mapping has been registered.
>
> Huh, why? As a memory notifier user, you must be prepared from memory to get
> unmapped+remapped at random points in time.
>
> What is the precise problem here? How are you handling THPs at registration
> time?
>
> Letting arbitrary drivers make THP policies sounds like the very wrong
> approach.

We absolutely will not _ever_ allow drivers to do this while I still breathe :)

>
> --
> Cheers,
>
> David

Thanks,
Lorenzo
