NAK to this or any version of this.

This series is insane and the idea is insane.

On Thu, Jun 25, 2026 at 01:47:25PM +0200, David Hildenbrand (Arm) wrote:
> On 6/25/26 12:59, Yitao Jiang wrote:
> > Hi,
> >
> > This series fixes a THP policy problem I found while debugging
> > frequent ROCm GPU failures on an AMD Radeon 780M system during ML
> > training.
> >
> > Some AMDGPU/KFD user mappings are registered through interval
> > notifiers and cannot safely tolerate the backing VMA changing from base
> > pages to a transparent huge page after registration. Userspace can
> > still apply MADV_HUGEPAGE or MADV_COLLAPSE, and khugepaged can also
> > collapse the range, after the GPU mapping has been registered.
>
> Huh, why? As a memory notifier user, you must be prepared for memory to get
> unmapped+remapped at random points in time.
>
> What is the precise problem here? How are you handling THPs at registration 
> time?
>
> Letting arbitrary drivers make THP policies sounds like the very wrong 
> approach.

We absolutely will not _ever_ allow drivers to do this while I still breathe :)

>
> --
> Cheers,
>
> David

Thanks, Lorenzo
