Hi Alejandro,
Yes, I'm well aware of this issue, but unfortunately I can't offer an
easy solution either.
I have been working on getting this fixed for over a year now, but
it turned out that the problem is much bigger than initially thought.
Setting the appropriate GFP flags for the job allocation is actually the
trivial part.
The really, really hard part is that we somehow need to add a lock to
prevent the page tables from being evicted. And as you also figured out,
that lock can't easily be taken anywhere else.
I've already written a prototype for this, but haven't had time to
hammer it into shape for upstreaming yet.
Regards,
Christian.
On 27.11.19 at 15:55, Sierra Guiza, Alejandro (Alex) wrote:
Hi Christian,
As you know, we’re working on the HMM enablement. I'm working on
dGPU page table entry invalidation for the userptr mapping case.
Currently, the MMU notifier handler stops all user mode queues and
schedules a delayed worker to re-validate the userptr mappings and
restart the queues.
As part of the HMM functionality, we need to invalidate the page table
entries instead of stopping the queues. At the same time, we need to
move the revalidation of the userptr mappings into the page fault handler.
We’re seeing a deadlock warning when we try to invalidate the PTEs
inside the MMU notifier handler. More specifically, it appears when we
try to update the BOs to invalidate PTEs using amdgpu_vm_bo_update. This
uses kmalloc in amdgpu_job_alloc, which seems to be causing the problem.
Based on Felix Kuehling's comments,
kmalloc without any special flags can cause memory reclaim. Doing that
inside an MMU notifier is problematic, because an MMU notifier may be
called inside a memory-reclaim operation itself. That would result in
recursion. Also, reclaim shouldn't be done while holding a lock that
can be taken in an MMU notifier for the same reason. If you cause a
reclaim while holding that lock, then an MMU notifier called by the
reclaim can deadlock trying to take the same lock.
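
To make the lock recursion concrete, here is a minimal userspace sketch
of the scenario described above. It is only a model under stated
assumptions, not amdgpu or kernel code: every name in it (vm_trylock,
model_alloc, update_page_tables, the ALLOC_* flags) is hypothetical, the
lock is a plain boolean, and an allocation that is allowed to reclaim is
assumed to always trigger reclaim, which is the worst case.

```c
#include <stdbool.h>

/* Hypothetical flags, loosely modeled on the idea behind GFP flags. */
enum alloc_flags { ALLOC_MAY_RECLAIM, ALLOC_NO_RECLAIM };

static bool vm_lock_held;       /* models a non-recursive lock */
static int deadlocks_detected;  /* would-be deadlocks seen in one run */

static bool vm_trylock(void)
{
    if (vm_lock_held)
        return false;           /* a real lock would block forever here */
    vm_lock_held = true;
    return true;
}

static void vm_unlock(void)
{
    vm_lock_held = false;
}

/* Models an MMU notifier that needs the lock to invalidate PTEs. */
static void mmu_notifier_invalidate(void)
{
    if (!vm_trylock()) {
        deadlocks_detected++;   /* lock already held by our own caller */
        return;
    }
    /* PTE invalidation would happen here. */
    vm_unlock();
}

/* Models memory reclaim, which may call into MMU notifiers. */
static void reclaim(void)
{
    mmu_notifier_invalidate();
}

/* Models an allocation; worst case: reclaim runs whenever it is allowed. */
static void model_alloc(enum alloc_flags flags)
{
    if (flags == ALLOC_MAY_RECLAIM)
        reclaim();
}

/*
 * Models a page table update that allocates while holding the lock.
 * Returns the number of would-be deadlocks, or -1 if the lock was busy.
 */
int update_page_tables(enum alloc_flags flags)
{
    deadlocks_detected = 0;
    if (!vm_trylock())
        return -1;
    model_alloc(flags);
    vm_unlock();
    return deadlocks_detected;
}
```

In this model, update_page_tables(ALLOC_MAY_RECLAIM) reports one
would-be deadlock, while update_page_tables(ALLOC_NO_RECLAIM) reports
none, because the allocation never enters reclaim while the lock is held.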
Please let us know if you have any advice on how to enable this the
right way.
Thanks in advance,
Alejandro
_______________________________________________
amd-gfx mailing list
[email protected]
https://lists.freedesktop.org/mailman/listinfo/amd-gfx