On 5/8/25 16:46, James Zhu wrote:
> When XNACK is on, a hang or low performance is observed with some test
> cases. The restore pages process gets unexpectedly stuck during
> eviction/restoration if a BO has the KFD_IOCTL_SVM_FLAG_GPU_ALWAYS_MAPPED
> flag set:
> 1. When XNACK is on, a retry page fault invokes the restore pages process.
> 2. A. If there is enough VRAM space, pages are simply migrated from RAM
>       to VRAM.
>    B. If there is not enough VRAM space left, the resource LRU list is
>       searched and a new eviction work queue is scheduled to first evict
>       the LRU BO from VRAM to RAM, then the restore pages process resumes;
>       otherwise it waits for the eviction timeout and tries to schedule
>       eviction of the next LRU BO.
> 3. For case 2B, if the BO has KFD_IOCTL_SVM_FLAG_GPU_ALWAYS_MAPPED set,
>    a queue eviction is triggered, so the restore work queue is scheduled.
> 4. In step 1, the restore pages process holds one mm->mmap_lock read
>    until restoring pages is completed.
>    In step 2B, the eviction work queue process holds one mm->mmap_lock
>    read until evicting the BO is completed.
>    In step 3, the restore work queue process tries to acquire one
>    mm->mmap_lock write after the above two mm->mmap_lock reads are
>    released, and in the meantime it blocks all following mm->mmap_lock
>    read requests.
> 5. In step 2, if the first evicted BO's size is big enough for the step 1
>    restore pages request, everything is fine. If not, the step 1
>    mm->mmap_lock read won't be released right away. In step 3, the first
>    evicted BO's restore work queue competes for the mm->mmap_lock write,
>    and the second and following LRU BOs' eviction work queues are blocked
>    trying to acquire the mm->mmap_lock read until timeout. All restore
>    pages processes get stuck here.
> Using down_write_trylock to replace mmap_write_lock helps to not block
> the second and following eviction work queue processes.
>
> Signed-off-by: James Zhu <[email protected]>
> ---
>  drivers/gpu/drm/amd/amdkfd/kfd_svm.c | 6 +++++-
>  1 file changed, 5 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
> index 72be6e152e88..5f6ed70559b7 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
> @@ -1794,7 +1794,11 @@ svm_range_list_lock_and_flush_work(struct svm_range_list *svms,
>  {
>  retry_flush_work:
>  	flush_work(&svms->deferred_list_work);
> -	mmap_write_lock(mm);
> +	while (true) {
> +		if (down_write_trylock(&(mm->mmap_lock)))
> +			break;
> +		schedule();
> +	}
Oh, stuff like that is usually an absolutely clear NAK from upstream. As
far as I know that is not something we can do so easily.

Would it be possible to wait for progress instead of calling schedule()
here?

Regards,
Christian.

> 
>  	if (list_empty(&svms->deferred_range_list))
>  		return;
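
For illustration only, "waiting for progress" could look roughly like the
sketch below: keep retrying mmap_write_trylock(), but sleep on an event
that the eviction work signals when it completes, instead of spinning with
schedule(). The eviction_done_wq wait queue, eviction_seq counter and the
helper names are purely hypothetical, not existing kfd_svm fields; in the
real driver the wake-up would more likely come from an eviction fence or
a similar existing object.

#include <linux/mmap_lock.h>
#include <linux/wait.h>
#include <linux/atomic.h>
#include <linux/jiffies.h>

/* Illustrative only: these fields do not exist in struct svm_range_list;
 * they stand in for whatever object signals eviction progress. The wait
 * queue must be set up once with init_waitqueue_head(). */
struct svm_progress {
	wait_queue_head_t eviction_done_wq;
	atomic_t eviction_seq;
};

/* Called from the (hypothetical) eviction work once a BO has been evicted. */
static void svm_progress_signal(struct svm_progress *p)
{
	atomic_inc(&p->eviction_seq);
	wake_up_all(&p->eviction_done_wq);
}

/* Take mm->mmap_lock for write without queueing as a writer: on contention,
 * sleep until an eviction completes (or a short timeout expires) and retry,
 * rather than busy-looping with schedule(). */
static void svm_lock_mmap_write(struct svm_progress *p, struct mm_struct *mm)
{
	while (!mmap_write_trylock(mm)) {
		unsigned int seq = atomic_read(&p->eviction_seq);

		wait_event_timeout(p->eviction_done_wq,
				   atomic_read(&p->eviction_seq) != seq,
				   msecs_to_jiffies(10));
	}
}

The point is just the shape: the faulting path never queues itself as an
mmap_lock writer, so it cannot stall the read locks taken by the other
eviction work items, yet it sleeps on real progress instead of burning CPU
in a schedule() loop. Whether the signal is a wait queue, a fence, or
something else is left open here.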
