On 2025-05-15 17:31, Chen, Xiaogang wrote:
On 5/15/2025 3:45 PM, Philip Yang wrote:
On 2025-05-15 10:29, Chen, Xiaogang wrote:
Does this patch fix a bug or just make code look more reasonable?
kfd_process_destroy_pdds releases pdd related buffers, not related
to operations on vm. So vm tear down dose not affect this function.
This change doesn't fix anything currently, as fput(pdd->drm_file) to
free vm is right between free vm mapping qpd->cwsr_mem, qpd->ib_mem
and free kernel bo qpd->proc_doorbells, pdd->proc_ctx_bo, to make it
clear for future change.
Then the current place to do fput(pdd->drm_file) make more sense:
unmap vm mapping of qpd->cwsr_mem, qpd->ib_mem is the last place where
kfd process release procedure needs vm alive. After that the kfd
process release does not need vm alive. It then releases remaining
buffers. So release drm_file as soon as we do not need hold it.
The issue was vm_fini shows error message "still active bo inside vm"
(1/1000) chance, took a while to trace down the leaking vm mapping, the
issue is seq64 memory mapping leaking and fixed by the first patch. KFD
pdd cleanup path, free vm is in the middle of free pdd memory, this is
one of the suspicious vm_fini race. We may add new pdd memory mapping to
vm in future, to prevent the potential vm_fini race, this patch move
free vm to after all pdd memory is freed and add comment.
Regards,
Philip
Regards
Xiaogang
Regards,
Philip
Regards
Xiaogang
On 5/14/2025 12:10 PM, Philip Yang wrote:
Release pdd->drm_file may free the vm if this is the last reference,
move it to the last step after memory is unmapped.
Signed-off-by: Philip Yang<[email protected]>
---
drivers/gpu/drm/amd/amdkfd/kfd_process.c | 10 +++++++---
1 file changed, 7 insertions(+), 3 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_process.c
b/drivers/gpu/drm/amd/amdkfd/kfd_process.c
index e868cc8da46f..b009c852180d 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_process.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_process.c
@@ -1063,9 +1063,6 @@ static void kfd_process_destroy_pdds(struct
kfd_process *p)
kfd_process_device_destroy_cwsr_dgpu(pdd);
kfd_process_device_destroy_ib_mem(pdd);
- if (pdd->drm_file)
- fput(pdd->drm_file);
-
if (pdd->qpd.cwsr_kaddr && !pdd->qpd.cwsr_base)
free_pages((unsigned long)pdd->qpd.cwsr_kaddr,
get_order(KFD_CWSR_TBA_TMA_SIZE));
@@ -1088,6 +1085,13 @@ static void kfd_process_destroy_pdds(struct
kfd_process *p)
pdd->runtime_inuse = false;
}
+ /*
+ * This may release the vm if application already close
the drm node,
+ * do it as last step after memory unmapped.
+ */
+ if (pdd->drm_file)
+ fput(pdd->drm_file);
+
kfree(pdd);
p->pdds[i] = NULL;
}