[Public]
From: Koenig, Christian <[email protected]>
Sent: Tuesday, April 22, 2025 9:26 PM
To: Liang, Prike <[email protected]>; Yadav, Arvind <[email protected]>
Cc: Deucher, Alexander <[email protected]>;
[email protected]; Christian König
<[email protected]>
Subject: Re: [PATCH 4/4] drm/amdgpu: free the evf when the attached bo release
Am 22.04.25 um 15:09 schrieb Liang, Prike:
Stop, wait a second. That shouldn't happen at the first place. Why is the
eviction
fence considered a dependency for page table updates?
When it is added only as bookkeep then we should never consider that here.
Looks like something in the sync obj is messed up.
It is like this. Here, amdgpu_sync_resv is using
DMA_RESV_USAGE_BOOKKEEP.
int amdgpu_sync_resv() {
..
/* TODO: Use DMA_RESV_USAGE_READ here */
That here is the core of the problem.
I've added this TODO item 4 years ago to switch over to DMA_RESV_USAGE_READ
here when moved all TTM use cases to using
drm_sched_job_add_resv_dependencies().
That was done, but this TODO here forgotten.
So here the user space submission sync only requires syncing the fences with
less than DMA_RESV_USAGE_READ usage in the amdgpu_sync_resv(). If so, then the
eviction fence will not be added to the sync with kernel queue submission. I
can submit a patch for this change.
Thanks,
Prike
dma_resv_for_each_fence(&cursor, resv,
DMA_RESV_USAGE_BOOKKEEP, f) {
dma_fence_chain_for_each(f, f) {
..
}
during PT update amdgpu_vm_bo_update() is using sync to moving
fences(Eviction fence) before mapping anything. Because of this Eviction fence
will
act as dependency.
Yes, since the amdgpu_sync_resv() uses the bookkeep usage, then all the BOs
reservation fences along with the eviction fence will be returned and added to
the sync.
Yeah, but that is incorrect.
And with the attached patch, the eviction fence can be released properly when
the kq and uq are enabled.
We need to fix the underlying bug first before we can work on the next step.
Regards,
Christian.
Thanks,
Prike
~arvind
Regards,
Christian.
, and then the eviction fence will be added as a dependent fence by
propagating with amdgpu_sync_push_to_job(). With removing the eviction fence
from the VM sync at amdgpu_sync_resv(), then the eviction fence can be released
properly.
Thanks,
Prike
Thanks,
Christian.
PS: Please stop calling the eviction fence evf.
return 0;
free_err:
@@ -237,6 +234,30 @@ void amdgpu_eviction_fence_detach(struct
amdgpu_eviction_fence_mgr *evf_mgr,
dma_fence_put(stub);
}
+void amdgpu_userq_remove_all_eviction_fences(struct amdgpu_bo
+*bo)
Please name that amdgpu_eviction_fence_remove_all().
Noted.
Regards,
Christian.
+{
+ struct dma_resv *resv = &bo->tbo.base._resv;
+ struct dma_fence *fence, *stub;
+ struct dma_resv_iter cursor;
+
+ dma_resv_assert_held(resv);
+
+ stub = dma_fence_get_stub();
+ dma_resv_for_each_fence(&cursor, resv,
DMA_RESV_USAGE_BOOKKEEP, fence) {
+ struct amdgpu_eviction_fence *ev_fence;
+
+ ev_fence = fence_to_evf(fence);
+ if (!ev_fence || !dma_fence_is_signaled(&ev_fence->base))
+ continue;
+
+ dma_resv_replace_fences(resv, fence->context, stub,
+ DMA_RESV_USAGE_BOOKKEEP);
+
+ }
+
+ dma_fence_put(stub);
+}
+
int amdgpu_eviction_fence_init(struct amdgpu_eviction_fence_mgr
*evf_mgr) {
/* This needs to be done one time per open */ diff --git
a/drivers/gpu/drm/amd/amdgpu/amdgpu_eviction_fence.h
b/drivers/gpu/drm/amd/amdgpu/amdgpu_eviction_fence.h
index fcd867b7147d..da99ac322a2e 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_eviction_fence.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_eviction_fence.h
@@ -66,4 +66,5 @@ amdgpu_eviction_fence_signal(struct
amdgpu_eviction_fence_mgr *evf_mgr, int
amdgpu_eviction_fence_replace_fence(struct
amdgpu_eviction_fence_mgr
*evf_mgr,
struct drm_exec *exec);
+void amdgpu_userq_remove_all_eviction_fences(struct amdgpu_bo
+*bo);
#endif
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
index 1e73ce30d4d7..f001018a01eb 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
@@ -1392,6 +1392,7 @@ void amdgpu_bo_release_notify(struct
ttm_buffer_object *bo)
amdgpu_vram_mgr_set_cleared(bo->resource);
dma_resv_add_fence(&bo->base._resv, fence,
DMA_RESV_USAGE_KERNEL);
dma_fence_put(fence);
+ amdgpu_userq_remove_all_eviction_fences(abo);
out:
dma_resv_unlock(&bo->base._resv);