Am 22.04.25 um 15:09 schrieb Liang, Prike:
>>> Stop, wait a second. That shouldn't happen at the first place. Why is the
>>> eviction
>> fence considered a dependency for page table updates?
>>> When it is added only as bookkeep then we should never consider that here.
>> Looks like something in the sync obj is messed up.
>> It is like this. Here, amdgpu_sync_resv is using
>> DMA_RESV_USAGE_BOOKKEEP.
>>
>> int amdgpu_sync_resv() {
>>
>> ..
>>
>> /* TODO: Use DMA_RESV_USAGE_READ here */
That here is the core of the problem.
I've added this TODO item 4 years ago to switch over to DMA_RESV_USAGE_READ
here when moved all TTM use cases to using
drm_sched_job_add_resv_dependencies().
That was done, but this TODO here forgotten.
>> dma_resv_for_each_fence(&cursor, resv,
>> DMA_RESV_USAGE_BOOKKEEP, f) {
>> dma_fence_chain_for_each(f, f) {
>>
>> ..
>>
>> }
>> during PT update amdgpu_vm_bo_update() is using sync to moving
>> fences(Eviction fence) before mapping anything. Because of this Eviction
>> fence will
>> act as dependency.
> Yes, since the amdgpu_sync_resv() uses the bookkeep usage, then all the BOs
> reservation fences along with the eviction fence will be returned and added
> to the sync.
Yeah, but that is incorrect.
> And with the attached patch, the eviction fence can be released properly when
> the kq and uq are enabled.
We need to fix the underlying bug first before we can work on the next step.
Regards,
Christian.
>
> Thanks,
> Prike
>
>> ~arvind
>>
>>> Regards,
>>> Christian.
>>>
>>>> , and then the eviction fence will be added as a dependent fence by
>> propagating with amdgpu_sync_push_to_job(). With removing the eviction fence
>> from the VM sync at amdgpu_sync_resv(), then the eviction fence can be
>> released
>> properly.
>>>> Thanks,
>>>> Prike
>>>>> Thanks,
>>>>> Christian.
>>>>>
>>>>> PS: Please stop calling the eviction fence evf.
>>>>>
>>>>>>>> return 0;
>>>>>>>>
>>>>>>>> free_err:
>>>>>>>> @@ -237,6 +234,30 @@ void amdgpu_eviction_fence_detach(struct
>>>>>>> amdgpu_eviction_fence_mgr *evf_mgr,
>>>>>>>> dma_fence_put(stub);
>>>>>>>> }
>>>>>>>>
>>>>>>>> +void amdgpu_userq_remove_all_eviction_fences(struct amdgpu_bo
>>>>>>>> +*bo)
>>>>>>> Please name that amdgpu_eviction_fence_remove_all().
>>>>>> Noted.
>>>>>>
>>>>>>> Regards,
>>>>>>> Christian.
>>>>>>>
>>>>>>>> +{
>>>>>>>> + struct dma_resv *resv = &bo->tbo.base._resv;
>>>>>>>> + struct dma_fence *fence, *stub;
>>>>>>>> + struct dma_resv_iter cursor;
>>>>>>>> +
>>>>>>>> + dma_resv_assert_held(resv);
>>>>>>>> +
>>>>>>>> + stub = dma_fence_get_stub();
>>>>>>>> + dma_resv_for_each_fence(&cursor, resv,
>>>>>>> DMA_RESV_USAGE_BOOKKEEP, fence) {
>>>>>>>> + struct amdgpu_eviction_fence *ev_fence;
>>>>>>>> +
>>>>>>>> + ev_fence = fence_to_evf(fence);
>>>>>>>> + if (!ev_fence || !dma_fence_is_signaled(&ev_fence->base))
>>>>>>>> + continue;
>>>>>>>> +
>>>>>>>> + dma_resv_replace_fences(resv, fence->context, stub,
>>>>>>>> + DMA_RESV_USAGE_BOOKKEEP);
>>>>>>>> +
>>>>>>>> + }
>>>>>>>> +
>>>>>>>> + dma_fence_put(stub);
>>>>>>>> +}
>>>>>>>> +
>>>>>>>> int amdgpu_eviction_fence_init(struct amdgpu_eviction_fence_mgr
>>>>>>>> *evf_mgr) {
>>>>>>>> /* This needs to be done one time per open */ diff --git
>>>>>>>> a/drivers/gpu/drm/amd/amdgpu/amdgpu_eviction_fence.h
>>>>>>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_eviction_fence.h
>>>>>>>> index fcd867b7147d..da99ac322a2e 100644
>>>>>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_eviction_fence.h
>>>>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_eviction_fence.h
>>>>>>>> @@ -66,4 +66,5 @@ amdgpu_eviction_fence_signal(struct
>>>>>>>> amdgpu_eviction_fence_mgr *evf_mgr, int
>>>>>>>> amdgpu_eviction_fence_replace_fence(struct
>>>>>>>> amdgpu_eviction_fence_mgr
>>>>>>> *evf_mgr,
>>>>>>>> struct drm_exec *exec);
>>>>>>>> +void amdgpu_userq_remove_all_eviction_fences(struct amdgpu_bo
>>>>>>>> +*bo);
>>>>>>>> #endif
>>>>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
>>>>>>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
>>>>>>>> index 1e73ce30d4d7..f001018a01eb 100644
>>>>>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
>>>>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
>>>>>>>> @@ -1392,6 +1392,7 @@ void amdgpu_bo_release_notify(struct
>>>>>>> ttm_buffer_object *bo)
>>>>>>>> amdgpu_vram_mgr_set_cleared(bo->resource);
>>>>>>>> dma_resv_add_fence(&bo->base._resv, fence,
>>>>>>> DMA_RESV_USAGE_KERNEL);
>>>>>>>> dma_fence_put(fence);
>>>>>>>> + amdgpu_userq_remove_all_eviction_fences(abo);
>>>>>>>>
>>>>>>>> out:
>>>>>>>> dma_resv_unlock(&bo->base._resv);