amdgpu: validate userq's last fence prior to destroying

Christian König Tue, 15 Jul 2025 05:15:22 -0700

On 15.07.25 13:50, Liang, Prike wrote:
> [Public]
> 
> Regards,
>       Prike
> 
>> -----Original Message-----
>> From: Koenig, Christian <[email protected]>
>> Sent: Friday, July 11, 2025 8:13 PM
>> To: Liang, Prike <[email protected]>; [email protected]
>> Cc: Deucher, Alexander <[email protected]>
>> Subject: Re: [PATCH v6 07/11] drm/amdgpu: validate userq's last fence prior 
>> to
>> destroying
>>
>> On 11.07.25 11:39, Prike Liang wrote:
>>> The userq requires validating queue status before destroying it, if
>>> user tries to destroy a busy userq by IOCTL then the driver should
>>> report an error for this illegal usage.
>>
>> Clear NAK, destroying a busy userqueue is perfectly valid!
> Yes, the firmware should handle such case something like as preempting the 
> queue.
> If we directly unmap a hang queue and may further cause the MES firmware hang 
> up,
> so, do we need to detect the hang userq here by checking the userq fence 
> status and reset the hang queue before further performs the unmap queue?


No, waiting for the last fence should be perfectly sufficient since the hang 
detection is separate from this.

BTW please remove the 100ms timeout here, we should wait forever or until the 
hang detection has suspended the queue and signaled the fence with an error.

Regards,
Christian.


> 
>> Regards,
>> Christian.
>>
>>>
>>> Signed-off-by: Prike Liang <[email protected]>
>>> ---
>>>  drivers/gpu/drm/amd/amdgpu/amdgpu_userq.c | 15 ++++++++++++---
>>>  1 file changed, 12 insertions(+), 3 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_userq.c
>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_userq.c
>>> index 81fbb00b6d91..bcbe8d3f66ed 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_userq.c
>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_userq.c
>>> @@ -281,7 +281,7 @@ amdgpu_userq_map_helper(struct amdgpu_userq_mgr
>> *uq_mgr,
>>>     return r;
>>>  }
>>>
>>> -static void
>>> +static int
>>>  amdgpu_userq_wait_for_last_fence(struct amdgpu_userq_mgr *uq_mgr,
>>>                              struct amdgpu_usermode_queue *queue)  { @@ -
>> 290,10 +290,14 @@
>>> amdgpu_userq_wait_for_last_fence(struct amdgpu_userq_mgr *uq_mgr,
>>>
>>>     if (f && !dma_fence_is_signaled(f)) {
>>>             ret = dma_fence_wait_timeout(f, true, msecs_to_jiffies(100));
>>> -           if (ret <= 0)
>>> +           if (ret <= 0) {
>>>                     drm_file_err(uq_mgr->file, "Timed out waiting for
>> fence=%llu:%llu\n",
>>>                                  f->context, f->seqno);
>>> +                   return -ETIMEDOUT;
>>> +           }
>>>     }
>>> +
>>> +   return 0;
>>>  }
>>>
>>>  static void
>>> @@ -509,7 +513,12 @@ amdgpu_userq_destroy(struct drm_file *filp, int
>> queue_id)
>>>             mutex_unlock(&uq_mgr->userq_mutex);
>>>             return -EINVAL;
>>>     }
>>> -   amdgpu_userq_wait_for_last_fence(uq_mgr, queue);
>>> +
>>> +   if (amdgpu_userq_wait_for_last_fence(uq_mgr, queue)) {
>>> +           drm_warn(adev_to_drm(uq_mgr->adev), "Don't destroy a busy
>> userq\n");
>>> +           /* For the fence signal timeout case, it requires resetting the 
>>> busy
>> queue.*/
>>> +           r = -ETIMEDOUT;
>>> +   }
>>>
>>>     /*
>>>      * At this point the userq obj va should be mapped,
>

Re: [PATCH v6 07/11] drm/amdgpu: validate userq's last fence prior to destroying

Reply via email to