RE: [PATCH] drm/amdgpu: amdgpu_device_recover_vram always failed if only one node in shadow_list

Lou, Wentao Tue, 02 Apr 2019 02:24:37 -0700

Thanks Christian.
You mean msecs_to_jiffies(8000) should be time to complete whole loop, not each 
loop.
Just sent out another patch for review.
Thanks.


BR,
Wentao


-----Original Message-----
From: Koenig, Christian <[email protected]> 
Sent: Tuesday, April 2, 2019 3:39 PM
To: Lou, Wentao <[email protected]>; Deng, Emily <[email protected]>; 
[email protected]
Subject: Re: [PATCH] drm/amdgpu: amdgpu_device_recover_vram always failed if 
only one node in shadow_list

Yeah, exactly that's what should happen here.

The value of tmo SHOULD be changed, otherwise we wait tmo jiffies on each loop.

Christian.

Am 02.04.19 um 09:29 schrieb Lou, Wentao:
> Hi Christian,
>
> If " tmo = dma_fence_wait_timeout(fence, false, tmo); " was executed inside 
> list_for_each_entry, the value of tmo might be changed.
> But " tmo = dma_fence_wait_timeout(fence, false, tmo); " might be called 
> after exiting the loop of list_for_each_entry.
> It might pass a different value of tmo into dma_fence_wait_timeout.
>
> BR,
> Wentao
>
>
> -----Original Message-----
> From: Christian König <[email protected]>
> Sent: Tuesday, April 2, 2019 3:01 PM
> To: Deng, Emily <[email protected]>; Lou, Wentao 
> <[email protected]>; [email protected]
> Subject: Re: [PATCH] drm/amdgpu: amdgpu_device_recover_vram always 
> failed if only one node in shadow_list
>
> Yeah, agree that is much closer to a correct solution.
>
> But even better would be to correctly update the timeout as well, e.g:
>
> tmo = dma_fence_wait_timeout(fence, false, tmo); dma_fence_put(fence); fence 
> = next; if (tmo == 0)
>       r = -ETIMEDOUT;
>       break
> } else if (tmo < 0) {
>       r = tmo;
>       break;
> }
>
> That we restart the timeout for each wait looks like a rather problematic bug 
> to me as well.
>
> Christian.
>
> Am 02.04.19 um 06:03 schrieb Deng, Emily:
>> Maybe it will be better to add follow check, and change “if (r <= 0 || tmo 
>> <= 0) " to "if (r <0 || tmo <= 0)".
>>      r = dma_fence_wait_timeout(f, false, timeout);
>>      if (r == 0) {
>>              r = -ETIMEDOUT;
>>              break;
>>      } else if (r < 0) {
>>              break;
>>      }
>>
>> Best wishes
>> Emily Deng
>>
>>
>>> -----Original Message-----
>>> From: amd-gfx <[email protected]> On Behalf Of 
>>> wentalou
>>> Sent: Monday, April 1, 2019 4:59 PM
>>> To: [email protected]
>>> Cc: Lou, Wentao <[email protected]>
>>> Subject: [PATCH] drm/amdgpu: amdgpu_device_recover_vram always 
>>> failed if only one node in shadow_list
>>>
>>> amdgpu_bo_restore_shadow would assign zero to r if succeeded.
>>> r would remain zero if there is only one node in shadow_list.
>>> current code would always return failure when r <= 0.
>>>
>>> Change-Id: Iae6880e7c78b71fde6a6754c69665c2e312a80a5
>>> Signed-off-by: Wentao Lou <[email protected]>
>>> ---
>>> drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 6 +++++-
>>> 1 file changed, 5 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>>> index c4c61e9..5cf21a4 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>>> @@ -3171,6 +3171,7 @@ static int amdgpu_device_recover_vram(struct 
>>> amdgpu_device *adev)
>>>     struct dma_fence *fence = NULL, *next = NULL;
>>>     struct amdgpu_bo *shadow;
>>>     long r = 1, tmo;
>>> +   bool single_shadow = false;
>>>
>>>     if (amdgpu_sriov_runtime(adev))
>>>             tmo = msecs_to_jiffies(8000);
>>> @@ -3194,10 +3195,12 @@ static int amdgpu_device_recover_vram(struct 
>>> amdgpu_device *adev)
>>>                     r = dma_fence_wait_timeout(fence, false, tmo);
>>>                     dma_fence_put(fence);
>>>                     fence = next;
>>> +                   single_shadow = false;
>>>                     if (r <= 0)
>>>                             break;
>>>             } else {
>>>                     fence = next;
>>> +                   single_shadow = true;
>>>             }
>>>     }
>>>     mutex_unlock(&adev->shadow_list_lock);
>>> @@ -3206,7 +3209,8 @@ static int amdgpu_device_recover_vram(struct 
>>> amdgpu_device *adev)
>>>             tmo = dma_fence_wait_timeout(fence, false, tmo);
>>>     dma_fence_put(fence);
>>>
>>> -   if (r <= 0 || tmo <= 0) {
>>> +   /* r would be zero even if amdgpu_bo_restore_shadow succeeded when
>>> single shadow in list */
>>> +   if (r < 0 || (r == 0 && !single_shadow) || tmo <= 0) {
>>>             DRM_ERROR("recover vram bo from shadow failed\n");
>>>             return -EIO;
>>>     }
>>> --
>>> 2.7.4
>>>
>>> _______________________________________________
>>> amd-gfx mailing list
>>> [email protected]
>>> https://lists.freedesktop.org/mailman/listinfo/amd-gfx
>> _______________________________________________
>> amd-gfx mailing list
>> [email protected]
>> https://lists.freedesktop.org/mailman/listinfo/amd-gfx

_______________________________________________
amd-gfx mailing list
[email protected]
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

RE: [PATCH] drm/amdgpu: amdgpu_device_recover_vram always failed if only one node in shadow_list

Reply via email to