[AMD Official Use Only - General]
For the unlocking, I have tested on both nv21 and nv31, the unlock/lock paring
looks not break.
On asic <gfx11, the (!adev->gfx.is_poweron) always true, this parameter is
introduced from GFX11.
On gfx11, in the reset (suspend then resume) process, after suspend, gfx
poweron right after smu resumed successfully.
The (!adev->gfx.is_poweron) is always false when trylock the reset_domin->sem.
Not return ahead in gfx11 and continue gpu tlb flush in IP specific (gmc v11)
callback.
Then unlock after gpu tlb flush:
void amdgpu_gmc_flush_gpu_tlb(struct amdgpu_device *adev, uint32_t vmid,
uint32_t vmhub, uint32_t flush_type)
{
if (!down_read_trylock(&adev->reset_domain->sem) &&
!adev->gfx.is_poweron) //lock
return;
...
adev->gmc.gmc_funcs->flush_gpu_tlb(adev, vmid, vmhub,
flush_type);
up_read(&adev->reset_domain->sem); //unlock
return;
}
Thanks,
Feifei
-----Original Message-----
From: Xu, Feifei
Sent: Tuesday, October 10, 2023 5:44 PM
To: Christian König <[email protected]>; Wang, Yang(Kevin)
<[email protected]>; [email protected]
Cc: Koenig, Christian <[email protected]>; Zhang, Hawking
<[email protected]>
Subject: RE: [PATCH] drm/amdgpu:Check gfx poweron when skip flush_gpu_tlb
>> Then a TLB flush shouldn't be necessary on reset. A reset implies that the
>> TLB is cleared as well.
Hmm, in current implementation, when we say a reset implied that the TLB is
cleared, assume that the TLB clear is purely hardware action. There's no gpu
tlb flush initiated by software/driver after suspend.
While in some asics of gfx11 (like nv31), gpu tlb need to be flushed by
software/driver after smu resume successfully intentionally.
Without the gpu tlb flush on nv31, S3 or reset will be break with gfx page
fault.
>> First of all the patch is broken because you only handle the locking, but
>> not the unlocking part.
For the unlocking part, realized that you and Kevin are correct. Lock/unlock
not paried.
Thanks,
Feifei
-----Original Message-----
From: Christian König <[email protected]>
Sent: Monday, October 9, 2023 4:52 PM
To: Xu, Feifei <[email protected]>; Wang, Yang(Kevin) <[email protected]>;
[email protected]
Cc: Koenig, Christian <[email protected]>; Zhang, Hawking
<[email protected]>
Subject: Re: [PATCH] drm/amdgpu:Check gfx poweron when skip flush_gpu_tlb
Am 09.10.23 um 03:50 schrieb Xu, Feifei:
> [AMD Official Use Only - General]
>
> Hi,
>
>>> Based on your description, the above code should use "||" instead of
>>> "&&",
> && is to add more restriction here. To avoid skipping necessary TLB flush by
> return.
> For Asics < GFX11, !adev->gfx.is_poweron is always true (this
> paremeter is intrudoced from GFX11), only depends on reset_domain->sem; For
> Asics = GFX11, !adev->gfx.is_poweron might be false (which gfx might already
> poweron in the reset), this will make the if () not ture, return will not be
> executed, thus flush TLB.
First of all the patch is broken because you only handle the locking, but not
the unlocking part.
Then a TLB flush shouldn't be necessary on reset. A reset implies that the TLB
is cleared as well.
We discussed the possibility to avoid that, but this is not supposed to be
happening at the moment.
Regards,
Christian.
>
>>> And after merging code into one line may result in the lock not being
>>> released if the lock can be acquired success.
> If !adev->gfx.is_poweron is true, the reset_domin->sem will not be
> down_read_trylock, thus could avoid this deadlock.
>
> Thanks,
> Feifei
>
> -----Original Message-----
> From: Wang, Yang(Kevin) <[email protected]>
> Sent: Sunday, October 8, 2023 9:36 PM
> To: Xu, Feifei <[email protected]>; [email protected]
> Cc: Xu, Feifei <[email protected]>; Xu, Feifei <[email protected]>;
> Koenig, Christian <[email protected]>; Zhang, Hawking
> <[email protected]>
> Subject: RE: [PATCH] drm/amdgpu:Check gfx poweron when skip
> flush_gpu_tlb
>
>
> -----Original Message-----
> From: amd-gfx <[email protected]> On Behalf Of
> Feifei Xu
> Sent: Sunday, October 8, 2023 6:07 PM
> To: [email protected]
> Cc: Xu, Feifei <[email protected]>; Xu, Feifei <[email protected]>;
> Koenig, Christian <[email protected]>; Zhang, Hawking
> <[email protected]>
> Subject: [PATCH] drm/amdgpu:Check gfx poweron when skip flush_gpu_tlb
>
> To fix the gpu recovery failed on GFX11 with gfxhub pagefault, flush gpu tlb
> after reset on GFX11.
> Gfxhub tlb flush need check if adev->gfx.is_poweron set.
>
> Fixes: d0c860f33553 ("drm/amdgpu: rework lock handling for flush_tlb
> v2")
>
> Signed-off-by: Feifei Xu <[email protected]>
> ---
> drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c | 3 ++-
> 1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c
> index 2f9bb86edd71..048d32edee88 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c
> @@ -611,8 +611,9 @@ void amdgpu_gmc_flush_gpu_tlb(struct amdgpu_device *adev,
> uint32_t vmid,
> /*
> * A GPU reset should flush all TLBs anyway, so no need to do
> * this while one is ongoing.
> + * For GFX11, gfxhub flush check if adev->gfx.is_poweron is
> set.
> */
> - if (!down_read_trylock(&adev->reset_domain->sem))
> + if (!down_read_trylock(&adev->reset_domain->sem) &&
> +!adev->gfx.is_poweron)
> return;
>
> [Kevin]:
> Based on your description, the above code should use "||" instead of
> "&&", And after merging code into one line may result in the lock not being
> released if the lock can be acquired success.
>
> Best Regards,
> Kevin
>
> if (adev->gmc.flush_tlb_needs_extra_type_2)
> --
> 2.34.1
>