[AMD Official Use Only - General]
Yes, adev->gfx.is_poweron check will be applied in gmc_v11 callback, which will
be called after the generic gmc part: amdgpu_gmc_flush_gpu_tlb() function.
But in commit: d0c860f33553 ("drm/amdgpu: rework lock handling for flush_tlb
v2"), the flush is moved at a higher level amdgpu_gmc_flush_gpu_tlb() function.
Thus the gmc_v11 callback will never be called in the resume because
adev->reset_domain->sem not released and returned ahead. Adding a check of
adev->gfx.is_poweron will let GFX11 not breaking ahead, like following:
if (!down_read_trylock(&adev->reset_domain->sem) && //--> true in gfx11
+!adev->gfx.is_poweron) //-->false in gfx11, and the whole if statement
will be false, not return ahead. The following gmc v11 callback will be
executed later.
Thanks,
Feifei
-----Original Message-----
From: Zhang, Hawking <[email protected]>
Sent: Monday, October 9, 2023 4:58 PM
To: Xu, Feifei <[email protected]>; Wang, Yang(Kevin) <[email protected]>;
[email protected]
Cc: Koenig, Christian <[email protected]>
Subject: RE: [PATCH] drm/amdgpu:Check gfx poweron when skip flush_gpu_tlb
[AMD Official Use Only - General]
adev->gfx.is_poweron check should already be applied in IP specific (gmc v11)
callback. If gfx is not power on, it does nothing but just returns. I didn't
see how it helps resolve the issue if we just move the check from one function
to another.
Regards,
Hawking
-----Original Message-----
From: Xu, Feifei <[email protected]>
Sent: Monday, October 9, 2023 09:51
To: Wang, Yang(Kevin) <[email protected]>; [email protected]
Cc: Koenig, Christian <[email protected]>; Zhang, Hawking
<[email protected]>
Subject: RE: [PATCH] drm/amdgpu:Check gfx poweron when skip flush_gpu_tlb
[AMD Official Use Only - General]
Hi,
>> Based on your description, the above code should use "||" instead of
>> "&&",
&& is to add more restriction here. To avoid skipping necessary TLB flush by
return.
For Asics < GFX11, !adev->gfx.is_poweron is always true (this paremeter is
intrudoced from GFX11), only depends on reset_domain->sem; For Asics = GFX11,
!adev->gfx.is_poweron might be false (which gfx might already poweron in the
reset), this will make the if () not ture, return will not be executed, thus
flush TLB.
>> And after merging code into one line may result in the lock not being
>> released if the lock can be acquired success.
If !adev->gfx.is_poweron is true, the reset_domin->sem will not be
down_read_trylock, thus could avoid this deadlock.
Thanks,
Feifei
-----Original Message-----
From: Wang, Yang(Kevin) <[email protected]>
Sent: Sunday, October 8, 2023 9:36 PM
To: Xu, Feifei <[email protected]>; [email protected]
Cc: Xu, Feifei <[email protected]>; Xu, Feifei <[email protected]>; Koenig,
Christian <[email protected]>; Zhang, Hawking <[email protected]>
Subject: RE: [PATCH] drm/amdgpu:Check gfx poweron when skip flush_gpu_tlb
-----Original Message-----
From: amd-gfx <[email protected]> On Behalf Of Feifei Xu
Sent: Sunday, October 8, 2023 6:07 PM
To: [email protected]
Cc: Xu, Feifei <[email protected]>; Xu, Feifei <[email protected]>; Koenig,
Christian <[email protected]>; Zhang, Hawking <[email protected]>
Subject: [PATCH] drm/amdgpu:Check gfx poweron when skip flush_gpu_tlb
To fix the gpu recovery failed on GFX11 with gfxhub pagefault, flush gpu tlb
after reset on GFX11.
Gfxhub tlb flush need check if adev->gfx.is_poweron set.
Fixes: d0c860f33553 ("drm/amdgpu: rework lock handling for flush_tlb v2")
Signed-off-by: Feifei Xu <[email protected]>
---
drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c
b/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c
index 2f9bb86edd71..048d32edee88 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c
@@ -611,8 +611,9 @@ void amdgpu_gmc_flush_gpu_tlb(struct amdgpu_device *adev,
uint32_t vmid,
/*
* A GPU reset should flush all TLBs anyway, so no need to do
* this while one is ongoing.
+ * For GFX11, gfxhub flush check if adev->gfx.is_poweron is set.
*/
- if (!down_read_trylock(&adev->reset_domain->sem))
+ if (!down_read_trylock(&adev->reset_domain->sem) &&
+!adev->gfx.is_poweron)
return;
[Kevin]:
Based on your description, the above code should use "||" instead of "&&", And
after merging code into one line may result in the lock not being released if
the lock can be acquired success.
Best Regards,
Kevin
if (adev->gmc.flush_tlb_needs_extra_type_2)
--
2.34.1