Reviewed-by: Yiqing Yao <[email protected]>
Tested-by: Yiqing Yao <[email protected]>

Thanks,
Yiqing(James).

________________________________
From: amd-gfx <[email protected]> on behalf of Lijo Lazar <[email protected]>
Sent: Monday, December 9, 2024 11:52 PM
To: [email protected] <[email protected]>
Cc: Zhang, Hawking <[email protected]>; Deucher, Alexander <[email protected]>; Zhou1, Tao <[email protected]>; Skvortsov, Victor <[email protected]>; Zhao, Victor <[email protected]>; Tomasevic, Vojislav <[email protected]>
Subject: [PATCH] drm/amdgpu: Avoid VF for RAS recovery source check

A VF device sets the RAS flag when mailbox data can't be read properly,
so there is no conclusive way to tell whether the real source is a RAS
error. Therefore, a VF schedules a KFD-based reset, which doesn't set
the RAS source. Skip the RAS source check for any VF-scheduled recovery.

Signed-off-by: Lijo Lazar <[email protected]>
Reported-by: Vojislav Tomasevic <[email protected]>
Fixes: 2211660c20a0 ("drm/amdgpu: Prefer RAS recovery for scheduler hang")
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index 735a01c58cd7..eb3fd55a3702 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -5864,6 +5864,7 @@ int amdgpu_device_gpu_recover(struct amdgpu_device *adev,
          * detected at the same time, let RAS recovery take care of it.
          */
         if (amdgpu_ras_is_err_state(adev, AMDGPU_RAS_BLOCK__ANY) &&
+           !amdgpu_sriov_vf(adev) &&
             reset_context->src != AMDGPU_RESET_SRC_RAS) {
                 dev_dbg(adev->dev,
                        "Gpu recovery from source: %d yielding to RAS error recovery handling",
--
2.25.1
