[AMD Official Use Only - AMD Internal Distribution Only]

Reviewed-by: Hawking Zhang <[email protected]>

Regards,
Hawking
-----Original Message-----
From: Lazar, Lijo <[email protected]>
Sent: Monday, June 30, 2025 12:41
To: [email protected]
Cc: Zhang, Hawking <[email protected]>; Deucher, Alexander 
<[email protected]>; Kim, Jonathan <[email protected]>; Zhang, 
Jesse(Jie) <[email protected]>
Subject: [PATCH] drm/amdkfd: Avoid queue reset if disabled

If ring reset is disabled, skip resetting queues. Instead, fall back to
device-based reset.

Signed-off-by: Lijo Lazar <[email protected]>
---
 drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
index 76359c6a3f3a..500f51552038 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
@@ -2339,9 +2339,18 @@ static int reset_hung_queues_sdma(struct device_queue_manager *dqm)

 static int reset_queues_on_hws_hang(struct device_queue_manager *dqm, bool is_sdma)
 {
+       struct amdgpu_device *adev = dqm->dev->adev;
+
        while (halt_if_hws_hang)
                schedule();

+       if (adev->debug_disable_gpu_ring_reset) {
+               dev_info_once(adev->dev,
+                             "%s queue hung, but ring reset disabled",
+                             is_sdma ? "sdma" : "compute");
+
+               return -EPERM;
+       }
        if (!amdgpu_gpu_recovery)
                return -ENOTRECOVERABLE;

--
2.49.0
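
For readers following the thread, the guard ordering the patch introduces can be sketched outside the kernel as below. This is only an illustrative user-space model, not the driver code: struct reset_policy, do_queue_reset() and reset_queues_sketch() are hypothetical names, the two booleans stand in for adev->debug_disable_gpu_ring_reset and amdgpu_gpu_recovery, and the halt_if_hws_hang wait loop is left out. It assumes the caller treats any negative return as the cue to escalate to a device-based reset, as the commit message describes.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>

/* Defensive fallback for libcs that lack this errno; 131 is the Linux value. */
#ifndef ENOTRECOVERABLE
#define ENOTRECOVERABLE 131
#endif

/* Hypothetical stand-in for the driver state the patch consults. */
struct reset_policy {
	bool ring_reset_disabled;  /* models adev->debug_disable_gpu_ring_reset */
	bool gpu_recovery_enabled; /* models the amdgpu_gpu_recovery parameter */
};

/* Hypothetical stand-in for the per-queue reset path; always succeeds here. */
static int do_queue_reset(bool is_sdma)
{
	(void)is_sdma;
	return 0;
}

/*
 * Mirrors the check order in the patched function: the ring-reset debug
 * switch is tested first, then the GPU recovery setting, and only then is
 * a per-queue reset attempted.
 */
static int reset_queues_sketch(const struct reset_policy *p, bool is_sdma)
{
	if (p->ring_reset_disabled) {
		fprintf(stderr, "%s queue hung, but ring reset disabled\n",
			is_sdma ? "sdma" : "compute");
		return -EPERM;
	}

	if (!p->gpu_recovery_enabled)
		return -ENOTRECOVERABLE;

	return do_queue_reset(is_sdma);
}
```

Because the ring-reset check comes first, a configuration with both ring reset and GPU recovery disabled reports -EPERM rather than -ENOTRECOVERABLE in this model.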
