[AMD Official Use Only - AMD Internal Distribution Only]

Reviewed-by: Hawking Zhang <[email protected]>

Regards,
Hawking

-----Original Message-----
From: Lazar, Lijo <[email protected]>
Sent: Monday, June 30, 2025 12:41
To: [email protected]
Cc: Zhang, Hawking <[email protected]>; Deucher, Alexander <[email protected]>; Kim, Jonathan <[email protected]>; Zhang, Jesse(Jie) <[email protected]>
Subject: [PATCH] drm/amdkfd: Avoid queue reset if disabled

If ring reset is disabled, skip resetting queues. Instead, fall back to device based reset.

Signed-off-by: Lijo Lazar <[email protected]>
---
 drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
index 76359c6a3f3a..500f51552038 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
@@ -2339,9 +2339,18 @@ static int reset_hung_queues_sdma(struct device_queue_manager *dqm)

 static int reset_queues_on_hws_hang(struct device_queue_manager *dqm, bool is_sdma)
 {
+	struct amdgpu_device *adev = dqm->dev->adev;
+
 	while (halt_if_hws_hang)
 		schedule();

+	if (adev->debug_disable_gpu_ring_reset) {
+		dev_info_once(adev->dev,
+			      "%s queue hung, but ring reset disabled",
+			      is_sdma ? "sdma" : "compute");
+
+		return -EPERM;
+	}
 	if (!amdgpu_gpu_recovery)
 		return -ENOTRECOVERABLE;

--
2.49.0
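
The order of the early-exit checks in the patched function can be sketched in standalone user-space C. This is only an illustration under stated assumptions: `struct sketch_dev`, `sketch_reset_queues()` and `sketch_needs_device_reset()` are hypothetical names that do not exist in the driver, and the idea that the caller escalates to a device reset on any non-zero return is inferred from the commit message, since the diff does not show the caller.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for driver state; not the real amdgpu structs. */
struct sketch_dev {
	bool ring_reset_disabled; /* models adev->debug_disable_gpu_ring_reset */
	bool gpu_recovery;        /* models the amdgpu_gpu_recovery parameter */
};

/* Mirrors the order of the early-exit checks after the patch. */
static int sketch_reset_queues(const struct sketch_dev *dev, bool is_sdma)
{
	assert(dev != NULL);

	/* New check: ring reset disabled, so do not attempt a queue reset. */
	if (dev->ring_reset_disabled) {
		printf("%s queue hung, but ring reset disabled\n",
		       is_sdma ? "sdma" : "compute");
		return -EPERM;
	}

	/* Existing check: GPU recovery is turned off entirely. */
	if (!dev->gpu_recovery)
		return -ENOTRECOVERABLE;

	/* The per-queue reset itself would be attempted here. */
	return 0;
}

/*
 * Assumed caller behavior (not shown in the diff): any failure to reset
 * queues is treated as a reason to fall back to a device-level reset.
 */
static bool sketch_needs_device_reset(const struct sketch_dev *dev, bool is_sdma)
{
	return sketch_reset_queues(dev, is_sdma) != 0;
}
```

In this sketch the ring-reset check runs before the recovery check, so a device with both ring reset disabled and recovery disabled reports `-EPERM` rather than `-ENOTRECOVERABLE`, matching the placement of the new block in the diff.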
