[PATCH 2/2] drm/amdgpu: Fix SDMA queue reset array out-of-bounds access

Jesse Zhang Tue, 10 Jun 2025 22:56:27 -0700

The current SDMA v4.4.2 queue reset logic incorrectly uses the GET_INST
macro for queue operations, leading to array-index-out-of-bounds
errors when harvesting is enabled. This manifests as UBSAN warnings
when stopping queues during reset operations.


[  306.871518] UBSAN: array-index-out-of-bounds in drivers/gpu/drm/amd/amdgpu/sdma_v4_4_2.c:118:38
[  306.871538] index 4294967295 is out of range for type 'uint32_t *[44]'
[  306.871929]  amdgpu_sdma_reset_engine+0xe4/0x320 [amdgpu]
[  306.872115]  reset_queues_on_hws_hang+0x2dc/0x4d0 [amdgpu]

The fix ensures we use physical instance IDs consistently for queue
operations while maintaining harvest-aware mapping for register access.
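
As an illustration only, the failure mode can be sketched in standalone
C. Nothing below is driver code: harvest_map, mapped_inst, inst_state
and stop_queue are hypothetical names, and the map contents are made up.
The sketch assumes a per-instance array sized by the number of usable
instances, and a harvest-aware lookup (standing in for GET_INST) whose
result can exceed that size or be an invalid value.

```c
#include <stdint.h>

#define NUM_INST     2u          /* hypothetical: two usable instances */
#define INVALID_INST UINT32_MAX  /* (u32)-1, i.e. 4294967295 */

/* Hypothetical harvest map: ring instance index -> mapped hardware ID. */
static const uint32_t harvest_map[NUM_INST] = { 1, 3 };

/* Per-instance software state, indexed by the ring's own index. */
struct inst_state_t { int queue_stopped; };
static struct inst_state_t inst_state[NUM_INST];

/* Stand-in for a harvest-aware lookup; not the real GET_INST macro. */
static uint32_t mapped_inst(uint32_t me)
{
	return me < NUM_INST ? harvest_map[me] : INVALID_INST;
}

/* Returns 0 on success, -1 if idx would overrun inst_state[]. */
static int stop_queue(uint32_t idx)
{
	if (idx >= NUM_INST)
		return -1;
	inst_state[idx].queue_stopped = 1;
	return 0;
}
```

In this sketch stop_queue(1) succeeds, while stop_queue(mapped_inst(1))
is rejected because the mapped ID (3) is past the end of the array. The
mapped ID is only appropriate where a hardware register offset is
needed.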

Signed-off-by: Jesse Zhang <[email protected]>
---
 drivers/gpu/drm/amd/amdgpu/sdma_v4_4_2.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/sdma_v4_4_2.c b/drivers/gpu/drm/amd/amdgpu/sdma_v4_4_2.c
index 9c169112a5e7..3de125062ee3 100644
--- a/drivers/gpu/drm/amd/amdgpu/sdma_v4_4_2.c
+++ b/drivers/gpu/drm/amd/amdgpu/sdma_v4_4_2.c
@@ -1670,7 +1670,7 @@ static bool sdma_v4_4_2_page_ring_is_guilty(struct amdgpu_ring *ring)
 static int sdma_v4_4_2_reset_queue(struct amdgpu_ring *ring, unsigned int vmid)
 {
        struct amdgpu_device *adev = ring->adev;
-       u32 id = GET_INST(SDMA0, ring->me);
+       u32 id = ring->me;
        int r;
 
        if (!(adev->sdma.supported_reset & AMDGPU_RESET_TYPE_PER_QUEUE))
@@ -1686,7 +1686,7 @@ static int sdma_v4_4_2_reset_queue(struct amdgpu_ring *ring, unsigned int vmid)
 static int sdma_v4_4_2_stop_queue(struct amdgpu_ring *ring)
 {
        struct amdgpu_device *adev = ring->adev;
-       u32 instance_id = GET_INST(SDMA0, ring->me);
+       u32 instance_id = ring->me;
        u32 inst_mask;
        uint64_t rptr;
 
-- 
2.34.1

[PATCH 2/2] drm/amdgpu: Fix SDMA queue reset array out-of-bounds access
