The current SDMA v4.4.2 queue reset logic incorrectly uses the GET_INST macro for queue operations, leading to array-index-out-of-bounds errors when harvesting is enabled. This manifests as UBSAN warnings when queues are stopped during reset operations:

[ 306.871518] UBSAN: array-index-out-of-bounds in drivers/gpu/drm/amd/amdgpu/sdma_v4_4_2.c:118:38
[ 306.871538] index 4294967295 is out of range for type 'uint32_t *[44]'
[ 306.871929] amdgpu_sdma_reset_engine+0xe4/0x320 [amdgpu]
[ 306.872115] reset_queues_on_hws_hang+0x2dc/0x4d0 [amdgpu]

The fix ensures we use physical instance IDs consistently for queue operations while maintaining harvest-aware mapping for register access.

Signed-off-by: Jesse Zhang <[email protected]>
---
 drivers/gpu/drm/amd/amdgpu/sdma_v4_4_2.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/sdma_v4_4_2.c b/drivers/gpu/drm/amd/amdgpu/sdma_v4_4_2.c
index 9c169112a5e7..3de125062ee3 100644
--- a/drivers/gpu/drm/amd/amdgpu/sdma_v4_4_2.c
+++ b/drivers/gpu/drm/amd/amdgpu/sdma_v4_4_2.c
@@ -1670,7 +1670,7 @@ static bool sdma_v4_4_2_page_ring_is_guilty(struct amdgpu_ring *ring)
 static int sdma_v4_4_2_reset_queue(struct amdgpu_ring *ring, unsigned int vmid)
 {
 	struct amdgpu_device *adev = ring->adev;
-	u32 id = GET_INST(SDMA0, ring->me);
+	u32 id = ring->me;
 	int r;
 
 	if (!(adev->sdma.supported_reset & AMDGPU_RESET_TYPE_PER_QUEUE))
@@ -1686,7 +1686,7 @@ static int sdma_v4_4_2_reset_queue(struct amdgpu_ring *ring, unsigned int vmid)
 static int sdma_v4_4_2_stop_queue(struct amdgpu_ring *ring)
 {
 	struct amdgpu_device *adev = ring->adev;
-	u32 instance_id = GET_INST(SDMA0, ring->me);
+	u32 instance_id = ring->me;
 	u32 inst_mask;
 	uint64_t rptr;
 
-- 
2.34.1
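
Note on the reported index: 4294967295 is UINT32_MAX, which is what a value of -1 becomes when stored in a u32. The standalone sketch below reproduces that arithmetic outside the driver. It is only an illustration under stated assumptions: the table, its contents, and every demo_* name are hypothetical, and it assumes (not confirmed by this patch) that the instance lookup can yield -1 for an index with no valid mapping.

```c
#include <stddef.h>
#include <stdint.h>

#define DEMO_NUM_SLOTS 4 /* hypothetical table size, illustration only */

/* Hypothetical logical-to-device map; -1 marks a slot with no valid mapping. */
static const int8_t demo_logical_to_dev[DEMO_NUM_SLOTS] = { 0, 2, 3, -1 };

/* Stand-in for a harvest-aware lookup. This is NOT the real GET_INST macro. */
static int8_t demo_get_inst(uint32_t logical)
{
	return logical < DEMO_NUM_SLOTS ? demo_logical_to_dev[logical] : -1;
}

/* Mirrors the removed lines: a signed lookup result stored in a u32. */
static uint32_t demo_index_via_lookup(uint32_t me)
{
	uint32_t id = demo_get_inst(me); /* -1 converts to 0xFFFFFFFF */

	return id;
}

/* Mirrors the added lines: the ring's own index is used unchanged. */
static uint32_t demo_index_direct(uint32_t me)
{
	return me;
}

/* Bounds check against an array of array_len entries (44 in the UBSAN report). */
static int demo_index_in_bounds(uint32_t idx, size_t array_len)
{
	return idx < array_len;
}
```

With these hypothetical values, demo_index_via_lookup(3) yields 4294967295 and fails a bounds check against 44 entries, while demo_index_direct(3) stays in range.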
