On 4/2/2025 02:37, Christian König wrote:
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
index ffca74a476da..3cdb5f8325aa 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
@@ -356,7 +356,6 @@ enum amdgpu_kiq_irq {
  AMDGPU_CP_KIQ_IRQ_DRIVER0 = 0,
  AMDGPU_CP_KIQ_IRQ_LAST
 };
-#define SRIOV_USEC_TIMEOUT 1200000 /* wait 12 * 100ms for SRIOV */
 #define MAX_KIQ_REG_WAIT 5000 /* in usecs, 5ms */
 #define MAX_KIQ_REG_BAILOUT_INTERVAL 5 /* in msecs, 5ms */
 #define MAX_KIQ_REG_TRY 1000

Unrelated to this patch here, but defines like those *must* have an AMDGPU_ prefix. Please fix in a follow up patch.

Sure. A deeper problem which has led to these macros is the duplication of polling logic across several different files.
We could instead move this code into amdgpu_fence_wait_polling. All clients would then abort early on in_reset or in_interrupt. There are a couple of users with different timeouts (adev->usec_timeout and a hard-coded 2100ms) which could be unified or retained with a fixed 5ms polling interval.
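
To make the proposal concrete, here is a rough userspace model of what a single shared polling helper could look like. It is only a sketch: struct poll_ctx, sim_delay(), fence_wait_polling_sketch() and POLL_INTERVAL_US are hypothetical names invented for illustration, the in_reset/in_interrupt fields stand in for the real kernel checks, and the delay is simulated rather than a real udelay(). The actual signature and return convention of amdgpu_fence_wait_polling are not assumed here.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model only; none of these names exist in amdgpu. */
#define POLL_INTERVAL_US 5000u /* fixed 5 ms polling interval */

enum poll_result { POLL_DONE = 0, POLL_TIMEOUT = -1, POLL_ABORTED = -2 };

struct poll_ctx {
	uint32_t completed_seq; /* last fence sequence seen as signaled */
	uint32_t pending_seq;   /* sequence the simulated hardware will signal */
	uint64_t signal_at_us;  /* simulated signal time, 0 means never */
	uint64_t elapsed_us;    /* simulated time spent in delays */
	bool in_reset;          /* stand-in for a reset-in-progress check */
	bool in_interrupt;      /* stand-in for an interrupt-context check */
};

/* Simulated delay: advance the clock and signal the fence when due. */
static void sim_delay(struct poll_ctx *ctx)
{
	ctx->elapsed_us += POLL_INTERVAL_US;
	if (ctx->signal_at_us && ctx->elapsed_us >= ctx->signal_at_us)
		ctx->completed_seq = ctx->pending_seq;
}

/*
 * One polling loop shared by all callers. Each caller passes its own
 * timeout, so the differing budgets can be retained while the abort
 * checks and the polling interval are unified.
 */
static enum poll_result fence_wait_polling_sketch(struct poll_ctx *ctx,
						  uint32_t wait_seq,
						  uint64_t timeout_us)
{
	uint64_t waited = 0;

	/* Signed difference keeps the comparison correct across wraparound. */
	while ((int32_t)(wait_seq - ctx->completed_seq) > 0) {
		if (ctx->in_reset || ctx->in_interrupt)
			return POLL_ABORTED; /* abort early, as proposed */
		if (waited >= timeout_us)
			return POLL_TIMEOUT;
		sim_delay(ctx);
		waited += POLL_INTERVAL_US;
	}
	return POLL_DONE;
}
```

With this shape, a caller that currently uses adev->usec_timeout and one that uses a longer hard-coded budget would differ only in the timeout_us argument they pass.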
adev->usec_timeout is too low for this particular system under load.
