On Wed, Mar 4, 2026 at 3:08 PM Mario Limonciello
<[email protected]> wrote:
>
> When GPU initialization fails due to an unsupported HW block,
> IP blocks may have a NULL version pointer. During cleanup in
> amdgpu_device_fini_hw, the code calls amdgpu_device_set_pg_state and
> amdgpu_device_set_cg_state which iterate over all IP blocks and access
> adev->ip_blocks[i].version without NULL checks, leading to a kernel
> NULL pointer dereference.
>
> Add NULL checks for adev->ip_blocks[i].version in both
> amdgpu_device_set_cg_state and amdgpu_device_set_pg_state to prevent
> dereferencing NULL pointers during GPU teardown when initialization has
> failed.
>
> Signed-off-by: Mario Limonciello <[email protected]>
I think probably:
Fixes: 39fc2bc4da00 ("drm/amdgpu: Protect GPU register accesses in powergated state in some paths")
Reviewed-by: Alex Deucher <[email protected]>
> ---
> drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 4 ++++
> 1 file changed, 4 insertions(+)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> index 5c24369821e47..258391ddee7c9 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> @@ -3258,6 +3258,8 @@ int amdgpu_device_set_cg_state(struct amdgpu_device *adev,
> i = state == AMD_CG_STATE_GATE ? j : adev->num_ip_blocks - j - 1;
> if (!adev->ip_blocks[i].status.late_initialized)
> continue;
> + if (!adev->ip_blocks[i].version)
> + continue;
> /* skip CG for GFX, SDMA on S0ix */
> if (adev->in_s0ix &&
> (adev->ip_blocks[i].version->type == AMD_IP_BLOCK_TYPE_GFX ||
> @@ -3297,6 +3299,8 @@ int amdgpu_device_set_pg_state(struct amdgpu_device *adev,
> i = state == AMD_PG_STATE_GATE ? j : adev->num_ip_blocks - j - 1;
> if (!adev->ip_blocks[i].status.late_initialized)
> continue;
> + if (!adev->ip_blocks[i].version)
> + continue;
> /* skip PG for GFX, SDMA on S0ix */
> if (adev->in_s0ix &&
> (adev->ip_blocks[i].version->type == AMD_IP_BLOCK_TYPE_GFX ||
> --
> 2.53.0
>
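
For anyone following the thread, the guard pattern in the two hunks above
can be sketched in plain userspace C. This is only an illustration under
stated assumptions: the mock_* types and mock_count_gateable() are
hypothetical stand-ins, not the amdgpu definitions, and the real functions
invoke per-block callbacks rather than counting entries.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the driver structures (assumption, not kernel code). */
enum mock_ip_type { MOCK_IP_GFX, MOCK_IP_SDMA, MOCK_IP_OTHER };

struct mock_ip_version {
	enum mock_ip_type type;
};

struct mock_ip_block {
	bool late_initialized;
	const struct mock_ip_version *version; /* may be NULL after a failed init */
};

/*
 * Walk the block array the way the CG/PG loops do and count the entries
 * that would be processed. Entries with a NULL version are skipped before
 * version->type is ever read, which is the check the patch adds.
 */
static int mock_count_gateable(const struct mock_ip_block *blocks, size_t n)
{
	int count = 0;

	for (size_t i = 0; i < n; i++) {
		if (!blocks[i].late_initialized)
			continue;
		if (!blocks[i].version)
			continue;
		/* Safe to dereference version from here on. */
		if (blocks[i].version->type == MOCK_IP_GFX ||
		    blocks[i].version->type == MOCK_IP_SDMA ||
		    blocks[i].version->type == MOCK_IP_OTHER)
			count++;
	}
	return count;
}
```

Without the second check, an entry marked late_initialized but carrying a
NULL version would fault on the first version->type read.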