On 13.10.25 08:21, Jesse.Zhang wrote:
> APU platforms (identified by `adev->gmc.is_app_apu`) do not initialize 
> dedicated
> VRAM management structures (`adev->mman.vram_mgr.manager`) because they rely 
> on
> system memory instead of discrete VRAM. Accessing this uninitialized structure
> via `ttm_resource_manager_usage()` triggers a NULL pointer dereference 
> (typically
> in `_raw_spin_lock()` when trying to acquire the manager's lock), leading to
> kernel OOPS—especially when tools like rocm-smi query VRAM usage or during
> power/VM operations.
> 
> Fix this by adding explicit APU checks in all code paths that access VRAM
> manager structures:
> 
> 1. **amdgpu_cs.c**: Extend the existing bandwidth control check in
>    `amdgpu_cs_get_threshold_for_moves()` to include APU devices. Return 0 for
>    migration thresholds immediately, skipping VRAM-specific logic that would
>    access uninitialized data.
> 
> 2. **amdgpu_kms.c**: Modify the `AMDGPU_INFO_VRAM_USAGE` ioctl and memory info
>    reporting to return 0 for VRAM usage on APUs. This avoids calling
>    `ttm_resource_manager_usage()` with an invalid manager pointer.
> 
> 3. **amdgpu_virt.c**: Skip VRAM usage calculation for APUs when writing vf2pf
>    (virtual function to physical function) data. Use 0 for `fb_usage` since 
> APUs
>    have no discrete framebuffer memory to report.
> 
> These changes ensure APUs never access uninitialized VRAM manager structures,
> resolving the NULL dereference while preserving correct behavior for discrete
> GPUs (which retain full VRAM usage tracking).
> 
> Signed-off-by: Jesse Zhang <[email protected]>
> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c   | 2 +-
>  drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c  | 4 ++--
>  drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c | 2 +-
>  3 files changed, 4 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
> index 5f515fdcc775..d80414b32015 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
> @@ -709,7 +709,7 @@ static void amdgpu_cs_get_threshold_for_moves(struct 
> amdgpu_device *adev,
>        */
>       const s64 us_upper_bound = 200000;
>  
> -     if (!adev->mm_stats.log2_max_MBps) {
> +     if ((!adev->mm_stats.log2_max_MBps) || adev->gmc.is_app_apu) {
>               *max_bytes = 0;
>               *max_vis_bytes = 0;
>               return;
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
> index a9327472c651..e6bf9f6a2713 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
> @@ -758,7 +758,7 @@ int amdgpu_info_ioctl(struct drm_device *dev, void *data, 
> struct drm_file *filp)
>               ui64 = atomic64_read(&adev->num_vram_cpu_page_faults);
>               return copy_to_user(out, &ui64, min(size, 8u)) ? -EFAULT : 0;
>       case AMDGPU_INFO_VRAM_USAGE:
> -             ui64 = ttm_resource_manager_usage(&adev->mman.vram_mgr.manager);
> +             ui64 = adev->gmc.is_app_apu ? 0 : 
> ttm_resource_manager_usage(&adev->mman.vram_mgr.manager);

Please use ttm_resource_manager_used(&adev->mman.vram_mgr.manager) instead of 
checking the adev->gmc.is_app_apu flag.

It could be that we will get more use cases for not having the VRAM manager 
initialized.

Apart from that good catch.

Thanks,
Christian.

>               return copy_to_user(out, &ui64, min(size, 8u)) ? -EFAULT : 0;
>       case AMDGPU_INFO_VIS_VRAM_USAGE:
>               ui64 = amdgpu_vram_mgr_vis_usage(&adev->mman.vram_mgr);
> @@ -805,7 +805,7 @@ int amdgpu_info_ioctl(struct drm_device *dev, void *data, 
> struct drm_file *filp)
>                       atomic64_read(&adev->vram_pin_size) -
>                       AMDGPU_VM_RESERVED_VRAM;
>               mem.vram.heap_usage =
> -                     ttm_resource_manager_usage(vram_man);
> +                     adev->gmc.is_app_apu ? 0 : 
> ttm_resource_manager_usage(vram_man);
>               mem.vram.max_allocation = mem.vram.usable_heap_size * 3 / 4;
>  
>               mem.cpu_accessible_vram.total_heap_size =
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c
> index 3328ab63376b..5ff856bef199 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c
> @@ -599,7 +599,7 @@ static int amdgpu_virt_write_vf2pf_data(struct 
> amdgpu_device *adev)
>       vf2pf_info->os_info.all = 0;
>  
>       vf2pf_info->fb_usage =
> -             ttm_resource_manager_usage(&adev->mman.vram_mgr.manager) >> 20;
> +             adev->gmc.is_app_apu ? 0 : 
> ttm_resource_manager_usage(&adev->mman.vram_mgr.manager) >> 20;
>       vf2pf_info->fb_vis_usage =
>               amdgpu_vram_mgr_vis_usage(&adev->mman.vram_mgr) >> 20;
>       vf2pf_info->fb_size = adev->gmc.real_vram_size >> 20;

Reply via email to