On 6/11/26 16:37, [email protected] wrote:
Thank you for your contribution! Sashiko AI review found 1 potential issue(s) 
to consider:

Pre-existing issues:
- [High] Incomplete Fix / Kernel panic on module load error path: If `gpu_buddy_init()` fails, 
it cleans up its internal allocations but leaves `mm->n_roots > 0` and 
`mm->free_trees` and `mm->roots` as dangling or NULL pointers. Because 
`adev->mman.initialized` is set to `true` prior to calling `amdgpu_vram_mgr_init()`, a 
failure in `gpu_buddy_init()` unconditionally triggers the driver teardown path. 
`amdgpu_vram_mgr_fini()` then invokes `gpu_buddy_fini()`, which dereferences the invalid 
pointers and causes a kernel panic.

Pre-existing issues are out of scope for this series, except when they are needed for the new functionality.

/Thomas



--

commit 88a159aad4fc080127cdbd3bc127dfaae0ae9c17
Author: Thomas Hellström <[email protected]>

drm/amdgpu: Fix init ordering in amdgpu_vram_mgr_init()

This commit fixes kernel panics on the module load error path by moving
drmm_cgroup_register_region() after the list and buddy allocator are fully
initialized.

Closes: 
https://sashiko.dev/#/patchset/[email protected]?part=4

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c
index 2a241a5b12c42..ac3f71d77140a 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c
[ ... ]
@@ -935,6 +932,10 @@ int amdgpu_vram_mgr_init(struct amdgpu_device *adev)
        if (err)
                return err;
[Severity: High]
This is a pre-existing issue, but if gpu_buddy_init() fails, does it still
leave the system vulnerable to the same module load error path panic?

Looking at drivers/gpu/buddy.c, if gpu_buddy_init() fails after setting
mm->n_roots, the error path cleans up allocations but leaves mm->n_roots
populated and mm->roots as a dangling or NULL pointer:

drivers/gpu/buddy.c:gpu_buddy_init() {
     ...
     mm->n_roots = hweight64(size);
     ...
out_free_tree:
     while (i--)
         kfree(mm->free_trees[i]);
     kfree(mm->free_trees);
     return -ENOMEM;
}

Since adev->mman.initialized is set to true before amdgpu_vram_mgr_init() is
called, returning this error will trigger the teardown path where
amdgpu_vram_mgr_fini() unconditionally calls gpu_buddy_fini():

drivers/gpu/buddy.c:gpu_buddy_fini() {
     ...
     for (i = 0; i < mm->n_roots; ++i) {
         start = gpu_buddy_block_offset(mm->roots[i]);
     ...
}

Because mm->n_roots is non-zero, will this unconditionally dereference the
invalid mm->roots pointer and cause a panic?

+       man->cg = drmm_cgroup_register_region(adev_to_drm(adev), "vram", 
adev->gmc.real_vram_size);
+       if (IS_ERR(man->cg))
+               return PTR_ERR(man->cg);
+
        ttm_set_driver_manager(&adev->mman.bdev, TTM_PL_VRAM, &mgr->manager);
        ttm_resource_manager_set_used(man, true);
        return 0;

Reply via email to