Thank you for your contribution! Sashiko AI review found 1 potential issue(s) 
to consider:

Pre-existing issues:
- [High] Memory leak of the dynamically allocated `scheds` array in 
`rocket_job_open()` on error paths and when `num_cores == 1`.
--

commit 49abb67c60d7da03b194fc958eb96f281bc20ffc
Author: Maíra Canal <[email protected]>

drm/rocket: Drop the dedicated reset workqueue

Each core allocated an ordered workqueue to serialize timeout handlers, which
is unnecessary since each core is reset independently. With the reset worker
gone, the workqueue served solely as the timeout_wq. This patch drops the
dedicated workqueue and lets drm_sched fall back to the default timeout
workqueue.

> diff --git a/drivers/accel/rocket/rocket_job.c 
> b/drivers/accel/rocket/rocket_job.c
> index 871041eb7a1dc..1a454559ecf49 100644
> --- a/drivers/accel/rocket/rocket_job.c
> +++ b/drivers/accel/rocket/rocket_job.c

[ ... ]

>  void rocket_job_fini(struct rocket_core *core)
>  {
>       drm_sched_fini(&core->sched);
> -
> -     destroy_workqueue(core->reset.wq);
>  }
>  
>  int rocket_job_open(struct rocket_file_priv *rocket_priv)

[Severity: High]
This isn't a bug introduced by this patch, but is there a memory leak of the
dynamically allocated scheds array on error paths and when rdev->num_cores
equals 1?

Looking at drivers/accel/rocket/rocket_job.c:rocket_job_open():

        struct drm_gpu_scheduler **scheds = kmalloc_objs(*scheds,
                                                         rdev->num_cores);
        ...
        ret = drm_sched_entity_init(&rocket_priv->sched_entity,
                                    DRM_SCHED_PRIORITY_NORMAL,
                                    scheds,
                                    rdev->num_cores, NULL);
        if (WARN_ON(ret))
                return ret;

If drm_sched_entity_init() fails, the function returns directly and the
scheds array is leaked.

Additionally, if drm_sched_entity_init() succeeds but rdev->num_cores == 1,
drm_sched_entity_init() explicitly drops the reference by setting
entity->sched_list to NULL:

drivers/gpu/drm/scheduler/sched_entity.c:drm_sched_entity_init() {
        ...
        entity->sched_list = num_sched_list > 1 ? sched_list : NULL;
        ...
}

Because rocket_job_close() relies on entity->sched_list to free the memory:

drivers/accel/rocket/rocket_job.c:rocket_job_close() {
        ...
        kfree(entity->sched_list);
        ...
}

Calling kfree() with NULL resolves cleanly but permanently loses the original
scheds array allocation. Can this be used to trigger kernel memory exhaustion
by repeatedly calling open() and close() on the DRM device file?

-- 
Sashiko AI review · 
https://sashiko.dev/#/patchset/[email protected]?part=3

Reply via email to