Thank you for your contribution! Sashiko AI review found 2 potential issue(s) to consider:
Pre-existing issues: - [High] The `pm_runtime_get_sync()` error path fails to decrement the runtime PM usage counter, permanently leaking the reference. - [High] Returning an unsignaled dma_fence instead of an error pointer on job submission failure triggers a kernel WARN. -- commit f0df2d483139ad216e9d43923cbf9543a8f4e81c Author: ZhaoJinming <[email protected]> accel/rocket: Fix dma_fence reference leak on error paths in rocket_job_run() In rocket_job_run(), after creating a fence at line 305 and taking an extra reference for job->done_fence at line 311 via dma_fence_get(), two error paths fail to release that extra reference before returning. > diff --git a/drivers/accel/rocket/rocket_job.c > b/drivers/accel/rocket/rocket_job.c > index 2f1861f960ccb..fc10703534e37 100644 > --- a/drivers/accel/rocket/rocket_job.c > +++ b/drivers/accel/rocket/rocket_job.c > @@ -311,12 +311,19 @@ static struct dma_fence *rocket_job_run(struct > drm_sched_job *sched_job) > job->done_fence = dma_fence_get(fence); > > ret = pm_runtime_get_sync(core->dev); > - if (ret < 0) > + if (ret < 0) { > + dma_fence_put(job->done_fence); > + job->done_fence = NULL; [Severity: High] This is a pre-existing issue, but does this error path permanently leak the runtime PM usage counter? pm_runtime_get_sync() increments the device's usage counter before attempting to resume the device. If the resume fails, it returns a negative error code but the usage counter remains incremented. Without explicitly dropping this usage counter (e.g., using pm_runtime_put_noidle()), could repeated failures prevent the NPU from ever suspending again? > return fence; [Severity: High] This is also a pre-existing issue, but will returning the valid, unsignaled fence instead of ERR_PTR(ret) trigger a kernel WARN? Since the hardware wasn't submitted, the fence refcount immediately hits 0 after the DRM scheduler drops its initial reference. dma_fence_release() detects pending callbacks on an unsignaled fence and triggers a loud kernel WARN(1, "Fence ... released with pending signals!\n"), forcibly signaling it with -EDEADLK. Should this path destroy the fence and return ERR_PTR(ret) so the scheduler cleanly aborts the job? > + } > > ret = iommu_attach_group(job->domain->domain, core->iommu_group); > - if (ret < 0) > + if (ret < 0) { > + pm_runtime_put(core->dev); > + dma_fence_put(job->done_fence); > + job->done_fence = NULL; > return fence; [Severity: High] This is a pre-existing issue, but similar to the above, does returning the unsignaled fence here instead of ERR_PTR(ret) cause the same DRM scheduler WARN when the fence is released? > + } -- Sashiko AI review ยท https://sashiko.dev/#/patchset/[email protected]?part=1
