Thank you for your contribution! Sashiko AI review found 3 potential issue(s) to consider:
Pre-existing issues: - [High] Calling `drm_sched_job_cleanup()` after `drm_sched_job_arm()` violates the DRM scheduler API contract. - [High] The runtime suspend callback fails to synchronize the shared IRQ before powering down. - [High] Potential NULL pointer dereference in `iommu_attach_group()` if the device lacks an IOMMU group. -- commit a91812825621124e6ae3a091adc6205cbeefed80 Author: ZhaoJinming <[email protected]> accel/rocket: Fix error path handling in rocket_job_run() This commit fixes multiple issues in the error paths of rocket_job_run(). It resolves a dma_fence reference leak, correctly balances the runtime PM usage counter on failure by switching to pm_runtime_resume_and_get(), and returns ERR_PTR(ret) instead of an unsignaled fence to avoid DRM scheduler warnings. > diff --git a/drivers/accel/rocket/rocket_job.c > b/drivers/accel/rocket/rocket_job.c > index 2f1861f960ccb..cf9f61682dd8b 100644 > --- a/drivers/accel/rocket/rocket_job.c > +++ b/drivers/accel/rocket/rocket_job.c > @@ -310,13 +310,22 @@ static struct dma_fence *rocket_job_run(struct > drm_sched_job *sched_job) [ ... ] > ret = iommu_attach_group(job->domain->domain, core->iommu_group); > - if (ret < 0) > - return fence; > + if (ret < 0) { > + pm_runtime_put(core->dev); > + dma_fence_put(job->done_fence); > + job->done_fence = NULL; > + dma_fence_put(fence); > + return ERR_PTR(ret); > + } [Severity: High] This is a pre-existing issue, but can core->iommu_group be NULL here? If the device does not have an IOMMU group (e.g., IOMMU is disabled or not present in the device tree), iommu_group_get() returns NULL. Calling iommu_attach_group() unconditionally with a NULL group parameter will cause a kernel crash when it attempts to dereference the pointer. Should there be a check for a valid iommu_group before attaching? [ ... ] I also noticed a couple of pre-existing issues in this driver: [Severity: High] This isn't a bug introduced by this patch, but is it safe to call drm_sched_job_cleanup() after drm_sched_job_arm() has already been called? In drivers/accel/rocket/rocket_job.c:rocket_job_push(), if rocket_acquire_object_fences() fails: drm_sched_job_arm(&job->base); ... ret = rocket_acquire_object_fences(job->in_bos, job->in_bo_count, &job->base, false); if (ret) goto err_unlock; The error path returns to rocket_ioctl_submit_job(), which then calls: out_cleanup_job: if (ret) drm_sched_job_cleanup(&rjob->base); The DRM scheduler API contract states that drm_sched_job_arm() is a point of no return that initializes fences and sequence numbers. Calling drm_sched_job_cleanup() after this breaks the dma_fence sequence numbers, potentially causing any context waiting on the aborted sequence number to hang indefinitely. [Severity: High] This is a pre-existing issue, but does the runtime suspend callback need to synchronize the shared IRQ before powering down? In drivers/accel/rocket/rocket_drv.c:rocket_device_runtime_suspend(): clk_bulk_disable_unprepare(ARRAY_SIZE(rdev->cores[core].clks), rdev->cores[core].clks); The Rocket driver registers its IRQ handler with IRQF_SHARED. According to the power management subsystem rules, drivers using shared IRQs must call synchronize_irq() in their runtime suspend callback before disabling clocks. Without this, could the hardware be powered off while the IRQ handler is mid-flight and reading registers, potentially leading to system hangs or spurious wakeups? -- Sashiko AI review ยท https://sashiko.dev/#/patchset/[email protected]?part=1
