Thank you for your contribution! Sashiko AI review found 2 potential issue(s) to consider:
Pre-existing issues: - [High] IOMMU group reference leak in rocket_job_handle_irq() - [High] Unsafe hardware register access in shared IRQ handler while runtime suspended -- commit fca53bbe333680ce9d4e0368b39e684506994081 Author: ZhaoJinming <[email protected]> Subject: accel/rocket: Fix fence leak, runtime PM leak and WARN in rocket_job_run() error paths This commit fixes multiple resource leaks in the error paths of rocket_job_run(). It replaces pm_runtime_get_sync() with pm_runtime_resume_and_get() to handle the usage counter, correctly releases dma_fence references on failure, and adds pm_runtime_put() during IOMMU attachment failures. > diff --git a/drivers/accel/rocket/rocket_job.c > b/drivers/accel/rocket/rocket_job.c > index 2f1861f960ccb..cf9f61682dd8b 100644 > --- a/drivers/accel/rocket/rocket_job.c > +++ b/drivers/accel/rocket/rocket_job.c [ ... ] > @@ -310,13 +310,22 @@ static struct dma_fence *rocket_job_run(struct > drm_sched_job *sched_job) > dma_fence_put(job->done_fence); > job->done_fence = dma_fence_get(fence); > > - ret = pm_runtime_get_sync(core->dev); > - if (ret < 0) > - return fence; > + ret = pm_runtime_resume_and_get(core->dev); > + if (ret < 0) { > + dma_fence_put(job->done_fence); > + job->done_fence = NULL; > + dma_fence_put(fence); > + return ERR_PTR(ret); > + } > > ret = iommu_attach_group(job->domain->domain, core->iommu_group); > - if (ret < 0) > - return fence; > + if (ret < 0) { > + pm_runtime_put(core->dev); > + dma_fence_put(job->done_fence); > + job->done_fence = NULL; > + dma_fence_put(fence); > + return ERR_PTR(ret); > + } > > scoped_guard(mutex, &core->job_lock) { > core->in_flight_job = job; [ ... ] > static void rocket_job_handle_irq(struct rocket_core *core) > { > pm_runtime_mark_last_busy(core->dev); > > rocket_pc_writel(core, OPERATION_ENABLE, 0x0); > rocket_pc_writel(core, INTERRUPT_CLEAR, 0x1ffff); > > scoped_guard(mutex, &core->job_lock) > if (core->in_flight_job) { > if (core->in_flight_job->next_task_idx < > core->in_flight_job->task_count) { > rocket_job_hw_submit(core, core->in_flight_job); > return; > } > > iommu_detach_group(NULL, iommu_group_get(core->dev)); [Severity: High] This isn't a bug introduced by this patch, but does this leak an IOMMU group reference? It looks like iommu_group_get() increments the kobject reference count for the group. However, iommu_detach_group() doesn't consume this reference, and there is no subsequent iommu_group_put() call. Since this runs on every completed job, could this eventually overflow the reference counter and prevent the group from being freed? [ ... ] > static irqreturn_t rocket_job_irq_handler(int irq, void *data) > { > struct rocket_core *core = data; > u32 raw_status = rocket_pc_readl(core, INTERRUPT_RAW_STATUS); [Severity: High] This is a pre-existing issue, but is it safe to access hardware registers here without first checking if the device is active? Since the driver registers this as a shared interrupt handler, it could be invoked if another device on the same IRQ line triggers an interrupt while the NPU device is runtime suspended. Reading from a powered-down device typically returns 0xffffffff, which would spuriously trigger the WARN_ON macros below and falsely return IRQ_WAKE_THREAD. Should this use pm_runtime_get_if_active() to verify the device is powered before reading INTERRUPT_RAW_STATUS? > > WARN_ON(raw_status & PC_INTERRUPT_RAW_STATUS_DMA_READ_ERROR); > WARN_ON(raw_status & PC_INTERRUPT_RAW_STATUS_DMA_WRITE_ERROR); -- Sashiko AI review ยท https://sashiko.dev/#/patchset/[email protected]?part=1
