Thank you for your contribution! Sashiko AI review found 2 potential issue(s) to consider:
New issues: - [High] Runtime PM usage counter underflow due to incorrect error handling of `pm_runtime_get_if_active()` in threaded IRQ handler. Pre-existing issues: - [High] Returning `IRQ_NONE` for valid DMA error interrupts without clearing the interrupt status causes a hardirq storm. -- commit 5d8993f6abe579de7eec7f4c98365e11dd141bf8 Author: ZhaoJinming <[email protected]> accel/rocket: Fix iommu_group leak and unsafe IRQ register access This commit fixes an iommu_group reference leak in the IRQ handling path and introduces runtime PM checks in the IRQ handlers to prevent unsafe hardware register access when the device is suspended. > diff --git a/drivers/accel/rocket/rocket_job.c > b/drivers/accel/rocket/rocket_job.c > index cf9f61682dd8b..6ac79b8517319 100644 > --- a/drivers/accel/rocket/rocket_job.c > +++ b/drivers/accel/rocket/rocket_job.c [ ... ] > @@ -420,7 +420,10 @@ static irqreturn_t rocket_job_irq_handler_thread(int > irq, void *data) > { > struct rocket_core *core = data; > > - rocket_job_handle_irq(core); > + if (pm_runtime_get_if_active(core->dev)) { > + rocket_job_handle_irq(core); > + pm_runtime_put(core->dev); > + } > > return IRQ_HANDLED; > } [Severity: High] Does this evaluation handle negative error codes correctly? pm_runtime_get_if_active() returns 1 if the device is active, 0 if inactive, and a negative error code (such as -EINVAL) if runtime PM is disabled. Because any non-zero value evaluates to true in C, a negative return value will satisfy this condition. If that happens, the code proceeds to call pm_runtime_put(). Since the initial get call failed and did not increment the usage counter, wouldn't this unconditionally decrement the counter and lead to an underflow? Could this explicitly check for success, such as evaluating if the return value is strictly equal to 1? > @@ -428,16 +431,24 @@ static irqreturn_t rocket_job_irq_handler_thread(int > irq, void *data) > static irqreturn_t rocket_job_irq_handler(int irq, void *data) > { > struct rocket_core *core = data; > - u32 raw_status = rocket_pc_readl(core, INTERRUPT_RAW_STATUS); > + u32 raw_status; > + > + if (pm_runtime_get_if_active(core->dev) != 1) > + return IRQ_NONE; > + > + raw_status = rocket_pc_readl(core, INTERRUPT_RAW_STATUS); > > WARN_ON(raw_status & PC_INTERRUPT_RAW_STATUS_DMA_READ_ERROR); > WARN_ON(raw_status & PC_INTERRUPT_RAW_STATUS_DMA_WRITE_ERROR); > > if (!(raw_status & PC_INTERRUPT_RAW_STATUS_DPU_0 || > - raw_status & PC_INTERRUPT_RAW_STATUS_DPU_1)) > + raw_status & PC_INTERRUPT_RAW_STATUS_DPU_1)) { > + pm_runtime_put(core->dev); > return IRQ_NONE; > + } > > rocket_pc_writel(core, INTERRUPT_MASK, 0x0); > + pm_runtime_put(core->dev); > > return IRQ_WAKE_THREAD; > } [Severity: High] This isn't a bug introduced by this patch, but will this cause a hardirq storm if a DMA error occurs without a DPU completion? If an interrupt is raised due to a DMA read or write error, the code logs a warning via WARN_ON. However, if the DPU completion bits are not set, the handler returns IRQ_NONE without clearing, acknowledging, or masking the pending interrupt. Since the interrupt remains asserted at the hardware level, won't the kernel's IRQ core immediately re-invoke the handler in an infinite loop, hanging the CPU until the spurious interrupt detector permanently disables the shared IRQ line? -- Sashiko AI review · https://sashiko.dev/#/patchset/[email protected]?part=2
