rocket: Fix iommu_group leak and unsafe IRQ register access

sashiko-bot Tue, 09 Jun 2026 20:14:46 -0700

Thank you for your contribution! Sashiko AI review found 2 potential issue(s) 
to consider:


New issues:
- [High] Runtime PM usage counter underflow due to incorrect error handling of 
`pm_runtime_get_if_active()` in threaded IRQ handler.

Pre-existing issues:
- [High] Returning `IRQ_NONE` for valid DMA error interrupts without clearing 
the interrupt status causes a hardirq storm.
--

commit 5d8993f6abe579de7eec7f4c98365e11dd141bf8
Author: ZhaoJinming <[email protected]>

accel/rocket: Fix iommu_group leak and unsafe IRQ register access

This commit fixes an iommu_group reference leak in the IRQ handling path
and introduces runtime PM checks in the IRQ handlers to prevent unsafe
hardware register access when the device is suspended.

> diff --git a/drivers/accel/rocket/rocket_job.c 
> b/drivers/accel/rocket/rocket_job.c
> index cf9f61682dd8b..6ac79b8517319 100644
> --- a/drivers/accel/rocket/rocket_job.c
> +++ b/drivers/accel/rocket/rocket_job.c
[ ... ]
> @@ -420,7 +420,10 @@ static irqreturn_t rocket_job_irq_handler_thread(int 
> irq, void *data)
>  {
>       struct rocket_core *core = data;
>  
> -     rocket_job_handle_irq(core);
> +     if (pm_runtime_get_if_active(core->dev)) {
> +             rocket_job_handle_irq(core);
> +             pm_runtime_put(core->dev);
> +     }
>  
>       return IRQ_HANDLED;
>  }

[Severity: High]
Does this evaluation handle negative error codes correctly?

pm_runtime_get_if_active() returns 1 if the device is active, 0 if inactive,
and a negative error code (such as -EINVAL) if runtime PM is disabled. Because
any non-zero value evaluates to true in C, a negative return value will satisfy
this condition.

If that happens, the code proceeds to call pm_runtime_put(). Since the initial
get call failed and did not increment the usage counter, wouldn't this
unconditionally decrement the counter and lead to an underflow?

Could this explicitly check for success, such as evaluating if the return value
is strictly equal to 1?

> @@ -428,16 +431,24 @@ static irqreturn_t rocket_job_irq_handler_thread(int 
> irq, void *data)
>  static irqreturn_t rocket_job_irq_handler(int irq, void *data)
>  {
>       struct rocket_core *core = data;
> -     u32 raw_status = rocket_pc_readl(core, INTERRUPT_RAW_STATUS);
> +     u32 raw_status;
> +
> +     if (pm_runtime_get_if_active(core->dev) != 1)
> +             return IRQ_NONE;
> +
> +     raw_status = rocket_pc_readl(core, INTERRUPT_RAW_STATUS);
>  
>       WARN_ON(raw_status & PC_INTERRUPT_RAW_STATUS_DMA_READ_ERROR);
>       WARN_ON(raw_status & PC_INTERRUPT_RAW_STATUS_DMA_WRITE_ERROR);
>  
>       if (!(raw_status & PC_INTERRUPT_RAW_STATUS_DPU_0 ||
> -           raw_status & PC_INTERRUPT_RAW_STATUS_DPU_1))
> +           raw_status & PC_INTERRUPT_RAW_STATUS_DPU_1)) {
> +             pm_runtime_put(core->dev);
>               return IRQ_NONE;
> +     }
>  
>       rocket_pc_writel(core, INTERRUPT_MASK, 0x0);
> +     pm_runtime_put(core->dev);
>  
>       return IRQ_WAKE_THREAD;
>  }

[Severity: High]
This isn't a bug introduced by this patch, but will this cause a hardirq storm
if a DMA error occurs without a DPU completion?

If an interrupt is raised due to a DMA read or write error, the code logs a
warning via WARN_ON. However, if the DPU completion bits are not set, the
handler returns IRQ_NONE without clearing, acknowledging, or masking the
pending interrupt.

Since the interrupt remains asserted at the hardware level, won't the kernel's
IRQ core immediately re-invoke the handler in an infinite loop, hanging the CPU
until the spurious interrupt detector permanently disables the shared IRQ line?

-- 
Sashiko AI review · 
https://sashiko.dev/#/patchset/[email protected]?part=2

Re: [PATCH v4 2/2] accel/rocket: Fix iommu_group leak and unsafe IRQ register access

Reply via email to