Thank you for your contribution! Sashiko AI review found 2 potential issue(s) to consider:
Pre-existing issues: - [High] The MMU IRQ handler remains active while device clocks and power domains are disabled during error teardown, causing unclocked MMIO accesses (S-Error). - [High] panfrost_reset() does not mask or synchronize the MMU interrupt before resetting the GPU hardware, leading to a race condition with the MMU IRQ thread. -- commit cac1cfb873df895888c7871605b659cdaf55b78e Author: Adrián Larumbe <[email protected]> Date: Thu Jun 4 17:35:26 2026 +0000 drm/panfrost: Explicitly enable MMU interrupts at device init Because the device must be in a position to accept jobs between the time drm_dev_register() is called and autosuspend first kicks in, there's a very narrow window inbetween during which jobs targeting the tiler buffer object would time out, since the device's PM status is 'Active', but no MMU interrupts were enabled at device initialisation time. > diff --git a/drivers/gpu/drm/panfrost/panfrost_mmu.c > b/drivers/gpu/drm/panfrost/panfrost_mmu.c > index 4a3162c3b659f..0e1ad6256c2cb 100644 > --- a/drivers/gpu/drm/panfrost/panfrost_mmu.c > +++ b/drivers/gpu/drm/panfrost/panfrost_mmu.c [ ... ] > @@ -355,8 +361,7 @@ void panfrost_mmu_reset(struct panfrost_device *pfdev) > > spin_unlock(&pfdev->as_lock); > > - mmu_write(pfdev, MMU_INT_CLEAR, ~0); > - mmu_write(pfdev, MMU_INT_MASK, ~0); > + panfrost_mmu_enable_interrupts(pfdev); > } [Severity: High] This isn't a bug introduced by this patch, but while reviewing the interrupt handling during reset, does panfrost_reset() need to mask and synchronize the MMU interrupt before resetting the hardware? When a job timeout occurs, panfrost_reset() correctly masks and synchronizes the job manager interrupts: drivers/gpu/drm/panfrost/panfrost_job.c:panfrost_reset() { /* Mask job interrupts and synchronize to make sure we won't be * interrupted during our reset. */ job_write(pfdev, JOB_INT_MASK, 0); synchronize_irq(pfdev->js->irq); ... panfrost_device_reset(pfdev, true); ... } However, it does not do the same for the MMU interrupt (pfdev->mmu_irq). Because MMU faults often cause job timeouts, the panfrost_mmu_irq_handler_thread could be actively running concurrently on another CPU. If the IRQ thread attempts to handle an MMU page fault, it will write to MMU registers while panfrost_device_reset() concurrently calls panfrost_gpu_soft_reset() to reset the hardware. Can uncoordinated MMIO accesses to hardware undergoing a soft reset lead to unpredictable behavior or bus errors? [ ... ] > @@ -970,6 +975,8 @@ int panfrost_mmu_init(struct panfrost_device *pfdev) > return err; > } > > + panfrost_mmu_enable_interrupts(pfdev); > + [Severity: High] This is a pre-existing issue, but by explicitly enabling the MMU interrupts here during init, does this expose a race condition in the device initialization error paths? If a subsequent initialization step like panfrost_jm_init() fails, panfrost_device_init() will jump to the out_mmu label: drivers/gpu/drm/panfrost/panfrost_device.c:panfrost_device_init() { ... err = panfrost_jm_init(pfdev); if (err) goto out_mmu; ... out_mmu: panfrost_mmu_fini(pfdev); ... out_clk: panfrost_clk_fini(pfdev); ... } panfrost_mmu_fini() masks the interrupt in hardware, but does not synchronize or disable the IRQ handler in the kernel: void panfrost_mmu_fini(struct panfrost_device *pfdev) { mmu_write(pfdev, MMU_INT_MASK, 0); } Because panfrost_mmu_init() registers a shared threaded IRQ handler using devm_request_threaded_irq(), the handler remains active during the rest of the teardown. If an interrupt fires or the thread runs after panfrost_clk_fini() has disabled the hardware clocks, panfrost_mmu_irq_handler() will read unclocked MMIO registers (e.g., mmu_read(pfdev, MMU_INT_STAT)). Could this lead to a fatal S-Error (synchronous external abort)? > return 0; > } -- Sashiko AI review · https://sashiko.dev/#/patchset/[email protected]?part=7
