Thank you for your contribution! Sashiko AI review found 3 potential issue(s) to consider:
Pre-existing issues: - [High] Unhandled error from malidp_runtime_pm_resume() and pm_runtime_get_sync() leads to kernel panic. - [High] Missing synchronize_irq() in the runtime PM suspend path for shared IRQs causes a race condition. - [High] Asynchronous pm_runtime_put() races with pm_runtime_disable(), leaving the hardware in an incorrect power state. -- commit 15b02ca295e5d2b2ae68c0ebcf65a7fa13cad1b6 Author: Gustavo Kenji Mendonça Kaneko <[email protected]> drm/arm/malidp: use clk_bulk API in runtime PM resume and suspend malidp_runtime_pm_resume() calls clk_prepare_enable() three times without checking the return value. If any clock fails to enable, the driver silently proceeds with unclocked hardware, leading to undefined behavior. > diff --git a/drivers/gpu/drm/arm/malidp_drv.c > b/drivers/gpu/drm/arm/malidp_drv.c > index 9abe800f598ac..23fa942ae4bbe 100644 > --- a/drivers/gpu/drm/arm/malidp_drv.c > +++ b/drivers/gpu/drm/arm/malidp_drv.c > @@ -670,6 +670,11 @@ static int malidp_runtime_pm_suspend(struct device *dev) [ ... ] > malidp_se_irq_fini(hwdev); > malidp_de_irq_fini(hwdev); > hwdev->pm_suspended = true; > - clk_disable_unprepare(hwdev->mclk); > - clk_disable_unprepare(hwdev->aclk); > - clk_disable_unprepare(hwdev->pclk); > + clk_bulk_disable_unprepare(ARRAY_SIZE(clks), clks); [Severity: High] This is a pre-existing issue, but does this missing synchronize_irq() in the runtime PM suspend path for shared IRQs cause a race condition? The driver registers its DE and SE interrupts with IRQF_SHARED. If a shared interrupt triggers concurrently on another CPU, could the IRQ handler (malidp_se_irq() or malidp_de_irq()) pass the pm_suspended check just before the clocks are disabled here, and then attempt to read hardware registers while the clocks are off, resulting in a synchronous external abort? > return 0; > } > @@ -689,10 +692,17 @@ static int malidp_runtime_pm_resume(struct device *dev) [ ... ] > + err = clk_bulk_prepare_enable(ARRAY_SIZE(clks), clks); > + if (err) > + return err; [Severity: High] This isn't a bug introduced by this patch, but since we are now correctly propagating the clock enablement error from malidp_runtime_pm_resume(), do we need to update the callers to handle it? In malidp_bind(), the code ignores the return value: if (pm_runtime_enabled(dev)) pm_runtime_get_sync(dev); else malidp_runtime_pm_resume(dev); ret = hwdev->hw->query_hw(hwdev); If clock enablement fails, it seems the driver unconditionally proceeds to read hardware registers (e.g., in query_hw()). Could this cause a synchronous external abort when it accesses unclocked hardware? Similarly, malidp_pm_resume_early() does not check the return value and proceeds to mark the device as active: static int __maybe_unused malidp_pm_resume_early(struct device *dev) { malidp_runtime_pm_resume(dev); pm_runtime_set_active(dev); return 0; } [Severity: High] This is a pre-existing issue, but does the asynchronous pm_runtime_put() race with pm_runtime_disable() in the driver removal paths? In malidp_bind() (on the query_hw_fail error unroll path): query_hw_fail: pm_runtime_put(dev); if (pm_runtime_enabled(dev)) pm_runtime_disable(dev); else malidp_runtime_pm_suspend(dev); And similarly in malidp_unbind(): pm_runtime_put(dev); if (pm_runtime_enabled(dev)) pm_runtime_disable(dev); else malidp_runtime_pm_suspend(dev); Because pm_runtime_put() queues an asynchronous idle notification, the subsequent pm_runtime_disable() barrier could cancel the pending suspend request before it completes. Could this leave the device's clocks enabled indefinitely after the driver is unloaded? Should this use pm_runtime_put_sync() instead? > > - clk_prepare_enable(hwdev->pclk); > - clk_prepare_enable(hwdev->aclk); > - clk_prepare_enable(hwdev->mclk); > hwdev->pm_suspended = false; > malidp_de_irq_hw_init(hwdev); > malidp_se_irq_hw_init(hwdev); -- Sashiko AI review · https://sashiko.dev/#/patchset/[email protected]?part=1
