On 04/06/2026 18:35, Adrián Larumbe wrote:
> During device probe(), failure to do a PM get() will leave the usage_count
> set to 0, which is the value assigned at device creation time. That means
> when the autosuspend delay expires, runtime suspend callback won't be
> invoked, so the device will remain powered on forever.
> 
> On top of that, failure to call PM put() during device unplug means
> Panfrost device's PM usage_count increases monotonically for every new
> module reload.
> 
> The combined outcome of both of the above was that devfreq OPP transition
> notifications would be printed all the time, even when no jobs are being
> submitted. This quickly fills the kernel ring buffer with junk.
> 
> Even direr than that was the fact MMU interrupts are only enabled when
> the device is reset, so after device probe() the very first job targeting
> the tiler heap BO would always time out, because the driver's PM runtime
> resume callback would not be invoked.
> 
> Signed-off-by: Adrián Larumbe <[email protected]>
> Fixes: 635430797d3f ("drm/panfrost: Rework runtime PM initialization")
> Fixes: 876b15d2c88d ("drm/panfrost: Fix module unload")
> ---
>  drivers/gpu/drm/panfrost/panfrost_drv.c | 6 +++++-
>  1 file changed, 5 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/panfrost/panfrost_drv.c b/drivers/gpu/drm/panfrost/panfrost_drv.c
> index 2d4b6aa95c66..545fbf2c8d0c 100644
> --- a/drivers/gpu/drm/panfrost/panfrost_drv.c
> +++ b/drivers/gpu/drm/panfrost/panfrost_drv.c
> @@ -989,6 +989,7 @@ static int panfrost_probe(struct platform_device *pdev)
>       pm_runtime_set_active(pfdev->base.dev);
>       pm_runtime_mark_last_busy(pfdev->base.dev);
>       pm_runtime_enable(pfdev->base.dev);
> +     pm_runtime_get_noresume(pfdev->base.dev);
>       pm_runtime_set_autosuspend_delay(pfdev->base.dev, 50); /* ~3 frames */
>       pm_runtime_use_autosuspend(pfdev->base.dev);
>  
> @@ -1000,10 +1001,12 @@ static int panfrost_probe(struct platform_device *pdev)
>       if (err < 0)
>               goto err_out1;
>  
> +     pm_runtime_put_autosuspend(pfdev->base.dev);
>  
>       return 0;
>  
>  err_out1:
> +     pm_runtime_put_noidle(pfdev->base.dev);
>       pm_runtime_disable(pfdev->base.dev);
>       panfrost_device_fini(pfdev);

Sashiko is concerned that dropping the usage count before
pm_runtime_disable() could cause things to turn off too early, and I
have to agree it sounds like it could be a problem:

Sashiko wrote:
> Does dropping the usage count before pm_runtime_disable() create a race
> condition where the suspend callback can run and disable clocks before
> hardware shutdown?
> Because the usage count is dropped early, a concurrent PM event could trigger
> the suspend callback, disabling clocks. Then, panfrost_device_fini() calls
> panfrost_gpu_fini() which writes to MMIO registers. Could writing to
> unclocked registers on ARM SoCs cause fatal bus errors or panics?
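
To make the ordering concern concrete, here is a toy user-space model of
the two orderings. It is only a sketch and not kernel code: every toy_*
name is hypothetical, and the "PM event" is simply assumed to fire at the
worst possible moment, right after the usage count is dropped. It says
nothing about how likely such an event is in the real driver.

```c
#include <stdbool.h>

/* Toy model of runtime-PM state; all names here are hypothetical. */
struct toy_dev {
	int usage_count;   /* stands in for the PM usage count */
	bool pm_enabled;   /* stands in for pm_runtime_enable()/disable() */
	bool clocked;      /* true while the "hardware" is powered */
	bool bad_access;   /* set if registers are touched while unclocked */
};

static void toy_put_noidle(struct toy_dev *d) { d->usage_count--; }
static void toy_disable(struct toy_dev *d) { d->pm_enabled = false; }

/* Assumed concurrent PM event: suspends only if PM is enabled and unused. */
static void toy_pm_event(struct toy_dev *d)
{
	if (d->pm_enabled && d->usage_count == 0)
		d->clocked = false;
}

/* Stands in for panfrost_device_fini() writing to MMIO registers. */
static void toy_fini(struct toy_dev *d)
{
	if (!d->clocked)
		d->bad_access = true;
}

/* Ordering of the err_out1 path in the patch: put, disable, fini. */
bool patch_order_is_safe(void)
{
	struct toy_dev d = { .usage_count = 1, .pm_enabled = true, .clocked = true };

	toy_put_noidle(&d);
	toy_pm_event(&d);   /* window: count is 0 and PM is still enabled */
	toy_disable(&d);
	toy_fini(&d);
	return !d.bad_access;
}

/* Ordering used by panfrost_remove() in the patch: fini, put, disable. */
bool remove_like_order_is_safe(void)
{
	struct toy_dev d = { .usage_count = 1, .pm_enabled = true, .clocked = true };

	toy_pm_event(&d);   /* count is still 1, so nothing happens */
	toy_fini(&d);
	toy_put_noidle(&d);
	toy_pm_event(&d);   /* may power down, but fini has already run */
	toy_disable(&d);
	return !d.bad_access;
}
```

In this model the first function returns false and the second returns
true, which is the asymmetry between the probe error path and
panfrost_remove() that the question above is pointing at.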

Sashiko also suggests we might have some other (partially pre-existing)
issues here.

https://sashiko.dev/#/patchset/20260604-claude-fixes-v2-0-57c6bd4c1655%40collabora.com

Thanks,
Steve

>       pm_runtime_set_suspended(pfdev->base.dev);
> @@ -1018,8 +1021,9 @@ static void panfrost_remove(struct platform_device *pdev)
>       drm_dev_unregister(&pfdev->base);
>  
>       pm_runtime_get_sync(pfdev->base.dev);
> -     pm_runtime_disable(pfdev->base.dev);
>       panfrost_device_fini(pfdev);
> +     pm_runtime_put_noidle(pfdev->base.dev);
> +     pm_runtime_disable(pfdev->base.dev);
>       pm_runtime_set_suspended(pfdev->base.dev);
>  }
>  
> 
