On 24/11/23 10:17, AngeloGioacchino Del Regno wrote:
On 23/11/23 16:40, Boris Brezillon wrote:
On Thu, 23 Nov 2023 16:14:12 +0100 AngeloGioacchino Del Regno <[email protected]> wrote:
On 23/11/23 14:51, Boris Brezillon wrote:
On Thu, 23 Nov 2023 14:24:57 +0100 AngeloGioacchino Del Regno <[email protected]> wrote:

So, while I agree that it'd be slightly more readable as a diff if those
were two different commits I do have reasons against splitting.....

If we just need a quick fix to avoid PWRTRANS interrupts from kicking in
when we power-off the cores, I think we'd be better off dropping
GPU_IRQ_POWER_CHANGED[_ALL] from the value we write to GPU_INT_MASK at
[re]initialization time, and then have a separate series that fixes the
problem more generically.

But that didn't work:
https://lore.kernel.org/all/[email protected]/

I meant, your 'ignore-core_mask' fix + the
'drop GPU_IRQ_POWER_CHANGED[_ALL] in GPU_INT_MASK' one. So,
https://lore.kernel.org/all/[email protected]/
+
https://lore.kernel.org/all/[email protected]/

...while this "full" solution worked:
https://lore.kernel.org/all/[email protected]/
https://lore.kernel.org/all/[email protected]/
...so this *is* a "quick fix" already... :-)

It's a half-baked solution for the missing irq-synchronization-on-suspend
issue IMHO. I understand why you want it all in one patch that can serve as
a fix for 123b431f8a5c ("drm/panfrost: Really power off GPU cores in
panfrost_gpu_power_off()"), which is why I'm suggesting to go for an even
simpler diff (see below), and then fully address the
irq-synchronization-on-suspend issue in a follow-up patchset.

--->8---
diff --git a/drivers/gpu/drm/panfrost/panfrost_gpu.c b/drivers/gpu/drm/panfrost/panfrost_gpu.c
index 09f5e1563ebd..6e2d7650cc2b 100644
--- a/drivers/gpu/drm/panfrost/panfrost_gpu.c
+++ b/drivers/gpu/drm/panfrost/panfrost_gpu.c
@@ -78,7 +78,10 @@ int panfrost_gpu_soft_reset(struct panfrost_device *pfdev)
 	}
 
 	gpu_write(pfdev, GPU_INT_CLEAR, GPU_IRQ_MASK_ALL);
-	gpu_write(pfdev, GPU_INT_MASK, GPU_IRQ_MASK_ALL);

We probably want a comment here:
/* Only enable the interrupts we care about. */

+	gpu_write(pfdev, GPU_INT_MASK,
+		  GPU_IRQ_MASK_ERROR |
+		  GPU_IRQ_PERFCNT_SAMPLE_COMPLETED |
+		  GPU_IRQ_CLEAN_CACHES_COMPLETED);

...but if we do that, the next patch(es) will contain a partial revert of
this commit, putting this back to
gpu_write(pfdev, GPU_INT_MASK, GPU_IRQ_MASK_ALL)...

Why should we revert it? We're not processing the PWRTRANS interrupts in
the interrupt handler, those should never have been enabled in the first
place. The only reason we'd want to revert that change is if we decide to
have interrupt-based waits in the poweron/off implementation, which, as far
as I'm aware, is not something we intend to do any time soon.

You're right, yes. Okay, I'll push the new code soon.

Cheers!

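The argument in the exchange above (unmask only the interrupt sources the handler actually services) can be pictured with a small user-space C model. This is only a sketch: the bit positions are assumptions made for illustration and should be checked against panfrost_regs.h, and `build_gpu_int_mask()` and `pending_irqs()` are hypothetical helpers, not panfrost functions.

```c
#include <assert.h>
#include <stdint.h>

/* Bit positions below are ASSUMPTIONS for illustration only. */
#define GPU_IRQ_FAULT                    (1u << 0)
#define GPU_IRQ_MULTIPLE_FAULT           (1u << 7)
#define GPU_IRQ_RESET_COMPLETED          (1u << 8)
#define GPU_IRQ_POWER_CHANGED            (1u << 9)
#define GPU_IRQ_POWER_CHANGED_ALL        (1u << 10)
#define GPU_IRQ_PERFCNT_SAMPLE_COMPLETED (1u << 16)
#define GPU_IRQ_CLEAN_CACHES_COMPLETED   (1u << 17)

#define GPU_IRQ_MASK_ERROR (GPU_IRQ_FAULT | GPU_IRQ_MULTIPLE_FAULT)
#define GPU_IRQ_MASK_ALL   (GPU_IRQ_MASK_ERROR | \
			    GPU_IRQ_RESET_COMPLETED | \
			    GPU_IRQ_POWER_CHANGED | \
			    GPU_IRQ_POWER_CHANGED_ALL | \
			    GPU_IRQ_PERFCNT_SAMPLE_COMPLETED | \
			    GPU_IRQ_CLEAN_CACHES_COMPLETED)

/* Hypothetical helper: enable only the interrupts the handler cares about. */
static uint32_t build_gpu_int_mask(void)
{
	return GPU_IRQ_MASK_ERROR |
	       GPU_IRQ_PERFCNT_SAMPLE_COMPLETED |
	       GPU_IRQ_CLEAN_CACHES_COMPLETED;
}

/* A raw status bit only reaches the handler if it is set in the mask. */
static uint32_t pending_irqs(uint32_t rawstat, uint32_t mask)
{
	return rawstat & mask;
}
```

With the reduced mask, a power-transition status bit no longer shows up as pending, while fault bits still do; with GPU_IRQ_MASK_ALL the power-transition bit would be delivered even though nothing handles it.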
Update: I was running some (rather fast) tests here because I ... felt like
playing with it, basically :-)

So, I had an issue with MediaTek platforms being unable to cut power to the
GPU or disable clocks aggressively... and after trying "this and that" I
couldn't get it working (in runtime suspend).

Long story short - after implementing `panfrost_{job,mmu,gpu}_suspend_irq()`
(only gpu irq, as you said, is a half solution), I can not only turn off
clocks, but even turn off GPU power supplies entirely, bringing the power
consumption of the GPU itself during *runtime* suspend to ... zero.

The result of this test makes me truly happy, even though a complete power
cut during runtime suspend may not be feasible for other reasons (it takes
~200000ns on AVG, MIN ~160000ns, but the MAX is ~475000ns - and beware that
I haven't run that for long, I'd expect to get up to 1-1.5ms as max time, so
that's a big no).

This means that I will take a day or two and I'll push both the "simple" fix
for the Really-power-off and also some more commits to add the full irq sync.

Cheers!
Angelo

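One way to picture what the `*_suspend_irq()` helpers mentioned in the update need to achieve is the user-space model below. It is a sketch under assumptions, not the code from the series: `struct fake_gpu`, `fake_gpu_suspend_irq()`, `fake_gpu_resume_irq()` and `fake_gpu_irq_handler()` are hypothetical names. A real driver would additionally have to wait for a handler that is already running to finish (in the kernel, `synchronize_irq()` serves that purpose), which this single-threaded model does not show.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for one interrupt-generating GPU block. */
struct fake_gpu {
	uint32_t int_mask;     /* models an INT_MASK register: set bit = enabled */
	uint32_t int_rawstat;  /* models the raw interrupt status */
	bool irq_suspended;    /* set while the device is (being) suspended */
	unsigned int handled;  /* number of interrupts actually serviced */
};

/* Flag the block as suspended, then mask every interrupt source. */
static void fake_gpu_suspend_irq(struct fake_gpu *gpu)
{
	gpu->irq_suspended = true;
	gpu->int_mask = 0;
}

/* Clear the flag and restore the mask chosen by the caller. */
static void fake_gpu_resume_irq(struct fake_gpu *gpu, uint32_t mask)
{
	gpu->irq_suspended = false;
	gpu->int_mask = mask;
}

/* Returns true if an enabled interrupt was pending and got serviced. */
static bool fake_gpu_irq_handler(struct fake_gpu *gpu)
{
	uint32_t pending = gpu->int_rawstat & gpu->int_mask;

	/* Never touch the hardware state once suspend has started. */
	if (gpu->irq_suspended || !pending)
		return false;

	gpu->int_rawstat &= ~pending;  /* acknowledge what we serviced */
	gpu->handled++;
	return true;
}
```

The point of checking the flag in the handler, in addition to masking, is that a stray or shared interrupt arriving after suspend must not cause register accesses on a block whose clocks or power may already be off.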
I'm not sure that it's worth changing this like that, then changing it back
right after :-\

Anyway, if anyone else agrees with doing it and then partially reverting, I
have no issues going with this one instead; what I care about ultimately is
resolving the regression ASAP :-)

Cheers,
Angelo

 
 	/*
 	 * All in-flight jobs should have released their cycle
@@ -425,11 +428,10 @@ void panfrost_gpu_power_on(struct panfrost_device *pfdev)
 
 void panfrost_gpu_power_off(struct panfrost_device *pfdev)
 {
-	u64 core_mask = panfrost_get_core_mask(pfdev);
 	int ret;
 	u32 val;
 
-	gpu_write(pfdev, SHADER_PWROFF_LO, pfdev->features.shader_present & core_mask);
+	gpu_write(pfdev, SHADER_PWROFF_LO, pfdev->features.shader_present);
 	ret = readl_relaxed_poll_timeout(pfdev->iomem + SHADER_PWRTRANS_LO,
 					 val, !val, 1, 1000);
 	if (ret)
@@ -441,7 +443,7 @@ void panfrost_gpu_power_off(struct panfrost_device *pfdev)
 	if (ret)
 		dev_err(pfdev->dev, "tiler power transition timeout");
 
-	gpu_write(pfdev, L2_PWROFF_LO, pfdev->features.l2_present & core_mask);
+	gpu_write(pfdev, L2_PWROFF_LO, pfdev->features.l2_present);
 	ret = readl_poll_timeout(pfdev->iomem + L2_PWRTRANS_LO,
 				 val, !val, 0, 1000);
 	if (ret)
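The power-off hunks above write a PWROFF register and then poll the matching PWRTRANS register until it reads zero or a timeout expires. The user-space sketch below models just that poll-with-timeout pattern. `struct fake_reg`, `fake_reg_read()` and `poll_until_zero()` are hypothetical stand-ins, and the loop counts polls rather than measuring microseconds the way the kernel's `readl_poll_timeout()` family does.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Models a PWRTRANS-style register: non-zero while a transition is in flight. */
struct fake_reg {
	uint32_t value;
	unsigned int reads_until_idle; /* reads left before the register clears */
};

/* Each read moves the simulated transition one step closer to completion. */
static uint32_t fake_reg_read(struct fake_reg *reg)
{
	if (reg->reads_until_idle > 0)
		reg->reads_until_idle--;
	if (reg->reads_until_idle == 0)
		reg->value = 0;
	return reg->value;
}

/*
 * Poll until the register reads zero, giving up after max_polls reads.
 * Returns 0 on success and -ETIMEDOUT if the register never cleared.
 */
static int poll_until_zero(struct fake_reg *reg, unsigned int max_polls)
{
	unsigned int i;

	for (i = 0; i < max_polls; i++) {
		if (fake_reg_read(reg) == 0)
			return 0;
	}
	return -ETIMEDOUT;
}
```

A caller would treat a non-zero return the way the driver code in the diff does: report the timeout (there via dev_err()) and carry on, since there is nothing more to wait for.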
