Hi,

On Thursday, 4 September 2025 at 08:22:24 CEST, Boris Brezillon wrote:
> On Wed, 3 Sep 2025 23:44:59 +0200
> Marek Vasut <[email protected]> wrote:
> 
> > On 3/25/25 3:52 PM, Boris Brezillon wrote:
> > 
> > Hello Boris,
> > 
> > sorry for the late reply.
> > 
> > >>>>>>> Hm, that might be the cause of the fast reset issue (which is a fast
> > >>>>>>> resume more than a fast reset BTW): if you re-assert the reset line on
> > >>>>>>> runtime suspend, I guess this causes a full GPU reset, and the MCU ends
> > >>>>>>> up in a state where it needs a slow reset (all data sections reset to
> > >>>>>>> their initial state). Can you try to move the reset_control_[de]assert
> > >>>>>>> to the unplug/init functions?
> > >>>>>> Is it correct to assume, that if I remove all reset_control_assert()
> > >>>>>> calls (and keep only the _deassert() calls), the slow resume problem
> > >>>>>> should go away too ?
> > >>>>> 
> > >>>>> Yeah, dropping the _assert()s should do the trick.
> > >>>> Hmmm, no, that does not help. I was hoping maybe NXP can chime in and
> > >>>> suggest something too ?
> > >>> 
> > >>> Can you try keep all the clks/regulators/power-domains/... on after
> > >>> init, and see if the fast resume works with that. If it does,
> > >>> re-introduce one resource at a time to find out which one causes the
> > >>> MCU to lose its state.
> > >> 
> > >> I already tried that too. I spent quite a while until I reached that
> > >> workaround in fact.
> > > 
> > > So, with your RPM suspend/resume being NOPs, it still doesn't work?
> > > Unless the FW is doing something behind our back, I don't really see
> > > why this would fail on your platform, but not on the rk3588. Are you
> > > sure the power domains are kept on at all times? I'm asking, because if
> > > you linked all the PDs, the on/off sequence is automatically handled by
> > > the RPM core at suspend/resume time.
> > 
> > I revisited this now.
> > 
> > Can you please test the following patch (also attached) on one of your
> > devices, and tell me what the status is at the end. The diff sets the
> > GLB_HALT bit and then clears it again, which I suspect should first halt
> > the GPU and (this is what I am unsure about) then again un-halt/resume
> > the GPU ?
> 
> It doesn't work like that. What you're describing is like executing
> "shutdown" on your terminal and then typing "boot" on the keyboard
> after your computer has been shut down.
> 
> > "
> > diff --git a/drivers/gpu/drm/panthor/panthor_fw.c b/drivers/gpu/drm/panthor/panthor_fw.c
> > index 9bf06e55eaeea..57c0d4fd29aa2 100644
> > --- a/drivers/gpu/drm/panthor/panthor_fw.c
> > +++ b/drivers/gpu/drm/panthor/panthor_fw.c
> > @@ -1087,8 +1087,16 @@ void panthor_fw_pre_reset(struct panthor_device *ptdev, bool on_hang)
> >  	struct panthor_fw_global_iface *glb_iface = panthor_fw_get_glb_iface(ptdev);
> >  	u32 status;
> > 
> > +	pr_err("%s[%d] pre-halt status=%x\n", __func__, __LINE__, gpu_read(ptdev, MCU_STATUS));
> > +
> >  	panthor_fw_update_reqs(glb_iface, req, GLB_HALT, GLB_HALT);
> >  	gpu_write(ptdev, CSF_DOORBELL(CSF_GLB_DOORBELL_ID), 1);
> > +	mdelay(100);
> > +	pr_err("%s[%d] likely-halted status=%x\n", __func__, __LINE__, gpu_read(ptdev, MCU_STATUS));
> > +	panthor_fw_update_reqs(glb_iface, req, 0, GLB_HALT);
> > +	mdelay(100);
> > +	pr_err("%s[%d] likely-running ? status=%x\n", __func__, __LINE__, gpu_read(ptdev, MCU_STATUS));
> > +
> >  	if (!gpu_read_poll_timeout(ptdev, MCU_STATUS, status,
> >  				   status == MCU_STATUS_HALT, 10,
> >  				   100000)) {
> > "
> > 
> > In my case, the relevant output looks like this:
> > 
> > "
> > [    3.326805] panthor_fw_pre_reset[1090] pre-halt status=1
> > [    3.432151] panthor_fw_pre_reset[1095] likely-halted status=2
> > [    3.542179] panthor_fw_pre_reset[1098] likely-running ? status=2
> > "
> > 
> > That means the GPU remains halted at the end, even if the "GLB_HALT"
> > bit is cleared before the last print. The clearing of GLB_HALT is also
> > what panthor_fw_post_reset() does.
> 
> After the halt has been processed by the FW, the memory region where
> you check the halt status again is inert, since the micro-controller
> (MCU) supposed to update those bits is off at this point. The FW
> interface is really just a shared memory region between the CPU and
> MCU, nothing more.
> 
> > I suspect the extra soft reset I did before "un-halted" the GPU and
> > allowed it to proceed.
> 
> Hm, not quite. I mean, you still need to explicitly boot the MCU after
> a reset, which is what the write to MCU_CONTROL [1] does. What the
> soft-reset does though, is reset all GPU blocks, including the MCU.
> This means the MCU starts from a fresh state when you reach [1].
> 
> If I had to guess, I'd say something is messed up when the GPU is
> halted, and you need a soft-reset to recover from that. Unfortunately,
> I don't know enough about what your FW is doing to help. Maybe
> Arm/Freescale could...
> 
> > I wonder if there is some way to un-halt the GPU using some gpu_write()
> > direct register access, is there ?
> 
> That's MCU_CONTROL, yes. And it's done here [1] already.
> 
> > Maybe the GPU remains halted because
> > setting the GLB_HALT stops command stream processing, and the GPU never
> > samples the clearing of GLB_HALT and therefore remains halted forever ?
> 
> Exactly that, and that's expected.

FYI: in a new release of the System Manager (SM) software (starting from
lf-6.12.3-1.0.0), the GPU reset is already reasserted in the SM software [1],
and access to the GPU block control has been removed from the Cortex-A
side [2]. Starting from the B0 step, this version is required AFAIK.

Best regards
Alexander

[1] https://github.com/nxp-imx/imx-sm/commit/2dcc0409ede82eef54857be50daa588b23b3ba7b
[2] https://github.com/nxp-imx/imx-sm/commit/a3e5da9ea51144f513ac3909fa151fa7df394100

--
TQ-Systems GmbH | Mühlstraße 2, Gut Delling | 82229 Seefeld, Germany
Amtsgericht München, HRB 105018
Geschäftsführer: Detlef Schneider, Rüdiger Stahl, Stefan Schneider
http://www.tq-group.com/
