On 24/10/2025 12:51, Karunika Choo wrote:
> On 24/10/2025 10:43, Steven Price wrote:
>> On 23/10/2025 23:16, Karunika Choo wrote:
>>> On 20/10/2025 11:50, Steven Price wrote:
>>>> On 14/10/2025 10:43, Karunika Choo wrote:
>>>>> This patch adds common helpers to issue power commands, poll
>>>>> transitions, and validate domain state, then wires them into the L2
>>>>> on/off paths.
>>>>>
>>>>> The L2 power-on sequence now delegates control of the SHADER and TILER
>>>>> domains to the MCU when allowed, while the L2 itself is never delegated.
>>>>> On power-off, dependent domains beneath the L2 are checked, and if
>>>>> necessary, retracted and powered down to maintain proper domain
>>>>> ordering.
>>>>>
>>>>> Signed-off-by: Karunika Choo <[email protected]>
>>>>> ---
>> [...]
>>>>> +         u64 domain_ready = gpu_read64(ptdev, 
>>>>> get_domain_ready_reg(child_domain));
>>>>> +
>>>>> +         if (domain_ready && (pwr_status & 
>>>>> PWR_STATUS_DOMAIN_DELEGATED(child_domain))) {
>>>>> +                 drm_warn(&ptdev->base,
>>>>> +                          "L2 power off: Delegated %s domain not powered 
>>>>> down by MCU",
>>>>> +                          get_domain_name(child_domain));
>>>>> +                 ret = retract_domain(ptdev, child_domain);
>>>>> +                 if (ret) {
>>>>> +                         drm_err(&ptdev->base, "Failed to retract %s 
>>>>> domain",
>>>>> +                                 get_domain_name(child_domain));
>>>>> +                         panthor_pwr_info_show(ptdev);
>>>>> +                         return ret;
>>>>> +                 }
>>>>> +         }
>>>>> +
>>>>> +         ret = panthor_pwr_domain_power_off(ptdev, child_domain, 
>>>>> domain_ready,
>>>>> +                                            PWR_TRANSITION_TIMEOUT_US);
>>>>> +         if (ret)
>>>>> +                 return ret;
>>>>> + }
>>>>> +
>>>>> + return panthor_pwr_domain_power_off(ptdev, PWR_COMMAND_DOMAIN_L2,
>>>>> +                                     ptdev->gpu_info.l2_present,
>>>>> +                                     PWR_TRANSITION_TIMEOUT_US);
>>>>
>>>> Does this implicitly 'retract' the shader/tiler power domains? If so I
>>>> think it's worth a comment. Otherwise it looks like we don't actually
>>>> know the status of whether the shader/tiler power domains are retracted
>>>> or not.
>>>>
>>>
>>> panthor_pwr_l2_power_off() will only retract the shader/tiler domains if
>>> they have not been powered down by the MCU. In cases where the MCU did
>>> power down these child domains, delegate_domain() will exit early as
>>> they would already be delegated. I understand the ambiguity here,
>>> hopefully it is somewhat acceptable.
>>
>> So my question was really how does the driver know whether the domains
>> are delegated or not when this function returns?
>>
>> I couldn't quite get my head around whether turning the L2 power domain
>> off would implicitly 'retract' the shader/tiler power domains -
>> obviously it forces them off which means the MCU doesn't have control.
>> So retracting would make sense, but I couldn't see anything in the spec.
>>
>> It would be good to have a comment explaining what the expected state is
>> after this function (panthor_pwr_l2_power_off) returns. Is it unknown
>> whether the shader/tiler are retracted, or is there something in the
>> hardware which does this automatically so we know but don't have to
>> manually retract? Presumably if we end up fully powering down the GPU
>> that must effectively retract all domains (the GPU hardware is reset so
>> it goes back to reset conditions).
>>
>> Sorry, it's a bit of a basic question but the spec is somewhat unhelpful
>> on this point! (Or at least I haven't found a relevant statement).
>>
> 
> Powering off the L2 does not automatically retract its child domains.
> The above case is for handling the edge case where the MCU is hung and
> is not able to power off the delegated domains, therefore the host needs
> to take over and power them down before turning off the L2. Additionally,
> like you have alluded to, powering off the GPU will inevitably reset all
> of these states (retracting the child domains), necessitating a
> re-delegation on L2 power on.
> 
> Therefore, the typical operation loop will be as follows:
>  1. L2 power on
>  2. Delegate Tiler/Shader
>  <suspend>
>  3. Halt MCU (should power down Tiler/Shader)
>  4. L2 power off (no retract of Tiler/Shader)
>  <resume>
>  5. L2 power on (next resume)
>  6. Delegate Tiler/Shader (skipped as already delegated)
> 
> If the MCU is hung:
>  1. L2 power on
>  2. Delegate Tiler/Shader
>  <suspend>
>  3. Halt MCU fails
>  4. L2 power off (Retract and power off Shader/Tiler)
>  <resume>
>  5. L2 power on
>  6. Delegate Tiler/Shader
> 
> If the GPU is turned off between suspend and resume:
>  1. L2 power on
>  2. Delegate Tiler/Shader
>  <suspend>
>  3. Halt MCU (should power down Tiler/Shader)
>  4. L2 power off (no retract of Tiler/Shader)
>  <GPU turned off>
>  <resume>
>  6. L2 power on
>  7. Delegate Tiler/Shader

Thanks for the explanation!

> 
> With the current implementation, we cannot expect it to be always
> retracted on return of the function, but it does provide the
> additional benefit that on resume we don't need to go through the
> whole delegate cycle after powering up the L2, allowing us to
> save some time there.
> 
> On the other hand, if we want to explicitly enforce that we retract on 
> suspend, then we have to accept the additional cost to delegate the
> domains on resume.

No, there's no need to change it. But I think it's worth a comment that
in the usual case (the MCU isn't hung) the shader/tiler are left
delegated and the attempt to delegate them again will detect this and
skip it.

I think what mostly confused me is that delegate_domain() has the following:

> +     if (pwr_status_reg & delegated_mask) {
> +             drm_dbg(&ptdev->base, "%s domain already delegated",
> +                     get_domain_name(domain));
> +             return 0;
> +     }

Although it's "drm_dbg" that message makes it seem like this is an
unexpected situation. Whereas actually we would normally expect that to
happen during a resume (as long as the GPU remains powered). With that
in my head I then started to think that there might be something in the
hardware causing an automatic "retract".

Thanks,
Steve

Reply via email to