On Thu, Mar 5, 2026 at 12:48 AM Yang Wang <[email protected]> wrote:
>
> Older versions of the MES firmware may cause abnormal GPU power consumption.
> When performing inference tasks on the GPU (e.g., with Ollama using ROCm),
> the GPU may show abnormal power consumption in idle state and incorrect GPU 
> load information.
> This issue has been fixed in firmware version 0x8b and newer.
>
> Signed-off-by: Yang Wang <[email protected]>

Acked-by: Alex Deucher <[email protected]>

> ---
>  drivers/gpu/drm/amd/amdgpu/mes_v12_0.c | 5 ++++-
>  1 file changed, 4 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/mes_v12_0.c b/drivers/gpu/drm/amd/amdgpu/mes_v12_0.c
> index 5bfa5d1d0b36..023c7345ea54 100644
> --- a/drivers/gpu/drm/amd/amdgpu/mes_v12_0.c
> +++ b/drivers/gpu/drm/amd/amdgpu/mes_v12_0.c
> @@ -731,6 +731,9 @@ static int mes_v12_0_set_hw_resources(struct amdgpu_mes *mes, int pipe)
>         int i;
>         struct amdgpu_device *adev = mes->adev;
>         union MESAPI_SET_HW_RESOURCES mes_set_hw_res_pkt;
> +       uint32_t mes_rev = (pipe == AMDGPU_MES_SCHED_PIPE) ?
> +               (mes->sched_version & AMDGPU_MES_VERSION_MASK) :
> +               (mes->kiq_version & AMDGPU_MES_VERSION_MASK);
>
>         memset(&mes_set_hw_res_pkt, 0, sizeof(mes_set_hw_res_pkt));
>
> @@ -785,7 +788,7 @@ static int mes_v12_0_set_hw_resources(struct amdgpu_mes *mes, int pipe)
>          * handling support, other queue will not use the oversubscribe timer.
>          * handling  mode - 0: disabled; 1: basic version; 2: basic+ version
>          */
> -       mes_set_hw_res_pkt.oversubscription_timer = 50;
> +       mes_set_hw_res_pkt.oversubscription_timer = mes_rev < 0x8b ? 0 : 50;
>         mes_set_hw_res_pkt.unmapped_doorbell_handling = 1;
>
>         if (amdgpu_mes_log_enable) {
> --
> 2.47.3
>
