On Thu, Mar 5, 2026 at 12:48 AM Yang Wang <[email protected]> wrote:
>
> Older versions of the MES firmware may cause abnormal GPU power consumption.
> When performing inference tasks on the GPU (e.g., with Ollama using ROCm),
> the GPU may show abnormal power consumption in idle state and incorrect GPU
> load information.
> This issue has been fixed in firmware version 0x8b and newer.
>
> Signed-off-by: Yang Wang <[email protected]>

Acked-by: Alex Deucher <[email protected]>

> ---
>  drivers/gpu/drm/amd/amdgpu/mes_v12_0.c | 5 ++++-
>  1 file changed, 4 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/mes_v12_0.c b/drivers/gpu/drm/amd/amdgpu/mes_v12_0.c
> index 5bfa5d1d0b36..023c7345ea54 100644
> --- a/drivers/gpu/drm/amd/amdgpu/mes_v12_0.c
> +++ b/drivers/gpu/drm/amd/amdgpu/mes_v12_0.c
> @@ -731,6 +731,9 @@ static int mes_v12_0_set_hw_resources(struct amdgpu_mes *mes, int pipe)
>  	int i;
>  	struct amdgpu_device *adev = mes->adev;
>  	union MESAPI_SET_HW_RESOURCES mes_set_hw_res_pkt;
> +	uint32_t mes_rev = (pipe == AMDGPU_MES_SCHED_PIPE) ?
> +		(mes->sched_version & AMDGPU_MES_VERSION_MASK) :
> +		(mes->kiq_version & AMDGPU_MES_VERSION_MASK);
>
>  	memset(&mes_set_hw_res_pkt, 0, sizeof(mes_set_hw_res_pkt));
>
> @@ -785,7 +788,7 @@ static int mes_v12_0_set_hw_resources(struct amdgpu_mes *mes, int pipe)
>  	 * handling support, other queue will not use the oversubscribe timer.
>  	 * handling mode - 0: disabled; 1: basic version; 2: basic+ version
>  	 */
> -	mes_set_hw_res_pkt.oversubscription_timer = 50;
> +	mes_set_hw_res_pkt.oversubscription_timer = mes_rev < 0x8b ? 0 : 50;
>  	mes_set_hw_res_pkt.unmapped_doorbell_handling = 1;
>
>  	if (amdgpu_mes_log_enable) {
> --
> 2.47.3
>
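
For readers skimming the thread, the version gate in the quoted patch can be sketched as standalone C. This is only an illustration, not the driver code: every sketch_/SKETCH_ name is hypothetical, and the mask and pipe values are assumptions standing in for AMDGPU_MES_VERSION_MASK and AMDGPU_MES_SCHED_PIPE, whose real definitions live in the amdgpu headers.

```c
#include <stdint.h>

/* Assumed stand-ins for the driver's constants (illustrative values only). */
#define SKETCH_MES_VERSION_MASK 0x00000fffu /* assumed revision bit field */
#define SKETCH_MES_FIXED_REV    0x8bu       /* first fixed firmware, per the patch */

/* Hypothetical pipe identifiers mirroring the scheduler and KIQ pipes. */
enum sketch_mes_pipe { SKETCH_SCHED_PIPE = 0, SKETCH_KIQ_PIPE = 1 };

/* Hypothetical subset of the per-device MES state used by the gate. */
struct sketch_mes {
	uint32_t sched_version;
	uint32_t kiq_version;
};

/* Pick the firmware version for the given pipe and keep only the revision bits. */
static uint32_t sketch_mes_rev(const struct sketch_mes *mes, int pipe)
{
	uint32_t raw = (pipe == SKETCH_SCHED_PIPE) ? mes->sched_version
						   : mes->kiq_version;

	return raw & SKETCH_MES_VERSION_MASK;
}

/* Timer value as in the patch: 0 before revision 0x8b, 50 from 0x8b onward. */
static uint32_t sketch_oversubscription_timer(const struct sketch_mes *mes,
					      int pipe)
{
	return sketch_mes_rev(mes, pipe) < SKETCH_MES_FIXED_REV ? 0 : 50;
}
```

Under these assumptions, a pipe whose masked revision is 0x8a gets a timer value of 0, while one at 0x8b or newer keeps the previous value of 50.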
