Launchpad has imported 94 comments from the remote bug at
https://bugzilla.kernel.org/show_bug.cgi?id=218305.

------------------------------------------------------------------------
On 2023-12-24T07:21:48+00:00 aros wrote:

I'm almost sure it's a bug in the firmware but since I cannot make HP
fix it, I'll try to report it here.

The CPU gets stuck at this extremely low frequency after N
suspend/resume cycles, where N can be anywhere from 1 to 5.

The laptop is plugged in at all times.

This is happening with both acpi-cpufreq and amd-pstate-epp.

# cpupower frequency-info
analyzing CPU 10:
  driver: amd-pstate-epp
  CPUs which run at the same hardware frequency: 10
  CPUs which need to have their frequency coordinated by software: 10
  maximum transition latency:  Cannot determine or is not supported.
  hardware limits: 400 MHz - 6.08 GHz
  available cpufreq governors: performance powersave
  current policy: frequency should be within 400 MHz and 6.08 GHz.
                  The governor "powersave" may decide which speed to use
                  within this range.
  current CPU frequency: Unable to call hardware
  current CPU frequency: 544 MHz (asserted by call to kernel)
  boost state support:
    Supported: yes
    Active: yes
    AMD PSTATE Highest Performance: 232. Maximum Frequency: 6.08 GHz.
    AMD PSTATE Nominal Performance: 145. Nominal Frequency: 3.80 GHz.
    AMD PSTATE Lowest Non-linear Performance: 42. Lowest Non-linear Frequency: 1.10 GHz.
    AMD PSTATE Lowest Performance: 16. Lowest Frequency: 400 MHz.

Some CPU parameters look completely wrong after it happens:

# ryzenadj -i
|        Name         |   Value   |     Parameter      |
|---------------------|-----------|--------------------|
| STAPM LIMIT         |    30.000 | stapm-limit        |
| STAPM VALUE         |     4.181 |                    |
| PPT LIMIT FAST      |    30.000 | fast-limit         |
| PPT VALUE FAST      |     5.347 |                    |
| PPT LIMIT SLOW      |    20.000 | slow-limit         |
| PPT VALUE SLOW      |     3.747 |                    |
| StapmTimeConst      |       nan | stapm-time         |
| SlowPPTTimeConst    |       nan | slow-time          |
| PPT LIMIT APU       |       nan | apu-slow-limit     |
| PPT VALUE APU       |       nan |                    |
| TDC LIMIT VDD       |       nan | vrm-current        |
| TDC VALUE VDD       |       nan |                    |
| TDC LIMIT SOC       |       nan | vrmsoc-current     |
| TDC VALUE SOC       |       nan |                    |
| EDC LIMIT VDD       |       nan | vrmmax-current     |
| EDC VALUE VDD       |       nan |                    |
| EDC LIMIT SOC       |       nan | vrmsocmax-current  |
| EDC VALUE SOC       |       nan |                    |
| THM LIMIT CORE      |       nan | tctl-temp          |
| THM VALUE CORE      |       nan |                    |
| STT LIMIT APU       |       nan | apu-skin-temp      |
| STT VALUE APU       |       nan |                    |
| STT LIMIT dGPU      |       nan | dgpu-skin-temp     |
| STT VALUE dGPU      |       nan |                    |
| CCLK Boost SETPOINT |       nan | power-saving /     |
| CCLK BUSY VALUE     |       nan | max-performance    |

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2088733/comments/0

------------------------------------------------------------------------
On 2023-12-24T07:30:38+00:00 aros wrote:

I do use this command to constrain CPU thermals:

ryzenadj --tctl-temp=75 --stapm-limit=30000 --fast-limit=30000 --slow-limit=20000

https://github.com/FlyGoat/RyzenAdj

Perhaps on resume the firmware sees altered limits and wreaks havoc on
everything.

These last three parameters suddenly become read only after the bug
occurs.

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2088733/comments/1

------------------------------------------------------------------------
On 2023-12-24T15:03:36+00:00 W_Armin wrote:

Does the issue also happen if you don't use ryzenadj?

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2088733/comments/2

------------------------------------------------------------------------
On 2023-12-24T15:12:47+00:00 aros wrote:

(In reply to Armin Wolf from comment #2)
> Does the issue also happen if you don't use ryzenadj?

Yes, today a single suspend/resume cycle was enough to trigger this
bug.

This is the result of this bug (no settings were altered beforehand):

ryzenadj -i
CPU Family: Phoenix
SMU_SERVICE REQ_ID:0x3
SMU_SERVICE REQ: arg0: 0x0, arg1:0x0, arg2:0x0, arg3:0x0, arg4: 0x0, arg5: 0x0
SMU_SERVICE REP: REP: 0x1, arg0: 0xe, arg1:0x0, arg2:0x0, arg3:0x0, arg4: 0x0, arg5: 0x0
SMU BIOS Interface Version: 14
Version: v0.14.0 
init_table
SMU_SERVICE REQ_ID:0x6
SMU_SERVICE REQ: arg0: 0x0, arg1:0x0, arg2:0x0, arg3:0x0, arg4: 0x0, arg5: 0x0
SMU_SERVICE REP: REP: 0x1, arg0: 0x4c0008, arg1:0x0, arg2:0x0, arg3:0x0, arg4: 0x0, arg5: 0x0
SMU_SERVICE REQ_ID:0x66
SMU_SERVICE REQ: arg0: 0x0, arg1:0x0, arg2:0x0, arg3:0x0, arg4: 0x0, arg5: 0x0
SMU_SERVICE REP: REP: 0x1, arg0: 0x9e300000, arg1:0x7, arg2:0x0, arg3:0x0, arg4: 0x0, arg5: 0x0
SMU_SERVICE REQ_ID:0x65
SMU_SERVICE REQ: arg0: 0x0, arg1:0x0, arg2:0x0, arg3:0x0, arg4: 0x0, arg5: 0x0
SMU_SERVICE REP: REP: 0xfd, arg0: 0x0, arg1:0x0, arg2:0x0, arg3:0x0, arg4: 0x0, arg5: 0x0
SMU_SERVICE REQ_ID:0x65
SMU_SERVICE REQ: arg0: 0x0, arg1:0x0, arg2:0x0, arg3:0x0, arg4: 0x0, arg5: 0x0
SMU_SERVICE REP: REP: 0x1, arg0: 0x0, arg1:0x0, arg2:0x0, arg3:0x0, arg4: 0x0, arg5: 0x0
PM Table Version: 4c0008
SMU_SERVICE REQ_ID:0x65
SMU_SERVICE REQ: arg0: 0x0, arg1:0x0, arg2:0x0, arg3:0x0, arg4: 0x0, arg5: 0x0
SMU_SERVICE REP: REP: 0x1, arg0: 0x0, arg1:0x0, arg2:0x0, arg3:0x0, arg4: 0x0, arg5: 0x0
|        Name         |   Value   |     Parameter      |
|---------------------|-----------|--------------------|
| STAPM LIMIT         |    51.000 | stapm-limit        |
| STAPM VALUE         |     4.150 |                    |
| PPT LIMIT FAST      |    51.000 | fast-limit         |
| PPT VALUE FAST      |     6.040 |                    |
| PPT LIMIT SLOW      |    41.000 | slow-limit         |
| PPT VALUE SLOW      |     4.056 |                    |
| StapmTimeConst      |       nan | stapm-time         |
| SlowPPTTimeConst    |       nan | slow-time          |
| PPT LIMIT APU       |       nan | apu-slow-limit     |
| PPT VALUE APU       |       nan |                    |
| TDC LIMIT VDD       |       nan | vrm-current        |
| TDC VALUE VDD       |       nan |                    |
| TDC LIMIT SOC       |       nan | vrmsoc-current     |
| TDC VALUE SOC       |       nan |                    |
| EDC LIMIT VDD       |       nan | vrmmax-current     |
| EDC VALUE VDD       |       nan |                    |
| EDC LIMIT SOC       |       nan | vrmsocmax-current  |
| EDC VALUE SOC       |       nan |                    |
| THM LIMIT CORE      |       nan | tctl-temp          |
| THM VALUE CORE      |       nan |                    |
| STT LIMIT APU       |       nan | apu-skin-temp      |
| STT VALUE APU       |       nan |                    |
| STT LIMIT dGPU      |       nan | dgpu-skin-temp     |
| STT VALUE dGPU      |       nan |                    |
| CCLK Boost SETPOINT |       nan | power-saving /     |
| CCLK BUSY VALUE     |       nan | max-performance    |


STAPM, PPT FAST and PPT SLOW all have broken values.

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2088733/comments/3

------------------------------------------------------------------------
On 2023-12-24T15:37:25+00:00 W_Armin wrote:

Could be that the firmware fails to properly restore those values after
suspend. Does the issue also happen under Windows?

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2088733/comments/4

------------------------------------------------------------------------
On 2023-12-24T15:38:35+00:00 aros wrote:

(In reply to Armin Wolf from comment #4)
> Could be that the firmware fails to properly restore those values after
> suspend, does the issue also happen under Windows?

I rarely boot into Windows but I may check.

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2088733/comments/5

------------------------------------------------------------------------
On 2023-12-25T10:46:21+00:00 aros wrote:

I've not been able to reproduce this issue under Windows, but then I
didn't try very hard (a proper test would mean multiple attempts
spanning several days).

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2088733/comments/6

------------------------------------------------------------------------
On 2023-12-25T10:53:41+00:00 W_Armin wrote:

Have you verified that you are using the latest BIOS for your machine?

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2088733/comments/7

------------------------------------------------------------------------
On 2023-12-25T14:42:34+00:00 aros wrote:

The issue is reproducible with the latest BIOS release (V82: 01.03.09
Rev.A, released on Dec 15, 2023) and two versions prior. HP doesn't
allow downloading or flashing earlier versions.

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2088733/comments/8

------------------------------------------------------------------------
On 2023-12-26T04:21:52+00:00 shyam-sundar.s-k wrote:

Since you are pointing to the STAPM and PPT limits, can you blacklist
the amd_pmf driver and see if that helps after the suspend/resume cycle?

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2088733/comments/9

------------------------------------------------------------------------
On 2023-12-26T15:18:22+00:00 aros wrote:

This is reproducible without the amd-pmf module:

[root@hp policy0]# lsmod | grep pmf
[root@hp policy0]# cpupower frequency-info
analyzing CPU 13:
  driver: amd-pstate-epp
  CPUs which run at the same hardware frequency: 13
  CPUs which need to have their frequency coordinated by software: 13
  maximum transition latency:  Cannot determine or is not supported.
  hardware limits: 400 MHz - 5.76 GHz
  available cpufreq governors: performance powersave
  current policy: frequency should be within 400 MHz and 5.76 GHz.
                  The governor "powersave" may decide which speed to use
                  within this range.
  current CPU frequency: Unable to call hardware
  current CPU frequency: 542 MHz (asserted by call to kernel)
  boost state support:
    Supported: yes
    Active: yes
    AMD PSTATE Highest Performance: 220. Maximum Frequency: 5.76 GHz.
    AMD PSTATE Nominal Performance: 145. Nominal Frequency: 3.80 GHz.
    AMD PSTATE Lowest Non-linear Performance: 42. Lowest Non-linear Frequency: 1.10 GHz.
    AMD PSTATE Lowest Performance: 16. Lowest Frequency: 400 MHz.

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2088733/comments/10

------------------------------------------------------------------------
On 2023-12-26T15:25:27+00:00 aros wrote:

Created attachment 305659
The contents of /sys/devices/system/cpu/cpufreq/*

Switching between power modes using
/sys/devices/system/cpu/cpufreq/*/energy_performance_preference does
nothing.

The exact per CPU frequency stats:
# cat /sys/devices/system/cpu/cpufreq/policy*/scaling_cur_freq
400000
544395
400000
544099
400000
400000
400000
400000
400000
544189
400000
544181
400000
400000
544007
542947

No idea where 544MHz comes from.
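
A minimal Python sketch of how the per-policy readings above could be checked automatically. It assumes the standard cpufreq sysfs layout shown in the command above; the 600 MHz ceiling is an arbitrary threshold chosen for illustration, and the helper names are hypothetical. An idle machine can legitimately report low frequencies, so the check is only meaningful while the CPU is under load.

```python
import glob


def read_cur_freqs(base="/sys/devices/system/cpu/cpufreq"):
    """Return {policy_name: scaling_cur_freq in kHz} for every cpufreq policy."""
    freqs = {}
    for path in sorted(glob.glob(f"{base}/policy*/scaling_cur_freq")):
        policy = path.split("/")[-2]
        with open(path) as f:
            freqs[policy] = int(f.read().strip())
    return freqs


def all_pinned_low(freqs_khz, ceiling_khz=600_000):
    """True if no policy reports a frequency above the (assumed) ceiling."""
    return all(f <= ceiling_khz for f in freqs_khz)


# Values copied from the scaling_cur_freq listing in this comment.
reported = [400000, 544395, 400000, 544099, 400000, 400000, 400000, 400000,
            400000, 544189, 400000, 544181, 400000, 400000, 544007, 542947]
print(all_pinned_low(reported))
```

On a live system, `all_pinned_low(read_cur_freqs().values())` would run the same check against sysfs.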

BTW here's another bug, either firmware or something in the kernel
reports wrong max frequency:

# cat /sys/devices/system/cpu/cpufreq/policy*/scaling_max_freq
5137000
6080000
6080000
5764000
5764000
5924000
5924000
5137000
6080000
6080000
5608000
5608000
5293000
5293000
5449000
5449000

I'm not aware of any Zen 4 CPUs which can run at 6080000 kHz by
default, let alone mobile parts.
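
One way to see where these numbers may come from is sketched below. It assumes that the reported maximum is simply the nominal frequency scaled by the ratio of highest to nominal performance; that is my reading of the cpupower output in this thread (nominal 145 at 3.80 GHz, highest 232 at 6.08 GHz, highest 220 at 5.76 GHz), not something the thread confirms. The function name is hypothetical, and the kernel's own integer arithmetic may round slightly differently.

```python
def estimated_max_mhz(nominal_mhz, nominal_perf, highest_perf):
    """Scale the nominal frequency by highest_perf / nominal_perf (assumed linear)."""
    return nominal_mhz * highest_perf / nominal_perf


# Perf values taken from the cpupower frequency-info outputs in this thread.
for highest in (232, 220):
    print(highest, round(estimated_max_mhz(3800, 145, highest)))
```

If this assumption holds, the differing per-core maxima reflect differing per-core "highest performance" values rather than clocks the silicon can actually reach.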

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2088733/comments/11

------------------------------------------------------------------------
On 2023-12-26T15:29:47+00:00 aros wrote:

> BTW here's another bug, either firmware or something in the kernel reports
> wrong max frequency:

Max frequency is reported correctly only for two out of sixteen logical
cores. It's wrong for all other cores. Would be great if AMD fixed this.

Speaking of my firmware, it's:

https://support.hp.com/us-en/drivers/hp-elitebook-845-14-inch-g10-notebook-pc/2101628462

> Description:
> 
> This package is used to update the supported firmware on HP Business Notebook
> systems with a V82 family BIOS. This package is provided for supported
> computer systems that are running a supported operating system.
> 
> Fix and enhancements:
> 
> - Fixes an issue where the Performance page in AMD Software: Adrenalin
> Edition does not display correctly.
> - Adds the Gaming Optimized mode to video memory size.
> 
> - Includes the following firmware:
> AMD Graphics Output Protocol (GOP) Firmware, version 3.7.10
> AMD PSP Firmware, version 0.2D.6.6C
> AMD SMU Firmware, version 0.76.65.0
> Embedded Controller (EC) Firmware, version 60.28.00
> Intel/Realtek UEFI PXE ROM, version 2.041
> TI Power Delivery (PD) Firmware, version 4.1.0

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2088733/comments/12

------------------------------------------------------------------------
On 2024-01-03T00:03:22+00:00 dan.martins wrote:

I am seeing similar behaviour, to the extent that my CPU cores get
capped at some low frequency. Sometimes it is a few cores stuck at
~1600MHz, and sometimes it is all cores stuck at 544MHz. It typically
happens for me when rebooting. I tried suspend/resume several times but
could not reproduce that way.

CPU is an AMD Ryzen 5 7640U on a Framework 13 laptop. 6.6.8 kernel on
Fedora 39.

We may not be having the same issue, but in case it helps: I can get
all cores back to normal by switching the scaling_governor from
powersave to performance and back. I am using
"sudo cpupower frequency-set -g <GOV>" to switch it.

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2088733/comments/13

------------------------------------------------------------------------
On 2024-01-17T04:10:54+00:00 mario.limonciello wrote:

I read through this thread and I currently think that Artem and Dan have
encountered two separate bugs.

@Artem:

Under the presumption that ryzenadj is actually retrieving the correct
values for STAPM, PPT FAST, and PPT SLOW I want to ask if this is tied
to a specific power adapter, or sequence of events.  Like suspend on
power, resume on battery or suspend on battery resume on power.

If there is a linkage between any of those, then I think this is "most
likely" an HP EC bug.

@Dan,

Can you reproduce this if you manually always set the scaling governor
on all CPUs to "performance" before you reboot?

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2088733/comments/14

------------------------------------------------------------------------
On 2024-01-17T19:33:34+00:00 aros wrote:

(In reply to Mario Limonciello (AMD) from comment #14)
> Under the presumption that ryzenadj is actually retrieving the correct
> values for STAPM, PPT FAST, and PPT SLOW I want to ask if this is tied to a
> specific power adapter, or sequence of events.  Like suspend on power,
> resume on battery or suspend on battery resume on power.

My laptop is plugged in 100% of the time.

> 
> If there is a linkage between any of those, then I think this is "most
> likely" an HP EC bug.

I've given up on reporting bugs to HP. It's a complicated process which
takes forever. I bought this laptop and its maximum CPU frequency was
limited to 4.5GHz which took HP over four months to resolve and that was
at least reproducible under Linux and Windows.

This bug seems to affect only Linux or maybe I've not used Windows
enough to face it in the Microsoft OS.

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2088733/comments/15

------------------------------------------------------------------------
On 2024-01-19T07:31:13+00:00 muzhi.yu1 wrote:

I can reproduce Artem's issue on EliteBook 845 G10 (kernel 6.7.0 on
NixOS). Also Dan's workaround works for me 80% of the time, with only a
few times when I had to reboot to lift the cpufreq lock.

My max_freqs are also strange, regardless of whether cpufreq is capped
to 544MHz or not.

```
❯ cat /sys/devices/system/cpu/cpufreq/policy*/scaling_max_freq
5137000
5137000
6080000
6080000
5449000
5449000
5293000
5293000
5924000
5924000
6080000
6080000
5764000
5764000
5608000
5608000
```

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2088733/comments/16

------------------------------------------------------------------------
On 2024-01-19T12:45:48+00:00 dan.martins wrote:

(In reply to Muzhi Yu from comment #16)
> I can reproduce Artem's issue on EliteBook 845 G10 (kernel 6.7.0 on NixOS).
> Also Dan's workaround works for me 80% of the time, with only a few times
> when I had to reboot to lift the cpufreq lock.
> 

I have since found that I don't need to switch to the performance
governor at all. It is enough, in my case, to "reset" the scaling
governor to powersave. Just "sudo cpupower frequency-set -g powersave".
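
A minimal Python sketch of the same "re-set the governor" workaround, written directly against the cpufreq sysfs files instead of cpupower. It assumes that writing the governor name to each policy's `scaling_governor` file has the same effect as the cpupower command, which this thread does not confirm; the function name and the `base` parameter are hypothetical, and root is needed on a real system.

```python
import glob
import os


def rewrite_governor(governor="powersave",
                     base="/sys/devices/system/cpu/cpufreq"):
    """Write the governor name to every policy's scaling_governor file.

    Returns the list of files written. Re-writing the governor that is
    already active is intentional: that is the workaround described above.
    """
    written = []
    pattern = os.path.join(base, "policy*", "scaling_governor")
    for path in sorted(glob.glob(pattern)):
        with open(path, "w") as f:
            f.write(governor)
        written.append(path)
    return written
```

Calling `rewrite_governor()` with no arguments targets the real sysfs tree; passing a different `base` allows a dry run against a scratch directory.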

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2088733/comments/17

------------------------------------------------------------------------
On 2024-01-19T13:52:48+00:00 dan.martins wrote:

(In reply to Mario Limonciello (AMD) from comment #14)
> I read through this thread and I currently think that Artem and Dan have
> encountered two separate bugs.
> 
> @Artem:
> 
> Under the presumption that ryzenadj is actually retrieving the correct
> values for STAPM, PPT FAST, and PPT SLOW I want to ask if this is tied to a
> specific power adapter, or sequence of events.  Like suspend on power,
> resume on battery or suspend on battery resume on power.
> 
> If there is a linkage between any of those, then I think this is "most
> likely" an HP EC bug.
> 
> @Dan,
> 
> Can you reproduce this if you manually always set the scaling governor on
> all CPUs to "performance" before you reboot?

Mario,
I just tested setting the governor to performance before reboot and yes, it is
reproducible in that case too.
1. load the CPU and observe all cores can reach ~4GHz
2. set governor: sudo cpupower frequency-set -g performance
3. reboot
4. load the CPU and check frequencies: on first reboot, all cores hit the 4GHz
range. On second reboot, cores 6-11 can only reach ~1.7GHz.

This is in line with previous tests. It is inconsistent, and various
power settings don't seem to affect it (epp, platform_profile,
scaling_governor). It does seem much more likely to occur when on
battery, but will still happen sometimes when plugged in.

A couple of more recent observations:
- I don't need to toggle from performance to powersave to fix it. I can just
  "sudo cpupower frequency-set -g powersave" even when it is already reporting
  that it is using the powersave governor.
- on reboot, the scaling_governor is always showing powersave, even when I set
  it to performance before reboot.
- Using kernel 6.6.11 as of this morning for the above test

Thanks,
Dan

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2088733/comments/18

------------------------------------------------------------------------
On 2024-01-19T15:25:00+00:00 mario.limonciello wrote:

Can you please dump the values from all of these MSRs from userspace
while in a reproduced state?

#define MSR_AMD_CPPC_CAP1               0xc00102b0
#define MSR_AMD_CPPC_ENABLE             0xc00102b1
#define MSR_AMD_CPPC_CAP2               0xc00102b2
#define MSR_AMD_CPPC_REQ                0xc00102b3
#define MSR_AMD_CPPC_STATUS             0xc00102b4
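
A minimal Python sketch of dumping these registers from userspace. It assumes the `msr` kernel module is loaded so that `/dev/cpu/<n>/msr` exists, and that it runs as root; the helper names are hypothetical. Only the register addresses come from the list above.

```python
import glob
import os
import struct

CPPC_MSRS = {
    "MSR_AMD_CPPC_CAP1": 0xC00102B0,
    "MSR_AMD_CPPC_ENABLE": 0xC00102B1,
    "MSR_AMD_CPPC_CAP2": 0xC00102B2,
    "MSR_AMD_CPPC_REQ": 0xC00102B3,
    "MSR_AMD_CPPC_STATUS": 0xC00102B4,
}


def unpack_msr(raw):
    """Interpret 8 bytes read from the msr device as a little-endian u64."""
    return struct.unpack("<Q", raw)[0]


def read_msr(cpu, address):
    """Read one MSR on one CPU; the file offset is the register address."""
    fd = os.open(f"/dev/cpu/{cpu}/msr", os.O_RDONLY)
    try:
        return unpack_msr(os.pread(fd, 8, address))
    finally:
        os.close(fd)


def dump_all():
    """Print every CPPC MSR for every CPU that exposes an msr device."""
    cpus = sorted(int(p.split("/")[3]) for p in glob.glob("/dev/cpu/[0-9]*/msr"))
    for name, address in CPPC_MSRS.items():
        print(name)
        for cpu in cpus:
            print(f"  cpu{cpu}: {read_msr(cpu, address):x}")
```

Running `dump_all()` once in a good state and once in a reproduced state gives two listings that can be compared line by line.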

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2088733/comments/19

------------------------------------------------------------------------
On 2024-01-20T00:21:42+00:00 mario.limonciello wrote:

Can you guys please test this and see if it improves the situation at
all?

https://lore.kernel.org/linux-pm/20240119113319.54158-1-mario.limoncie...@amd.com/T/#u

Thanks!

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2088733/comments/20

------------------------------------------------------------------------
On 2024-01-21T18:05:34+00:00 dan.martins wrote:

(In reply to Mario Limonciello (AMD) from comment #19)
> Can you please dump the values from all of these MSRs from userspace while
> in a reproduced state?
> 
> #define MSR_AMD_CPPC_CAP1             0xc00102b0
> #define MSR_AMD_CPPC_ENABLE           0xc00102b1
> #define MSR_AMD_CPPC_CAP2             0xc00102b2
> #define MSR_AMD_CPPC_REQ              0xc00102b3
> #define MSR_AMD_CPPC_STATUS           0xc00102b4

Hi Mario,

Thank you for looking into this. I'll try your kernel patch when I have
a bit more time. For now, here are the MSRs:

Good state (from a boot when no cores were limited):
=========================
=========================
MSR_AMD_CPPC_CAP1  
d08a2c10  
d08a2c10d08a2c10  
dc8a2c10  
dc8a2c10dc8a2c10  
ca8a2c10  
ca8a2c10ca8a2c10  
dc8a2c10  
dc8a2c10dc8a2c10  
c48a2c10  
c48a2c10c48a2c10  
d68a2c10  
d68a2c10d68a2c10  
=========================  
MSR_AMD_CPPC_ENABLE  
1  
1  
1  
1  
1  
1  
1  
1  
1  
1  
1  
1  
=========================  
MSR_AMD_CPPC_CAP2  
0  
0  
0  
0  
0  
0  
0  
0  
0  
0  
0  
0  
=========================  
MSR_AMD_CPPC_REQ  
10d0  
10d0  
10dc  
10dc  
10ca  
10ca  
10dc  
10dc  
10c4  
10c4  
10d6  
f0f  
=========================  
MSR_AMD_CPPC_STATUS  
0  
0  
0  
0  
0  
0  
0  
0  
0  
0  
0  
0  
=========================

In reproduced state, where all cores are stuck at ~544MHz, MSR_AMD_CPPC_REQ 
values appear to have wrapped around?
========================================
========================================
MSR_AMD_CPPC_CAP1
d08a2c10
d08a2c10d08a2c10
dc8a2c10
dc8a2c10dc8a2c10
ca8a2c10
ca8a2c10ca8a2c10
dc8a2c10
dc8a2c10dc8a2c10
c48a2c10
c48a2c10c48a2c10
d68a2c10
d68a2c10d68a2c10
=========================
MSR_AMD_CPPC_ENABLE
1
1
1
1
1
1
1
1
1
1
1
1
=========================
MSR_AMD_CPPC_CAP2
0
0
0
0
0
0
0
0
0
0
0
0
=========================
MSR_AMD_CPPC_REQ
ff000f0f
ff000f0f
ff000f0f
ff000f0f
ff000f0f
ff000f0f
ff000f0f
ff000f0f
ff000f0f
ff000f0f
ff000f0f
ff000f0f
=========================
MSR_AMD_CPPC_STATUS
0
0
0
0
0
0
0
0
0
0
0
0
=========================
=========================


And when I (re)set the scaling governor, the MSR_AMD_CPPC_REQ values change
slightly. Here they are side by side: reproduced state on the left, and after
re-setting the governor on the right.
=========================                                       =========================
MSR_AMD_CPPC_REQ                                                MSR_AMD_CPPC_REQ
ff000f0f                                                      | ff0010d0
ff000f0f                                                      | ff0010d0
ff000f0f                                                      | ff0010dc
ff000f0f                                                      | ff0010dc
ff000f0f                                                      | ff0010ca
ff000f0f                                                      | ff0010ca
ff000f0f                                                      | ff0010dc
ff000f0f                                                      | ff0010dc
ff000f0f                                                      | ff0010c4
ff000f0f                                                      | ff0010c4
ff000f0f                                                      | ff0010d6
ff000f0f                                                      | ff0010d6

Thanks,
Dan

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2088733/comments/21

------------------------------------------------------------------------
On 2024-01-22T01:48:15+00:00 dan.martins wrote:

(In reply to Mario Limonciello (AMD) from comment #20)
> Can you guys please test this and see if it improves the situation at all?
> 
> https://lore.kernel.org/linux-pm/20240119113319.54158-1-mario.limoncie...@amd.com/T/#u
> 
> Thanks!

Hi again Mario,

I tested this patch against Fedora's 6.6.13 kernel and so far, after 6
reboots have not been able to reproduce. When I switch back to the stock
kernel, I can typically reproduce the issue in 1-2 reboots so the patch
seems to have helped so far. I'll keep using the patched kernel for now
and let you know if this issue occurs again.

Thanks,
Dan

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2088733/comments/22

------------------------------------------------------------------------
On 2024-01-22T01:53:23+00:00 dan.martins wrote:

(In reply to Dan Martins from comment #21)
> (In reply to Mario Limonciello (AMD) from comment #19)
> > Can you please dump the values from all of these MSRs from userspace while
> > in a reproduced state?

> In reproduced state, where all cores are stuck at ~544MHz, MSR_AMD_CPPC_REQ
> values appear to have wrapped around?

Ignore comment about values wrapping around, it appears the upper byte
is set when I adjust PPD from performance (0x00) to balanced (0x80) and
powersave (0xFF). I must have adjusted this between reboots.
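
The byte layout described here can be sketched in Python as follows. The field positions (max perf in bits 7:0, min perf in 15:8, desired perf in 23:16, energy-performance preference in 31:24) are an assumption based on my understanding of the kernel's AMD_CPPC_* macros; the thread itself only confirms the meaning of the top byte. The function name is hypothetical.

```python
def decode_cppc_req(value):
    """Split a MSR_AMD_CPPC_REQ value into its (assumed) 8-bit fields."""
    return {
        "max_perf": value & 0xFF,
        "min_perf": (value >> 8) & 0xFF,
        "des_perf": (value >> 16) & 0xFF,
        "epp": (value >> 24) & 0xFF,
    }


# Values from comment #21: good state, reproduced state, after governor reset.
for raw in (0x10D0, 0xFF000F0F, 0xFF0010D0):
    print(f"{raw:#010x}", decode_cppc_req(raw))
```

Under this layout, the reproduced-state value 0xff000f0f decodes to a max and min perf of 15 with EPP 0xff, while the value after re-setting the governor has the same EPP but the same low half as the good-state value 0x10d0 (max 208, min 16).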

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2088733/comments/23

------------------------------------------------------------------------
On 2024-01-22T03:22:15+00:00 mario.limonciello wrote:

That's great news. Everyone who feels comfortable sharing your email
address feel free to reply to the post with a "Tested-by" tag.

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2088733/comments/24

------------------------------------------------------------------------
On 2024-01-22T17:50:56+00:00 muzhi.yu1 wrote:

Hi guys,

Just adding my data point here. I've applied the patch and haven't seen
this bug for the evening after ~5 cycles.

BTW, are the MSR values still relevant, because I'm seeing no difference
between normal and bad states?

```
c4912a10c4912a10
1
0
ff0010c4
0
```

Thanks!

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2088733/comments/25

------------------------------------------------------------------------
On 2024-01-22T17:54:17+00:00 mario.limonciello wrote:

No need to share MSR values anymore.  I believe this is the correct
solution.  If there are still problems with it, they may be a secondary
problem.

The MSR values are a little difficult to properly capture because each
CPU has its own register value.  So a proper test would need to capture
all of them for all CPUs (not all may have this problem occurring).
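
A minimal Python sketch of such a per-CPU comparison, working on values that have already been captured. The field layout of MSR_AMD_CPPC_CAP1 (lowest, lowest non-linear, nominal and highest perf in successive bytes from bit 0 upward) is an assumption based on my understanding of the kernel headers, and the "requested max below advertised lowest" test is a heuristic of mine, not a rule stated in this thread. Function names and CPU numbers are hypothetical.

```python
def decode_cap1(value):
    """Split the low 32 bits of MSR_AMD_CPPC_CAP1 into (assumed) perf fields."""
    return {
        "lowest_perf": value & 0xFF,
        "lowest_nonlinear_perf": (value >> 8) & 0xFF,
        "nominal_perf": (value >> 16) & 0xFF,
        "highest_perf": (value >> 24) & 0xFF,
    }


def suspicious_cpus(cap1_by_cpu, req_by_cpu):
    """Return CPUs whose requested max perf is below their advertised lowest perf."""
    bad = []
    for cpu, cap1 in sorted(cap1_by_cpu.items()):
        lowest = decode_cap1(cap1)["lowest_perf"]
        requested_max = req_by_cpu[cpu] & 0xFF  # assumed: max perf in bits 7:0
        if requested_max < lowest:
            bad.append(cpu)
    return bad


# Register values taken from comment #21; the CPU numbering is made up.
cap1 = {0: 0xD08A2C10, 1: 0xDC8A2C10}
req = {0: 0xFF000F0F, 1: 0xFF0010DC}
print(suspicious_cpus(cap1, req))
```

Because the check runs per CPU, it still works when only some cores are affected.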

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2088733/comments/26

------------------------------------------------------------------------
On 2024-03-05T19:03:11+00:00 mario.limonciello wrote:

So I think there are actually two issues in this bug.
* The first one was the one that Artem reported which looks like a problem with
  the EC communicating some limits to the APU.  This is Artem's issue.
* The second one is that there was a bug in amd-pstate that could cause CPPC
  requests to have the wrong values.  This is (nearly) everyone else's issue in
  this bug.

The second issue is fixed by
https://github.com/torvalds/linux/commit/22fb4f041999f5f16ecbda15a2859b4ef4cbf47e

For the first issue, Artem can you update to 6.8-rc7, make sure you've
added the TEE firmware for the amd-pmf driver from linux-firmware and
see if you can still reproduce it?

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2088733/comments/27

------------------------------------------------------------------------
On 2024-03-06T09:59:39+00:00 aros wrote:

> For the first issue, Artem can you update to 6.8-rc7, make sure you've added
> the TEE firmware for the amd-pmf driver from linux-firmware and see if you
> can still reproduce it?

I've just added the firmware file,
"773bd96f-b83f-4d52-b12dc529b13d8543.bin" (what a weird name) and I will
test 6.8 as soon as it gets released. It's coming pretty soon.

Thanks.

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2088733/comments/28

------------------------------------------------------------------------
On 2024-03-06T16:20:25+00:00 mario.limonciello wrote:

> I've just added the firmware file, "773bd96f-b83f-4d52-b12dc529b13d8543.bin"
> (what a weird name) and I will test 6.8 as soon as it gets released. It's
> coming pretty soon.

OK.  Separately from that I'd like to understand what you were getting
at with your ryzenadj comment.

I don't know if ryzenadj accesses all those coefficients correctly; but
we *do* export them properly under amd-pmf debugfs.

There is a debugfs file called "current_power_limits".  Can you read it
before suspend as well as after a suspend that reproduced the failure?

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2088733/comments/29

------------------------------------------------------------------------
On 2024-03-06T16:31:33+00:00 mario.limonciello wrote:

Sorry two more things.

First - there are two sets of coefficients (one for AC and one for DC).
You can see in current_power_limits_show() that it will return the table
matching your power mode.

Please capture like this:
1) Start on DC, capture the file.
2) Switch to AC, capture the file.
3) Suspend the machine
4) Unplug adapter
5) Resume
6) Capture the file (while you're on DC)
7) Switch to AC, capture the file.

This will let us confirm whether or not there is a problem with the
table.

Second - after the issue has occurred, does changing the acpi platform
profile from sysfs or powerprofilesctl recover it?

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2088733/comments/30

------------------------------------------------------------------------
On 2024-03-06T17:38:03+00:00 aros wrote:

1)

current_power_limits 
spl:51000 fppt:51000 sppt:41000 sppt_apu_only:41001 stt_min:25000 stt[APU]:0 stt[HS2]: 0

2)

cat current_power_limits 
spl:51000 fppt:51000 sppt:41000 sppt_apu_only:41000 stt_min:25000 stt[APU]:0 stt[HS2]: 0


3-4-5) done

6) cat current_power_limits 
spl:51000 fppt:51000 sppt:41000 sppt_apu_only:41000 stt_min:25000 stt[APU]:0 stt[HS2]: 0


7) cat current_power_limits 
spl:51000 fppt:51000 sppt:41000 sppt_apu_only:41000 stt_min:25000 stt[APU]:0 stt[HS2]: 0
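
A minimal Python sketch for comparing such captures mechanically. It works on the captured text only, since the location of the debugfs file is not given in this thread; the function names are hypothetical.

```python
import re


def parse_limits(text):
    """Turn 'spl:51000 fppt:51000 ... stt[HS2]: 0' into a dict of integers."""
    return {key: int(val) for key, val in re.findall(r"([\w\[\]]+):\s*(\d+)", text)}


def changed(before, after):
    """Return {key: (before, after)} for every limit whose value differs."""
    return {k: (before[k], after.get(k)) for k in before if before[k] != after.get(k)}


# Captures 1 and 2 from this comment (DC, then AC).
step1 = parse_limits("spl:51000 fppt:51000 sppt:41000 sppt_apu_only:41001 "
                     "stt_min:25000 stt[APU]:0 stt[HS2]: 0")
step2 = parse_limits("spl:51000 fppt:51000 sppt:41000 sppt_apu_only:41000 "
                     "stt_min:25000 stt[APU]:0 stt[HS2]: 0")
print(changed(step1, step2))
```

Applied to the four captures above, the only difference is `sppt_apu_only` (41001 in capture 1, 41000 afterwards); captures 2, 6 and 7 are identical.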

While we have been discussing this, I've just found out that when this
bug occurs, all I need to do is to unplug and that fixes everything.

It's actually such a simple workaround, I will leave it up to you
whether anything needs to be done to address it.

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2088733/comments/31

------------------------------------------------------------------------
On 2024-03-06T17:46:24+00:00 mario.limonciello wrote:

> cat current_power_limits

It looks like those don't change.

> While we have been discussing this, I've just found out that when this bug
> occurs, all I need to do is to unplug and that fixes everything.

Presumably you mean unplug OR replug (IE opposite of what you did in
suspend) right?

> It's actually such a simple workaround, I will leave it up to you whether
> anything needs to be done to address it.

It's good you have that workaround.  I'd like to know if
powerprofilesctl/acpi platform profile can also recover it.

If so, we might want to add explicit code in the suspend/resume
callbacks to rewrite the state if the power adapter changed over suspend.
I think this would be a safe solution for everyone.

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2088733/comments/32

------------------------------------------------------------------------
On 2024-03-06T17:50:48+00:00 aros wrote:

> Presumably you mean unplug OR replug (IE opposite of what you did in suspend)
> right?

After unplugging it's already fixed. I do of course replug not to waste
battery power.

> If so; we might want to add an explicit code in the suspend/resume callbacks
> to rewrite the state if power adapter changed over suspend.  I think this
> would be a safe solution for everyone.

If only it doesn't break other systems. That sounds a tad scary to me.
So far I seem to have been the only affected person (not that many
people seem to be using HP business laptops with Linux).

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2088733/comments/33

------------------------------------------------------------------------
On 2024-03-06T17:55:33+00:00 mario.limonciello wrote:

> If only it doesn't break other systems. That sounds a tad scary to me. So far
> I seem to have been the only affected person (not that many people seem to be
> using HP business laptops with Linux).

The code would basically look like this:
* Capture state of power adapter at suspend callback into a private variable
* If state of power adapter has changed during resume then rewrite all CPU 
coefficients.

It should be safe for everyone.  But I need to know that it actually
helps your problem.
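
The two steps above can be modelled roughly as follows. This is only an
illustrative Python sketch of the logic, not the amd-pmf driver code;
AdapterWatcher, read_adapter_online and rewrite_coefficients are
hypothetical names made up for the example.

```python
from typing import Callable, Optional


class AdapterWatcher:
    """Toy model of the proposed suspend/resume handling (hypothetical names)."""

    def __init__(self,
                 read_adapter_online: Callable[[], bool],
                 rewrite_coefficients: Callable[[], None]) -> None:
        self._read = read_adapter_online
        self._rewrite = rewrite_coefficients
        self._state_at_suspend: Optional[bool] = None

    def on_suspend(self) -> None:
        # Step 1: capture the power adapter state in a private variable.
        self._state_at_suspend = self._read()

    def on_resume(self) -> bool:
        # Step 2: rewrite the coefficients only if the state changed.
        changed = (self._state_at_suspend is not None
                   and self._read() != self._state_at_suspend)
        if changed:
            self._rewrite()
        self._state_at_suspend = None
        return changed
```

A comparison like this only sees the adapter state at the two callbacks,
so an unplug followed by a replug entirely within suspend would look
unchanged to it.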

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2088733/comments/34

------------------------------------------------------------------------
On 2024-03-06T18:03:25+00:00 W_Armin wrote:

Shouldn't the driver generally restore all CPU coefficients after
suspend/resume? Or is there a specification saying that the CPU
coefficients will be restored by the platform firmware?

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2088733/comments/35

------------------------------------------------------------------------
On 2024-03-06T18:06:15+00:00 mario.limonciello wrote:

AMD-PMF can be used differently by different OEMs and models depending
upon their needs and desires.

Some will control entirely by their EC.  Some will rely on PMF to do
more functionality.

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2088733/comments/36

------------------------------------------------------------------------
On 2024-03-06T18:08:04+00:00 W_Armin wrote:

Could it be that the Windows equivalent of the amd-pmf driver does
restore all/some coefficients after suspend/resume?

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2088733/comments/37

------------------------------------------------------------------------
On 2024-03-06T18:09:51+00:00 mario.limonciello wrote:

The Windows equivalent of the amd-pmf driver on this HP system uses the
features in kernel 6.8 that I've been asking Artem to test.

Once I know whether the issue happens on kernel 6.8 and whether changing
the profile manually helps it we can decide on whether to do anything.

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2088733/comments/38

------------------------------------------------------------------------
On 2024-03-06T18:10:35+00:00 W_Armin wrote:

Ok

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2088733/comments/39

------------------------------------------------------------------------
On 2024-04-01T08:00:47+00:00 ries.infotec+kernel wrote:

Hello everyone. I just read through this bugreport as I have the same
problem on my HP Elitebook 845 G10 with a 7840U CPU. It randomly gets
stuck at 544MHz.

I'm running Endeavour OS (Arch based) with the latest Kernel
(6.8.2-arch2-1) and Firmware (core/linux-firmware 20240312.3b128b60-1).

I thought things would be fixed now, but I just had the hanging CPU freq
again.

As the bug is not closed and last comment is nearly 4 weeks old I just
wanted to know if the fix is not official yet...

Thanks for an update and to all for investigating here :)

(Meanwhile I'll try the "sudo cpupower frequency-set -g powersave"
workaround to see if it helps to circumvent an annoying reboot)

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2088733/comments/40

------------------------------------------------------------------------
On 2024-04-01T13:19:58+00:00 mario.limonciello wrote:

Created attachment 306075
possible patch (v1)

> It randomely gets stuck at 544MHz.

Are you sure it's random?  From the above discussions I believe it is
triggered specifically from an event sent by the EC when changing the
power adapter while suspended.

> with the latest Kernel (6.8.2-arch2-1)

Thanks.  I've been waiting for feedback with kernel 6.8.  And you have
CONFIG_AMD_PMF set?

> I just wanted to know if the fix is not official yet...

There is no fix or workaround right now, like I said above this "looks"
like a bug caused by HP's EC or BIOS.

Assuming you tested with amd-pmf in place and it really is the same root
cause described above (only by power adapter) I was thinking about it
and this sounds like it could be a race condition. I do have an idea for
a workaround.  Can you see if this patch helps?

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2088733/comments/41

------------------------------------------------------------------------
On 2024-04-01T13:37:49+00:00 aros wrote:

>  From the above discussions I believe it is triggered specifically from an
>  event sent by the EC when changing the power adapter while suspended.

Yep, and like I said in my case unplugging/plugging the power cord is
enough to fix it which was a relief for me.

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2088733/comments/42

------------------------------------------------------------------------
On 2024-04-01T14:04:48+00:00 ries.infotec+kernel wrote:

I have

$ zcat /proc/config.gz | grep CONFIG_AMD_PMF
CONFIG_AMD_PMF=m
# CONFIG_AMD_PMF_DEBUG is not set

Unfortunately I could not reproduce the effect during the test I did
right now. Subjective impression is that the bug occurs less often since
6.8.x kernel.

Testing procedure was:

Plugged - suspend - unplug - resume - OK
Unplugged - suspend - plug back in - resume - OK
Starting plugged - suspend - resume - repeated 6 times while plugged - OK

Resume was done via "systemctl suspend" command on my hotkey
"Strg-Super-End"

Then I tried it 3 times by using the lid - same here - it works for the
moment. So it seems to be more random than for Artem. I'll check if
un-/plug procedure helps as a quick fix to not have to reboot when CPU
gets stuck again.

I somehow need to find a reliable procedure to run into this bug before
it makes sense to test the patch.

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2088733/comments/43

------------------------------------------------------------------------
On 2024-04-01T14:14:28+00:00 ries.infotec+kernel wrote:

(typo: -Resume-) Suspend was done via "systemctl suspend" command on my
hotkey "Strg-Super-End"

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2088733/comments/44

------------------------------------------------------------------------
On 2024-04-01T14:23:48+00:00 mario.limonciello wrote:

If 6.8 is more reliable you can also try to apply the patch to 6.7 or an
earlier kernel that could more easily trigger it.

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2088733/comments/45

------------------------------------------------------------------------
On 2024-04-05T02:20:34+00:00 mario.limonciello wrote:

Any testing results for that patch idea?

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2088733/comments/46

------------------------------------------------------------------------
On 2024-04-05T08:09:01+00:00 ries.infotec+kernel wrote:

Hi Mario, sorry for not responding, I still haven't been able to
reproduce the bug. Just had it once after Kernel 6.8.x.

I will test once I have a reproducible scenario.

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2088733/comments/47

------------------------------------------------------------------------
On 2024-04-05T11:15:29+00:00 aros wrote:

(In reply to Peter Ries from comment #47)
> Hi Mario, sorry for not responding, I still haven't been able to reproduce
> the bug. Just had it once after Kernel 6.8.x. 
> 
> I will test once I have a reproduceable scenario.

Please try what triggers it for me 100%:

1. While the laptop is plugged in/connected to the mains, put it to sleep.
2. Unplug it for a little while - 20 seconds is enough I guess.
3. Plug it back in.
4. Wake it up.

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2088733/comments/48

------------------------------------------------------------------------
On 2024-04-06T07:48:22+00:00 ries.infotec+kernel wrote:

Hi Artem, this unfortunately works for me - meaning the CPU frequency
does NOT get stuck if I do it like this.

- I put laptop to sleep
- unplugged
- waited 1 minute (without resuming on battery)
- plugged back in
- resume
-> CPU scales up and down as expected 

I just wonder what happened AFTER I had the effect with kernel 6.8.x
(only once)


I currently have

6.8.2-arch2-1

core/linux-firmware-whence 20240312.3b128b60-1
core/linux-firmware 20240312.3b128b60-1 

really weird

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2088733/comments/49

------------------------------------------------------------------------
On 2024-04-15T17:42:21+00:00 voidpointertonull+kernelorgbugzilla wrote:

Coming from Bug #217931 , I found the mentions of being stuck at low
frequency odd as I couldn't observe that despite managing multiple
hosts, but then here I am.

The twist is that I have a 7950X3D desktop setup, not a laptop one, and
apparently I just ran into the same low frequency issue others experienced.
Unfortunately the usefulness of my information will be limited as I'm on a not 
really customized Kubuntu 23.10 setup with kernel 6.5.0 , but on the other hand 
I haven't touched anything relevant, not even setting a frequency limit.

I'm observing the CPU being stuck in the 400 MHz - 549 MHz range which is quite 
fitting for this bug report, and the host was never suspended / hibernated.
The only relevant oddity I've found so far is that 
/sys/devices/system/cpu/cpu4/cpufreq/scaling_max_freq was sticking out like a 
sore thumb with 400000 set while other cores had the value of 5759000, but 
changing that didn't make a difference.
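
A check like the one described above could be scripted roughly as
follows. This is a minimal Python sketch that assumes the standard
cpufreq sysfs layout; read_scaling_max and find_outliers are names made
up for the example.

```python
from collections import Counter
from pathlib import Path
from typing import Dict


def read_scaling_max(cpu_root: str = "/sys/devices/system/cpu") -> Dict[str, int]:
    """Return {cpuN: scaling_max_freq in kHz} for every CPU exposing cpufreq."""
    result = {}
    for path in Path(cpu_root).glob("cpu[0-9]*/cpufreq/scaling_max_freq"):
        # path is <root>/cpuN/cpufreq/scaling_max_freq, so two parents up is cpuN.
        result[path.parent.parent.name] = int(path.read_text().strip())
    return result


def find_outliers(limits: Dict[str, int]) -> Dict[str, int]:
    """Return the CPUs whose limit differs from the most common value."""
    if not limits:
        return {}
    common, _ = Counter(limits.values()).most_common(1)[0]
    return {cpu: khz for cpu, khz in limits.items() if khz != common}
```

With the values reported here, find_outliers(read_scaling_max()) would
be expected to single out cpu4 with 400000.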

Not really sure when this manifested itself, but highly likely after (or 
during?) a case of Bug #204253 as I brushed away the slowness for a while as 
the usual heavy I/O (over NFS) problem which even used to freeze the desktop 
for more than a minute on a weaker setup, but the current higher performance 
CPU seemed to take it better, although the experience was still disruptive.
Is this really a laptop bug then instead of a more generic problem with a large 
stutter causing some logic to get upset possibly due to timing problems? Heavy 
CPU usage alone surely doesn't do the trick as I've seen hosts doing fine with 
that, but heavy I/O seems more brutal with possibly similar "world stopping 
power" as suspending.

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2088733/comments/50

------------------------------------------------------------------------
On 2024-04-15T19:01:52+00:00 mario.limonciello wrote:

Please let's stick to upstream kernels. It will just confuse the issue
with the distro kernel, ESPECIALLY a kernel that is EOL upstream. We had
other fixes that have landed in amd-pstate that are definitely missing
from a 6.5 kernel that could very well be a similar or same issue.

So please reproduce with 6.9-rc4 or 6.8.7. If you can still reproduce it
then please open a new issue and collect all possible information. If
it's indeed the same issue we can mark as a duplicate at that time.

This issue is looking like a thermal throttling issue where the APU
didn't properly ack a request from the EC while in suspend. I posted a
patch that gives the APU more time to ack it during suspend but it needs
to be tested still in a case that it can be reproduced reliably.

If it doesn't help, I would like to see if extending the time duration
in between cycles helps.

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2088733/comments/51

------------------------------------------------------------------------
On 2024-04-15T19:35:54+00:00 voidpointertonull+kernelorgbugzilla wrote:

I believe the other issue was supposed to be strictly about limiting max 
frequency causing issues, and I'm definitely not doing that, but possibly I 
missed other fixes; I surely didn't keep up with everything.
Understood the warning though, but that's exactly why I pointed out that my 
kernel version might not be helpful.

The main point was that while all discussions seem to be about APUs, I
encountered an issue that appears to be really similar if not the same
with a desktop CPU. Just wanted this information to be added as I've
found 3 bug reports where this problem is mentioned but with only laptop
CPUs discussed.

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2088733/comments/52

------------------------------------------------------------------------
On 2024-04-15T20:43:16+00:00 mario.limonciello wrote:

7950X3D is a desktop SoC, but IIRC it has integrated graphics. It's a
desktop APU.

But that aside, thermal throttling can be triggered even in CPU products
from the EC. The interface the EC uses to do this applies to both types
of parts.

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2088733/comments/53

------------------------------------------------------------------------
On 2024-05-06T14:20:35+00:00 vanoverloopdaan wrote:

I can reproduce the issue on my HP Elitebook 845 G10 with an AMD Ryzen 7
Pro 7840U running Linux 6.8.6 with the following steps:

1. Ensure that the power adapter is not plugged in
2. Suspend the machine
3. Wait for 10 seconds
4. Plug in the power adapter
5. Wait for 10 seconds
6. Wake the machine
7. The CPU frequency is now stuck at 544 MHz

When I unplug the power adapter now, the frequency will immediately
start scaling up again. Replugging the power adapter again while the
device is awake is also okay.

I was unable to reproduce the issue by starting with a power adapter
plugged in and unplugging it before waking up. This did not seem to
cause any issues.
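
For anyone repeating these steps, a quick way to tell whether the
machine has ended up in the stuck state might look like the Python
sketch below. It assumes the standard cpufreq sysfs files and uses the
544 MHz ceiling reported in this thread; the function names are made up
for the example.

```python
from pathlib import Path
from typing import List

STUCK_CEILING_KHZ = 544_000  # 544 MHz, the ceiling reported in this thread


def read_cur_freqs(cpu_root: str = "/sys/devices/system/cpu") -> List[int]:
    """Return scaling_cur_freq (kHz) for every CPU that exposes cpufreq."""
    paths = Path(cpu_root).glob("cpu[0-9]*/cpufreq/scaling_cur_freq")
    return [int(p.read_text().strip()) for p in paths]


def looks_stuck(freqs_khz: List[int],
                ceiling_khz: int = STUCK_CEILING_KHZ) -> bool:
    """True if at least one CPU was read and none exceeds the ceiling."""
    return bool(freqs_khz) and max(freqs_khz) <= ceiling_khz
```

An idle system can also sit at low clocks, so looks_stuck(read_cur_freqs())
is only meaningful when sampled while the CPU is under load.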

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2088733/comments/54

------------------------------------------------------------------------
On 2024-05-06T14:23:30+00:00 aros wrote:

(In reply to Daan Vanoverloop from comment #54)
> I can reproduce the issue on my HP Elitebook 845 G10 with an AMD Ryzen 7 Pro
> 7840U running Linux 6.8.6 with the following steps:
> 
> 1. Ensure that the power adapter is not plugged in
> 2. Suspend the machine
> 3. Wait for 10 seconds
> 4. Plug in the power adapter
> 5. Wait for 10 seconds
> 6. Wake the machine
> 7. The CPU frequency is now stuck at 544 MHz
> 
> When I unplug the power adapter now, the frequency will immediately start
> scaling up again. Replugging the power adapter again while the device is
> awake is also okay.
> 
> I was unable to reproduce the issue by starting with a power adapter plugged
> in and unplugging it before waking up. This did not seem to cause any issues.

Exactly how I experience it and what this bug is about. Mario said he
would post a patch to reset the EC on resume and that should fix the
issue but I've not seen the patch yet.

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2088733/comments/55

------------------------------------------------------------------------
On 2024-05-06T14:52:09+00:00 mario.limonciello wrote:

> Mario said he would post a patch to reset the EC on resume and that should
> fix the issue but I've not seen the patch yet.

Eh?  I don't recall saying I'd post a patch to reset EC on resume.

I did post a patch to this bug that could try to adjust the timing that
is waiting for testing though in case it's a race condition.  It will
force 10-20ms more time spent in the Linux kernel when the power adapter
is unplugged over suspend.

Also if it doesn't help, please modify it to make it 100-200ms.  This
should rule out a race condition.

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2088733/comments/56

------------------------------------------------------------------------
On 2024-05-06T14:55:08+00:00 aros wrote:

> Eh?  I don't recall saying I'd post a patch to reset EC on resume.

My memory is faltering obviously. Sorry.

> The Windows equivalent of the amd-pmf driver on this HP system uses the
> features in kernel 6.8 that I've been asking Artem to test.

No, kernel 6.8 didn't fix the issue for me.

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2088733/comments/57

------------------------------------------------------------------------
On 2024-05-06T15:16:52+00:00 vanoverloopdaan wrote:

> I did post a patch to this bug that could try to adjust the timing that is
> waiting for testing though in case it's a race condition.  It will force
> 10-20ms more time spent in the Linux kernel when the power adapter is
> unplugged over suspend.  
>
> Also if it doesn't help, please modify it to make it 100-200ms.  This should
> rule out a race condition.


I will apply this patch later today or tomorrow and report back on whether I 
can still reproduce this issue.

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2088733/comments/58

------------------------------------------------------------------------
On 2024-05-06T17:26:23+00:00 mario.limonciello wrote:

Created attachment 306264
debugging patch

I'm attaching a patch that isn't upstreamed at the moment, but you can
apply to your kernel to try to capture a debug register for me.  Apply
it to your kernel and then read the register value like this:

echo "0x59804" | sudo tee /sys/kernel/debug/amd_nb/smn_address
sudo cat /sys/kernel/debug/amd_nb/smn_value

Here is what a reasonable value looks like on my local system:
$ echo "0x59804" | sudo tee /sys/kernel/debug/amd_nb/smn_address
$ sudo cat /sys/kernel/debug/amd_nb/smn_value
0x017f1201

Share with me the values that you get from smn_value in these 3 situations:
1) At bootup (before you suspend)
2) After you've suspended and reproduced the issue
3) After you've done the W/A to undo the issue.
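
To collect the three readings consistently, the two shell commands above
could be wrapped in a small helper. This is a Python sketch that assumes
the debugfs files from the attached debugging patch are present and that
it is run as root; read_smn is a name made up for the example.

```python
from pathlib import Path

# Only present with the debugging patch from this comment applied.
SMN_DEBUG_DIR = "/sys/kernel/debug/amd_nb"


def read_smn(address: int, debug_dir: str = SMN_DEBUG_DIR) -> int:
    """Select an SMN address via smn_address, then read back smn_value."""
    base = Path(debug_dir)
    (base / "smn_address").write_text(f"0x{address:x}\n")
    return int((base / "smn_value").read_text().strip(), 16)
```

For example, print(hex(read_smn(0x59804))) could be run once in each of
the three situations.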

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2088733/comments/59

------------------------------------------------------------------------
On 2024-05-07T08:24:40+00:00 vanoverloopdaan wrote:

I applied your patch, but I'm not able to reproduce the issue at home.
When the issue doesn't occur, I find the same smn_value as you. It might
be related to the specific power adapter I use at work, or other devices
that were plugged in. I will try to reproduce the issue tomorrow at
work, and try to narrow it down to a specific device that's plugged in.

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2088733/comments/60

------------------------------------------------------------------------
On 2024-05-08T07:39:54+00:00 vanoverloopdaan wrote:

I was able to reproduce the issue consistently at work with the USB
power adapter that was included with the laptop, with or without a
display and USB devices plugged in. I was not able to reproduce the
issue with a different USB power adapter at home.

These are the smn values I found:

1) at bootup: 0x017f1201
2) after reproducing the issue: 0x017f1201
3) after doing the workaround (unplugging the power adapter): 0x017f1221
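
The two distinct values differ in a single bit, which a short Python
snippet can show. What that bit means is not documented in this thread,
so the sketch only reports its position; changed_bits is a name made up
for the example.

```python
from typing import List


def changed_bits(before: int, after: int) -> List[int]:
    """Return the positions (0 = least significant) of the bits that differ."""
    diff = before ^ after
    return [bit for bit in range(diff.bit_length()) if (diff >> bit) & 1]


# Values reported above: after reproducing vs. after the workaround.
print(changed_bits(0x017F1201, 0x017F1221))  # prints [5], i.e. mask 0x20
```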

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2088733/comments/61

------------------------------------------------------------------------
On 2024-05-08T11:14:58+00:00 mario.limonciello wrote:

Are you sure you didn't mix up 2 & 3?

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2088733/comments/62

------------------------------------------------------------------------
On 2024-05-08T11:21:54+00:00 vanoverloopdaan wrote:

Yes, I tried it a few times and I'm pretty sure this is correct. I
noticed that the value only changes to 0x017f1221 after unplugging the
power adapter. But I'll try again just to make sure.

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2088733/comments/63

------------------------------------------------------------------------
On 2024-05-08T11:42:22+00:00 vanoverloopdaan wrote:

I just encountered the issue again, but this time I unplugged my power
adapter while the device was suspended, which can also trigger this bug.
When I look at the smn_value, I find 0x017f1201 again (the "normal"
value). When I plug in the adapter, I can work around the issue and find
smn_value 0x017f1221. The value will stay on 0x017f1221 until I do a
suspend/wake cycle, which resets it back to 0x017f1201, regardless of
whether the low clock speed issue occurred or not. Any kind of
plugging or unplugging of the power adapter while the device is awake
causes it to change to 0x017f1221. Plugging and unplugging while
suspended does not seem to have any effect on this value, as it would
always reset to 0x017f1201 when waking from suspend.

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2088733/comments/64

------------------------------------------------------------------------
On 2024-05-08T22:38:07+00:00 mario.limonciello wrote:

Especially paired with the fact that different adapters don't trigger it
I stand by this being an EC issue as the EC controls the throttling
behavior.

I suggest you guys raise with HP and point them at this issue.

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2088733/comments/65

------------------------------------------------------------------------
On 2024-05-09T07:49:11+00:00 darkbasic wrote:

> Especially paired with the fact that different adapters don't trigger it I
> stand by this being an EC issue as the EC controls the throttling behavior.

What does EC stand for?

Might this
(https://h30434.www3.hp.com/t5/Business-Notebooks/HP-Elitebook-865-G10-w-AMD-Ryzen-9-PRO-7940HS-cannot-sustain/m-p/9061799)
be related?

What's weird is that it only happens when I'm using the external
monitors plugged into the dock, but I don't have any problem if I'm just
using the dock's ethernet adapter or USB hub.

> I suggest you guys raise with HP and point them at this issue.

Easier said than done: they don't care about Linux via the official support 
channels.
I'm sure there is someone who cares because they distribute updates via LVFS 
and they even sold Linux laptops like the HP Dev One but I have no idea how to 
reach whoever could be interested to fix this.

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2088733/comments/66

------------------------------------------------------------------------
On 2024-05-09T08:46:58+00:00 aros wrote:

(In reply to Mario Limonciello (AMD) from comment #65)
> Especially paired with the fact that different adapters don't trigger it I
> stand by this being an EC issue as the EC controls the throttling behavior.
> 
> I suggest you guys raise with HP and point them at this issue.

But why does it affect only Linux?

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2088733/comments/67

------------------------------------------------------------------------
On 2024-05-09T13:31:49+00:00 mario.limonciello wrote:

> What does EC stand for?

EC is "Embedded Controller".  Here's the ACPI specification for how it
is supposed to be interacted with:

https://uefi.org/specs/ACPI/6.5/12_Embedded_Controller_Interface_Specification.html

It's a black box to anyone but the system manufacturer.

> Might this
> (https://h30434.www3.hp.com/t5/Business-Notebooks/HP-Elitebook-865-G10-w-AMD-Ryzen-9-PRO-7940HS-cannot-sustain/m-p/9061799)
> be related?

> What's weird is that it only happens when I'm using the external monitors
> plugged into the dock, but I don't have any problem if I'm just using the
> dock's ethernet adapter or USB hub.

Yes, it "could" be related. This is getting OT, but if you have enough
ports on your laptop without a dock you could try to plug dongle(s) for
monitor(s) and a regular power adapter and see if you can reproduce the
same behavior.

> Easier said that done: they don't care about Linux via the official support
> channels.

:/

> But why does it affect only Linux?

As it pertains to how the sleep wake up works, Linux and Windows work
slightly differently.  Windows has a concept of "dark screen wakeup"
after any wakeup event and will move in and out of hardware sleep while
in this state.  On Linux, if a wakeup event is not enough to wake the
system (such as the ACPI SCI but no other interrupt), it goes back to
sleep immediately.

This difference of behavior has uncovered bugs where the X86 cores race
for some of the same resources with the power management firmware on
earlier hardware.

So my working theory has been some timing margins for throttling are not
being met when suspend/resume has occurred under Linux.  That's why I
was suggesting patches to try to keep the kernel alive longer when a
power adapter event wakes the APU.  But the behavior and timing of when
to throttle are totally controlled by the EC.  So if there is a timing
problem and forcing the X86 cores to be awake longer doesn't help I'm
not sure what else we can do without HP coming to the table to debug
from their EC perspective.

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2088733/comments/68

------------------------------------------------------------------------
On 2024-05-11T09:23:02+00:00 darkbasic wrote:

> Yes, it "could" be related. This is getting OT, but if you have enough ports
> on your laptop without a dock you could try to plug dongle(s) for monitor(s)
> and a regular power adapter and see if you can reproduce the same behavior.

https://www.amazon.it/sicotool-Adattatore-DisplayPort-Thunderbolt-Compatibile/dp/B08B647L2X

Would something like this work on Phoenix (HP Elitebook 865 G10)?
I'm pretty sure it requires DP Alt mode.

Also, would it support Displayport MST?
I would like to keep my setup the same to make the test more valid and I'm 
currently using two Dell UltraSharp U2515H attached via a single mini DP cable.

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2088733/comments/69

------------------------------------------------------------------------
On 2024-05-11T10:25:31+00:00 mario.limonciello wrote:

Yes that should work, but the resolution availability will depend upon
how much bandwidth your connection series needs.

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2088733/comments/70

------------------------------------------------------------------------
On 2024-08-12T18:55:46+00:00 prasun.gera wrote:

> > I suggest you guys raise with HP and point them at this issue.
> 
> Easier said that done: they don't care about Linux via the official support
> channels.
> I'm sure there is someone who cares because they distribute updates via LVFS
> and they even sold Linux laptops like the HP Dev One but I have no idea how
> to reach whoever could be interested to fix this.


This also affects Rembrandt (845 G9), in case someone from HP makes it to this 
bug report.

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2088733/comments/71

------------------------------------------------------------------------
On 2024-08-12T22:19:26+00:00 aros wrote:

(In reply to Prasun from comment #71)
> This also affects Rembrandt (845 G9), in case someone from HP makes it to
> this bug report.

Sadly it looks like HP generally doesn't care about Linux and Linux
support or compatibility for the G line of laptops has never been
mentioned either.

I'm marking it as INVALID because Maria has basically said it's a bug in
the EC (code).

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2088733/comments/72

------------------------------------------------------------------------
On 2024-08-12T22:20:07+00:00 aros wrote:

Mario, I meant Mario. Sorry :-)

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2088733/comments/73

------------------------------------------------------------------------
On 2024-09-19T07:57:28+00:00 vanoverloopdaan wrote:

The issue seems to be resolved for me when running kernel 6.10.5 and
firmware (01.05.11 Rev.A), which can be installed from LVFS using
fwupdmgr (or one of the wrapper GUIs like GNOME Software).

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2088733/comments/74

------------------------------------------------------------------------
On 2024-09-19T12:05:01+00:00 vanoverloopdaan wrote:

Apologies for the false alert, it is not fixed. I was just unable to
reproduce it when I wanted to.

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2088733/comments/75

------------------------------------------------------------------------
On 2024-12-22T02:59:53+00:00 jameshogge wrote:

Just want to add that I have exactly the same issue on a Lenovo Thinkpad
P16v with a Ryzen 7840HS. Without fail, if I close the lid, wait until
the laptop is sleeping, unplug the charger, and open the lid again, the
CPU will be locked between 400-544MHz.

Reinserting the charger does not fix the issue but subsequently
disconnecting it does.

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2088733/comments/90

------------------------------------------------------------------------
On 2024-12-22T03:00:54+00:00 jameshogge wrote:

Should add that I'm running Ubuntu 24.04. Kernel version 6.8.0

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2088733/comments/91

------------------------------------------------------------------------
On 2024-12-22T07:19:32+00:00 aros wrote:

So, it's not just HP.

Mario, how come two unrelated vendors have the same EC bug? ;-)

Maybe AMD released something buggy to its partners in the first place?
;-)

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2088733/comments/92

------------------------------------------------------------------------
On 2024-12-22T15:52:03+00:00 mario.limonciello wrote:

> Mario, how come two unrelated vendors have the same EC bug? ;-)

All I can do is hypothesize. Maybe they licensed the same EC? Maybe they
use the same ODM?

> Maybe AMD released something buggy to its partners in the first place? ;-)

AMD doesn't release EC code, this is proprietary to OEMs. I have tried;
I can't reproduce on reference hardware.

Fwiw I have also tried to reproduce this on a Framework laptop which has
an open source EC and can't reproduce there either.

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2088733/comments/93

------------------------------------------------------------------------
On 2024-12-22T19:55:18+00:00 aros wrote:

(In reply to Mario Limonciello (AMD) from comment #79)
> > Mario, how come two unrelated vendors have the same EC bug? ;-)
> 
> All I can do is hypothesize. Maybe they licensed the same EC? Maybe they use
> the same ODM?
> 
> > Maybe AMD released something buggy to its partners in the first place? ;-)
> 
> AMD doesn't release EC code, this is proprietary to OEMs. I have tried; I
> can't reproduce on reference hardware.
> 
> Fwiw I have also tried to reproduce this on a Framework laptop which has an
> open source EC and can't reproduce there either.

Got it. I thought the EC code is sourced by AMD.

But that makes it even more suspicious. Why would such a critical
component be outsourced?

That almost sounds like a recipe for a disaster. Possibly even a
security issue?

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2088733/comments/94

------------------------------------------------------------------------
On 2024-12-22T20:15:14+00:00 jameshogge wrote:

Frustrating... I've tried raising this with Lenovo too. I've had
relatively useful customer support experiences with them in the past. If
it turns out to be something they fix on their end, I'll say.

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2088733/comments/95

------------------------------------------------------------------------
On 2025-01-06T03:47:29+00:00 mario.limonciello wrote:

> But that makes it even more suspicious. Why would such a critical component
> be outsourced?

That's how the PC industry is.  This isn't an "AMD" specific thing.

> That almost sounds like a recipe for a disaster. Possibly even a security
> issue?

How is it any different than BIOS, or PD controller, or any other
component?  OEM designs have OEM specific bugs.

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2088733/comments/96

------------------------------------------------------------------------
On 2025-01-06T03:52:24+00:00 aros wrote:

(In reply to Mario Limonciello (AMD) from comment #82)
> > But that makes it even more suspicious. Why would such a critical component
> > be outsourced?
> 
> That's how the PC industry is.  This isn't an "AMD" specific thing.
> 
> > That almost sounds like a recipe for a disaster. Possibly even a security
> > issue?
> 
> How is it any different than BIOS, or PD controller, or any other component?
> OEM designs have OEM specific bugs.

I hope that in the future AMD will be able to validate how the EC and
its firmware work. Or maybe you've already implemented that: I haven't
heard about this issue from Zen 5 owners.

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2088733/comments/97

------------------------------------------------------------------------
On 2025-04-10T18:01:50+00:00 mail wrote:

I've been experiencing the same issue on my HP ZBook Power 11g a (with
AMD Ryzen 9 PRO 8945HS) on both Linux 6.8.12 and also 6.11.0 (with
Ubuntu patches).

In particular, the variant of this issue that reproduces reliably for me is:
1. Laptop on AC
2. Suspend
3. While suspended, unplug AC, wait a few seconds
4. While still suspended, reconnect AC
5. Wake up laptop
6. CPU frequency no longer exceeds 544 MHz until I unplug and/or replug AC again
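
To check step 6 without watching the frequencies by hand, something like the following rough sketch could be run after wake-up. It assumes the standard cpufreq sysfs layout (`scaling_cur_freq`, in kHz); the function names and the 544 MHz default are illustrative and come from this report, not from any tool.

```python
from pathlib import Path

# Assumption: standard cpufreq sysfs layout; scaling_cur_freq holds kHz.
CPUFREQ_GLOB = "cpu[0-9]*/cpufreq/scaling_cur_freq"


def read_freqs_mhz(cpu_root="/sys/devices/system/cpu"):
    """Return {cpu_name: MHz} for every CPU exposing scaling_cur_freq."""
    freqs = {}
    for path in sorted(Path(cpu_root).glob(CPUFREQ_GLOB)):
        cpu_name = path.parent.parent.name  # e.g. "cpu10"
        freqs[cpu_name] = int(path.read_text().strip()) / 1000.0
    return freqs


def looks_stuck(freqs_mhz, ceiling_mhz=544.0):
    """True if there are readings and none of them exceeds the ceiling.

    Hypothetical helper: 544 MHz is the ceiling reported in this thread.
    The result only means something while the machine is under load,
    because idle cores also sit near the 400 MHz floor.
    """
    return bool(freqs_mhz) and max(freqs_mhz.values()) <= ceiling_mhz
```

Calling `looks_stuck(read_freqs_mhz())` a few times while a CPU-heavy job is running should give a reasonable indication; a single sample on an idle system proves nothing.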

I've tried your patch, Mario, that enables the sleep for all CPUs, but I
was still able to reproduce the issue.
However, I then changed it to sleep for 100-200 ms (instead of 10-20 ms),
as you had suggested, and I'm no longer seeing the issue.

So it appears that this is a valid workaround until HP fixes their
firmware (if ever). Thanks so much for investigating this Mario and
figuring out a fix!

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2088733/comments/100

------------------------------------------------------------------------
On 2025-04-10T18:05:35+00:00 mario.limonciello wrote:

For other $REASONS I'm planning to increase that time generically.  So
I'm really happy to hear it helps with this issue too.

Can you please try changing it to msleep(2500)?  Is everything still OK
with that?

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2088733/comments/101

------------------------------------------------------------------------
On 2025-04-10T18:15:52+00:00 mail wrote:

Wonderful. Based on initial testing, replacing the `usleep_range(10000,
20000);` by `msleep(2500);` also avoids the race condition.

I haven't tested if this causes any regressions elsewhere. Is there
anything I should be looking out for? Maybe increased battery drain
while in sleep? (Not sure how frequently these "dark screen" wakeups are
happening.)
I'll keep the msleep(2500) in for the time being and report if I notice
any adverse effects.

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2088733/comments/102

------------------------------------------------------------------------
On 2025-04-10T18:30:01+00:00 mario.limonciello wrote:

The side effect will be that the APU doesn't go back to HW sleep for an
extra 2.5s any time the ACPI SCI fires.  It fires for things like power
adapter and lid events.

So look out for anything happening around those events that's out of the
ordinary.
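
For a rough sense of scale, here is a back-of-the-envelope sketch of that cost. It assumes, as a worst case, that every SCI adds one full, non-overlapping 2.5s delay and it ignores every other source of wake time; the function names and event counts are made up for illustration.

```python
def extra_awake_seconds(sci_events, delay_s=2.5):
    """Upper bound on added awake time: one full delay per ACPI SCI event."""
    if sci_events < 0:
        raise ValueError("sci_events must be non-negative")
    return sci_events * delay_s


def hw_sleep_fraction(suspend_s, sci_events, delay_s=2.5):
    """Rough share of a suspend spent in HW sleep under the assumptions above."""
    if suspend_s <= 0:
        raise ValueError("suspend_s must be positive")
    # The added awake time can never exceed the suspend itself.
    awake = min(extra_awake_seconds(sci_events, delay_s), suspend_s)
    return (suspend_s - awake) / suspend_s
```

For example, four adapter or lid events during an 8-hour suspend would add at most 10 seconds of awake time under these assumptions, so any real battery impact would have to come from much more frequent SCIs.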

Side note; with that 2.5s in place can you get me a report generated
with

https://web.git.kernel.org/pub/scm/linux/kernel/git/superm1/amd-debug-tools.git/tree/amd_s2idle.py

and, within the suspend time you program into the script, change the
power adapter in the way that would have caused the failure before?
I'll just double check if I see anything funny.

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2088733/comments/103

------------------------------------------------------------------------
On 2025-04-10T18:59:33+00:00 mail wrote:

Created attachment 307952
s2idle_report-on_ac_with_reproduction_steps.txt

I'm not 100% sure if this is what you had in mind (let me know if not),
but this is what I did:

1. Ran the script with a suspend time of 30s without performing any of
the reproduction steps (I did this while on BAT - the measured power
consumption looks high at 9W, but I think the 30s is not enough to
measure a reliable value)

2. Ran the script with a suspend time of 30s, and performed the
reproduction steps that used to trigger the race condition *while the
script had suspended my laptop*.

Both runs were done with the kernel that has your patch with
msleep(2500). Let me know if you wanted me to test any other
constellation.

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2088733/comments/104

------------------------------------------------------------------------
On 2025-04-10T18:59:43+00:00 mail wrote:

Created attachment 307953
s2idle_report-on_bat_without_reproduction_steps.txt

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2088733/comments/105

------------------------------------------------------------------------
On 2025-04-10T19:06:48+00:00 mario.limonciello wrote:

Thanks, that's exactly what I had in mind.  I wanted to see what the
very first Notify event was - and it was for battery (BAT0).

It's interesting to me that there are notify events for C000
(Processor), NPCF (Looks like NVIDIA device) and PMF_ (AMD PMF device).

The PMF device notifications will have some patches in 6.16, but I don't
think they'll affect this.

Thanks again!  Look out for formal patches soon.

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2088733/comments/106

------------------------------------------------------------------------
On 2025-04-14T18:27:02+00:00 mario.limonciello wrote:

Here is the submission for 2.5s:

https://lore.kernel.org/platform-driver-x86/70dfa642-4c97-4aaf-aa79-70127974f...@amd.com/T/#m174cfe2f4ec5893e39cff6994a93ebd499ec29e7

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2088733/comments/107

------------------------------------------------------------------------
On 2025-04-14T18:45:46+00:00 mail wrote:

Created attachment 307963
attachment-10741-0.html

Confirming for the record that I haven't noticed any adverse side
effects of the 2.5s sleep since last Thursday. Battery consumption
during sleep doesn't seem to have meaningfully changed either.

Looking forward to seeing this patched in an upcoming kernel release.
Thanks Mario for the fix! ...and Artem for raising the thread here to
kick this all off! 

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2088733/comments/108

------------------------------------------------------------------------
On 2025-04-24T08:12:43+00:00 jameshogge wrote:

I've applied this patch to my kernel (v6.8.0) and it seems to have fixed
the issue on my ThinkPad too! I've only been using it for a day, but I
can no longer reproduce the 400 MHz state and I haven't seen any
negative side effects yet.

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2088733/comments/111


** Changed in: linux
       Status: Unknown => Invalid

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/2088733

Title:
  low CPU frequency after wake up AMD Ryzen

Status in Linux:
  Invalid
Status in linux package in Ubuntu:
  Confirmed

Bug description:
  After wake-up, at least once a week, the CPU frequency fails to go
  up. When running:

  $ watch lscpu -e=CPU,MHZ

  standard output looks like:

  CPU       MHZ
    0  400.0000
    1 1383.2480
    2  400.0000
    3  400.0000
    4  400.0000
    5 2699.7561
    6 1288.0500
    7  400.0000
    8  400.0000
    9  400.0000
   10  400.0000
   11  400.0000
   12 3244.0720
   13  400.0000
   14  400.0000
   15 1295.9050

  When the issue occurs, however, no value of 600 MHz or higher appears
  in the same output. As a result, the computer responds much more
  slowly and everything feels sluggish.

  A restart fixes the issue, as does unplugging the power cable and
  plugging it back in.

  My CPU is AMD Ryzen™ 7 PRO 7840HS w/ Radeon™ 780M Graphics × 16
  My kernel version: 6.8.0-48-generic #48-Ubuntu SMP PREEMPT_DYNAMIC Fri Sep 27 14:04:52 UTC 2024
  OS version: Ubuntu 24.04.1 LTS
  My laptop: HP ZBook Firefly 14 inch G10 A Mobile Workstation PC

  I know of one more person with the same machine type who has this
  issue, and there is also this question on Ask Ubuntu, which suggests
  that a third person is affected.

  https://askubuntu.com/questions/1531956/cpu-too-slow-after-waking-up-in-ubuntu-24-04-1

  This bug looks pretty similar but should already be fixed, so I am
  creating a new one:
  https://bugs.launchpad.net/ubuntu/+source/linux-hwe-5.19/+bug/2007718

To manage notifications about this bug go to:
https://bugs.launchpad.net/linux/+bug/2088733/+subscriptions

