Launchpad has imported 45 comments from the remote bug at
https://bugzilla.kernel.org/show_bug.cgi?id=210741.

If you reply to an imported comment from within Launchpad, your comment
will be sent to the remote bug automatically. Read more about
Launchpad's inter-bugtracker facilities at
https://help.launchpad.net/InterBugTracking.

------------------------------------------------------------------------
On 2020-12-17T01:13:11+00:00 dsmythies wrote:

Created attachment 294171
Graph of load sweep up and down at 347 Hertz.

Consider a steady-state periodic single-threaded workflow, with a work/sleep
frequency of 347 Hertz and a load somewhere in the ~75% range at the steady
state operating point.
For the intel-cpufreq CPU frequency scaling driver with the powersave governor
and HWP disabled, it runs indefinitely without any issues.
For the acpi-cpufreq CPU frequency scaling driver with the ondemand governor,
it runs indefinitely without any issues.
For the intel-cpufreq CPU frequency scaling driver with the powersave governor
and HWP enabled, it suffers from overruns.

Why?

For unknown reasons, HWP seems to incorrectly decide that the processor
is idle and spins the PLL down to a very low frequency. Upon exit from
the sleep portion of the periodic workflow, it takes a very long time to
ramp back up (on the order of 20 milliseconds; supporting data for that
statement will be added in a later posting), resulting in the periodic
job not being able to complete its work before the next interval,
whereas it normally has plenty of time to do its work. Typical worst
case overruns are around 12 milliseconds, or several work/sleep periods
(i.e. it takes a very long time to catch up).
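
For reference, a minimal sketch of this kind of periodic load generator
(a simplified, hypothetical stand-in, not the actual consume tool used
for the numbers herein): it burns a fixed work packet every period,
sleeps until the next absolute deadline, and counts a cycle as an
overrun when the work packet does not finish before its deadline.

#define _GNU_SOURCE
#include <stdio.h>
#include <stdint.h>
#include <time.h>

#define PERIOD_NS (1000000000L / 347)   /* 347 hertz work/sleep frequency */

static void burn(long loops)            /* the fixed work packet */
{
        volatile uint64_t x = 0;
        for (long i = 0; i < loops; i++)
                x += i;
}

int main(void)
{
        struct timespec next, now;
        long overruns = 0;

        clock_gettime(CLOCK_MONOTONIC, &next);
        for (long n = 0; n < 347 * 60; n++) {   /* run for ~60 seconds */
                burn(2000000);                  /* tune loop count for ~75% load */
                next.tv_nsec += PERIOD_NS;      /* next absolute deadline */
                if (next.tv_nsec >= 1000000000L) {
                        next.tv_nsec -= 1000000000L;
                        next.tv_sec++;
                }
                clock_gettime(CLOCK_MONOTONIC, &now);
                if (now.tv_sec > next.tv_sec ||
                    (now.tv_sec == next.tv_sec && now.tv_nsec > next.tv_nsec))
                        overruns++;             /* work missed its deadline */
                clock_nanosleep(CLOCK_MONOTONIC, TIMER_ABSTIME, &next, NULL);
        }
        printf("overruns: %ld\n", overruns);
        return 0;
}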

The probability of this occurring is about 3%, but varies significantly.
Obviously, the recovery time is also a function of EPP, but mostly this
work has been done with the default EPP of 128. I believe this to be a
sampling and anti-aliasing issue, but cannot prove it because HWP is a
black box. My best GUESS is:

If the periodic load is busy on a jiffy boundary, such that the tick is on;
then if it is sleeping at the next jiffy boundary, with a pending wake, such
that idle state 2 was used;
  then if the rest of the system was idle, such that HWP decides to spin down
the PLL;
    then it is highly probable that upon that idle state 2 exit the PLL will be
too slow to ramp up, and the task will overrun as a result.
Else everything will be fine.

For a 1000 Hz kernel, the above suggests that a work/sleep frequency of 500 Hz
should behave in a binary way: either lots of overruns or none. It does.
For a 1000 Hz kernel, the above suggests that a work/sleep frequency of 333.333
Hz should behave in a binary way: either lots of overruns or none. It does.
(Those periods, 2 and 3 milliseconds, are exact multiples of the 1 millisecond
jiffy, so every cycle has the same phase relative to the tick; see the sketch
below.)
Note: in all cases the sleep time has to be within the window of opportunity.
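
A trivial bit of arithmetic (my own illustration, not part of the test
tools) makes the phase behavior visible: print where each period start
lands within the 1 millisecond jiffy of a 1000 Hz kernel.

#include <stdio.h>

int main(void)
{
        /* work/sleep periods in milliseconds: 500 Hz, 333.333 Hz, 347 Hz */
        double periods[] = { 2.0, 3.0, 1000.0 / 347.0 };
        const char *labels[] = { "500.000 Hz", "333.333 Hz", "347.000 Hz" };

        for (int p = 0; p < 3; p++) {
                double t = 0.0;

                printf("%s phase within the 1 ms jiffy:", labels[p]);
                for (int n = 0; n < 5; n++) {
                        printf(" %.3f", t - (long)t);
                        t += periods[p];
                }
                /* constant phase for 2 and 3 ms; drifts for 2.882 ms */
                printf("\n");
        }
        return 0;
}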

Now, I actually cannot prove whether the idle state 2 part is a cause or
a consequence, but it never happens with idle state 2 disabled, albeit
at the cost of significant power.

Another way this issue would manifest itself is as a seemingly
extraordinary idle exit latency, which would be rather difficult to
isolate as the cause.

Processors tested:
Intel(R) Core(TM) i5-9600K CPU @ 3.70GHz (mine)
Intel(R) Core(TM) i5-6200U CPU @ 2.30GHz (not mine)

HWP has been around for years, why am I just reporting this now?

I never owned an HWP capable processor before. My older i7-2600K based
test computer was getting a little old, so I built a new test computer.
I noticed this issue the same day I first enabled HWP. That was months
ago (notice the dates on the graphs that will eventually be added to
this), and I tried, repeatedly, to get help from Intel via the linux-pm
e-mail list.

Now, given the above system response issue, a new test was developed to
focus specifically on this issue, dubbed the "Inverse Impulse Response"
test. It examines in great detail the CPU frequency rise time after a
brief, less than 1 millisecond, gap in an otherwise continuous workflow.
I'll attach graphs and details in subsequent postings to this bug
report.

While I believe this is an issue entirely within HWP, I have not been
able to prove that there was nothing sent from the kernel somehow
telling HWP to spin down.

Notes:

CPU affinity does not need to be forced, but sometimes is for data
acquisition.

1000 hertz kernels were tested back to kernel 5.2, all failed.

Kernel 5.10-rc7 (I have yet to compile 5.10) also fails.

A 250 hertz kernel was tested, and it did not have this issue in this
area. Perhaps elsewhere, I didn't look.

Both teo and menu idle governors were tested, and while both suffer from
the unexpected CPU frequency drop, teo seems much worse. However, the
failure points for both governors are repeatable.

The test computers were always checked for any throttling log sticky
bits, and regardless were never anywhere even close to throttling.

Note, however, that every HWP capable computer I was able to acquire
data from has at least one of those sticky bits set after boot, so they
need to be reset before any test that might want to examine them
afterwards.
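
For anyone wanting to inspect those sticky bits themselves, here is a
hedged sketch, assuming MSR_CORE_PERF_LIMIT_REASONS at address 0x64F
(the register turbostat decodes on these client processors, with status
bits in the low word and the sticky log bits in the high word), read
via the msr driver:

#include <stdio.h>
#include <stdint.h>
#include <fcntl.h>
#include <unistd.h>

#define MSR_CORE_PERF_LIMIT_REASONS 0x64F   /* assumed address, per turbostat */

int main(void)
{
        uint64_t val;
        int fd = open("/dev/cpu/0/msr", O_RDONLY);  /* needs root and 'modprobe msr' */

        if (fd < 0 ||
            pread(fd, &val, sizeof(val), MSR_CORE_PERF_LIMIT_REASONS) != sizeof(val)) {
                perror("MSR_CORE_PERF_LIMIT_REASONS");
                return 1;
        }
        printf("status: 0x%04x  sticky log: 0x%04x\n",
               (unsigned)(val & 0xffff), (unsigned)((val >> 16) & 0xffff));
        close(fd);
        return 0;
}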

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1917813/comments/0

------------------------------------------------------------------------
On 2020-12-17T01:15:34+00:00 dsmythies wrote:

Intel: Kristen hasn't been the maintainer for years. Please update the
auto-assigned thing.

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1917813/comments/1

------------------------------------------------------------------------
On 2020-12-17T01:23:05+00:00 dsmythies wrote:

Created attachment 294173
Graph of an area of concern breaking down.

An experiment was done looking around the area initially found at a 347
hertz work/sleep frequency of the periodic workflow and load.

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1917813/comments/2

------------------------------------------------------------------------
On 2020-12-17T01:29:27+00:00 dsmythies wrote:

Created attachment 294175
Graph of overruns from the same experiment as the previous post

There should not be any overruns. (Sometimes there are 1 or 2 from
initial start-up.)

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1917813/comments/3

------------------------------------------------------------------------
On 2020-12-17T01:47:08+00:00 dsmythies wrote:

Created attachment 294177
inverse impulse test - short short sleep exit response

Good and bad inverse impulse response exits all on one graph.

The graph mentions 5 milliseconds a lot. At that time I did not know
that the frequency step times are a function of EPP. I have since
mapped the entire EPP space, getting:

0 <= EPP <= 1 : unable to measure.
2 <= EPP <= 39 : 2 milliseconds between frequency steps
40 <= EPP <= 55 : 3 milliseconds between frequency steps
56 <= EPP <= 79 : 4 milliseconds between frequency steps
80 <= EPP <= 133 : 5 milliseconds between frequency steps
134 <= EPP <= 143 : 6 milliseconds between frequency steps
144 <= EPP <= 154 : 7 milliseconds between frequency steps
155 <= EPP <= 175 : 8 milliseconds between frequency steps
176 <= EPP <= 255 : 9 milliseconds between frequency steps
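
For reference, EPP as used herein is the 8-bit Energy Performance
Preference field, bits 31:24 of IA32_HWP_REQUEST (MSR 0x774). A minimal
sketch to read back the value currently in force via the msr driver:

#include <stdio.h>
#include <stdint.h>
#include <fcntl.h>
#include <unistd.h>

#define IA32_HWP_REQUEST 0x774   /* EPP is bits 31:24 */

int main(void)
{
        uint64_t val;
        int fd = open("/dev/cpu/0/msr", O_RDONLY);   /* needs root and the msr module */

        if (fd < 0 || pread(fd, &val, sizeof(val), IA32_HWP_REQUEST) != sizeof(val)) {
                perror("IA32_HWP_REQUEST");
                return 1;
        }
        printf("EPP: %u\n", (unsigned)((val >> 24) & 0xff));
        close(fd);
        return 0;
}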

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1917813/comments/4

------------------------------------------------------------------------
On 2020-12-17T01:55:16+00:00 dsmythies wrote:

Created attachment 294179
inverse impulse response - multiple (like 1000) bad exits

By capturing a great many bad exits, one can begin to observe the width
of the timing race window (which I already knew from other work, but I
don't think I have written it herein yet). The next few attachments will
drill down into some details of this same data.

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1917813/comments/5

------------------------------------------------------------------------
On 2020-12-17T02:01:56+00:00 dsmythies wrote:

Created attachment 294181
inverse impulse response - multiple (like 1000) bad exits - detail A


Just a zoomed-in graph of an area of interest, so I could verify that the
window size was the same as (close enough to) what I asked for. The important
point is that the window is always exactly around the frequency step point.

Now, we already know that the frequency step points are an HWP thing, so
this data supports the argument that HWP is doing this stuff on its own.

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1917813/comments/6

------------------------------------------------------------------------
On 2020-12-17T03:59:47+00:00 dsmythies wrote:

Created attachment 294185
inverse impulse response - multiple (like 1000) bad exits - detail

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1917813/comments/7

------------------------------------------------------------------------
On 2020-12-17T04:02:56+00:00 dsmythies wrote:

Created attachment 294187
inverse impulse response - multiple (like 1000) bad exits - detail C

The previous attachment and this one are details B and C, zoomed-in
looks at another two spots, again calculating the window width.

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1917813/comments/8

------------------------------------------------------------------------
On 2020-12-17T06:25:20+00:00 dsmythies wrote:

Created attachment 294189
inverse impulse response - i5-6200u multi all bad

this is the other computer. there are also detail graphs, if needed.

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1917813/comments/9

------------------------------------------------------------------------
On 2020-12-17T17:18:05+00:00 dsmythies wrote:

Created attachment 294201
Just an example of inverse impulse response versus some different EPPs

see also:

https://marc.info/?l=linux-pm&m=159354421400342&w=2

and on that old thread, I just added a link to this.

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1917813/comments/10

------------------------------------------------------------------------
On 2020-12-19T16:56:45+00:00 dsmythies wrote:

> A 250 hertz kernel was tested, and it did not have this
> issue in this area. Perhaps elsewhere, I didn't look.

Correction: the same thing happens with the 250 Hertz kernel.

Some summary data for the periodic workflow manifestation of the issue:
347 hertz work/sleep frequency, fixed packet of work to do per cycle, 5
minutes, kernel 5.10, both 1000 Hz and 250 Hz, teo and menu idle
governors, idle state 2 enabled and disabled.

1000 Hz, teo, idle state 2 enabled:
overruns 28399
maximum catch up 13334 uSec
Ave. work percent: 76.767
Power: ~14.5 watts

1000 Hz, menu, idle state 2 enabled:
overruns 835
maximum catch up 10934 uSec
Ave. work percent: 68.106
Power: ~16.3 watts

1000 Hz, teo, idle state 2 disabled:
overruns 0
maximum catch up 0 uSec
Ave. work percent: 67.453
Power: ~16.8 watts (+2.3 watts)

1000 Hz, menu, idle state 2 disabled:
overruns 0
maximum catch up 0 uSec
Ave. work percent: 67.849
Power: ~16.4 watts (and yes the 0.1 diff is relevant)

250 Hz, teo, idle state 2 enabled:
overruns 193
maximum catch up 10768 uSec
Ave. work percent: 68.618
Power: ~16.1 watts

250 Hz, menu, idle state 2 enabled:
overruns 22
maximum catch up 10818 uSec
Ave. work percent: 68.607
Power: ~16.1 watts

250 Hz, teo, idle state 2 disabled:
overruns 0
maximum catch up 0 uSec
Ave. work percent: 68.550
Power: ~16.1 watts

250 Hz, menu, idle state 2 disabled:
overruns 0
maximum catch up 0 uSec
Ave. work percent: 68.586
Power: ~16.1 watts

So, the reason I missed the 250 hertz kernel in my earlier work was that
the probability was so much lower. The probability is lower because the
operating point is so different between the teo and menu governors and
between the 1000 and 250 Hz kernels; i.e. there is much more spin-down
margin for the menu case.

The operating point difference between the 250 Hz and 1000 Hz kernels
for teo is worth a deeper look.

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1917813/comments/11

------------------------------------------------------------------------
On 2020-12-20T17:02:18+00:00 dsmythies wrote:

Additionally, and for all other things being equal, the use of idle
state 2 is dramatically different between the 1000 Hz (0.66%) and 250 Hz
(0.03%) kernels, resulting in differing probabilities of hitting the
timing window while in idle state 2.

HWP does not work correctly in these scenarios.
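
For reference, idle state usage figures like the ones above can be
derived from the cpuidle sysfs statistics (cumulative residency in
microseconds in the 'time' file, entry counts in 'usage'); sample them
at the start and end of a run and take the difference. A minimal sketch
reading one cpu/state pair:

#include <stdio.h>

/* read cumulative residency (uSec) for one cpu/state from cpuidle sysfs */
static long long state_time(int cpu, int state)
{
        char path[128];
        long long t = -1;

        snprintf(path, sizeof(path),
                 "/sys/devices/system/cpu/cpu%d/cpuidle/state%d/time", cpu, state);
        FILE *f = fopen(path, "r");
        if (f) {
                fscanf(f, "%lld", &t);
                fclose(f);
        }
        return t;
}

int main(void)
{
        /* sample twice over an interval and difference the values for a
           usage fraction; the 'usage' file works the same way for counts */
        printf("cpu0 state2 cumulative residency: %lld uSec\n", state_time(0, 2));
        return 0;
}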

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1917813/comments/12

------------------------------------------------------------------------
On 2020-12-21T22:58:31+00:00 dsmythies wrote:

Created attachment 294275
Graph of load sweep at 200 Hertz for various idle states

> Now, I actually cannot prove whether the idle state 2 part
> is a cause or a consequence, but it never happens with idle
> state 2 disabled, albeit at the cost of significant power.

Idle state 2, combined with the timing window, which is much, much
larger than previously known, is the cause.

The CPU load is increased to max, then decreased. As a side note, there
is a staggering amount of hysteresis, and very long time constants are
involved here.

If one just sits and watches turbostat with the system supposedly in
steady state operation, HWP can be observed very gradually (10s of
seconds) deciding that it can reduce the CPU frequency, thus saving
power. Then it has one of these false frequency drops, HWP struggles to
catch up, raising the CPU frequency as it does so, and the cycle
repeats.

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1917813/comments/13

------------------------------------------------------------------------
On 2020-12-29T19:35:07+00:00 dsmythies wrote:

Created attachment 294399
step function system response - overview

1514 step function tests were done.
The system response was monitored each time.
For 93% of the tests, the system response was as expected.
(Do not confuse "as expected" with "ideal" or "best".)
For 7% of the tests, the system response was not as expected, being much, much
too slow and taking far too long thereafter to come completely up to speed.

Note: The y-axis of these graphs is now "gap-time" instead of CPU
frequency. This was not done to confuse the reader; rather, the reverse
frequency calculation was omitted on purpose. It is preferable to
observe the data in units of time, without introducing frequency errors
due to ISR and other latency gaps. Approximate CPU frequency conversions
have been added.

While I will post about 5 graphs for this experiment, I have hundreds,
and have done many different EPPs, and on and on...

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1917813/comments/14

------------------------------------------------------------------------
On 2020-12-29T19:36:10+00:00 dsmythies wrote:

Created attachment 294401
step function system response - detail A

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1917813/comments/15

------------------------------------------------------------------------
On 2020-12-29T19:37:26+00:00 dsmythies wrote:

Created attachment 294403
step function system response - detail B

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1917813/comments/16

------------------------------------------------------------------------
On 2020-12-29T19:39:18+00:00 dsmythies wrote:

Created attachment 294405
step function system response - detail B-1

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1917813/comments/17

------------------------------------------------------------------------
On 2020-12-29T19:40:49+00:00 dsmythies wrote:

Created attachment 294407
step function system response - detail B-2

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1917813/comments/18

------------------------------------------------------------------------
On 2021-01-02T17:23:02+00:00 dsmythies wrote:

Created attachment 294469
step function system response - idle state 2 disabled

1552 test runs with idle state 2 disabled, no failures.

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1917813/comments/19

------------------------------------------------------------------------
On 2021-01-16T22:29:58+00:00 dsmythies wrote:

Created attachment 294685
a set of tools for an automated test

At this point, I have provided 3 different methods that reveal the same
HWP issue. Herein, tools are provided to perform an automated quick test
to answer the question "does my processor have this HWP issue?"

The motivation for this automation is to make it easier to test other
HWP capable Intel processors. Until now, the other methods for
manifesting the issue have required "tweaking", and have probabilities
of occurrence even lower than 0.01%, requiring unbearably long testing
times (many hours) in order to acquire enough data to be statistically
valid. Typically, this test provides PASS/FAIL results in about 5
minutes.

The test changes idle state enabled/disabled status, requiring root
rights to do so. The scale for the fixed workpacket periodic workflow is
both arbitrary and different between processors. The test runs in two
steps: the first finds the operating point for the test (i.e. it does
the "tweaking" automatically); the second does the actual tests, one
without idle state 2 and one with only idle state 2 (recall that the
issue is linked with the use of idle state 2). Forcing idle state 2
greatly increases the probability of the issue occurring. While this
test has been created specifically for the intel_pstate CPU frequency
scaling driver with HWP enabled and the powersave governor, it does not
check for that configuration. Therefore, one way to test the test is to
try it with HWP disabled.

Note: the subject test computer must be able to run one CPU at 100%
without needing to throttle (for power, thermal, or any other reason),
including with only idle state 2 enabled.

Results so far: 3 of 3 processors FAIL; i5-9600k; i5-6200U; i7-10610U.

Use this command:

./job-control-periodic 347 6 6 900 10

Legend:
347 hertz work/sleep frequency
6 seconds per iteration run.
6 seconds per test run.
try for approximately 900 uSec average sleep time.
10 test loops at that 6 seconds per test.

The test will take about 5 minutes.
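
Since the tool deliberately does not check the configuration, one quick
manual check of the HWP part is IA32_PM_ENABLE (MSR 0x770): bit 0 set
means HWP has been enabled (and it stays set until reset). A minimal
sketch:

#include <stdio.h>
#include <stdint.h>
#include <fcntl.h>
#include <unistd.h>

#define IA32_PM_ENABLE 0x770   /* bit 0: HWP enabled (sticky until reset) */

int main(void)
{
        uint64_t val;
        int fd = open("/dev/cpu/0/msr", O_RDONLY);   /* root + msr module */

        if (fd < 0 || pread(fd, &val, sizeof(val), IA32_PM_ENABLE) != sizeof(val)) {
                perror("IA32_PM_ENABLE");
                return 1;
        }
        printf("HWP %s\n", (val & 1) ? "enabled" : "disabled");
        close(fd);
        return 0;
}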

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1917813/comments/20

------------------------------------------------------------------------
On 2021-01-16T22:39:25+00:00 dsmythies wrote:

Created attachment 294687
an example run of the quick test tools

The example contains results for:
HWP disabled: PASS (as expected)
HWP enabled: FAIL (as expected)

But tests were also done with a 250 Hertz kernel, turbo disabled, and
the EEO and RHO bits changed... all give FAIL for HWP enabled with idle
state 2 forced, and PASS for all other conditions.

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1917813/comments/21

------------------------------------------------------------------------
On 2021-02-09T00:29:13+00:00 dsmythies wrote:

Some other results for the quick test:

i5-9600k (Doug): FAIL. (Ubuntu 20.04; kernel any)
i5-6200U (Alin): FAIL. (Debian.)
i7-7700HQ (Gunnar): FAIL (Ubuntu 20.10)
i7-10610U (Russell): FAIL. (CentOS (RedHat 8), 4.18.0-240.10.1.el8_3.x86_64 #1
SMP.)
Another Skylake (Rick): still waiting to hear back.

So, 4 out of 4 so far (and I gave them no guidance at all, on purpose, as
to any particular kernel to try).

I have been picking away at this thread (pun intended) for months, and I
think it is finally starting to unravel. Somewhere above I said:

> For unknown reasons, HWP seems to incorrectly decide
> that the processor is idle and spins the PLL down to
> a very low frequency.

I now believe it to be something inside the processor, but maybe not
part of HWP. I think that non-HWP processors, or ones with HWP disabled,
also misdiagnose that the entire processor is idle. My evidence is
neither very thorough nor currently in a presentable form, but this
issue only ever occurs immediately, or some short time, after every core
has been idle, with at least one in idle state 2. The huge difference
between HWP and OS driven pstates is that the OS knows the system wasn't
actually idle and HWP doesn't. Even though package C1E is disabled, the
behavior is, perhaps, similar to it being enabled.

There is some small timing window where this really screws up. Mostly it
works fine, and either the CPU frequency doesn't ramp down at all, or it
recovers quickly, within about 120 uSec.

And as far as I know, it exits the idle state O.K., but it takes an
incredibly long time for HWP to ramp up the CPU frequency again.
Meanwhile, any non-HWP approach doesn't drop the pstate request to
minimum, nor re-start any sluggish ramp up.

Now, this issue is rare and would be extremely difficult to diagnose,
appearing as occasional glitches: a frame rate drop in a game, dropped
data, unbelievably long latency if any kind of performance is required.
I consider this issue to be of the utmost importance.

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1917813/comments/22

------------------------------------------------------------------------
On 2021-02-09T07:39:46+00:00 dsmythies wrote:

Created attachment 295137
An example idle trace capture of the issue

These are very difficult to find.

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1917813/comments/23

------------------------------------------------------------------------
On 2021-02-09T07:54:28+00:00 dsmythies wrote:

Created attachment 295139
Just for reference, a good example of some idle trace data

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1917813/comments/24

------------------------------------------------------------------------
On 2021-02-09T15:37:42+00:00 dsmythies wrote:

Created attachment 295155
Graph of inverse impulse response measured versus theoretical failure
probabilities.

As recently as late yesterday, I was still attempting to refine the gap
time definition from comment #1. Through this entire process, I had just
assumed the processor would require at least 2 samples before deciding
the entire system was idle. Why? Because it was beyond my comprehension
that the decision would be based on one instant in time. Well, that was
wrong: it is actually based on only one sample, taken at the HWP loop
time (see attachment #294201), if idle state 2 is involved.

Oh, and only idle state 2 was enabled for this. The reason I could not
originally refine the gap definition was that I did not yet know enough.
I had to force idle state 2 to increase the failure probabilities enough
to find these limits without tests that would otherwise have run for
days.

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1917813/comments/25

------------------------------------------------------------------------
On 2021-02-09T15:42:01+00:00 dsmythies wrote:

Created attachment 295159
forgot to label my axis on the previous post

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1917813/comments/26

------------------------------------------------------------------------
On 2021-02-10T23:14:28+00:00 dsmythies wrote:

Created attachment 295211
take one point, 4500 uSec from the previous graph and do add a couple of other 
configurations

Observe that the recovery time, which does not include the actual idle
state exit latency, just the extra time needed to get to an adequate CPU
frequency, is on average 87 times slower for HWP versus no-HWP, and 44
times slower than passive/ondemand/no-HWP.

Yes, there are a few interesting spikes on the passive/ondemand/no-HWP
graph, but those are things we can debug relatively easily (which I will
not do).

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1917813/comments/27

------------------------------------------------------------------------
On 2021-02-28T22:20:05+00:00 dsmythies wrote:

Created attachment 295533
changing the MWAIT definition of C1E fixes the problem

I only changed the one definition relevant to my test computer. The
documentation on these bits is rather scant. Other potential fixes
include getting rid of idle state 2 (C1E) altogether, or booting with it
disabled: "intel_idle.states_off=4".

I observe that Rui fixed the "assigned" field. Thanks, not that it
helps, as Srinivas has been aware of this for over half a year.

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1917813/comments/28

------------------------------------------------------------------------
On 2021-03-01T06:26:31+00:00 srinivas.pandruvada wrote:

I tried to reproduce with your scripts on CFL-S systems almost half a
year back and didn't observe the same. Systems can be configured in
different ways, which impacts the HWP algorithm. So it is possible that
my lab system is configured differently.

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1917813/comments/29

------------------------------------------------------------------------
On 2021-03-02T15:37:47+00:00 dsmythies wrote:

(In reply to Srinivas Pandruvada from comment #29)
> I tried to reproduce with your scripts on CFL-S systems almost half a
> year back and didn't observe the same. Systems can be configured in
> different ways, which impacts the HWP algorithm. So it is possible that
> my lab system is configured differently.

By "CFL-S" I assume you mean "Coffee Lake".

I wish you had reported back to me your findings, as we could have
figured out the difference.

Anyway, try the automated quick test I posted in comment 20. Keep in
mind that it needs to be HWP enabled, active, powersave, default
epp=128. It is on purpose that the tool does not check for this
configuration.

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1917813/comments/30

------------------------------------------------------------------------
On 2021-03-04T15:43:26+00:00 dsmythies wrote:

(In reply to Doug Smythies from comment #28)
> Created attachment 295533 [details]
> changing the MWAIT definition of C1E fixes the problem

Conversely, I have tried to determine whether other idle states can be
broken by setting the least significant bit of their MWAIT hints.

I did idle state 3, C3, and could not detect any change in system
response.

I did idle state 5, C7S, which already had that bit set, along with bit
1, so I set bit 1 to 0:

  .name = "C7s",
- .desc = "MWAIT 0x33",
- .flags = MWAIT2flg(0x33) | CPUIDLE_FLAG_TLB_FLUSHED,
+ .desc = "MWAIT 0x31",
+ .flags = MWAIT2flg(0x31) | CPUIDLE_FLAG_TLB_FLUSHED,
  .exit_latency = 124,
  .target_residency = 800,
  .enter = &intel_idle,

I could not detect any change in system response.

I am also unable to detect any difference in system response between
idle state 1, C1, and idle state 2, C1E, with this change. I do not know
if the change merely makes idle state 2 = idle state 1.
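
For reference on those scantly documented bits: as far as I can tell
from the SDM, the MWAIT hint in EAX carries a state selector in bits 7:4
and a sub-state in bits 3:0, so 0x01 and 0x03 are both sub-states of the
shallowest state, which would be consistent with the modified idle state
2 behaving like idle state 1. A trivial decoder of the hints used in
intel_idle.c:

#include <stdio.h>

int main(void)
{
        /* hint values as used in drivers/idle/intel_idle.c */
        unsigned int hints[] = { 0x00, 0x01, 0x03, 0x10, 0x20, 0x31, 0x33 };

        for (unsigned int i = 0; i < sizeof(hints) / sizeof(hints[0]); i++)
                printf("MWAIT 0x%02x -> state nibble %u, sub-state %u\n",
                       hints[i], (hints[i] >> 4) & 0xf, hints[i] & 0xf);
        return 0;
}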

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1917813/comments/31

------------------------------------------------------------------------
On 2021-03-13T00:20:57+00:00 dsmythies wrote:

Created attachment 295827
wult statistics for c1,c1e for stock and mwait modified kernels

Attempting to measure exit latency using Artem Bityutskiy's wult tool, tdt
method.
Kernel 5.12-rc2, stock and with the MWAIT change from 0x01 to 0x03.
Statistics.

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1917813/comments/33

------------------------------------------------------------------------
On 2021-03-13T00:26:23+00:00 dsmythies wrote:

Created attachment 295829
graph of wult test results

Graph of wult tdt method results.
If an I210 based NIC can be sourced, it will be tried, should pre-wake need to
be eliminated. I do not know whether that is needed or not.

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1917813/comments/34

------------------------------------------------------------------------
On 2021-03-14T18:57:27+00:00 srinivas.pandruvada wrote:

(In reply to Doug Smythies from comment #30)
> (In reply to Srinivas Pandruvada from comment #29)
> > I tried to reproduce with your scripts on CFL-S systems almost half a
> > year back and didn't observe the same. Systems can be configured in
> > different ways, which impacts the HWP algorithm. So it is possible that
> > my lab system is configured differently.
> 
> By "CFL-S" I assume you mean "Coffee Lake".
Yes, desktop part.

> 
> I wish you had reported back to me your findings, as we could have figured
> out the difference.
> 
I thought I had responded to you; I will have to search my emails. I had to
specially get a system arranged, but it had a 200 MHz higher turbo. You did
share your scripts at that time.

These algorithms are tuned on a system, so small variations can have a
bigger impact.

Let's see if ChenYu has some system the same as yours.

> Anyway, try the automated quick test I posted in comment 20. Keep in mind
> that it needs to be HWP enabled, active, powersave, default epp=128. It is
> on purpose that the tool does not check for this configuration.

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1917813/comments/35

------------------------------------------------------------------------
On 2021-03-14T20:23:59+00:00 dsmythies wrote:

Created attachment 295853
wult statistics for c1,c1e for stock and mwait modified kernels - version 2

Artum advised that I lock the CPU frequencies at some high value, in
order to show some difference. Frequencies locked at 4.6 GHz for this
attempt.

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1917813/comments/36

------------------------------------------------------------------------
On 2021-03-14T21:04:22+00:00 dsmythies wrote:

Created attachment 295855
wult graph c1e for stock and mwait modified kernels - version 2

As Artem advised, with locked CPU frequencies.

Other data (kernel 5.12-rc2):

Phoronix dbench 1.0.2 0 client count 1:

stock: 264.8 MB/S
stock, idle state 2 disabled: 311.3 MB/S (+18%)
stock, HWP boost: 417.9 MB/S (+58%)
stock, idle state 2 disabled & HWP boost: 434.3 MB/S (+64%)
stock, performance governor: 420 MB/S (+59%)
stock, performance governor & is2 disabled: 435MB/S (+64%)

inverse impulse response, 847 uSec gap:
stock: 2302 tests 38 fails, 98.35% pass rate.
+ MWAIT change: 1072 tests, 0 fails, 100% pass rate.

@Srinivas: The whole point of the quick test stuff is that it
self-adjusts to the system under test.

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1917813/comments/37

------------------------------------------------------------------------
On 2021-03-20T21:07:12+00:00 dsmythies wrote:

For this processor:
Intel(R) Core(TM) i5-10600K CPU @ 4.10GHz
the quick test gives indeterminate results. However, it is also not using any
idle state that involves the least significant bit of MWAIT being set.

$ grep . /sys/devices/system/cpu/cpu0/cpuidle/state*/name
/sys/devices/system/cpu/cpu0/cpuidle/state0/name:POLL
/sys/devices/system/cpu/cpu0/cpuidle/state1/name:C1_ACPI
/sys/devices/system/cpu/cpu0/cpuidle/state2/name:C2_ACPI
/sys/devices/system/cpu/cpu0/cpuidle/state3/name:C3_ACPI

$ grep . /sys/devices/system/cpu/cpu0/cpuidle/state*/desc
/sys/devices/system/cpu/cpu0/cpuidle/state0/desc:CPUIDLE CORE POLL IDLE
/sys/devices/system/cpu/cpu0/cpuidle/state1/desc:ACPI FFH MWAIT 0x0
/sys/devices/system/cpu/cpu0/cpuidle/state2/desc:ACPI FFH MWAIT 0x30
/sys/devices/system/cpu/cpu0/cpuidle/state3/desc:ACPI FFH MWAIT 0x60

If there is a way to make the idle states work the way they did on all previous processors, i.e.:

$ grep . /sys/devices/system/cpu/cpu0/cpuidle/state*/name
/sys/devices/system/cpu/cpu0/cpuidle/state0/name:POLL
/sys/devices/system/cpu/cpu0/cpuidle/state1/name:C1
/sys/devices/system/cpu/cpu0/cpuidle/state2/name:C1E
/sys/devices/system/cpu/cpu0/cpuidle/state3/name:C3
/sys/devices/system/cpu/cpu0/cpuidle/state4/name:C6

$ grep . /sys/devices/system/cpu/cpu0/cpuidle/state*/desc
/sys/devices/system/cpu/cpu0/cpuidle/state0/desc:CPUIDLE CORE POLL IDLE
/sys/devices/system/cpu/cpu0/cpuidle/state1/desc:MWAIT 0x00
/sys/devices/system/cpu/cpu0/cpuidle/state2/desc:MWAIT 0x01
/sys/devices/system/cpu/cpu0/cpuidle/state3/desc:MWAIT 0x10
/sys/devices/system/cpu/cpu0/cpuidle/state4/desc:MWAIT 0x20

I have not been able to figure out how.

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1917813/comments/38

------------------------------------------------------------------------
On 2021-03-24T04:30:20+00:00 dsmythies wrote:

I found this thread:
https://patchwork.kernel.org/project/linux-pm/patch/20200826120421.44356-1-guil...@barpilot.io/

And somehow figured out that an i5-10600K is COMETLAKE, and so did the
same as that link:

doug@s19:~/temp-k-git/linux$ git diff
diff --git a/drivers/idle/intel_idle.c b/drivers/idle/intel_idle.c
index 3273360f30f7..770660d777c4 100644
--- a/drivers/idle/intel_idle.c
+++ b/drivers/idle/intel_idle.c
@@ -1155,6 +1155,7 @@ static const struct x86_cpu_id intel_idle_ids[] __initconst = {
        X86_MATCH_INTEL_FAM6_MODEL(KABYLAKE_L,          &idle_cpu_skl),
        X86_MATCH_INTEL_FAM6_MODEL(KABYLAKE,            &idle_cpu_skl),
        X86_MATCH_INTEL_FAM6_MODEL(SKYLAKE_X,           &idle_cpu_skx),
+       X86_MATCH_INTEL_FAM6_MODEL(COMETLAKE,           &idle_cpu_skl),
        X86_MATCH_INTEL_FAM6_MODEL(ICELAKE_X,           &idle_cpu_icx),
        X86_MATCH_INTEL_FAM6_MODEL(XEON_PHI_KNL,        &idle_cpu_knl),
        X86_MATCH_INTEL_FAM6_MODEL(XEON_PHI_KNM,        &idle_cpu_knl),

And got back the original types of idle states.
I do not want to beat up my NVMe drive with dbench, so I installed an old
Intel SSD I had lying around:

Phoronix dbench 1.0.2 0 client count 1: (MB/S)
Intel_pstate HWP enabled, active powersave:

Kernel 5.12-rc2 stock:
All idle states enabled: 416.5
Only Idle State 0: 400.1
Only Idle State 1: 294.2
Only idle State 2: 401.6
Only idle State 3: 403.0

Kernel 5.12-rc2 patched as above:
All idle states enabled: 396.8
Only Idle State 0: 400.4
Only Idle State 1: 294.4
Only idle State 2: 245.9
Only idle State 3: 405.3
Only idle State 4: 402.8
quick test: FAIL.

Intel_pstate HWP disabled, active powersave:
Kernel 5.12-rc2 patched as above:
All idle states enabled: 340.0
Only Idle State 0: 399.5
Only Idle State 1: 358.5
Only idle State 2: 353.1
Only idle State 3: 346.9
Only idle State 4: 344.2
quick test: PASS.

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1917813/comments/39

------------------------------------------------------------------------
On 2021-03-29T22:55:28+00:00 dsmythies wrote:

It is true that the quick test should at least check that idle state 2
is indeed C1E.

I ran the inverse impulse response test. Kernel 5.12-rc2. Processor
i5-10600K. Inverse gap 842 nSec:

1034 tests 0 fails.

with the patch as per comment 38 above, i.e. with C1E:

1000 tests 16 fails. 98.40% pass 1.60% fail.

I ran just the generic periodic test at 347 hertz and light load, stock
kernel, i.e. no C1E:

HWP disabled: active/powersave:
doug@s19:~/freq-scalers$ /home/doug/c/consume 32.0 347 300 1
consume: 32.0 347 300  PID: 1280
 - fixed workpacket method:  Elapsed: 300000158  Now: 1617030857155911
Total sleep: 169222343
Overruns: 0  Max ovr: 0
Loops: 104094  Ave. work percent: 43.592582

HWP enabled: active/powersave:
doug@s19:~$ /home/doug/c/consume 32.0 347 300 1
consume: 32.0 347 300  PID: 1293
 - fixed workpacket method:  Elapsed: 300000654  Now: 1617031529268276
Total sleep: 171458395
Overruns: 725  Max ovr: 1449
Loops: 104094  Ave. work percent: 42.847326

The above was NOT due to CPU migration:

doug@s19:~$ taskset -c 10 /home/doug/c/consume 32.0 347 3600 1
consume: 32.0 347 3600  PID: 1341
 - fixed workpacket method:  Elapsed: 3600002498  Now: 1617036391455519
Total sleep: 2086618739
Overruns: 3189  Max ovr: 1864
Loops: 1249133  Ave. work percent: 42.038409

Conclusion: there is still something very minor going on, even without
C1E being involved.

Notes:

I think HWPBOOST was, at least partially, programming around the C1E
issue.

In addition to the ultimate rejection of the patch in the thread referenced in
comment 38, I think other processors should be rolled back to the same state. I
have never been able to measure any energy consumption or performance
difference for all of those deep idle states on my i5-9600K processor.

Call me dense, but I only figured out yesterday that HWP is called
"Speed Shift" in other literature and BIOS.

It does not make sense that we spent so much effort a few years ago to
make sure that we did not dwell in shallow idle states for long periods,
only to have HWP set the requested pstate to minimum upon C1E use,
albeit under some other conditions. By definition the system is NOT
actually idle; if it were, we would have asked for a deep idle state.

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1917813/comments/40

------------------------------------------------------------------------
On 2023-11-30T03:39:17+00:00 dsmythies wrote:

Created attachment 305516
an updated set of tools for an automated quick test

Now checks if idle state 2 is C1E, and aborts if not.
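
The check itself is simple; a minimal C equivalent of what the updated
tools now verify (the actual tools are scripts, this is just an
illustration):

#include <stdio.h>
#include <string.h>

int main(void)
{
        char name[32] = "";
        FILE *f = fopen("/sys/devices/system/cpu/cpu0/cpuidle/state2/name", "r");

        if (f) {
                fscanf(f, "%31s", name);
                fclose(f);
        }
        if (strcmp(name, "C1E") != 0) {
                fprintf(stderr, "idle state 2 is '%s', not C1E: aborting\n", name);
                return 1;
        }
        printf("idle state 2 is C1E: proceeding\n");
        return 0;
}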

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1917813/comments/41

------------------------------------------------------------------------
On 2023-11-30T03:43:47+00:00 dsmythies wrote:

Created attachment 305517
Quick test runs on Kernel 6.7-rc3

Summary:
HWP disabled: PASS
HWP enabled: FAIL

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1917813/comments/42

------------------------------------------------------------------------
On 2023-12-05T18:30:50+00:00 dsmythies wrote:

Created attachment 305540
CPU frequency recovery time versus inactivity gap time.

Using only idle state 2; using all except idle state 2; and all idle
states; each with and without HWP.

The maximum inactivity gap of ~400 mSec is different from a few years
ago, when there was no upper limit.

The C1E dependent stuff is at the lower end, below ~60 mSec.

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1917813/comments/43

------------------------------------------------------------------------
On 2023-12-05T18:32:40+00:00 dsmythies wrote:

Created attachment 305541
A more detailed example at a 250 mSec inactivity gap, HWP and no-HWP

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1917813/comments/44

------------------------------------------------------------------------
On 2023-12-05T18:36:40+00:00 dsmythies wrote:

Created attachment 305542
All drivers and governors, HWP and no-HWP, execution times.

No disabled idle states.
250 mSec of inactivity followed by the exact same work packet for every test.

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1917813/comments/45

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1917813

Title:
  HWP and C1E are incompatible - Intel processors

Status in Linux:
  Confirmed
Status in linux package in Ubuntu:
  Confirmed

Bug description:
  Modern Intel Processors (since Skylake) with HWP (HardWare Pstate)
  control enabled and Idle State 2, C1E, enabled can incorrectly drop
  the CPU frequency with an extremely slow recovery time.

  The fault is not within HWP itself, but within the internal idle
  detection logic. One difference between OS driven pstate control and
  HWP driven pstate control is that the OS knows the system was not
  actually idle, but HWP does not. Another difference is the incredibly
  sluggish recovery with HWP.

  The problem only occurs when Idle State 2, C1E, is involved. Not all
  processors have the C1E idle state. The issue is independent of C1E
  auto-promotion, which is turned off in general, as far as I know.

  With all idle states enabled, the issue is rare. The issue would
  manifest itself in periodic workflows and would be extremely difficult
  to isolate (it took me over half a year).

  The purpose of this bug report is to link to the upstream bug report,
  where readers can find tons of detail. I'll also set it to confirmed,
  as it has already been verified on 4 different processor models, and I
  do not want the bot asking me for files that are not required.

  Workarounds include:
  . don't use HWP.
  . disable idle state 2, C1E (a userspace sketch for this follows below).
  . change the C1E idle state to use MWAIT 0x03 instead of MWAIT 0x01 (still
  in test; documentation on the MWAIT least significant nibble is scant).
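
  For the second workaround, a minimal sketch of disabling idle state 2
  from userspace via the per-CPU cpuidle "disable" knobs (needs root;
  write "0" to re-enable; the setting does not survive a reboot):

  #include <stdio.h>

  int main(void)
  {
          char path[128];

          /* write "1" to every cpu's idle state 2 disable knob */
          for (int cpu = 0; ; cpu++) {
                  snprintf(path, sizeof(path),
                           "/sys/devices/system/cpu/cpu%d/cpuidle/state2/disable",
                           cpu);
                  FILE *f = fopen(path, "w");
                  if (!f)
                          break;          /* no such cpu: done */
                  fputs("1", f);
                  fclose(f);
          }
          return 0;
  }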

To manage notifications about this bug go to:
https://bugs.launchpad.net/linux/+bug/1917813/+subscriptions


-- 
Mailing list: https://launchpad.net/~kernel-packages
Post to     : kernel-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~kernel-packages
More help   : https://help.launchpad.net/ListHelp
