There have been some reports of strange or unexpected behavior with Ryzen 5xxx
processors. I think I have seen the 5950X, 5900X and 5800X mentioned, not sure
about others.
Since I have a 5800X myself, I looked into a couple of issues that have
straightforward demonstrators. I would like to share my findings and
observations on those issues.
Issue 1. High wake-up latency for CPU idle states.
This seems to be related to the so-called CC6 idle state.
The official information on it is very sparse.
The state is not explicitly exposed to the OS, at least not through the ACPI
interfaces that FreeBSD currently supports.
In my tests I see that if all logical processors enter an idle state then an
external interrupt can be delayed by 500+ us. Specifically, I observed this
with an MSI-X interrupt from a discrete network chip. Interrupts from internal
components seem to be affected as well, but to a lesser degree.
The deep state in question can be entered regardless of whether C2 (via I/O) is
enabled; C1 (via hlt) is sufficient. In fact, with machdep.idle=hlt it works
the same.
The state is not entered if at least one logical CPU is not idle.
The state is not entered if machdep.idle=mwait is used. Apparently, the
processors do not attempt to automatically enter as deep idle modes with mwait
as they do with hlt.
Finally, the state is not entered if the zenstates.py utility is used to disable
the C6 / CC6 state via a publicly undocumented MSR.
For me personally that state does not cause any annoyances, but anyone who
experiences problems related to "stuttering", "jitter" or latency might want to
look into this.
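
Before comparing latencies it may help to record which idle method is active.
A minimal sketch, assuming the machdep.idle and machdep.idle_available sysctls
are present and that the latter is a comma-separated list; the helper names
here are my own and purely illustrative:

```python
import subprocess


def read_sysctl(name, run=subprocess.run):
    """Return the value of a sysctl as a string, via `sysctl -n <name>`."""
    result = run(["sysctl", "-n", name],
                 capture_output=True, text=True, check=True)
    return result.stdout.strip()


def parse_idle_available(value):
    """Split a list such as 'spin, mwait, hlt, acpi' into method names.

    The comma-separated format is an assumption about the sysctl output.
    """
    return [method.strip() for method in value.split(",") if method.strip()]


# Example use on the machine under test (not executed here):
#   current = read_sysctl("machdep.idle")
#   methods = parse_idle_available(read_sysctl("machdep.idle_available"))
#   print("idle method:", current, "available:", methods)
```

Switching the method for a test run is assumed to be a matter of running
"sysctl machdep.idle=mwait" (or hlt) as root and repeating the measurement.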
Issue 2. Uneven performance of CPU intensive tasks, especially with SCHED_ULE,
when SMT is enabled.
I found out that, at least on my hardware, all even-numbered logical CPUs can
perform much better than odd-numbered logical CPUs. It seems that hardware
threads within a core are not equal. Maybe this is related to the ability to
use boosted frequencies, or maybe to something else; I am not sure.
From a brief look at the ULE code it looks like the selection of a hw thread
within a core is intentionally random when all other things are equal.
I suspect that the hardware + firmware may actually describe that performance
disparity via ACPI CPPC (_CPC object, etc), but right now we do not support
querying that or making use of it.
It would be interesting to see if other owners of similar processors can confirm
or provide counter-examples to my observations.
Simple tests for issue 1:
- ping a host attached to the same switch (so, with very low expected latency)
- ping 127.0.0.1
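
To make the ping results easier to compare, the per-packet times can be pulled
out of saved ping output and summarized. This is only a sketch: it assumes the
usual "time=0.045 ms" format of the reply lines, and the 0.5 ms cut-off is
simply the 500 us figure mentioned above used as an illustrative threshold.

```python
import re
import statistics

# Matches the per-reply round-trip time, e.g. "time=0.045 ms".
RTT_RE = re.compile(r"time[=<]([0-9.]+)\s*ms")


def parse_rtts(ping_output):
    """Return the round-trip times (in ms) found in ping output text."""
    return [float(m.group(1)) for m in RTT_RE.finditer(ping_output)]


def summarize(rtts_ms, threshold_ms=0.5):
    """Summarize RTTs and count replies at or above threshold_ms.

    threshold_ms is a hypothetical cut-off, not a value defined anywhere.
    """
    return {
        "count": len(rtts_ms),
        "min": min(rtts_ms),
        "median": statistics.median(rtts_ms),
        "max": max(rtts_ms),
        "slow": sum(1 for rtt in rtts_ms if rtt >= threshold_ms),
    }


# Example use (not executed here), with output saved from e.g.
# "ping -c 100 127.0.0.1 > ping.txt":
#   with open("ping.txt") as f:
#       print(summarize(parse_rtts(f.read())))
```

Running this once with machdep.idle=hlt and once with machdep.idle=mwait should
show whether the "slow" count changes between the two settings.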
For issue 2: take some CPU intensive single-threaded task and bind it (with
cpuset -l) to different logical CPUs. Multiple such tasks can be run
concurrently on different logical CPUs.
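
The test for issue 2 could be scripted roughly as follows. This is a sketch
under assumptions: the workload is a made-up integer loop run in a child Python
process, the function names are mine, and the timing includes interpreter
startup, so only the relative difference between CPUs is meaningful.

```python
import subprocess
import sys
import time

# Hypothetical CPU-bound, single-threaded workload run in a child process.
WORKLOAD = "x = 0\nfor i in range(20_000_000):\n    x += i * i\n"


def pinned_command(cpu, workload=WORKLOAD):
    """Build a command that runs the workload bound to one logical CPU."""
    return ["cpuset", "-l", str(cpu), sys.executable, "-c", workload]


def time_on_cpu(cpu, run=subprocess.run):
    """Run the workload pinned to `cpu` and return wall-clock seconds."""
    start = time.perf_counter()
    run(pinned_command(cpu), check=True)
    return time.perf_counter() - start


def even_odd_means(timings):
    """Given {cpu: seconds}, return (mean for even CPUs, mean for odd CPUs)."""
    even = [t for cpu, t in timings.items() if cpu % 2 == 0]
    odd = [t for cpu, t in timings.items() if cpu % 2 == 1]
    return sum(even) / len(even), sum(odd) / len(odd)


# Example use (not executed here); range(16) assumes 16 logical CPUs,
# as on a 5800X with SMT enabled:
#   timings = {cpu: time_on_cpu(cpu) for cpu in range(16)}
#   print(even_odd_means(timings))
```

A clearly larger mean for the odd-numbered CPUs would match what I see here;
similar means would be a useful counter-example.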
References:
- https://forums.freebsd.org/threads/variable-ping-latency-on-ryzen-setup.82791/
- https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=256594
- https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=254040
- https://github.com/r4m0n/ZenStates-Linux
- https://github.com/meowthink/ZenStates-FreeBSD -- has a bug
- https://github.com/avg-I/ZenStates-FreeBSD -- has a fix
- https://www.kernel.org/doc/html/latest/admin-guide/acpi/cppc_sysfs.html
- https://static.linaro.org/connect/lvc21/presentations/lvc21-219.pdf
- https://uefi.org/specs/ACPI/6.4/14_Platform_Communications_Channel/Platform_Comm_Channel.html
--
Andriy Gapon