Hi Arnd,

On Wed, Mar 29, 2023 at 01:12:45PM +0200, Arnd Bergmann wrote:
> The machine I use has KVM support for 64-bit guests, so
> I think you have this the wrong way around: If you run
> the arm64 guest with TCG, I would expect to see the
> same effect on an arm64 host and an x86 host.

Thanks for correcting me.

> It will be a very long time before Debian/arm64 can
> consider bumping the baseline, as the oldest Cortex-A53
> and Cortex-A57 cores are only ten years old at this point,
> and the Cortex-A53 is still the most popular core in
> currently shipping SoCs, by a wide margin.

This is also useful to know. Johannes mentioned that autopkgtest uses
cortex-a53.

> The maintenance argument goes both ways I think: Having
> the A57 or A53 as the baseline makes it easier when
> a package accidentally relies on a feature of a later
> core without doing a runtime feature check, so that
> would favor using A57 over cpu=max. The advantage
> of using cpu=max is normally that this enables additional
> features to be used in the guest that may provide
> better performance or security, and allow testing those
> features.

At this time, I am convinced that -cpu max is a suboptimal choice.

> I don't know why the system performs poorly with cpu=max,
> this may be a known issue with one of the features this
> enables, or it may be a bug in the kernel or in qemu
> that we should fix.
> 
> I have no objections to changing the default to
> cpu=cortex-a57 for non-KVM runs, but I think more
> importantly we should
> 
> a) try to reproduce the behavior on an x86-64 host, and

Yes, it is fully reproducible there. I ran some tests following Arnd's
suggestions:

-cpu max
Startup finished in 23.590s (kernel) + 18.210s (userspace) = 41.800s
-cpu cortex-a53
Startup finished in 6.080s (kernel) + 9.808s (userspace) = 15.889s
-cpu cortex-a57
Startup finished in 6.090s (kernel) + 8.460s (userspace) = 14.551s
-cpu cortex-a76
Startup finished in 6.415s (kernel) + 8.300s (userspace) = 14.715s
-cpu a64fx
Startup finished in 6.373s (kernel) + 9.048s (userspace) = 15.422s
-cpu neoverse-n1
Startup finished in 6.078s (kernel) + 8.367s (userspace) = 14.446s
-cpu max,sve=off,sme=off,pmu=off,lpa2=off,pauth=off
Startup finished in 4.357s (kernel) + 5.405s (userspace) = 9.763s
-cpu max,lpa2=off
Startup finished in 20.854s (kernel) + 18.848s (userspace) = 39.703s
-cpu max,pauth=off
Startup finished in 4.756s (kernel) + 5.678s (userspace) = 10.435s
-cpu max,sme=off
Startup finished in 22.018s (kernel) + 18.335s (userspace) = 40.353s
-cpu max,pmu=off
Startup finished in 21.032s (kernel) + 17.974s (userspace) = 39.007s
-cpu max,pauth-impdef=on
Startup finished in 6.077s (kernel) + 7.241s (userspace) = 13.319s

So pauth seems to be the culprit. This is kinda known, see:
https://qemu-project.gitlab.io/qemu/system/arm/cpu-features.html#tcg-vcpu-features
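
For reference, the numbers above are the "Startup finished" line that
systemd prints at the end of boot. Something along these lines should
reproduce them; the image, kernel and memory settings below are just
placeholders, not what autopkgtest actually passes:

    # boot a TCG guest with the -cpu variant under test
    qemu-system-aarch64 -machine virt -cpu max,pauth-impdef=on \
        -m 2G -smp 2 -nographic \
        -kernel vmlinuz -initrd initrd.img \
        -append "root=/dev/vda console=ttyAMA0" \
        -drive file=disk.img,format=qcow2,if=virtio

    # inside the guest, the same figures are available via
    systemd-analyze time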

> b) figure out the underlying issue.

I think we did.

So choosing pauth-impdef over the default pauth implementation should
mostly fix the performance problem. Given that we already pick -cpu host
for KVM, I think going beyond a fixed Cortex model for TCG would still
be sensible. At this point, my preference is -cpu max,pauth-impdef=on.
Does anyone disagree? Could someone confirm that this also speeds things
up on an arm64 host?
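
To make this concrete, here is an untested sketch of the selection logic
I have in mind; the shell below is mine, not what autopkgtest-virt-qemu
currently does:

    # pick the -cpu value depending on whether KVM is usable
    if [ -c /dev/kvm ] && [ "$(uname -m)" = aarch64 ]; then
            cpu=host                 # KVM: keep using the host CPU
    else
            cpu=max,pauth-impdef=on  # TCG: keep the -cpu max features, but
                                     # use the cheaper implementation-defined
                                     # pauth algorithm
    fi
    # remaining arguments (disk, kernel, ...) are passed through unchanged
    exec qemu-system-aarch64 -machine virt -cpu "$cpu" "$@"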

Helmut
