Hi Arnd,

On Wed, Mar 29, 2023 at 01:12:45PM +0200, Arnd Bergmann wrote:
> The machine I use has KVM support for 64-bit guests, so
> I think you have this the wrong way around: If you run
> the arm64 guest with TCG, I would expect to see the
> same effect on an arm64 host and an x86 host.
Thanks for correcting me.

> It will be a very long time before Debian/arm64 can
> consider bumping the baseline, as the oldest Cortex-A53
> and Cortex-A57 cores are only ten years old at this point,
> and the Cortex-A53 is still the most popular core in
> currently shipping SoCs, by a wide margin.

This is also a useful data point. Johannes mentioned that autopkgtest
uses cortex-a53.

> The maintenance argument goes both ways I think: Having
> the A57 or A53 as the baseline makes it easier when
> a package accidentally relies on a feature of a later
> core without doing a runtime feature check, so that
> would favor using A57 over cpu=max. The advantage
> of using cpu=max is normally that this enables additional
> features to be used in the guest that may provide
> better performance or security, and allow testing those
> features.

At this time, I am convinced that a plain -cpu max is a suboptimal
choice.

> I don't know why the system performs poorly with cpu=max,
> this may be a known issue with one of the features this
> enables, or it may be a bug in the kernel or in qemu
> that we should fix.
>
> I have no objections to changing the default to
> cpu=cortex-a57 for non-KVM runs, but I think more
> importantly we should
>
> a) try to reproduce the behavior on an x86-64 host, and

Yes, it is fully reproducible there. I ran some tests following Arnd's
suggestions (see the P.S. for a sketch of the invocation used):

-cpu max
    Startup finished in 23.590s (kernel) + 18.210s (userspace) = 41.800s
-cpu cortex-a53
    Startup finished in 6.080s (kernel) + 9.808s (userspace) = 15.889s
-cpu cortex-a57
    Startup finished in 6.090s (kernel) + 8.460s (userspace) = 14.551s
-cpu cortex-a76
    Startup finished in 6.415s (kernel) + 8.300s (userspace) = 14.715s
-cpu a64fx
    Startup finished in 6.373s (kernel) + 9.048s (userspace) = 15.422s
-cpu neoverse-n1
    Startup finished in 6.078s (kernel) + 8.367s (userspace) = 14.446s
-cpu max,sve=off,sme=off,pmu=off,lpa2=off,pauth=off
    Startup finished in 4.357s (kernel) + 5.405s (userspace) = 9.763s
-cpu max,lpa2=off
    Startup finished in 20.854s (kernel) + 18.848s (userspace) = 39.703s
-cpu max,pauth=off
    Startup finished in 4.756s (kernel) + 5.678s (userspace) = 10.435s
-cpu max,sme=off
    Startup finished in 22.018s (kernel) + 18.335s (userspace) = 40.353s
-cpu max,pmu=off
    Startup finished in 21.032s (kernel) + 17.974s (userspace) = 39.007s
-cpu max,pauth-impdef=on
    Startup finished in 6.077s (kernel) + 7.241s (userspace) = 13.319s

So pauth seems to be the culprit. This is a more or less known issue,
see:
https://qemu-project.gitlab.io/qemu/system/arm/cpu-features.html#tcg-vcpu-features

> b) figure out the underlying issue.

I think we did. Choosing pauth-impdef=on (the cheaper,
implementation-defined pointer authentication algorithm) over the
default pauth emulation should mostly fix the performance problem.

Given that we already choose -cpu host for KVM, I think going beyond a
cortex-* model for TCG would still be sensible. At this point, my
preference is max,pauth-impdef=on. Does anyone disagree? Would someone
confirm that this also speeds things up on an arm64 host?

Helmut
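
P.S.: The timings above are the "Startup finished in ..." line printed
by systemd-analyze inside the guest after booting under TCG with the
respective -cpu value. As a rough sketch for anyone who wants to
reproduce the comparison, the invocation looks roughly like the
following; the memory size, smp count, image and kernel paths are
placeholders rather than the exact setup used:

    # boot the guest under TCG (no -enable-kvm) with the cpu model
    # being compared
    qemu-system-aarch64 \
        -machine virt \
        -cpu max,pauth-impdef=on \
        -smp 4 -m 2048 \
        -nographic \
        -kernel vmlinuz -initrd initrd.img \
        -append "root=/dev/vda2 console=ttyAMA0" \
        -drive if=virtio,format=qcow2,file=disk.img

    # inside the guest, once boot has settled:
    systemd-analyze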