On 6/13/24 10:10 AM, Andi Kleen wrote:
> Manolis Tsamis <manolis.tsa...@vrull.eu> writes:
>> Assembly like this can appear with bitfields or type punning / unions.
>> On stress-ng when running the cpu-union microbenchmark the following speedups
>> have been observed.
>> Neoverse-N1: +29.4%
>> Intel Coffeelake: +13.1%
>> AMD 5950X: +17.5%
> It seems this should have some kind of target hook so that the target
> can configure what forwards should be avoided. At least in x86 land
> there is a trend to the hardware handling more and more cases with each
> generation.
Definitely the case that we should expect the hardware guys to keep
improving things. I was speaking to one of ours about this specific
case and even with their planned improvements in the uarch they think
the compiler-side transformation will perform better when it can be applied.
But yes, I think we're going to need some way to control this, not just
on a per-arch but on a per-uarch basis. I originally thought we'd just
do it all the time, but my position has evolved since then.
jeff
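For reference, a minimal C sketch of the union / type-punning pattern mentioned in the quoted patch description. It assumes a 64-bit word overlaid with a byte array; the names union word, set_byte_then_load, load_first_then_insert and byte_shift are hypothetical and not taken from the patch. The first function does a 1-byte store followed by an overlapping 8-byte load, which many cores cannot satisfy through store-to-load forwarding. The second is a hand-written alternative in the same spirit (issue the wide load before the narrow store, then merge the byte in a register). It is an illustration only, not the pass's actual output.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical union: a 64-bit word overlaid with its bytes (type punning). */
union word {
    uint64_t whole;
    uint8_t bytes[8];
};

/* Bit offset of bytes[idx] inside `whole`, handling both byte orders. */
static unsigned byte_shift(unsigned idx)
{
    const uint16_t probe = 1;
    unsigned char first;
    memcpy(&first, &probe, 1);           /* first byte is 1 on little-endian */
    return first ? idx * 8u : (7u - idx) * 8u;
}

/* Pattern from the thread: a narrow store, then an overlapping wide load.
   The load may not be satisfiable by store-to-load forwarding. */
uint64_t set_byte_then_load(union word *p, unsigned idx, uint8_t v)
{
    assert(idx < sizeof p->bytes);
    p->bytes[idx] = v;                   /* 1-byte store */
    return p->whole;                     /* 8-byte load covering that store */
}

/* Sketch of a forwarding-friendly alternative: do the wide load first,
   then merge the new byte into the loaded value in a register. */
uint64_t load_first_then_insert(union word *p, unsigned idx, uint8_t v)
{
    assert(idx < sizeof p->bytes);
    uint64_t w = p->whole;               /* wide load no longer waits on the store */
    p->bytes[idx] = v;                   /* memory is updated exactly as before */
    unsigned s = byte_shift(idx);
    return (w & ~((uint64_t)0xFF << s)) | ((uint64_t)v << s);
}
```

Both functions leave memory in the same state and return the same value. Whether the first one actually stalls depends on the generated code and on the microarchitecture, which is the per-uarch question raised above.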