Hi Kyrill,
Because the expansion now emits straight-line code rather than
conditionals and branches, it should be easier to optimise in general,
so I'd expect this to be an improvement overall.
That said, I have benchmarked it on SPEC2017 on aarch64.
If you have any benchmarks of interest that you (or somebody else) can
run on a target that you care about, I would be very grateful for any
results.
Well, most people currently use x86_64 for scientific computing, so I
would be concerned most about this architecture. As for the test case,
min / max performance clearly has an effect on 521.wrf, so this would
be ideal.
If you could run 521.wrf on x86_64 and find that it does not
regress measurably (or even shows an improvement), the patch is OK.
I'd be interested in the timings you get.
Regards,
Thomas