https://gcc.gnu.org/bugzilla/show_bug.cgi?id=122777
Andrew Pinski <pinskia at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Status|WAITING |UNCONFIRMED
Ever confirmed|1 |0
--- Comment #7 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
Hmm, looks like I need to look further into both of these.
It is interesting that calculix improved only on zen5 and not on zen4.
One difference is X86_TUNE_AVX512_SPLIT_REGS is set to true for zen4 but not
for zen5.
If that is the case and the vectorizer chooses AVX512 and then splits it, you
now have double the register usage and that could cause worse code.
So maybe vectorization using AVX512 gets used after r16-5258 (for zen4) and is
then split into 2, and things go downhill.
And then r16-5975 comes and makes the register allocator do better for zen5 but
not for zen4 due to the splitting.
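To check the splitting theory outside of SPEC, a reduced loop like the one
below could be compiled with `-O2 -march=znver4` and `-O2 -march=znver5`
(adding `-fopt-info-vec` to see which vector size was picked) and the two
assembly outputs compared. The kernel and its name are made up for
illustration and are not taken from calculix; this is only a sketch of the
kind of loop where the chosen vector width would show up.

```c
#include <stddef.h>

/* Hypothetical reduced kernel (an assumption, not code from the benchmark).
   The restrict qualifiers tell the compiler the arrays do not overlap, so
   the loop is a straightforward vectorization candidate. */
void blend(double *restrict out,
           const double *restrict a, const double *restrict b,
           const double *restrict c, const double *restrict d,
           size_t n)
{
    for (size_t i = 0; i < n; i++)
        out[i] = a[i] * b[i] + c[i] * d[i];
}
```

Adding `-mprefer-vector-width=256` to the zen4 build would be one way to see
whether avoiding 512-bit vectors changes the picture.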
exchange2_r on zen5 looks to be something different but still most likely a
register allocator issue, since it has a lot of spilling. But it is also interesting
how it didn't slow down on zen4 either. Also -march=x86-64-v3 is AVX2 and not
AVX512 so that might make a difference there.
Oh and the `-O2 -flto` (without -march=x86-64-v3) seems very noisy after my
r16-5258 change too:
https://lnt.opensuse.org/db_default/v4/SPEC/graph?highlight_run=78642&plot.0=1248.407.0
That points to maybe something odd going on with the micro-arch here too.
Looking into the last time exchange2_r improved on zen5 (with -O2
-march=x86-64-v3 -flto), the changes were all profile/counts related:
r16-4165-g8498ef3d075801 and r16-4164-gbae9c5e7c6efc0.
So maybe exchange2_r will fix itself after
https://gcc.gnu.org/pipermail/gcc-patches/2025-December/703278.html or
https://gcc.gnu.org/pipermail/gcc-patches/2025-December/703279.html gets
applied. Or this is another case where the counts still need to be fixed up.