https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115842
--- Comment #9 from Tamar Christina <tnfchris at gcc dot gnu.org> ---
(In reply to Hongtao Liu from comment #8)
> (In reply to Tamar Christina from comment #7)
> > (In reply to Hongtao Liu from comment #6)
> > > I noticed some double-counting of cost in group-candidate (regarding loop
> > > invariant expressions), this modification reduces the number of
> > > instructions executed by ~8% for exchange_r binary compiled with
> > > -march=x86-64-v3 -O2.
> > >
> >
> > Note that this patch causes regressions on AArch64. While exchange improves
> > slightly I see regressions in: leela, -5%, mcf, xz, x264, deepsjeng -2%,
> > geomean -1%
>
> What options do you use, we have an AmpereOne machine, like to try to see if
> it's reproducible on it.

This was on Neoverse-V2, but it is probably reproducible on AmpereOne. The
flags were:
-mcpu=native -Ofast -fomit-frame-pointer -flto=auto