https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87528
Martin Jambor <jamborm at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |kugan at gcc dot gnu.org
--- Comment #1 from Martin Jambor <jamborm at gcc dot gnu.org> ---
It seems that the machine does not like the newly generated calls into
libgcc for popcount.
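For readers unfamiliar with where such calls come from, here is a minimal sketch. The function names are hypothetical and the loop is not taken from deepsjeng. It assumes the affected code contains a bit-counting loop of this general shape, which a compiler may recognize as a population count. When no hardware popcount instruction is enabled, as with generic x86-64 tuning, GCC typically expands `__builtin_popcountll` into a call to `__popcountdi2` in libgcc.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical bit-counting loop (illustration only, not deepsjeng code):
   each iteration clears the lowest set bit, so the loop body runs once
   per set bit in x. */
int count_bits_loop(uint64_t x)
{
    int n = 0;
    while (x) {
        x &= x - 1;   /* clear the lowest set bit */
        n++;
    }
    return n;
}

/* The same count expressed through the GCC builtin. Without a popcount
   instruction available for the selected -march, this typically becomes
   a library call to __popcountdi2. */
int count_bits_builtin(uint64_t x)
{
    return __builtin_popcountll(x);
}

/* Sanity helper: both formulations must agree for any input. */
void check_agreement(uint64_t x)
{
    assert(count_bits_loop(x) == count_bits_builtin(x));
}
```

For sparse inputs the open-coded loop runs only a few iterations, so replacing it with an out-of-line library call can plausibly cost more than it saves. That would be consistent with `__popcountdi2` showing up in the profiles of the _slow variants.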
The profile of r262486 (the _slow variant) and of the revision
immediately preceding it (the _fast variant) is:
$ perf report -n --percent-limit=2 | cat
# Overhead       Samples  Command           Shared Object  Symbol
# ........  ............  ................  .............  .................
#
     6.15%        187930  deepsjeng_r_slow  deepsjeng_r    feval
     5.88%        179434  deepsjeng_r_fast  deepsjeng_r    feval
     5.56%        169734  deepsjeng_r_fast  deepsjeng_r    search
     5.42%        165581  deepsjeng_r_slow  deepsjeng_r    search
     5.19%        158575  deepsjeng_r_slow  deepsjeng_r    ProbeTT
     5.16%        157546  deepsjeng_r_fast  deepsjeng_r    ProbeTT
     4.74%        144696  deepsjeng_r_slow  deepsjeng_r    qsearch
     4.72%        144193  deepsjeng_r_fast  deepsjeng_r    qsearch
     2.76%         84389  deepsjeng_r_slow  libgcc_s.so    __popcountdi2
     2.75%         83936  deepsjeng_r_fast  deepsjeng_r    see
     2.73%         83307  deepsjeng_r_slow  deepsjeng_r    see
     2.67%         81614  deepsjeng_r_slow  deepsjeng_r    order_moves
     2.62%         80077  deepsjeng_r_fast  deepsjeng_r    order_moves
     2.49%         76087  deepsjeng_r_slow  deepsjeng_r    FindFirstRemove
     2.47%         75346  deepsjeng_r_fast  deepsjeng_r    FindFirstRemove
     2.03%         61888  deepsjeng_r_fast  deepsjeng_r    make
     2.03%         61861  deepsjeng_r_slow  deepsjeng_r    make
The profile for r262864 (marked again as _slow below) and its
immediate predecessor (marked _fast) is:
# Overhead       Samples  Command           Shared Object  Symbol
# ........  ............  ................  .............  .................
#
     5.87%        192681  deepsjeng_r_slow  deepsjeng_r    feval
     5.74%        188254  deepsjeng_r_fast  deepsjeng_r    feval
     5.48%        179850  deepsjeng_r_slow  libgcc_s.so    __popcountdi2
     5.17%        169671  deepsjeng_r_slow  deepsjeng_r    search
     5.04%        165438  deepsjeng_r_fast  deepsjeng_r    search
     4.83%        158368  deepsjeng_r_fast  deepsjeng_r    ProbeTT
     4.82%        158096  deepsjeng_r_slow  deepsjeng_r    ProbeTT
     4.44%        145659  deepsjeng_r_fast  deepsjeng_r    qsearch
     4.39%        144117  deepsjeng_r_slow  deepsjeng_r    qsearch
     2.56%         84085  deepsjeng_r_fast  libgcc_s.so    __popcountdi2
     2.55%         83853  deepsjeng_r_slow  deepsjeng_r    see
     2.55%         83653  deepsjeng_r_fast  deepsjeng_r    see
     2.54%         83383  deepsjeng_r_fast  deepsjeng_r    order_moves
     2.44%         80246  deepsjeng_r_slow  deepsjeng_r    order_moves
     2.31%         75966  deepsjeng_r_fast  deepsjeng_r    FindFirstRemove
     2.30%         75575  deepsjeng_r_slow  deepsjeng_r    FindFirstRemove
Again, let me emphasize that this is all about generic march/mtune;
with native march/mtune, the result is almost 3% faster than GCC 8.