https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104125
--- Comment #4 from Martin Jambor <jamborm at gcc dot gnu.org> ---

Despite spending much more time on this than I wanted, I was not able to find out anything really interesting. The function that slowed down significantly is feval (FWIW, perf annotation points to a conditional jump, depending on a comparison of 0x78(%rsp) to zero, as a new costly instruction).

I went back to the commit that introduced the regression and added a debug counter to switch between the old and new behavior. The single change responsible for the entire slowdown happened in the evrp pass when working on function positional_eval:

@@ -1946,7 +1948,7 @@
   _11 = _9 & _10;
   _95 = PopCount (_11);
   _96 = _95 * 15;
-  _104 = -_96;
+  _104 = _95 * -15;
   _13 = pawntt_84(D)->b_super_strong_square;
   _14 = s_85(D)->BitBoard[4];
   _15 = _13 & _14;

Neither _95 nor _96 has any further uses, and either way a simple search in the dumps suggests that even in the "fast" case the expression is folded to a multiplication by -15 later anyway. But from here the investigation is difficult: this change perturbs SSA numbering in later passes and the diffs are huge.

Moreover, it also causes a change in inlining order (as reported by -fopt-info-optimized):

--- opt-fast	2022-02-01 17:17:50.928639947 +0100
+++ opt-slow	2022-02-01 17:18:07.284728740 +0100
@@ -4,4 +4,4 @@
 neval.cpp:1086:26: optimized: Inlined trapped_eval.constprop/209 into void feval(state_t*, int, t_eval_comps*)/163 which now has time 172.599138 and size 156, net change of -25.
-neval.cpp:1067:22: optimized: Inlined void kingpressure_eval(state_t*, attackinfo_t*, t_eval_comps*)/162 into void feval(state_t*, int, t_eval_comps*)/163 which now has time 216.190938 and size 314, net change of -31.
-neval.cpp:1081:20: optimized: Inlined void positional_eval(state_t*, pawntt_t*, t_eval_comps*)/157 into void feval(state_t*, int, t_eval_comps*)/163 which now has time 314.215938 and size 433, net change of -21.
+neval.cpp:1081:20: optimized: Inlined void positional_eval(state_t*, pawntt_t*, t_eval_comps*)/157 into void feval(state_t*, int, t_eval_comps*)/163 which now has time 269.624138 and size 274, net change of -21.
+neval.cpp:1067:22: optimized: Inlined void kingpressure_eval(state_t*, attackinfo_t*, t_eval_comps*)/162 into void feval(state_t*, int, t_eval_comps*)/163 which now has time 313.215938 and size 432, net change of -31.
 neval.cpp:394:22: optimized: basic block part vectorized using 32 byte vectors

On the assembly level, register allocation, spilling and scheduling are clearly somewhat different, again creating so many differences that I cannot tell what is going on from a simple diff.