https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81554
--- Comment #3 from Richard Biener <rguenth at gcc dot gnu.org> ---
Created attachment 41833
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=41833&action=edit
patch

While we do this transform late with the attached patch it doesn't help
(noisy) performance.

Before:

Score based on Pentium III 600MHz using Fortran 77: 19.465005
Score based on Pentium III 600MHz using Fortran 77: 19.558720
Score based on Pentium III 600MHz using Fortran 77: 19.546069
Score based on Pentium III 600MHz using Fortran 77: 19.572887
Score based on Pentium III 600MHz using Fortran 77: 19.528043
Score based on Pentium III 600MHz using Fortran 77: 19.477979
Score based on Pentium III 600MHz using Fortran 77: 19.534370
Score based on Pentium III 600MHz using Fortran 77: 19.562271
Score based on Pentium III 600MHz using Fortran 77: 19.495751
Score based on Pentium III 600MHz using Fortran 77: 19.542132

After:

Score based on Pentium III 600MHz using Fortran 77: 19.436746
Score based on Pentium III 600MHz using Fortran 77: 19.510495
Score based on Pentium III 600MHz using Fortran 77: 19.479649
Score based on Pentium III 600MHz using Fortran 77: 19.470079
Score based on Pentium III 600MHz using Fortran 77: 19.470537
Score based on Pentium III 600MHz using Fortran 77: 19.539023
Score based on Pentium III 600MHz using Fortran 77: 19.421880
Score based on Pentium III 600MHz using Fortran 77: 19.504202
Score based on Pentium III 600MHz using Fortran 77: 19.545846
Score based on Pentium III 600MHz using Fortran 77: 19.571152

Either the transform is required pre-loop opts or flag_wrapv pessimizes stuff.

I suppose some additional pass re-shuffling would be in order, like moving
the block late_gimple_start, reassoc, strength_reduction to after
vrp, phi_only_cprop so VRP has the chance to compute good !flag_wrapv ranges
late.
That results in

Score based on Pentium III 600MHz using Fortran 77: 19.076637
Score based on Pentium III 600MHz using Fortran 77: 19.141776
Score based on Pentium III 600MHz using Fortran 77: 19.078936
Score based on Pentium III 600MHz using Fortran 77: 19.146834
Score based on Pentium III 600MHz using Fortran 77: 19.098964
Score based on Pentium III 600MHz using Fortran 77: 19.098782
Score based on Pentium III 600MHz using Fortran 77: 19.127632
Score based on Pentium III 600MHz using Fortran 77: 19.095203
Score based on Pentium III 600MHz using Fortran 77: 19.111919
Score based on Pentium III 600MHz using Fortran 77: 18.993788

thus looks even worse ;)  (all the above is with just -O3 on a Broadwell
system)

I guess reassoc is necessary for DOM to do a good CSE job.  OTOH tracer
and path splitting should enable more reassoc/SLSR so should be before
(but they shouldn't care about flag_wrapv).  Thus if we do

      NEXT_PASS (pass_sprintf_length, true);
      NEXT_PASS (pass_split_paths);
      NEXT_PASS (pass_tracer);
      NEXT_PASS (pass_thread_jumps);
      NEXT_PASS (pass_vrp, false /* warn_array_bounds_p */);
      /* The only const/copy propagation opportunities left after
         DOM and VRP should be due to degenerate PHI nodes.  So rather than
         run the full propagators, run a specialized pass which
         only examines PHIs to discover const/copy propagation
         opportunities.  */
      NEXT_PASS (pass_phi_only_cprop);
      /* Dumbing down to -fwrapv for reassoc to work and forwprop folding
         not hindered by undefined overflow disabling transforms.  Matches
         semantics of RTL.  */
      NEXT_PASS (pass_late_gimple_start);
      NEXT_PASS (pass_reassoc, false /* insert_powi_p */);
      NEXT_PASS (pass_strength_reduction);
      NEXT_PASS (pass_dominator, false /* may_peel_loop_headers_p */);
      /* The only const/copy propagation opportunities left after
         DOM and VRP should be due to degenerate PHI nodes.  So rather than
         run the full propagators, run a specialized pass which
         only examines PHIs to discover const/copy propagation
         opportunities.  */
      NEXT_PASS (pass_phi_only_cprop);
      NEXT_PASS (pass_strlen);
      NEXT_PASS (pass_thread_jumps);
      NEXT_PASS (pass_dse);

we end up with

Score based on Pentium III 600MHz using Fortran 77: 19.467136
Score based on Pentium III 600MHz using Fortran 77: 19.489240
Score based on Pentium III 600MHz using Fortran 77: 19.413257
Score based on Pentium III 600MHz using Fortran 77: 19.285549
Score based on Pentium III 600MHz using Fortran 77: 19.352476
Score based on Pentium III 600MHz using Fortran 77: 19.487067
Score based on Pentium III 600MHz using Fortran 77: 19.513724
Score based on Pentium III 600MHz using Fortran 77: 19.515330
Score based on Pentium III 600MHz using Fortran 77: 19.523810
Score based on Pentium III 600MHz using Fortran 77: 19.518709

Anyway, some more detailed analysis is required here [note I didn't try
to reproduce the slowdown].  Pass shuffling is always "interesting"...
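
Since the runs are described as noisy, it may help to reduce each set of ten scores to a mean and a sample standard deviation before comparing configurations. The following is only a sketch of that bookkeeping, not part of the attached patch: the scores are copied from the comment, while the dictionary labels (`before`, `after_patch`, `block_after_vrp`, `final_ordering`) and the helper names `summarize` and `relative_change` are made up here for illustration. It assumes higher scores are better, as the "looks even worse" remark implies.

```python
from statistics import mean, stdev

# Scores copied verbatim from the comment (assumed: higher is better).
# The keys are illustrative labels, not names used in the bug report.
runs = {
    "before": [
        19.465005, 19.558720, 19.546069, 19.572887, 19.528043,
        19.477979, 19.534370, 19.562271, 19.495751, 19.542132,
    ],
    "after_patch": [
        19.436746, 19.510495, 19.479649, 19.470079, 19.470537,
        19.539023, 19.421880, 19.504202, 19.545846, 19.571152,
    ],
    "block_after_vrp": [
        19.076637, 19.141776, 19.078936, 19.146834, 19.098964,
        19.098782, 19.127632, 19.095203, 19.111919, 18.993788,
    ],
    "final_ordering": [
        19.467136, 19.489240, 19.413257, 19.285549, 19.352476,
        19.487067, 19.513724, 19.515330, 19.523810, 19.518709,
    ],
}


def summarize(scores):
    """Return (mean, sample standard deviation) for one set of runs."""
    return mean(scores), stdev(scores)


def relative_change(baseline, candidate):
    """Percent change of the candidate mean relative to the baseline mean."""
    base = mean(baseline)
    return 100.0 * (mean(candidate) - base) / base


if __name__ == "__main__":
    for label, scores in runs.items():
        m, s = summarize(scores)
        delta = relative_change(runs["before"], scores)
        print(f"{label:16s} mean={m:.3f} stdev={s:.3f} change={delta:+.2f}%")
```

On these numbers the means come out at roughly 19.53 (before), 19.49 (after the patch), 19.10 (block moved after vrp, phi_only_cprop) and 19.46 (final ordering). Only the moved-block variant differs from the baseline by about 2%; the other two are within a few tenths of a percent, so a proper significance test would be needed before drawing conclusions from them.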