On Wed, 10 Jul 2024, Richard Biener wrote:

> The loop unrolling code assumes that one third of all volatile accesses
> can be possibly optimized away which is of course not true.  This leads
> to excessive unrolling in some cases.  The following tracks the number
> of stmts with side-effects as those are not eliminatable later and
> only assumes one third of the other stmts can be further optimized.
>
> Bootstrapped and tested on x86_64-unknown-linux-gnu.
>
> There's quite some testsuite fallout, mostly because of different rounding
> and a size of 8 now no longer is optimistically optimized to 5 but only 6.
> I can fix that by writing
>
>   *est_eliminated = (unr_insns - not_elim) / 3;
>
> as
>
>   *est_eliminated = unr_insns - not_elim - (unr_insns - not_elim) * 2 / 3;
>
> to preserve the old rounding behavior.  But for example
>
>   FAIL: g++.dg/warn/Warray-bounds-20.C  -std=gnu++14  LP64 note (test for warnings, line 56)
>
> shows
>
>   size:   3 C::C (_25, &MEM <const void *[8]> [(void *)&_ZTT2D1 + 48B]);
>
> which we now consider not being optimizable (correctly I think) and thus
> the optimistic size reduction isn't enough to get the loop unrolled.
> Previously the computed size of 20 was reduced to 13, exactly the size
> of the not unrolled body.
>
> So the remaining fallout will be
>
>   +FAIL: g++.dg/warn/Warray-bounds-20.C  -std=gnu++14  LP64 note (test for warnings, line 56)
>   +FAIL: g++.dg/warn/Warray-bounds-20.C  -std=gnu++14  note (test for warnings, line 66)
>   ...
>   +FAIL: c-c++-common/ubsan/unreachable-3.c  -std=gnu++14  scan-tree-dump optimized "__builtin___ubsan_handle_builtin_unreachable"
>   ...
>   +FAIL: c-c++-common/ubsan/unreachable-3.c  -O0  scan-tree-dump optimized "__builtin___ubsan_handle_builtin_unreachable"
>
> for the latter the issue is __builtin___sanitizer_cov_trace_pc ()
>
> Does this seem feasible overall?  I can fixup the testcases above
> with #pragma unroll ...
Honza - any comments?

> Thanks,
> Richard.
>
> 	PR tree-optimization/115825
> 	* tree-ssa-loop-ivcanon.cc (loop_size::not_eliminatable_after_peeling):
> 	New.
> 	(loop_size::last_iteration_not_eliminatable_after_peeling): Likewise.
> 	(tree_estimate_loop_size): Count stmts with side-effects as
> 	not optimistically eliminatable.
> 	(estimated_unrolled_size): Compute the number of stmts that can
> 	be optimistically eliminated by followup transforms.
> 	(try_unroll_loop_completely): Adjust.
>
> 	* gcc.dg/tree-ssa/cunroll-17.c: New testcase.
> ---
>  gcc/testsuite/gcc.dg/tree-ssa/cunroll-17.c | 11 +++++++
>  gcc/tree-ssa-loop-ivcanon.cc               | 35 +++++++++++++++++-----
>  2 files changed, 38 insertions(+), 8 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/cunroll-17.c
>
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/cunroll-17.c b/gcc/testsuite/gcc.dg/tree-ssa/cunroll-17.c
> new file mode 100644
> index 00000000000..282db99c883
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/cunroll-17.c
> @@ -0,0 +1,11 @@
> +/* { dg-do compile } */
> +/* { dg-options "-Os -fdump-tree-optimized" } */
> +
> +char volatile v;
> +void for16 (void)
> +{
> +  for (char i = 16; i > 0; i -= 2)
> +    v = i;
> +}
> +
> +/* { dg-final { scan-tree-dump-times " ={v} " 1 "optimized" } } */
> diff --git a/gcc/tree-ssa-loop-ivcanon.cc b/gcc/tree-ssa-loop-ivcanon.cc
> index 5ef24a91917..dd941c31648 100644
> --- a/gcc/tree-ssa-loop-ivcanon.cc
> +++ b/gcc/tree-ssa-loop-ivcanon.cc
> @@ -139,11 +139,16 @@ struct loop_size
>       variable where induction variable starts at known constant.)  */
>    int eliminated_by_peeling;
>  
> +  /* Number of instructions that cannot be further optimized in the
> +     peeled loop, for example volatile accesses.  */
> +  int not_eliminatable_after_peeling;
> +
>    /* Same statistics for last iteration of loop: it is smaller because
>       instructions after exit are not executed.  */
>    int last_iteration;
>    int last_iteration_eliminated_by_peeling;
> -
> +  int last_iteration_not_eliminatable_after_peeling;
> +
>    /* If some IV computation will become constant.  */
>    bool constant_iv;
>  
> @@ -267,8 +272,10 @@ tree_estimate_loop_size (class loop *loop, edge exit, edge edge_to_cancel,
>  
>    size->overall = 0;
>    size->eliminated_by_peeling = 0;
> +  size->not_eliminatable_after_peeling = 0;
>    size->last_iteration = 0;
>    size->last_iteration_eliminated_by_peeling = 0;
> +  size->last_iteration_not_eliminatable_after_peeling = 0;
>    size->num_pure_calls_on_hot_path = 0;
>    size->num_non_pure_calls_on_hot_path = 0;
>    size->non_call_stmts_on_hot_path = 0;
> @@ -292,6 +299,7 @@ tree_estimate_loop_size (class loop *loop, edge exit, edge edge_to_cancel,
>  	{
>  	  gimple *stmt = gsi_stmt (gsi);
>  	  int num = estimate_num_insns (stmt, &eni_size_weights);
> +	  bool not_eliminatable_after_peeling = false;
>  	  bool likely_eliminated = false;
>  	  bool likely_eliminated_last = false;
>  	  bool likely_eliminated_peeled = false;
> @@ -304,7 +312,9 @@ tree_estimate_loop_size (class loop *loop, edge exit, edge edge_to_cancel,
>  
>  	  /* Look for reasons why we might optimize this stmt away. */
>  
> -	  if (!gimple_has_side_effects (stmt))
> +	  if (gimple_has_side_effects (stmt))
> +	    not_eliminatable_after_peeling = true;
> +	  else
>  	    {
>  	      /* Exit conditional.  */
>  	      if (exit && body[i] == exit->src
> @@ -377,11 +387,15 @@ tree_estimate_loop_size (class loop *loop, edge exit, edge edge_to_cancel,
>  	  size->overall += num;
>  	  if (likely_eliminated || likely_eliminated_peeled)
>  	    size->eliminated_by_peeling += num;
> +	  if (not_eliminatable_after_peeling)
> +	    size->not_eliminatable_after_peeling += num;
>  	  if (!after_exit)
>  	    {
>  	      size->last_iteration += num;
>  	      if (likely_eliminated || likely_eliminated_last)
>  		size->last_iteration_eliminated_by_peeling += num;
> +	      if (not_eliminatable_after_peeling)
> +		size->last_iteration_not_eliminatable_after_peeling += num;
>  	    }
>  	  if ((size->overall * 3 / 2 - size->eliminated_by_peeling
>  	      - size->last_iteration_eliminated_by_peeling) > upper_bound)
> @@ -437,18 +451,22 @@ tree_estimate_loop_size (class loop *loop, edge exit, edge edge_to_cancel,
>     It is (NUNROLL + 1) * size of loop body with taking into account
>     the fact that in last copy everything after exit conditional
>     is dead and that some instructions will be eliminated after
> -   peeling.  */
> +   peeling.  Set *EST_ELIMINATED to the number of stmts that could be
> +   optimistically eliminated by followup transforms.  */
>  static unsigned HOST_WIDE_INT
>  estimated_unrolled_size (struct loop_size *size,
> +			 unsigned HOST_WIDE_INT *est_eliminated,
>  			 unsigned HOST_WIDE_INT nunroll)
>  {
>    HOST_WIDE_INT unr_insns = ((nunroll)
>  			     * (HOST_WIDE_INT) (size->overall
>  						- size->eliminated_by_peeling));
> -  if (!nunroll)
> -    unr_insns = 0;
> +  HOST_WIDE_INT not_elim
> +    = ((nunroll) * (HOST_WIDE_INT) size->not_eliminatable_after_peeling);
>    unr_insns += size->last_iteration - size->last_iteration_eliminated_by_peeling;
> +  not_elim += size->last_iteration_not_eliminatable_after_peeling;
>  
> +  *est_eliminated = (unr_insns - not_elim) / 3;
>    return unr_insns;
>  }
>  
> @@ -829,8 +847,9 @@ try_unroll_loop_completely (class loop *loop,
>      }
>  
>    unsigned HOST_WIDE_INT ninsns = size.overall;
> +  unsigned HOST_WIDE_INT est_eliminated;
>    unsigned HOST_WIDE_INT unr_insns
> -    = estimated_unrolled_size (&size, n_unroll);
> +    = estimated_unrolled_size (&size, &est_eliminated, n_unroll);
>    if (dump_file && (dump_flags & TDF_DETAILS))
>      {
>        fprintf (dump_file, "  Loop size: %d\n", (int) ninsns);
> @@ -842,7 +861,7 @@ try_unroll_loop_completely (class loop *loop,
>  	 cautious on guessing if the unrolling is going to be
>  	 profitable.
>  	 Move from estimated_unrolled_size to unroll small loops.  */
> -      if (unr_insns * 2 / 3
> +      if (unr_insns - est_eliminated
>  	  /* If there is IV variable that will become constant, we
>  	     save one instruction in the loop prologue we do not
>  	     account otherwise.  */
> @@ -919,7 +938,7 @@ try_unroll_loop_completely (class loop *loop,
>  	 2) Big loop after completely unroll may not be vectorized
>  	 by BB vectorizer.  */
>        else if ((cunrolli && !loop->inner
> -		? unr_insns : unr_insns - est_eliminated)
> +		? unr_insns : unr_insns - est_eliminated)
>  	       > (unsigned) param_max_completely_peeled_insns)
>  	{
>  	  if (dump_file && (dump_flags & TDF_DETAILS))

-- 
Richard Biener <rguent...@suse.de>
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)