On Wed, 10 Jul 2024, Richard Biener wrote:

> The loop unrolling code assumes that one third of all volatile accesses
> can possibly be optimized away, which is of course not true.  This leads
> to excessive unrolling in some cases.  The following tracks the number
> of stmts with side-effects, as those are not eliminatable later, and
> only assumes one third of the other stmts can be further optimized.
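> 
> Roughly, ignoring the separate bookkeeping for the last iteration, the
> estimate changes like this (only a sketch of the heuristic, not the code
> from the patch; the helper name is made up):
> 
>   /* UNROLLED is the estimated size of all unrolled copies, ELIM the
>      insns already known to die after peeling and NOT_ELIM the insns
>      with side-effects (volatile accesses etc.) that no followup
>      transform can remove.  */
>   static unsigned HOST_WIDE_INT
>   sketch_optimistic_size (unsigned HOST_WIDE_INT unrolled,
>                           unsigned HOST_WIDE_INT elim,
>                           unsigned HOST_WIDE_INT not_elim)
>   {
>     unsigned HOST_WIDE_INT unr_insns = unrolled - elim;
>     /* Before: assume a third of everything left can still be optimized,
>        i.e. compare unr_insns * 2 / 3 against the size limits.  */
>     /* Now: only a third of the stmts without side-effects.  */
>     return unr_insns - (unr_insns - not_elim) / 3;
>   }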
> 
> Bootstrapped and tested on x86_64-unknown-linux-gnu.
> 
> There's quite some testsuite fallout, mostly because of different rounding:
> a size of 8 is now optimistically reduced only to 6 rather than to 5.
> I can fix that by writing
> 
>   *est_eliminated = (unr_insns - not_elim) / 3;
> 
> as
> 
>   *est_eliminated = unr_insns - not_elim - (unr_insns - not_elim) * 2 / 3;
> 
> to preserve the old rounding behavior.
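> 
> (Worked out on the "size of 8" case above, assuming not_elim == 0: the
> old code computes 8 * 2 / 3 == 5; the first expression above gives
> 8 - (8 - 0) / 3 == 6; the second gives 8 - 0 - (8 - 0) * 2 / 3 == 3
> as est_eliminated, i.e. a size of 5 again.)
> 
> But for example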
> 
> FAIL: g++.dg/warn/Warray-bounds-20.C  -std=gnu++14 LP64 note (test for warnings, line 56)
> 
> shows
> 
>   size:   3 C::C (_25, &MEM <const void *[8]> [(void *)&_ZTT2D1 + 48B]);
> 
> which we now consider not optimizable (correctly, I think), and thus
> the optimistic size reduction isn't enough to get the loop unrolled.
> Previously the computed size of 20 was reduced to 13, exactly the size
> of the not-unrolled body.
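> 
> (In numbers, assuming the C::C call with size 3 is among the stmts that
> now count as not eliminatable, so not_elim >= 3: previously
> 20 * 2 / 3 == 13 matched the not-unrolled size of 13 and unrolling was
> allowed; now est_eliminated = (20 - not_elim) / 3 is at most 5, so the
> estimate stays at 15 or more and the loop is kept rolled.)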
> 
> So the remaining fallout will be
> 
> +FAIL: g++.dg/warn/Warray-bounds-20.C  -std=gnu++14 LP64 note (test for warnings, line 56)
> +FAIL: g++.dg/warn/Warray-bounds-20.C  -std=gnu++14 note (test for warnings, line 66)
> ...
> +FAIL: c-c++-common/ubsan/unreachable-3.c  -std=gnu++14  scan-tree-dump optimized "__builtin___ubsan_handle_builtin_unreachable"
> ...
> +FAIL: c-c++-common/ubsan/unreachable-3.c   -O0   scan-tree-dump optimized "__builtin___ubsan_handle_builtin_unreachable"
> 
> For the latter the issue is the __builtin___sanitizer_cov_trace_pc () call.
> 
> Does this seem feasible overall?  I can fix up the testcases above
> with #pragma unroll ...
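> 
> For instance, something along these lines for the affected loops (the
> function and the unroll factor here are made up, just to illustrate
> forcing the unrolling independently of the size estimate):
> 
>   void
>   force_unroll_example (volatile char *p)
>   {
>     #pragma GCC unroll 8
>     for (int i = 0; i < 8; i++)
>       p[i] = i;
>   }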

Honza - any comments?

> Thanks,
> Richard.
> 
>       PR tree-optimization/115825
>       * tree-ssa-loop-ivcanon.cc (loop_size::not_eliminatable_after_peeling):
>       New.
>       (loop_size::last_iteration_not_eliminatable_after_peeling): Likewise.
>       (tree_estimate_loop_size): Count stmts with side-effects as
>       not optimistically eliminatable.
>       (estimated_unrolled_size): Compute the number of stmts that can
>       be optimistically eliminated by followup transforms.
>       (try_unroll_loop_completely): Adjust.
> 
>       * gcc.dg/tree-ssa/cunroll-17.c: New testcase.
> ---
>  gcc/testsuite/gcc.dg/tree-ssa/cunroll-17.c | 11 +++++++
>  gcc/tree-ssa-loop-ivcanon.cc               | 35 +++++++++++++++++-----
>  2 files changed, 38 insertions(+), 8 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/cunroll-17.c
> 
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/cunroll-17.c b/gcc/testsuite/gcc.dg/tree-ssa/cunroll-17.c
> new file mode 100644
> index 00000000000..282db99c883
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/cunroll-17.c
> @@ -0,0 +1,11 @@
> +/* { dg-do compile } */
> +/* { dg-options "-Os -fdump-tree-optimized" } */
> +
> +char volatile v;
> +void for16 (void)
> +{
> +  for (char i = 16; i > 0; i -= 2)
> +    v = i;
> +}
> +
> +/* { dg-final { scan-tree-dump-times " ={v} " 1 "optimized" } } */
> diff --git a/gcc/tree-ssa-loop-ivcanon.cc b/gcc/tree-ssa-loop-ivcanon.cc
> index 5ef24a91917..dd941c31648 100644
> --- a/gcc/tree-ssa-loop-ivcanon.cc
> +++ b/gcc/tree-ssa-loop-ivcanon.cc
> @@ -139,11 +139,16 @@ struct loop_size
>       variable where induction variable starts at known constant.)  */
>    int eliminated_by_peeling;
>  
> +  /* Number of instructions that cannot be further optimized in the
> +     peeled loop, for example volatile accesses.  */
> +  int not_eliminatable_after_peeling;
> +
>    /* Same statistics for last iteration of loop: it is smaller because
>       instructions after exit are not executed.  */
>    int last_iteration;
>    int last_iteration_eliminated_by_peeling;
> -  
> +  int last_iteration_not_eliminatable_after_peeling;
> +
>    /* If some IV computation will become constant.  */
>    bool constant_iv;
>  
> @@ -267,8 +272,10 @@ tree_estimate_loop_size (class loop *loop, edge exit, edge edge_to_cancel,
>  
>    size->overall = 0;
>    size->eliminated_by_peeling = 0;
> +  size->not_eliminatable_after_peeling = 0;
>    size->last_iteration = 0;
>    size->last_iteration_eliminated_by_peeling = 0;
> +  size->last_iteration_not_eliminatable_after_peeling = 0;
>    size->num_pure_calls_on_hot_path = 0;
>    size->num_non_pure_calls_on_hot_path = 0;
>    size->non_call_stmts_on_hot_path = 0;
> @@ -292,6 +299,7 @@ tree_estimate_loop_size (class loop *loop, edge exit, edge edge_to_cancel,
>       {
>         gimple *stmt = gsi_stmt (gsi);
>         int num = estimate_num_insns (stmt, &eni_size_weights);
> +       bool not_eliminatable_after_peeling = false;
>         bool likely_eliminated = false;
>         bool likely_eliminated_last = false;
>         bool likely_eliminated_peeled = false;
> @@ -304,7 +312,9 @@ tree_estimate_loop_size (class loop *loop, edge exit, edge edge_to_cancel,
>  
>         /* Look for reasons why we might optimize this stmt away. */
>  
> -       if (!gimple_has_side_effects (stmt))
> +       if (gimple_has_side_effects (stmt))
> +         not_eliminatable_after_peeling = true;
> +       else
>           {
>             /* Exit conditional.  */
>             if (exit && body[i] == exit->src
> @@ -377,11 +387,15 @@ tree_estimate_loop_size (class loop *loop, edge exit, edge edge_to_cancel,
>         size->overall += num;
>         if (likely_eliminated || likely_eliminated_peeled)
>           size->eliminated_by_peeling += num;
> +       if (not_eliminatable_after_peeling)
> +         size->not_eliminatable_after_peeling += num;
>         if (!after_exit)
>           {
>             size->last_iteration += num;
>             if (likely_eliminated || likely_eliminated_last)
>               size->last_iteration_eliminated_by_peeling += num;
> +           if (not_eliminatable_after_peeling)
> +             size->last_iteration_not_eliminatable_after_peeling += num;
>           }
>         if ((size->overall * 3 / 2 - size->eliminated_by_peeling
>             - size->last_iteration_eliminated_by_peeling) > upper_bound)
> @@ -437,18 +451,22 @@ tree_estimate_loop_size (class loop *loop, edge exit, edge edge_to_cancel,
>     It is (NUNROLL + 1) * size of loop body with taking into account
>     the fact that in last copy everything after exit conditional
>     is dead and that some instructions will be eliminated after
> -   peeling.  */
> +   peeling.  Set *EST_ELIMINATED to the number of stmts that could be
> +   optimistically eliminated by followup transforms.  */
>  static unsigned HOST_WIDE_INT
>  estimated_unrolled_size (struct loop_size *size,
> +                      unsigned HOST_WIDE_INT *est_eliminated,
>                        unsigned HOST_WIDE_INT nunroll)
>  {
>    HOST_WIDE_INT unr_insns = ((nunroll)
>                            * (HOST_WIDE_INT) (size->overall
>                                               - size->eliminated_by_peeling));
> -  if (!nunroll)
> -    unr_insns = 0;
> +  HOST_WIDE_INT not_elim
> +    = ((nunroll) * (HOST_WIDE_INT) size->not_eliminatable_after_peeling);
>  unr_insns += size->last_iteration - size->last_iteration_eliminated_by_peeling;
> +  not_elim += size->last_iteration_not_eliminatable_after_peeling;
>  
> +  *est_eliminated = (unr_insns - not_elim) / 3;
>    return unr_insns;
>  }
>  
> @@ -829,8 +847,9 @@ try_unroll_loop_completely (class loop *loop,
>           }
>  
>         unsigned HOST_WIDE_INT ninsns = size.overall;
> +       unsigned HOST_WIDE_INT est_eliminated;
>         unsigned HOST_WIDE_INT unr_insns
> -         = estimated_unrolled_size (&size, n_unroll);
> +         = estimated_unrolled_size (&size, &est_eliminated, n_unroll);
>         if (dump_file && (dump_flags & TDF_DETAILS))
>           {
>             fprintf (dump_file, "  Loop size: %d\n", (int) ninsns);
> @@ -842,7 +861,7 @@ try_unroll_loop_completely (class loop *loop,
>            cautious on guessing if the unrolling is going to be
>            profitable.
>            Move from estimated_unrolled_size to unroll small loops.  */
> -       if (unr_insns * 2 / 3
> +       if (unr_insns - est_eliminated
>             /* If there is IV variable that will become constant, we
>                save one instruction in the loop prologue we do not
>                account otherwise.  */
> @@ -919,7 +938,7 @@ try_unroll_loop_completely (class loop *loop,
>            2) Big loop after completely unroll may not be vectorized
>            by BB vectorizer.  */
>         else if ((cunrolli && !loop->inner
> -                 ? unr_insns : unr_insns * 2 / 3)
> +                 ? unr_insns : unr_insns - est_eliminated)
>                  > (unsigned) param_max_completely_peeled_insns)
>           {
>             if (dump_file && (dump_flags & TDF_DETAILS))
> 

-- 
Richard Biener <rguent...@suse.de>
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)
