Richard Biener <rguent...@suse.de> writes:
> So I've come back to PR66313 and found a solution to the tailrecursion
> missed optimization when fixing the factoring folding to use an unsigned
> type when we're not sure of overflow.
>
> The folding part is identical to my last try from 2015.  The tailrecursion
> part makes us handle intervening stmts introduced by foldings that
> "clobber" our walk of the single-use chain of stmts between the call
> and the return (previously we failed on any stmt that is not part of
> said chain).  A simple solution is to move the stmts that are not part
> of the chain, and that we can safely move, before the call.  That handles
> the leaf conversions that now appear for tree-ssa/tailrecursion-6.c.
>
> Bootstrapped on x86_64-unknown-linux-gnu, testing in progress.
>
> Richard.
>
> 2017-05-31  Richard Biener  <rguent...@suse.de>
>
>       PR middle-end/66313
>       * fold-const.c (fold_plusminus_mult_expr): If the factored
>       factor may be zero use a wrapping type for the inner operation.
>       * tree-tailcall.c (independent_of_stmt_p): Pass in to_move bitmap
>       and handle moved defs.
>       (process_assignment): Properly guard the unary op case.  Return
>       a tri-state indicating that moving the stmt before the call may
>       allow us to continue.  Pass through to_move.
>       (find_tail_calls): Handle moving unrelated defs before
>       the call.
>
>       * c-c++-common/ubsan/pr66313.c: New testcase.
>       * gcc.dg/tree-ssa/loop-15.c: Adjust.
>
> Index: gcc/fold-const.c
> ===================================================================
> *** gcc/fold-const.c.orig     2015-10-29 12:32:33.302782318 +0100
> --- gcc/fold-const.c  2015-10-29 14:08:39.936497739 +0100
> *************** fold_plusminus_mult_expr (location_t loc
> *** 6916,6925 ****
>       }
>     same = NULL_TREE;
>   
> !   if (operand_equal_p (arg01, arg11, 0))
> !     same = arg01, alt0 = arg00, alt1 = arg10;
> !   else if (operand_equal_p (arg00, arg10, 0))
>       same = arg00, alt0 = arg01, alt1 = arg11;
>     else if (operand_equal_p (arg00, arg11, 0))
>       same = arg00, alt0 = arg01, alt1 = arg10;
>     else if (operand_equal_p (arg01, arg10, 0))
> --- 6916,6926 ----
>       }
>     same = NULL_TREE;
>   
> !   /* Prefer factoring a common non-constant.  */
> !   if (operand_equal_p (arg00, arg10, 0))
>       same = arg00, alt0 = arg01, alt1 = arg11;
> +   else if (operand_equal_p (arg01, arg11, 0))
> +     same = arg01, alt0 = arg00, alt1 = arg10;
>     else if (operand_equal_p (arg00, arg11, 0))
>       same = arg00, alt0 = arg01, alt1 = arg10;
>     else if (operand_equal_p (arg01, arg10, 0))
> *************** fold_plusminus_mult_expr (location_t loc
> *** 6974,6987 ****
>       }
>       }
>   
> !   if (same)
>       return fold_build2_loc (loc, MULT_EXPR, type,
>                       fold_build2_loc (loc, code, type,
>                                    fold_convert_loc (loc, type, alt0),
>                                    fold_convert_loc (loc, type, alt1)),
>                       fold_convert_loc (loc, type, same));
>   
> !   return NULL_TREE;
>   }
>   
>   /* Subroutine of native_encode_expr.  Encode the INTEGER_CST
> --- 6975,7010 ----
>       }
>       }
>   
> !   if (!same)
> !     return NULL_TREE;
> ! 
> !   if (! INTEGRAL_TYPE_P (type)
> !       || TYPE_OVERFLOW_WRAPS (type)
> !       /* We are neither factoring zero nor minus one.  */
> !       || TREE_CODE (same) == INTEGER_CST)
>       return fold_build2_loc (loc, MULT_EXPR, type,
>                       fold_build2_loc (loc, code, type,
>                                    fold_convert_loc (loc, type, alt0),
>                                    fold_convert_loc (loc, type, alt1)),
>                       fold_convert_loc (loc, type, same));
>   
> !   /* Same may be zero and thus the operation 'code' may overflow.  Likewise
> !      same may be minus one and thus the multiplication may overflow.  Perform
> !      the operations in an unsigned type.  */
> !   tree utype = unsigned_type_for (type);
> !   tree tem = fold_build2_loc (loc, code, utype,
> !                           fold_convert_loc (loc, utype, alt0),
> !                           fold_convert_loc (loc, utype, alt1));
> !   /* If the sum evaluated to a constant that is not -INF the multiplication
> !      cannot overflow.  */
> !   if (TREE_CODE (tem) == INTEGER_CST
> !       && ! wi::eq_p (tem, wi::min_value (TYPE_PRECISION (utype), SIGNED)))
> !     return fold_build2_loc (loc, MULT_EXPR, type,
> !                         fold_convert (type, tem), same);
> ! 
> !   return fold_convert_loc (loc, type,
> !                        fold_build2_loc (loc, MULT_EXPR, utype, tem,
> !                                         fold_convert_loc (loc, utype, same)));
>   }
>   
>   /* Subroutine of native_encode_expr.  Encode the INTEGER_CST

Sorry if you already know, but this part means that we can no longer
vectorise:

int
f (int *x, int b1, int b2, int b3)
{
  int foo = 0;
  for (int i1 = 0; i1 < b1; ++i1)
    for (int i2 = 0; i2 < b2; ++i2)
      for (int i3 = 0; i3 < b3; ++i3)
        foo += x[i1 * b2 * b3 + i2 * b3 + (i3 - 1)];
  return foo;
}

We now convert all the arithmetic in the [...] to unsigned int
and reassociate it so that the "- 1" is applied last.  We then assume
that the overflow in this subtraction is well-defined and thus that
&x[...] might not be linear for the inner loop.
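Hand-expanding what the subscript computation looks like after the fold
(my sketch, not a compiler dump) may make the problem clearer:

```c
/* x[i1 * b2 * b3 + i2 * b3 + (i3 - 1)] after the fold: the index
   arithmetic is done in unsigned int and reassociated so that the
   "- 1" is applied last.  Since unsigned wraparound is well defined,
   the address is no longer provably linear in i3.  */
static int *
subscript (int *x, int i1, int i2, int i3, int b2, int b3)
{
  unsigned int idx = ((unsigned int) i1 * b2 * b3
                      + (unsigned int) i2 * b3
                      + (unsigned int) i3) - 1u;
  return x + (int) idx;
}
```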

Thanks,
Richard
