On Tue, 30 Nov 2021, Jakub Jelinek wrote:
> Hi!
>
> Seems simplify_associate_operation is quadratic, which isn't a big deal
> for use during combine and other similar RTL passes, because those never
> try to combine expressions from more than a few instructions and because
> those instructions need to be recognized the machine description also bounds
> how many expressions can appear in there.
Well ...
> var-tracking has depth limits only for some cases and unlimited depth
> for the vt_expand_loc though:
> /* This is the value used during expansion of locations. We want it
> to be unbounded, so that variables expanded deep in a recursion
> nest are fully evaluated, so that their values are cached
> correctly. We avoid recursion cycles through other means, and we
> don't unshare RTL, so excess complexity is not a problem. */
> #define EXPR_DEPTH (INT_MAX)
> /* We use this to keep too-complex expressions from being emitted as
> location notes, and then to debug information. Users can trade
> compile time for ridiculously complex expressions, although they're
> seldom useful, and they may often have to be discarded as not
> representable anyway. */
> #define EXPR_USE_DEPTH (param_max_vartrack_expr_depth)
>
> IMO for very large expressions it isn't worth trying to reassociate though,
> in fact e.g. for the new testcase below keeping it as is has bigger chance
> of generating smaller debug info which the dwarf2out.c part of the change
> tries to achieve - if a binary operation has the same operands, we can
> use DW_OP_dup and not bother computing the possibly large operand again.
>
> This patch punts if the associate operands contain together more than
> 64 same operations, which can happen only during var-tracking.
> During bootstrap/regtest on x86_64-linux and i686-linux, this triggers
> only on the new testcase and on gcc.dg/torture/pr88597.c.
> I think given the 16 element static buffer in subrtx_iterator::array_type
> it shouldn't slow down the common case of small expressions, but have
> been wondering whether we shouldn't have some in_vartrack global flag
> or guard it with
> (current_pass && strcmp (current_pass->name, "vartrack") == 0).
>
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
>
> Another possibility to deal with the power expressions in debug info
> would be to introduce some new RTL operation for the pow{,i} (x, n)
> case, allow that solely in debug insns and expand those into DWARF
> using a loop. But that seems like quite a lot of work for something rarely
> used (especially when powi for larger n is only useful for 0 and 1 inputs
> because anything else overflows).
I wonder given we now have 'simplify_context' whether we can
track a re-association budget we can eat from. At least your
code to determine whether the expression is too large is
quadratic as well (but bound to 64, so just a very large constant
overhead for an outermost expression of size 63). We already
have a mem_depth there, so just have reassoc_times and punt
if that reaches --param max-simplify-reassoc-times, incrementing
it each time simplify_associative_operation is entered?
Thanks,
Richard.
> 2021-11-30 Jakub Jelinek <[email protected]>
>
> PR rtl-optimization/102356
> * simplify-rtx.c: Include rtl-iter.h.
> (simplify_associative_operation): Don't reassociate very large
> expressions with 64 or more CODE subrtxes in the operands.
> * dwarf2out.c (mem_loc_descriptor): Optimize binary operation
> with both operands the same using DW_OP_dup.
>
> * gcc.dg/pr102356.c: New test.
>
> --- gcc/simplify-rtx.c.jj 2021-11-05 00:43:22.576624649 +0100
> +++ gcc/simplify-rtx.c 2021-11-29 19:46:29.674750656 +0100
> @@ -37,6 +37,7 @@ along with GCC; see the file COPYING3.
> #include "selftest-rtl.h"
> #include "rtx-vector-builder.h"
> #include "rtlanal.h"
> +#include "rtl-iter.h"
>
> /* Simplification and canonicalization of RTL. */
>
> @@ -2263,9 +2264,40 @@ simplify_context::simplify_associative_o
> {
> rtx tem;
>
> + if (GET_CODE (op0) == code || GET_CODE (op1) == code)
> + {
> + /* During vartrack, the expressions can grow arbitrarily large.
> + Reassociation isn't really useful for larger expressions
> + and can be very compile time expensive. */
> + unsigned count = 0;
> + subrtx_iterator::array_type array;
> + FOR_EACH_SUBRTX (iter, array, op0, NONCONST)
> + {
> + const_rtx x = *iter;
> + if (GET_CODE (x) == code)
> + {
> + if (count++ >= 64)
> + return NULL_RTX;
> + }
> + else
> + iter.skip_subrtxes ();
> + }
> + FOR_EACH_SUBRTX (iter, array, op1, NONCONST)
> + {
> + const_rtx x = *iter;
> + if (GET_CODE (x) == code)
> + {
> + if (count++ >= 64)
> + return NULL_RTX;
> + }
> + else
> + iter.skip_subrtxes ();
> + }
> + }
> +
> /* Linearize the operator to the left. */
> if (GET_CODE (op1) == code)
> {
> /* "(a op b) op (c op d)" becomes "((a op b) op c) op d)". */
> if (GET_CODE (op0) == code)
> {
> --- gcc/dwarf2out.c.jj 2021-11-29 14:24:14.053634713 +0100
> +++ gcc/dwarf2out.c 2021-11-29 19:44:54.192113401 +0100
> @@ -16363,6 +16363,15 @@ mem_loc_descriptor (rtx rtl, machine_mod
> do_binop:
> op0 = mem_loc_descriptor (XEXP (rtl, 0), mode, mem_mode,
> VAR_INIT_STATUS_INITIALIZED);
> + if (XEXP (rtl, 0) == XEXP (rtl, 1))
> + {
> + if (op0 == 0)
> + break;
> + mem_loc_result = op0;
> + add_loc_descr (&mem_loc_result, new_loc_descr (DW_OP_dup, 0, 0));
> + add_loc_descr (&mem_loc_result, new_loc_descr (op, 0, 0));
> + break;
> + }
> op1 = mem_loc_descriptor (XEXP (rtl, 1), mode, mem_mode,
> VAR_INIT_STATUS_INITIALIZED);
>
> --- gcc/testsuite/gcc.dg/pr102356.c.jj 2021-11-29 19:57:47.061075856
> +0100
> +++ gcc/testsuite/gcc.dg/pr102356.c 2021-11-29 19:57:26.852364489 +0100
> @@ -0,0 +1,33 @@
> +/* PR rtl-optimization/102356 */
> +/* { dg-do compile { target int32plus } } */
> +/* { dg-options "-O3 -g" } */
> +
> +signed char a = 0;
> +unsigned char b = 9;
> +unsigned long long c = 0xF1FBFC17225F7A57ULL;
> +int d = 0x3A6667C6;
> +
> +unsigned char
> +foo (unsigned int x)
> +{
> + unsigned int *e = &x;
> + if ((c /= ((0 * (*e *= b)) <= 0)))
> + ;
> + for (d = 9; d > 2; d -= 2)
> + {
> + c = -2;
> + do
> + if ((*e *= *e))
> + {
> + a = 4;
> + do
> + {
> + a -= 3;
> + if ((*e *= *e))
> + b = 9;
> + }
> + while (a > 2);
> + }
> + while (c++);
> + }
> +}
>
> Jakub
>
>
--
Richard Biener <[email protected]>
SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
Germany; GF: Ivo Totev; HRB 36809 (AG Nuernberg)