On Wed, Nov 21, 2018 at 1:12 AM Jeff Law <l...@redhat.com> wrote: > > On 11/20/18 6:42 AM, Michael Matz wrote: > > Hi, > > > > this bug report is about cris generating worse code since tree-ssa. The > > effect is also visible on x86-64. The symptom is that the work horse of > > adler32.c (from zlib) needs spills in the inner loop, while gcc 3 did not, > > and those spills go away with -fno-tree-reassoc. > > > > The underlying reason for the spills is register pressure, which could > > either be rectified by the pressure aware scheduling (which cris doesn't > > have), or by simply not generating high pressure code to begin with. In > > this case it's TER which ultimately causes the register pressure to > > increase, and there are many plans in people minds how to fix this (make > > TER regpressure aware, do some regpressure scheduling on gimple, or even > > more ambitious things), but this patch doesn't tackle this. Instead it > > makes reassoc not generate the situation which causes TER to run wild. > > > > TER increasing register pressure is a long standing problem and so it has > > some heuristics to avoid that. One wobbly heuristic is that it doesn't > > TER expressions together that have the same base variable as their LHSs. > > But reassoc generates only anonymous SSA names for its temporary > > subexpressions, so that TER heuristic doesn't apply. In this testcase > > it's even the case that reassoc doesn't really change much code (one > > addition moves from the end to the beginning of the inner loop), so that > > whole rewriting is even pointless. > > > > In any case, let's use copy_ssa_name instead of make_ssa_name, when we > > have an obvious LHS; that leads to TER making the same decisions with and > > without -fno-tree-reassoc, leading to the same register pressure and no > > spills.
I don't think this is OK. Take one example, in rewrite_expr_tree for the final recursion case we replace a_1 = _2 + _3; with sth else, like _4 = _5 + 1; so we compute a new value that may not have been computed before and certainly not into the user variable a. If you change this to instead create a_4 = _5 + 1; then this leads to wrong debug info showing a value for 'a' that never existed. You can observe this with unsigned int __attribute__((noinline,noipa)) foo (unsigned int a, unsigned int b, unsigned int c, unsigned int d) { a = a + 1; a = a + b; a = a + c; a = a + d; a = a + 3; return a; } int main() { volatile unsigned x = foo (1, 3, 5, 7); return 0; } when you build this with -O -g -fno-var-tracking-assignments (VTA seems to hide the issue, probably due to not enough instructions in the end) you observe (gdb) s foo (a=1, b=3, c=5, d=7) at t.c:6 6 a = a + c; (gdb) disassemble Dump of assembler code for function foo: => 0x00000000004004b2 <+0>: lea 0x4(%rdx,%rcx,1),%eax 0x00000000004004b6 <+4>: add %esi,%eax 0x00000000004004b8 <+6>: add %edi,%eax 0x00000000004004ba <+8>: retq End of assembler dump. (gdb) p a $1 = 1 (gdb) si 7 a = a + d; (gdb) p a $2 = 16 (gdb) si 9 return a; (gdb) p a $3 = 19 (gdb) si 0x00000000004004ba 9 return a; (gdb) p a $4 = <optimized out> (gdb) p $eax $5 = 20 (gdb) quit printing values for a that never should occur. The sequence of "allowed" values is 1, 2, 5, 10, 17, 20. Both 16 and 19 should never be printed. It's quite obvious if you look at the reassoc result which is a_11 = d_7(D) + 4; a_12 = a_11 + c_5(D); a_13 = a_12 + b_3(D); a_9 = a_13 + a_1(D); return a_9; and the fact that var-tracking looks at REG_DECL for debug info location generation. With VTA we get # DEBUG BEGIN_STMT # DEBUG D#3 => a_1(D) + 1 # DEBUG a => D#3 # DEBUG BEGIN_STMT a_11 = d_7(D) + 4; # DEBUG D#2 => D#3 + b_3(D) # DEBUG a => D#2 # DEBUG BEGIN_STMT a_12 = a_11 + c_5(D); # DEBUG D#1 => D#2 + c_5(D) # DEBUG a => D#1 # DEBUG BEGIN_STMT a_13 = a_12 + b_3(D); # DEBUG a => D#1 + d_7(D) # DEBUG BEGIN_STMT a_9 = a_13 + a_1(D); # DEBUG a => a_9 # DEBUG BEGIN_STMT return a_9; but DEBUG stmts in itself do not make the code valid from a debug perspective. Note that the reassociated stmts can become DEBUG stmts itself if they are later DCEd. So your patch has to be much more careful to never change the LHS of stmts that are adjusted (which I think reassoc already does). Richard. > > On x86-64 the effect is: > > before patch: 48 bytes stackframe, 24 stack > > accesses (most of them in the loops), 576 bytes codesize; > > after patch: no stack frame, no stack accesses, 438 bytes codesize > > > > On cris: > > before patch: 64 bytes stack frame, 27 stack access in loops, size of .s > > 145 lines > > after patch: 20 bytes stack frame (as it uses callee saved regs, which > > is also complained about in the bug report), but no stack accesses > > in loops, size of .s: 125 lines > > > > I'm wondering about testcase: should I add an x86-64 specific that tests > > for no stack accesses, or would that be too constraining in the future? > I think that'd be a fine test. Yea it's constraining, but isn't that > the point, to make sure we never generate something dumb for this code. > > If we get to a point where we have to have a stack reference we can > re-evaluate the test. > > > > > Regstrapped on x86-64-linux, no regressions. Okay for trunk? > > > > > > Ciao, > > Michael. > > > > PR tree-optimization/37916 > > * tree-ssa-reassoc.c (make_new_ssa_for_def): Use copy_ssa_name. > > (rewrite_expr_tree, linearize_expr, negate_value, > > repropagate_negates, attempt_builtin_copysign, > > reassociate_bb): Likewise. > I wonder if this would help 21182. I don't think anyone ever looked to > see if reassoc was involved in the issues in that BZ. > > OK for the trunk. > > jeff