On 11/20/18 6:42 AM, Michael Matz wrote: > Hi, > > this bug report is about cris generating worse code since tree-ssa. The > effect is also visible on x86-64. The symptom is that the work horse of > adler32.c (from zlib) needs spills in the inner loop, while gcc 3 did not, > and those spills go away with -fno-tree-reassoc. > > The underlying reason for the spills is register pressure, which could > either be rectified by the pressure aware scheduling (which cris doesn't > have), or by simply not generating high pressure code to begin with. In > this case it's TER which ultimately causes the register pressure to > increase, and there are many plans in people minds how to fix this (make > TER regpressure aware, do some regpressure scheduling on gimple, or even > more ambitious things), but this patch doesn't tackle this. Instead it > makes reassoc not generate the situation which causes TER to run wild. > > TER increasing register pressure is a long standing problem and so it has > some heuristics to avoid that. One wobbly heuristic is that it doesn't > TER expressions together that have the same base variable as their LHSs. > But reassoc generates only anonymous SSA names for its temporary > subexpressions, so that TER heuristic doesn't apply. In this testcase > it's even the case that reassoc doesn't really change much code (one > addition moves from the end to the beginning of the inner loop), so that > whole rewriting is even pointless. > > In any case, let's use copy_ssa_name instead of make_ssa_name, when we > have an obvious LHS; that leads to TER making the same decisions with and > without -fno-tree-reassoc, leading to the same register pressure and no > spills. > > On x86-64 the effect is: > before patch: 48 bytes stackframe, 24 stack > accesses (most of them in the loops), 576 bytes codesize; > after patch: no stack frame, no stack accesses, 438 bytes codesize > > On cris: > before patch: 64 bytes stack frame, 27 stack access in loops, size of .s > 145 lines > after patch: 20 bytes stack frame (as it uses callee saved regs, which > is also complained about in the bug report), but no stack accesses > in loops, size of .s: 125 lines > > I'm wondering about testcase: should I add an x86-64 specific that tests > for no stack accesses, or would that be too constraining in the future? I think that'd be a fine test. Yea it's constraining, but isn't that the point, to make sure we never generate something dumb for this code.
If we get to a point where we have to have a stack reference we can re-evaluate the test. > > Regstrapped on x86-64-linux, no regressions. Okay for trunk? > > > Ciao, > Michael. > > PR tree-optimization/37916 > * tree-ssa-reassoc.c (make_new_ssa_for_def): Use copy_ssa_name. > (rewrite_expr_tree, linearize_expr, negate_value, > repropagate_negates, attempt_builtin_copysign, > reassociate_bb): Likewise. I wonder if this would help 21182. I don't think anyone ever looked to see if reassoc was involved in the issues in that BZ. OK for the trunk. jeff