On 11/20/18 6:42 AM, Michael Matz wrote:
> Hi,
> 
> this bug report is about cris generating worse code since tree-ssa.  The 
> effect is also visible on x86-64.  The symptom is that the work horse of 
> adler32.c (from zlib) needs spills in the inner loop, while gcc 3 did not, 
> and those spills go away with -fno-tree-reassoc.
> 
> The underlying reason for the spills is register pressure, which could 
> either be rectified by the pressure aware scheduling (which cris doesn't 
> have), or by simply not generating high pressure code to begin with.  In 
> this case it's TER which ultimately causes the register pressure to 
> increase, and there are many plans in people minds how to fix this (make 
> TER regpressure aware, do some regpressure scheduling on gimple, or even 
> more ambitious things), but this patch doesn't tackle this.  Instead it 
> makes reassoc not generate the situation which causes TER to run wild.
> 
> TER increasing register pressure is a long standing problem and so it has 
> some heuristics to avoid that.  One wobbly heuristic is that it doesn't 
> TER expressions together that have the same base variable as their LHSs.
> But reassoc generates only anonymous SSA names for its temporary 
> subexpressions, so that TER heuristic doesn't apply.  In this testcase 
> it's even the case that reassoc doesn't really change much code (one 
> addition moves from the end to the beginning of the inner loop), so that 
> whole rewriting is even pointless.
> 
> In any case, let's use copy_ssa_name instead of make_ssa_name, when we 
> have an obvious LHS; that leads to TER making the same decisions with and 
> without -fno-tree-reassoc, leading to the same register pressure and no
> spills.
> 
> On x86-64 the effect is:
>   before patch: 48 bytes stackframe, 24 stack 
>     accesses (most of them in the loops), 576 bytes codesize;
>   after patch: no stack frame, no stack accesses, 438 bytes codesize
> 
> On cris:
>   before patch: 64 bytes stack frame, 27 stack access in loops, size of .s 
>     145 lines
>   after patch: 20 bytes stack frame (as it uses callee saved regs, which 
>     is also complained about in the bug report), but no stack accesses
>     in loops, size of .s: 125 lines
> 
> I'm wondering about testcase: should I add an x86-64 specific that tests 
> for no stack accesses, or would that be too constraining in the future?
I think that'd be a fine test.  Yea it's constraining, but isn't that
the point, to make sure we never generate something dumb for this code.

If we get to a point where we have to have a stack reference we can
re-evaluate the test.

> 
> Regstrapped on x86-64-linux, no regressions.  Okay for trunk?
> 
> 
> Ciao,
> Michael.
> 
>       PR tree-optimization/37916
>       * tree-ssa-reassoc.c (make_new_ssa_for_def): Use copy_ssa_name.
>       (rewrite_expr_tree, linearize_expr, negate_value,
>       repropagate_negates, attempt_builtin_copysign,
>       reassociate_bb): Likewise.
I wonder if this would help 21182.  I don't think anyone ever looked to
see if reassoc was involved in the issues in that BZ.

OK for the trunk.

jeff

Reply via email to