Re: [PATCH][0/n] Merge from match-and-simplify

Richard Biener Fri, 17 Oct 2014 01:00:07 -0700

On Thu, 16 Oct 2014, Sebastian Pop wrote:

> Richard Biener wrote:
> > 
> > I have posted 5 patches as part of a larger series to merge
> > (parts) from the match-and-simplify branch.  While I think
> > there was overall consensus that the idea behind the project
> > is sound there are technical questions left for how the
> > thing should look in the end.  I've raised them in 3/n
> > which is the only patch of the series that contains any
> > patterns so far.
> > 
> > To re-iterate here (as I expect most people will only look
> > at [0/n] patches ;)), the question is whether we are fine
> > with making fold-const (thus fold_{unary,binary,ternary})
> > not handle some cases it handles currently.
> 
> I have tested on aarch64 all the code in the match-and-simplify branch against
> trunk as of the last merge at r216315:
> 
> 2014-10-16  Richard Biener  <[email protected]>
> 
>         Merge from trunk r216235 through r216315.
> 
> Overall, I see more perf regressions (about 2/3 of the tests) than
> improvements (1/3 of the tests).  I will try to reduce the tests.


Note that the branch goes much further in exercising the machinery
than I want to merge at this point (that applies mostly to all
passes using the SSA propagator such as CCP and VRP and passes
exercising value-numbering - FRE and PRE).

It may also simply show the effect of now folding all statements
from tree-ssa-forwprop.c.  I have yet to investigate the testsuite
fallout of [1/n] to [5/n] - test results have been very noisy lately
due to the C11 change and now ICF.

> For instance, saxpy regresses at -O3 on aarch64:
> 
> void saxpy(double* x, double* y, double* z) {
>     int i=0;
>     for (i = 0 ; i < ARRAY_SIZE; i++) {
>         z[i] = x[i] + scalar*y[i];
>     }
> }
> 
> $ diff -u base.s mas.s
> --- base.s      2014-10-16 15:30:15.351430000 -0500
> +++ mas.s       2014-10-16 15:30:16.183035000 -0500
> @@ -2,12 +2,14 @@
>         add     x1, x2, 800
>         ldr     q0, [x0, x2]
>         add     x3, x2, 1600
> +       cmp     x0, 784
>         ldr     q1, [x0, x1]
> +       add     x1, x0, 16
>         fmla    v0.2d, v1.2d, v2.2d
>         str     q0, [x0, x3]
> -       add     x0, x0, 16
> -       cmp     x0, 800
> +       mov     x0, x1
>         bne     .L140
>  .LBE179:
> -       subs    w4, w4, #1
> +       cmp     w4, 1
> +       sub     w4, w4, #1
>         bne     .L139

I don't understand AArch64 assembly very well, but the above looks like
an RTL and/or IVOPTs issue?
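
For anyone wanting to reproduce this, the quoted saxpy kernel is not
self-contained: ARRAY_SIZE and scalar are not shown.  Below is a
minimal sketch that fills them in.  ARRAY_SIZE = 100 is only a guess
(800 bytes compared against in the inner loop, divided by 8-byte
doubles), the value of scalar is arbitrary, and saxpy_demo is a
hypothetical helper that is not part of the original test.

```c
#include <assert.h>

/* Assumption: 100 elements, inferred from "cmp x0, 800" with 8-byte doubles. */
#define ARRAY_SIZE 100

/* Assumption: the real benchmark's value of scalar is not shown. */
static double scalar = 2.0;

/* The kernel as quoted in the mail. */
void saxpy(double *x, double *y, double *z)
{
    int i;
    for (i = 0; i < ARRAY_SIZE; i++) {
        z[i] = x[i] + scalar * y[i];
    }
}

/* Hypothetical driver: fills the inputs, runs the kernel once and
   returns the last element of the result.  */
double saxpy_demo(void)
{
    static double x[ARRAY_SIZE], y[ARRAY_SIZE], z[ARRAY_SIZE];
    int i;

    for (i = 0; i < ARRAY_SIZE; i++) {
        x[i] = (double) i;
        y[i] = 1.0;
    }
    saxpy(x, y, z);

    /* z[i] = i + 2.0 * 1.0, so the first element must be 2.0. */
    assert(z[0] == 2.0);
    return z[ARRAY_SIZE - 1];
}
```

Compiling this with -O3 -S on trunk and on the branch should give two
assembly files to diff, though the exact code will differ from the
quoted output if the real benchmark uses other sizes or a different
driver loop.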

Thanks for doing performance measurements.

Richard.
