Re: [PATCH][cunroll] Add unroll-known-loop-iterations-only param and use it in aarch64

Richard Biener Mon, 12 Nov 2018 06:11:01 -0800

On Fri, Nov 9, 2018 at 6:57 PM Kyrill Tkachov
<kyrylo.tkac...@foss.arm.com> wrote:
>
> On 09/11/18 12:18, Richard Biener wrote:
> > On Fri, Nov 9, 2018 at 11:47 AM Kyrill Tkachov
> > <kyrylo.tkac...@foss.arm.com> wrote:
> >>
> >> Hi all,
> >>
> >> In this testcase the codegen for VLA SVE is worse than it could be due to 
> >> unrolling:
> >>
> >> fully_peel_me:
> >>          mov     x1, 5
> >>          ptrue   p1.d, all
> >>          whilelo p0.d, xzr, x1
> >>          ld1d    z0.d, p0/z, [x0]
> >>          fadd    z0.d, z0.d, z0.d
> >>          st1d    z0.d, p0, [x0]
> >>          cntd    x2
> >>          addvl   x3, x0, #1
> >>          whilelo p0.d, x2, x1
> >>          beq     .L1
> >>          ld1d    z0.d, p0/z, [x0, #1, mul vl]
> >>          fadd    z0.d, z0.d, z0.d
> >>          st1d    z0.d, p0, [x3]
> >>          cntw    x2
> >>          incb    x0, all, mul #2
> >>          whilelo p0.d, x2, x1
> >>          beq     .L1
> >>          ld1d    z0.d, p0/z, [x0]
> >>          fadd    z0.d, z0.d, z0.d
> >>          st1d    z0.d, p0, [x0]
> >> .L1:
> >>          ret
> >>
> >> In this case, due to the vector-length-agnostic nature of SVE the compiler 
> >> doesn't know the loop iteration count.
> >> For such loops we don't want to unroll if we don't end up eliminating 
> >> branches as this just bloats code size
> >> and hurts icache performance.
> >>
> >> This patch introduces a new unroll-known-loop-iterations-only param that 
> >> disables cunroll when the loop iteration
> >> count is unknown (SCEV_NOT_KNOWN). This case occurs much more often for 
> >> SVE VLA code, but it does help some
> >> Advanced SIMD cases as well where loops with an unknown iteration count 
> >> are not unrolled when it doesn't eliminate
> >> the branches.
> >>
> >> So for the above testcase we generate now:
> >> fully_peel_me:
> >>          mov     x2, 5
> >>          mov     x3, x2
> >>          mov     x1, 0
> >>          whilelo p0.d, xzr, x2
> >>          ptrue   p1.d, all
> >> .L2:
> >>          ld1d    z0.d, p0/z, [x0, x1, lsl 3]
> >>          fadd    z0.d, z0.d, z0.d
> >>          st1d    z0.d, p0, [x0, x1, lsl 3]
> >>          incd    x1
> >>          whilelo p0.d, x1, x3
> >>          bne     .L2
> >>          ret
> >>
> >> Not perfect still, but it's preferable to the original code.
> >> The new param is enabled by default on aarch64 but disabled for other 
> >> targets, leaving their behaviour unchanged
> >> (until other target people experiment with it and set it, if appropriate).
> >>
> >> Bootstrapped and tested on aarch64-none-linux-gnu.
> >> Benchmarked on SPEC2017 on a Cortex-A57 and there are no differences in 
> >> performance.
> >>
> >> Ok for trunk?
> >
> > Hum.  Why introduce a new --param and not simply key on
> > flag_peel_loops instead?  That is
> > enabled by default at -O3 and with FDO but you of course can control
> > that in your targets
> > post-option-processing hook.
>
> You mean like this?
> It's certainly a simpler patch, but I was just a bit hesitant of making this 
> change for all targets :)
> But I suppose it's a reasonable change.


No, that change is backward.  What I said is that peeling is already
conditional on
flag_peel_loops and that is enabled by -O3.  So you want to disable
flag_peel_loops for
SVE instead in the target.

> >
> > It might also make sense to have more fine-grained control for this
> > and allow a target
> > to say whether it wants to peel a specific loop or not when the
> > middle-end thinks that
> > would be profitable.
>
> Can be worth looking at as a follow-up. Do you envisage the target analysing
> the gimple statements of the loop to figure out its cost?

Kind-of.  Sth like

  bool targetm.peel_loop (struct loop *);

I have no idea whether you can easily detect a SVE vectorized loop though.
Maybe there's always a special IV or so (the mask?)

Richard.

> Thanks,
> Kyrill
>
>
> 2018-11-09  Kyrylo Tkachov  <kyrylo.tkac...@arm.com>
>
>         * tree-ssa-loop-ivcanon.c (try_unroll_loop_completely): Do not unroll
>         loop when number of iterations is not known and flag_peel_loops is in
>         effect.
>
> 2018-11-09  Kyrylo Tkachov  <kyrylo.tkac...@arm.com>
>
>         * gcc.target/aarch64/sve/unroll-1.c: New test.
>

Re: [PATCH][cunroll] Add unroll-known-loop-iterations-only param and use it in aarch64

Reply via email to