On Fri, Nov 9, 2018 at 6:57 PM Kyrill Tkachov <kyrylo.tkac...@foss.arm.com> wrote: > > On 09/11/18 12:18, Richard Biener wrote: > > On Fri, Nov 9, 2018 at 11:47 AM Kyrill Tkachov > > <kyrylo.tkac...@foss.arm.com> wrote: > >> > >> Hi all, > >> > >> In this testcase the codegen for VLA SVE is worse than it could be due to > >> unrolling: > >> > >> fully_peel_me: > >> mov x1, 5 > >> ptrue p1.d, all > >> whilelo p0.d, xzr, x1 > >> ld1d z0.d, p0/z, [x0] > >> fadd z0.d, z0.d, z0.d > >> st1d z0.d, p0, [x0] > >> cntd x2 > >> addvl x3, x0, #1 > >> whilelo p0.d, x2, x1 > >> beq .L1 > >> ld1d z0.d, p0/z, [x0, #1, mul vl] > >> fadd z0.d, z0.d, z0.d > >> st1d z0.d, p0, [x3] > >> cntw x2 > >> incb x0, all, mul #2 > >> whilelo p0.d, x2, x1 > >> beq .L1 > >> ld1d z0.d, p0/z, [x0] > >> fadd z0.d, z0.d, z0.d > >> st1d z0.d, p0, [x0] > >> .L1: > >> ret > >> > >> In this case, due to the vector-length-agnostic nature of SVE the compiler > >> doesn't know the loop iteration count. > >> For such loops we don't want to unroll if we don't end up eliminating > >> branches as this just bloats code size > >> and hurts icache performance. > >> > >> This patch introduces a new unroll-known-loop-iterations-only param that > >> disables cunroll when the loop iteration > >> count is unknown (SCEV_NOT_KNOWN). This case occurs much more often for > >> SVE VLA code, but it does help some > >> Advanced SIMD cases as well where loops with an unknown iteration count > >> are not unrolled when it doesn't eliminate > >> the branches. > >> > >> So for the above testcase we generate now: > >> fully_peel_me: > >> mov x2, 5 > >> mov x3, x2 > >> mov x1, 0 > >> whilelo p0.d, xzr, x2 > >> ptrue p1.d, all > >> .L2: > >> ld1d z0.d, p0/z, [x0, x1, lsl 3] > >> fadd z0.d, z0.d, z0.d > >> st1d z0.d, p0, [x0, x1, lsl 3] > >> incd x1 > >> whilelo p0.d, x1, x3 > >> bne .L2 > >> ret > >> > >> Not perfect still, but it's preferable to the original code. > >> The new param is enabled by default on aarch64 but disabled for other > >> targets, leaving their behaviour unchanged > >> (until other target people experiment with it and set it, if appropriate). > >> > >> Bootstrapped and tested on aarch64-none-linux-gnu. > >> Benchmarked on SPEC2017 on a Cortex-A57 and there are no differences in > >> performance. > >> > >> Ok for trunk? > > > > Hum. Why introduce a new --param and not simply key on > > flag_peel_loops instead? That is > > enabled by default at -O3 and with FDO but you of course can control > > that in your targets > > post-option-processing hook. > > You mean like this? > It's certainly a simpler patch, but I was just a bit hesitant of making this > change for all targets :) > But I suppose it's a reasonable change.
No, that change is backward. What I said is that peeling is already conditional on flag_peel_loops and that is enabled by -O3. So you want to disable flag_peel_loops for SVE instead in the target. > > > > It might also make sense to have more fine-grained control for this > > and allow a target > > to say whether it wants to peel a specific loop or not when the > > middle-end thinks that > > would be profitable. > > Can be worth looking at as a follow-up. Do you envisage the target analysing > the gimple statements of the loop to figure out its cost? Kind-of. Sth like bool targetm.peel_loop (struct loop *); I have no idea whether you can easily detect a SVE vectorized loop though. Maybe there's always a special IV or so (the mask?) Richard. > Thanks, > Kyrill > > > 2018-11-09 Kyrylo Tkachov <kyrylo.tkac...@arm.com> > > * tree-ssa-loop-ivcanon.c (try_unroll_loop_completely): Do not unroll > loop when number of iterations is not known and flag_peel_loops is in > effect. > > 2018-11-09 Kyrylo Tkachov <kyrylo.tkac...@arm.com> > > * gcc.target/aarch64/sve/unroll-1.c: New test. >