On Mon, Nov 12, 2018 at 7:20 PM Kyrill Tkachov <kyrylo.tkac...@foss.arm.com> wrote: > > > On 12/11/18 14:10, Richard Biener wrote: > > On Fri, Nov 9, 2018 at 6:57 PM Kyrill Tkachov > > <kyrylo.tkac...@foss.arm.com> wrote: > >> On 09/11/18 12:18, Richard Biener wrote: > >>> On Fri, Nov 9, 2018 at 11:47 AM Kyrill Tkachov > >>> <kyrylo.tkac...@foss.arm.com> wrote: > >>>> Hi all, > >>>> > >>>> In this testcase the codegen for VLA SVE is worse than it could be due > >>>> to unrolling: > >>>> > >>>> fully_peel_me: > >>>> mov x1, 5 > >>>> ptrue p1.d, all > >>>> whilelo p0.d, xzr, x1 > >>>> ld1d z0.d, p0/z, [x0] > >>>> fadd z0.d, z0.d, z0.d > >>>> st1d z0.d, p0, [x0] > >>>> cntd x2 > >>>> addvl x3, x0, #1 > >>>> whilelo p0.d, x2, x1 > >>>> beq .L1 > >>>> ld1d z0.d, p0/z, [x0, #1, mul vl] > >>>> fadd z0.d, z0.d, z0.d > >>>> st1d z0.d, p0, [x3] > >>>> cntw x2 > >>>> incb x0, all, mul #2 > >>>> whilelo p0.d, x2, x1 > >>>> beq .L1 > >>>> ld1d z0.d, p0/z, [x0] > >>>> fadd z0.d, z0.d, z0.d > >>>> st1d z0.d, p0, [x0] > >>>> .L1: > >>>> ret > >>>> > >>>> In this case, due to the vector-length-agnostic nature of SVE the > >>>> compiler doesn't know the loop iteration count. > >>>> For such loops we don't want to unroll if we don't end up eliminating > >>>> branches as this just bloats code size > >>>> and hurts icache performance. > >>>> > >>>> This patch introduces a new unroll-known-loop-iterations-only param that > >>>> disables cunroll when the loop iteration > >>>> count is unknown (SCEV_NOT_KNOWN). This case occurs much more often for > >>>> SVE VLA code, but it does help some > >>>> Advanced SIMD cases as well where loops with an unknown iteration count > >>>> are not unrolled when it doesn't eliminate > >>>> the branches. > >>>> > >>>> So for the above testcase we generate now: > >>>> fully_peel_me: > >>>> mov x2, 5 > >>>> mov x3, x2 > >>>> mov x1, 0 > >>>> whilelo p0.d, xzr, x2 > >>>> ptrue p1.d, all > >>>> .L2: > >>>> ld1d z0.d, p0/z, [x0, x1, lsl 3] > >>>> fadd z0.d, z0.d, z0.d > >>>> st1d z0.d, p0, [x0, x1, lsl 3] > >>>> incd x1 > >>>> whilelo p0.d, x1, x3 > >>>> bne .L2 > >>>> ret > >>>> > >>>> Not perfect still, but it's preferable to the original code. > >>>> The new param is enabled by default on aarch64 but disabled for other > >>>> targets, leaving their behaviour unchanged > >>>> (until other target people experiment with it and set it, if > >>>> appropriate). > >>>> > >>>> Bootstrapped and tested on aarch64-none-linux-gnu. > >>>> Benchmarked on SPEC2017 on a Cortex-A57 and there are no differences in > >>>> performance. > >>>> > >>>> Ok for trunk? > >>> Hum. Why introduce a new --param and not simply key on > >>> flag_peel_loops instead? That is > >>> enabled by default at -O3 and with FDO but you of course can control > >>> that in your targets > >>> post-option-processing hook. > >> You mean like this? > >> It's certainly a simpler patch, but I was just a bit hesitant of making > >> this change for all targets :) > >> But I suppose it's a reasonable change. > > No, that change is backward. What I said is that peeling is already > > conditional on > > flag_peel_loops and that is enabled by -O3. So you want to disable > > flag_peel_loops for > > SVE instead in the target. > > Sorry, I got confused by the similarly named functions. > I'm talking about try_unroll_loop_completely when run as part of > canonicalize_induction_variables i.e. the "ivcanon" pass > (sorry about blaming cunroll here). This doesn't get called through the > try_unroll_loops_completely path.
Well, peeling gets disabled. From your patch I see you want to disable "unrolling" when the number of loop iteration is not constant. That is called peeling where we need to emit the loop exit test N times. Did you check your testcases with -fno-peel-loops? > try_unroll_loop_completely doesn't get disabled with -fno-peel-loops or > -fno-unroll-loops. > Maybe disabling peeling inside try_unroll_loop_completely itself when > !flag_peel_loops is viable? > > Thanks, > Kyrill > > >>> It might also make sense to have more fine-grained control for this > >>> and allow a target > >>> to say whether it wants to peel a specific loop or not when the > >>> middle-end thinks that > >>> would be profitable. > >> Can be worth looking at as a follow-up. Do you envisage the target > >> analysing > >> the gimple statements of the loop to figure out its cost? > > Kind-of. Sth like > > > > bool targetm.peel_loop (struct loop *); > > > > I have no idea whether you can easily detect a SVE vectorized loop though. > > Maybe there's always a special IV or so (the mask?) > > > > Richard. > > > >> Thanks, > >> Kyrill > >> > >> > >> 2018-11-09 Kyrylo Tkachov <kyrylo.tkac...@arm.com> > >> > >> * tree-ssa-loop-ivcanon.c (try_unroll_loop_completely): Do not > >> unroll > >> loop when number of iterations is not known and flag_peel_loops > >> is in > >> effect. > >> > >> 2018-11-09 Kyrylo Tkachov <kyrylo.tkac...@arm.com> > >> > >> * gcc.target/aarch64/sve/unroll-1.c: New test. > >> >