Richard Biener <rguent...@suse.de> writes: > On Wed, 9 Sep 2020, Andrea Corallo wrote: >> Hi all, >> >> this patch is meant not to generate predication in loop when the >> loop is operating only on full vectors. >> >> Ex: >> >> #+BEGIN_SRC C >> /* Vector length is 256. */ >> void >> f (int *restrict x, int *restrict y, unsigned int n) { >> for (unsigned int i = 0; i < n * 8; ++i) >> x[i] += y[i]; >> } >> #+END_SRC >> >> Compiling on aarch64 with -O3 -msve-vector-bits=256 current trunk >> gives: >> >> #+BEGIN_SRC asm >> f: >> .LFB0: >> .cfi_startproc >> lsl w2, w2, 3 >> cbz w2, .L1 >> mov x3, 0 >> whilelo p0.s, xzr, x2 >> .p2align 3,,7 >> .L3: >> ld1w z0.s, p0/z, [x0, x3, lsl 2] >> ld1w z1.s, p0/z, [x1, x3, lsl 2] >> add z0.s, z0.s, z1.s >> st1w z0.s, p0, [x0, x3, lsl 2] >> add x3, x3, 8 >> whilelo p0.s, x3, x2 >> b.any .L3 >> .L1: >> ret >> .cfi_endproc >> #+END_SRC >> >> With the patch applied: >> >> #+BEGIN_SRC asm >> f: >> .LFB0: >> .cfi_startproc >> lsl w3, w2, 3 >> cbz w3, .L1 >> mov x2, 0 >> ptrue p0.b, vl32 >> .p2align 3,,7 >> .L3: >> ld1w z0.s, p0/z, [x0, x2, lsl 2] >> ld1w z1.s, p0/z, [x1, x2, lsl 2] >> add z0.s, z0.s, z1.s >> st1w z0.s, p0, [x0, x2, lsl 2] >> add x2, x2, 8 >> cmp x2, x3 >> bne .L3 >> .L1: >> ret >> .cfi_endproc >> #+END_SRC >> >> To achieve this we check earlier if the loop needs peeling and if is >> not the case we do not set LOOP_VINFO_USING_PARTIAL_VECTORS_P to true. >> >> I moved some logic from 'determine_peel_for_niter' to >> 'vect_need_peeling_or_part_vects_p' so it can be used for this purpose. >> >> Bootstrapped and regtested on aarch64-linux-gnu. > > Looks OK to me, the comment > > @@ -2267,7 +2278,10 @@ start_over: > { > if (param_vect_partial_vector_usage == 0) > LOOP_VINFO_USING_PARTIAL_VECTORS_P (loop_vinfo) = false; > - else if (vect_verify_full_masking (loop_vinfo) > + else if ((vect_verify_full_masking (loop_vinfo) > + && vect_need_peeling_or_part_vects_p (loop_vinfo)) > + /* Don't use partial vectors if we don't need to peel the > + loop. */ > || vect_verify_loop_lens (loop_vinfo)) > > seems to be oddly misplaced (I'd put it before the call).
Yeah, IMO it'd better to put it in the first “if”. Also, very minor, but I think it'd be better not to shorten the name: vect_need_peeling_or_partial_vectors_p Thanks, Richard