Richard Biener <rguent...@suse.de> writes:
> On Wed, 9 Sep 2020, Andrea Corallo wrote:
>> Hi all,
>> 
>> this patch is meant not to generate predication in loop when the
>> loop is operating only on full vectors.
>> 
>> Ex:
>> 
>> #+BEGIN_SRC C
>> /* Vector length is 256.  */
>> void
>> f (int *restrict x, int *restrict y, unsigned int n) {
>>   for (unsigned int i = 0; i < n * 8; ++i)
>>     x[i] += y[i];
>> }
>> #+END_SRC
>> 
>> Compiling on aarch64 with -O3 -msve-vector-bits=256 current trunk
>> gives:
>> 
>> #+BEGIN_SRC asm
>> f:
>> .LFB0:
>>         .cfi_startproc
>>         lsl     w2, w2, 3
>>         cbz     w2, .L1
>>         mov     x3, 0
>>         whilelo p0.s, xzr, x2
>>         .p2align 3,,7
>> .L3:
>>         ld1w    z0.s, p0/z, [x0, x3, lsl 2]
>>         ld1w    z1.s, p0/z, [x1, x3, lsl 2]
>>         add     z0.s, z0.s, z1.s
>>         st1w    z0.s, p0, [x0, x3, lsl 2]
>>         add     x3, x3, 8
>>         whilelo p0.s, x3, x2
>>         b.any   .L3
>> .L1:
>>         ret
>>         .cfi_endproc
>> #+END_SRC
>> 
>> With the patch applied:
>> 
>> #+BEGIN_SRC asm
>> f:
>> .LFB0:
>>         .cfi_startproc
>>         lsl     w3, w2, 3
>>         cbz     w3, .L1
>>         mov     x2, 0
>>         ptrue   p0.b, vl32
>>         .p2align 3,,7
>> .L3:
>>         ld1w    z0.s, p0/z, [x0, x2, lsl 2]
>>         ld1w    z1.s, p0/z, [x1, x2, lsl 2]
>>         add     z0.s, z0.s, z1.s
>>         st1w    z0.s, p0, [x0, x2, lsl 2]
>>         add     x2, x2, 8
>>         cmp     x2, x3
>>         bne     .L3
>> .L1:
>>         ret
>>         .cfi_endproc
>> #+END_SRC
>> 
>> To achieve this we check earlier if the loop needs peeling and if is
>> not the case we do not set LOOP_VINFO_USING_PARTIAL_VECTORS_P to true.
>> 
>> I moved some logic from 'determine_peel_for_niter' to
>> 'vect_need_peeling_or_part_vects_p' so it can be used for this purpose.
>> 
>> Bootstrapped and regtested on aarch64-linux-gnu.
>
> Looks OK to me, the comment
>
> @@ -2267,7 +2278,10 @@ start_over:
>      {
>        if (param_vect_partial_vector_usage == 0)
>         LOOP_VINFO_USING_PARTIAL_VECTORS_P (loop_vinfo) = false;
> -      else if (vect_verify_full_masking (loop_vinfo)
> +      else if ((vect_verify_full_masking (loop_vinfo)
> +               && vect_need_peeling_or_part_vects_p (loop_vinfo))
> +              /* Don't use partial vectors if we don't need to peel the
> +                 loop.  */
>                || vect_verify_loop_lens (loop_vinfo))
>
> seems to be oddly misplaced (I'd put it before the call).

Yeah, IMO it'd better to put it in the first “if”.

Also, very minor, but I think it'd be better not to shorten the name:

  vect_need_peeling_or_partial_vectors_p

Thanks,
Richard

Reply via email to