https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116573

--- Comment #4 from JuzheZhong <juzhe.zhong at rivai dot ai> ---
(In reply to Richard Biener from comment #3)
> So when investigating "future" fallout I've seen similar differences for
> gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-1.c for example with the
> GIMPLE difference being that before we used .SELECT_VL but afterwards
> there's a MIN_EXPR to compute the length.
> 
> I've tried to read up the RVV specification but there doesn't seem to be
> a good overall operand documentation for vsetvli :(  I tried to understand
> 
> .L6:
>         mv      a4,a3
>         bleu    a3,a5,.L5  // this is likely the MIN?
>         csrr    a4,vlenb   // save VLEN to a4(?)
> .L5:
>         vsetvli zero,a4,e8,m1,ta,ma // set VLEN to a4 and store new VLEN to
> 'zero'(?)
>         vle8.v  v1,0(a1)
>         vle8.v  v2,0(a2)
>         vsetvli a6,zero,e8,m1,ta,ma // set VLEN to zero?!
>         vsaddu.vv       v1,v1,v2
>         vsetvli zero,a4,e8,m1,ta,ma // set VLEN to a4 again
>         vse8.v  v1,0(a0)
>         add     a1,a1,a5
>         add     a2,a2,a5
>         add     a0,a0,a5
>         mv      a4,a3
>         sub     a3,a3,a5
>         bgtu    a4,a5,.L6
> 
> I think the GIMPLE looks straight-forward but the code the backend generates
> looks bad, possibly the vsetvli pass is lacking here.
> 
> Now, the vectorizer doesn't use .SELECT_VL because
> 
>       if (direct_internal_fn_supported_p (IFN_SELECT_VL, iv_type,
>                                           OPTIMIZE_FOR_SPEED)
>           && LOOP_VINFO_LENS (loop_vinfo).length () == 1
>           && LOOP_VINFO_LENS (loop_vinfo)[0].factor == 1 && !slp
>           && (!LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo)
>               || !LOOP_VINFO_VECT_FACTOR (loop_vinfo).is_constant ()))
>         LOOP_VINFO_USING_SELECT_VL_P (loop_vinfo) = true;
> 
> see the !slp - the comment doesn't explain why, but for example
> vectorizable_induction simply asserts !slp_node when
> LOOP_VINFO_USING_SELECT_VL_P.  I would have expected it to be handled
> more like LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P and be disabled when
> we cannot handle code generation for a feature.
> 
> Simply removing the && !slp fixes the particular testcase above for me.
> 
> I'll leave this bug and the fallout to Ju-Zhe Zhong who added
> LOOP_VINFO_USING_SELECT_VL_P support.
> 
> Anyway, confirmed.

Hi, Richard. Thanks for the guidance.

For RVV, we use SELECT_VL to calculate the number of elements to be processed
only for single-rgroup (control) vectorization. With multiple rgroups, the
vectorization becomes more complicated and the codegen seems to be worse when
SELECT_VL is used to calculate the number of elements, so we prefer MIN in that
scenario.
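To make the difference between the two length computations concrete, here is a small scalar C model. It is a sketch under stated assumptions: `len_min` mirrors the MIN_EXPR form, while `len_select_vl` models one behavior the RVV specification permits for `vsetvli` (when VLMAX < AVL < 2 * VLMAX, vl may be anything in [ceil(AVL / 2), VLMAX]; this model picks ceil(AVL / 2)). The helper names and the fixed VLMAX are hypothetical, not GCC or RVV APIs.

```c
#include <assert.h>
#include <stddef.h>

/* MIN_EXPR form: every iteration except the last processes exactly vf
   elements.  */
size_t len_min (size_t remaining, size_t vf)
{
  return remaining < vf ? remaining : vf;
}

/* Hypothetical model of .SELECT_VL lowered to vsetvli.  The RVV spec
   allows vl in [ceil(avl/2), vlmax] when vlmax < avl < 2*vlmax; this
   model picks ceil(avl/2), spreading the tail over two iterations.  */
size_t len_select_vl (size_t avl, size_t vlmax)
{
  if (avl <= vlmax)
    return avl;
  if (avl < 2 * vlmax)
    return (avl + 1) / 2;
  return vlmax;
}

/* Count loop iterations needed to process n elements when the induction
   variable advances by whatever length the strategy returned.  */
size_t count_iterations (size_t n, size_t vf,
                         size_t (*get_len) (size_t, size_t))
{
  size_t iters = 0;
  size_t remaining = n;
  while (remaining > 0)
    {
      size_t len = get_len (remaining, vf);
      /* Each step must make progress and never overshoot.  */
      assert (len > 0 && len <= remaining);
      remaining -= len;
      iters++;
    }
  return iters;
}
```

With n = 10 and a VF/VLMAX of 4, `len_min` yields lengths 4, 4, 2 while this model of `len_select_vl` yields 4, 3, 3. Both take three iterations, but only in the MIN form is every non-final length equal to VF; with SELECT_VL the pointers and the counter must advance by the returned length.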

Single rgroup:

#define N 16
int src[N];
int dest[N];

void
foo (int n)
{
  for (int i = 0; i < n; i++)
    dest[i] = src[i] + 2;
}
Here we expect SELECT_VL to be used:

  _27 = .SELECT_VL (ivtmp_25, POLY_INT_CST [4, 4]);
  ivtmp_3 = _27 * 4;
  vect__1.8_17 = .MASK_LEN_LOAD (vectp_src.6_15, 32B, { -1, ... }, _27, 0);
  vect__2.9_19 = vect__1.8_17 + { 2, ... };
  .MASK_LEN_STORE (vectp_dest.10_21, 32B, { -1, ... }, _27, 0, vect__2.9_19);

Assembly:
.L3:
        vsetvli a5,a0,e32,m1,ta,ma
        vle32.v v1,0(a3)
        slli    a2,a5,2
        sub     a0,a0,a5
        add     a3,a3,a2
        vadd.vi v1,v1,2
        vse32.v v1,0(a4)
        add     a4,a4,a2
        bne     a0,zero,.L3
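As a rough scalar reading of the loop above (a sketch, not what GCC emits): each iteration asks `vsetvli` for vl (a5) given the remaining count (a0), processes vl elements, subtracts vl from the counter and advances both pointers by vl * 4 bytes (the `slli a2,a5,2`). `vsetvli_model` and `foo_strip_mined` are hypothetical names; the model assumes VLMAX = 4 and returns vl = min(AVL, VLMAX), which is one behavior the RVV specification permits.

```c
#include <assert.h>
#include <stddef.h>

#define N 16
int src[N];
int dest[N];

/* Hypothetical stand-in for "vsetvli a5,a0,e32,m1,ta,ma".  VLMAX is fixed
   at 4 purely for illustration; on real hardware it depends on VLEN.  */
static const size_t vlmax_model = 4;

static size_t vsetvli_model (size_t avl)
{
  return avl < vlmax_model ? avl : vlmax_model;
}

/* Scalar emulation of the strip-mined loop for dest[i] = src[i] + 2,
   mirroring the .L3 loop register by register.  Like foo, it assumes
   n <= N.  */
void foo_strip_mined (int n)
{
  const int *a3 = src;                  /* source pointer */
  int *a4 = dest;                       /* destination pointer */
  size_t a0 = n > 0 ? (size_t) n : 0;   /* remaining element count */

  while (a0 != 0)                       /* bne a0,zero,.L3 */
    {
      size_t a5 = vsetvli_model (a0);   /* vsetvli a5,a0,... */
      assert (a5 > 0 && a5 <= a0);
      for (size_t k = 0; k < a5; k++)   /* vle32.v, vadd.vi, vse32.v */
        a4[k] = a3[k] + 2;
      a0 -= a5;                         /* sub a0,a0,a5 */
      a3 += a5;                         /* add a3,a3,a2 (a2 = a5 << 2 bytes) */
      a4 += a5;                         /* add a4,a4,a2 */
    }
}
```

Note that the counter and pointers advance by the vl actually granted, never by a fixed VF, which is exactly what a single `.SELECT_VL` result feeds.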


Whereas in the multiple-rgroup case, we expect MIN to be used to calculate the
number of elements:

  _42 = MIN_EXPR <ivtmp.18_5, POLY_INT_CST [16, 16]>;
  loop_len_32 = MIN_EXPR <_42, POLY_INT_CST [8, 8]>;
  loop_len_31 = _42 - loop_len_32;
  _47 = (void *) ivtmp.17_2;
  _48 = &MEM <vector([8,8]) short int> [(short int *)_47];
  .MASK_LEN_STORE (_48, 16B, { -1, ... }, loop_len_32, 0, { 1, 2, ... });
  _49 = (void *) ivtmp.21_10;
  _50 = &MEM <vector([8,8]) short int> [(short int *)_49];
  .MASK_LEN_STORE (_50, 16B, { -1, ... }, loop_len_31, 0, { 1, 2, ... });
  _21 = reciptmp_26 * loop_len_32;
  _51 = (void *) ivtmp.15_36;
  _52 = &MEM <vector([4,4]) int> [(int *)_51];
  .MASK_LEN_STORE (_52, 32B, { -1, ... }, _21, 0, { 3, ... });
  _38 = reciptmp_26 * loop_len_31;
  _53 = (void *) ivtmp.22_17;
  _54 = &MEM <vector([4,4]) int> [(int *)_53];
  .MASK_LEN_STORE (_54, 32B, { -1, ... }, _38, 0, { 3, ... });
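The length split for the two short-int stores above can be modeled in scalar C. This is a sketch under assumptions: `V_MODEL` stands in for the runtime-dependent POLY_INT_CST [8, 8] (shorts per vector) and is fixed to 8 here, and `split_len` is a hypothetical helper, not a GCC function. The iteration total is MIN (remaining, 2 * V), the first vector takes MIN (total, V) and the second takes the rest, so the first vector is always full before the second receives any elements.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical fixed value for POLY_INT_CST [8, 8]: shorts per vector.  */
enum { V_MODEL = 8 };

struct lens
{
  size_t len0;  /* corresponds to loop_len_32 */
  size_t len1;  /* corresponds to loop_len_31 */
};

/* Mirrors:
     _42 = MIN_EXPR <ivtmp, 2*V>;
     loop_len_32 = MIN_EXPR <_42, V>;
     loop_len_31 = _42 - loop_len_32;  */
struct lens split_len (size_t remaining)
{
  size_t cap = 2 * (size_t) V_MODEL;
  size_t total = remaining < cap ? remaining : cap;
  struct lens l;
  l.len0 = total < (size_t) V_MODEL ? total : (size_t) V_MODEL;
  l.len1 = total - l.len0;
  assert (l.len0 + l.len1 == total);
  return l;
}
```

For example, with 11 elements remaining the first store covers 8 elements and the second covers 3.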

So, if I understand correctly, Richard has changed the vectorizer so that all
auto-vectorization is represented as SLP instances?

If so, !slp is no longer the correct condition here.

Perhaps it should be changed to:

LOOP_VINFO_SLP_INSTANCES (loop_vinfo).length () == 1 ?
