https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116573
--- Comment #4 from JuzheZhong <juzhe.zhong at rivai dot ai> ---
(In reply to Richard Biener from comment #3)
> So when investigating "future" fallout I've seen similar differences for
> gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-1.c for example with the
> GIMPLE difference being that before we used .SELECT_VL but afterwards
> there's a MIN_EXPR to compute the length.
>
> I've tried to read up the RVV specification but there doesn't seem to be
> a good overall operand documentation for vsetvli :(  I tried to understand
>
> .L6:
>         mv      a4,a3
>         bleu    a3,a5,.L5                 // this is likely the MIN?
>         csrr    a4,vlenb                  // save VLEN to a4(?)
> .L5:
>         vsetvli zero,a4,e8,m1,ta,ma       // set VLEN to a4 and store new VLEN to 'zero'(?)
>         vle8.v  v1,0(a1)
>         vle8.v  v2,0(a2)
>         vsetvli a6,zero,e8,m1,ta,ma       // set VLEN to zero?!
>         vsaddu.vv       v1,v1,v2
>         vsetvli zero,a4,e8,m1,ta,ma       // set VLEN to a4 again
>         vse8.v  v1,0(a0)
>         add     a1,a1,a5
>         add     a2,a2,a5
>         add     a0,a0,a5
>         mv      a4,a3
>         sub     a3,a3,a5
>         bgtu    a4,a5,.L6
>
> I think the GIMPLE looks straight-forward but the code the backend generates
> looks bad, possibly the vsetvli pass is lacking here.
>
> Now, the vectorizer doesn't use .SELECT_VL because
>
>   if (direct_internal_fn_supported_p (IFN_SELECT_VL, iv_type,
>                                       OPTIMIZE_FOR_SPEED)
>       && LOOP_VINFO_LENS (loop_vinfo).length () == 1
>       && LOOP_VINFO_LENS (loop_vinfo)[0].factor == 1 && !slp
>       && (!LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo)
>           || !LOOP_VINFO_VECT_FACTOR (loop_vinfo).is_constant ()))
>     LOOP_VINFO_USING_SELECT_VL_P (loop_vinfo) = true;
>
> see the !slp - the comment doesn't explain why, but for example
> vectorizable_induction simply asserts !slp_node when
> LOOP_VINFO_USING_SELECT_VL_P.  I would have expected it to be handled
> more like LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P and be disabled when
> we cannot handle code generation for a feature.
>
> Simply removing the && !slp fixes the particular testcase above for me.
>
> I'll leave this bug and the fallout to Ju-Zhe Zhong who added
> LOOP_VINFO_USING_SELECT_VL_P support.
>
> Anyway, confirmed.

Hi, Richard. Thanks for the guidance.

For RVV, we use SELECT_VL to calculate the number of elements to process
only for single-rgroup (control) vectorization. With multiple rgroups the
vectorization becomes more complicated, and the codegen seems to be worse
when SELECT_VL calculates the number of elements, so we prefer MIN in that
scenario.

Single rgroup:

#define N 16
int src[N];
int dest[N];

void
foo (int n)
{
  for (int i = 0; i < n; i++)
    dest[i] = src[i] + 2;
}

We expect SELECT_VL to be used:

  _27 = .SELECT_VL (ivtmp_25, POLY_INT_CST [4, 4]);
  ivtmp_3 = _27 * 4;
  vect__1.8_17 = .MASK_LEN_LOAD (vectp_src.6_15, 32B, { -1, ... }, _27, 0);
  vect__2.9_19 = vect__1.8_17 + { 2, ... };
  .MASK_LEN_STORE (vectp_dest.10_21, 32B, { -1, ... }, _27, 0, vect__2.9_19);

Assembly:

.L3:
        vsetvli a5,a0,e32,m1,ta,ma
        vle32.v v1,0(a3)
        slli    a2,a5,2
        sub     a0,a0,a5
        add     a3,a3,a2
        vadd.vi v1,v1,2
        vse32.v v1,0(a4)
        add     a4,a4,a2
        bne     a0,zero,.L3

Whereas in the multiple-rgroup case we expect MIN to be used to calculate
the number of elements:

  _42 = MIN_EXPR <ivtmp.18_5, POLY_INT_CST [16, 16]>;
  loop_len_32 = MIN_EXPR <_42, POLY_INT_CST [8, 8]>;
  loop_len_31 = _42 - loop_len_32;
  _47 = (void *) ivtmp.17_2;
  _48 = &MEM <vector([8,8]) short int> [(short int *)_47];
  .MASK_LEN_STORE (_48, 16B, { -1, ... }, loop_len_32, 0, { 1, 2, ... });
  _49 = (void *) ivtmp.21_10;
  _50 = &MEM <vector([8,8]) short int> [(short int *)_49];
  .MASK_LEN_STORE (_50, 16B, { -1, ... }, loop_len_31, 0, { 1, 2, ... });
  _21 = reciptmp_26 * loop_len_32;
  _51 = (void *) ivtmp.15_36;
  _52 = &MEM <vector([4,4]) int> [(int *)_51];
  .MASK_LEN_STORE (_52, 32B, { -1, ... }, _21, 0, { 3, ... });
  _38 = reciptmp_26 * loop_len_31;
  _53 = (void *) ivtmp.22_17;
  _54 = &MEM <vector([4,4]) int> [(int *)_53];
  .MASK_LEN_STORE (_54, 32B, { -1, ... }, _38, 0, { 3, ... });

So, if I understand correctly, Richard has changed the vectorizer so that
all auto-vectorization is represented as SLP instances? In that case !slp is
no longer the correct condition here. Should it be changed to something like:
LOOP_VINFO_SLP_INSTANCES.size() == 1?
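
To make the two length computations concrete, here is a rough scalar model in C. It is only a sketch under stated assumptions, not what the vectorizer or the backend actually emits: the names foo_model, split_lens, select_vl_model, VF, TOTAL_CAP and GROUP_CAP are made up for illustration, the POLY_INT_CST values are fixed at their minimum (4, 16 and 8), and .SELECT_VL is modelled as a plain minimum even though the target is free to pick another legal length.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed fixed capacities; the real values are POLY_INT_CSTs that
   scale with the vector length.  */
#define VF 4          /* models POLY_INT_CST [4, 4]   */
#define TOTAL_CAP 16  /* models POLY_INT_CST [16, 16] */
#define GROUP_CAP 8   /* models POLY_INT_CST [8, 8]   */

static size_t
min_size (size_t a, size_t b)
{
  return a < b ? a : b;
}

/* Hypothetical stand-in for .SELECT_VL: here simply MIN (remaining, vf).
   A real target may return a different value that is <= vf.  */
static size_t
select_vl_model (size_t remaining, size_t vf)
{
  return min_size (remaining, vf);
}

/* Single rgroup: one length per iteration, and the pointers advance by
   the selected length (compare ivtmp_3 = _27 * 4 in the GIMPLE).
   Returns the number of loop iterations executed.  */
size_t
foo_model (const int *src, int *dest, size_t n)
{
  size_t iterations = 0;
  size_t remaining = n;
  while (remaining != 0)
    {
      size_t len = select_vl_model (remaining, VF);
      for (size_t i = 0; i < len; i++)
        dest[i] = src[i] + 2;
      src += len;
      dest += len;
      remaining -= len;
      iterations++;
    }
  return iterations;
}

/* Multiple rgroups: the total length is computed once with MIN and then
   split between two controls, mirroring _42, loop_len_32 and
   loop_len_31 in the GIMPLE.  */
struct rgroup_lens
{
  size_t len0;
  size_t len1;
};

struct rgroup_lens
split_lens (size_t remaining)
{
  size_t total = min_size (remaining, TOTAL_CAP);
  struct rgroup_lens lens;
  lens.len0 = min_size (total, GROUP_CAP);
  lens.len1 = total - lens.len0;
  /* The two controls together always cover exactly 'total' elements.  */
  assert (lens.len0 + lens.len1 == total);
  return lens;
}
```

In this model, split_lens (11) yields lengths 8 and 3. The second length depends on the first, which hints at why deriving every control from a single target-selected length gets more complicated with multiple rgroups.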