[Bug tree-optimization/116573] [15 Regression] Recent SLP work appears to generate significantly worse code on RISC-V

rguenth at gcc dot gnu.org via Gcc-bugs Fri, 06 Sep 2024 01:48:27 -0700

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116573


Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
     Ever confirmed|0                           |1
   Last reconfirmed|                            |2024-09-06
             Status|UNCONFIRMED                 |NEW

--- Comment #3 from Richard Biener <rguenth at gcc dot gnu.org> ---
So when investigating "future" fallout I've seen similar differences for
gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-1.c for example with the
GIMPLE difference being that before we used .SELECT_VL but afterwards
there's a MIN_EXPR to compute the length.
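
For illustration, the two ways of computing the per-iteration length can be modeled in scalar C. This is only a sketch under stated assumptions: `MODEL_VF`, `model_select_vl` and both loop functions are hypothetical names, the vector capacity is fixed at 16 lanes, and .SELECT_VL is modeled as a plain minimum even though real hardware may pick a different legal length. The part that matters is the bookkeeping: the MIN_EXPR form advances by the constant vectorization factor, while the SELECT_VL form advances by the length that was actually selected.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical fixed vector capacity in e8 lanes (assumption for this model). */
#define MODEL_VF 16

/* Saturating unsigned 8-bit add, the scalar operation being vectorized. */
static uint8_t sat_add_u8(uint8_t a, uint8_t b)
{
    uint8_t sum = (uint8_t)(a + b);
    return sum < a ? UINT8_MAX : sum;   /* wrapped around: clamp to max */
}

/* Hypothetical stand-in for .SELECT_VL, modeled as min(remaining, VF). */
static size_t model_select_vl(size_t remaining)
{
    return remaining < MODEL_VF ? remaining : MODEL_VF;
}

/* MIN_EXPR style: len = MIN (remaining, VF); the offset steps by constant VF. */
static void sat_add_min_expr(uint8_t *out, const uint8_t *a,
                             const uint8_t *b, size_t n)
{
    size_t remaining = n, base = 0;
    while (remaining > 0) {
        size_t len = remaining < MODEL_VF ? remaining : MODEL_VF;
        for (size_t i = 0; i < len; i++)
            out[base + i] = sat_add_u8(a[base + i], b[base + i]);
        base += MODEL_VF;               /* constant step */
        remaining -= len;
    }
}

/* SELECT_VL style: the offset steps by whatever length was selected. */
static void sat_add_select_vl(uint8_t *out, const uint8_t *a,
                              const uint8_t *b, size_t n)
{
    size_t remaining = n, base = 0;
    while (remaining > 0) {
        size_t len = model_select_vl(remaining);
        for (size_t i = 0; i < len; i++)
            out[base + i] = sat_add_u8(a[base + i], b[base + i]);
        base += len;                    /* step follows the selected length */
        remaining -= len;
    }
}
```

Both functions produce the same result in this model; the difference is only in how the length and the address updates are derived.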

I've tried to read up on the RVV specification, but there doesn't seem to be
good overall operand documentation for vsetvli :(  I tried to understand

.L6:
        mv      a4,a3
        bleu    a3,a5,.L5  // this is likely the MIN?
        csrr    a4,vlenb   // save VLEN to a4(?)
.L5:
        vsetvli zero,a4,e8,m1,ta,ma // set VLEN to a4 and store new VLEN to 'zero'(?)
        vle8.v  v1,0(a1)
        vle8.v  v2,0(a2)
        vsetvli a6,zero,e8,m1,ta,ma // set VLEN to zero?!
        vsaddu.vv       v1,v1,v2
        vsetvli zero,a4,e8,m1,ta,ma // set VLEN to a4 again
        vse8.v  v1,0(a0)
        add     a1,a1,a5
        add     a2,a2,a5
        add     a0,a0,a5
        mv      a4,a3
        sub     a3,a3,a5
        bgtu    a4,a5,.L6
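
As a reading aid for the questions in the comments above, here is a small C model of how the RVV 1.0 specification appears to define the `rd`/`rs1` operands of vsetvli. It is a sketch, not an authoritative description: `model_vstate` and `model_vsetvli` are hypothetical names, only the `vl` register is tracked, and the model always picks min(AVL, VLMAX) although the specification allows other choices when AVL lies between VLMAX and 2*VLMAX. Note that the register being set is `vl` (the active vector length), not VLEN.

```c
#include <stddef.h>

/* Hypothetical model state: only the vl register is tracked. */
struct model_vstate {
    size_t vl;
};

/*
 * Model of "vsetvli rd, rs1, <vtype>".
 *   rd_is_x0 / rs1_is_x0: whether the operand is the zero register.
 *   avl:   value of rs1 (ignored when rs1 is x0).
 *   vlmax: maximum length for the requested vtype.
 * Returns the value that would be written to rd (discarded if rd is x0).
 */
static size_t model_vsetvli(struct model_vstate *st, int rd_is_x0,
                            int rs1_is_x0, size_t avl, size_t vlmax)
{
    if (!rs1_is_x0) {
        /* Normal case: request avl elements, capped at vlmax. */
        st->vl = avl < vlmax ? avl : vlmax;
    } else if (!rd_is_x0) {
        /* rs1 = x0, rd != x0: request the maximum, so vl = vlmax. */
        st->vl = vlmax;
    }
    /* rs1 = x0, rd = x0: vl is left unchanged (only vtype changes). */
    return st->vl;
}
```

Under this reading, `vsetvli zero,a4,...` sets `vl` from a4 and discards the result, while `vsetvli a6,zero,...` sets `vl` to VLMAX and writes it to a6, so the `vsaddu.vv` would run on all lanes rather than on zero elements.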

I think the GIMPLE looks straightforward but the code the backend generates
looks bad; possibly the vsetvli pass is lacking here.

Now, the vectorizer doesn't use .SELECT_VL because

      if (direct_internal_fn_supported_p (IFN_SELECT_VL, iv_type,
                                          OPTIMIZE_FOR_SPEED)
          && LOOP_VINFO_LENS (loop_vinfo).length () == 1
          && LOOP_VINFO_LENS (loop_vinfo)[0].factor == 1 && !slp
          && (!LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo)
              || !LOOP_VINFO_VECT_FACTOR (loop_vinfo).is_constant ()))
        LOOP_VINFO_USING_SELECT_VL_P (loop_vinfo) = true;

see the !slp - the comment doesn't explain why, but for example
vectorizable_induction simply asserts !slp_node when
LOOP_VINFO_USING_SELECT_VL_P.  I would have expected it to be handled
more like LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P and be disabled when
we cannot handle code generation for a feature.

Simply removing the && !slp fixes the particular testcase above for me.
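
To make the effect of that term concrete, the condition can be restated as a standalone C predicate. This is not GCC code: `select_vl_inputs` and `use_select_vl` are hypothetical names, the fields merely mirror the sub-expressions of the quoted condition, and the `keep_slp_check` flag exists only to compare the condition with and without `!slp`.

```c
#include <stdbool.h>

/* Hypothetical flattened view of the inputs of the quoted condition. */
struct select_vl_inputs {
    bool select_vl_supported; /* direct_internal_fn_supported_p (IFN_SELECT_VL, ...) */
    unsigned num_lens;        /* LOOP_VINFO_LENS (loop_vinfo).length () */
    unsigned first_factor;    /* LOOP_VINFO_LENS (loop_vinfo)[0].factor */
    bool slp;                 /* the loop is vectorized via SLP */
    bool niters_known;        /* LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo) */
    bool vf_is_constant;      /* LOOP_VINFO_VECT_FACTOR (loop_vinfo).is_constant () */
};

/* Restates the condition; keep_slp_check = false models dropping "&& !slp". */
static bool use_select_vl(const struct select_vl_inputs *in, bool keep_slp_check)
{
    return in->select_vl_supported
           && in->num_lens == 1
           && in->first_factor == 1
           && (!keep_slp_check || !in->slp)
           && (!in->niters_known || !in->vf_is_constant);
}
```

With `slp` set and every other term satisfied, the predicate is false while the check is kept and true once it is dropped, which matches the observation that removing the term re-enables .SELECT_VL for the testcase.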

I'll leave this bug and the fallout to Ju-Zhe Zhong who added
LOOP_VINFO_USING_SELECT_VL_P support.

Anyway, confirmed.
