[Bug tree-optimization/115819] RISC-V: Failed to hoist vrsub.vx to the header of the loop

pinskia at gcc dot gnu.org via Gcc-bugs Sun, 07 Jul 2024 16:23:24 -0700

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115819


Andrew Pinski <pinskia at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Last reconfirmed|                            |2024-07-07
             Blocks|                            |53947
           Severity|normal                      |enhancement
     Ever confirmed|0                           |1
             Target|Riscv                       |Riscv aarch64
             Status|UNCONFIRMED                 |NEW
          Component|target                      |tree-optimization

--- Comment #2 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
  vect__4.9_41 = VEC_PERM_EXPR <vect__4.8_40, vect__4.8_40, { POLY_INT_CST [3, 4], POLY_INT_CST [2, 4], POLY_INT_CST [1, 4], ... }>;
  vect__5.10_43 = vect__4.9_41 + { 1, ... };
  vect__5.13_50 = VEC_PERM_EXPR <vect__5.10_43, vect__5.10_43, { POLY_INT_CST [3, 4], POLY_INT_CST [2, 4], POLY_INT_CST [1, 4], ... }>;


Well, the vrgather.vv instructions here should not be needed in the first place.
We have:
        vrgather.vv     v1,v3,v5
        vadd.vi v1,v1,1
        vrgather.vv     v3,v1,v5

For aarch64 we get:

        rev     z31.s, z31.s      // 32       [c=4 l=4]  aarch64_sve_revvnx4si
        add     z31.s, z31.s, #1  // 33       [c=12 l=4]  addvnx4si3/0
        rev     z31.s, z31.s      // 34       [c=4 l=4]  aarch64_sve_revvnx4si

But the 2 revs are not needed, just like the 2 vrgather.vv are not needed.
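
A minimal scalar sketch of why both permutes are redundant: adding the same constant to every lane commutes with any lane permutation, and a reversal is its own inverse, so rev(rev(x) + 1) equals x + 1. The helper names below (reverse_lanes, add_splat, reversals_are_redundant) and the MAX_LANES bound are hypothetical and only model the GIMPLE above; this is not GCC code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_LANES 16  /* arbitrary bound for this sketch */

/* Models VEC_PERM_EXPR with a lane-reversing selector. */
static void reverse_lanes(const uint32_t *src, uint32_t *dst, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = src[n - 1 - i];
}

/* Models "vect + { c, ... }": the same constant is added to every lane.
   Unsigned arithmetic keeps wraparound well defined. */
static void add_splat(const uint32_t *src, uint32_t *dst, size_t n, uint32_t c)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = src[i] + c;
}

/* Returns 1 if rev(rev(x) + c) matches x + c in every lane, 0 otherwise. */
static int reversals_are_redundant(const uint32_t *x, size_t n, uint32_t c)
{
    uint32_t rev1[MAX_LANES], sum[MAX_LANES];
    uint32_t with_perms[MAX_LANES], direct[MAX_LANES];

    assert(n <= MAX_LANES);

    reverse_lanes(x, rev1, n);          /* first permute  */
    add_splat(rev1, sum, n, c);         /* the add        */
    reverse_lanes(sum, with_perms, n);  /* second permute */

    add_splat(x, direct, n, c);         /* same add, no permutes */

    for (size_t i = 0; i < n; i++)
        if (with_perms[i] != direct[i])
            return 0;
    return 1;
}
```

Calling reversals_are_redundant on any input of up to MAX_LANES lanes should return 1, which is the identity that would let the permute / add / permute sequence collapse to a single add.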

For aarch64 (without SVE) we get code generation that is just as bad (maybe worse):
        tbl     v29.16b, {v30.16b - v31.16b}, v27.16b     // 36     [c=4 l=4]  aarch64_qtbl2v16qi
        add     v29.4s, v29.4s, v26.4s    // 38   [c=8 l=4]  addv4si3
        mov     v28.16b, v29.16b  // 116      [c=4 l=4]  *aarch64_simd_movv16qi/3
        tbl     v25.16b, {v28.16b - v29.16b}, v27.16b     // 43     [c=4 l=4]  aarch64_qtbl2v16qi


The TBLs are not needed.

I thought I saw this before too ...


Referenced Bugs:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53947
[Bug 53947] [meta-bug] vectorizer missed-optimizations
