https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115819
Andrew Pinski <pinskia at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Last reconfirmed|                            |2024-07-07
             Blocks|                            |53947
           Severity|normal                      |enhancement
     Ever confirmed|0                           |1
             Target|Riscv                       |Riscv aarch64
             Status|UNCONFIRMED                 |NEW
          Component|target                      |tree-optimization

--- Comment #2 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
  vect__4.9_41 = VEC_PERM_EXPR <vect__4.8_40, vect__4.8_40, { POLY_INT_CST [3, 4], POLY_INT_CST [2, 4], POLY_INT_CST [1, 4], ... }>;
  vect__5.10_43 = vect__4.9_41 + { 1, ... };
  vect__5.13_50 = VEC_PERM_EXPR <vect__5.10_43, vect__5.10_43, { POLY_INT_CST [3, 4], POLY_INT_CST [2, 4], POLY_INT_CST [1, 4], ... }>;

Well, the vrgather here should not be needed in the first place.

We have:
        vrgather.vv     v1,v3,v5
        vadd.vi v1,v1,1
        vrgather.vv     v3,v1,v5

For aarch64 we get:
        rev     z31.s, z31.s    // 32   [c=4 l=4]  aarch64_sve_revvnx4si
        add     z31.s, z31.s, #1        // 33   [c=12 l=4]  addvnx4si3/0
        rev     z31.s, z31.s    // 34   [c=4 l=4]  aarch64_sve_revvnx4si

But the 2 revs are not needed, just like the 2 vrgather.vv are not needed.

For aarch64 (without SVE) we get just as bad (maybe worse) code generation:
        tbl     v29.16b, {v30.16b - v31.16b}, v27.16b   // 36   [c=4 l=4]  aarch64_qtbl2v16qi
        add     v29.4s, v29.4s, v26.4s  // 38   [c=8 l=4]  addv4si3
        mov     v28.16b, v29.16b        // 116  [c=4 l=4]  *aarch64_simd_movv16qi/3
        tbl     v25.16b, {v28.16b - v29.16b}, v27.16b   // 43   [c=4 l=4]  aarch64_qtbl2v16qi

The TBLs are not needed either.

I thought I saw this before too ...


Referenced Bugs:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53947
[Bug 53947] [meta-bug] vectorizer missed-optimizations
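
For illustration only (this is not the testcase from the bug report): the reason the two permutes in comment #2 are redundant is that adding a uniform constant is lane-wise, so it commutes with any lane permutation, and reversing twice is the identity. A minimal scalar sketch in C, assuming a hypothetical fixed lane limit MAX_LANES and hypothetical helper names, models the reverse / add 1 / reverse sequence and compares it with a plain add:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical upper bound on the number of lanes, for this sketch only. */
#define MAX_LANES 64

/* Models the lane reversal done by VEC_PERM_EXPR / rev / vrgather.vv. */
static void reverse_copy(int *dst, const int *src, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = src[n - 1 - i];
}

/* The sequence the vectorizer currently emits: reverse, add 1, reverse.
   Requires n <= MAX_LANES; inputs are assumed small enough not to overflow. */
static void add_one_with_reversals(int *out, const int *in, size_t n)
{
    int tmp[MAX_LANES];

    reverse_copy(tmp, in, n);
    for (size_t i = 0; i < n; i++)
        tmp[i] += 1;
    reverse_copy(out, tmp, n);
}

/* What would be sufficient: a plain lane-wise add with no permutes. */
static void add_one_direct(int *out, const int *in, size_t n)
{
    for (size_t i = 0; i < n; i++)
        out[i] = in[i] + 1;
}

/* Returns 1 if both sequences produce identical results for this input,
   0 otherwise (or if n exceeds MAX_LANES). */
static int permutes_are_redundant(const int *in, size_t n)
{
    int a[MAX_LANES], b[MAX_LANES];

    if (n > MAX_LANES)
        return 0;
    add_one_with_reversals(a, in, n);
    add_one_direct(b, in, n);
    return memcmp(a, b, n * sizeof(int)) == 0;
}
```

Calling permutes_are_redundant on any input within the lane limit should report that the two forms agree, which is the property a fold of the two VEC_PERM_EXPRs would rely on.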