https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112384
Bug ID: 112384
Summary: a non-constant vec dup should be improved
Product: gcc
Version: 14.0
Status: UNCONFIRMED
Keywords: missed-optimization
Severity: normal
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: pinskia at gcc dot gnu.org
Target Milestone: ---
Target: aarch64

Take:
```
#define vector __attribute__((vector_size(16)))

vector int f1(vector int t, int i)
{
  i&=3;
  vector int tt = {i, i, i, i};
  vector int r = __builtin_shuffle(t, tt);
  return r;
}

vector int f2(vector int t, int i)
{
  i&=3;
  i = t[i];
  vector int tt = {i, i, i, i};
  return tt;
}
```

Neither of these gets good code generation. f1 has:
```
        dup     v31.4s, w0
...
        shl     v31.4s, v31.4s, 2
        tbl     v31.16b, {v31.16b}, v28.16b
        add     v31.16b, v31.16b, v29.16b
```

But we could do better by combining the dup and the shl into a shift on the scalar followed by the dup. At the RTL level:

Trying 11 -> 12:
   11: r98:V4SI=vec_duplicate(r92:SI)
      REG_DEAD r92:SI
   12: r101:V4SI=r98:V4SI<<const_vector
      REG_DEAD r98:V4SI
Failed to match this instruction:
(set (reg:V4SI 101)
    (ashift:V4SI (vec_duplicate:V4SI (reg/v:SI 92 [ iD.4390 ]))
        (const_vector:V4SI [
                (const_int 2 [0x2]) repeated x4
            ])))

Changing that into:
(set (reg:V4SI 101)
    (vec_duplicate:V4SI (ashift:SI (reg/v:SI 92 [ iD.4390 ])
            (const_int 2 [0x2]))))

will improve things.

The first tbl seems like it could be removed too.
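
To illustrate why the proposed RTL change is safe, here is a minimal source-level sketch of the two forms, written with GCC vector extensions. The names `v4si`, `shift_of_dup`, `dup_of_shift` and `forms_agree` are hypothetical and chosen only for this example. It shows the equivalence the rewrite would rely on and is not the actual combine or simplify-rtx change.

```c
#include <string.h>

typedef int v4si __attribute__((vector_size(16)));

/* Form that combine currently fails to match:
   duplicate the scalar into every lane, then shift each lane by 2.  */
static v4si shift_of_dup(int i)
{
    v4si tt = {i, i, i, i};
    v4si two = {2, 2, 2, 2};   /* corresponds to the const_vector of 2s */
    return tt << two;
}

/* Proposed form: shift the scalar once, then duplicate the result.  */
static v4si dup_of_shift(int i)
{
    int s = i << 2;
    v4si tt = {s, s, s, s};
    return tt;
}

/* Returns 1 when both forms produce the same four lanes.
   The index is masked to 0..3, as in f1 from the report.  */
static int forms_agree(int i)
{
    i &= 3;
    v4si a = shift_of_dup(i);
    v4si b = dup_of_shift(i);
    return memcmp(&a, &b, sizeof a) == 0;
}
```

Since every lane holds the same value and the shift count is the same constant in every lane, shifting before or after the duplicate gives the same vector. The scalar form needs only one scalar shift ahead of the dup.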