https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112384

            Bug ID: 112384
           Summary: a non-constant vec dup should be improved
           Product: gcc
           Version: 14.0
            Status: UNCONFIRMED
          Keywords: missed-optimization
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: pinskia at gcc dot gnu.org
  Target Milestone: ---
            Target: aarch64

Take:
```
#define vector __attribute__((vector_size(16)))

vector int f1(vector int t, int i)
{
  i&=3;
  vector int tt = {i, i, i, i};
  vector int r = __builtin_shuffle(t, tt);
  return r;
}

vector int f2(vector int t, int i)
{
  i&=3;
  i = t[i];
  vector int tt = {i, i, i, i};
  return tt;
}
```

Both of these produce suboptimal code.

f1 has:
```
        dup     v31.4s, w0
...
        shl     v31.4s, v31.4s, 2
        tbl     v31.16b, {v31.16b}, v28.16b
        add     v31.16b, v31.16b, v29.16b
```
But we could do better by combining the dup and the shl into a scalar shift followed by a dup.

At the RTL level, combine reports:
Trying 11 -> 12:
   11: r98:V4SI=vec_duplicate(r92:SI)
      REG_DEAD r92:SI
   12: r101:V4SI=r98:V4SI<<const_vector
      REG_DEAD r98:V4SI
Failed to match this instruction:
(set (reg:V4SI 101)
    (ashift:V4SI (vec_duplicate:V4SI (reg/v:SI 92 [ iD.4390 ]))
        (const_vector:V4SI [
                (const_int 2 [0x2]) repeated x4
            ])))

Changing that into:
(set (reg:V4SI 101)
 (vec_duplicate:V4SI (ashift:SI (reg/v:SI 92 [ iD.4390 ]) (const_int 2 [0x2])))

would improve things, since the shift is then done once on the scalar before the dup.
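As a rough source-level illustration of the two RTL shapes above, the sketch below compares broadcasting and then shifting every lane against shifting the scalar once and then broadcasting. It is not taken from the report: the type name `v4si` and the helpers `dup_then_shift`, `shift_then_dup`, `same_lanes` and `check_report_range` are hypothetical, and the check only covers the index range the test case uses (`i &= 3`).

```c
#include <assert.h>
#include <string.h>

/* Same 16-byte int vector as in the test case; the typedef name is ours. */
typedef int v4si __attribute__((vector_size(16)));

/* Shape combine sees today: broadcast, then shift all four lanes. */
static v4si dup_then_shift(int i)
{
  v4si v = {i, i, i, i};
  v4si two = {2, 2, 2, 2};
  return v << two;
}

/* Suggested shape: shift the scalar once, then broadcast. */
static v4si shift_then_dup(int i)
{
  int s = i << 2;
  v4si v = {s, s, s, s};
  return v;
}

/* Returns 1 when both forms agree in every lane. */
static int same_lanes(int i)
{
  v4si a = dup_then_shift(i);
  v4si b = shift_then_dup(i);
  return memcmp(&a, &b, sizeof a) == 0;
}

/* Checks only the index range used in the report (i &= 3). */
static void check_report_range(void)
{
  for (int i = 0; i < 4; i++)
    assert(same_lanes(i));
}
```

Both helpers should produce identical lanes for these inputs, which is the equivalence the proposed `vec_duplicate` of `ashift:SI` form relies on.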

The first tbl also looks like it could be removed.
