https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82199

            Bug ID: 82199
           Summary: __builtin_shuffle sometimes should produce ins rather
                    than TBL
           Product: gcc
           Version: 8.0
            Status: UNCONFIRMED
          Keywords: missed-optimization
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: pinskia at gcc dot gnu.org
  Target Milestone: ---
            Target: aarch64

Take:
#define vector __attribute__((vector_size(16) ))

vector float f(vector float a, vector float b)
{
  return __builtin_shuffle (a, b, (vector int){0, 1, 4, 5});
}
---- CUT ---

Currently this produces TBL but really we should be able to produce (for
little-endian):
f:
  ins v0.d[1], v1.d[0]
  ret

--- CUT ---

X86_64 is able to produce:
f:
        movlhps %xmm1, %xmm0
        ret

Which is what I had expected.

There are most likely many more __builtin_shuffle masks that could be
implemented on aarch64 without TBL, which we are not currently doing.
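
To make the expected lane movement concrete, here is a minimal sketch of why
the mask {0, 1, 4, 5} amounts to a single 64-bit element insert. It assumes
GCC's vector extensions (__builtin_shuffle is GCC-specific); the typedefs and
the function names shuffle_lo_lo and insert_d_lane are illustrative, not from
the testcase above. The second function reinterprets the float vectors as two
64-bit elements and copies element 0 of b into element 1 of a, which is the
operation the ins form performs.

```c
#include <string.h>

typedef float     v4sf __attribute__((vector_size(16)));
typedef int       v4si __attribute__((vector_size(16)));
typedef long long v2di __attribute__((vector_size(16)));

/* The shuffle from the report: result = { a[0], a[1], b[0], b[1] }.  */
v4sf shuffle_lo_lo (v4sf a, v4sf b)
{
  return __builtin_shuffle (a, b, (v4si){0, 1, 4, 5});
}

/* The same data movement written as a 64-bit element insert
   (hypothetical helper, for comparison only).  memcpy is used to
   reinterpret the bits without changing them.  */
v4sf insert_d_lane (v4sf a, v4sf b)
{
  v2di da, db;
  v4sf r;

  memcpy (&da, &a, sizeof da);
  memcpy (&db, &b, sizeof db);
  da[1] = db[0];   /* 64-bit element 0 of b replaces 64-bit element 1 of a */
  memcpy (&r, &da, sizeof r);
  return r;
}
```

Calling both functions on the same inputs and comparing the results with
memcmp should show identical bytes. GCC numbers vector elements in memory
order, so the equivalence holds at the C level on either endianness; the
little-endian note above concerns only the register lane numbers in the
assembly.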
