https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82199
Bug ID: 82199
Summary: __builtin_shuffle sometimes should produce ins rather than TBL
Product: gcc
Version: 8.0
Status: UNCONFIRMED
Keywords: missed-optimization
Severity: normal
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: pinskia at gcc dot gnu.org
Target Milestone: ---
Target: aarch64

Take:

#define vector __attribute__((vector_size(16) ))
vector float f(vector float a, vector float b)
{
  return __builtin_shuffle (a, b, (vector int){0, 1, 4, 5});
}
---- CUT ----

Currently this produces TBL, but we should be able to produce (for little-endian):

f:
        ins     v0.d[1], v1.d[0]
        ret
--- CUT ---

x86_64 is able to produce:

f:
        movlhps %xmm1, %xmm0
        ret

which is what I had expected.

There are most likely many more __builtin_shuffle masks that could be implemented on aarch64 without TBL, and we are not currently doing so.
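To show why the mask {0, 1, 4, 5} needs no table lookup, here is a minimal sketch assuming GCC's generic vector extensions: the shuffle keeps the low 64 bits of `a` and places the low 64 bits of `b` above them, which is exactly one 64-bit element insert (the same data movement as `ins v0.d[1], v1.d[0]` or `movlhps`). The helper names `shuffle_0145`, `insert_low_half` and `check_equivalence` are hypothetical and only for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Same 16-byte vector types as the test case in the report. */
typedef float v4sf __attribute__((vector_size(16)));
typedef int v4si __attribute__((vector_size(16)));

/* The report's shuffle: lanes 0,1 of a, then lanes 0,1 of b
   (indices 4 and 5 select the first two lanes of the second operand). */
static v4sf shuffle_0145(v4sf a, v4sf b)
{
    return __builtin_shuffle(a, b, (v4si){0, 1, 4, 5});
}

/* Hypothetical equivalent: treat each vector as two 64-bit halves and
   overwrite the upper half of a with the lower half of b.
   GCC generic vectors store lane 0 at the lowest address, so bytes 0..7
   of b are lanes 0 and 1. */
static v4sf insert_low_half(v4sf a, v4sf b)
{
    uint64_t low_b;
    memcpy(&low_b, &b, sizeof low_b);                        /* lanes 0,1 of b */
    memcpy((char *)&a + sizeof low_b, &low_b, sizeof low_b); /* into lanes 2,3 of a */
    return a;
}

/* Hypothetical check that both formulations agree lane by lane. */
static void check_equivalence(v4sf a, v4sf b)
{
    v4sf s = shuffle_0145(a, b);
    v4sf i = insert_low_half(a, b);
    for (int k = 0; k < 4; k++)
        assert(s[k] == i[k]);
}
```

Compiling a file like this with `gcc -O2 -S` for an aarch64 target and reading the assembly is one way to see whether `shuffle_0145` is lowered to TBL or to INS; the result depends on the GCC version.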