https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68655
            Bug ID: 68655
           Summary: SSE2 cannot vec_perm of low and high part
           Product: gcc
           Version: 6.0
            Status: UNCONFIRMED
          Keywords: missed-optimization
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: rguenth at gcc dot gnu.org
                CC: uros at gcc dot gnu.org
  Target Milestone: ---
            Target: x86_64-*-*, i?86-*-*

typedef unsigned short v8hi __attribute__((vector_size(16)));

v8hi foo (v8hi a, v8hi b)
{
  return __builtin_shuffle (a, b, (v8hi) { 0, 1, 2, 3, 8, 9, 10, 11 });
}

should be able to use

        movlhps %xmm0, %xmm1
        ret

but ends up being lowered by vector lowering because the target says it
cannot

  can_vec_perm_p (V8HI, false, { 0, 1, 2, 3, 8, 9, 10, 11 })

There are also two-instruction permutes possible with movhl/lhps like

  { 0, 1, 2, 3, 12, 13, 14, 15 }

can use

        movhlps %xmm1, %xmm1
        movlhps %xmm0, %xmm1

ah, that uses shufpd.  Not sure why the above doesn't use shufpd if that
is available in SSE2.
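
For illustration only: both masks above move whole 64-bit halves of the inputs, which is why instructions such as movlhps or shufpd can cover them. A rough sketch of that check in plain C follows. The function names are hypothetical and are not taken from GCC's i386 backend; the shuffle model assumes the documented __builtin_shuffle behaviour of reducing mask elements modulo 2*N for the two-operand form.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper, not GCC code: returns true when a two-input
   8 x 16-bit permute mask only moves whole 64-bit halves.  On success,
   half[0] and half[1] name the source of the low and high half of the
   result: 0 = a.low, 1 = a.high, 2 = b.low, 3 = b.high.  */
bool mask_moves_64bit_halves(const unsigned char mask[8], unsigned half[2])
{
    for (size_t h = 0; h < 2; h++) {
        /* Mask elements are taken modulo 2*N = 16.  */
        unsigned first = mask[4 * h] % 16;
        if (first % 4 != 0)
            return false;           /* half does not start on a 64-bit boundary */
        for (size_t i = 1; i < 4; i++)
            if (mask[4 * h + i] % 16 != first + i)
                return false;       /* elements are not consecutive */
        half[h] = first / 4;
    }
    return true;
}

/* Scalar model of __builtin_shuffle (a, b, mask) for 8 x 16-bit elements.  */
void shuffle_v8hi(const unsigned short a[8], const unsigned short b[8],
                  const unsigned char mask[8], unsigned short out[8])
{
    for (size_t i = 0; i < 8; i++) {
        unsigned idx = mask[i] % 16;
        out[i] = idx < 8 ? a[idx] : b[idx - 8];
    }
}
```

With this sketch, { 0, 1, 2, 3, 8, 9, 10, 11 } yields the halves (a.low, b.low) and { 0, 1, 2, 3, 12, 13, 14, 15 } yields (a.low, b.high), while a mask that reorders elements inside a half is rejected.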