http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59578
Bug ID: 59578
Summary: Overuse of v prefix for SSE instructions
Product: gcc
Version: 4.8.2
Status: UNCONFIRMED
Severity: minor
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: msharov at users dot sourceforge.net

typedef float v16sf __attribute__((vector_size(16)));
v16sf f (v16sf x) { return (__builtin_ia32_shufps (x, x, 0xff)); }

Compiled on a Haswell 4770 with -march=native -O emits:

vshufps $255, %xmm0, %xmm0, %xmm0

Even though all registers are the same and

shufps $255, %xmm0, %xmm0

would have worked just as well without the extra byte for the v prefix.

This happens with other __builtin instructions as well. For example:

typedef long long v16so __attribute__((vector_size(16)));
v16so k (v16so x) { return (__builtin_ia32_aeskeygenassist128 (x, 1)); }

Emits vaeskeygenassist even though no memory accesses are present.
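
To make the "extra byte" concrete, here is a minimal sketch that writes out the machine-code bytes of the two shufps forms and compares their lengths. The byte values are not part of the original report. They follow the standard x86 encodings of these two instructions (legacy SSE with the 0F escape, and the two-byte C5 VEX prefix), and the array and function names are hypothetical, chosen only for illustration.

```c
#include <stddef.h>

/* Legacy SSE encoding of:  shufps $255, %xmm0, %xmm0
   0F C6 = SHUFPS opcode, C0 = ModRM (reg xmm0, r/m xmm0), FF = imm8 */
static const unsigned char legacy_shufps[] = { 0x0F, 0xC6, 0xC0, 0xFF };

/* VEX encoding of:  vshufps $255, %xmm0, %xmm0, %xmm0
   C5 F8 = two-byte VEX prefix (it also stands in for the 0F escape),
   followed by the same opcode, ModRM and imm8 bytes: C6 C0 FF */
static const unsigned char vex_shufps[] = { 0xC5, 0xF8, 0xC6, 0xC0, 0xFF };

/* Length in bytes of the legacy SSE form. */
size_t legacy_shufps_len(void) { return sizeof legacy_shufps; }

/* Length in bytes of the VEX form. */
size_t vex_shufps_len(void) { return sizeof vex_shufps; }

/* Size cost of the v-prefixed form relative to the legacy form. */
size_t vex_extra_bytes(void) { return sizeof vex_shufps - sizeof legacy_shufps; }
```

Under these encodings the VEX form is one byte longer for this register-only case, which is the overhead the report refers to.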