https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91546
--- Comment #1 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
Instruction count is not everything. If instructions can be executed/issued together (I don't know x86 processors that well), then what GCC produces is better. E.g.:

clock 0
  vmovd xmm2, edx
  vmovd xmm3, edi
clock 1
  vpinsrd xmm1, xmm2, ecx, 1
  vpinsrd xmm0, xmm3, esi, 1
clock 2
  vpunpcklqdq xmm0, xmm0, xmm1
clock 3

While clang/LLVM:

clock 0
  vmovd xmm0, edi
clock 1
  vpinsrd xmm0, xmm0, esi, 1
clock 2
  vpinsrd xmm0, xmm0, edx, 2
clock 3
  vpinsrd xmm0, xmm0, ecx, 3
clock 4

GCC uses one more instruction, but its dependency chain is one step shorter (done at clock 3 instead of clock 4), because the two halves are built independently and merged at the end, while every clang/LLVM vpinsrd has to wait for the previous one.
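The clock counts above come from the length of the dependency chain, not the number of instructions. The sketch below computes that chain length for both sequences. It is a deliberately simplified model, assuming every instruction has 1-cycle latency and that any number of independent instructions can issue in the same cycle. Real x86 cores have different latencies (vpinsrd is typically more expensive than vmovd) and limited ports. The function name `critical_path` and the (dest, sources) encoding are hypothetical, for illustration only.

```python
# Hypothetical model: each instruction is (dest, [sources]).
# Assumptions: every instruction takes 1 cycle, issue width is unlimited,
# and general-purpose inputs (edi, esi, edx, ecx) are ready at cycle 0.
def critical_path(instrs):
    ready = {}   # register -> cycle at which its latest value is available
    finish = 0
    for dest, srcs in instrs:
        # An instruction can start once all of its sources are ready.
        start = max((ready.get(s, 0) for s in srcs), default=0)
        ready[dest] = start + 1
        finish = max(finish, ready[dest])
    return finish

# GCC: build two halves independently, then merge them.
gcc_seq = [
    ("xmm2", ["edx"]),           # vmovd xmm2, edx
    ("xmm3", ["edi"]),           # vmovd xmm3, edi
    ("xmm1", ["xmm2", "ecx"]),   # vpinsrd xmm1, xmm2, ecx, 1
    ("xmm0", ["xmm3", "esi"]),   # vpinsrd xmm0, xmm3, esi, 1
    ("xmm0", ["xmm0", "xmm1"]),  # vpunpcklqdq xmm0, xmm0, xmm1
]

# clang/LLVM: one serial chain of inserts into xmm0.
clang_seq = [
    ("xmm0", ["edi"]),           # vmovd xmm0, edi
    ("xmm0", ["xmm0", "esi"]),   # vpinsrd xmm0, xmm0, esi, 1
    ("xmm0", ["xmm0", "edx"]),   # vpinsrd xmm0, xmm0, edx, 2
    ("xmm0", ["xmm0", "ecx"]),   # vpinsrd xmm0, xmm0, ecx, 3
]

print(len(gcc_seq), critical_path(gcc_seq))      # 5 instructions, 3 cycles
print(len(clang_seq), critical_path(clang_seq))  # 4 instructions, 4 cycles
```

Under these assumptions the model reproduces the clock numbers in the comment: more instructions, but a shorter critical path for the GCC sequence. Whether that wins on a given CPU depends on its actual latencies and available execution ports.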