https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91546

--- Comment #1 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
Instruction count is not everything.

If instructions can be executed/issued together (I don't know x86 processors
that well), then what GCC produces is better.

E.g.

clock 0
        vmovd   xmm2, edx
        vmovd   xmm3, edi
clock 1
        vpinsrd xmm1, xmm2, ecx, 1
        vpinsrd xmm0, xmm3, esi, 1
clock 2
        vpunpcklqdq     xmm0, xmm0, xmm1
clock 3

While clang/LLVM:

clock 0
        vmovd   xmm0, edi
clock 1
        vpinsrd xmm0, xmm0, esi, 1
clock 2
        vpinsrd xmm0, xmm0, edx, 2
clock 3
        vpinsrd xmm0, xmm0, ecx, 3
clock 4
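
The clock counts above can be reproduced with a small dependency-chain model.
This is only a sketch: it assumes every instruction has a latency of one clock
and that any number of independent instructions can issue in the same clock.
Real cores have per-instruction latencies and a limited number of issue ports,
which this ignores. The function name finish_clock and the (dest, sources)
encoding are made up for illustration.

```python
# Toy model: unit latency, unlimited issue width (both are assumptions,
# not real x86 timings).

def finish_clock(instrs):
    """Return the clock at which the last result becomes available."""
    ready = {}  # register name -> clock at which its value is available
    last = 0
    for dest, sources in instrs:
        # An instruction issues once all of its sources are ready; inputs
        # never written in this sequence (edi, esi, ...) are ready at clock 0.
        issue = max((ready.get(s, 0) for s in sources), default=0)
        ready[dest] = issue + 1
        last = max(last, ready[dest])
    return last

# GCC: two independent vmovd/vpinsrd pairs, then one vpunpcklqdq.
gcc = [
    ("xmm2", ["edx"]),
    ("xmm3", ["edi"]),
    ("xmm1", ["xmm2", "ecx"]),
    ("xmm0", ["xmm3", "esi"]),
    ("xmm0", ["xmm0", "xmm1"]),
]

# clang/LLVM: one serial chain through xmm0.
clang = [
    ("xmm0", ["edi"]),
    ("xmm0", ["xmm0", "esi"]),
    ("xmm0", ["xmm0", "edx"]),
    ("xmm0", ["xmm0", "ecx"]),
]

print(finish_clock(gcc), finish_clock(clang))  # prints "3 4"
```

Under these assumptions the five-instruction GCC sequence finishes one clock
earlier than the four-instruction clang/LLVM sequence, because its critical
path is three instructions deep instead of four.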
