https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87976

            Bug ID: 87976
           Summary: [i386] Sub-optimal code generation for
                    _mm256_set1_epi64x()
           Product: gcc
           Version: 9.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: thiago at kde dot org
  Target Milestone: ---

For the following code, Clang and ICC emit an optimal function that consists
of three instructions (including the tail call). MSVC emits a reasonable
equivalent with a bit more function overhead, but no memory access.

GCC emits a completely unnecessary memory access: it realigns the stack, stores
the argument to a stack slot, and then broadcasts from that slot.

Code:
====
#include <immintrin.h>
#include <stdint.h>

#ifndef _MSC_VER
#define __vectorcall
#endif
void __vectorcall f(__m256i value256);

void g(uint64_t value)
{
    f( _mm256_set1_epi64x(value));
}
====

Clang and ICC (optimal) output:
g:
        vmovd     %rdi, %xmm0
        vpbroadcastq %xmm0, %ymm0
        jmp       f
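
For reference, the two data-movement instructions above move the 64-bit
argument into a vector register and then replicate it into all four 64-bit
lanes, which needs no trip through memory. A minimal scalar sketch of that
semantics in portable C follows; the type u64x4 and the function
broadcast_u64x4 are hypothetical names used only for illustration and are not
part of any intrinsics header:

```c
#include <stdint.h>

/* Hypothetical stand-in for a 256-bit vector of four 64-bit lanes. */
typedef struct {
    uint64_t lane[4];
} u64x4;

/* Scalar model of a 64-bit broadcast: every lane receives the same value.
 * This only illustrates the semantics; it is not how any compiler
 * implements _mm256_set1_epi64x. */
static u64x4 broadcast_u64x4(uint64_t value)
{
    u64x4 v;
    for (int i = 0; i < 4; i++)
        v.lane[i] = value;
    return v;
}
```

Calling broadcast_u64x4(x) yields a value whose four lanes all equal x, which
is the result the vector register is expected to hold when f is entered.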

GCC:
g:
        pushq   %r13
        leaq    16(%rsp), %r13
        andq    $-32, %rsp
        pushq   -8(%r13)
        pushq   %rbp
        movq    %rsp, %rbp
        pushq   %r13
        movq    %rdi, -24(%rbp)
        vpbroadcastq    -24(%rbp), %ymm0
        popq    %r13
        popq    %rbp
        leaq    -16(%r13), %rsp
        popq    %r13
        jmp     f

Godbolt link for all compilers: https://gcc.godbolt.org/z/-gNvec
