https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87976
Bug ID: 87976
Summary: [i386] Sub-optimal code generation for _mm256_set1_epi64()
Product: gcc
Version: 9.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: thiago at kde dot org
Target Milestone: ---

In the following code, Clang and ICC emit an optimal function consisting of
three instructions (including the tail call). MSVC emits a reasonably good
equivalent with a bit more function overhead, but no memory access. GCC emits
a completely unnecessary memory access: it spills the argument to the stack
and broadcasts from there.

Code:
====
#include <immintrin.h>
#include <stdint.h>

#ifndef _MSC_VER
#define __vectorcall
#endif

void __vectorcall f(__m256i value256);

void g(uint64_t value)
{
    f(_mm256_set1_epi64x(value));
}
====

Clang and ICC (optimal) output:

g:
        vmovd           %rdi, %xmm0
        vpbroadcastq    %xmm0, %ymm0
        jmp             f

GCC:

g:
        pushq           %r13
        leaq            16(%rsp), %r13
        andq            $-32, %rsp
        pushq           -8(%r13)
        pushq           %rbp
        movq            %rsp, %rbp
        pushq           %r13
        movq            %rdi, -24(%rbp)
        vpbroadcastq    -24(%rbp), %ymm0
        popq            %r13
        popq            %rbp
        leaq            -16(%r13), %rsp
        popq            %r13
        jmp             f

Godbolt link for all compilers: https://gcc.godbolt.org/z/-gNvec
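
For comparison, the register-only sequence that Clang and ICC emit can also be spelled out with intrinsics: move the scalar into the low lane of an XMM register, then broadcast from that register. The sketch below is an illustration only and is not part of the report. The helper names (broadcast_explicit, broadcast_set1, broadcast_forms_agree) are hypothetical. It assumes GCC or Clang on x86-64 and uses the target("avx2") function attribute so that it builds without -mavx2. It checks only that both forms produce the same four lanes; whether the explicit form avoids the stack spill on an affected GCC version has not been verified here.

```c
#include <assert.h>
#include <immintrin.h>
#include <stdint.h>

/* Hypothetical helper: the explicit two-step form.
   _mm_cvtsi64_si128 moves the scalar into the low 64 bits of an XMM
   register; _mm256_broadcastq_epi64 (AVX2) copies that lane to all four
   64-bit lanes of a YMM register. */
__attribute__((target("avx2")))
void broadcast_explicit(uint64_t value, uint64_t out[4])
{
    __m128i low = _mm_cvtsi64_si128((long long)value);
    __m256i all = _mm256_broadcastq_epi64(low);
    _mm256_storeu_si256((__m256i *)out, all);
}

/* Hypothetical helper: the form used in the report. */
__attribute__((target("avx2")))
void broadcast_set1(uint64_t value, uint64_t out[4])
{
    __m256i all = _mm256_set1_epi64x((long long)value);
    _mm256_storeu_si256((__m256i *)out, all);
}

/* Returns 1 if both forms fill every lane with `value`, 0 if they do
   not, and -1 if the CPU running this lacks AVX2 (nothing is executed
   in that case). */
int broadcast_forms_agree(uint64_t value)
{
    uint64_t a[4], b[4];

    __builtin_cpu_init();
    if (!__builtin_cpu_supports("avx2"))
        return -1;

    broadcast_explicit(value, a);
    broadcast_set1(value, b);

    for (int i = 0; i < 4; i++) {
        if (a[i] != value || b[i] != value)
            return 0;
    }
    return 1;
}
```

Compiling both helpers at -O2 and comparing the generated assembly for each is one way to see whether the spill is tied to _mm256_set1_epi64x() specifically or to how the scalar argument reaches the broadcast.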