https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70059

            Bug ID: 70059
           Summary: Invalid codegen on AVX-512 when using
                    _mm512_inserti64x4(x, y, 0)
           Product: gcc
           Version: 5.3.1
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: povilas at radix dot lt
  Target Milestone: ---

Use of _mm512_inserti64x4 in the scenario outlined below produces incorrect
code on GCC 5.3.1.

$ g++-5 --version
g++-5 (Ubuntu 5.3.1-9ubuntu3) 5.3.1 20160222

$ cat main.cc
#include "immintrin.h"

__m512i run(__m256i a, __m256i b)
{
    __m512i r = _mm512_undefined_si512();
    r = _mm512_inserti64x4(r, a, 0);
    r = _mm512_inserti64x4(r, b, 1);
    return r;
}

$ g++-5 -O1 -c main.cc --save-temps -o main.o -mavx512f

The following assembly is generated:

_Z3runDv4_xS_:
        vinserti64x4    $0x1, %ymm0, %zmm1, %zmm0
        ret

As you can see, the first argument is inserted into the upper half of the zmm0
register and the second into the lower half. The intention is the other way
round: a belongs in the lower half and b in the upper half.

The problem can be worked around:

$ cat main.cc
#include "immintrin.h"

__m512i run(__m256i a, __m256i b)
{
    __m512i r;
    r = _mm512_castsi256_si512(a);
    r = _mm512_inserti64x4(r, b, 1);
    return r;
}

$ g++-5 -O1 -c main.cc --save-temps -o main.o -mavx512f

The resulting assembly is correct:

_Z3runDv4_xS_:
        vinserti64x4    $0x1, %ymm1, %zmm0, %zmm0
        ret

Regards,
Povilas
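
For reference, the expected lane placement can be sketched with a scalar model that runs without AVX-512 hardware. This is only an illustration under my reading of the intrinsic's documented behaviour (bit 0 of the immediate selects which 256-bit half is overwritten). The names Vec256, Vec512, model_inserti64x4, model_run and self_check are hypothetical stand-ins and are not part of immintrin.h.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Scalar stand-ins for __m256i / __m512i (hypothetical names, illustration only).
// Element 0 is the lowest 64-bit lane.
using Vec256 = std::array<std::uint64_t, 4>;
using Vec512 = std::array<std::uint64_t, 8>;

// Assumed model of _mm512_inserti64x4(dst, src, imm): copy dst, then overwrite
// the 256-bit half selected by bit 0 of imm (0 = lower half, 1 = upper half).
Vec512 model_inserti64x4(Vec512 dst, const Vec256& src, int imm)
{
    const std::size_t base = (imm & 1) ? 4 : 0;
    for (std::size_t i = 0; i < 4; ++i)
        dst[base + i] = src[i];
    return dst;
}

// Scalar equivalent of run() from the first test case in the report.
Vec512 model_run(const Vec256& a, const Vec256& b)
{
    Vec512 r{};  // stands in for _mm512_undefined_si512(); both halves get overwritten
    r = model_inserti64x4(r, a, 0);
    r = model_inserti64x4(r, b, 1);
    return r;
}

// Checks the intended layout: a in lanes 0..3, b in lanes 4..7.
void self_check()
{
    const Vec256 a{1, 2, 3, 4};
    const Vec256 b{5, 6, 7, 8};
    const Vec512 r = model_run(a, b);
    for (std::size_t i = 0; i < 4; ++i) {
        assert(r[i] == a[i]);
        assert(r[4 + i] == b[i]);
    }
}
```

Under this model, the first listing's vinserti64x4 $0x1, %ymm0, %zmm1, %zmm0 corresponds to model_inserti64x4(zmm1, a, 1), which places a in lanes 4..7 and takes lanes 0..3 from zmm1, i.e. the swapped result described above.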