https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70059

            Bug ID: 70059
           Summary: Invalid codegen on AVX-512 when using
                    _mm512_inserti64x4(x, y, 0)
           Product: gcc
           Version: 5.3.1
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: povilas at radix dot lt
  Target Milestone: ---

Using _mm512_inserti64x4 as in the scenario outlined below produces incorrect
code on GCC 5.3.1.

$ g++-5 --version
g++-5 (Ubuntu 5.3.1-9ubuntu3) 5.3.1 20160222

$ cat main.cc
#include "immintrin.h"

__m512i run(__m256i a, __m256i b)
{
    __m512i r = _mm512_undefined_si512();
    r = _mm512_inserti64x4(r, a, 0);
    r = _mm512_inserti64x4(r, b, 1);
    return r;
}

$ g++-5 -O1 -c main.cc --save-temps -o main.o -mavx512f

The following assembly is generated:
_Z3runDv4_xS_:
        vinserti64x4    $0x1, %ymm0, %zmm1, %zmm0
        ret

As you can see, only one insert is emitted: the first argument (ymm0) is
inserted into the upper half of the zmm0 register and the second argument
(zmm1) supplies the lower half. The intended result is the other way round.
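
To make the expected and observed lane layouts concrete, here is a minimal
sketch that models the insert in plain C++ without AVX-512 intrinsics, so it
runs on any machine. The names Vec256, Vec512, insert_i64x4, run_intended and
run_as_miscompiled are hypothetical and exist only for this illustration. The
model assumes that a arrives in ymm0 and b in ymm1, as the generated assembly
suggests, and treats the unspecified upper half of zmm1 as zero.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-ins for __m256i / __m512i, viewed as 64-bit lanes.
using Vec256 = std::array<std::int64_t, 4>;
using Vec512 = std::array<std::int64_t, 8>;

// Models _mm512_inserti64x4(dst, src, imm): copy dst, then overwrite the
// lower (imm == 0) or upper (imm == 1) four lanes with src.
Vec512 insert_i64x4(Vec512 dst, const Vec256& src, int imm)
{
    const std::size_t base = (imm & 1) ? 4 : 0;
    for (std::size_t i = 0; i < 4; ++i)
        dst[base + i] = src[i];
    return dst;
}

// What the source of run() asks for: a in the lower half, b in the upper.
Vec512 run_intended(const Vec256& a, const Vec256& b)
{
    Vec512 r{};  // stands in for _mm512_undefined_si512()
    r = insert_i64x4(r, a, 0);
    r = insert_i64x4(r, b, 1);
    return r;
}

// What the single emitted instruction computes under the assumptions above:
// zmm0 = zmm1 (low half holds b) with its upper half replaced by ymm0 (a).
Vec512 run_as_miscompiled(const Vec256& a, const Vec256& b)
{
    Vec512 zmm1{};                    // upper half unspecified, modelled as 0
    zmm1 = insert_i64x4(zmm1, b, 0);  // b occupies the low half of zmm1
    return insert_i64x4(zmm1, a, 1);  // a lands in the upper half
}
```

With a = {1,2,3,4} and b = {5,6,7,8}, run_intended places a in lanes 0-3 and
b in lanes 4-7, while run_as_miscompiled yields the two halves swapped.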

The problem can be worked around by replacing the undefined value and the
first insert with _mm512_castsi256_si512:

$ cat main.cc
#include "immintrin.h"

__m512i run(__m256i a, __m256i b)
{
    __m512i r;
    r = _mm512_castsi256_si512(a);
    r = _mm512_inserti64x4(r, b, 1);
    return r;
}

$ g++-5 -O1 -c main.cc --save-temps -o main.o -mavx512f

The resulting assembly is correct:
_Z3runDv4_xS_:
        vinserti64x4    $0x1, %ymm1, %zmm0, %zmm0
        ret


Regards,
Povilas
