https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110202

            Bug ID: 110202
           Summary: _mm512_ternarylogic_epi64 generates unnecessary
                    operations
           Product: gcc
           Version: 13.1.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: c++
          Assignee: unassigned at gcc dot gnu.org
          Reporter: fabio at cannizzo dot net
  Target Milestone: ---

Consider the following two alternative implementations of a bitwise complement
of an avx512 register.

#include <immintrin.h>

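__m512i negate1(const __m512i *a)
{
    __m512i res;
    /* res is intentionally left uninitialized: with imm8 0x55 the
       result is ~(*a), independent of the first two operands. */
    res = _mm512_ternarylogic_epi64(res, res, *a, 0x55);
    return res;
}

__m512i negate2(const __m512i *a)
{
    __m512i res;
    /* XOR with all-ones is a bitwise NOT. */
    res = _mm512_xor_si512(*a, _mm512_set1_epi32(-1));
    return res;
}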

which, compiled with "-O3 -mavx512f", generate the following asm listings (see
godbolt: https://godbolt.org/z/jvrxEjW65)

negate1(long long __vector(8) const*):
        vpxor   xmm0, xmm0, xmm0
        vpternlogq      zmm0, zmm0, ZMMWORD PTR [rdi], 85
        ret
negate2(long long __vector(8) const*):
        vpternlogd      zmm0, zmm0, ZMMWORD PTR [rdi], 0x55
        ret

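negate1 contains an unnecessary vpxor, which only zeroes the register that holds
the uninitialized res. Probably this is because GCC does not recognize that, when
vpternlogq is used with immediate 0x55, the result depends only on the third
(zmm/memory) operand, so the first two operands never need to be initialized.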
