https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91824

            Bug ID: 91824
           Summary: unnecessary sign-extension after _mm_movemask_epi8 or
                    __builtin_popcount
           Product: gcc
           Version: 9.2.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: middle-end
          Assignee: unassigned at gcc dot gnu.org
          Reporter: vanyacpp at gmail dot com
  Target Milestone: ---

With gcc -O2 -mpopcnt, the code generated for both functions below contains an
unnecessary cdqe:

#include <cstdint>
#include <emmintrin.h>

void f(uint64_t& val, __m128i mask)
{
    val += __builtin_popcount(_mm_movemask_epi8(mask));
}

void g(uint64_t& val, __m128i mask)
{
    val += __builtin_popcountll(_mm_movemask_epi8(mask));
}

f:
  pmovmskb eax, xmm0
  popcnt eax, eax
  cdqe
  add QWORD PTR [rdi], rax
  ret
g:
  pmovmskb eax, xmm0
  cdqe
  popcnt rax, rax
  add QWORD PTR [rdi], rax
  ret

Both cdqe instructions are unnecessary, because neither the result of pmovmskb
nor the result of __builtin_popcount can be negative.

Only the lower 16 bits of the pmovmskb result can be non-zero, and the range of
popcnt is either [0..32] or [0..64], depending on the operand width.
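
The range argument can be checked exhaustively without any SSE code. The
following is only a sketch: the helper names extension_is_noop and
cdqe_redundant_for_all_masks are made up for illustration, and a plain int in
[0..0xFFFF] stands in for the pmovmskb result. It compares the sign-extension
that cdqe performs against the zero-extension that a 32-bit register write on
x86-64 already provides:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: true if sign-extending and zero-extending the 32-bit
// value `v` to 64 bits give the same result, i.e. a cdqe would change nothing.
bool extension_is_noop(int v)
{
    const auto sign_extended = static_cast<uint64_t>(static_cast<int64_t>(v));
    const auto zero_extended = static_cast<uint64_t>(static_cast<uint32_t>(v));
    return sign_extended == zero_extended;
}

// Hypothetical helper: walks every value a 16-lane byte mask can take
// (assumed here to be 0..0xFFFF) and checks both the mask and its popcount.
bool cdqe_redundant_for_all_masks()
{
    for (int m = 0; m <= 0xFFFF; ++m) {
        const int count = __builtin_popcount(static_cast<unsigned>(m));
        if (count < 0 || count > 16)
            return false;  // popcount of a 16-bit mask must stay in [0..16]
        if (!extension_is_noop(m) || !extension_is_noop(count))
            return false;  // sign-extension would have changed the value
    }
    return true;
}
```

A caller could use assert(cdqe_redundant_for_all_masks()); to confirm that the
two extensions agree over the whole range, while extension_is_noop(-1) returns
false, which is the only kind of input for which cdqe would matter.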
