https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120647

            Bug ID: 120647
           Summary: [X86] Sub optimal code generated for counting the
                    number matches between two array elements
           Product: gcc
           Version: 16.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: rtl-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: vekumar at gcc dot gnu.org
  Target Milestone: ---

For the following test case:
unsigned vector_comparison(const char * rhs, const char * lhs)
{
        unsigned n = 0;
        for (unsigned i = 0; i < 48; ++i)
            if (lhs[i] == rhs[i])
                ++n;
        return n;
}

GCC generates suboptimal code.
Reference: https://godbolt.org/z/s1sxxq77s

When "popcnt" and AVX-512BW/VL mask compares are available (e.g. with
-O3 -march=znver5), GCC should generate simpler code along these lines:


vmovdqu (%rdi), %ymm0          # Load first 32 bytes of rhs
vpcmpeqb (%rsi), %ymm0, %k0    # Compare 32 bytes against lhs, one mask bit per byte
vmovdqu 32(%rdi), %xmm0        # Load next 16 bytes of rhs
vpcmpeqb 32(%rsi), %xmm0, %k1  # Compare next 16 bytes against lhs
kmovd %k0, %eax                # Move 32-bit mask to a GPR
popcntl %eax, %ecx             # Count matches (first 32 bytes)
kmovw %k1, %eax                # Move 16-bit mask to a GPR (zero-extended)
popcntl %eax, %eax             # Count matches (last 16 bytes)
addl %ecx, %eax                # Sum results (all 48 bytes)
vzeroupper                     # Clear upper YMM state before returning
ret
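The expected sequence above can be written with intrinsics, which may help when comparing GCC's output against the ideal. This is a sketch only, assuming GCC or Clang on x86 (for the "target" attribute and __builtin_cpu_supports); the names count_matches, count_matches_avx512 and count_matches_mask are hypothetical and not part of GCC. The AVX-512 path mirrors the listing (two byte compares into mask registers, two popcounts, one add); the portable fallback keeps the same "build a match mask, then popcount it" shape.

```c
#include <stdint.h>

#if defined(__x86_64__) || defined(__i386__)
#include <immintrin.h>

/* Hypothetical helper mirroring the asm listing. The target attribute lets
   GCC/Clang emit AVX-512BW/VL and POPCNT here without -march=znver5, so it
   must only be called after a run-time CPU check. */
__attribute__((target("avx512bw,avx512vl,popcnt")))
static unsigned count_matches_avx512(const char *rhs, const char *lhs)
{
    __m256i r0 = _mm256_loadu_si256((const __m256i *)rhs);      /* bytes 0..31  */
    __m256i l0 = _mm256_loadu_si256((const __m256i *)lhs);
    __m128i r1 = _mm_loadu_si128((const __m128i *)(rhs + 32));  /* bytes 32..47 */
    __m128i l1 = _mm_loadu_si128((const __m128i *)(lhs + 32));
    __mmask32 k0 = _mm256_cmpeq_epi8_mask(r0, l0);  /* vpcmpeqb -> %k0 */
    __mmask16 k1 = _mm_cmpeq_epi8_mask(r1, l1);     /* vpcmpeqb -> %k1 */
    /* Ideally kmovd/kmovw + popcntl + addl, as in the listing. */
    return (unsigned)__builtin_popcount(k0) + (unsigned)__builtin_popcount(k1);
}
#endif

/* Portable fallback with the same shape: build a 48-bit match mask
   (bit i set iff lhs[i] == rhs[i]), then count its set bits once. */
static unsigned count_matches_mask(const char *rhs, const char *lhs)
{
    uint64_t mask = 0;
    for (unsigned i = 0; i < 48; ++i)
        mask |= (uint64_t)(lhs[i] == rhs[i]) << i;
    return (unsigned)__builtin_popcountll(mask);
}

/* Same contract as vector_comparison(): number of equal bytes in [0, 48). */
unsigned count_matches(const char *rhs, const char *lhs)
{
#if defined(__x86_64__) || defined(__i386__)
    if (__builtin_cpu_supports("avx512bw") && __builtin_cpu_supports("avx512vl")
        && __builtin_cpu_supports("popcnt"))
        return count_matches_avx512(rhs, lhs);
#endif
    return count_matches_mask(rhs, lhs);
}
```

The run-time dispatch is only there so the sketch also runs on CPUs without AVX-512; when built with -march=znver5, the AVX-512 function alone should suffice.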
