https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113079

            Bug ID: 113079
           Summary: [x86] Fails to generate dot_prod instructions for
                    64-bit vector.
           Product: gcc
           Version: 14.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: liuhongt at gcc dot gnu.org
  Target Milestone: ---

int
foo (int n, unsigned char* p, char* pi)
{
    int sum = 0;
    for (int i = 0; i != 8; i++)
    {
        sum += p[i] * pi[i];
    }
    return sum;
}

We can use a 128-bit dot_prod instruction and clear the upper 64 bits.
Currently, GCC generates a long instruction sequence instead:

        vmovq   xmm0, QWORD PTR [rsi]
        vmovq   xmm2, QWORD PTR [rdx]
        vpmovzxbw       xmm1, xmm0
        vpsrlq  xmm0, xmm0, 32
        vpmovsxbw       xmm3, xmm2
        vpmullw xmm1, xmm1, xmm3
        vpsrlq  xmm2, xmm2, 32
        vpmovzxbw       xmm0, xmm0
        vpmovsxbw       xmm2, xmm2
        vpmullw xmm0, xmm0, xmm2
        vpmovsxwd       xmm2, xmm1
        vpsrlq  xmm1, xmm1, 32
        vpmovsxwd       xmm1, xmm1
        vpaddd  xmm2, xmm2, xmm1
        vpmovsxwd       xmm1, xmm0
        vpsrlq  xmm0, xmm0, 32
        vpmovsxwd       xmm0, xmm0
        vpaddd  xmm1, xmm1, xmm2
        vpxor   xmm2, xmm2, xmm2
        vpshufb xmm2, xmm2, XMMWORD PTR .LC1[rip]
        vpaddd  xmm0, xmm0, xmm1
        vpshufb xmm1, xmm0, XMMWORD PTR .LC0[rip]
        vpor    xmm1, xmm1, xmm2
        vpaddd  xmm0, xmm0, xmm1
        vmovd   eax, xmm0
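
For illustration, here is a minimal portable C sketch of the suggested
approach. It is not GCC's implementation: usdot_prod_128 and foo_dot_prod are
hypothetical names, and the helper is only a scalar model of a 128-bit
unsigned-by-signed dot-product step (the kind of operation that an instruction
such as x86's vpdpbusd performs, if that is the one intended here). The sketch
assumes "clean upper 64 bits" means zeroing the upper halves of both inputs, so
the two upper 32-bit lanes contribute nothing, and it reads pi as signed bytes,
as plain char is on x86.

```c
#include <string.h>

/* Scalar model of one 128-bit unsigned-by-signed dot-product step:
   each 32-bit lane accumulates four adjacent byte products.
   (Hypothetical helper, not a GCC or intrinsic API.)  */
static void
usdot_prod_128 (int acc[4], const unsigned char a[16], const signed char b[16])
{
  for (int lane = 0; lane < 4; lane++)
    for (int k = 0; k < 4; k++)
      acc[lane] += a[4 * lane + k] * b[4 * lane + k];
}

/* Hypothetical equivalent of foo: widen the 8-byte inputs to 16 bytes
   with a zeroed upper half, then apply a single 128-bit step.  */
int
foo_dot_prod (const unsigned char *p, const char *pi)
{
  unsigned char a[16] = { 0 };
  signed char b[16] = { 0 };
  int acc[4] = { 0 };

  memcpy (a, p, 8);   /* like vmovq: low 64 bits loaded, high 64 bits zero */
  memcpy (b, pi, 8);  /* assumption: bytes of pi are interpreted as signed */
  usdot_prod_128 (acc, a, b);

  /* Horizontal add of the four 32-bit lanes; lanes 2 and 3 stay zero.  */
  return acc[0] + acc[1] + acc[2] + acc[3];
}
```

Because the upper eight bytes of both inputs are zero, the result matches the
eight-element loop in foo; only the final reduction over the two low lanes
carries information.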
