https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113079
Bug ID: 113079
Summary: [x86] Fails to generate dot_prod instructions for 64-bit vector.
Product: gcc
Version: 14.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: liuhongt at gcc dot gnu.org
Target Milestone: ---

int
foo (int n, unsigned char* p, char* pi)
{
  int sum = 0;
  for (int i = 0; i != 8; i++)
    {
      sum += p[i] * pi[i];
    }
  return sum;
}

We can use a 128-bit dot_prod instruction and clear the upper 64 bits.
Currently, GCC generates a long instruction sequence:

        vmovq   xmm0, QWORD PTR [rsi]
        vmovq   xmm2, QWORD PTR [rdx]
        vpmovzxbw       xmm1, xmm0
        vpsrlq  xmm0, xmm0, 32
        vpmovsxbw       xmm3, xmm2
        vpmullw xmm1, xmm1, xmm3
        vpsrlq  xmm2, xmm2, 32
        vpmovzxbw       xmm0, xmm0
        vpmovsxbw       xmm2, xmm2
        vpmullw xmm0, xmm0, xmm2
        vpmovsxwd       xmm2, xmm1
        vpsrlq  xmm1, xmm1, 32
        vpmovsxwd       xmm1, xmm1
        vpaddd  xmm2, xmm2, xmm1
        vpmovsxwd       xmm1, xmm0
        vpsrlq  xmm0, xmm0, 32
        vpmovsxwd       xmm0, xmm0
        vpaddd  xmm1, xmm1, xmm2
        vpxor   xmm2, xmm2, xmm2
        vpshufb xmm2, xmm2, XMMWORD PTR .LC1[rip]
        vpaddd  xmm0, xmm0, xmm1
        vpshufb xmm1, xmm0, XMMWORD PTR .LC0[rip]
        vpor    xmm1, xmm1, xmm2
        vpaddd  xmm0, xmm0, xmm1
        vmovd   eax, xmm0
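
To illustrate why a 128-bit dot product with the upper 64 bits cleared gives the same answer as the 8-element loop, here is a minimal portable C sketch. It is a model of the idea, not GCC's implementation. The helper names (dot8_ref, load_zero_extended, usdot_128, dot8_via_128) are hypothetical. The lane layout (four 32-bit lanes, each accumulating four unsigned-by-signed byte products) is an assumption modelled on instructions such as VPDPBUSD; the report does not name a specific instruction. The second operand is declared as signed char because plain char is signed on x86 targets.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Scalar reference: the loop from the report, with the signedness of the
   second operand spelled out (plain char is signed on x86 targets). */
static int dot8_ref(const unsigned char *p, const signed char *pi)
{
    int sum = 0;
    for (int i = 0; i != 8; i++)
        sum += p[i] * pi[i];
    return sum;
}

/* Copy len bytes into a 16-byte buffer and zero the rest, mimicking a
   64-bit load into a 128-bit register with the upper half cleared. */
static void load_zero_extended(void *dst, const void *src, size_t len)
{
    assert(len <= 16);
    memset(dst, 0, 16);
    memcpy(dst, src, len);
}

/* Hypothetical model of a 128-bit unsigned-by-signed byte dot product:
   each of the four 32-bit lanes accumulates four u8 * s8 products. */
static void usdot_128(int32_t acc[4], const unsigned char a[16],
                      const signed char b[16])
{
    for (int lane = 0; lane < 4; lane++)
        for (int k = 0; k < 4; k++) {
            int idx = 4 * lane + k;
            acc[lane] += (int32_t)a[idx] * (int32_t)b[idx];
        }
}

/* 8-element dot product computed through the 128-bit model. */
static int dot8_via_128(const unsigned char *p, const signed char *pi)
{
    unsigned char a[16];
    signed char b[16];
    int32_t acc[4] = {0, 0, 0, 0};

    load_zero_extended(a, p, 8);
    load_zero_extended(b, pi, 8);
    usdot_128(acc, a, b);

    /* Lanes 2 and 3 only see zero bytes, so they contribute nothing. */
    return acc[0] + acc[1] + acc[2] + acc[3];
}
```

Calling dot8_ref and dot8_via_128 on the same two 8-byte inputs should return the same value, because the zeroed upper bytes add only zero products to the sum.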