https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106453
Bug ID: 106453
Summary: Redundant zero extension after crc32q
Product: gcc
Version: unknown
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: amonakov at gcc dot gnu.org
Target Milestone: ---

On 64-bit x86, a straightforward use of the SSE 4.2 crc32 instruction looks like this:

    #include <immintrin.h>
    #include <stddef.h>
    #include <stdint.h>

    uint32_t f(uint32_t c, uint64_t *p, size_t n)
    {
        for (size_t i = 0; i < n; i++)
            c = _mm_crc32_u64(c, p[i]);
        return c;
    }

At the ISA level, the crc32q instruction takes 64-bit operands, and the resulting assembly (gcc -O2 -msse4.2) is:

    f:
            mov     eax, edi
            test    rdx, rdx
            je      .L1
            lea     rdx, [rsi+rdx*8]
    .L3:
            mov     eax, eax
            add     rsi, 8
            crc32   rax, QWORD PTR [rsi-8]
            cmp     rdx, rsi
            jne     .L3
    .L1:
            ret

Note the zero extension of eax (mov eax, eax) inside the loop, which is usually not move-eliminated because its destination is the same as its source. The crc32q instruction already zero-extends rax from its 32-bit result (it also ignores the high 32 bits when reading the destination operand), so I think it should be possible to model the zero extension in the .md pattern, which would allow the explicit extension to be eliminated.

A source-level workaround is to use a 64-bit variable in the loop, so that the extension happens just once, before the loop.
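A minimal sketch of that workaround, assuming GCC or Clang on x86-64; the target("sse4.2") attribute stands in for -msse4.2 so the file builds without extra flags. The names f_wide and crc32q_model are hypothetical. crc32q_model is a portable bit-at-a-time CRC-32C model of what crc32q computes, included only to make the semantics described above concrete: just the low 32 bits of the accumulator are read, and the 32-bit result comes back zero-extended. It assumes the instruction consumes the 8 source bytes in little-endian order using the reflected Castagnoli polynomial 0x82F63B78, with no initial or final inversion (those are left to the caller).

```c
#include <immintrin.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical portable model of crc32q (CRC-32C, reflected polynomial
   0x82F63B78, no pre/post inversion), processing v byte by byte from the
   least significant byte upward. */
static uint64_t crc32q_model(uint64_t acc, uint64_t v)
{
    uint32_t crc = (uint32_t)acc;   /* high 32 bits of the accumulator are ignored */
    for (int byte = 0; byte < 8; byte++) {
        crc ^= (uint8_t)(v >> (8 * byte));
        for (int bit = 0; bit < 8; bit++)
            crc = (crc >> 1) ^ (0x82F63B78u & -(crc & 1u));
    }
    return crc;                     /* 32-bit result, zero-extended to 64 bits */
}

/* Hypothetical variant of f() using the workaround: the accumulator is
   64 bits wide, so c is zero-extended once, before the loop, instead of
   on every iteration. */
__attribute__((target("sse4.2")))
uint32_t f_wide(uint32_t c, const uint64_t *p, size_t n)
{
    uint64_t acc = c;               /* single zero extension */
    for (size_t i = 0; i < n; i++)
        acc = _mm_crc32_u64(acc, p[i]);
    return (uint32_t)acc;           /* truncate once on return */
}
```

Since acc already holds the zero-extended value that crc32q produces, the compiler should have no per-iteration uint32_t-to-uint64_t conversion left to emit; the only narrowing is the cast in the return statement.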