https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118251
Bug ID: 118251 Summary: i386: Use carry bits of shifts Product: gcc Version: 15.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: target Assignee: unassigned at gcc dot gnu.org Reporter: andi-gcc at firstfloor dot org Target Milestone: --- Inspired by https://github.com/komrad36/CRC Even though gcc has CRC pattern matching now which should be implemented on x86 too, it would be still good if it handled the manual coded versions well too. What the compiler currently generates is not necessarily slower on a modern super scalar CPU that is not frontend bound, but it is larger #include <stdint.h> #include <stdbool.h> uint8_t right_shift(uint8_t v) { bool rightmost_bit_set = v & 1; v >>= 1; if (rightmost_bit_set) v ^= 0x11EDC6F41; return v; } uint8_t left_shift(uint8_t v) { bool leftmost_bit_set = v & 0x80; v <<= 1; if (leftmost_bit_set) v ^= 0x11EDC6F41; return v; } generates movl %edi, %eax shrb %dil andl $1, %eax negl %eax andl $65, %eax xorl %edi, %eax ... leal (%rdi,%rdi), %eax movl %eax, %edx xorl $65, %edx testb %dil, %dil cmovs %edx, %eax But on x86 the carry bit contains the shift out bit, so it could be optimized to directly use that without the extra test mov %edi,%eax xor $constant,%edi shrb $1,%eax cmovc %edi,%eax and similarly for the left shift case. With APX the code could be even more efficient. I presume two peepholes could do it.