https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118251

            Bug ID: 118251
           Summary: i386: Use carry bits of shifts
           Product: gcc
           Version: 15.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: andi-gcc at firstfloor dot org
  Target Milestone: ---

Inspired by https://github.com/komrad36/CRC

Even though gcc has CRC pattern matching now which should be implemented on x86
too, it would be still good if it handled the manual coded versions well too.
What the compiler currently generates is not necessarily slower on a modern
super scalar CPU that is not frontend bound, but it is larger

#include <stdint.h>
#include <stdbool.h>

uint8_t right_shift(uint8_t v)
{
        bool rightmost_bit_set = v & 1;
        v >>= 1;
        if (rightmost_bit_set)
                v ^= 0x11EDC6F41;
        return v;
}

uint8_t left_shift(uint8_t v)
{
        bool leftmost_bit_set = v & 0x80;
        v <<= 1;
        if (leftmost_bit_set)
                v ^= 0x11EDC6F41;
        return v;
}

generates

      movl    %edi, %eax
        shrb    %dil
        andl    $1, %eax
        negl    %eax
        andl    $65, %eax
        xorl    %edi, %eax

...
        leal    (%rdi,%rdi), %eax
        movl    %eax, %edx
        xorl    $65, %edx
        testb   %dil, %dil
        cmovs   %edx, %eax

But on x86 the carry bit contains the shift out bit, so it could be optimized
to
directly use that without the extra test

   mov  %edi,%eax
   xor  $constant,%edi
   shrb $1,%eax
   cmovc %edi,%eax

and similarly for the left shift case. With APX the code could be even more
efficient.

I presume two peepholes could do it.

Reply via email to