https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112398

--- Comment #3 from Siarhei Volkau <lis8215 at gmail dot com> ---
Well, let's rewrite it this way:

#include <stdint.h>

void neg8 (uint8_t *restrict dst, const uint8_t *restrict src)
{
    uint8_t work = ~*src; // or *src ^ 0xff;
    dst[0] = (work >> 4) | (work << 4);
}

Wherever the upper bits have to be zero, it is cheaper to use xor. Otherwise
we rely on techniques for eliminating the redundant zero_extend, and at least
on MIPS (prior to R2) and RISC-V, GCC still emits the zero_extend instruction.
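
To illustrate the point, here is a minimal sketch, assuming a 32-bit register
that holds the zero-extended byte (as after lbu). The helper names are
hypothetical and not from the report. It shows that both forms give the same
8-bit result, but only the NOT form leaves the upper register bits set, which
is what forces the extra zero_extend before the right shift.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helpers for illustration; not part of the original report. */

/* Nibble swap of the inverted byte, written with bitwise NOT. */
static uint8_t swap_not(uint8_t x)
{
    uint8_t work = (uint8_t)~x;
    return (uint8_t)((work >> 4) | (work << 4));
}

/* Same computation, written with xor against 0xff. */
static uint8_t swap_xor(uint8_t x)
{
    uint8_t work = (uint8_t)(x ^ 0xff);
    return (uint8_t)((work >> 4) | (work << 4));
}

/* Assumed model: a 32-bit register holding a zero-extended byte. */
static uint32_t reg_after_not(uint8_t x) { return ~(uint32_t)x; }
static uint32_t reg_after_xor(uint8_t x) { return (uint32_t)x ^ 0xffu; }

/* Check every byte value: results agree, only NOT dirties the upper bits. */
static void check_variants(void)
{
    for (unsigned i = 0; i < 256; i++) {
        uint8_t x = (uint8_t)i;
        assert(swap_not(x) == swap_xor(x));
        assert((reg_after_xor(x) >> 8) == 0);         /* upper 24 bits clear */
        assert((reg_after_not(x) >> 8) == 0xffffffu); /* upper 24 bits set */
    }
}
```

With the xor form the value is already zero-extended in the register, so the
right shift can use it directly without a masking step.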

MIPS, neg8:
neg8:
        lbu     $2,0($5)
        nop
        nor     $2,$0,$2
        andi    $3,$2,0x00ff
        srl     $3,$3,4
        sll     $2,$2,4
        or      $2,$2,$3
        jr      $31
        sb      $2,0($4)

RISC-V, neg8:
        lbu     a5,0(a1)
        not     a5,a5
        andi    a4,a5,0xff
        srli    a4,a4,4
        slli    a5,a5,4
        or      a4,a4,a5
        sb      a4,0(a0)
        ret

Some other RISCs (S390, SH, Xtensa) also emit zero_extend, but I'm not sure
whether xor is a cheaper alternative on them.
