https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112398
--- Comment #3 from Siarhei Volkau <lis8215 at gmail dot com> ---
Well, let's rewrite it this way:

void neg8 (uint8_t *restrict dst, const uint8_t *restrict src)
{
  uint8_t work = ~*src; // or *src ^ 0xff;
  dst[0] = (work >> 4) | (work << 4);
}

Wherever the upper bits have to be zero, it is cheaper to use xor. Otherwise we rely on the techniques for eliminating redundant zero_extend, and at least on MIPS (prior to R2) and RISC-V, GCC emits the zero_extend instruction.

MIPS, neg8:

neg8:
        lbu     $2,0($5)
        nop
        nor     $2,$0,$2
        andi    $3,$2,0x00ff
        srl     $3,$3,4
        sll     $2,$2,4
        or      $2,$2,$3
        jr      $31
        sb      $2,0($4)

RISC-V, neg8:

        lbu     a5,0(a1)
        not     a5,a5
        andi    a4,a5,0xff
        srli    a4,a4,4
        slli    a5,a5,4
        or      a4,a4,a5
        sb      a4,0(a0)
        ret

Some other targets (S390, SH, Xtensa) also emit the zero_extend, but I'm not sure whether a cheaper xor alternative exists on them.