https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97417
--- Comment #15 from Jim Wilson <wilson at gcc dot gnu.org> ---
I tried testing your first patch.  I just built unpatched/patched
rv32-elf/rv64-linux toolchains and used nm/objdump to look at libraries like
libc, libstdc++, and libm.

I managed to find a testcase from glibc that gives worse code with the patch:

struct {
  unsigned int a : 1;
  unsigned int b : 1;
  unsigned int c : 1;
  unsigned int d : 1;
  unsigned int pad1 : 28;
} s;

void
sub (void)
{
  s.a = 1;
  s.c = 1;
}

Without the patch we get

sub:
	lui	a5,%hi(s)
	addi	a5,a5,%lo(s)
	lbu	a4,1(a5)
	ori	a4,a4,5
	sb	a4,1(a5)
	ret

and with the patch we get

sub:
	lui	a4,%hi(s)
	lbu	a5,%lo(s)(a4)
	andi	a5,a5,-6
	ori	a5,a5,5
	sb	a5,%lo(s)(a4)
	ret

Note the extra andi instruction.

This seems to be a combine problem.  With the patched compiler, I see the
following in the -fdump-rtl-combine-all dump file:

Trying 9 -> 11:
    9: r79:DI=r78:DI&0xfffffffffffffffa
      REG_DEAD r78:DI
   11: r81:DI=r79:DI|0x5
      REG_DEAD r79:DI
Failed to match this instruction:
(set (reg:DI 81)
    (ior:DI (and:DI (reg:DI 78 [ MEM <unsigned char> [(struct *)&sD.1491] ])
            (const_int 250 [0xfa]))
        (const_int 5 [0x5])))

Combine knows that reg 78 has only 8 nonzero bits, so it simplified the
AND -6 to AND 250.  If combine had left that constant alone, or had
propagated the nonzero-bits information through to r81, it should have been
able to optimize away the AND operation.  This does work on targets where
the load does not zero extend by default.

The ARM port shows the exact same problem.  I see

sub:
	@ args = 0, pretend = 0, frame = 0
	@ frame_needed = 0, uses_anonymous_args = 0
	@ link register save eliminated.
	movw	r2, #:lower16:.LANCHOR0
	movt	r2, #:upper16:.LANCHOR0
	ldrb	r3, [r2]	@ zero_extendqisi2
	bic	r3, r3, #5
	orr	r3, r3, #5
	strb	r3, [r2]
	bx	lr

and the bic (bit clear) is obviously unnecessary.  This probably should be
submitted as a separate bug if we don't want to fix it here.
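For reference, the simplification combine misses here is easy to verify:
once we know the loaded value fits in 8 bits, masking with 250 before ORing
in 5 changes nothing.  A minimal standalone check (just an illustrative
sketch, not part of the glibc testcase):

#include <assert.h>

int
main (void)
{
  /* reg 78 is known to have only 8 nonzero bits, so checking all
     8-bit values is exhaustive.  */
  for (unsigned int x = 0; x < 256; x++)
    /* x & 250 clears bits 0 and 2, and the OR with 5 sets them again,
       so the AND contributes nothing: (x & 250) | 5 == x | 5.  */
    assert (((x & 250) | 5) == (x | 5));
  return 0;
}

In general, (X & C1) | C2 should simplify to X | C2 whenever every bit
cleared by C1 is either set by C2 or already known to be zero in X,
i.e. presumably when nonzero_bits (X) & ~C1 & ~C2 == 0; that is the
condition combine apparently fails to re-check after it narrows the mask.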