https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97417

--- Comment #15 from Jim Wilson <wilson at gcc dot gnu.org> ---
I tried testing your first patch.  I just built unpatched/patch
rv32-elf/rv64-linux toolchains and used nm/objdump to look at libraries like
libc, libstdc++, libm.

I managed to find a testcase from glibc that gives worse code with the patch.
struct
{
  unsigned int a : 1;
  unsigned int b : 1;
  unsigned int c : 1;
  unsigned int d : 1;
  unsigned int pad1 : 28;
} s;

void
sub (void)
{
  s.a = 1;
  s.c = 1;
}

Without the patch we get
sub:
        lui     a5,%hi(s)
        addi    a5,a5,%lo(s)
        lbu     a4,1(a5)
        ori     a4,a4,5
        sb      a4,1(a5)
        ret
and with the patch we get
sub:
        lui     a4,%hi(s)
        lbu     a5,%lo(s)(a4)
        andi    a5,a5,-6
        ori     a5,a5,5
        sb      a5,%lo(s)(a4)
        ret
Note the extra and instruction.

This seems to be a combine problem.  With the patched compiler, I see in the
-fdump-rtl-combine-all dump file
Trying 9 -> 11:
    9: r79:DI=r78:DI&0xfffffffffffffffa
      REG_DEAD r78:DI
   11: r81:DI=r79:DI|0x5
      REG_DEAD r79:DI
Failed to match this instruction:
(set (reg:DI 81)
    (ior:DI (and:DI (reg:DI 78 [ MEM <unsigned char> [(struct  *)&sD.1491] ])
            (const_int 250 [0xfa]))
        (const_int 5 [0x5])))
Combine knows that reg 78 only has 8 nonzero bits, so it simplified the AND -6
to AND 250.  If combine had left that constant alone, or if maybe combine
propagated that info about nonzero bits through to r81, then it should have
been able to optimize out the AND operation.  This does work when the load does
not zero extend by default.

The ARM port shows the exact same problem.  I see
sub:
        @ args = 0, pretend = 0, frame = 0
        @ frame_needed = 0, uses_anonymous_args = 0
        @ link register save eliminated.
        movw    r2, #:lower16:.LANCHOR0
        movt    r2, #:upper16:.LANCHOR0
        ldrb    r3, [r2]        @ zero_extendqisi2
        bic     r3, r3, #5
        orr     r3, r3, #5
        strb    r3, [r2]
        bx      lr
and the bic (bit clear) is obviously unnecessary.

This probably should be submitted as a separate bug if we don't want to fix it
here.

Reply via email to