https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107455

            Bug ID: 107455
           Summary: Suboptimal codegen for some branch-on-zero cases
           Product: gcc
           Version: 13.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: rtl-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: sinan.lin at linux dot alibaba.com
  Target Milestone: ---

Created attachment 53788
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=53788&action=edit
code sequence from https://github.com/embench/embench-iot

Compiled with:

```
gcc -S -Os -march=rv32gc -mabi=ilp32 test.c
```

```
sglib_dllist_len:
        beq     a0,zero,.L6
        mv      a4,a0
        li      a5,0
.L3:
        lw      a4,8(a4)
        addi    a5,a5,1
        bne     a4,zero,.L3
        lw      a4,4(a0)
        li      a0,0
.L4:
        bne     a4,zero,.L5
        add     a0,a0,a5
        ret
.L5:
        lw      a4,4(a4)
        addi    a0,a0,1
        j       .L4
.L6:
        li      a0,0
        ret
```
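The attachment is not reproduced in this report. The following is a hypothetical C reconstruction inferred only from the assembly above; it is not the attached source. It assumes a node layout where the data field sits at offset 0, the previous pointer at offset 4 and the next pointer at offset 8 on rv32, which matches the `lw ...,4(...)` and `lw ...,8(...)` loads. The names `struct dllist`, `prev` and `next` are assumptions. The function counts `list` and every node after it, then every node before it.

```c
#include <stddef.h>

/* Hypothetical node layout (assumed): on rv32, `prev` lands at offset 4
   and `next` at offset 8, matching the loads in the GCC listing. */
struct dllist {
    int i;
    struct dllist *prev;
    struct dllist *next;
};

/* Hypothetical reconstruction of sglib_dllist_len, inferred from the
   GCC output rather than taken from the attachment. */
int sglib_dllist_len(struct dllist *list)
{
    if (list == NULL)
        return 0;               /* beq a0,zero,.L6  then  .L6: li a0,0 */

    int fwd = 0;                /* a5: counts list and every node after it */
    for (struct dllist *p = list; p != NULL; p = p->next)
        fwd++;                  /* .L3 loop */

    int back = 0;               /* the separate zeroed counter (li a0,0 before .L4) */
    for (struct dllist *p = list->prev; p != NULL; p = p->prev)
        back++;                 /* .L4/.L5 loop */

    return back + fwd;          /* add a0,a0,a5 */
}
```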

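The `li a0,0` in the .L6 block is unnecessary: .L6 is only reached through `beq a0,zero,.L6`, so a0 already holds 0 there. This extra instruction might also lead to a worse CFG and larger code size. I have spotted several other size-suboptimal cases related to this pattern.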


Result with clang:
```
sglib_dllist_len:
        beqz    a0, .LBB0_4
        mv      a1, a0
        li      a0, -1
        mv      a2, a1
.LBB0_2:
        lw      a2, 8(a2)
        addi    a0, a0, 1
        bnez    a2, .LBB0_2
.LBB0_3:
        lw      a1, 4(a1)
        addi    a0, a0, 1
        bnez    a1, .LBB0_3
.LBB0_4:
        ret
```
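To make the difference concrete, here is a C sketch of the strategy visible in clang's listing above, written against the same assumed node layout (offset 4 = `prev`, offset 8 = `next`). The function name `dllist_len_single_counter` and the field names are hypothetical. On the null path it returns the argument register, which is already zero, so no constant is loaded. Otherwise it keeps a single counter biased to -1: the first loop adds one per node from `list` to the tail, and the second loop starts at `list` itself, so it adds one extra increment that cancels the bias. No second zeroed register and no final `add` are needed.

```c
#include <stddef.h>

/* Same assumed layout as in the GCC-side reconstruction. */
struct dllist {
    int i;
    struct dllist *prev;
    struct dllist *next;
};

/* Hypothetical C rendering of clang's code shape: one counter, no add. */
int dllist_len_single_counter(struct dllist *list)
{
    if (list == NULL)
        return 0;              /* beqz a0,.LBB0_4: a0 is already 0 */

    int n = -1;                /* li a0,-1 */
    struct dllist *p = list;
    do {                       /* .LBB0_2: one increment per node from list to tail */
        p = p->next;
        n++;
    } while (p != NULL);

    p = list;
    do {                       /* .LBB0_3: nodes before list, plus one that cancels the -1 */
        p = p->prev;
        n++;
    } while (p != NULL);

    return n;
}
```

For a list entered at its third of five nodes, the first loop leaves `n` at 2 and the second adds 3, giving 5.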

A similar problem occurs on arm64: https://godbolt.org/z/Yo6jsKMGz
