https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107455
Bug ID: 107455
Summary: Suboptimal codegen for some branch-on-zero cases
Product: gcc
Version: 13.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: rtl-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: sinan.lin at linux dot alibaba.com
Target Milestone: ---

Created attachment 53788
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=53788&action=edit
code sequence from https://github.com/embench/embench-iot

gcc -S -Os -march=rv32gc -mabi=ilp32 test.c

```
sglib_dllist_len:
        beq     a0,zero,.L6
        mv      a4,a0
        li      a5,0
.L3:
        lw      a4,8(a4)
        addi    a5,a5,1
        bne     a4,zero,.L3
        lw      a4,4(a0)
        li      a0,0
.L4:
        bne     a4,zero,.L5
        add     a0,a0,a5
        ret
.L5:
        lw      a4,4(a4)
        addi    a0,a0,1
        j       .L4
.L6:
        li      a0,0
        ret
```

The li a0,0 at .L6 is unnecessary: that block is reached only through the beq at the top of the function, so a0 is already zero there. The extra instruction can lead to a worse CFG and larger code size. I have spotted several cases with suboptimal code size related to this pattern.

Result on clang:

```
sglib_dllist_len:
        beqz    a0, .LBB0_4
        mv      a1, a0
        li      a0, -1
        mv      a2, a1
.LBB0_2:
        lw      a2, 8(a2)
        addi    a0, a0, 1
        bnez    a2, .LBB0_2
.LBB0_3:
        lw      a1, 4(a1)
        addi    a0, a0, 1
        bnez    a1, .LBB0_3
.LBB0_4:
        ret
```

Similar problem on arm64: https://godbolt.org/z/Yo6jsKMGz
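
For readers without the attachment at hand, here is a rough C sketch of what the function appears to compute, reconstructed from the assembly above rather than copied from the attachment. The struct layout and field names (`i`, `ptr_to_next`, `ptr_to_previous`) are assumptions chosen to match the ilp32 offsets 4 and 8 seen in the listings; on a 64-bit host the offsets will differ.

```c
#include <stddef.h>

/* Hypothetical node type; field names and order are assumptions.
   On rv32/ilp32 this puts ptr_to_next at offset 4 and
   ptr_to_previous at offset 8, matching the lw offsets above. */
struct dllist {
    int i;
    struct dllist *ptr_to_next;
    struct dllist *ptr_to_previous;
};

/* Count the nodes of a doubly linked list given any node in it. */
int sglib_dllist_len(struct dllist *list)
{
    int backward = 0;
    int forward = 0;
    struct dllist *p;

    /* Branch-on-zero case: the argument register already holds 0,
       so the return value needs no extra load of zero. */
    if (list == NULL)
        return 0;

    /* First loop: the node itself plus everything behind it. */
    for (p = list; p != NULL; p = p->ptr_to_previous)
        backward++;

    /* Second loop: everything after the node. */
    for (p = list->ptr_to_next; p != NULL; p = p->ptr_to_next)
        forward++;

    return backward + forward;
}
```

The early return is the path that corresponds to .L6 in the GCC output and to the direct branch to ret in the clang output.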