https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116704
Andrew Pinski <pinskia at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
          Component|tree-optimization           |rtl-optimization
   Last reconfirmed|                            |2024-09-13
           Severity|normal                      |enhancement
             Target|                            |x86_64-linux-gnu
                   |                            |arm-linux-gnueabi
     Ever confirmed|0                           |1

--- Comment #1 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
aarch64 -O2 gives:
```
calc_simple:
        mov     w2, w0
        mov     w0, 0
        cmp     w2, w1
        bgt     .L1
        add     w1, w1, 1
.L4:
        tst     x2, 1
        add     w3, w0, w2
        add     w2, w2, 1
        csel    w0, w3, w0, eq
        cmp     w1, w2
        bne     .L4
.L1:
        ret
```

There is only one instruction setting w0 to 0 there.
But arm-linux-gnueabi has two (both `movs r0, #0`):
```
calc_simple:
        mov     r3, r0
        cmp     r0, r1
        bgt     .L5
        adds    r1, r1, #1
        movs    r0, #0
.L4:
        lsls    r2, r3, #31
        it      pl
        addpl   r0, r0, r3
        adds    r3, r3, #1
        cmp     r1, r3
        bne     .L4
        bx      lr
.L5:
        movs    r0, #0
        bx      lr
```

The reason this is caught on aarch64 but not on arm/x86 is that sched1 (which
is disabled on x86) decided to swap the order of
```
(insn 12 11 5 3 (set (reg:SI 1 x1 [orig:102 _12 ] [102])
        (plus:SI (reg:SI 1 x1 [orig:114 topD.4415 ] [114])
            (const_int 1 [0x1]))) 119 {*addsi3_aarch64}
     (nil))
(insn 5 12 22 3 (set (reg/v:SI 0 x0 [orig:105 <retval> ] [105])
        (const_int 0 [0])) "/app/example.cpp":2:9 69 {*movsi_aarch64}
     (expr_list:REG_EQUAL (const_int 0 [0])
        (nil)))
```

putting the set of x0 before the add. jump2 then sees that the set of x0 is the
same on both sides of the (first) branch and can commonize it.

So in theory jump2 should be enhanced to look further when commonizing
instructions. There might be another bug about that ...

Note that this is just a small code-size optimization; on modern processors
setting a register to 0 is free.
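
For readers without the testcase to hand, the generated code is consistent with a function along the following lines. This is only a sketch reconstructed from the assembly, not the testcase attached to the bug: the parameter name `bottom`, the local name `result`, and the exact loop form are assumptions (only `calc_simple`, `top`, and the file name example.cpp appear in the dumps).

```cpp
// Hypothetical reconstruction: sum the even integers in [bottom, top].
// Names other than calc_simple and top are assumed for illustration.
int calc_simple(int bottom, int top) {
    int result = 0;                      // the "set retval to 0" that gets duplicated
    for (int i = bottom; i <= top; ++i) {
        if ((i & 1) == 0)                // matches tst x2, 1 / lsls r2, r3, #31
            result += i;
    }
    return result;                       // empty range (bottom > top) returns 0
}
```

With a function of this shape, both the early-exit path (`bottom > top`) and the loop preheader need the return register set to 0, which is where the second `movs r0, #0` on arm would come from.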