https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116704

Andrew Pinski <pinskia at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
          Component|tree-optimization           |rtl-optimization
   Last reconfirmed|                            |2024-09-13
           Severity|normal                      |enhancement
             Target|                            |x86_64-linux-gnu
                   |                            |arm-linux-gnueabi
     Ever confirmed|0                           |1

--- Comment #1 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
aarch64 -O2 gives:
```
calc_simple:
        mov     w2, w0
        mov     w0, 0
        cmp     w2, w1
        bgt     .L1
        add     w1, w1, 1
.L4:
        tst     x2, 1
        add     w3, w0, w2
        add     w2, w2, 1
        csel    w0, w3, w0, eq
        cmp     w1, w2
        bne     .L4
.L1:
        ret
```

Here w0 is set to 0 only once.
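
For readers without the original report at hand, the assembly above corresponds roughly to the C below. This is a reconstruction from the generated code, not the reporter's test case: `calc_simple` and `top` appear in the dumps, while `bottom`, `result` and `i` are assumed names.

```c
/* Hypothetical reconstruction from the aarch64 assembly; the test case
   from the bug report is not quoted in this comment. */
int calc_simple(int bottom, int top)
{
    int result = 0;                 /* mov w0, 0 */
    for (int i = bottom; i <= top; i++) {
        if ((i & 1) == 0)           /* tst x2, 1 ... csel ..., eq */
            result += i;            /* add only when i is even */
    }
    return result;
}
```

With this reading, the function returns the sum of the even integers in [bottom, top], and 0 when bottom > top.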

But arm-linux-gnueabi sets r0 to 0 twice:
```
calc_simple:
        mov     r3, r0
        cmp     r0, r1
        bgt     .L5
        adds    r1, r1, #1
        movs    r0, #0
.L4:
        lsls    r2, r3, #31
        it      pl
        addpl   r0, r0, r3
        adds    r3, r3, #1
        cmp     r1, r3
        bne     .L4
        bx      lr
.L5:
        movs    r0, #0
        bx      lr
```


The reason it is caught on aarch64 but not on arm/x86 is that sched1 (which
is disabled on x86) decided to swap the order of
```
(insn 12 11 5 3 (set (reg:SI 1 x1 [orig:102 _12 ] [102])
        (plus:SI (reg:SI 1 x1 [orig:114 topD.4415 ] [114])
            (const_int 1 [0x1]))) 119 {*addsi3_aarch64}
     (nil))
(insn 5 12 22 3 (set (reg/v:SI 0 x0 [orig:105 <retval> ] [105])
        (const_int 0 [0])) "/app/example.cpp":2:9 69 {*movsi_aarch64}
     (expr_list:REG_EQUAL (const_int 0 [0])
        (nil)))
```

This puts the set of x0 first.

jump2 then sees that the set of x0 is the same on both sides of the (first)
branch and can commonize it.
So in theory jump2 should be enhanced to look further for instructions to
commonize.
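
As a rough source-level analogy (the real transformation happens on RTL, not on C), the two code shapes can be sketched as below. The function names are hypothetical, and the loop body is inferred from the assembly rather than taken from the report. The first function mirrors the arm output, where the zeroing exists on both sides of the branch but is preceded by other work on the loop side; the second mirrors the aarch64 output after the common set has been merged above the branch.

```c
/* Source-level analogy only; names are hypothetical. */

/* Shape of the arm output: result is zeroed on both sides of the branch. */
int sum_even_duplicated(int bottom, int top)
{
    int result;
    if (bottom > top) {
        result = 0;            /* copy 1: early-exit path (.L5) */
        return result;
    }
    int end = top + 1;         /* loop bound, computed before the zeroing */
    result = 0;                /* copy 2: just ahead of the loop */
    for (int i = bottom; i != end; i++)
        if ((i & 1) == 0)
            result += i;
    return result;
}

/* Shape of the aarch64 output: a single zeroing above the branch. */
int sum_even_hoisted(int bottom, int top)
{
    int result = 0;            /* the only copy */
    if (bottom > top)
        return result;
    int end = top + 1;
    for (int i = bottom; i != end; i++)
        if ((i & 1) == 0)
            result += i;
    return result;
}
```

Both functions compute the same value; the only difference is whether the zeroing is emitted once or twice.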

There might be another bug about that ...

NOTE: this is just a small code-size optimization; on modern processors,
setting a register to 0 is free.
