https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71022
            Bug ID: 71022
           Summary: GCC prefers register moves over move immediate
           Product: gcc
           Version: 7.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: rtl-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: wdijkstr at arm dot com
  Target Milestone: ---

When assigning the same immediate value to different registers, GCC will always
CSE the immediate and emit a register move for subsequent uses. This creates
unnecessary register dependencies and increases latency. When the cost of an
immediate move is the same as a register move (which should be true for most
targets), it should prefer the former. A register move is better only when the
immediate requires multiple instructions or is larger with -Os.

It's not obvious where this is best done. The various cprop phases before IRA
do the right thing, but cse2 (which runs later) then undoes it. And
cprop_hardreg doesn't appear to be able to deal with immediates.

int f1(int x)
{
  int y = 1, z = 1;
  while (x--)
    {
      y += z;
      z += x;
    }
  return y + z;
}

void g(float, float);
void f2(void) { g(1.0, 1.0); g(3.3, 3.3); }

On AArch64 I get:

f1:
        sub     w1, w0, #1
        cbz     w0, .L12
        mov     w0, 1
        mov     w2, w0    *** mov w2, 1
        .p2align 2
.L11:
        add     w2, w2, w0
        add     w0, w0, w1
        sub     w1, w1, #1
        cmn     w1, #1
        bne     .L11
        add     w0, w2, w0
        ret
.L12:
        mov     w0, 2
        ret

f2:
        fmov    s1, 1.0e+0
        str     x30, [sp, -16]!
        fmov    s0, s1    *** fmov s0, 1.0
        bl      g
        adrp    x0, .LC1
        ldr     x30, [sp], 16
        ldr     s1, [x0, #:lo12:.LC1]
        fmov    s0, s1    *** ldr s0, [x0, #:lo12:.LC1]
        b       g
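
For illustration only, the preference described in the report (use an immediate move unless the immediate needs multiple instructions or is larger with -Os) could be sketched as a small decision function. This is not GCC code: the struct const_costs, its fields, and prefer_immediate_move are hypothetical names invented for this sketch, and a real implementation would presumably rely on the target's own cost hooks instead.

```c
#include <stdbool.h>

/* Hypothetical cost summary for one constant; not a GCC data structure. */
struct const_costs {
    int imm_insns;      /* instructions needed to load the immediate directly */
    int imm_bytes;      /* encoded size of that immediate move sequence */
    int reg_move_bytes; /* encoded size of one register-to-register move */
};

/* Return true when a fresh immediate move should be emitted instead of
   copying from a register that already holds the same value. */
bool prefer_immediate_move(const struct const_costs *c, bool optimize_size)
{
    /* A multi-instruction immediate is costlier than a single copy. */
    if (c->imm_insns > 1)
        return false;

    /* With -Os, keep the register copy if the immediate encoding is larger. */
    if (optimize_size && c->imm_bytes > c->reg_move_bytes)
        return false;

    /* Otherwise the immediate move avoids a dependency on the source
       register at no extra cost. */
    return true;
}
```

Under these assumptions, the `mov w2, 1` case from f1 (one instruction, same size as `mov w2, w0`) would select the immediate form, while a constant that needs a multi-instruction sequence would keep the register copy.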