I would like to know whether there is any way to use registers from a
particular register class purely as spill registers (that is, only in
places where the register allocator would otherwise spill to the stack),
when that is useful.
On AArch64, compiling with -mgeneral-regs-only in some cases produces
faster code than compiling without it. The difference is that without
-mgeneral-regs-only, floating-point registers are also used in register
allocation. IRA/LRA then has to move the values back to core registers
before operating on them, as shown below (the moves are marked with <--).
.....
fmov s1, w8 <--
mov w21, 49622
movk w21, 0xca62, lsl 16
add w21, w16, w21
add w21, w21, w2
eor w10, w0, w10
add w10, w21, w10
ror w8, w7, 27
add w7, w10, w8
ror w7, w7, 27
fmov w0, s1 <--
add w7, w0, w7
add w13, w13, w7
fmov w0, s4 <--
add w0, w0, w20
fmov s4, w0 <--
ror w18, w18, 2
fmov w0, s2 <--
add w0, w0, w18
fmov s2, w0 <--
add w12, w12, w27
add w14, w14, w15
mov w15, w24
fmov x0, d3 <--
subs x0, x0, #1
fmov d3, x0 <--
bne .L2
fmov x0, d0 <--
.....
In this case, the allocno costs that IRA calculates from the cost model
supplied by the back-end look like this:
a0(r667,l0) costs: GENERAL_REGS:0,0 FP_LO_REGS:3960,3960
FP_REGS:3960,3960 ALL_REGS:3960,3960 MEM:3960,3960
Thus, changing the cost of the floating-point register classes is not
going to help. If I increase it further, the register allocator will just
spill these live ranges to memory and ignore the floating-point registers
altogether.
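
To make that reasoning concrete, here is a toy sketch in Python. It is
not IRA's actual algorithm, only a "pick the cheapest available home"
model under stated assumptions: the helper name cheapest_home, the
tie-break in favour of registers, and the raised cost of 5000 are all
hypothetical. The 0 and 3960 values are taken from the dump above.

```python
# Toy model only: IRA's real algorithm (colouring, conflicts, per-hard-register
# costs) is far more involved. The 0/3960 values come from the dump above.

def cheapest_home(costs, available):
    """Return the cheapest location among those still available.

    min() keeps the first minimal element, so listing register classes
    before MEM models a preference for registers on ties (an assumption).
    """
    return min(available, key=lambda loc: costs[loc])


costs = {"GENERAL_REGS": 0, "FP_REGS": 3960, "MEM": 3960}

# A general register is free: the allocno gets one.
print(cheapest_home(costs, ["GENERAL_REGS", "FP_REGS", "MEM"]))

# General registers are exhausted: FP_REGS ties with MEM and wins the tie,
# which corresponds to the fmov traffic in the listing.
print(cheapest_home(costs, ["FP_REGS", "MEM"]))

# Raising the FP cost (5000 is a made-up value) only makes memory the
# cheapest remaining option; it does not turn FP registers into spill slots.
raised = dict(costs, FP_REGS=5000)
print(cheapest_home(raised, ["FP_REGS", "MEM"]))
```

In this model no cost value expresses "use a floating-point register
only when the alternative is a stack slot", which is what I am after.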
Is there any other back-end in GCC that does something to improve cases
like this and that I could use as a reference?