I would like to know whether there is any way to use registers from a particular register class purely as spill registers (that is, only in places where the register allocator would otherwise spill to the stack, and nowhere else), in cases where this is useful.

On AArch64, compiling with -mgeneral-regs-only in some cases produces better-performing code than compiling without it. The difference is that when -mgeneral-regs-only is not used, floating point registers are also available to the register allocator. IRA/LRA then has to move the values held in them to core registers before performing operations, as shown below (see the fmov instructions marked with <--).

    .....
    fmov    s1, w8              <--
    mov     w21, 49622
    movk    w21, 0xca62, lsl 16
    add     w21, w16, w21
    add     w21, w21, w2
    eor     w10, w0, w10
    add     w10, w21, w10
    ror     w8, w7, 27
    add     w7, w10, w8
    ror     w7, w7, 27
    fmov    w0, s1              <--
    add     w7, w0, w7
    add     w13, w13, w7
    fmov    w0, s4              <--
    add     w0, w0, w20
    fmov    s4, w0              <--
    ror     w18, w18, 2
    fmov    w0, s2              <--
    add     w0, w0, w18
    fmov    s2, w0              <--
    add     w12, w12, w27
    add     w14, w14, w15
    mov     w15, w24
    fmov    x0, d3              <--
    subs    x0, x0, #1
    fmov    d3, x0              <--
    bne     .L2
    fmov    x0, d0              <--
    .....

In this case, the allocno costs calculated by IRA, based on the cost model supplied by the back-end, look like this:

    a0(r667,l0) costs: GENERAL_REGS:0,0 FP_LO_REGS:3960,3960 FP_REGS:3960,3960 ALL_REGS:3960,3960 MEM:3960,3960

The floating point classes already cost as much as MEM here, so changing the cost of the floating point register class is not going to help. If I increase it further, the register allocator will just spill these live ranges to memory and ignore the floating point registers altogether.

Is there any other back-end in GCC that does anything to improve cases like this, which I could refer to?

Thanks in advance,
Kugan