On 2014-05-16, 6:23 AM, Kugan wrote:
I would like to know if there is any way we can use registers from a
particular register class just as spill registers (in places where the
register allocator would normally spill to the stack, and nothing more), when
it can be useful.

In AArch64, in some cases, compiling with -mgeneral-regs-only produces
better performance compared to not using it. The difference here is that
when -mgeneral-regs-only is not used, floating point registers are also
used in register allocation. Then IRA/LRA has to move them to core
registers before performing operations, as shown below.

.....
        fmov    s1, w8                 <--
        mov     w21, 49622
        movk    w21, 0xca62, lsl 16
        add     w21, w16, w21
        add     w21, w21, w2
        eor     w10, w0, w10
        add     w10, w21, w10
        ror     w8, w7, 27
        add     w7, w10, w8
        ror     w7, w7, 27
        fmov    w0, s1                 <--
        add     w7, w0, w7
        add     w13, w13, w7
        fmov    w0, s4                 <--
        add     w0, w0, w20
        fmov    s4, w0                 <--
        ror     w18, w18, 2
        fmov    w0, s2                 <--
        add     w0, w0, w18
        fmov    s2, w0                 <--
        add     w12, w12, w27
        add     w14, w14, w15
        mov     w15, w24
        fmov    x0, d3                 <--
        subs    x0, x0, #1
        fmov    d3, x0                 <--
        bne     .L2
        fmov    x0, d0                 <--

  .....

In this case, the costs for allocnos calculated by IRA, based on the cost
model supplied by the back-end, are like:
a0(r667,l0) costs: GENERAL_REGS:0,0 FP_LO_REGS:3960,3960
FP_REGS:3960,3960 ALL_REGS:3960,3960 MEM:3960,3960

Thus, changing the cost of the floating point register class is not going to
help. If I increase it further, the register allocator will just spill these
live ranges to memory and will ignore floating point registers in this case.

Is there any other back-end in gcc that does anything to improve cases
like this, that I can refer to?


There is a target hook spill_class. You can see how it can be defined in i386.c. Instead of memory, the pseudos are stored in vector regs. It is profitable for modern Intel processors, which have a fast path between general regs and SSE regs. It also results in smaller code, as movd is shorter than ld/st insns.

So you can increase the costs of fp regs and define the hook; then fp regs will be used for pseudos that do not get general regs, and fmov will be generated instead of ld/st.
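
To make the idea concrete, here is a rough stand-alone sketch of the decision such a hook makes. It is not GCC source: the enums, the function name model_spill_class, and the have_fp flag are all hypothetical stand-ins so that the fragment compiles on its own. In a real back-end the function would take the target's own register class and machine mode types and, as far as I know, would be registered through the TARGET_SPILL_CLASS macro, as i386.c does.

```c
/* Stand-alone model of a spill_class-style hook.  NOT GCC source:
   every type and name below is a hypothetical stand-in.  */
#include <stdbool.h>

enum reg_class_model { NO_REGS_M, GENERAL_REGS_M, FP_REGS_M };
enum mode_model { SI_M, DI_M, TI_M };  /* 32-, 64-, 128-bit integer modes */

/* Return the register class to try for a pseudo that did not get a
   register of class RCLASS, or NO_REGS_M to spill to memory as usual.
   HAVE_FP models "the target has usable floating point registers".  */
static enum reg_class_model
model_spill_class (enum reg_class_model rclass, enum mode_model mode,
                   bool have_fp)
{
  /* Only integer pseudos that fit in one fp register are redirected;
     the move then becomes an fmov instead of a load/store pair.  */
  if (have_fp && rclass == GENERAL_REGS_M
      && (mode == SI_M || mode == DI_M))
    return FP_REGS_M;

  /* Everything else keeps the normal behaviour: spill to the stack.  */
  return NO_REGS_M;
}
```

With this model, a 32-bit or 64-bit pseudo that wanted a general reg is sent to the fp class, while a 128-bit one, or anything on a configuration without fp regs, still goes to memory.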

I am working on improving spilling general regs into vector ones. So I hope there will be more cases when GCC does it.


