Re: Using particular register class (like floating point registers) as spill register class

Andrew Haley Mon, 19 May 2014 06:11:44 -0700

On 05/19/2014 01:19 PM, Ramana Radhakrishnan wrote:
> On Mon, May 19, 2014 at 1:02 PM, Andrew Haley <[email protected]> wrote:
>> On 05/16/2014 05:20 PM, Ian Bolton wrote:
>>>> On 05/16/2014 12:05 PM, Kugan wrote:
>>>>>
>>>>>
>>>>> On 16/05/14 20:40, [email protected] wrote:
>>>>>>
>>>>>>
>>>>>>> On May 16, 2014, at 3:23 AM, Kugan
>>>> <[email protected]> wrote:
>>>>>>>
>>>>>>> I would like to know if there is any way we can use registers from a
>>>>>>> particular register class just as spill registers (in places where the
>>>>>>> register allocator would normally spill to the stack, and nothing more),
>>>>>>> when that would be useful.
>>>>>>>
>>>>>>> In AArch64, in some cases, compiling with -mgeneral-regs-only produces
>>>>>>> better performance than not using it. The difference here is that
>>>>>>> when -mgeneral-regs-only is not used, floating point registers are also
>>>>>>> used in register allocation. IRA/LRA then has to move the values to core
>>>>>>> registers before performing operations, as shown below.
>>>>>>
>>>>>> Can you show the code with fp registers disabled?  Does it use the
>>>>>> stack to spill?  Normally this is due to register-to-register class
>>>>>> move costs compared to the register-to-memory move cost.  Also I think it
>>>>>> depends on the processor rather than the target.  For thunder, using the fp
>>>>>> registers might actually be better than using the stack, depending on
>>>>>> whether the stack was in L1.
>>>>> Not all the LDR/STR combinations map to fmov. In the testcase I have,
>>>>>
>>>>> aarch64-none-linux-gnu-gcc sha_dgst.c -O2  -S  -mgeneral-regs-only
>>>>> grep -c "ldr" sha_dgst.s
>>>>> 50
>>>>> grep -c "str" sha_dgst.s
>>>>> 42
>>>>> grep -c "fmov" sha_dgst.s
>>>>> 0
>>>>>
>>>>> aarch64-none-linux-gnu-gcc sha_dgst.c -O2  -S
>>>>> grep -c "ldr" sha_dgst.s
>>>>> 42
>>>>> grep -c "str" sha_dgst.s
>>>>> 31
>>>>> grep -c "fmov" sha_dgst.s
>>>>> 105
>>>>>
>>>>> I am not saying that we shouldn't use floating point registers here. But
>>>>> from the above, it seems like the register allocator is using them more
>>>>> like core registers (even though the cost model gives them a higher cost)
>>>>> and then moving the values to core registers before operations. If that is
>>>>> the case, my question is: how do we make this just a spill register class,
>>>>> so that we replace ldr/str with an equal number of fmov when possible?
>>>>
>>>> I'm also seeing stuff like this:
>>>>
>>>> => 0x7fb72a0928 <ClassFileParser::parse_constant_pool_entries(int, Thread*)+2500>:
>>>>     add      x21, x4, x21, lsl #3
>>>> => 0x7fb72a092c <ClassFileParser::parse_constant_pool_entries(int, Thread*)+2504>:
>>>>     fmov     w2, s8
>>>> => 0x7fb72a0930 <ClassFileParser::parse_constant_pool_entries(int, Thread*)+2508>:
>>>>     str      w2, [x21,#88]
>>>>
>>>> I guess GCC doesn't know how to store an SImode value in an FP register
>>>> into memory?  This is 4.8.1.
>>>>
>>>
>>> Please can you try that on trunk and report back.
>>
>> OK, this is trunk, and I'm no longer seeing that happen.
>>
>> However, I am seeing:
>>
>>    0x0000007fb76dc82c <+160>:   adrp    x25, 0x7fb7c80000
>>    0x0000007fb76dc830 <+164>:   add     x25, x25, #0x480
>>    0x0000007fb76dc834 <+168>:   fmov    d8, x0
>>    0x0000007fb76dc838 <+172>:   add     x0, x29, #0x160
>>    0x0000007fb76dc83c <+176>:   fmov    d9, x0
>>    0x0000007fb76dc840 <+180>:   add     x0, x29, #0xd8
>>    0x0000007fb76dc844 <+184>:   fmov    d10, x0
>>    0x0000007fb76dc848 <+188>:   add     x0, x29, #0xf8
>>    0x0000007fb76dc84c <+192>:   fmov    d11, x0
>>
>> followed later by:
>>
>>    0x0000007fb76dd224 <+2712>:  fmov    x0, d9
>>    0x0000007fb76dd228 <+2716>:  add     x6, x29, #0x118
>>    0x0000007fb76dd22c <+2720>:  str     x20, [x0,w27,sxtw #3]
>>    0x0000007fb76dd230 <+2724>:  fmov    x0, d10
>>    0x0000007fb76dd234 <+2728>:  str     w28, [x0,w27,sxtw #2]
>>    0x0000007fb76dd238 <+2732>:  fmov    x0, d11
>>    0x0000007fb76dd23c <+2736>:  str     w19, [x0,w27,sxtw #2]
>>
>> which seems a bit suboptimal, given that these double registers now have
>> to be saved in the prologue.
> 
> That looks a bit suspicious. Is there a preprocessed file you can
> put on Bugzilla for someone to take a look at, along with the command-line
> options etc.?


I'll try, but I'm using precompiled headers so it's a bit tricky.  I'll let
you know.

Andrew.
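
For anyone reproducing the ldr/str/fmov counts quoted above: `grep -c "ldr"` counts every line that contains the substring, including labels, comments and string directives. The following is a minimal sketch, not part of the original thread, that counts only lines whose instruction mnemonic starts with a given prefix. The function name `count_mnemonics` and the parsing rules are assumptions made for illustration; it expects GCC-style AArch64 assembly with `//` comments.

```python
import re

# Optional "label:" prefix, then the mnemonic, which must be followed by
# whitespace or end of line (so a bare label such as "strcpy:" is not counted).
_INSN = re.compile(r"^\s*(?:[\w.$]+:)?\s*([a-z][a-z0-9.]*)(?:\s|$)")


def count_mnemonics(asm_text, prefixes=("ldr", "str", "fmov")):
    """Count instruction lines whose mnemonic starts with each prefix."""
    counts = {p: 0 for p in prefixes}
    for line in asm_text.splitlines():
        line = line.split("//", 1)[0]  # drop trailing line comments
        m = _INSN.match(line)
        if m is None:
            continue  # blank line, directive (".text"), or bare label
        mnemonic = m.group(1)
        for p in prefixes:
            if mnemonic.startswith(p):  # "ldr" also covers ldrb, ldrh, ...
                counts[p] += 1
    return counts


if __name__ == "__main__":
    import sys

    with open(sys.argv[1]) as f:
        print(count_mnemonics(f.read()))
```

Run it on the `.s` file produced with and without -mgeneral-regs-only to compare. Like the grep counts, this ignores ldp/stp, so it is only a rough indicator of spill traffic.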
