On Mon, May 19, 2014 at 1:02 PM, Andrew Haley <a...@redhat.com> wrote:
> On 05/16/2014 05:20 PM, Ian Bolton wrote:
>>> On 05/16/2014 12:05 PM, Kugan wrote:
>>>>
>>>>
>>>> On 16/05/14 20:40, pins...@gmail.com wrote:
>>>>>
>>>>>
>>>>>> On May 16, 2014, at 3:23 AM, Kugan
>>>>>> <kugan.vivekanandara...@linaro.org> wrote:
>>>>>>
>>>>>> I would like to know if there is any way we can use registers from a
>>>>>> particular register class just as spill registers (in places where
>>>>>> the register allocator would normally spill to the stack, and nothing
>>>>>> more), when it can be useful.
>>>>>>
>>>>>> In AArch64, in some cases, compiling with -mgeneral-regs-only produces
>>>>>> better performance compared to not using it. The difference here is
>>>>>> that when -mgeneral-regs-only is not used, floating point registers
>>>>>> are also used in register allocation. Then IRA/LRA has to move them to
>>>>>> core registers before performing operations as shown below.
>>>>>
>>>>> Can you show the code with fp register disabled?  Does it use the
>>>>> stack to spill?  Normally this is due to register to register class
>>>>> costs compared to register to memory move cost.  Also I think it
>>>>> depends on the processor rather than the target.  For thunder, using
>>>>> the fp registers might actually be better than using the stack,
>>>>> depending on whether the stack was in L1.
>>>> Not all the LDR/STR combinations match to fmov. In the testcase I have,
>>>>
>>>> aarch64-none-linux-gnu-gcc sha_dgst.c -O2  -S  -mgeneral-regs-only
>>>> grep -c "ldr" sha_dgst.s
>>>> 50
>>>> grep -c "str" sha_dgst.s
>>>> 42
>>>> grep -c "fmov" sha_dgst.s
>>>> 0
>>>>
>>>> aarch64-none-linux-gnu-gcc sha_dgst.c -O2  -S
>>>> grep -c "ldr" sha_dgst.s
>>>> 42
>>>> grep -c "str" sha_dgst.s
>>>> 31
>>>> grep -c "fmov" sha_dgst.s
>>>> 105
>>>>
>>>> I am not saying that we shouldn't use floating point registers here.
>>>> But from the above, it seems like the register allocator is using them
>>>> more like core registers (even though the cost model has a higher cost)
>>>> and then moving the values to core registers before operations. If that
>>>> is the case, my question is, how do we make this just a spill register
>>>> class, so that we replace ldr/str with an equal number of fmov when it
>>>> is possible?
>>>
>>> I'm also seeing stuff like this:
>>>
>>> => 0x7fb72a0928 <ClassFileParser::parse_constant_pool_entries(int, Thread*)+2500>:
>>>     add      x21, x4, x21, lsl #3
>>> => 0x7fb72a092c <ClassFileParser::parse_constant_pool_entries(int, Thread*)+2504>:
>>>     fmov     w2, s8
>>> => 0x7fb72a0930 <ClassFileParser::parse_constant_pool_entries(int, Thread*)+2508>:
>>>     str      w2, [x21,#88]
>>>
>>> I guess GCC doesn't know how to store an SImode value in an FP register
>>> into memory?  This is 4.8.1.
>>>
>>
>> Please can you try that on trunk and report back.
>
> OK, this is trunk, and I'm no longer seeing that happen.
>
> However, I am seeing:
>
>    0x0000007fb76dc82c <+160>:   adrp    x25, 0x7fb7c80000
>    0x0000007fb76dc830 <+164>:   add     x25, x25, #0x480
>    0x0000007fb76dc834 <+168>:   fmov    d8, x0
>    0x0000007fb76dc838 <+172>:   add     x0, x29, #0x160
>    0x0000007fb76dc83c <+176>:   fmov    d9, x0
>    0x0000007fb76dc840 <+180>:   add     x0, x29, #0xd8
>    0x0000007fb76dc844 <+184>:   fmov    d10, x0
>    0x0000007fb76dc848 <+188>:   add     x0, x29, #0xf8
>    0x0000007fb76dc84c <+192>:   fmov    d11, x0
>
> followed later by:
>
>    0x0000007fb76dd224 <+2712>:  fmov    x0, d9
>    0x0000007fb76dd228 <+2716>:  add     x6, x29, #0x118
>    0x0000007fb76dd22c <+2720>:  str     x20, [x0,w27,sxtw #3]
>    0x0000007fb76dd230 <+2724>:  fmov    x0, d10
>    0x0000007fb76dd234 <+2728>:  str     w28, [x0,w27,sxtw #2]
>    0x0000007fb76dd238 <+2732>:  fmov    x0, d11
>    0x0000007fb76dd23c <+2736>:  str     w19, [x0,w27,sxtw #2]
>
> which seems a bit suboptimal, given that these double registers now have
> to be saved in the prologue.

That looks a bit suspicious. Is there a pre-processed file you can
attach to Bugzilla, along with the command-line options used, so that
someone can take a look?

A few days back I was investigating a testcase from a benchmark that
does SHA2 calculations. According to my notes, I'd been playing with
REGISTER_MOVE_COST and MEMORY_MOVE_COST; additionally, the extra moves
appeared to disappear with -fno-schedule-insns. Remember, however, that
on AArch64 we don't have sched-pressure on by default.
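
To make the role of those two costs concrete, here is a minimal sketch
in C of the kind of comparison involved. It is a toy model only: the
struct, the function name and the field names are hypothetical, and it
does not reproduce what IRA/LRA actually computes. It assumes a spilled
value is written `defs` times and read `uses` times, and picks whichever
home is cheaper.

```c
#include <assert.h>

/* Toy cost table. The field names and any values put in them are
   illustrative assumptions, not taken from GCC's AArch64 backend. */
struct toy_costs {
    int gp_to_fp_move;  /* stands in for REGISTER_MOVE_COST, core -> FP */
    int fp_to_gp_move;  /* stands in for REGISTER_MOVE_COST, FP -> core */
    int store;          /* stands in for MEMORY_MOVE_COST, register -> stack */
    int load;           /* stands in for MEMORY_MOVE_COST, stack -> register */
};

enum spill_choice { SPILL_TO_STACK, SPILL_TO_FP_REG };

/* Pick the cheaper home for a value that is written `defs` times and
   read `uses` times while it lives outside the core registers. */
enum spill_choice choose_spill(const struct toy_costs *c, int defs, int uses)
{
    assert(c != 0 && defs >= 0 && uses >= 0);

    int fp_cost  = defs * c->gp_to_fp_move + uses * c->fp_to_gp_move;
    int mem_cost = defs * c->store + uses * c->load;

    /* Ties go to the stack, so FP registers are used only when
       strictly cheaper. */
    return fp_cost < mem_cost ? SPILL_TO_FP_REG : SPILL_TO_STACK;
}
```

With made-up costs of 2 per cross-class move and 4 per load or store,
one write and three reads cost 8 via an FP register against 16 via the
stack, so the sketch picks the FP register. Raising the move cost to 5
gives 20 against 16 and the choice flips to the stack, which is the sort
of sensitivity that tuning these costs is probing.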


regards
Ramana
