On 05/19/2014 01:19 PM, Ramana Radhakrishnan wrote:
> On Mon, May 19, 2014 at 1:02 PM, Andrew Haley <a...@redhat.com> wrote:
>> On 05/16/2014 05:20 PM, Ian Bolton wrote:
>>>> On 05/16/2014 12:05 PM, Kugan wrote:
>>>>>
>>>>> On 16/05/14 20:40, pins...@gmail.com wrote:
>>>>>>
>>>>>>> On May 16, 2014, at 3:23 AM, Kugan <kugan.vivekanandara...@linaro.org> wrote:
>>>>>>>
>>>>>>> I would like to know if there is any way we can use registers from a
>>>>>>> particular register class just as spill registers (in places where the
>>>>>>> register allocator would normally spill to stack and nothing more), when
>>>>>>> it can be useful.
>>>>>>>
>>>>>>> In AArch64, in some cases, compiling with -mgeneral-regs-only produces
>>>>>>> better performance compared to not using it. The difference here is that
>>>>>>> when -mgeneral-regs-only is not used, floating point registers are also
>>>>>>> used in register allocation. Then IRA/LRA has to move them to core
>>>>>>> registers before performing operations as shown below.
>>>>>>
>>>>>> Can you show the code with fp register disabled? Does it use the
>>>>>> stack to spill? Normally this is due to register to register class
>>>>>> costs compared to register to memory move cost. Also I think it
>>>>>> depends on the processor rather than the target. For thunder, using the
>>>>>> fp registers might actually be better than using the stack depending if
>>>>>> the stack was in L1.
>>>>>
>>>>> Not all the LDR/STR combinations match to fmov. In the testcase I have,
>>>>>
>>>>> aarch64-none-linux-gnu-gcc sha_dgst.c -O2 -S -mgeneral-regs-only
>>>>> grep -c "ldr" sha_dgst.s
>>>>> 50
>>>>> grep -c "str" sha_dgst.s
>>>>> 42
>>>>> grep -c "fmov" sha_dgst.s
>>>>> 0
>>>>>
>>>>> aarch64-none-linux-gnu-gcc sha_dgst.c -O2 -S
>>>>> grep -c "ldr" sha_dgst.s
>>>>> 42
>>>>> grep -c "str" sha_dgst.s
>>>>> 31
>>>>> grep -c "fmov" sha_dgst.s
>>>>> 105
>>>>>
>>>>> I am not saying that we shouldn't use floating point registers here. But
>>>>> from the above, it seems like the register allocator is using them more
>>>>> like core registers (even though the cost model has higher cost) and then
>>>>> moving the values to core registers before operations. If that is the
>>>>> case, my question is, how do we just make this a spill register class
>>>>> so that we will replace ldr/str with an equal number of fmov when it is
>>>>> possible.
>>>>
>>>> I'm also seeing stuff like this:
>>>>
>>>> => 0x7fb72a0928 <ClassFileParser::parse_constant_pool_entries(int, Thread*)+2500>:
>>>>     add x21, x4, x21, lsl #3
>>>> => 0x7fb72a092c <ClassFileParser::parse_constant_pool_entries(int, Thread*)+2504>:
>>>>     fmov w2, s8
>>>> => 0x7fb72a0930 <ClassFileParser::parse_constant_pool_entries(int, Thread*)+2508>:
>>>>     str w2, [x21,#88]
>>>>
>>>> I guess GCC doesn't know how to store an SImode value in an FP register
>>>> into memory? This is 4.8.1.
>>>>
>>>
>>> Please can you try that on trunk and report back.
>>
>> OK, this is trunk, and I'm no longer seeing that happen.
>>
>> However, I am seeing:
>>
>>    0x0000007fb76dc82c <+160>:   adrp x25, 0x7fb7c80000
>>    0x0000007fb76dc830 <+164>:   add  x25, x25, #0x480
>>    0x0000007fb76dc834 <+168>:   fmov d8, x0
>>    0x0000007fb76dc838 <+172>:   add  x0, x29, #0x160
>>    0x0000007fb76dc83c <+176>:   fmov d9, x0
>>    0x0000007fb76dc840 <+180>:   add  x0, x29, #0xd8
>>    0x0000007fb76dc844 <+184>:   fmov d10, x0
>>    0x0000007fb76dc848 <+188>:   add  x0, x29, #0xf8
>>    0x0000007fb76dc84c <+192>:   fmov d11, x0
>>
>> followed later by:
>>
>>    0x0000007fb76dd224 <+2712>:  fmov x0, d9
>>    0x0000007fb76dd228 <+2716>:  add  x6, x29, #0x118
>>    0x0000007fb76dd22c <+2720>:  str  x20, [x0,w27,sxtw #3]
>>    0x0000007fb76dd230 <+2724>:  fmov x0, d10
>>    0x0000007fb76dd234 <+2728>:  str  w28, [x0,w27,sxtw #2]
>>    0x0000007fb76dd238 <+2732>:  fmov x0, d11
>>    0x0000007fb76dd23c <+2736>:  str  w19, [x0,w27,sxtw #2]
>>
>> which seems a bit suboptimal, given that these double registers now have
>> to be saved in the prologue.
>
> That looks a bit suspicious - Is there a pre-processed file you can
> put on to bugzilla for someone to take a look at with command line
> options et al ?

I'll try, but I'm using precompiled headers so it's a bit tricky.
I'll let you know.

Andrew.
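
For anyone wanting to repeat the ldr/str/fmov comparison quoted above on their own test case, a small script may be handier than running grep three times per build. This is only a sketch under assumptions: count_mnemonics and the sample text are made up for illustration, and unlike grep -c (which counts every line containing the substring) it counts by instruction mnemonic prefix, so the numbers may differ slightly from the ones in the thread.

```python
from collections import Counter


def count_mnemonics(asm_text, prefixes=("ldr", "str", "fmov")):
    """Count instructions whose mnemonic starts with each given prefix.

    Hypothetical helper, not part of GCC or binutils.
    """
    counts = Counter({p: 0 for p in prefixes})
    for line in asm_text.splitlines():
        # Drop trailing comments; GNU as for AArch64 uses "//".
        code = line.split("//", 1)[0].strip()
        # Skip blank lines, directives (.text, .cfi_*) and bare labels (foo:).
        if not code or code.startswith(".") or code.endswith(":"):
            continue
        mnemonic = code.split()[0].lower()
        for p in prefixes:
            if mnemonic.startswith(p):
                counts[p] += 1
    return dict(counts)


# Illustrative input only, loosely modelled on the disassembly in this thread.
sample = """
foo:
        fmov    d8, x0
        add     x0, x29, #0x160
        fmov    x0, d9
        str     x20, [x0, w27, sxtw #3]
        ldr     w2, [x21, #88]
        ret
"""

print(count_mnemonics(sample))
```

Running it over the .s files produced with and without -mgeneral-regs-only would give the two sets of counts side by side. Note that a prefix match on "str" also counts strb/strh, and that ldp/stp are not counted, which is the same blind spot the grep commands have.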