> On 05/16/2014 12:05 PM, Kugan wrote:
> >
> > On 16/05/14 20:40, pins...@gmail.com wrote:
> >>
> >>> On May 16, 2014, at 3:23 AM, Kugan <kugan.vivekanandara...@linaro.org> wrote:
> >>>
> >>> I would like to know if there is any way we can use registers from a
> >>> particular register class just as spill registers (in places where the
> >>> register allocator would normally spill to stack and nothing more), when
> >>> it can be useful.
> >>>
> >>> In AArch64, in some cases, compiling with -mgeneral-regs-only produces
> >>> better performance compared to not using it. The difference here is that
> >>> when -mgeneral-regs-only is not used, floating point registers are also
> >>> used in register allocation. Then IRA/LRA has to move them to core
> >>> registers before performing operations as shown below.
> >>
> >> Can you show the code with fp register disabled? Does it use the stack
> >> to spill? Normally this is due to register to register class costs
> >> compared to register to memory move cost. Also I think it depends on
> >> the processor rather than the target. For thunder, using the fp
> >> registers might actually be better than using the stack depending if
> >> the stack was in L1.
> >
> > Not all the LDR/STR combinations match to fmov. In the testcase I have,
> >
> > aarch64-none-linux-gnu-gcc sha_dgst.c -O2 -S -mgeneral-regs-only
> > grep -c "ldr" sha_dgst.s
> > 50
> > grep -c "str" sha_dgst.s
> > 42
> > grep -c "fmov" sha_dgst.s
> > 0
> >
> > aarch64-none-linux-gnu-gcc sha_dgst.c -O2 -S
> > grep -c "ldr" sha_dgst.s
> > 42
> > grep -c "str" sha_dgst.s
> > 31
> > grep -c "fmov" sha_dgst.s
> > 105
> >
> > I am not saying that we shouldn't use floating point registers here. But
> > from the above, it seems like the register allocator is using them more
> > like core registers (even though the cost model has higher cost) and then
> > moving the values to core registers before operations. If that is the
> > case, my question is, how do we just make this a spill register class
> > so that we will replace ldr/str with an equal number of fmov when it is
> > possible.
>
> I'm also seeing stuff like this:
>
> => 0x7fb72a0928 <ClassFileParser::parse_constant_pool_entries(int, Thread*)+2500>:
>     add x21, x4, x21, lsl #3
> => 0x7fb72a092c <ClassFileParser::parse_constant_pool_entries(int, Thread*)+2504>:
>     fmov w2, s8
> => 0x7fb72a0930 <ClassFileParser::parse_constant_pool_entries(int, Thread*)+2508>:
>     str w2, [x21,#88]
>
> I guess GCC doesn't know how to store an SImode value in an FP register into
> memory? This is 4.8.1.
>
Please can you try that on trunk and report back.

Thanks,
Ian