On 05/16/2014 12:05 PM, Kugan wrote:
>
> On 16/05/14 20:40, pins...@gmail.com wrote:
>>
>>> On May 16, 2014, at 3:23 AM, Kugan <kugan.vivekanandara...@linaro.org>
>>> wrote:
>>>
>>> I would like to know if there is any way we can use registers from a
>>> particular register class just as spill registers (in places where the
>>> register allocator would normally spill to stack and nothing more), when
>>> it can be useful.
>>>
>>> In AArch64, in some cases, compiling with -mgeneral-regs-only produces
>>> better performance compared to not using it. The difference here is that
>>> when -mgeneral-regs-only is not used, floating point registers are also
>>> used in register allocation. Then IRA/LRA has to move them to core
>>> registers before performing operations as shown below.
>>
>> Can you show the code with fp registers disabled? Does it use the stack to
>> spill? Normally this is due to register to register class costs compared to
>> register to memory move cost. Also I think it depends on the processor
>> rather than the target. For thunder, using the fp registers might actually
>> be better than using the stack depending on whether the stack was in L1.
>
> Not all the LDR/STR combinations match to fmov. In the testcase I have,
>
> aarch64-none-linux-gnu-gcc sha_dgst.c -O2 -S -mgeneral-regs-only
> grep -c "ldr" sha_dgst.s
> 50
> grep -c "str" sha_dgst.s
> 42
> grep -c "fmov" sha_dgst.s
> 0
>
> aarch64-none-linux-gnu-gcc sha_dgst.c -O2 -S
> grep -c "ldr" sha_dgst.s
> 42
> grep -c "str" sha_dgst.s
> 31
> grep -c "fmov" sha_dgst.s
> 105
>
> I am not saying that we shouldn’t use floating point registers here. But
> from the above, it seems like the register allocator is using them more
> like core registers (even though the cost model has a higher cost) and then
> moving the values to core registers before operations. If that is the
> case, my question is, how do we just make this a spill register class
> so that we will replace ldr/str with an equal number of fmov when it is
> possible.
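
One caveat on the grep counts quoted above: grep -c counts matching
lines, and "ldr"/"str" are substring patterns, so they also hit ldrb,
strh, .string directives, and any symbol name containing those letters.
A rough sketch of an exact-mnemonic count follows. It is in Python,
count_mnemonics is a hypothetical helper, and the sample text is made up
for illustration (it is not taken from sha_dgst.s). It assumes GCC-style
output with labels and directives on their own lines and "//" comments.

```python
from collections import Counter


def count_mnemonics(asm_text):
    """Count instruction mnemonics in GCC-style AArch64 assembly text.

    Assumes labels and directives sit on their own lines and that
    comments start with "//" (both are assumptions about the input).
    """
    counts = Counter()
    for raw in asm_text.splitlines():
        line = raw.split("//", 1)[0].strip()  # drop trailing comment
        if not line or line.startswith(".") or line.endswith(":"):
            continue  # blank line, assembler directive, or label
        counts[line.split()[0].lower()] += 1  # first token is the mnemonic
    return counts


# Made-up sample, only to show the difference from a substring match.
sample = """\
        .text
foo:
        ldr     w2, [x21, 88]
        ldrb    w3, [x0]
        fmov    w2, s8
        str     w2, [x21, 88]
        .string "str"
"""

counts = count_mnemonics(sample)
print("ldr:", counts["ldr"], "str:", counts["str"], "fmov:", counts["fmov"])
```

For the real file, read sha_dgst.s and pass its contents to
count_mnemonics. Whether the ldr/str totals above change much is not
something I have checked.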

I'm also seeing stuff like this:

=> 0x7fb72a0928 <ClassFileParser::parse_constant_pool_entries(int, Thread*)+2500>:	add	x21, x4, x21, lsl #3
=> 0x7fb72a092c <ClassFileParser::parse_constant_pool_entries(int, Thread*)+2504>:	fmov	w2, s8
=> 0x7fb72a0930 <ClassFileParser::parse_constant_pool_entries(int, Thread*)+2508>:	str	w2, [x21,#88]

I guess GCC doesn't know how to store an SImode value in an FP register
into memory?  This is 4.8.1.

Andrew.