> On 05/16/2014 12:05 PM, Kugan wrote:
> >
> >
> > On 16/05/14 20:40, pins...@gmail.com wrote:
> >>
> >>
> >>> On May 16, 2014, at 3:23 AM, Kugan <kugan.vivekanandara...@linaro.org> wrote:
> >>>
> >>> I would like to know if there is any way we can use registers from a
> >>> particular register class just as spill registers (in places where the
> >>> register allocator would normally spill to the stack, and nothing more),
> >>> when it can be useful.
> >>>
> >>> In AArch64, in some cases, compiling with -mgeneral-regs-only produces
> >>> better performance compared to not using it. The difference here is that
> >>> when -mgeneral-regs-only is not used, floating point registers are also
> >>> used in register allocation. Then IRA/LRA has to move them to core
> >>> registers before performing operations as shown below.
> >>
> >> Can you show the code with fp registers disabled?  Does it use the
> >> stack to spill?  Normally this is due to register to register class
> >> costs compared to register to memory move cost.  Also I think it
> >> depends on the processor rather than the target.  For thunder, using the
> >> fp registers might actually be better than using the stack, depending on
> >> whether the stack was in L1.
> > Not all the LDR/STR combinations match to fmov. In the testcase I have,
> >
> > aarch64-none-linux-gnu-gcc sha_dgst.c -O2  -S  -mgeneral-regs-only
> > grep -c "ldr" sha_dgst.s
> > 50
> > grep -c "str" sha_dgst.s
> > 42
> > grep -c "fmov" sha_dgst.s
> > 0
> >
> > aarch64-none-linux-gnu-gcc sha_dgst.c -O2  -S
> > grep -c "ldr" sha_dgst.s
> > 42
> > grep -c "str" sha_dgst.s
> > 31
> > grep -c "fmov" sha_dgst.s
> > 105
> >
> > I am not saying that we shouldn't use floating point registers here. But
> > from the above, it seems like the register allocator is using them more
> > like core registers (even though the cost model has a higher cost) and then
> > moving the values to core registers before operations. If that is the
> > case, my question is, how do we make this just a spill register class
> > so that we replace ldr/str with an equal number of fmov when it is
> > possible?
> 
> I'm also seeing stuff like this:
> 
> => 0x7fb72a0928 <ClassFileParser::parse_constant_pool_entries(int, Thread*)+2500>:
>     add       x21, x4, x21, lsl #3
> => 0x7fb72a092c <ClassFileParser::parse_constant_pool_entries(int, Thread*)+2504>:
>     fmov      w2, s8
> => 0x7fb72a0930 <ClassFileParser::parse_constant_pool_entries(int, Thread*)+2508>:
>     str       w2, [x21,#88]
> 
> I guess GCC doesn't know how to store an SImode value in an FP register
> into memory?  This is 4.8.1.
> 

Please can you try that on trunk and report back.

Thanks,
Ian
 


