On Tue, 2011-04-19 at 17:41 +0800, Guozhi Wei wrote:
> Reload pass tries to determine the stack frame, so it needs to check the
> push/pop lr optimization opportunity. One of the criteria is if there is any
> far jump inside the function. Unfortunately at this time gcc can't decide each
> instruction's length and basic block layout, so it can't know the offset of
> a jump. To be conservative it assumes every jump is a far jump. So any jump
> in a function will prevent this push/pop lr optimization.
> 
> To enable the push/pop lr optimization in reload pass, I compute the possible
> maximum length of the function body. If the length is not large enough, far
> jump is not necessary, so we can safely do push/pop lr optimization.
> 
> Tested on arm qemu with options -march=armv5te -mthumb, without regression.
> 
> This patch is for google/main.
> 
> 2011-04-19  Guozhi Wei  <car...@google.com>
> 
>       Google ref 40255.
>       * gcc/config/arm/arm.c (SHORTEST_FAR_JUMP_LENGTH): New constant.
>       (estimate_function_length): New function.
>       (thumb_far_jump_used_p): No far jump is needed in short function.
> 

Setting aside for the moment Richi's issue with hot/cold sections, this
isn't safe.  Firstly, get_attr_length() doesn't return the worst-case
length; secondly, it doesn't take into account the size of reload
insns that are still on the reloads stack -- these are only emitted
right at the end of the reload pass.  Both of these would need to be
addressed before this can be safely done.

It's worth noting here that in the dim and distant past we used to try
to estimate the size of the function and eliminate redundant saves of
R14, but the code had to be removed because it was too fragile.  It
looks like some vestiges of that code are still in the compiler, though.

A slightly less optimistic approach, but one that is much safer, is to
scan the function after reload has completed and see if we can avoid
having to push LR.  We can do this if:

- The function makes no calls
- The function saves nothing on the stack other than r14
- It's small enough (by this point we can use get_attr_length)
- R14 is only modified by internal jump instructions
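
As a rough illustration only, the four conditions above could be checked
along these lines.  This is a self-contained toy model, not GCC code:
struct toy_insn, TOY_LR_MASK, toy_can_skip_lr_push and the far_jump_limit
parameter are all hypothetical names, and the real check would walk the
insn stream and use get_attr_length() instead.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for an instruction; not GCC's rtx. */
enum toy_insn_kind { TOY_OTHER, TOY_JUMP, TOY_CALL };

struct toy_insn {
    enum toy_insn_kind kind;
    unsigned length;     /* length in bytes, assumed final (post-reload) */
    bool modifies_lr;    /* true if the insn writes r14 */
};

/* Bit for r14 in a (hypothetical) mask of registers saved on the stack. */
#define TOY_LR_MASK (1u << 14)

/* Return true if the LR push could be dropped under the four conditions.
   far_jump_limit is an assumed byte count below which no far jump can be
   needed; the caller has to supply a value that is safe for the target.  */
static bool
toy_can_skip_lr_push(const struct toy_insn *insns, size_t n,
                     unsigned saved_regs_mask, unsigned far_jump_limit)
{
    unsigned total = 0;

    assert(insns != NULL || n == 0);

    /* Nothing other than r14 may be saved on the stack. */
    if (saved_regs_mask & ~TOY_LR_MASK)
        return false;

    for (size_t i = 0; i < n; i++) {
        /* The function must make no calls. */
        if (insns[i].kind == TOY_CALL)
            return false;
        /* r14 may only be modified by internal jump instructions. */
        if (insns[i].modifies_lr && insns[i].kind != TOY_JUMP)
            return false;
        total += insns[i].length;
    }

    /* The function must be small enough that no far jump is required. */
    return total < far_jump_limit;
}
```

The point of the sketch is that every test is made once, on final
instruction lengths, so a failing condition simply keeps the push.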

There's already some code to try to do this in the ARM back-end (look
for lr_save_eliminated), but it's probably not doing its job properly
because it tries to cache the result early on (it's costly to work this
out) and at the time it's first called we cannot assert that the register
won't be live.
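
To make the caching problem concrete, here is a toy sketch of one way to
avoid it: answer conservatively, and refuse to cache, until reload has
finished.  This is not the existing ARM back-end code; struct
toy_function_state and toy_lr_save_eliminated are hypothetical, and the
lr_live flag stands in for whatever liveness query the real code would use.

```c
#include <stdbool.h>

/* Hypothetical per-function state; not GCC's machine_function. */
struct toy_function_state {
    bool reload_completed;  /* stand-in for "reload has finished" */
    bool lr_live;           /* whether r14 is live; may change until then */
    int cached;             /* -1 = not computed, 0 = keep save, 1 = eliminate */
};

/* Return true if the LR save can be eliminated.  Before reload completes
   the answer is conservatively "no" and nothing is cached, so an early
   query cannot freeze a result that later becomes wrong.  */
static bool
toy_lr_save_eliminated(struct toy_function_state *f)
{
    if (f->cached >= 0)
        return f->cached == 1;

    if (!f->reload_completed)
        return false;       /* too early to know; do not cache */

    f->cached = f->lr_live ? 0 : 1;
    return f->cached == 1;
}
```

Early callers pay the cost of a conservative answer, but the expensive
computation still happens only once, at a point where it can be trusted.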

R.


