On Wed, 2011-04-20 at 16:26 +0800, Carrot Wei wrote:
> On Tue, Apr 19, 2011 at 8:55 PM, Richard Earnshaw <rearn...@arm.com> wrote:
> >
> > On Tue, 2011-04-19 at 17:41 +0800, Guozhi Wei wrote:
> >> Reload pass tries to determine the stack frame, so it needs to check the
> >> push/pop lr optimization opportunity. One of the criteria is if there is
> >> any far jump inside the function. Unfortunately at this time gcc can't
> >> decide each instruction's length and basic block layout, so it can't know
> >> the offset of a jump. To be conservative it assumes every jump is a far
> >> jump. So any jump in a function will prevent this push/pop lr
> >> optimization.
> >>
> >> To enable the push/pop lr optimization in reload pass, I compute the
> >> possible maximum length of the function body. If the length is not large
> >> enough, far jump is not necessary, so we can safely do push/pop lr
> >> optimization.
> >>
> >> Tested on arm qemu with options -march=armv5te -mthumb, without regression.
> >>
> >> This patch is for google/main.
> >>
> >> 2011-04-19 Guozhi Wei <car...@google.com>
> >>
> >> Google ref 40255.
> >> * gcc/config/arm/arm.c (SHORTEST_FAR_JUMP_LENGTH): New constant.
> >> (estimate_function_length): New function.
> >> (thumb_far_jump_used_p): No far jump is needed in short function.
> >>
> >
> > Setting aside for the moment Richi's issue with hot/cold sections, this
> > isn't safe. Firstly get_attr_length() doesn't return the worst case
> > length; and secondly, it doesn't take into account the size of reload
> > insns that are still on the reloads stack -- these are only emitted
> > right at the end of the reload pass. Both of these would need to be
> > addressed before this can be safely done.
> >
> > It's worth noting here that in the dim and distant past we used to try
> > to estimate the size of the function and eliminate redundant saves of
> > R14, but the code had to be removed because it was too fragile; but it
> > looks like some vestiges of the code are still in the compiler.
> >
> > A slightly less optimistic approach, but one that is much safer is to
> > scan the function after reload has completed and see if we can avoid
> > having to push LR. We can do this if:
> >
> I guess "less optimistic" is relative to the ideal optimization
> situation, I believe it is still much better than current result. Do
> you think if arm_reorg() is appropriate place to do this?
>
Making the decision in a single pass would certainly be the best
approach, and arm_reorg is certainly going to come after all other major
code re-arrangements. Indeed, you should probably do this after the
minipool placement so that you can be sure that the minipools don't bulk
up the body of the function too much.

As you are doing the elimination late on in the compilation, you can do a
better job of estimation by calling shorten_branches() to work out the
precise length of each insn. Then you can simply scan over the insns to
work out if there is a branch that still needs r14.

R.
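
As a rough illustration of that final scan, here is a minimal standalone
sketch. It is not GCC code: struct insn, needs_far_jump() and the range
constants are hypothetical names made up for this example, and the lengths
are simply assumed to be final already (as they would be once
shorten_branches() has run). The range used is the assumed reach of a
16-bit Thumb unconditional branch, measured from the branch address plus 4.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for an insn whose length is already final. */
struct insn {
    int length;   /* size in bytes */
    int target;   /* index of the branch target insn, or -1 if not a branch */
};

/* Assumed reach of a 16-bit Thumb unconditional branch, in bytes,
   relative to (address of the branch + 4).  */
#define THUMB_B_MIN (-2048L)
#define THUMB_B_MAX 2046L

/* Return true if any branch in the function body is out of short-branch
   range, i.e. a far jump (and hence LR) would still be needed.  */
bool needs_far_jump(const struct insn *insns, size_t n)
{
    long *addr = malloc((n + 1) * sizeof *addr);
    bool far = false;
    size_t i;

    if (addr == NULL)
        return true;   /* be conservative: assume a far jump is needed */

    /* Pass 1: lay the insns out and record each one's byte address.  */
    addr[0] = 0;
    for (i = 0; i < n; i++) {
        assert(insns[i].length >= 0);
        addr[i + 1] = addr[i] + insns[i].length;
    }

    /* Pass 2: check the displacement of every branch.  */
    for (i = 0; i < n && !far; i++) {
        long disp;

        if (insns[i].target < 0)
            continue;   /* not a branch */
        assert((size_t) insns[i].target <= n);
        disp = addr[insns[i].target] - (addr[i] + 4);
        if (disp < THUMB_B_MIN || disp > THUMB_B_MAX)
            far = true;
    }

    free(addr);
    return far;
}
```

If needs_far_jump() returns false for the whole body, every branch fits in
the short form and the push/pop of LR is not needed on account of far
jumps. A real implementation would of course walk the insn chain rather
than an array, and would have to handle the other branch forms as well.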