RE: Register allocation: caller-save vs spilling

Wilco Dijkstra Thu, 04 Sep 2014 11:38:12 -0700

Hi Vlad,

I added you directly in case you hadn't spotted my original post.


A simple example for AArch64 trunk is as follows:

// Compile with: -O2 -fomit-frame-pointer -ffixed-d8 -ffixed-d9 -ffixed-d10
//   -ffixed-d11 -ffixed-d12 -ffixed-d13 -ffixed-d14 -ffixed-d15
//   -f(no-)caller-saves
void g(void);

float f(float x)
{
  x += 3.0;
  g();
  x *= 3.0;
  return x;
}

It seems that reload only ever considers rematerialization of spilled live
ranges, not caller-saved ones. That means the caller-save code should either
reject constants outright, or the memory spill cost for these should always be
lower than that of a caller-save (given memory_move_cost=4 and
register_move_cost=2 as commonly used by targets, anything that can be
rematerialized should cost less than half as much as being spilled or
caller-saved).
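
To make the arithmetic concrete, here is a rough sketch of the comparison I
have in mind. It is not GCC code: the function names and the simple "one store
plus one load per call" model are assumptions for illustration only, using the
move costs quoted above.

```c
/* Hypothetical cost model for illustration; none of this is GCC source. */
enum { MEMORY_MOVE_COST = 4, REGISTER_MOVE_COST = 2 };

/* Caller-save: assume one store before and one load after each call. */
static int caller_save_cost(int n_calls)
{
  return n_calls * (MEMORY_MOVE_COST + MEMORY_MOVE_COST);
}

/* Rematerialization: no store at all, the value is simply recreated
   after each call at cost_per_remat (an assumed per-use cost).  */
static int remat_cost(int n_calls, int cost_per_remat)
{
  return n_calls * cost_per_remat;
}
```

Under these assumptions one call costs 8 for a caller-save, 4 if the constant
is reloaded from the literal pool, and 2 if it can be recreated at roughly the
cost of a register move.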

Wilco

> -----Original Message-----
> From: Wilco Dijkstra [mailto:wdijk...@arm.com]
> Sent: 27 August 2014 17:25
> To: 'gcc@gcc.gnu.org'
> Subject: Register allocation: caller-save vs spilling
> 
> Hi,
> 
> I'm investigating various register allocation inefficiencies. The first
> thing that stands out is that GCC supports both caller-saves and spilling.
> Spilling seems to spill all definitions and all uses of a live range. This
> means you often end up with multiple reloads close together, while it would
> be more efficient to do a single load and then reuse the loaded value several
> times. Caller-save does better in that case, but it is inefficient in that it
> repeatedly stores registers across every call even if unchanged. If both were
> fixed to minimise the number of loads/stores I can't see how one could beat
> the other, so you'd no longer need both.
> 
> Anyway due to the current implementation there are clearly cases where
> caller-save is best and cases where spilling is best. However I do not see it
> making the correct decision despite trying to account for the costs - some
> code is significantly faster with -fno-caller-saves, other code wins with
> -fcaller-saves. As an example, I see code like this on AArch64:
> 
>         ldr     s4, .LC20
>         fmul    s0, s0, s4
>         str     s4, [x29, 104]
>         bl      f
>         ldr     s4, [x29, 104]
>         fmul    s0, s0, s4
> 
> With -fno-caller-saves it spills and rematerializes the constant as you'd
> expect:
> 
>         ldr     s1, .LC20
>         fmul    s0, s0, s1
>         bl      f
>         ldr     s5, .LC20
>         fmul    s0, s0, s5
> 
> So given this, is the cost calculation correct and does it include
> rematerialization? The spill code understands how to rematerialize so it
> should take this into account in the costs. I did find some code in
> ira-costs.c in scan_one_insn() that attempts something that looks like an
> adjustment for rematerialization but it doesn't appear to handle all cases
> (simple immediates, 2-instruction immediates, address-constants, non-aliased
> loads such as literal pool and const data loads).
> 
> Also the hook CALLER_SAVE_PROFITABLE appears to have disappeared - overall
> performance improves significantly if I add this (basically the default
> heuristic used on instruction frequencies):
> 
> --- a/gcc/ira-costs.c
> +++ b/gcc/ira-costs.c
> @@ -2230,6 +2230,8 @@ ira_tune_allocno_costs (void)
>                            * ALLOCNO_FREQ (a)
>                            * IRA_HARD_REGNO_ADD_COST_MULTIPLIER (regno) / 2);
>  #endif
> +                  if (ALLOCNO_FREQ (a) < 4 * ALLOCNO_CALL_FREQ (a))
> +                    cost = INT_MAX;
>                 }
>               if (INT_MAX - cost < reg_costs[j])
>                 reg_costs[j] = INT_MAX;
> 
> If such a simple heuristic can beat the costs, they can't be quite right.

Note that using if (ALLOCNO_FREQ (a) < 2 * ALLOCNO_CALL_FREQ (a)) instead
turns out to be best overall.
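
For anyone who wants to try different thresholds outside the compiler, the
test in the patch can be sketched standalone as below. The struct and function
names are made up for illustration and only stand in for ALLOCNO_FREQ and
ALLOCNO_CALL_FREQ; this is not how IRA stores the data.

```c
#include <limits.h>
#include <stdbool.h>

/* Hypothetical stand-in for the two allocno frequencies used in the patch. */
struct allocno_freqs
{
  int freq;       /* plays the role of ALLOCNO_FREQ (a) */
  int call_freq;  /* plays the role of ALLOCNO_CALL_FREQ (a) */
};

/* True when the allocno is referenced too rarely, relative to the calls it
   crosses, for a caller-saved register to be considered worthwhile.  */
static bool caller_save_unprofitable(const struct allocno_freqs *a, int factor)
{
  return a->freq < factor * a->call_freq;
}

/* Mirror the patch: make the register cost prohibitive in that case.  */
static int adjust_cost(int cost, const struct allocno_freqs *a, int factor)
{
  return caller_save_unprofitable(a, factor) ? INT_MAX : cost;
}
```

With factor 4 an allocno with freq 3 and call_freq 1 is rejected; with factor 2
it keeps its original cost.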

> Is there anyone who understands the cost calculations?
> 
> Wilco
