Hi,

I'm investigating various register allocation inefficiencies. The first thing
that stands out is that GCC supports both caller-saves and spilling. Spilling
appears to spill all definitions and all uses of a live range. This means you
often end up with multiple reloads close together, while it would be more
efficient to do a single load and then reuse the loaded value several times.
Caller-save does better in that case, but it is inefficient in a different
way: it stores the register across every call, even when the value is
unchanged. If both were fixed to minimise the number of loads and stores, I
can't see how one could beat the other, so you'd no longer need both.
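
To make the difference concrete, here is a minimal hypothetical example
(f1, f2, use and g are made-up names), with a value live across several
calls:

  extern void f1 (void);
  extern void f2 (void);
  extern void use (float);

  float
  g (float x)
  {
    /* 'x' arrives in a call-clobbered FP register and is live across all
       three calls.  Spilling 'x' means one store plus a reload at each
       use; caller-save means storing and reloading it around every call,
       even though it never changes.  */
    f1 ();
    use (x * 2.0f);
    f2 ();
    return x * 3.0f;
  }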

Anyway, with the current implementation there are clearly cases where
caller-save is best and cases where spilling is best. However, I do not see
GCC making the correct decision even though it tries to account for the
costs: some code is significantly faster with -fno-caller-saves, while other
code wins with -fcaller-saves. As an example, with -fcaller-saves I see code
like this on AArch64:

        ldr     s4, .LC20
        fmul    s0, s0, s4
        str     s4, [x29, 104]
        bl      f
        ldr     s4, [x29, 104]
        fmul    s0, s0, s4

With -fno-caller-saves it spills and rematerializes the constant as you'd 
expect:

        ldr     s1, .LC20
        fmul    s0, s0, s1
        bl      f
        ldr     s5, .LC20
        fmul    s0, s0, s5
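
For reference, the source pattern here is roughly the following (a reduced,
hypothetical example; f and the 0.1f constant are stand-ins for the real
code):

  extern float f (float);

  float
  g (float x)
  {
    /* 0.1f is not representable as an fmov immediate, so it ends up being
       loaded from the literal pool (.LC20 above).  */
    return f (x * 0.1f) * 0.1f;
  }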

So given this, is the cost calculation correct, and does it take
rematerialization into account? The spill code knows how to rematerialize, so
the costs should reflect this. I did find some code in ira-costs.c in
scan_one_insn() that attempts what looks like an adjustment for
rematerialization, but it doesn't appear to handle all cases (simple
immediates, 2-instruction immediates, address constants, and non-aliased
loads such as literal-pool and const-data loads).
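
To spell out the cases I mean, here are hypothetical C examples of each
(table and use are made-up names); all of these can be recomputed at their
uses instead of being spilled and reloaded from a stack slot:

  extern const int table[256];

  int
  use (void)
  {
    int a = 10;              /* simple immediate: a single mov        */
    int b = 0x12345678;      /* 2-instruction immediate: mov + movk   */
    const int *p = table;    /* address constant: adrp + add          */
    int c = table[5];        /* non-aliased load from const data      */
    return a + b + c + *p;
  }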

Also, the hook CALLER_SAVE_PROFITABLE appears to have disappeared. Overall
performance improves significantly if I add this (basically the old default
heuristic, applied to instruction frequencies):

--- a/gcc/ira-costs.c
+++ b/gcc/ira-costs.c
@@ -2230,6 +2230,8 @@ ira_tune_allocno_costs (void)
                           * ALLOCNO_FREQ (a)
                           * IRA_HARD_REGNO_ADD_COST_MULTIPLIER (regno) / 2);
 #endif
+                  if (ALLOCNO_FREQ (a) < 4 * ALLOCNO_CALL_FREQ (a))
+                    cost = INT_MAX;
                }
              if (INT_MAX - cost < reg_costs[j])
                reg_costs[j] = INT_MAX;
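
The effect is that an allocno whose reference frequency is less than 4 times
its call-crossing frequency effectively never gets a call-clobbered register:
for example with ALLOCNO_FREQ of 6 and ALLOCNO_CALL_FREQ of 2 the cost
becomes INT_MAX (6 < 4 * 2), so the allocno ends up in a callee-saved
register or memory rather than being caller-saved.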

If such a simple heuristic can beat the costs, they can't be quite right. 

Is there anyone who understands the cost calculations?

Wilco

