PR87507 shows a problem where IRA assigns a non-volatile TImode reg pair to a pseudo when there is a volatile reg pair available to use. This then causes us to emit save/restore code for the non-volatile reg usage.
The problem here is that the only volatile reg pair that is available is an odd/even reg pair (r7,r8) and ira-costs.c:ira_tune_allocno_costs() disparages odd/even reg pairs by increasing their cost. That's fine, but an even/odd non-volatile reg pair should still be more expensive than an odd/even reg pair given the save/restore that we'd need to use it. However, the costs used in assign_hard_reg() show that non-volatile reg pair (r30,r31) is cheaper than odd/even reg pair (r7,r8) (15 versus 1000). That's a huge disparity in costs, so looking at where the non-volatile cost comes from, it comes from the code below in ira-color.c:assign_hard_reg(): if (!HONOR_REG_ALLOC_ORDER) { if ((saved_nregs = calculate_saved_nregs (hard_regno, mode)) != 0) /* We need to save/restore the hard register in epilogue/prologue. Therefore we increase the cost. */ { rclass = REGNO_REG_CLASS (hard_regno); add_cost = ((ira_memory_move_cost[mode][rclass][0] + ira_memory_move_cost[mode][rclass][1]) * saved_nregs / hard_regno_nregs (hard_regno, mode) - 1); cost += add_cost; full_cost += add_cost; } } I'm not sure I understand the "* saved_nregs / h_r_n (h_r, m) - 1" part of the calculation. If saved_nregs is the number of hard regs that need to be saved for hard_regno in mode MODE (ie, we don't need to save a hard reg if it's already been saved, etc.), then why aren't we just multiplying by saved_nregs? The other problem I see here is that we're not scaling the cost by the basic block frequency of the prologue/epilogue, which is what is causing the non-volatile reg cost to be so low compared to the odd/even volatile reg use, which is scaled. However, even if I fix that, improve_allocation() comes along and undoes it, because it too does not correctly compute the cost of non-volatiles, so that seems to me that it needs fixing too. I have the following work in progress patch I'd like some comments on. Am I on the right track here? I noticed that assign_hard_reg() tracks min_cost and min_full_cost, but min_cost is actually never used for anything other than setting min_cost, so I removed it. I also don't understand why we don't charge non-volatile usage for targets that define HONOR_REG_ALLOC_ORDER. Why shouldn't we always account for save/restore of non-volatile reg usage? I'll note I did not change that behavior. Thoughts on the patch below? Vlad, can you comment on some of my questions above? Peter gcc/ PR rtl-optimization/87507 * ira-color.c (calculate_saved_nregs): Rename from this... (calculate_saved_nregs_cost): ...to this. Return cost of saving NREGS. (assign_hard_reg): (improve_allocation): gcc/testsuite/ PR rtl-optimization/87507 * gcc.dg/pr10474.c: Don't XFAIL for powerpc*-*-*. * gcc.target/powerpc/vsx-vector-6.p8.c: Update expected output. Index: gcc/ira-color.c =================================================================== --- gcc/ira-color.c (revision 264795) +++ gcc/ira-color.c (working copy) @@ -1648,11 +1648,10 @@ check_hard_reg_p (ira_allocno_t a, int h return j == nregs; } -/* Return number of registers needed to be saved and restored at - function prologue/epilogue if we allocate HARD_REGNO to hold value - of MODE. */ +/* Return the cost of saving and restoring HARD_REGNO in mode MODE at + function prologue and epilogue. */ static int -calculate_saved_nregs (int hard_regno, machine_mode mode) +calculate_saved_nregs_cost (int hard_regno, machine_mode mode) { int i; int nregs = 0; @@ -1663,7 +1662,14 @@ calculate_saved_nregs (int hard_regno, m && !TEST_HARD_REG_BIT (call_used_reg_set, hard_regno + i) && !LOCAL_REGNO (hard_regno + i)) nregs++; - return nregs; + if (nregs == 0) + return 0; + + enum reg_class rclass = REGNO_REG_CLASS (hard_regno); + return (ira_memory_move_cost[mode][rclass][0] + + ira_memory_move_cost[mode][rclass][1]) + * nregs + * REG_FREQ_FROM_BB (ENTRY_BLOCK_PTR_FOR_FN (cfun)); } /* Choose a hard register for allocno A. If RETRY_P is TRUE, it means @@ -1694,14 +1700,11 @@ assign_hard_reg (ira_allocno_t a, bool r { HARD_REG_SET conflicting_regs[2], profitable_hard_regs; int i, j, hard_regno, best_hard_regno, class_size; - int cost, mem_cost, min_cost, full_cost, min_full_cost, nwords, word; + int cost, mem_cost, full_cost, min_full_cost, nwords, word; int *a_costs; enum reg_class aclass; machine_mode mode; - static int costs[FIRST_PSEUDO_REGISTER], full_costs[FIRST_PSEUDO_REGISTER]; - int saved_nregs; - enum reg_class rclass; - int add_cost; + static int full_costs[FIRST_PSEUDO_REGISTER]; #ifdef STACK_REGS bool no_stack_reg_p; #endif @@ -1713,9 +1716,7 @@ assign_hard_reg (ira_allocno_t a, bool r aclass = ALLOCNO_CLASS (a); class_size = ira_class_hard_regs_num[aclass]; best_hard_regno = -1; - memset (full_costs, 0, sizeof (int) * class_size); mem_cost = 0; - memset (costs, 0, sizeof (int) * class_size); memset (full_costs, 0, sizeof (int) * class_size); #ifdef STACK_REGS no_stack_reg_p = false; @@ -1733,15 +1734,9 @@ assign_hard_reg (ira_allocno_t a, bool r cost = ALLOCNO_UPDATED_CLASS_COST (a); for (i = 0; i < class_size; i++) if (a_costs != NULL) - { - costs[i] += a_costs[i]; - full_costs[i] += a_costs[i]; - } + full_costs[i] += a_costs[i]; else - { - costs[i] += cost; - full_costs[i] += cost; - } + full_costs[i] += cost; nwords = ALLOCNO_NUM_OBJECTS (a); curr_allocno_process++; for (word = 0; word < nwords; word++) @@ -1861,7 +1856,7 @@ assign_hard_reg (ira_allocno_t a, bool r queue_update_cost (a, NULL, COST_HOP_DIVISOR); update_conflict_hard_regno_costs (full_costs, aclass, false); } - min_cost = min_full_cost = INT_MAX; + min_full_cost = INT_MAX; /* We don't care about giving callee saved registers to allocnos no living through calls because call clobbered registers are allocated first (it is usual practice to put them first in @@ -1878,25 +1873,15 @@ assign_hard_reg (ira_allocno_t a, bool r if (! check_hard_reg_p (a, hard_regno, conflicting_regs, profitable_hard_regs)) continue; - cost = costs[i]; + full_cost = full_costs[i]; if (!HONOR_REG_ALLOC_ORDER) { - if ((saved_nregs = calculate_saved_nregs (hard_regno, mode)) != 0) - /* We need to save/restore the hard register in - epilogue/prologue. Therefore we increase the cost. */ - { - rclass = REGNO_REG_CLASS (hard_regno); - add_cost = ((ira_memory_move_cost[mode][rclass][0] - + ira_memory_move_cost[mode][rclass][1]) - * saved_nregs / hard_regno_nregs (hard_regno, - mode) - 1); - cost += add_cost; - full_cost += add_cost; - } + /* If we need to save/restore the hard register in epilogue/prologue, + then increase its cost. */ + full_cost += calculate_saved_nregs_cost (hard_regno, mode); } - if (min_cost > cost) - min_cost = cost; + if (min_full_cost > full_cost) { min_full_cost = full_cost; @@ -2971,12 +2956,18 @@ improve_allocation (void) for (j = 0; j < class_size; j++) { hregno = ira_class_hard_regs[aclass][j]; + int hregno_cost = costs[hregno]; + + /* Include the cost of saving and restoring HREGNO in the function + prologue/epilogue. */ + hregno_cost += calculate_saved_nregs_cost (hregno, mode); + if (check_hard_reg_p (a, hregno, conflicting_regs, profitable_hard_regs) - && min_cost > costs[hregno]) + && min_cost > hregno_cost) { best = hregno; - min_cost = costs[hregno]; + min_cost = hregno_cost; } } if (min_cost >= 0)