Hi Richard,

> Can you fold in the rtx costs part of the original GOT relaxation patch?

Sure, see below for the updated version.

> I don't think there's enough information here for me to be able to review
> the patch though.  I'll need to find testcases, look in detail at what
> the rtl passes are doing, and try to work out whether (and why) this is
> a good way of fixing things.

Well, today GCC does everything with costs rather than with backend callbacks.
I'd be interested in hearing about alternatives that have the same effect
without adding a callback that lets a backend decide between spilling and
rematerialization.

Cheers,
Wilco


v2: fold in GOT remat cost

Improve rematerialization costs of addresses.  The current costs are set too
high, which results in extra register pressure and spilling.  Using lower costs
means addresses will be rematerialized more often rather than being spilled or
causing spills.  This results in significant codesize reductions and
performance gains.  SPECINT2017 improves by 0.27% with LTO and 0.16% without
LTO.  Codesize is 0.12% smaller.
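
To make the effect concrete, here is a small hypothetical example (the names
are invented for illustration; it is not from the patch or the testsuite).
With a low SYMBOL_REF cost, the ADRP/ADD pair that forms the address of a
global can simply be re-emitted after a call, instead of the address being
spilled before the call and reloaded after it:

extern int glob;
extern void use (int x);

int
f (int x)
{
  glob = x;      /* adrp/add to form &glob, then str.  */
  use (x);       /* Call clobbers the address register.  */
  return glob;   /* Cheaper to redo adrp/add here than to
                    spill/reload the address around the call.  */
}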

Passes bootstrap and regress. OK for commit?

ChangeLog:
2021-06-01  Wilco Dijkstra  <wdijk...@arm.com>

        * config/aarch64/aarch64.c (aarch64_rtx_costs): Use better
        rematerialization costs for HIGH, LO_SUM and SYMBOL_REF.
---

diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 39de231d8ac6d10362cdd2b48eb9bd9de60c6703..a7f99ece55383168fb0f77e5c11c501d0bb2f013 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -13610,45 +13610,29 @@ cost_plus:
          return false;  /* All arguments need to be in registers.  */
        }
 
+    /* The following costs are used for rematerialization of addresses.
+       Set a low cost for all global accesses - this ensures they are
+       preferred for rematerialization, blocks them from being spilled
+       and reduces register pressure.  The result is significant codesize
+       reductions and performance gains.  */
+
     case SYMBOL_REF:
 
-      if (aarch64_cmodel == AARCH64_CMODEL_LARGE
-         || aarch64_cmodel == AARCH64_CMODEL_SMALL_SPIC)
-       {
-         /* LDR.  */
-         if (speed)
-           *cost += extra_cost->ldst.load;
-       }
-      else if (aarch64_cmodel == AARCH64_CMODEL_SMALL
-              || aarch64_cmodel == AARCH64_CMODEL_SMALL_PIC)
-       {
-         /* ADRP, followed by ADD.  */
-         *cost += COSTS_N_INSNS (1);
-         if (speed)
-           *cost += 2 * extra_cost->alu.arith;
-       }
-      else if (aarch64_cmodel == AARCH64_CMODEL_TINY
-              || aarch64_cmodel == AARCH64_CMODEL_TINY_PIC)
-       {
-         /* ADR.  */
-         if (speed)
-           *cost += extra_cost->alu.arith;
-       }
+      *cost = 0;
+
+      /* Use a separate rematerialization cost for GOT accesses.  */
+      if (aarch64_cmodel == AARCH64_CMODEL_SMALL_PIC
+         && aarch64_classify_symbol (x, 0) == SYMBOL_SMALL_GOT_4G)
+       *cost = COSTS_N_INSNS (1) / 2;
 
-      if (flag_pic)
-       {
-         /* One extra load instruction, after accessing the GOT.  */
-         *cost += COSTS_N_INSNS (1);
-         if (speed)
-           *cost += extra_cost->ldst.load;
-       }
       return true;
 
     case HIGH:
+      *cost = 0;
+      return true;
+
     case LO_SUM:
-      /* ADRP/ADD (immediate).  */
-      if (speed)
-       *cost += extra_cost->alu.arith;
+      *cost = COSTS_N_INSNS (3) / 4;
       return true;
 
     case ZERO_EXTRACT:
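
A note on the units, in case the fractional costs look odd: rtl.h defines
COSTS_N_INSNS (N) as (N) * 4, so COSTS_N_INSNS (1) / 2 evaluates to 2 and
COSTS_N_INSNS (3) / 4 to 3.  That is, a GOT access is costed at half an
instruction and a LO_SUM at three quarters of one, which keeps both cheap
enough that the register allocator prefers recomputing them over spilling.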
