Hi,

I noticed there's a costly signed 64-bit division in rtx_cost on x86 as well as any other target where UNITS_PER_WORD is implemented like TARGET_64BIT ? 8 : 4. It's also evident that rtx_cost does redundant work for a SET rtx argument.
Obviously the variable named 'factor' rarely exceeds 1, so in the
majority of cases it can be computed with a well-predictable branch
rather than a division.

This patch makes rtx_cost do the division only in case mode is wider
than UNITS_PER_WORD, and also moves a test for a SET up front to avoid
redundancy. No functional change.

Bootstrapped on x86_64, ok for trunk?

To illustrate the improvement this buys, for tramp3d -O2 compilation,
I got

before:
     73887675319      instructions:u
     72438432200      cycles:u
       924298569      idq.ms_uops:u
    102603799255      uops_dispatched.thread:u

after:
     73888371724      instructions:u
     72386986612      cycles:u
       802744775      idq.ms_uops:u
    102096987220      uops_dispatched.thread:u

(this is on Sandybridge, idq.ms_uops are uops going via the microcode
sequencer, so the unneeded division is responsible for a good fraction
of them)

        * rtlanal.c (rtx_cost): Handle a SET up front. Avoid division if
        the mode is not wider than UNITS_PER_WORD.

diff --git a/gcc/rtlanal.c b/gcc/rtlanal.c
index 9a7afccefb8..c7ab86e228b 100644
--- a/gcc/rtlanal.c
+++ b/gcc/rtlanal.c
@@ -4207,18 +4207,23 @@ rtx_cost (rtx x, machine_mode mode, enum rtx_code outer_code,
   const char *fmt;
   int total;
   int factor;
+  unsigned mode_size;
 
   if (x == 0)
     return 0;
 
-  if (GET_MODE (x) != VOIDmode)
+  if (GET_CODE (x) == SET)
+    /* A SET doesn't have a mode, so let's look at the SET_DEST to get
+       the mode for the factor. */
+    mode = GET_MODE (SET_DEST (x));
+  else if (GET_MODE (x) != VOIDmode)
     mode = GET_MODE (x);
 
+  mode_size = estimated_poly_value (GET_MODE_SIZE (mode));
+
   /* A size N times larger than UNITS_PER_WORD likely needs N times as
      many insns, taking N times as long. */
-  factor = estimated_poly_value (GET_MODE_SIZE (mode)) / UNITS_PER_WORD;
-  if (factor == 0)
-    factor = 1;
+  factor = mode_size > UNITS_PER_WORD ? mode_size / UNITS_PER_WORD : 1;
 
   /* Compute the default costs of certain things.
      Note that targetm.rtx_costs can override the defaults. */
@@ -4243,14 +4248,6 @@ rtx_cost (rtx x, machine_mode mode, enum rtx_code outer_code,
       /* Used in combine.c as a marker. */
       total = 0;
       break;
-    case SET:
-      /* A SET doesn't have a mode, so let's look at the SET_DEST to get
-         the mode for the factor. */
-      mode = GET_MODE (SET_DEST (x));
-      factor = estimated_poly_value (GET_MODE_SIZE (mode)) / UNITS_PER_WORD;
-      if (factor == 0)
-        factor = 1;
-      /* FALLTHRU */
     default:
       total = factor * COSTS_N_INSNS (1);
     }
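
For anyone who wants to convince themselves of the "no functional change"
claim for the factor computation outside of GCC, here is a minimal
standalone sketch. It is not part of the patch. The names factor_old,
factor_new and factors_agree are hypothetical, the word size is passed as
an ordinary run-time parameter to stand in for a UNITS_PER_WORD that is
not a compile-time constant, and mode sizes are assumed to be
non-negative.

```c
#include <assert.h>

/* Old computation: always perform a signed 64-bit division, then
   clamp a zero result up to 1.  */
int
factor_old (long long mode_size, int units_per_word)
{
  int factor = mode_size / units_per_word;
  if (factor == 0)
    factor = 1;
  return factor;
}

/* New computation: divide only when the mode is wider than a word;
   otherwise the factor is 1 and no division is executed.  */
int
factor_new (unsigned mode_size, int units_per_word)
{
  return (mode_size > (unsigned) units_per_word
          ? mode_size / (unsigned) units_per_word
          : 1);
}

/* Return 1 if both computations agree for every size in
   [0, max_size] with word sizes 4 and 8, and 0 otherwise.  */
int
factors_agree (unsigned max_size)
{
  static const int word_sizes[] = { 4, 8 };
  for (unsigned w = 0; w < 2; w++)
    for (unsigned size = 0; size <= max_size; size++)
      if (factor_old (size, word_sizes[w])
          != factor_new (size, word_sizes[w]))
        return 0;
  return 1;
}
```

Calling assert (factors_agree (1024)) from a small test driver exercises
every size up to 1024 bytes for both word sizes. Sizes at or below the
word size take the branch and never reach the division, which is the
common case the patch is aiming at.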