On Thu, Sep 17, 2020 at 12:51:25PM -0500, Segher Boessenkool wrote:
> Hi!
>
> On Tue, Sep 15, 2020 at 10:49:45AM +0930, Alan Modra wrote:
> > Also use rs6000_cost only for speed.
>
> More directly: use something completely different for !speed, namely,
> code size.
Yes, that might be better.
> > - if (CONST_INT_P (XEXP (x, 1))
> > - && satisfies_constraint_I (XEXP (x, 1)))
> > + if (!speed)
> > + /* A little more than one insn so that nothing is tempted to
> > + turn a shift left into a multiply. */
> > + *total = COSTS_N_INSNS (1) + 1;
>
> Please don't. We have a lot of code elsewhere to handle this directly,
> already. Also, this is just wrong for size. Five shifts is *not*
> better than four muls. If that is the only way to get good results,
> than unfortunately we probably have to; but do not do this without any
> proof.
Huh. If a cost of 5 is "just wrong for size" then you prefer a cost
of 12 for example (power9 mulsi or muldi rs6000_cost)? Noticing that
result for !speed rs6000_rtx_costs is the entire basis for the !speed
changes. I don't have any proof that this is correct.
> > case FMA:
> > - if (mode == SFmode)
> > + if (!speed)
> > + *total = COSTS_N_INSNS (1) + 1;
>
> Not here, either.
>
> > case DIV:
> > case MOD:
> > if (FLOAT_MODE_P (mode))
> > {
> > - *total = mode == DFmode ? rs6000_cost->ddiv
> > - : rs6000_cost->sdiv;
> > + if (!speed)
> > + *total = COSTS_N_INSNS (1) + 2;
>
> And why + 2 even?
>
> > - if (GET_MODE (XEXP (x, 1)) == DImode)
> > + if (!speed)
> > + *total = COSTS_N_INSNS (1) + 2;
> > + else if (GET_MODE (XEXP (x, 1)) == DImode)
> > *total = rs6000_cost->divdi;
> > else
> > *total = rs6000_cost->divsi;
>
> (more)
OK, I can remove all the !speed changes. To be honest, I didn't look
anywhere near as much at code size changes as I worried about
performance. And about not regressing any fiddly testcase we have.
> > @@ -21368,6 +21378,7 @@ rs6000_rtx_costs (rtx x, machine_mode mode, int
> > outer_code,
> > return false;
> >
> > case AND:
> > + *total = COSTS_N_INSNS (1);
> > right = XEXP (x, 1);
> > if (CONST_INT_P (right))
> > {
> > @@ -21380,15 +21391,15 @@ rs6000_rtx_costs (rtx x, machine_mode mode, int
> > outer_code,
> > || left_code == LSHIFTRT)
> > && rs6000_is_valid_shift_mask (right, left, mode))
> > {
> > - *total = rtx_cost (XEXP (left, 0), mode, left_code, 0, speed);
> > - if (!CONST_INT_P (XEXP (left, 1)))
> > - *total += rtx_cost (XEXP (left, 1), SImode, left_code, 1,
> > speed);
> > - *total += COSTS_N_INSNS (1);
> > + rtx reg_op = XEXP (left, 0);
> > + if (!REG_P (reg_op))
> > + *total += rtx_cost (reg_op, mode, left_code, 0, speed);
> > + reg_op = XEXP (left, 1);
> > + if (!REG_P (reg_op) && !CONST_INT_P (reg_op))
> > + *total += rtx_cost (reg_op, mode, left_code, 1, speed);
> > return true;
> > }
> > }
> > -
> > - *total = COSTS_N_INSNS (1);
> > return false;
>
> This doesn't improve anything? It just makes it different from all
> surrounding code?
So it moves the common COSTS_N_INSNS (1) count and doesn't recurse for
regs, like it doesn't for const_int.
> > @@ -21519,7 +21530,9 @@ rs6000_rtx_costs (rtx x, machine_mode mode, int
> > outer_code,
> > if (outer_code == TRUNCATE
> > && GET_CODE (XEXP (x, 0)) == MULT)
> > {
> > - if (mode == DImode)
> > + if (!speed)
> > + *total = COSTS_N_INSNS (1) + 1;
>
> (more)
>
> > + case SET:
> > + /* The default cost of a SET is the number of general purpose
> > + regs being set multiplied by COSTS_N_INSNS (1). That only
> > + works where the incremental cost of the operation and
> > + operands is zero, when the operation performed can be done in
> > + one instruction. For other cases where we add COSTS_N_INSNS
> > + for some operation (see point 5 above), COSTS_N_INSNS (1)
> > + should be subtracted from the total cost. */
>
> What does "incremental cost" mean there? If what increases?
>
> > + {
> > + rtx_code src_code = GET_CODE (SET_SRC (x));
> > + if (src_code == CONST_INT
> > + || src_code == CONST_DOUBLE
> > + || src_code == CONST_WIDE_INT)
> > + return false;
> > + int set_cost = (rtx_cost (SET_SRC (x), mode, SET, 1, speed)
> > + + rtx_cost (SET_DEST (x), mode, SET, 0, speed));
>
> This should use set_src_cost, if anything. But that will recurse then,
> currently? Ugh.
>
> Using rtx_cost for SET_DEST is problematic, too.
>
> What (if anything!) calls this for a SET? Oh, set_rtx_cost still does
> that, hrm.
>
> rtx_cost costs RTL *expressions*. Not instructions. Except where it is
> (ab)used for that, sigh.
>
> Many targets have something for it already, but all quite different from
> this.
Right, you are starting to understand just how difficult it is to do
anything at all to rs6000_rtx_costs.
> > + if (set_cost >= COSTS_N_INSNS (1))
> > + *total += set_cost - COSTS_N_INSNS (1);
>
> I don't understand this part at all, for example. Why not just
> *total += set_cost - COSTS_N_INSNS (1);
> ? If set_cost is lower than one insn's cost, don't we have a problem
> already?
The set_cost I calculate here from src and dest can easily be zero.
(set (reg) (reg)) and (set (reg) (const_int 0)) for example have a
dest cost of zero and a src cost of zero. That can't change without
breaking places where rtx_costs is called to compare pieces of RTL.
Here though we happen to be looking at a SET, so have an entire
instruction. The value returned should be comparable to our
instruction costs. That's tricky to do, and this change is just a
hack. Without the hack I saw some testcases regress.
I don't like this hack any more than you do reviewing it!
>
> Generic things. Please split this patch up when sending it again, it
> does too many different things, and many of those are not obvious.
>
> All such changes that aren't completely obvious (like the previous ones
> were) should have some measurement. We are in stage1, and we will
> notice (non-trivial) degradations, but if we can expect degradations
> (like for this patch), it needs benchmarking.
Pat did benchmark these changes.. I was somewhat surprised to see
a small improvement in spec results.
> Since you add !speed all over the place, maybe we should just have a
> separate function that does !speed? It looks like quite a few things
> will simplify.
Revised patch as follows.
* config/rs6000/rs6000.c (rs6000_rtx_costs): Reduce cost for SETs
when insn operation cost handled on recursive call. Tidy
break/return. Tidy AND costing.
diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index 6af8a9a31cb..26c2f443502 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -21397,7 +21397,7 @@ rs6000_rtx_costs (rtx x, machine_mode mode, int
outer_code,
*total = rs6000_cost->fp;
else
*total = rs6000_cost->dmul;
- break;
+ return false;
case DIV:
case MOD:
@@ -21457,6 +21457,7 @@ rs6000_rtx_costs (rtx x, machine_mode mode, int
outer_code,
return false;
case AND:
+ *total = COSTS_N_INSNS (1);
right = XEXP (x, 1);
if (CONST_INT_P (right))
{
@@ -21469,15 +21470,15 @@ rs6000_rtx_costs (rtx x, machine_mode mode, int
outer_code,
|| left_code == LSHIFTRT)
&& rs6000_is_valid_shift_mask (right, left, mode))
{
- *total = rtx_cost (XEXP (left, 0), mode, left_code, 0, speed);
- if (!CONST_INT_P (XEXP (left, 1)))
- *total += rtx_cost (XEXP (left, 1), SImode, left_code, 1,
speed);
- *total += COSTS_N_INSNS (1);
+ rtx reg_op = XEXP (left, 0);
+ if (!REG_P (reg_op))
+ *total += rtx_cost (reg_op, mode, left_code, 0, speed);
+ reg_op = XEXP (left, 1);
+ if (!REG_P (reg_op) && !CONST_INT_P (reg_op))
+ *total += rtx_cost (reg_op, mode, left_code, 1, speed);
return true;
}
}
-
- *total = COSTS_N_INSNS (1);
return false;
case IOR:
@@ -21575,7 +21576,7 @@ rs6000_rtx_costs (rtx x, machine_mode mode, int
outer_code,
*total = rs6000_cost->fp;
return false;
}
- break;
+ return false;
case NE:
case EQ:
@@ -21613,13 +21614,40 @@ rs6000_rtx_costs (rtx x, machine_mode mode, int
outer_code,
*total = 0;
return true;
}
- break;
+ return false;
+
+ case SET:
+ /* The default cost of a SET is the number of general purpose
+ regs being set multiplied by COSTS_N_INSNS (1). Here we
+ happen to be looking at a SET, so have an instruction rather
+ than just a piece of RTL and want to return a cost comparable
+ to rs6000 instruction costing. That's a little complicated
+ because in some cases the cost of SET operands is non-zero,
+ see point 5 above and cost of PLUS for example, and in
+ others it is zero, for example for (set (reg) (reg)).
+ But (set (reg) (reg)) actually costs the same as
+ (set (reg) (plus (reg) (reg))). Hack around this by
+ subtracting COSTS_N_INSNS (1) from the operand cost in cases
+ were we add COSTS_N_INSNS (1) for some operation. Don't do
+ so for constants that might cost more than zero because they
+ don't fit in one instruction. FIXME: rtx_costs should not be
+ looking at entire instructions. */
+ {
+ rtx_code src_code = GET_CODE (SET_SRC (x));
+ if (src_code == CONST_INT
+ || src_code == CONST_DOUBLE
+ || src_code == CONST_WIDE_INT)
+ return false;
+ int set_cost = (rtx_cost (SET_SRC (x), mode, SET, 1, speed)
+ + rtx_cost (SET_DEST (x), mode, SET, 0, speed));
+ if (set_cost >= COSTS_N_INSNS (1))
+ *total += set_cost - COSTS_N_INSNS (1);
+ return true;
+ }
default:
- break;
+ return false;
}
-
- return false;
}
/* Debug form of r6000_rtx_costs that is selected if -mdebug=cost. */
--
Alan Modra
Australia Development Lab, IBM