Hi!

On Mon, Aug 10, 2020 at 05:47:53PM +0200, Hans-Peter Nilsson wrote:
> > All of this was added in https://gcc.gnu.org/g:64b8935d4809 , which also
> > adds the
> > 
> > +  /* Disallow this recombination if both new_cost and old_cost are
> > +     greater than zero, and new_cost is greater than old cost.  */
> > 
> > comment (which is what it actually does).  Before that change, combine
> > didn't look at costs at all.
> > 
> > Combine never has used a "strictly smaller than" cost thing.
> 
> (I suggest we use terms of generally greater/lower cost, instead
> of smaller/greater cost, to avoid confusion whether "smaller"
> refers to the general cost or just code size.

It refers to neither.  It refers to the concept of "cost" in combine,
which is a number (and some extra conditions).  Combine is a local
optimisation ("peephole", if you want), so it uses a local cost
function.

And yes, that cost is calculated differently if you use -Os.  The is
because it uses insn_cost to base on.

As it is a number, I (and most other people) use "smaller than".

> > > Right, you can affect this with your target TARGET_RTX_COSTS and
> > > TARGET_INSN_COST hooks, but only for trivial cases, and you have
> > > increasingly more complex combinations (many-to-many
> > > combinations) where you have to twist simple logic to appease
> > > combine (stop it from combining) or give up.
> > 
> > There are 2-1, 3-1, 4-1, 3-2, 4-2, which all reduce the number of insns,
> > and then there is 2-2, which gets rid of one log_link.  If the isnn_cost
> > stays the same, it always wins something else.
> 
> That "something else" is presumptious.

I wrote this code, I am maintainer of combine, I really do know.  This
is not presumptuous.  This is important to make sure combine never can
get into an infinite loop!

> A longer
> dependency-chain may sum up to faster execution considering
> parallel or OoO execution.  Someone adding another N-M
> combination will notice, after two more years of work. :)

Combine knows nothing about how any specific CPU will actually execute
the code.  It sees just a tiny fraction of code at any one time, anyway.

All it does is optimise for the lowest cost.  Whether that actually
performs best, is something the backend writer has to take care of.

(insn_cost is only the secondary "score", anyway: the number of RTL
insns is primary).

> > 5) Improve your target so that its insn_cost reflects ithe costs of
> > the insns better.
> 
> I see cheap gnawing, I hope I didn't add any of that myself.

I guess we both did.  Sorry.

> I already covered target costs before the 1..4 list and you
> actually quoted that (the quoted paragraph above your log_links
> comment).

Let me write this differently, then:

5) Write your target cost callbacks so that you work *with* combine, not
*against* it.  This gives better results.

> To recap, it all began with the observation of comments in
> combine.c saying "cheaper than", with the code implementing
> "same or cheaper", together with the flaw fixed by 84c5396d4bdbf
> which I belive is a sufficient conclusion for that specific
> instance.  A lower cost also seems more consistent with other
> decisions.  (BTW, the commit is a bit misleading; I believe all
> "case #2" cc0 convertees will benefit from an accurate
> is_just_move, not just CRIS.)

It makes a difference on many other targets, but it never results in
different code there.  I wonder why not (or, why it does for cris).

> The fixed 2-2 case is all the "typical examples" I had so far.
> Remaining suggestions are just fixing perceived inconsistencies.

All the other cases reduce the number of RTL insns, which is a clear
cost metric (and originally the only one!)  If you have RTL insns that
correspond to more than just single machine insns, this can hurt you;
otherwise, there are a few cases where you want to trick combine, but
not many (doing that tends to backfire anyway, better keep it to a
minimum).


Segher

Reply via email to