Quoting Richard Sandiford <rdsandif...@googlemail.com>:

[responding because you kept me on cc:]

Joern Rennecke <joern.renne...@embecosm.com> writes:
This uses the same interface as my previous patch:
http://gcc.gnu.org/ml/gcc-patches/2012-11/msg00473.html ,
but I refined the algorithm for the get_insn_variants
mechanism to work properly with the reworked ARC port -
http://gcc.gnu.org/ml/gcc-patches/2012-11/msg01891.html -
the only user so far, and added some documentation.

Bootstrapped in r193731 on i686-pc-linux-gnu .

In http://gcc.gnu.org/ml/gcc-patches/2012-11/msg00029.html
the conversation was roughly:

   > Here's a very general (and rather complicated) interface,
   I wasn't after something as complicated as that.  I just
   wanted to model alignment.

and it looks like this patch implements the very general and rather
complicated interface. :-)

For the port writer, the complexities exposed are proportional to the
complexities of the port that (s)he wants to describe.

 Obviously you're completely free to do that,
but it means that any comments from me are going to be same (and as
unhelpful and unproductive) as last time.  So I still think I should
bow out of this one.

In case that seems unfair, here's a summary of my concerns:

1) As Richard B says, having "locked lengths" with the comment "care must
   be taken to avoid cycles" doesn't sound like good design.  So the
   question was: without this, why would the length be going up and down
   "arbitrarily", even though we're trying to reach a fixed point?
   And the good answer seemed to be: because the backend wants to grow
   some instructions in order to give a particular (mis)alignment to
   later ones.  There's no inherent cycle problem with that, because
   although the length of an instruction can decrease when the padding
   it previously contained is no longer needed, the instruction addresses
   themselves still only move in one direction.  We don't need a locked
   length for this case.

I've seen cycles between different alignment requirements for different
instructions and branch lengths that were borderline.
The new approach to stabilize on instruction variants that are available
should give a saner convergence, but I was afraid there might be more complex
cycles that remain.  Well, I'll try to remove the lock_length logic and see
what happens.  But obviously I can't test with all possible code that could
be compiled - I just do a test build of elf32 arc-linux-uclibc toolchain
including libraries and linux kernels.  It could happen that that suceeds
but we find a few months later that there is something else that causes
cycles.

2) Another reason the lengths could go up and down "arbitrarily" is because
   the backend wants to perform if-conversion, predication, call-it-what-
   you-will on the fly during shorten_branches, and can therefore delete
   or restore instructions relative to previous iterations.  I'm still
   not convinced that's something we should support.

3) As mentioned previously, I don't think ADJUST_LENGTH should be
   aware of ADJUST_LENGTH calls for other instructions.  The way that
   the ARC port "simulates" instructions between the insn passed in the
   previous and current calls doesn't seem right.

I have removed these aspects of the ARC port, as you can see in:
http://gcc.gnu.org/ml/gcc-patches/2012-11/msg01891.html

The new approach is to do an arc-specific if-conversion based on the ccfsm
action, and run this pass at a few strategic places, so that branch_shortening
sees these effects in the rtl.

4) The patch provides a complex interface and allows for complex decisions
   to be made, but the results of those decisions aren't stored in the rtl.
   That's not a problem for ARC because ARC keeps an on-the-side FSM
   to track these things.  I don't think we should encourage or require
   that though.

As stated above, the arc port no longer uses the FSM during branch shortening.
The interface needs to be complex because the target instruction selection
has to take complex circumstances into account.

FWIW, one interface that would deal with the alignment side of things is:

    unsigned HOST_WIDE_INT preferred_insn_alignment (rtx insn);

which returns an instruction's preferred alignment, and which could
(if useful) be called for SEQUENCEs and/or the insns inside them.


No, that doesn't make sense, because alignment cost differentials depend
on instruction length.
As length changes during branch shortening, you'd get different alignment
requirements, and thus infinite cycles.

    rtx longer_insn_form (rtx insn, unsigned HOST_WIDE_INT current_length,
                          unsigned HOST_WIDE_INT target_length);

which returns a pattern that would turn INSN from being CURRENT_LENGTH
to TARGET_LENGTH bytes in length, or null if no such pattern exists.

And again you would ignore the cost and alignment interactions.

shorten_branches would only use these hooks when optimising the insns
for speed rather than size (the posted patch doesn't seem to account
for that).  It could record the original pattern internally so that
it can revert to that pattern if the extra padding is no longer needed.

That isn't quite right either, because we are supposed to do do
speed optimizations as long as they don't hurt size.
So lengthening an insn to avoid inserting a nop in front of an alignment
would be nice.

longer_insn_form could be used for label alignment as well as
instruction alignment.

There's the potential for a third hook to return the possible target
lengths for a given instruction, a bit like the one in the patch,
so that alignment could be distributed over several instructions.
I think that's an unnecessary complication at this stage though.
(I don't think the original ARC submission supported it.)

Indeed, the focus for the ARC port is on (mis)alignment in terms of
an instruction address mod 4 being either 0 or 2.  This can be archived
by upsizing a single instruction.
It would be desirable to have the ability to also do cache-line aware
optimizations, e.g. when you have a bunch of long instructions, and a
hot path branches to them, we'd prefer there to be no cache line boundary
right after or inside the first instruction.
The interface I've designed does support that, the current iteration of the
code does not.

Reply via email to