Instructions vs Expressions in the backend (was Re: RFA: Rework FOR_BB_INSNS iterators)

David Malcolm Mon, 23 Jun 2014 12:01:57 -0700

On Mon, 2014-06-09 at 20:32 +0100, Richard Sandiford wrote:
> Steven Bosscher <[email protected]> writes:
> > On Sat, Jun 7, 2014 at 7:54 PM, Richard Sandiford wrote:
> >> The two parts of the loop condition are really handling two different
> >> kinds of block: ones like entry and exit that are completely empty
> >> and normal ones that have at least a block note.  There's no real
> >> need to check for null INSNs in normal blocks.
> >
> > Block notes should go away some day, they're just remains of a time
> > when there was no actual CFG in the compiler.
> 
> Yeah.  I suppose when that happens empty blocks would look just like
> entry and exit as far as these iterators go.
> 
> >> Also, refetching NEXT_INSN (BB_END (BB)) for each iteration can be 
> >> expensive.
> >> If we're prepared to say that the loop body can't insert instructions
> >> for another block immediately after BB_END,
> >
> > This can happen with "block_label()" if e.g. a new jump is inserted
> > for one reason or another. Not very likely for passes working in
> > cfglayout mode, but post-RA there may be places that need this
> > (splitters, peepholes, machine dependent reorgs, etc.).
> >
> > So even if we're prepared to say what you suggest, I don't think you
> > can easily enforce it.
> 
> Probably true.  But if we want to allow insertions after BB_END,
> we need to make FOR_BB_INSNS_SAFE handle that for INSN == BB_END too.
> 
> The alternative definition:
> 
>   for (rtx INSN = BB_HEAD (BB), INSN##_cond_ = INSN, INSN##_next_;    \
>        INSN##_cond_                                                   \
>        && (INSN##_next_ = NEXT_INSN (INSN),                           \
>            INSN##_cond_ = BB_END (BB),                                \
>            true);                                                     \
>        INSN##_cond_ = (INSN == INSN##_cond_ ? NULL_RTX : (rtx) 1),    \
>        INSN = INSN##_next_)
> 
> works too.  It isn't quite as fast, but obviously correctness comes first.
> 
> >> It's easier to change these macros if they define the INSN variables
> >> themselves.
> >
> > If you're going to hack these iterators anyway (much appreciated BTW),
> > I suggest to make them similar to the gsi, loop, edge, and bitmap
> > iterators: A new "insn_iterator" structure to hold the variables and
> > static inline functions wrapped in the macros. This will also be
> > helpful if (when) we ever manage to make the type for an insn a
> > non-rtx...
> 
> Well, with this and...
> 
> >> +/* For iterating over insns in a basic block.  The iterator allows the 
> >> loop
> >> +   body to delete INSN.  It also ignores any instructions that the body
> >> +   inserts between INSN and the following instruction.  */
> >> +#define FOR_BB_INSNS(BB, INSN)                                         \
> >> +  for (rtx INSN = BB_HEAD (BB), INSN##_cond_ = INSN, INSN##_next_,     \
> >> + INSN##_end_ = INSN ? NEXT_INSN (BB_END (BB)) : NULL_RTX; \
> >> + INSN##_cond_ && (INSN##_next_ = NEXT_INSN (INSN), true); \
> >> +       INSN = INSN##_next_,                                            \
> >> +       INSN##_cond_ = (INSN != INSN##_end_ ? (rtx) 1 : NULL_RTX))
> >
> > This just makes my eyes hurt...
> >
> >
> > What about cases where a FOR_BB_INSNS is terminated before reaching
> > the end of a basic block, and you need to know at what insn you
> > stopped? Up to now, you could do:
> >
> >   rtx insn; basic_block bb;
> >   FOR_BB_INSNS (bb, insn)
> >     {
> >       ... // do stuff
> >       if (something) break;
> >     }
> >   do_something_with (insn);
> >
> > Looks like this is no longer possible with the implementation of
> > FOR_BB_INSNS of your patch.
> 
> ...this I suppose it depends where we want to go with these iterators.
> I'm hoping eventually we'll move to C++11, where the natural way of
> writing the loop would be:
> 
>   for (rtx insn : bb->insns ())
> 
> (Or "auto *" instead of "rtx" if you prefer.)
> 
> And I think the idiom of having the FOR_* macro define the iterator
> variable is much closer to that than:
> 
>   rtx insn;
>   FOR_BB_INSNS (iter, bb, insn)
> 
> would be.
> 
> It's true that you can't leave "insn" with a signficant value after
> the loop, but no current code wants to do that.  Personally I like
> the fact that loops that do want to set a final value have to make
> that explicit.  When the vast majority (currently all) instances of:
> 
>   FOR_BB_INSNS (bb, insn)
> 
> treat "insn" as local to the loop, it's helpful when the exceptions
> are obvious.
> 
> I think if anything the patch would make it easier to change the
> type of insns to something other than rtx.


(sorry for the belated response; I was on vacation).

FWIW I'm actually working on such a change, or, at least, to make
instructions be a subclass of rtx_def; presumably this should make it
easier to make it an entirely separate type, if that's desirable.

Executive summary is that it's still a work in progress, and I'm going
to be giving a talk about it at Cauldron next month ("A proposal for
typesafe RTL", currently scheduled for the Sunday at 12.45 in Stream 2).

More detailed version:
I experimented with this a few months ago, and came up with a 159-patch
patch series:
http://dmalcolm.fedorapeople.org/gcc/patch-backups/rtx-classes/v1/

That patch kit builds on x86_64 (and regrtests iirc), and uses a
separate subclass for insns e.g. in the core basic block class, in the
scheduler, register allocators, etc, but:
  (A) it only builds on about 70 of the ~200 configurations in
config-list.mk
  (B) there was a tangled mess of dependencies between the patches
making it hard to fix (A)
  (C) I went with the rather verbose "rtx_base_insn" name, which in v1
is a typedef for a pointer to an insn, along with a family of other
typedefs, which I now realize is frowned upon e.g.:
https://gcc.gnu.org/ml/gcc-patches/2014-04/msg01520.html
https://gcc.gnu.org/ml/gcc-patches/2014-04/msg01632.html
  (D) I was using as_a *methods* rather than just as_a <>, like
in v1 of my gimple-classes patches:
(https://gcc.gnu.org/ml/gcc-patches/2014-05/msg00887.html)


So I've been reworking it, incorporating feedback from the gimple
statement classes work; the latest patch series is v7 and can be seen
here:
http://dmalcolm.fedorapeople.org/gcc/patch-backups/rtx-classes/v7/

It builds and regrtests on x86_64 on top of r211061 (aka
dcd5393fd5b3ac53775a5546f3103958b64789bb) and builds on 184 of the
configurations in config-list.mk; I have patches to fix the remaining
ones to achieve parity with an unpatched build.

The current class hierarchy looks like this:
/* Subclasses of rtx_def, using indentation to show the class
   hierarchy, along with the relevant invariant.  */
class rtx_def;
  class rtx_insn; /* (INSN_P (X) || NOTE_P (X)
                       || JUMP_TABLE_DATA_P (X) || BARRIER_P (X)
                       || LABEL_P (X)) */
    class rtx_real_insn;         /* INSN_P (X) */
      class rtx_debug_insn;      /* DEBUG_INSN_P (X) */
      class rtx_nonjump_insn;    /* NONJUMP_INSN_P (X) */
      class rtx_jump_insn;       /* JUMP_P (X) */
      class rtx_call_insn;       /* CALL_P (X) */
    class rtx_jump_table_data;   /* JUMP_TABLE_DATA_P (X) */
    class rtx_barrier;           /* BARRIER_P (X) */
    class rtx_code_label;        /* LABEL_P (X) */
    class rtx_note;              /* NOTE_P (X) */

and the code now uses "rtx_insn *" in hundreds of places where it would
previously have used "rtx".

My aim is that it always builds on every config: each patch is
self-contained.  To do this, the initial patches add scaffolding that
adds some strategically-placed checked downcasts to rtx_insn * (e.g. on
the result of PREV_INSN/NEXT_INSN).  The subsequent patches then use
these promises in order to strengthen the insn-handling code to work on
the new subclass. The idea is that the final patches then remove this
scaffolding, with the core BB types becoming rtx_insn *.  The drawback
of course is that during the process the resulting compiler is much
slower - until the scaffolding is removed.

v1 of the patch series also added some non-insn subclasses of rtx, for
EXPR_LIST, INSN_LIST, and for SEQUENCE, allowing rewriting of iterations
like this (in jump.c:rebuild_jump_labels_1), where "insn" and
"forced_labels" are INSN_LIST and hence are "rtx_insn_list *":

-    for (insn = forced_labels; insn; insn = XEXP (insn, 1))
-      if (LABEL_P (XEXP (insn, 0)))
-         LABEL_NUSES (XEXP (insn, 0))++;
+    for (insn = forced_labels; insn; insn = insn->next ())
+      if (LABEL_P (insn->element ()))
+         LABEL_NUSES (insn->element ())++;

Thoughts?
Dave

> It would just mean changing
> the "rtx" at the start of the two "for" loops to the new type,
> whereas at the moment you would need to change all the separate
> "rtx insn"s.  (In particular, it's common for "insn" to be defined
> on the same line as other rtx variables that might need to stay
> as "rtx"es after the change.)

Instructions vs Expressions in the backend (was Re: RFA: Rework FOR_BB_INSNS iterators)

Reply via email to