Re: RFC: LRA for x86/x86-64 [7/9]

Richard Sandiford Tue, 02 Oct 2012 04:22:24 -0700

Hi Vlad,

Vladimir Makarov <vmaka...@redhat.com> writes:
> This is the major patch containing all new files.  The patch also adds 
> necessary calls to LRA from IRA.As the patch is too big, it continues in 
> the next email.
>
> 2012-09-27  Vladimir Makarov  <vmaka...@redhat.com>
>
>      * Makefile.in (LRA_INT_H): New.
>      (OBJS): Add lra.o, lra-assigns.o, lra-coalesce.o,
>      lra-constraints.o, lra-eliminations.o, lra-lives.o, and lra-spills.o.
>      (ira.o): Add dependence on lra.h.
>      (lra.o, lra-assigns.o, lra-coalesce.o, lra-constraints.o): New entries.
>      (lra-eliminations.o, lra-lives.o, lra-spills.o): Ditto.
>      * ira.c: Include lra.h.
>      (ira_init_once, ira_init, ira_finish_once): Call lra_start_once,
>      lra_init, lra_finish_once in anyway.
>          (lra_in_progress): Remove.
>      (do_reload): Call LRA.
>      * lra.h: New.
>      * lra-int.h: Ditto.
>      * lra.c: Ditto.
>      * lra-assigns.c: Ditto.
>      * lra-constraints.c: Ditto.
>      * lra-coalesce.c: Ditto.
>      * lra-eliminations.c: Ditto.
>      * lra-lives.c: Ditto.
>      * lra-spills.c: Ditto.
>      * doc/passes.texi: Describe LRA pass.


A non-authoritative review of the documentation and lra-eliminations.c:

> +LRA is different from the reload pass in LRA division on small,
> +manageable, and separated sub-tasks.  All LRA transformations and
> +decisions are reflected in RTL as more as possible.  Instruction
> +constraints as a primary source of the info and that minimizes number
> +of target-depended macros/hooks.
>
> +LRA is run for the targets it were ported.

Suggest something like:

  Unlike the reload pass, intermediate LRA decisions are reflected in
  RTL as much as possible.  This reduces the number of target-dependent
  macros and hooks, leaving instruction constraints as the primary
  source of control.

  LRA is run on targets for which TARGET_LRA_P returns true.

> +/* The virtual registers (like argument and frame pointer) are widely
> +   used in RTL.       Virtual registers should be changed by real hard
> +   registers (like stack pointer or hard frame pointer) plus some
> +   offset.  The offsets are changed usually every time when stack is
> +   expanded.  We know the final offsets only at the very end of LRA.

I always think of "virtual" as [FIRST_VIRTUAL_REGISTER, LAST_VIRTUAL_REGISTER].
Maybe "eliminable" would be better?  E.g.

/* Eliminable registers (like a soft argument or frame pointer) are widely
   used in RTL.  These eliminable registers should be replaced by real hard
   registers (like the stack pointer or hard frame pointer) plus some offset.
   The offsets usually change whenever the stack is expanded.  We know the
   final offsets only at the very end of LRA.

> +   We keep RTL code at most time in such state that the virtual
> +   registers can be changed by just the corresponding hard registers
> +   (with zero offsets) and we have the right RTL code.       To achieve this
> +   we should add initial offset at the beginning of LRA work and update
> +   offsets after each stack expanding.       But actually we update virtual
> +   registers to the same virtual registers + corresponding offsets
> +   before every constraint pass because it affects constraint
> +   satisfaction (e.g. an address displacement became too big for some
> +   target).

Suggest:

   Within LRA, we usually keep the RTL in such a state that the eliminable
   registers can be replaced by just the corresponding hard register
   (without any offset).  To achieve this we should add the initial
   elimination offset at the beginning of LRA and update the offsets
   whenever the stack is expanded.  We need to do this before every
   constraint pass because the choice of offset often affects whether
   a particular address or memory constraint is satisfied.

> +   The final change of virtual registers to the corresponding hard
> +   registers are done at the very end of LRA when there were no change
> +   in offsets anymore:
> +
> +                  fp + 42     =>     sp + 42

virtual=>eliminable if the above is OK.

> +   Such approach requires a few changes in the rest GCC code because
> +   virtual registers are not recognized as real ones in some
> +   constraints and predicates.       Fortunately, such changes are
> +   small.  */

Not sure whether the last paragraph really belongs in the code,
since it's more about the reload->LRA transition.

> +  /* Nonzero if this elimination can be done.  */
> +  bool can_eliminate;                
> +  /* CAN_ELIMINATE since the last check.  */
> +  bool prev_can_eliminate;

AFAICT, these two fields are (now) only ever assigned at the same time,
via init_elim_table and setup_can_eliminate.  Looks like we can do
without prev_can_eliminate.  (And the way that the pass doesn't
need to differentiate between the raw CAN_ELIMINABLE value and
the processed value feels nice and reassuring.)

> +/* Map: 'from regno' -> to the current elimination, NULL otherwise.
> +   The elimination table may contains more one elimination of a hard
> +   register.  The map contains only one currently used elimination of
> +   the hard register.  */
> +static struct elim_table *elimination_map[FIRST_PSEUDO_REGISTER];

Nit: s/-> to/->/, s/may contains/may contain/.  Maybe:

/* Map: eliminable "from" register -> its current elimination,
   or NULL if none.  The elimination table may contain more than
   one elimination for the same hard register, but this map specifies
   the one that we are currently using.  */

> +/* When an eliminable hard register becomes not eliminable, we use the
> +   special following structure to restore original offsets for the
> +   register.  */

s/special following/following special/

> +/* Return the current substitution hard register of the elimination of
> +   HARD_REGNO.       If HARD_REGNO is not eliminable, return itself.  */
> +int
> +lra_get_elimation_hard_regno (int hard_regno)

Typo: s/get_elimation/get_elimination/

> +/* Scan X and replace any eliminable registers (such as fp) with a
> +   replacement (such as sp) if SYBST_P, plus an offset.       The offset is

Typo: SUBST_P.

> +   a change in the offset between the eliminable register and its
> +   substitution if UPDATE_P, or the full offset if FULL_P, or
> +   otherwise zero.

I wonder if an enum would be better than two booleans?
It avoids invalid combinations like UPDATE_P && FULL_P
and might make the arguments more obvious too.

> +    /* You might think handling MINUS in a manner similar to PLUS is a
> +       good idea.  It is not.  It has been tried multiple times and every
> +       time the change has had to have been reverted.
> +
> +       Other parts of LRA know a PLUS is special and require special
> +       code to handle code a reloaded PLUS operand.
> +
> +       Also consider backends where the flags register is clobbered by a
> +       MINUS, but we can emit a PLUS that does not clobber flags (IA-32,
> +       lea instruction comes to mind).       If we try to reload a MINUS, we
> +       may kill the flags register that was holding a useful value.
> +
> +       So, please before trying to handle MINUS, consider reload as a
> +       whole instead of this little section as well as the backend
> +       issues.       */

A few references to the old enemy, especially in the last paragraph.

I think this old comment could be dropped altogether.  Handling MINUS
in a similar manner isn't a good idea because (minus ... (const_int ...))
isn't canonical.

> +           /* The only time we want to replace a PLUS with a REG
> +              (this occurs when the constant operand of the PLUS is
> +              the negative of the offset) is when we are inside a
> +              MEM.  We won't want to do so at other times because
> +              that would change the structure of the insn in a way
> +              that reload can't handle.  We special-case the
> +              commonest situation in eliminate_regs_in_insn, so
> +              just replace a PLUS with a PLUS here, unless inside a
> +              MEM.  */

Reload reference.  Does this restriction still apply?  The later comment:

> +      Note that there is no risk of modifying the structure of the insn,
> +      since we only get called for its operands, thus we are either
> +      modifying the address inside a MEM, or something like an address
> +      operand of a load-address insn.  */

makes it sound on face value like the MEM restriction above is a reload-specific
thing.  Same question for:

> +         /* As above, if we are not inside a MEM we do not want to
> +            turn a PLUS into something else.  We might try to do so here
> +            for an addition of 0 if we aren't optimizing.  */

Given all the fuss in the 0/9 thread, maybe elimination could be
done in-place, at least when applied internally by LRA rather than
externally.  Most of the cases, including PLUS, are for things that
can't legitimately be shared.

I don't think that should be a requirement for merging though,
especially since this is how things have always been done.
Just an idea.

> +      /* We do not support elimination of a register that is modified.
> +      elimination_effects has already make sure that this does not
> +      happen.  */
> +      return x;
> 
> +    case PRE_MODIFY:
> +    case POST_MODIFY:
> +      /* We do not support elimination of a hard register that is
> +      modified.  elimination_effects has already make sure that
> +      this does not happen.  The only remaining case we need to
> +      consider here is that the increment value may be an
> +      eliminable register.  */

Reload references (elimination_effects).

> +#ifdef WORD_REGISTER_OPERATIONS
> +                /* On these machines, combine can create RTL of the form
> +                   (set (subreg:m1 (reg:m2 R) 0) ...)
> +                   where m1 < m2, and expects something interesting to
> +                   happen to the entire word.  Moreover, it will use the
> +                   (reg:m2 R) later, expecting all bits to be preserved.
> +                   So if the number of words is the same, preserve the
> +                   subreg so that push_reload can see it.  */
> +                && ! ((x_size - 1) / UNITS_PER_WORD
> +                      == (new_size -1 ) / UNITS_PER_WORD)
> +#endif

Reload reference (push_reload).  Do we still need this for LRA?

> +         {
> +           SUBREG_REG (x) = new_rtx;
> +           alter_subreg (&x, false);
> +           return x;
> +         }

The reload version is:

    return adjust_address_nv (new_rtx, GET_MODE (x), SUBREG_BYTE (x));

Isn't that safe here too?  We're only doing this for non-paradoxical subregs, 
so:

      /* For paradoxical subregs on big-endian machines, SUBREG_BYTE
         contains 0 instead of the proper offset.  See simplify_subreg.  */
      if (offset == 0
          && GET_MODE_SIZE (GET_MODE (y)) < GET_MODE_SIZE (GET_MODE (x)))
        {
          int difference = GET_MODE_SIZE (GET_MODE (y))
                           - GET_MODE_SIZE (GET_MODE (x));
          if (WORDS_BIG_ENDIAN)
            offset += (difference / UNITS_PER_WORD) * UNITS_PER_WORD;
          if (BYTES_BIG_ENDIAN)
            offset += difference % UNITS_PER_WORD;
        }

doesn't apply.

> +/* Scan rtx X for modifications of elimination target registers.  Update
> +   the table of eliminables to reflect the changed state.  MEM_MODE is
> +   the mode of an enclosing MEM rtx, or VOIDmode if not within a MEM.  */
> +static void
> +mark_not_eliminable (rtx x)

Maybe:

/* Scan rtx X for references to elimination source or target registers
   in contexts that would prevent the elimination from happening.
   Update ... */

The function looks at uses as well as modifications, and at elimination
source registers as well as target registers.

> +      if (REG_P (XEXP (x, 0)) && REGNO (XEXP (x, 0)) < FIRST_PSEUDO_REGISTER)
> +     /* If we modify the source of an elimination rule, disable it.  */
> +     for (ep = reg_eliminate;
> +          ep < &reg_eliminate[NUM_ELIMINABLE_REGS];
> +            ep++)
> +       if (ep->from_rtx == XEXP (x, 0)
> +           || (ep->to_rtx == XEXP (x, 0)
> +               && ep->to_rtx != hard_frame_pointer_rtx))
> +         setup_can_eliminate (ep, false);

Comment doesn't mention the to_rtx case.

> +   If REPLACE_P is false, do an offset updates.       */

s/do an offset updates/just update the offsets/
Maybe worth adding "while keeping the base register the same".

> +static void
> +eliminate_regs_in_insn (rtx insn, bool replace_p)
> +{
> +  int icode = recog_memoized (insn);
> +  rtx old_set = single_set (insn);
> +  bool val;

"validate_p" might be a better name.

> +           /* If we are assigning to a hard register that can be
> +              eliminated, it must be as part of a PARALLEL, since
> +              the code above handles single SETs.  We must indicate
> +              that we can no longer eliminate this reg.  */
> +           for (ep = reg_eliminate;
> +                ep < &reg_eliminate[NUM_ELIMINABLE_REGS];
> +                ep++)
> +             lra_assert (ep->from_rtx != orig_operand[i]
> +                         || ! ep->can_eliminate);

This is now an assert, so rather than "We must indicate ...",
I think the comment should say who enforces this.  (mark_not_eliminable?)

> +       /* If an output operand changed from a REG to a MEM and INSN is an
> +          insn, write a CLOBBER insn.  */
> +       if (static_id->operand[i].type != OP_IN
> +           && REG_P (orig_operand[i])
> +           && MEM_P (substed_operand[i])
> +           && replace_p)
> +         emit_insn_after (gen_clobber (orig_operand[i]), insn);

I realise this is copied from reload, but why is it needed?

> +/* Initialize the table of hard registers to eliminate.
> +   Pre-condition: global flag frame_pointer_needed has been set before
> +   calling this function.  Set up hard registers in DONT_USE_REGS
> +   which can not be used for allocation because their identical
> +   elimination is not possible.       */
> +static void
> +init_elim_table (void)

DONT_USE_REGS doesn't exist any more.

> +/* Eliminate hard reg given by its location LOC.  */
> +void
> +lra_eliminate_reg_if_possible (rtx *loc)
> +{
> +  int regno;
> +  struct elim_table *ep;
> +
> +  lra_assert (REG_P (*loc));
> +  if ((regno = REGNO (*loc)) >= FIRST_PSEUDO_REGISTER
> +      /* Virtual registers are not allocatable. ??? */
> +      || ! TEST_HARD_REG_BIT (lra_no_alloc_regs, regno))
> +    return;

I don't understand this comment.  Is the "if" statement needed at all?
Given the requirement for rtx equality, I'd have thought:

> +  if ((ep = get_elimination (*loc)) != NULL)
> +    *loc = ep->to_rtx;

would do everything.  If that isn't true, the function might need
a bit more commentary.

> +  for (i = FIRST_PSEUDO_REGISTER; i < regs_num; i++)
> +    if (lra_reg_info[i].nrefs != 0)
> +      {
> +     mem_loc = ira_reg_equiv[i].memory;
> +     invariant = ira_reg_equiv[i].invariant;
> +     if (mem_loc != NULL_RTX)
> +       mem_loc = lra_eliminate_regs_1 (mem_loc, VOIDmode,
> +                                       final_p, ! final_p, false);
> +     ira_reg_equiv[i].memory = mem_loc;
> +     if (invariant != NULL_RTX)
> +       invariant = lra_eliminate_regs_1 (invariant, VOIDmode,
> +                                         final_p, ! final_p, false);
> +     ira_reg_equiv[i].invariant = invariant;

Minor nit, but the first assignment to "invariant" seems to belong
a bit further down.

Looks really good to me FWIW.  Keeping the rtl in a form where the
final elimination is simply a register replacement is a nice trick.

Richard

Re: RFC: LRA for x86/x86-64 [7/9]

Reply via email to