Re: MIPS: 2'nd pass of ira, causes weird register allocation for 2-op mult

Richard Sandiford Mon, 28 May 2012 12:10:24 -0700

Klaus Pedersen <[email protected]> writes:
> The summery goes something like this:
>
> It is possible for the second pass of ira to get confused and decide that
> NO_REGS or a hard float register are better choices for the result of the
> 2 operand mult. First pass already optimally allocated in GR_AND_MD1_REGS.


Yeah.  I'm afraid this is something I've been sitting on for a while now.
I think the problem boils down to this:

> Pass 0 for finding pseudo/allocno costs
>
>     a2 (r209,l0) best GR_REGS, allocno GR_REGS
>     a4 (r208,l0) best GR_REGS, allocno GR_REGS
>     a5 (r207,l0) best GR_REGS, allocno GR_REGS
>     a1 (r200,l0) best GR_REGS, allocno GR_REGS
>     a3 (r199,l0) best GR_AND_MD1_REGS, allocno GR_AND_MD1_REGS
>     a0 (r194,l0) best GR_REGS, allocno GR_REGS
>
> ...
>   a3(r199,l0) costs: M16_REGS:6000,6000 T_REG:6000,6000
> M16_T_REGS:6000,6000 PIC_FN_ADDR_REG:6000,6000 V1_REG:6000,6000
> LEA_REGS:6000,6000 GR_REGS:6000,6000 FP_REGS:14000,14000
> MD1_REG:6000,6000 MD_REGS:2000000,2000000 ACC_REGS:2000000,2000000
> GR_AND_MD0_REGS:2000000,2000000 GR_AND_MD1_REGS:18000,18000
> GR_AND_MD_REGS:2000000,2000000 GR_AND_ACC_REGS:2000000,2000000
> ALL_REGS:2000000,2000000 MEM:14000,14000

Here the cost of using GR_REGS is 6000 and the cost of using MD1_REG
is also 6000.  But the cost of using _either_ GR_REGS _or_ MD1_REG
is 18000!  That doesn't make conceptual sense.

This doesn't matter during the first pass because we collect the
union of the cheapest classes:

              if (i_costs[k] < best_cost)
                {
                  best_cost = i_costs[k];
                  best = (enum reg_class) rclass;
                }
              else if (i_costs[k] == best_cost)
                best = ira_reg_class_subunion[best][rclass];

I.e. if the cost of using X is C and the cost of using Y is C, we assume
that the cost of using either X or Y is also C.  We don't care in the
"else" case about the cost of ira_reg_class_subunion[best][rclass],
because if that cost is something other than C, it's wrong.
So despite what the dump says, the first pass is effectively
(and correctly) giving GR_AND_MD1_REGS a cost of 6000.

But the second pass goes out of its way to avoid calculating the costs
of these constituent classes, and just calculates the cost of
GR_AND_MD1_REGS itself:

      /* We exclude classes from consideration which are subsets of
         ACLASS only if ACLASS is a pressure class or subset of a
         pressure class.  It means by the definition of pressure classes
         that cost of moving between susbets of ACLASS is cheaper than
         load or store.  */
      for (i = 0; i < ira_pressure_classes_num; i++)
        {
          cl = ira_pressure_classes[i];
          if (cl == aclass)
            break;
          COPY_HARD_REG_SET (temp2, reg_class_contents[cl]);
          AND_COMPL_HARD_REG_SET (temp2, ira_no_alloc_regs);
          if (hard_reg_set_subset_p (temp, temp2))
            break;
        }
      exclude_p = i < ira_pressure_classes_num;
      classes.num = 0;
      for (i = 0; i < ira_important_classes_num; i++)
        {
          cl = ira_important_classes[i];
          if (exclude_p)
            {
              /* Exclude no-pressure classes which are subsets of
                 ACLASS.  */
              COPY_HARD_REG_SET (temp2, reg_class_contents[cl]);
              AND_COMPL_HARD_REG_SET (temp2, ira_no_alloc_regs);
              if (! ira_reg_pressure_class_p[cl]
                  && hard_reg_set_subset_p (temp2, temp) && cl != aclass)
                continue;
            }
          classes.classes[classes.num++] = cl;
        }

So we get:

>   a3(r199,l0) costs: FP_REGS:14000,14000 MD_REGS:2000000,2000000
> ACC_REGS:2000000,2000000 GR_AND_MD0_REGS:2000000,2000000
> GR_AND_MD1_REGS:18000,18000 GR_AND_MD_REGS:2000000,2000000
> GR_AND_ACC_REGS:2000000,2000000 ALL_REGS:2000000,2000000
> MEM:14000,14000

I.e. the same incorrect value for GR_AND_MD1_REGS, but without the
individual GR_REGS and MD1_REG costs that would give the correct value.

Part of the problem is that may_move_{in,out}_cost is too conservative.
It is either the full cost of a move or 0.  E.g. take:

    MMOC == may_move_out_cost[mode][X][Y]

which is used where an output constraint specifies class X and the
allocation class is Y.  MMOC is 0 if Y is a subset of X, because no move
is ever needed in that case.  But in other cases MMOC is the worst-case
cost of moving from X to _anything_ in Y.  So if Y is X \union X', say,
then MMOC will include the cost of an X->X move, even though a move is
only needed for registers in the X' part of Y.

We could do better by making:

    may_move_out_cost[mode][X][Y] = move_cost[mode][X][superdiff[Y][X]]

where superdiff is a new array specifying the smallest class that contains
Y - X.  But even that's too pessimistic.  When we see an X constraint,
we'll count the cost of:

    move_cost[mode][X][X']

And when we see an X' constraint, we'll count the cost of:

    move_cost[mode][X'][X]

And we'll sum these costs together, even though they represent
mutually-exclusive scenarios.  In cases like yours (which are common
on MIPS) we'll end up with a GR_AND_MD1_REGS cost of 12000 instead
of 18000, but the value really ought to be 6000.

I think the only practical way of calculating accurate costs for things
like GR_AND_MD_REGS really is to count the cost of the constituent classes
and then take their MAX.

Vlad, what do you think?  Is the above exclude_p code "just" a compile-time
speed-up?  Or is it a conceptual part of the algorithm?  More generally,
what kind of situations does the second pass help with?  I.e. how does
it improve on the first pass?

Richard

Re: MIPS: 2'nd pass of ira, causes weird register allocation for 2-op mult

Reply via email to