Klaus Pedersen <[email protected]> writes:
> The summery goes something like this:
>
> It is possible for the second pass of ira to get confused and decide that
> NO_REGS or a hard float register are better choices for the result of the
> 2 operand mult. First pass already optimally allocated in GR_AND_MD1_REGS.

Yeah.  I'm afraid this is something I've been sitting on for a while now.
I think the problem boils down to this:

> Pass 0 for finding pseudo/allocno costs
>
>     a2 (r209,l0) best GR_REGS, allocno GR_REGS
>     a4 (r208,l0) best GR_REGS, allocno GR_REGS
>     a5 (r207,l0) best GR_REGS, allocno GR_REGS
>     a1 (r200,l0) best GR_REGS, allocno GR_REGS
>     a3 (r199,l0) best GR_AND_MD1_REGS, allocno GR_AND_MD1_REGS
>     a0 (r194,l0) best GR_REGS, allocno GR_REGS
>
> ...
>   a3(r199,l0) costs: M16_REGS:6000,6000 T_REG:6000,6000
> M16_T_REGS:6000,6000 PIC_FN_ADDR_REG:6000,6000 V1_REG:6000,6000
> LEA_REGS:6000,6000 GR_REGS:6000,6000 FP_REGS:14000,14000
> MD1_REG:6000,6000 MD_REGS:2000000,2000000 ACC_REGS:2000000,2000000
> GR_AND_MD0_REGS:2000000,2000000 GR_AND_MD1_REGS:18000,18000
> GR_AND_MD_REGS:2000000,2000000 GR_AND_ACC_REGS:2000000,2000000
> ALL_REGS:2000000,2000000 MEM:14000,14000

Here the cost of using GR_REGS is 6000 and the cost of using MD1_REG
is also 6000.  But the cost of using _either_ GR_REGS _or_ MD1_REG
is 18000!  That doesn't make conceptual sense.

This doesn't matter during the first pass because we collect the
union of the cheapest classes:

              if (i_costs[k] < best_cost)
                {
                  best_cost = i_costs[k];
                  best = (enum reg_class) rclass;
                }
              else if (i_costs[k] == best_cost)
                best = ira_reg_class_subunion[best][rclass];

I.e. if the cost of using X is C and the cost of using Y is C, we assume
that the cost of using either X or Y is also C.  We don't care in the
"else" case about the cost of ira_reg_class_subunion[best][rclass],
because if that cost is something other than C, it's wrong.
So despite what the dump says, the first pass is effectively
(and correctly) giving GR_AND_MD1_REGS a cost of 6000.

But the second pass goes out of its way to avoid calculating the costs
of these constituent classes, and just calculates the cost of
GR_AND_MD1_REGS itself:

      /* We exclude classes from consideration which are subsets of
         ACLASS only if ACLASS is a pressure class or subset of a
         pressure class.  It means by the definition of pressure classes
         that cost of moving between susbets of ACLASS is cheaper than
         load or store.  */
      for (i = 0; i < ira_pressure_classes_num; i++)
        {
          cl = ira_pressure_classes[i];
          if (cl == aclass)
            break;
          COPY_HARD_REG_SET (temp2, reg_class_contents[cl]);
          AND_COMPL_HARD_REG_SET (temp2, ira_no_alloc_regs);
          if (hard_reg_set_subset_p (temp, temp2))
            break;
        }
      exclude_p = i < ira_pressure_classes_num;
      classes.num = 0;
      for (i = 0; i < ira_important_classes_num; i++)
        {
          cl = ira_important_classes[i];
          if (exclude_p)
            {
              /* Exclude no-pressure classes which are subsets of
                 ACLASS.  */
              COPY_HARD_REG_SET (temp2, reg_class_contents[cl]);
              AND_COMPL_HARD_REG_SET (temp2, ira_no_alloc_regs);
              if (! ira_reg_pressure_class_p[cl]
                  && hard_reg_set_subset_p (temp2, temp) && cl != aclass)
                continue;
            }
          classes.classes[classes.num++] = cl;
        }

So we get:

>   a3(r199,l0) costs: FP_REGS:14000,14000 MD_REGS:2000000,2000000
> ACC_REGS:2000000,2000000 GR_AND_MD0_REGS:2000000,2000000
> GR_AND_MD1_REGS:18000,18000 GR_AND_MD_REGS:2000000,2000000
> GR_AND_ACC_REGS:2000000,2000000 ALL_REGS:2000000,2000000
> MEM:14000,14000

I.e. the same incorrect value for GR_AND_MD1_REGS, but without the
individual GR_REGS and MD1_REG costs that would give the correct value.

Part of the problem is that may_move_{in,out}_cost is too conservative.
It is either the full cost of a move or 0.  E.g. take:

    MMOC == may_move_out_cost[mode][X][Y]

which is used where an output constraint specifies class X and the
allocation class is Y.  MMOC is 0 if Y is a subset of X, because no move
is ever needed in that case.  But in other cases MMOC is the worst-case
cost of moving from X to _anything_ in Y.  So if Y is X \union X', say,
then MMOC will include the cost of an X->X move, even though a move is
only needed for registers in the X' part of Y.

We could do better by making:

    may_move_out_cost[mode][X][Y] = move_cost[mode][X][superdiff[Y][X]]

where superdiff is a new array specifying the smallest class that contains
Y - X.  But even that's too pessimistic.  When we see an X constraint,
we'll count the cost of:

    move_cost[mode][X][X']

And when we see an X' constraint, we'll count the cost of:

    move_cost[mode][X'][X]

And we'll sum these costs together, even though they represent
mutually-exclusive scenarios.  In cases like yours (which are common
on MIPS) we'll end up with a GR_AND_MD1_REGS cost of 12000 instead
of 18000, but the value really ought to be 6000.

I think the only practical way of calculating accurate costs for things
like GR_AND_MD_REGS really is to count the cost of the constituent classes
and then take their MAX.

Vlad, what do you think?  Is the above exclude_p code "just" a compile-time
speed-up?  Or is it a conceptual part of the algorithm?  More generally,
what kind of situations does the second pass help with?  I.e. how does
it improve on the first pass?

Richard

Reply via email to