Andrew MacLeod <[EMAIL PROTECTED]> writes:

> This describes my current work-in-progress, RABLET, which stands for
> RABLE-Themes, and conveniently implies something smaller.

Thanks for this proposal.


> ssa-to-rtl
> spill cost analysis
> global allocation
> spiller
> spill location optimizer
> instruction rewriter.

You omitted the RTL loop optimizer passes, which still do quite a bit
of work despite the tree-ssa loop passes.  Also if-conversion and some
minor passes, though they are less relevant.


> If expand is made much smarter, I would argue that much of GCSE and CSE
> isn't needed.  We've already performed those optimizations at  a high
> level, and we can hopefully do a lot of the factoring and things on
> addressing registers exposed during expand.  I'm sure there are other
> things to do, but I would argue that they are significantly less than a
> "general purpose" CSE and GCSE pass. And in the cases of high register
> pressure, how much would you want them to do anyway?  Its really these
> high register pressure areas that RABLET is attacking anyway.

Here I think you are waving your hands a little too hard.  RTL level
CSE is significant for handling common expressions exposed by address
calculations and by DImode (and larger) computations.  On some
processors giving up CSE on address calculations would be very
painful.  There needs to be a plan to handle that.
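
To make the address-calculation case concrete, here is a minimal sketch.
The struct and function names are hypothetical and nothing here is taken
from GCC itself.  The first function is what the programmer writes; the
second spells out by hand roughly what the accesses look like once the
address arithmetic is explicit, which is where the shared subexpression
becomes visible.

```c
#include <stddef.h>

/* Hypothetical type and names, for illustration only.  */
struct point { int x; int y; };

/* At the source level the two accesses share only the variables p and i.
   Once each access is lowered to explicit address arithmetic, both
   contain the computation p + i * sizeof (struct point).  */
int sum_fields(const struct point *p, size_t i)
{
    return p[i].x + p[i].y;
}

/* The same function with the address arithmetic written out by hand.
   The shared base address is computed once here; a CSE pass running
   after lowering would be expected to find the same sharing.  */
int sum_fields_lowered(const struct point *p, size_t i)
{
    const char *base = (const char *) p + i * sizeof (struct point);
    int x = *(const int *) (base + offsetof (struct point, x));
    int y = *(const int *) (base + offsetof (struct point, y));
    return x + y;
}
```

Both functions return the same value; the point is only that the common
base address does not exist as an expression until the access is lowered.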

Also at present many vector calculations are not exposed at the tree
level--they are hidden inside builtin functions until they are
expanded--and vector-heavy code can also have a lot of common
subexpressions.


> If I recall, scheduling is register pressure aware and normally doesn't
> increase register pressure dramatically. If it does increase pressure,
> well, this won't solve every problem after all.

Unfortunately, scheduling is currently not register pressure aware at
all.  The scheduler will gleefully increase register pressure.  That's
why we don't even run the scheduler before register allocation on x86.
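
As a rough illustration of how reordering alone can raise pressure, here
is a toy model.  It is not GCC's scheduler; the insn struct, the
MAX_INSNS limit, and peak_pressure are all assumptions made up for this
sketch.  Each instruction defines one value and uses up to two others,
and the function reports the largest number of values live at once for a
given order.

```c
#include <stddef.h>

#define MAX_INSNS 16

/* Hypothetical toy instruction: it defines one value, identified by the
   instruction's own index, and uses up to two other values.
   -1 means "no operand".  */
struct insn { int use0; int use1; };

/* Return the peak number of values live at once when the instructions
   run in the given order, or -1 if n exceeds MAX_INSNS.  A value is
   live from its definition until its last use in that order.  */
int peak_pressure(const struct insn *insns, const int *order, size_t n)
{
    int last_use[MAX_INSNS];
    int live = 0, peak = 0;

    if (n > MAX_INSNS)
        return -1;

    /* Record the position of the last use of every value.  */
    for (size_t i = 0; i < n; i++)
        last_use[i] = -1;
    for (size_t pos = 0; pos < n; pos++) {
        const struct insn *in = &insns[order[pos]];
        if (in->use0 >= 0) last_use[in->use0] = (int) pos;
        if (in->use1 >= 0) last_use[in->use1] = (int) pos;
    }

    for (size_t pos = 0; pos < n; pos++) {
        int id = order[pos];
        const struct insn *in = &insns[id];

        /* Operands whose last use is this instruction die here.  */
        if (in->use0 >= 0 && last_use[in->use0] == (int) pos)
            live--;
        if (in->use1 >= 0 && in->use1 != in->use0
            && last_use[in->use1] == (int) pos)
            live--;

        /* The result becomes live only if something uses it later.  */
        if (last_use[id] > (int) pos)
            live++;

        if (live > peak)
            peak = live;
    }
    return peak;
}
```

For (a + b) + (c + d), with four loads, two adds, and a final add,
issuing each add right after its loads peaks at 3 live values in this
model, while hoisting all four loads to the top peaks at 4.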


Modulo the above comments, I don't see anything wrong with your basic
idea.  But I also wonder whether you couldn't get a similar effect by
forcing instruction selection to occur before register allocation.  If
that is done well, reload will have much less work to do.

One of the basic issues with the current code is not that we do
register allocation well or poorly, but that reload takes the output
of the register allocator and munges it unpredictably.  That's going
to happen with your proposal as well.  It doesn't mean that your
proposal won't improve things.  But no register allocator can do a
good job when it can't make the final decisions.

Ian
