Andrew MacLeod <[EMAIL PROTECTED]> writes:

> This describes my current work-in-progress, RABLET, which stands for
> RABLE-Themes, and conveniently implies something smaller.

Thanks for this proposal.

> ssa-to-rtl
> spill cost analysis
> global allocation
> spiller
> spill location optimizer
> instruction rewriter.

You omitted the RTL loop optimizer passes, which still do quite a bit
of work despite the tree-ssa loop passes.  Also if-conversion and some
minor passes, though they are less relevant.

> If expand is made much smarter, I would argue that much of GCSE and CSE
> isn't needed. We've already performed those optimizations at a high
> level, and we can hopefully do a lot of the factoring and things on
> addressing registers exposed during expand. I'm sure there are other
> things to do, but I would argue that they are significantly less than a
> "general purpose" CSE and GCSE pass. And in the cases of high register
> pressure, how much would you want them to do anyway? Its really these
> high register pressure areas that RABLET is attacking anyway.

Here I think you are waving your hands a little too hard.  RTL level
CSE is significant for handling common expressions exposed by address
calculations and by DImode (and larger) computations.  On some
processors giving up CSE on address calculations would be very
painful.  There needs to be a plan to handle that.  Also, at present
many vector calculations are not exposed at the tree level--they are
hidden inside builtin functions until they are expanded--and
vector-heavy code can also have a lot of common subexpressions.

> If I recall, scheduling is register pressure aware and normally doesn't
> increase register pressure dramatically. If it does increase pressure,
> well, this won't solve every problem after all.

Unfortunately, scheduling is currently not register pressure aware at
all.  The scheduler will gleefully increase register pressure.  That's
why we don't even run the scheduler before register allocation on x86.

Modulo the above comments, I don't see anything wrong with your basic
idea.
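
To make the address-calculation point concrete, here is a minimal
sketch in C.  The struct and function names are hypothetical and not
taken from GCC; the second function is only a hand-written picture of
what lowering produces, not actual compiler output.  Each field access
becomes base + i * sizeof(struct point) + offset, so the scaled index
is a common subexpression that only becomes visible once the accesses
are lowered.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record type, used only for illustration. */
struct point {
    int32_t x;
    int32_t y;
    int32_t z;
};

/* Source-level view: three independent-looking field accesses. */
int32_t sum_fields(const struct point *a, size_t i)
{
    return a[i].x + a[i].y + a[i].z;
}

/* Hand-lowered view (an assumption about the shape of the lowered
   code, not real RTL): every access is
   base + i * sizeof(struct point) + offset.
   The scaled index is computed once here; that is the sharing a CSE
   pass has to find if lowering emits it once per access. */
int32_t sum_fields_lowered(const struct point *a, size_t i)
{
    const unsigned char *elem =
        (const unsigned char *)a + i * sizeof(struct point);
    int32_t x = *(const int32_t *)(elem + offsetof(struct point, x));
    int32_t y = *(const int32_t *)(elem + offsetof(struct point, y));
    int32_t z = *(const int32_t *)(elem + offsetof(struct point, z));
    return x + y + z;
}
```

Both functions return the same value; the sketch says nothing about
which pass in a given GCC version actually removes the redundancy.
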
But I also wonder whether you couldn't get a similar effect by forcing
instruction selection to occur before register allocation.  If that is
done well, reload will have much less work to do.

One of the basic issues with the current code is not that we do
register allocation well or poorly, but that reload takes the output
of the register allocator and munges it unpredictably.  That's going
to happen with your proposal as well.  It doesn't mean that your
proposal won't improve things.  But no register allocator can do a
good job when it can't make the final decisions.

Ian