> > I mention all this because I was wondering which other architectures
> > have turned off sched1 for -Os? More importantly, I was wondering
> > if anyone else had considered creating some kind of clever hybrid
> > that only uses sched1 when it will increase performance without
> > increasing register pressure?
>
> http://gcc.gnu.org/ml/gcc-patches/2009-09/msg00003.html
>
> Another problem is that sched1 for architectures with few registers can
> result in reload failure. I tried to fix this in the patch mentioned
> above but I am not sure it is done for all targets and all possible
> programs. The right solution for this would be implementing hard
> register spills in the reload.

I don't think we have so few registers that reload failure will occur,
so it might be worth my trying this.

> The mentioned above code does not work for RA based on priority
> coloring because register pressure calculation for intersected or
> nested classes has a little sense.

Hmm. Thanks for mentioning that. As you might recall, we are using
priority coloring at the moment because it yielded better performance
than Chaitin-Briggs. Well, the real reason CB was rejected was that we
were already using Priority, so moving to CB would have caused enough
disruption that performance increases and decreases were inevitable.
I did get some gains, but I also got regressions that we couldn't
absorb at the time I did the work. I might be revisiting CB soon
though, as it did tend to yield smaller code, which is becoming more
important to us.

> If scheduling for the target is very important (as for itanium or
> in-order execution power6), I'd recommend to look at the selective
> scheduler.

I don't think scheduling is highly important for us, but I will take a
look at the selective scheduler.

> > Or perhaps I could make a heuristic based on the balanced-ness of the
> > tree? (I see sched1 does a lot better if the tree is balanced, since
> > it has more options to play with.)
>
> The register pressure is already mostly minimized when sched1 starts to
> work.

I guess this is a factor in the unbalancedness of the tree. The more
you balance it, the more likely it will get wider and require more
registers. But the wider and more balanced it is, the more options
sched1 has and the better the chance of a performance win (assuming the
increase in register pressure does not outweigh the scheduling win).

> > Now onto interblock-scheduling ...
> >
> > As we all know, you can't have interblock-scheduling enabled unless
> > you use the sched1 pass, so if sched1 is off then interblock is
> > irrelevant.
> > For now, let's assume we are going to make some clever
> > hybrid that allows sched1 when we think it will increase performance
> > for Os and we are going to keep sched1 on for O2 and O3.
> >
> > As I understand it, interblock-scheduling enlarges the scope of
> > sched1, such that you can insert independent insns from a
> > completely different block in between dependent insns in this
> > block. As well as potentially amortizing stalls on high latency
> > insns, we also get the chance to do "meatier" work in the destination
> > block and leave less to do in the source block. I don't know if this
> > is a deliberate effect of interblock-scheduling or if it is just
> > a happy side-effect.
> >
> > Anyway, the reason I mention interblock-scheduling is that I see it
> > doing seemingly intelligent moves, but then the later BB-reorder pass
> > is juggling blocks around such that we end up with extra code inside
> > hot loops! I assume this is because the scheduler and BB-reorderer
> > are largely ignorant of each other, and so good intentions on the
> > part of the former can be scuppered by the latter.
>
> That is right. It would be nice if somebody solves the problem.

Hmm. If we keep sched1 on, then maybe I will be the man to do it!

Best regards,
Ian
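
P.S. To make the point about tree balance and register pressure concrete, here is a minimal sketch, not GCC code. It assumes a toy binary expression tree and uses classic Sethi-Ullman labelling (my choice for illustration, not something sched1 or the allocator does) to count the registers needed to evaluate a tree without spilling. The names `struct node`, `leaf`, `op`, `chain`, `balanced` and `regs_needed` are all hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical toy expression tree; not a GCC data structure.
   A leaf has both children NULL, an operator has both set. */
struct node {
    struct node *left, *right;
};

/* Allocation failure handling and freeing are omitted for brevity. */
struct node *leaf(void)
{
    return calloc(1, sizeof(struct node));
}

struct node *op(struct node *l, struct node *r)
{
    struct node *n = calloc(1, sizeof *n);
    n->left = l;
    n->right = r;
    return n;
}

/* Sethi-Ullman label: registers needed to evaluate the tree with no
   spills, assuming every leaf must first be loaded into a register. */
int regs_needed(const struct node *n)
{
    if (n->left == NULL)
        return 1;
    int l = regs_needed(n->left);
    int r = regs_needed(n->right);
    if (l == r)
        return l + 1;       /* equal subtrees: one extra register */
    return l > r ? l : r;   /* otherwise evaluate the heavier side first */
}

/* Left-leaning chain: (((x0 op x1) op x2) op ...). */
struct node *chain(int leaves)
{
    struct node *t = leaf();
    for (int i = 1; i < leaves; i++)
        t = op(t, leaf());
    return t;
}

/* Perfectly balanced tree; leaves must be a power of two. */
struct node *balanced(int leaves)
{
    assert(leaves > 0 && (leaves & (leaves - 1)) == 0);
    if (leaves == 1)
        return leaf();
    return op(balanced(leaves / 2), balanced(leaves / 2));
}
```

With eight leaves, `regs_needed(chain(8))` is 2 while `regs_needed(balanced(8))` is 4. The balanced form offers more independent insns to schedule, but under this simple model it also needs twice the registers.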