Thanks for everyone who has responded to this email thread! That was quite informative for me although I am not yet familiar with all the phases in GCC. I have worked on instruction scheduling and register allocation in another compiler and am currently trying to understand how things are done in GCC. Can you guys kindly give me some references or brief descriptions of the following:
1. Selective scheduling. I am not familiar with the term. Is this a trace scheduler of some sort? 2. Reload. Is this another name for the register allocation phase or something else? 3. Register allocation in GCC. Is it based on a graph-coloring algorithm or a linear-scan algorithm or something else? It looks that the answer to my question about fixing pre-pass scheduling by adding register pressure awareness is that Vladimir is currently working on it. Vladimir, can you please give us some brief description of the approach that you are taking to solve this problem? If you can refer me to some document that you have written up on this, that would be great. Balancing the tradeoff between hiding latencies and minimizing register pressure (shortening live ranges) is a very complex problem. Ideally, a precise solution requires some kind of backtracking to undo bad scheduling decisions. However, I think it may be possible to develop a one-pass greedy algorithm that gives a good approximation in most cases. My thinking is that an such an approximate greedy algorithm (i.e., without backtracking) will probably do something like this: The scheduler keeps track of the register pressure and then for each instruction in the ready list, it computes what would the register pressure be if this instruction was scheduled next. Based on that, it selects the instruction that leads to the minimum register pressure. Computing register pressure after the scheduling of an instruction requires keeping track of the number of uses of each live range. If an instruction has the last use of a live range, that should give it priority over other instructions. This is very much what the existing heuristic tries to do, but the existing heuristic is static. It does not dynamically keep track of register pressure and the defs and uses of each live range. How does this relate to the solution that you are currently working on, Vladimir? Do you know when you will have something working? Do you have any initial performance results? Thanks -Ghassan -----Original Message----- From: Andrey Belevantsev [mailto:a...@ispras.ru] Sent: Wednesday, April 08, 2009 9:25 AM To: Vladimir Makarov Cc: Steven Bosscher; Shobaki, Ghassan; gcc@gcc.gnu.org Subject: Re: Fixing the pre-pass scheduler on x86 (Bug 38403) Vladimir Makarov wrote: > Steven Bosscher wrote: >> On Wed, Apr 8, 2009 at 5:19 AM, Vladimir Makarov <vmaka...@redhat.com> >> wrote: >>> I've been working on register-pressure sensitive insn scheduling last >>> two >>> months and I hope to submit this work for gcc4.5. I am implementing >>> also a >>> mode in insn-scheduler to do only live range shrinkage. >> >> Is all of this still necessary if the selective scheduler (with >> register renaming) is made to work on i686/x86_64 after reload? >> > > That is a really interesting question, Steven. I thought about > this for a few months (since last fall). Here is my points to result > me in starting my work on haifa-scheduler: > > 1. Selective scheduler works only for Itanium now. As I know there > are some plans to make it working on PPC, but there are no plans > to make it working for other acrhictectures for now. There are some patches for fixing sel-sched on PPC that I need to ping, thanks for reminding me :) They were too late for regression-only mode, and we didn't get access to modern PPCs as we hoped, so this was not an issue earlier. Btw, when we've submitted the scheduler, it did work on x86-64 (compile farm), I don't know whether this is still the case. > > 2. My understanding is that (register-pressure sensitive insn > scheduling + RA + 2nd insn scheduling) is not equal to (RA + > selective scheduling with register renaming) with the point of > the potential performance results. In first case, the > pressure-sensitive insn scheduler could improve RA by live-range > shrinkage. In the 2nd case it is already impossible. It could > be improved by the 2nd RA but RA is more time consuming than > scheduling now. In general this chicken-egg problem could be > solved by iterating the 2 contradictory passes insn scheduling + > RA (or RA + insns scheduling). It is a matter of practicality > when to stop these iterations. > > 3. My current understanding is that selective scheduler is overkill > for architectures with few registers. In other words, I don't > think it will give a better performance for such architectures. > On the other hand, it is much slower than haifa-scheduler because > of its complexity. Haifa-scheduler before RA already adds 3-5% compile > time, selective scheduler will add 5-7% more compile time without > performance improvement for mainstream architectures x86/x86_64. > I think it is intolerable. We still plan to do some speedups to the sel-sched code within 4.5 timeframe, mainly to the dependence handling code. But even after that, I agree that for out-of-order architectures selective scheduler will probably be an overkill. Besides, register renaming itself would remain an expensive operation, because it needs to scan all paths along which an insn was moved to prove that the new register can be used, though we have invested a lot of time for speeding up this process via various caching mechanisms. On ia64, register renaming is useful for pipelining loops. Also, we have tried to limit register renaming for the 1st pass selective scheduler via tracking register pressure and having a cutoff for that, but it didn't work out very well on ia64, so I agree that much more of RA knowledge should be brought in for this task. Hope this helps. Vlad, Steven, thanks for caring. Andrey