On 12/09/2014 04:37 AM, lin zuojian wrote: > Hi, > I have read ira/lra code for a while, but still fails to understand > their relationship. The main question is why ira do color so early? > lra pass will do the assignment anyway. Sorry if I mess up coloring > and hard register assignment, but I think it's better to get job > done after lra elimiation, inheriation, ... > There are two major approaches in RA implementation. One is division of it on global and local passes and another one is to use iterative approach with the same coloring algorithm as it is described in most books and research articles. Historically GCC used separate passes approach (even more complicated regmove, local RA, then global RA, and then reload). Some industrial compilers to which I had access to the sources use this approach too (e.g. pathscale compiler uses global and local RA).
I believe changing RA in GCC is very challenging task and only evolutionary approach could work. Changing old global/local/regmove by IRA took 4-5 years. Changing reload by LRA took 3 years and is still in progress. This is even more challenging task than IRA. That is why we have what we have: IRA (global RA) and LRA (local RA). The division also solves compilation time problem: one expensive global RA IRA (which builds conflict graph and dealing with irregular register file architectures by dynamically created register classes which is further development of Chaitin-Briggs coloring algorithm) and LRA which uses simplified coloring without building expensive conflict graph but making it several times (in average about 2-4 coloring passes for function). On my estimation, using the same algorithm as in IRA iteratively would add 5-10% more to all GCC compilation time. It would be hard to convince people to such approach as GCC compilation speed is a sensitive area (especially when we have a strong competitor of GCC as LLVM). The articles about RA never say about RA cycling problem (although Peter Bergner who implemented an extension of Chaitin-Brigs allocator mentioned this to me). This problem happens in LRA even with simpler coloring algorithm. It would be worse if we used more complicated one from IRA. Saying all that, I would also add that using iteratively the same algorithm is appealing idea to me too. I try to use this approach in YARA project (predecessor of IRA) in which I tried to remove all old GCC RA (including reload) at once but it was too slow and did not even generate the correct code in many cases even for x86. Jeff Law tried IRA coloring reusage too for reload but whole RA became slower (although he achieved performance improvements on x86). As I know, Preston Briggs (and his small team) from google tried to implement his iterative coloring algorithm only for x86/x86-64 but performance results were not better than IRA+reload that time. So many things in IRA/LRA have historical roots. That is what we worked out. Probably some approaches could work and used too if the same performance and compiler speed is achieved for major architectures and they generates correct code for all other architectures (supporting all of them by RA is complicated task by itself). Personally, I like classical iterative Chaitin-Briggs allocator approach but I failed to implement this and probably will avoid to returning to its implementation but I encourage anyone to try other approaches and help to include them into GCC if the results will be promising. I hope I answer your question.