Thanks Vladimir & Jeff & Kugan. Combining the replies I get a better
view of RA problem.
--
Lin Zuojian
On Tue, Dec 09, 2014 at 12:10:29PM -0500, Vladimir Makarov wrote:
> On 12/09/2014 04:37 AM, lin zuojian wrote:
> > Hi,
> > I have read ira/lra code for a while, but still fails to understand
> > their relationship. The main question is why ira do color so early?
> > lra pass will do the assignment anyway. Sorry if I mess up coloring
> > and hard register assignment, but I think it's better to get job
> > done after lra elimiation, inheriation, ...
> >
> There are two major approaches in RA implementation. One is division
> of it on global and local passes and another one is to use iterative
> approach with the same coloring algorithm as it is described in most
> books and research articles. Historically GCC used separate passes
> approach (even more complicated regmove, local RA, then global RA, and
> then reload). Some industrial compilers to which I had access to the
> sources use this approach too (e.g. pathscale compiler uses global and
> local RA).
>
> I believe changing RA in GCC is very challenging task and only
> evolutionary approach could work. Changing old global/local/regmove by
> IRA took 4-5 years. Changing reload by LRA took 3 years and is still in
> progress. This is even more challenging task than IRA. That is why we
> have what we have: IRA (global RA) and LRA (local RA). The division
> also solves compilation time problem: one expensive global RA IRA (which
> builds conflict graph and dealing with irregular register file
> architectures by dynamically created register classes which is further
> development of Chaitin-Briggs coloring algorithm) and LRA which uses
> simplified coloring without building expensive conflict graph but making
> it several times (in average about 2-4 coloring passes for function).
> On my estimation, using the same algorithm as in IRA iteratively would
> add 5-10% more to all GCC compilation time. It would be hard to
> convince people to such approach as GCC compilation speed is a sensitive
> area (especially when we have a strong competitor of GCC as LLVM). The
> articles about RA never say about RA cycling problem (although Peter
> Bergner who implemented an extension of Chaitin-Brigs allocator
> mentioned this to me). This problem happens in LRA even with simpler
> coloring algorithm. It would be worse if we used more complicated one
> from IRA.
>
> Saying all that, I would also add that using iteratively the same
> algorithm is appealing idea to me too. I try to use this approach in
> YARA project (predecessor of IRA) in which I tried to remove all old GCC
> RA (including reload) at once but it was too slow and did not even
> generate the correct code in many cases even for x86. Jeff Law tried
> IRA coloring reusage too for reload but whole RA became slower (although
> he achieved performance improvements on x86). As I know, Preston Briggs
> (and his small team) from google tried to implement his iterative
> coloring algorithm only for x86/x86-64 but performance results were not
> better than IRA+reload that time.
>
> So many things in IRA/LRA have historical roots. That is what we
> worked out. Probably some approaches could work and used too if the
> same performance and compiler speed is achieved for major architectures
> and they generates correct code for all other architectures (supporting
> all of them by RA is complicated task by itself). Personally, I like
> classical iterative Chaitin-Briggs allocator approach but I failed to
> implement this and probably will avoid to returning to its
> implementation but I encourage anyone to try other approaches and help
> to include them into GCC if the results will be promising.
>
> I hope I answer your question.
>
>