Re: A Question About LRA/reload

lin zuojian Tue, 09 Dec 2014 19:32:39 -0800
Thanks Vladimir & Jeff & Kugan. Combining the replies I get a better
view of RA problem.
--
Lin Zuojian
On Tue, Dec 09, 2014 at 12:10:29PM -0500, Vladimir Makarov wrote:
> On 12/09/2014 04:37 AM, lin zuojian wrote:
> > Hi,
> >     I have read ira/lra code for a while, but still fails to understand
> >     their relationship. The main question is why ira do color so early?
> >     lra pass will do the assignment anyway. Sorry if I mess up coloring
> >     and hard register assignment, but I think it's better to get job
> >     done after lra elimiation, inheriation, ...
> >
>   There are two major approaches in RA implementation.  One is division
> of it on global and local passes and another one is to use iterative
> approach with the same coloring algorithm as it is described in most
> books and research articles.  Historically GCC used separate passes
> approach (even more complicated regmove, local RA, then global RA, and
> then reload).  Some industrial compilers to which I had access to the
> sources use this approach too (e.g. pathscale compiler uses global and
> local RA).
> 
>   I believe changing RA in GCC is very challenging task and only
> evolutionary approach could work.   Changing old global/local/regmove by
> IRA took 4-5 years.  Changing reload by LRA took 3 years and is still in
> progress. This is even more challenging task than IRA.  That is why we
> have what we have: IRA (global RA) and LRA (local RA).  The division
> also solves compilation time problem: one expensive global RA IRA (which
> builds conflict graph and dealing with irregular register file
> architectures by dynamically created register classes which is further
> development of Chaitin-Briggs coloring algorithm) and LRA which uses
> simplified coloring without building expensive conflict graph but making
> it several times (in average about 2-4 coloring passes for function). 
> On my estimation, using the same algorithm as in IRA iteratively would
> add 5-10% more to all GCC compilation time.  It would be hard to
> convince people to such approach as GCC compilation speed is a sensitive
> area (especially when we have a strong competitor of GCC as LLVM).  The
> articles about RA never say about RA cycling problem (although Peter
> Bergner who implemented an extension of Chaitin-Brigs allocator
> mentioned this to me).  This problem happens in LRA even with simpler
> coloring algorithm.  It would be worse if we used more complicated one
> from IRA.
>  
>   Saying all that, I would also add that using iteratively the same
> algorithm is appealing idea to me too.  I try to use this approach in
> YARA project (predecessor of IRA) in which I tried to remove all old GCC
> RA (including reload) at once but it was too slow and did not even
> generate the correct code in many cases even for x86.  Jeff Law tried
> IRA coloring reusage too for reload but whole RA became slower (although
> he achieved performance improvements on x86).  As I know, Preston Briggs
> (and his small team) from google tried to implement his iterative
> coloring algorithm only for x86/x86-64 but performance results were not
> better than IRA+reload that time.
> 
>   So many things in IRA/LRA have historical roots.  That is what we
> worked out.  Probably some approaches could work and used too if the
> same performance and compiler speed is achieved for major architectures
> and they generates correct code for all other architectures (supporting
> all of them by RA is complicated task by itself).  Personally, I like
> classical iterative Chaitin-Briggs allocator approach but I failed to
> implement this and probably will avoid to returning to its
> implementation but I encourage anyone to try other approaches and help
> to include them into GCC if the results will be promising.
> 
>   I hope I answer your question.
>  
>
Re: A Question About LRA/reload

Reply via email to