First off, regardless of what direction we choose to go, I think we are in a great position. Finally, GCC will have all the obvious and standard technology that one reads in textbooks. Not long ago, GCC didn't even build a flowgraph, and now here we are deciding what IPA technology we want to implement.
In the end, I don't think it really matters which way we go. We are not doing advanced rocket science here. Sure, the engineering will be tricky and convoluted. But this technology is relatively mature and there are not very many variations on the subject. The final result will be roughly the same. Different shades of gray, and all that.

Right now, I am more concerned about the approach we take to get there. I am a big proponent of evolution vs revolution, so any approach that involves starting from scratch gives me the willies. In principle, I can't tell which approach will take more effort. Both seem to be missing X features that the other one has.

If we go with LTO:

* GVM, TU combination and all the associated slimming down of our IR data structures will be quite a bit of work. This is also needed for other projects.

* We would keep a fully functional compiler throughout. Rewiring internal data structures and code to make them smaller/nimbler can be easily tested by making sure we can still build the world.

* LLVM already has some of the technology we need for link-time optimization. Perhaps we should look into it and swipe design ideas, if not code.

* Initially, I wasn't too thrilled with the stack-based IR chosen for GVM (see the sketch after these lists). But I understand the rationale and don't have major objections against it. One thing that is not clear from the LTO document is whether GVM will be useful for dynamic optimization. This is one area that we will eventually want to move into.

If we choose LLVM, I have more questions than ideas; take these thoughts as very preliminary, based on incomplete information:

* The initial impression I get is that LLVM involves starting from scratch. I don't quite agree that this is necessary. One of the engineering challenges we need to tackle is the requirement of keeping a fully functional compiler *while* we improve its architecture. With our limited resources, we cannot really afford to go off on a multi-year tangent nurturing and growing a new technology just to add a new feature.

* From what I understand, LLVM has never been used outside of a research environment and it can only generate code for a very limited set of targets. These two are very serious limitations. We would be losing years of target tweaking and compromising our ability to be a system compiler.

* LLVM is missing a few other features, like debugging information and vectorization. Yes, all of it is fixable, but again, we have limited resources. Furthermore, it may be hard to convince our development community to add these missing features ("we already implemented that!"). It is much easier to entice folks to do something new than to re-implement old stuff.

* The lack of FSF copyright assignment for LLVM is a problem. It may even be a bigger problem than we think. Then again, it may not. I just don't know. What I do know is that this must absolutely be resolved before we even think of adding LLVM to any branch. Chris said he'd be adding LLVM to the apple branch soon. I hope the FSF assignment is worked out by then. I understand that even code in branches should be under FSF copyright assignment.

* A minor hurdle is LLVM's implementation language. Personally, I would be ecstatic if we started implementing in C++. However, not everyone in the community thinks this is a good idea.

* Another minor nit is performance. Judging by SPEC, LLVM has some performance problems. It's very good for floating point (a 9% advantage over GCC), but GCC has a 24% advantage over LLVM 1.2 in integer code. I'm sure that is fixable, and I only have data for an old release of LLVM. But there is still more work to be done, particularly for targets not yet supported by LLVM.
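As an aside, to make the stack-based IR point above concrete, here is a toy sketch of the kind of encoding a stack-based GVM implies. This is not the actual GVM format; the opcode names and layout are invented purely for illustration, and it's written in C++ only for convenience:

    #include <cstdint>
    #include <cstdio>
    #include <vector>

    // Invented opcodes for illustration; not the real GVM encoding.
    enum class Op : uint8_t { PushVar, Add, PopVar };
    struct Insn { Op op; uint16_t slot; };   // slot = variable index, if any

    int main() {
      // "a = b + c" as a flat stack program.  There are no pointers and no
      // trees in the encoding, which is what makes this style of IR cheap
      // to stream to disk and read back at link time.
      std::vector<Insn> stmt = {
        {Op::PushVar, 1},   // push b
        {Op::PushVar, 2},   // push c
        {Op::Add,     0},   // pop two, push sum
        {Op::PopVar,  0},   // store into a
      };

      int vars[3] = {0, 40, 2};   // a, b, c
      std::vector<int> stack;
      for (const Insn &i : stmt) {
        switch (i.op) {
        case Op::PushVar: stack.push_back(vars[i.slot]); break;
        case Op::Add: {
          int r = stack.back(); stack.pop_back();
          stack.back() += r;
          break;
        }
        case Op::PopVar: vars[i.slot] = stack.back(); stack.pop_back(); break;
        }
      }
      std::printf("a = %d\n", vars[0]);   // prints: a = 42
      return 0;
    }

The trade-off, of course, is that the reader has to rebuild trees/SSA on the way back in; the compactness is bought at re-materialization cost.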
To summarize: I am very impressed with LLVM's technical merits. It already has much of the technology that we want to add to GCC. But I think moving all of GCC's infrastructure to it would present quite a few problems for us. Yes, LLVM gives us a well-defined and solid IPA framework, but it is missing quite a few things that we already take for granted. All of them are fixable and some are in the process of being fixed. But what are the timelines? What resources are needed? The LLVM solution seems to require a whole lot more effort from the whole community. Not only will the core developers need to be involved; many of the target and sub-system maintainers will need to pitch in. Since this is a volunteer project, it is not clear whether we will be able to pull it off in a reasonable period of time.

The LTO approach has the disadvantage that it needs to play catch-up to what LLVM already has. But it does have that evolutionary flavour that I personally find easier to work with.

So, my question/proposal is this: should we consider swiping chunks of LLVM and adapting them to our existing framework? That is, go from GIMPLE into LLVM, stream to disk, do all the link-time stuff using LLVM, and then read it back in (roughly the flow sketched below). In time, we'd hammer the rest of the compiler into shape. But I would want to avoid anything that forces us to throw the baby out with the bath water. Getting the baby back could prove very expensive.

Finally, I would be very interested in timelines. Neither proposal mentions them. My impression is that they will both take roughly the same amount of time, though the LLVM approach (as described) may take longer because it seems to have more missing pieces.
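To make the round trip in that proposal concrete, here is a rough sketch. Two caveats: it uses the present-day LLVM C++ API (WriteBitcodeToFile/parseBitcodeFile) rather than the 1.x API we would actually be targeting, and gimple_to_llvm is a hypothetical stand-in for the translator we would have to write:

    #include <memory>

    #include "llvm/Bitcode/BitcodeReader.h"
    #include "llvm/Bitcode/BitcodeWriter.h"
    #include "llvm/IR/LLVMContext.h"
    #include "llvm/IR/Module.h"
    #include "llvm/Support/Error.h"
    #include "llvm/Support/FileSystem.h"
    #include "llvm/Support/MemoryBuffer.h"
    #include "llvm/Support/raw_ostream.h"

    using namespace llvm;

    // Hypothetical stand-in: the real translator would walk every GIMPLE
    // function in the TU and build the corresponding LLVM IR.  Here it
    // just produces an empty module so the sketch links and runs.
    static std::unique_ptr<Module> gimple_to_llvm(LLVMContext &Ctx) {
      return std::make_unique<Module>("tu.o", Ctx);
    }

    int main() {
      LLVMContext Ctx;

      // Compile time: translate GIMPLE and stream the LLVM IR to disk.
      std::unique_ptr<Module> M = gimple_to_llvm(Ctx);
      std::error_code EC;
      raw_fd_ostream OS("tu.bc", EC, sys::fs::OF_None);
      if (EC)
        return 1;
      WriteBitcodeToFile(*M, OS);
      OS.close();

      // Link time: read the IR back and run whole-program passes on it.
      auto Buf = MemoryBuffer::getFile("tu.bc");
      if (!Buf)
        return 1;
      Expected<std::unique_ptr<Module>> Linked =
          parseBitcodeFile((*Buf)->getMemBufferRef(), Ctx);
      if (!Linked) {
        consumeError(Linked.takeError());
        return 1;
      }
      // ... IPA passes would run over *Linked here, and the result would
      // be handed back to the code generator.
      return 0;
    }

The point of the sketch is that the serialization and link-time machinery would come from LLVM; the genuinely new piece on our side is the GIMPLE-to-LLVM translator (and its inverse, if we want to come back to GIMPLE for our existing backends).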