First off, regardless of what direction we choose to go, I think we are in a great position. Finally, GCC will have all the obvious and standard technology that one reads in textbooks. Not long ago, GCC didn't even build a flowgraph, and now here we are deciding what IPA technology we want to implement.
In the end, I don't think it really matters which way we go. We are not doing advanced rocket science here. Sure, the engineering will be tricky and convoluted. But this technology is relatively mature and there are not very many variations on the subject. The final result will be roughly the same. Different shades of gray, and all that.

Right now, I am more concerned about the approach we take to get there. I am a big proponent of evolution vs revolution, so any approach that involves starting from scratch gives me the willies. In principle, I can't tell which approach will take more effort. Both seem to be missing X features that the other one has.

If we go with LTO:

* GVM, TU combination and all the associated slimming down of our IR data structures will be quite a bit of work. This is also needed for other projects.

* We would keep a fully functional compiler throughout. Rewiring internal data structures and code to make them smaller/nimbler can be easily tested by making sure we can still build the world.

* LLVM already has some of the technology we need for link-time optimization. Perhaps we should look into it and swipe design ideas, if not code.

* Initially, I wasn't too thrilled with the stack-based IR chosen for GVM (see the sketch after these lists). But I understand the rationale and don't have major objections against it. One thing that is not clear from the LTO document is whether GVM will be useful for dynamic optimization. This is one area that we will eventually want to move into.

If we choose LLVM, I have more questions than ideas; take these thoughts as very preliminary, based on incomplete information:

* The initial impression I get is that LLVM involves starting from scratch. I don't quite agree that this is necessary. One of the engineering challenges we need to tackle is the requirement of keeping a fully functional compiler *while* we improve its architecture. With our limited resources, we cannot really afford to go off on a multi-year tangent nurturing and growing a new technology just to add a new feature.

* From what I understand, LLVM has never been used outside of a research environment and it can only generate code for a very limited set of targets. These two are very serious limitations. We would be losing years of target tweaking and compromising our ability to be a system compiler.

* LLVM is missing a few other features, like debugging information and vectorization. Yes, all of it is fixable, but again, we have limited resources. Furthermore, it may be hard to convince our development community to add these missing features ("we already implemented that!"). It is much easier to entice folks to do something new than to re-implement old stuff.

* The lack of FSF copyright assignment for LLVM is a problem. It may even be a bigger problem than we think. Then again, it may not. I just don't know. What I do know is that this must absolutely be resolved before we even think of adding LLVM to any branch. Chris said he'd be adding LLVM to the apple branch soon. I hope the FSF assignment is worked out by then. I understand that even code in branches should be under FSF copyright assignment.

* A minor hurdle is LLVM's implementation language. Personally, I would be ecstatic if we started implementing in C++. However, not everyone in the community thinks this is a good idea.

* Another minor nit is performance. Judging by SPEC, LLVM has some performance problems. It's very good for floating point (a 9% advantage over GCC), but GCC has a 24% advantage over LLVM 1.2 in integer code. I'm sure that is fixable, and I only have data for an old release of LLVM. But there is still more work to be done, particularly for targets not yet supported by LLVM.
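As an aside, to make the stack-based IR point above concrete, here is a toy sketch of the kind of encoding a stack-based GVM implies. This is not the actual GVM format; the opcode names and layout are invented purely for illustration, and it's written in C++ only for convenience:

    #include <cstdint>
    #include <cstdio>
    #include <vector>

    // Invented opcodes for illustration; not the real GVM encoding.
    enum class Op : uint8_t { PushVar, Add, PopVar };
    struct Insn { Op op; uint16_t slot; };   // slot = variable index, if any

    int main() {
      // "a = b + c" as a flat stack program.  There are no pointers and no
      // trees in the encoding, which is what makes this style of IR cheap
      // to stream to disk and read back at link time.
      std::vector<Insn> stmt = {
        {Op::PushVar, 1},   // push b
        {Op::PushVar, 2},   // push c
        {Op::Add,     0},   // pop two, push sum
        {Op::PopVar,  0},   // store into a
      };

      int vars[3] = {0, 40, 2};   // a, b, c
      std::vector<int> stack;
      for (const Insn &i : stmt) {
        switch (i.op) {
        case Op::PushVar: stack.push_back(vars[i.slot]); break;
        case Op::Add: {
          int r = stack.back(); stack.pop_back();
          stack.back() += r;
          break;
        }
        case Op::PopVar: vars[i.slot] = stack.back(); stack.pop_back(); break;
        }
      }
      std::printf("a = %d\n", vars[0]);   // prints: a = 42
      return 0;
    }

The trade-off, of course, is that the reader has to rebuild trees/SSA on the way back in; the compactness is bought at re-materialization cost.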
To summarize: I am very impressed with LLVM's technical merits. It already has much of the technology that we want to add to GCC. But I think moving all of GCC's infrastructure to it would present quite a few problems for us. Yes, LLVM gives us a well-defined and solid IPA framework, but it is missing quite a few things that we already take for granted. All of them are fixable and some are in the process of being fixed. But what are the timelines? What resources are needed? The LLVM solution seems to require a whole lot more effort from the whole community. Not only will the core developers need to be involved; many of the target and sub-system maintainers will need to pitch in. Since this is a volunteer project, it is not clear whether we will be able to pull it off in a reasonable period of time.

The LTO approach has the disadvantage that it needs to play catch-up to what LLVM already has. But it does have that evolutionary flavour that I personally find easier to work with.

So, my question/proposal is this: should we consider swiping chunks of LLVM and adapting them to our existing framework? That is, go from GIMPLE into LLVM, stream to disk, do all the link-time stuff using LLVM, and then read it back in (roughly the flow sketched below). In time, we'd hammer the rest of the compiler into shape. But I would want to avoid anything that forces us to throw the baby out with the bath water. Getting the baby back could prove very expensive.

Finally, I would be very interested in timelines. Neither proposal mentions them. My impression is that they will both take roughly the same amount of time, though the LLVM approach (as described) may take longer because it seems to have more missing pieces.
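To make the round trip in that proposal concrete, here is a rough sketch. Two caveats: it uses the present-day LLVM C++ API (WriteBitcodeToFile/parseBitcodeFile) rather than the 1.x API we would actually be targeting, and gimple_to_llvm is a hypothetical stand-in for the translator we would have to write:

    #include <memory>

    #include "llvm/Bitcode/BitcodeReader.h"
    #include "llvm/Bitcode/BitcodeWriter.h"
    #include "llvm/IR/LLVMContext.h"
    #include "llvm/IR/Module.h"
    #include "llvm/Support/Error.h"
    #include "llvm/Support/FileSystem.h"
    #include "llvm/Support/MemoryBuffer.h"
    #include "llvm/Support/raw_ostream.h"

    using namespace llvm;

    // Hypothetical stand-in: the real translator would walk every GIMPLE
    // function in the TU and build the corresponding LLVM IR.  Here it
    // just produces an empty module so the sketch links and runs.
    static std::unique_ptr<Module> gimple_to_llvm(LLVMContext &Ctx) {
      return std::make_unique<Module>("tu.o", Ctx);
    }

    int main() {
      LLVMContext Ctx;

      // Compile time: translate GIMPLE and stream the LLVM IR to disk.
      std::unique_ptr<Module> M = gimple_to_llvm(Ctx);
      std::error_code EC;
      raw_fd_ostream OS("tu.bc", EC, sys::fs::OF_None);
      if (EC)
        return 1;
      WriteBitcodeToFile(*M, OS);
      OS.close();

      // Link time: read the IR back and run whole-program passes on it.
      auto Buf = MemoryBuffer::getFile("tu.bc");
      if (!Buf)
        return 1;
      Expected<std::unique_ptr<Module>> Linked =
          parseBitcodeFile((*Buf)->getMemBufferRef(), Ctx);
      if (!Linked) {
        consumeError(Linked.takeError());
        return 1;
      }
      // ... IPA passes would run over *Linked here, and the result would
      // be handed back to the code generator.
      return 0;
    }

The point of the sketch is that the serialization and link-time machinery would come from LLVM; the genuinely new piece on our side is the GIMPLE-to-LLVM translator (and its inverse, if we want to come back to GIMPLE for our existing backends).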