I forgot to mention: I wonder if it might be worth looking at link-time optimisation, à la llvm-gcc? That is, store LLVM bitcode in the .o files and substitute the LLVM linker for standard ld, so that assembly is generated at link time. I suspect that GHC itself is already getting most of the cross-module inlining opportunities, but checking might be a relatively cheap experiment to do.
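For anyone wanting to try the idea quickly, here is a rough sketch of the bitcode-first pipeline, with two tiny C files standing in for GHC-generated modules. It assumes clang, llvm-link, opt and llc from a single LLVM release are on the PATH. The file names and the lto_sketch function are made up for illustration, and it merges bitcode with llvm-link rather than swapping out ld, so it only approximates what llvm-gcc does.

```shell
# Sketch only: C sources stand in for GHC output; nothing here is GHC-specific.
lto_sketch() {
    for tool in clang llvm-link opt llc; do
        command -v "$tool" >/dev/null 2>&1 || { echo "missing $tool; skipping"; return 0; }
    done
    workdir=$(mktemp -d) || return 1

    # Two hypothetical "modules": twice() can only be inlined into main()
    # once both are visible to the optimiser at the same time.
    printf 'int twice(int x) { return 2 * x; }\n' > "$workdir/lib.c"
    printf 'int twice(int);\nint main(void) { return twice(21) == 42 ? 0 : 1; }\n' > "$workdir/main.c"

    # 1. Compile each module to LLVM bitcode instead of native object code.
    clang -O1 -emit-llvm -c "$workdir/lib.c" -o "$workdir/lib.bc" || return 1
    clang -O1 -emit-llvm -c "$workdir/main.c" -o "$workdir/main.bc" || return 1

    # 2. Merge the bitcode files, then optimise the combined module.
    llvm-link "$workdir/lib.bc" "$workdir/main.bc" -o "$workdir/whole.bc" || return 1
    opt -O2 "$workdir/whole.bc" -o "$workdir/whole.opt.bc" || return 1

    # 3. Only now generate assembly, and let the system toolchain link it.
    llc -relocation-model=pic "$workdir/whole.opt.bc" -o "$workdir/whole.s" || return 1
    clang "$workdir/whole.s" -o "$workdir/prog" || return 1

    # The program exits with status 0 when twice(21) == 42.
    "$workdir/prog"
}

lto_sketch && echo "lto sketch finished"
```

Comparing the assembly for main() with and without step 2 would show whether whole-program visibility buys anything beyond what the front end already inlined.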
Cheers,
Max

On 1 March 2010 10:54, Max Bolingbroke <batterseapo...@hotmail.com> wrote:
> David replied to my query about LLVM as below, which I'm forwarding to the
> list.
>
> It sounds like there are two potential LLVM based GSoC projects:
>
> Project 1: LLVM optimisation passes
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>
> 1) Clearly identify issues with LLVM produced assembly code in the
>    context of GHC
>    * This can be done by examining how it compares to the native code
>      generator on nofib benchmarks
>    * You might be able to get some mileage from simply eyeballing the
>      assembly and looking for "obvious" stupidity, like the ESP issue David
>      spotted
>    * The result of this part should be a simple set of Haskell test
>      cases, the assembly they produced, what the assembly *should* be
>      (roughly) and perhaps some notes on what might fix it
> 2) The second part would be to identify the lowest hanging fruit from
>    those things identified in 1) and make changes to the LLVM output /
>    write LLVM optimisations (apparently this is a joy to do, the LLVM
>    framework is very well designed) to fix the issues
>
> Separating the project into two parts like this means that we could
> get something out of the project even if the student is unable to make
> significant progress with LLVM itself in the GSoC timeframe. Having a
> clear description of the problems involved + simple benchmark programs
> would be a huge help to someone attempting part 2) in the future, or
> they could serve as the basis for feature requests to the LLVM
> developers.
>
> Project 2: Tables next to code
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>
> My feeling is that this is the more challenging of the two projects,
> as it is likely to touch more of LLVM / GHC. However, it is likely to
> yield a solid 5% speedup.
> It seems there are two implementation options:
>
> 1) Postprocessor à la the Evil Mangler (a nice self contained project,
>    but not the best long term solution)
> 2) Modify LLVM to support this feature
>
> Do either of these seem like realistic GSoC projects? I would be
> willing to mentor a potential student on either one, though I'll need
> to do some learning about LLVM to keep up :-)
>
> Cheers,
> Max
>
> On 1 March 2010 05:12, David Terei <davidte...@gmail.com> wrote:
>> Hi Max,
>>
>> To be honest, I don't know (especially since the GSOC involves money). The
>> indications are that, yes, there might be some quick gains such as fixing up
>> the ESP issue. Other 'indications' that some tuning could be done are these
>> comments by Chris Lattner:
>>
>> http://groups.google.com/group/llvm-dev/browse_thread/thread/d05d4a80dd245c51/9eef60d9c05636d6?lnk=gst&q=ghc#9eef60d9c05636d6
>>
>> There is also the TABLES_NEXT_TO_CODE optimisation that is used in GHC but
>> can't be used with the llvm back-end as LLVM doesn't support the features we
>> need to implement this (control over data/function layout in object files).
>> That costs us about 5% so fixing that by adding the needed features to LLVM
>> would be great! I'm not sure if it would be a quick fix though, it would be
>> challenging but probably doable in a GSOC timeframe (although that may be
>> all they would do).
>>
>> This is all speculative though and I think any project would be speculative
>> to some degree. If there was a very good mentor then this might work out but
>> could also fail quite badly. My knowledge of both GHC and LLVM is fairly
>> specialised and doesn't include any knowledge really of optimisation passes.
>>
>> If the student was capable though this could be part of this project. One
>> thing I meant to do but never had time to as part of my thesis was to look
>> at some of the nofib benchmarks on which LLVM performs quite badly and
>> identify the problem.
>> Even if there isn't time to fix it, a list of
>> performance regressions (with reasons) for the llvm code generator (vs the
>> other code generators) would itself be very useful.
>>
>> Sorry, this probably doesn't help much but I've been focused on getting
>> ghc+llvm working, not really worrying about what comes out of it too much. I
>> would suggest perhaps talking to both the GHC developers and the LLVM
>> developers; they probably won't be able to give a great answer either, but
>> with this and theirs combined it should give a good enough idea for a
>> decision.
>>
>> Cheers,
>> David
>>
>> Max Bolingbroke wrote:
>>>
>>> Hi David,
>>>
>>> This is great stuff of course!
>>>
>>> I notice that you have identified some problems with the generated
>>> code (e.g. the ESP manipulation stuff you point to at the end of
>>> http://groups.google.com/group/llvm-dev/msg/28b99513bcc38e0f).
>>>
>>> Do you think there are some quick wins here that could be had by
>>> writing some optimisation pass plugins for LLVM that tackle the kind
>>> of performance bugs commonly seen in code originating from GHC? If so,
>>> this might be a suitable project for a Summer of Code student with
>>> some previous knowledge of compiler design.
>>>
>>> Cheers,
>>> Max
>>>
>>> On 18 February 2010 23:55, David Terei <davidte...@gmail.com> wrote:
>>>>
>>>> Hi all,
>>>>
>>>> Over the last 6 months I've been working on a new code generator for GHC
>>>> which targets the LLVM compiler infrastructure. Most of the work was done as
>>>> part of an honours thesis at the University of New South Wales under the
>>>> supervision of Manuel Chakravarty. This ended at the start of November but I
>>>> have continued to work on the code generator since (although at a much
>>>> reduced pace since I'm back at full time work). It's now at a stage where I
>>>> feel pretty confident in its correctness and would love to see it merged
>>>> into GHC mainline.
>>>>
>>>> The patch for the llvm back-end can be found here (should apply cleanly to
>>>> GHC head):
>>>>
>>>> http://www.cse.unsw.edu.au/~davidt/downloads/ghc-llvmbackend-full.gz
>>>>
>>>> This is what I would like/am requesting merged into GHC head.
>>>>
>>>> The thesis paper, which offers a detailed performance evaluation as well as
>>>> the motivation and design of the back-end, can be found at:
>>>>
>>>> http://www.cse.unsw.edu.au/~pls/thesis/davidt-thesis.pdf
>>>>
>>>> Below I'll quickly outline the important points though. There are also
>>>> instructions on how to get started with the back-end.
>>>>
>>>> Finally there are also some issues that I think may need to be sorted out
>>>> before a merge could be done. They are at the end.
>>>>
>>>> Performance
>>>> -----------
>>>> (All done on linux/x86-32)
>>>>
>>>> A quick summary of the results is that for the 'nofib' benchmark suite the
>>>> llvm code generator was 3.8% slower than the NCG (the C code generator was
>>>> 6.9% slower than the NCG). The DPH project includes a benchmark suite which
>>>> I also ran, and for this type of code using the llvm back-end shortened the
>>>> runtime by an average of 25% compared to the NCG. Also, while not included
>>>> in my thesis paper as I ran out of time, I did do some benchmarking with the
>>>> 'nobench' benchmark suite. It gave performance ratios for the back-ends of
>>>> around:
>>>>
>>>> NCG  : 1.11
>>>> C    : 1.05
>>>> LLVM : 1.14
>>>>
>>>>
>>>> Supported Platforms & 'Correctness'
>>>> -----------------------------------
>>>>
>>>> Linux x86-32/x86-64 are currently well supported. The back-end can pass the
>>>> test suite and build a working version of GHC (bootstrap test).
>>>>
>>>> Mac OS X 10.5 currently has a rather nasty bug with any dynamic lib calls
>>>> (all libffi stuff) [for the curious: due to the stack not being 16-byte
>>>> aligned when the calls are made, as required by the OS X ABI].
>>>> Test suite passes except for most of the ffi tests.
>>>>
>>>> Other platforms haven't been tested at all. As using the back-end with a
>>>> registered build of GHC requires a modified version of LLVM, people wanting
>>>> to try it out on those platforms will need to either make the needed changes
>>>> to LLVM themselves, or use an unregistered build of GHC which will work with
>>>> a vanilla install of LLVM. (A patch for LLVM for x86 is linked to below.)
>>>>
>>>> Validate
>>>> --------
>>>>
>>>> I've validated my GHC patch to make sure it won't break anything. This is
>>>> just compiling and running GHC normally but with the llvm back-end code
>>>> included. It doesn't actually test the llvm code generator, just makes sure
>>>> it hasn't broken the NCG or C code generator.
>>>>
>>>> Linux/x86-32:
>>>>
>>>> OVERALL SUMMARY for test run started at Do 18. Feb 11:21:48 EST 2010
>>>>     2457 total tests, which gave rise to
>>>>     9738 test cases, of which
>>>>        0 caused framework failures
>>>>     7573 were skipped
>>>>
>>>>     2088 expected passes
>>>>       76 expected failures
>>>>        0 unexpected passes
>>>>        1 unexpected failures
>>>>
>>>> Unexpected failures:
>>>>     user001(normal)
>>>>
>>>> Linux/x86-64:
>>>>
>>>> OVERALL SUMMARY for test run started at Thu 18 Feb 15:28:32 EST 2010
>>>>     2458 total tests, which gave rise to
>>>>     9739 test cases, of which
>>>>        0 caused framework failures
>>>>     7574 were skipped
>>>>
>>>>     2087 expected passes
>>>>       77 expected failures
>>>>        0 unexpected passes
>>>>        1 unexpected failures
>>>>
>>>> Unexpected failures:
>>>>     T1969(normal)
>>>>
>>>> Mac OS X 10.5/x86-32:
>>>>
>>>> OVERALL SUMMARY for test run started at Thu Feb 18 12:35:49 EST 2010
>>>>     2458 total tests, which gave rise to
>>>>     9122 test cases, of which
>>>>        0 caused framework failures
>>>>     6959 were skipped
>>>>
>>>>     2085 expected passes
>>>>       76 expected failures
>>>>        0 unexpected passes
>>>>        2 unexpected failures
>>>>
>>>> Unexpected failures:
>>>>     T1969(normal)
>>>>     ffi005(optc)
>>>>
>>>> All of the test failures fail for me with an unmodified GHC head build as
>>>> well as when the llvm patch is included, so the llvm patch isn't introducing
>>>> any new failures.
>>>>
>>>> Installing
>>>> ----------
>>>>
>>>> http://www.cse.unsw.edu.au/~davidt/downloads/ghc-llvmbackend-full.gz
>>>>
>>>> Apply the darcs patch linked above to GHC head. This will make some changes
>>>> across GHC, with the bulk of the new code ending up in 'compiler/llvmGen'.
>>>>
>>>> To build GHC you need to add two flags to build.mk:
>>>>
>>>> GhcWithLlvmCodeGen = YES
>>>> GhcEnableTablesNextToCode = NO
>>>>
>>>> The llvm code generator doesn't support the TABLES_NEXT_TO_CODE
>>>> optimisation at this time due to limitations with LLVM.
>>>>
>>>> You will also need LLVM installed on your computer to use the back-end.
>>>> Version 2.6 or SVN trunk is supported. If you want to use the back-end in an
>>>> unregistered ghc build, then you can use a vanilla build of LLVM. However if
>>>> you want to use a registered ghc build (very likely) then you need to patch
>>>> LLVM for this to work. The patch for llvm can be found here:
>>>>
>>>> http://www.cse.unsw.edu.au/~davidt/downloads/llvm-ghc.patch
>>>>
>>>> LLVM is very easy to build and install. It can be done as follows:
>>>>
>>>> $ svn co http://llvm.org/svn/llvm-project/llvm/trunk llvm
>>>> $ cd llvm
>>>> $ patch -p0 -i ~/llvm-ghc.patch
>>>> $ ./configure --enable-optimized # probably also want to set --prefix
>>>> $ make
>>>> $ make install
>>>>
>>>> Just make sure this modified version of LLVM is on your path and takes
>>>> precedence over any other builds.
>>>>
>>>> Using
>>>> -----
>>>>
>>>> Once GHC is built, you can trigger GHC to use the LLVM back-end with the
>>>> '-fllvm' flag. There is also a new '-ddump-llvm' flag which will dump out the
>>>> generated llvm IR code (or use the '-keep-tmp-files' flag).
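A minimal sketch of trying the flags just described, assuming a ghc built with GhcWithLlvmCodeGen = YES is on the PATH alongside a suitable LLVM. The file names Hello.hs, hello and Hello.ll.dump are made up for illustration; only '-fllvm' and '-ddump-llvm' come from the text above.

```shell
# Hypothetical file name; any small Haskell module would do.
cat > Hello.hs <<'EOF'
main :: IO ()
main = putStrLn "Hello from the LLVM back-end"
EOF

# Only attempt the build if some ghc is on the PATH.
if command -v ghc >/dev/null 2>&1; then
    # -fllvm selects the LLVM back-end; -ddump-llvm prints the generated IR,
    # which is captured in Hello.ll.dump together with any error output.
    ghc -fllvm -ddump-llvm Hello.hs -o hello > Hello.ll.dump 2>&1 \
        && ./hello \
        || echo "build or run failed; see Hello.ll.dump (is the right LLVM on the PATH?)"
else
    echo "ghc not found; skipping the build"
fi
```

Compiling the same file without '-fllvm' and comparing the two binaries is a quick way to confirm which back-end was actually used.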
>>>>
>>>> 'ghc --info' should also now report that it includes the llvm code
>>>> generator.
>>>>
>>>> Issues
>>>> ------
>>>> Issues that might need to be resolved before merging the patch:
>>>>
>>>> 1. Developed in isolation by 1 person with no Haskell knowledge at first. So
>>>> the usual issues with that may apply: misused data structures, bad style,
>>>> etc. Criticisms of the code are very welcome. There are some specific notes
>>>> on what I think may be wrong with the code at the moment in
>>>> 'compiler/llvmGen/NOTES'.
>>>>
>>>> 2. The back-end has an LLVM binding of sorts. This binding is similar in
>>>> design to, say, the Cmm representation used in GHC. It represents the LLVM
>>>> Assembly language using a collection of data types and can pretty print it
>>>> out correctly. This binding lives in the 'compiler/llvmGen/Llvm' folder.
>>>> Should this binding be split out into a separate library?
>>>>
>>>> 3. As mentioned above, LLVM needs to be patched to work with a registered
>>>> build of GHC. If the llvm back-end was merged, how would this be handled? I
>>>> would suggest simply carrying the patch with some instructions on how to use
>>>> it in the GHC repo. People using GHC head could be expected to grab the LLVM
>>>> source code and apply the patch themselves at this stage.
>>>>
>>>> 4. Finally this email is long. I need to put all this info into a web page
>>>> as well.
>>>>
>>>> Cheers,
>>>> David
>>>>
>>>> _______________________________________________
>>>> Cvs-ghc mailing list
>>>> Cvs-ghc@haskell.org
>>>> http://www.haskell.org/mailman/listinfo/cvs-ghc
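As a quick sanity check on the three testsuite summaries quoted in the Validate section, the sketch below adds up the outcome counts and compares them with the reported number of test cases. It assumes that skipped, expected and unexpected outcomes together account for every test case; check_summary is a made-up helper, not part of the GHC testsuite.

```shell
# check_summary CASES SKIPPED EXP_PASS EXP_FAIL UNEXP_PASS UNEXP_FAIL
# Succeeds when the five outcome counts add up to the number of test cases.
check_summary() {
    [ "$(( $2 + $3 + $4 + $5 + $6 ))" -eq "$1" ]
}

# Figures copied from the summaries in the quoted email.
check_summary 9738 7573 2088 76 0 1 && echo "Linux/x86-32: consistent"
check_summary 9739 7574 2087 77 0 1 && echo "Linux/x86-64: consistent"
check_summary 9122 6959 2085 76 0 2 && echo "Mac OS X 10.5/x86-32: consistent"
```

All three summaries add up under that assumption, so the figures are at least internally consistent.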