Re: Merge Request: LLVM Code Generator for GHC

Max Bolingbroke Mon, 01 Mar 2010 02:56:12 -0800

I forgot to mention: I wonder if it might be worth looking at
link-time optimisation, à la llvm-gcc? I.e. store LLVM bitcode in the .o
files, and substitute the LLVM linker for standard ld in order to
generate assembly at link time. I suspect that GHC itself is getting
most of the cross-module inlining opportunities, but checking might be
a relatively cheap experiment to do.
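
A rough sketch of what that experiment could look like, offered only as an
illustration. It assumes each module's LLVM bitcode is already available as a
.bc file (the file names below are hypothetical), and it uses the separate
llvm-link, opt and llc tools rather than a drop-in replacement for ld. The
script only prints the commands it would run, so it is safe to try without
LLVM installed.

```shell
#!/bin/sh
# Sketch only: print the commands a link-time optimisation experiment
# might run. Nothing here is part of GHC or of the LLVM back-end patch.

lto_plan() {
    # Merge every module's bitcode into one whole-program module.
    echo "llvm-link $* -o whole.bc"
    # Optimise across what used to be module boundaries.
    echo "opt -O2 whole.bc -o whole.opt.bc"
    # Generate assembly only now, at "link time".
    echo "llc -O2 whole.opt.bc -o whole.s"
}

# Dry run for two hypothetical modules.
lto_plan Main.bc Utils.bc
```

Piping the output to sh (lto_plan Main.bc Utils.bc | sh) would actually run
the steps, assuming the LLVM tools are on the path; comparing whole.s against
the per-module assembly would show whether any cross-module inlining is being
missed.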


Cheers,
Max

On 1 March 2010 10:54, Max Bolingbroke <batterseapo...@hotmail.com> wrote:
> David replied to my query about LLVM as below, which I'm forwarding to the 
> list.
>
> It sounds like there are two potential LLVM-based GSoC projects:
>
> Project 1: LLVM optimisation passes
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~
>
> 1) Clearly identify issues with LLVM-produced assembly code in the
> context of GHC
>  * This can be done by examining how it compares to the native code
> generator on nofib benchmarks
>  * You might be able to get some mileage from simply eyeballing the
> assembly and looking for "obvious" stupidity, like the ESP issue David
> spotted
>  * The result of this part should be a simple set of Haskell test
> cases, the assembly they produced, what the assembly *should* be
> (roughly) and perhaps some notes on what might fix it
> 2) The second part would be to identify the lowest hanging fruit from
> those things identified in 1) and make changes to the LLVM output /
> write LLVM optimisations (apparently this is a joy to do, the LLVM
> framework is very well designed) to fix the issues
>
> Separating the project into two parts like this means that we could
> get something out of the project even if the student is unable to make
> significant progress with LLVM itself in the GSoC timeframe. Having a
> clear description of the problems involved + simple benchmark programs
> would be a huge help to someone attempting part 2) in the future, or
> they could serve as the basis for feature requests to the LLVM
> developers.
>
> Project 2: Tables next to code
> ~~~~~~~~~~~~~~~~~~~~~~
>
> My feeling is that this is the more challenging of the two projects,
> as it is likely to touch more of LLVM / GHC. However, it is likely to
> yield a solid 5% speedup. It seems there are two implementation
> options:
>
> 1) Postprocessor à la the Evil Mangler (a nice self-contained project,
> but not the best long-term solution)
> 2) Modify LLVM to support this feature
>
> Do either of these seem like realistic GSoC projects? I would be
> willing to mentor a potential student on either one, though I'll need
> to do some learning about LLVM to keep up :-)
>
> Cheers,
> Max
>
> On 1 March 2010 05:12, David Terei <davidte...@gmail.com> wrote:
>> Hi Max,
>>
>> To be honest, I don't know (especially since the GSoC involves money). The
>> indications are that, yes, there might be some quick gains such as fixing up
>> the ESP issue. Other 'indications' that some tuning could be done are these
>> comments by Chris Lattner:
>>
>> http://groups.google.com/group/llvm-dev/browse_thread/thread/d05d4a80dd245c51/9eef60d9c05636d6?lnk=gst&q=ghc#9eef60d9c05636d6
>>
>> There is also the TABLES_NEXT_TO_CODE optimisation that is used in GHC but
>> can't be used with the llvm back-end as LLVM doesn't support the features we
>> need to implement this (control over data/function layout in object files).
>> That costs us about 5% so fixing that by adding the needed features to LLVM
>> would be great! I'm not sure if it would be a quick fix though; it would be
>> challenging but probably doable in a GSoC timeframe (although that may be
>> all they would do).
>>
>> This is all speculative though and I think any project would be speculative
>> to some degree. If there was a very good mentor then this might work out but
>> could also fail quite badly. My knowledge of both GHC and LLVM is fairly
>> specialised and doesn't include any knowledge really of optimisation passes.
>>
>> If the student was capable though this could be part of this project. One
>> thing I meant to do but never had time to as part of my thesis was to look
>> at some of the nofib benchmarks on which LLVM performs quite badly and
>> identify the problem. Even if there isn't time to fix it, if we had a list of
>> performance regressions (with reasons) for the llvm code generator (vs the
>> other code generators) this itself would be very useful.
>>
>> Sorry, this probably doesn't help much but I've been focused on getting
>> ghc+llvm working, not really worrying about what comes out of it too much. I
>> would suggest perhaps talking to both the GHC developers and the LLVM
>> developers; they probably won't be able to give a great answer either, but
>> with this and theirs combined it should give a good enough idea for a
>> decision.
>>
>> Cheers,
>> David
>>
>> Max Bolingbroke wrote:
>>>
>>> Hi David,
>>>
>>> This is great stuff of course!
>>>
>>> I notice that you have identified some problems with the generated
>>> code (e.g. the ESP manipulation stuff you point to at the end of
>>> http://groups.google.com/group/llvm-dev/msg/28b99513bcc38e0f).
>>>
>>> Do you think there are some quick wins here that could be had by
>>> writing some optimisation pass plugins for LLVM that tackle the kind
>>> of performance bugs commonly seen in code originating from GHC? If so,
>>> this might be a suitable project for a Summer of Code student with
>>> some previous knowledge of compiler design.
>>>
>>> Cheers,
>>> Max
>>>
>>> On 18 February 2010 23:55, David Terei <davidte...@gmail.com> wrote:
>>>>
>>>> Hi all,
>>>>
>>>> Over the last 6 months I've been working on a new code generator for GHC
>>>> which targets the LLVM compiler infrastructure. Most of the work was done
>>>> as part of an honours thesis at the University of New South Wales under
>>>> the supervision of Manuel Chakravarty. This ended at the start of
>>>> November, but I have continued to work on the code generator since
>>>> (although at a much reduced pace since I'm back at full-time work). It's
>>>> now at a stage where I feel pretty confident in its correctness and would
>>>> love to see it merged into GHC mainline.
>>>>
>>>> The patch for the llvm back-end can be found here (should apply cleanly
>>>> to GHC head):
>>>>
>>>> http://www.cse.unsw.edu.au/~davidt/downloads/ghc-llvmbackend-full.gz
>>>>
>>>> This is what I would like/am requesting merged into GHC head.
>>>>
>>>> The thesis paper, which offers a detailed performance evaluation as well
>>>> as the motivation and design of the back-end, can be found at:
>>>>
>>>> http://www.cse.unsw.edu.au/~pls/thesis/davidt-thesis.pdf
>>>>
>>>> Below I'll quickly detail out the important points though. There are also
>>>> instructions on how to get started with the back-end.
>>>>
>>>> Finally there are also some issues that I think may need to be sorted out
>>>> before a merge could be done. They are at the end.
>>>>
>>>> Performance
>>>> -----------
>>>> (All done on linux/x86-32)
>>>>
>>>> A quick summary of the results is that for the 'nofib' benchmark suite,
>>>> the llvm code generator was 3.8% slower than the NCG (the C code
>>>> generator was 6.9% slower than the NCG). The DPH project includes a
>>>> benchmark suite which I also ran, and for this type of code using the
>>>> llvm back-end shortened the runtime by an average of 25% compared to the
>>>> NCG. Also, while not included in my thesis paper as I ran out of time, I
>>>> did do some benchmarking with the 'nobench' benchmark suite. It gave
>>>> performance ratios for the back-ends of around:
>>>>
>>>> NCG : 1.11
>>>> C : 1.05
>>>> LLVM : 1.14
>>>>
>>>>
>>>> Supported Platforms & 'Correctness'
>>>> -----------------------------------
>>>>
>>>> Linux x86-32/x86-64 are currently well supported. The back-end can pass
>>>> the test suite and build a working version of GHC (bootstrap test).
>>>>
>>>> Mac OS X 10.5 currently has a rather nasty bug with any dynamic lib calls
>>>> (all libffi stuff) [for the curious: the stack is not 16-byte aligned
>>>> when the calls are made, as the OS X ABI requires]. The test suite passes
>>>> except for most of the ffi tests.
>>>>
>>>> Other platforms haven't been tested at all. As using the back-end with a
>>>> registerised build of GHC requires a modified version of LLVM, people
>>>> wanting to try it out on those platforms will need to either make the
>>>> needed changes to LLVM themselves, or use an unregisterised build of GHC,
>>>> which will work with a vanilla install of LLVM. (A patch for LLVM for x86
>>>> is linked to below.)
>>>>
>>>> Validate
>>>> --------
>>>>
>>>> I've validated my GHC patch to make sure it won't break anything. This is
>>>> just compiling and running GHC normally but with the llvm back-end code
>>>> included. It doesn't actually test the llvm code generator, just makes
>>>> sure it hasn't broken the NCG or C code generator.
>>>>
>>>> Linux/x86-32:
>>>>
>>>> OVERALL SUMMARY for test run started at Do 18. Feb 11:21:48 EST 2010
>>>> 2457 total tests, which gave rise to
>>>> 9738 test cases, of which
>>>> 0 caused framework failures
>>>> 7573 were skipped
>>>>
>>>> 2088 expected passes
>>>> 76 expected failures
>>>> 0 unexpected passes
>>>> 1 unexpected failures
>>>>
>>>> Unexpected failures:
>>>> user001(normal)
>>>>
>>>> Linux/x86-64:
>>>>
>>>> OVERALL SUMMARY for test run started at Thu 18 Feb 15:28:32 EST 2010
>>>> 2458 total tests, which gave rise to
>>>> 9739 test cases, of which
>>>> 0 caused framework failures
>>>> 7574 were skipped
>>>>
>>>> 2087 expected passes
>>>> 77 expected failures
>>>> 0 unexpected passes
>>>> 1 unexpected failures
>>>>
>>>> Unexpected failures:
>>>> T1969(normal)
>>>>
>>>> Mac OS X 10.5/x86-32:
>>>>
>>>> OVERALL SUMMARY for test run started at Thu Feb 18 12:35:49 EST 2010
>>>> 2458 total tests, which gave rise to
>>>> 9122 test cases, of which
>>>> 0 caused framework failures
>>>> 6959 were skipped
>>>>
>>>> 2085 expected passes
>>>> 76 expected failures
>>>> 0 unexpected passes
>>>> 2 unexpected failures
>>>>
>>>> Unexpected failures:
>>>> T1969(normal)
>>>> ffi005(optc)
>>>>
>>>> All of these tests also fail for me with an unmodified GHC head build, so
>>>> the llvm patch isn't introducing any new failures.
>>>>
>>>> Installing
>>>> ----------
>>>>
>>>> http://www.cse.unsw.edu.au/~davidt/downloads/ghc-llvmbackend-full.gz
>>>>
>>>> Apply the darcs patch linked above to GHC head. This will make some
>>>> changes across GHC, with the bulk of the new code ending up in
>>>> 'compiler/llvmGen'.
>>>>
>>>> To build GHC you need to add two flags to build.mk, they are:
>>>>
>>>> GhcWithLlvmCodeGen = YES
>>>> GhcEnableTablesNextToCode = NO
>>>>
>>>> The llvm code generator doesn't currently support the TABLES_NEXT_TO_CODE
>>>> optimisation due to limitations with LLVM.
>>>>
>>>> You will also need LLVM installed on your computer to use the back-end.
>>>> Version 2.6 or SVN trunk is supported. If you want to use the back-end in
>>>> an unregisterised ghc build, then you can use a vanilla build of LLVM.
>>>> However, if you want to use a registerised ghc build (very likely) then
>>>> you need to patch LLVM for this to work. The patch for llvm can be found
>>>> here:
>>>>
>>>> http://www.cse.unsw.edu.au/~davidt/downloads/llvm-ghc.patch
>>>>
>>>> LLVM is very easy to build and install. It can be done as follows:
>>>>
>>>> $ svn co http://llvm.org/svn/llvm-project/llvm/trunk llvm
>>>> $ cd llvm
>>>> $ patch -p0 -i ~/llvm-ghc.patch
>>>> $ ./configure --enable-optimized # probably also want to set --prefix
>>>> $ make
>>>> $ make install
>>>>
>>>> Just make sure this modified version of LLVM is on your path and takes
>>>> precedence over any other builds.
>>>>
>>>> Using
>>>> -----
>>>>
>>>> Once GHC is built, you can trigger GHC to use the LLVM back-end with the
>>>> '-fllvm' flag. There is also a new '-ddump-llvm' flag which will dump out
>>>> the llvm IR code generated (or use the '-keep-tmp-files' flag).
>>>>
>>>> 'ghc --info' should also now report that it includes the llvm code
>>>> generator.
>>>>
>>>> Issues
>>>> ------
>>>> Issues that might need to be resolved before merging the patch:
>>>>
>>>> 1. Developed in isolation by 1 person with no Haskell knowledge at first.
>>>> So the usual issues with that may apply: misused data structures, bad
>>>> style, etc. Criticisms of the code are very welcome. There are some
>>>> specific notes on what I think may be wrong with the code atm in
>>>> 'compiler/llvmGen/NOTES'.
>>>>
>>>> 2. The back-end has an LLVM binding of sorts; this binding is similar in
>>>> design to, say, the Cmm representation used in GHC. It represents the LLVM
>>>> Assembly language using a collection of data types and can pretty print
>>>> it out correctly. This binding lives in the 'compiler/llvmGen/Llvm'
>>>> folder. Should this binding be split out into a separate library?
>>>>
>>>> 3. As mentioned above, LLVM needs to be patched to work with a
>>>> registerised build of GHC. If the llvm back-end was merged, how would this
>>>> be handled? I would suggest simply carrying the patch, with some
>>>> instructions on how to use it, in the GHC repo. People using GHC head
>>>> could be expected to grab the LLVM source code and apply the patch
>>>> themselves at this stage.
>>>>
>>>> 4. Finally this email is long. I need to put all this info into a web
>>>> page as well.
>>>>
>>>> Cheers,
>>>> David
>>>>
>>>> _______________________________________________
>>>> Cvs-ghc mailing list
>>>> Cvs-ghc@haskell.org
>>>> http://www.haskell.org/mailman/listinfo/cvs-ghc
>>>>
>>
>>
>

