Re: Merge Request: LLVM Code Generator for GHC

Max Bolingbroke Mon, 01 Mar 2010 02:54:33 -0800

David replied to my query about LLVM as below, which I'm forwarding to the list.


It sounds like there are two potential LLVM-based GSoC projects:

Project 1: LLVM optimisation passes
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

1) Clearly identify issues with LLVM-produced assembly code in the
   context of GHC
  * This can be done by examining how it compares to the native code
    generator on nofib benchmarks
  * You might be able to get some mileage from simply eyeballing the
    assembly and looking for "obvious" stupidity, like the ESP issue
    David spotted
  * The result of this part should be a simple set of Haskell test
    cases, the assembly they produced, what the assembly *should* be
    (roughly) and perhaps some notes on what might fix it
2) The second part would be to identify the lowest hanging fruit from
   those things identified in 1) and make changes to the LLVM output /
   write LLVM optimisations (apparently this is a joy to do, the LLVM
   framework is very well designed) to fix the issues

Separating the project into two parts like this means that we could
get something out of the project even if the student is unable to make
significant progress with LLVM itself in the GSoC timeframe. Having a
clear description of the problems involved + simple benchmark programs
would be a huge help to someone attempting part 2) in the future, or
they could serve as the basis for feature requests to the LLVM
developers.
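
As a rough illustration of the comparison step in part 1), the sketch
below ranks benchmarks by how much slower the LLVM build runs than the
NCG build. It is only a sketch: `rank_slowdowns` is a hypothetical
helper, and the benchmark names and timings are made-up placeholders,
not nofib measurements.

```python
def rank_slowdowns(ncg_times, llvm_times):
    """Return (name, percent_slowdown) pairs, worst LLVM regression first.

    A positive percentage means the LLVM build was slower than the NCG build.
    """
    rows = []
    for name, ncg in ncg_times.items():
        llvm = llvm_times.get(name)
        # Skip benchmarks missing from the LLVM run or with unusable timings.
        if llvm is None or ncg <= 0:
            continue
        rows.append((name, (llvm / ncg - 1.0) * 100.0))
    return sorted(rows, key=lambda row: row[1], reverse=True)


# Placeholder runtimes in seconds (assumed values, NOT real measurements).
ncg = {"bench_a": 2.0, "bench_b": 1.0, "bench_c": 4.0}
llvm = {"bench_a": 2.5, "bench_b": 0.8, "bench_c": 4.0}

for name, pct in rank_slowdowns(ncg, llvm):
    print(f"{name}: {pct:+.1f}%")
```

The benchmarks at the top of such a ranking would be the first
candidates for having their assembly inspected by hand.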

Project 2: Tables next to code
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

My feeling is that this is the more challenging of the two projects,
as it is likely to touch more of LLVM / GHC. However, it is likely to
yield a solid 6% speedup. It seems there are two implementation
options:

1) Postprocessor à la the Evil Mangler (a nice self-contained project,
but not the best long-term solution)
2) Modify LLVM to support this feature

Does either of these seem like a realistic GSoC project? I would be
willing to mentor a potential student on either one, though I'll need
to do some learning about LLVM to keep up :-)

Cheers,
Max

On 1 March 2010 05:12, David Terei <davidte...@gmail.com> wrote:
> Hi Max,
>
> To be honest, I don't know (especially since the GSOC involves money). The
> indications are that, yes there might be some quick gains such as fixing up
> the ESP issue. Other 'indications' that some tuning could be done are these
> comments by Chris Lattner:
>
> http://groups.google.com/group/llvm-dev/browse_thread/thread/d05d4a80dd245c51/9eef60d9c05636d6?lnk=gst&q=ghc#9eef60d9c05636d6
>
> There is also the TABLES_NEXT_TO_CODE optimisation that is used in GHC but
> can't be used with the llvm back-end as LLVM doesn't support the features we
> need to implement this (control over data/function layout in object files).
> That costs us about 5% so fixing that by adding the needed features to LLVM
> would be great! I'm not sure if it would be a quick fix though, it would be
> challenging but probably doable in a GSOC timeframe (although that may be
> all they would do).
>
> This is all speculative though and I think any project would be speculative
> to some degree. If there was a very good mentor then this might work out but
> could also fail quite badly. My knowledge of both GHC and LLVM is fairly
> specialised and doesn't include any knowledge really of optimisation passes.
>
> If the student was capable though this could be part of this project. One
> thing I meant to do but never had time to as part of my thesis was to look
> at some of the nofib benchmarks on which LLVM performs quite badly and
> identify the problem. Even if there isn't time to fix it, if we had a list of
> performance regressions (with reasons) for the llvm code generator (vs the
> other code generators) this itself would be very useful.
>
> Sorry, this probably doesn't help much but I've been focused on getting
> ghc+llvm working, not really worrying about what comes out of it too much. I
> would suggest perhaps talking to both the GHC developers and LLVM
> developers, they probably won't be able to give a great answer either but
> with this and theirs combined it should give a good enough idea for a
> decision.
>
> Cheers,
> David
>
> Max Bolingbroke wrote:
>>
>> Hi David,
>>
>> This is great stuff of course!
>>
>> I notice that you have identified some problems with the generated
>> code (e.g. the ESP manipulation stuff you point to at the end of
>> http://groups.google.com/group/llvm-dev/msg/28b99513bcc38e0f).
>>
>> Do you think there are some quick wins here that could be had by
>> writing some optimisation pass plugins for LLVM that tackle the kind
>> of performance bugs commonly seen by code originating from GHC? If so,
>> this might be a suitable project for a Summer of Code student with
>> some previous knowledge of compiler design.
>>
>> Cheers,
>> Max
>>
>> On 18 February 2010 23:55, David Terei <davidte...@gmail.com> wrote:
>>>
>>> Hi all,
>>>
>>> Over the last 6 months I've been working on a new code generator for GHC
>>> which targets the LLVM compiler infrastructure. Most of the work was done
>>> as part of an honours thesis at the University of New South Wales under
>>> the supervision of Manuel Chakravarty. This ended at the start of
>>> November but I have continued to work on the code generator since
>>> (although at a much reduced pace since I'm back at full time work). It's
>>> now at a stage where I feel pretty confident in its correctness and would
>>> love to see it merged into GHC mainline.
>>>
>>> The patch for the llvm back-end can be found here (should apply cleanly
>>> to GHC head):
>>>
>>> http://www.cse.unsw.edu.au/~davidt/downloads/ghc-llvmbackend-full.gz
>>>
>>> This is what I would like/am requesting merged into GHC head.
>>>
>>> The thesis paper which offers a detailed performance evaluation, as well
>>> as the motivation and design of the back-end can be found at:
>>>
>>> http://www.cse.unsw.edu.au/~pls/thesis/davidt-thesis.pdf
>>>
>>> Below I'll quickly detail out the important points though. There are also
>>> instructions on how to get started with the back-end.
>>>
>>> Finally there are also some issues that I think may need to be sorted out
>>> before a merge could be done. They are at the end.
>>>
>>> Performance
>>> -----------
>>> (All done on linux/x86-32)
>>>
>>> A quick summary of the results is that for the 'nofib' benchmark suite,
>>> the llvm code generator was 3.8% slower than the NCG (the C code
>>> generator was 6.9% slower than the NCG). The DPH project includes a
>>> benchmark suite which I also ran and for this type of code using the llvm
>>> back-end shortened the runtime by an average of 25% compared to the NCG.
>>> Also, while not included in my thesis paper as I ran out of time, I did
>>> do some benchmarking with the 'nobench' benchmark suite. It gave
>>> performance ratios for the back-ends of around:
>>>
>>> NCG : 1.11
>>> C : 1.05
>>> LLVM : 1.14
>>>
>>>
>>> Supported Platforms & 'Correctness'
>>> -----------------------------------
>>>
>>> Linux x86-32/x86-64 are currently well supported. The back-end can pass
>>> the test suite and build a working version of GHC (bootstrap test).
>>>
>>> Mac OS X 10.5 currently has a rather nasty bug with any dynamic lib calls
>>> (all libffi stuff) [due to the stack not being 16-byte aligned when the
>>> calls are made, as required by the OS X ABI, for the curious]. The test
>>> suite passes except for most of the ffi tests.
>>>
>>> Other platforms haven't been tested at all. As using the back-end with a
>>> registered build of GHC requires a modified version of LLVM, people
>>> wanting to try it out on those platforms will need to either make the
>>> needed changes to LLVM themselves, or use an unregistered build of GHC
>>> which will work with a vanilla install of LLVM. (A patch for LLVM for x86
>>> is linked to below.)
>>>
>>> Validate
>>> --------
>>>
>>> I've validated my GHC patch to make sure it won't break anything. This is
>>> just compiling and running GHC normally but with the llvm back-end code
>>> included. It doesn't actually test the llvm code generator, just makes
>>> sure it hasn't broken the NCG or C code generator.
>>>
>>> Linux/x86-32:
>>>
>>> OVERALL SUMMARY for test run started at Do 18. Feb 11:21:48 EST 2010
>>> 2457 total tests, which gave rise to
>>> 9738 test cases, of which
>>> 0 caused framework failures
>>> 7573 were skipped
>>>
>>> 2088 expected passes
>>> 76 expected failures
>>> 0 unexpected passes
>>> 1 unexpected failures
>>>
>>> Unexpected failures:
>>> user001(normal)
>>>
>>> Linux/x86-64:
>>>
>>> OVERALL SUMMARY for test run started at Thu 18 Feb 15:28:32 EST 2010
>>> 2458 total tests, which gave rise to
>>> 9739 test cases, of which
>>> 0 caused framework failures
>>> 7574 were skipped
>>>
>>> 2087 expected passes
>>> 77 expected failures
>>> 0 unexpected passes
>>> 1 unexpected failures
>>>
>>> Unexpected failures:
>>> T1969(normal)
>>>
>>> Mac OS X 10.5/x86-32:
>>>
>>> OVERALL SUMMARY for test run started at Thu Feb 18 12:35:49 EST 2010
>>> 2458 total tests, which gave rise to
>>> 9122 test cases, of which
>>> 0 caused framework failures
>>> 6959 were skipped
>>>
>>> 2085 expected passes
>>> 76 expected failures
>>> 0 unexpected passes
>>> 2 unexpected failures
>>>
>>> Unexpected failures:
>>> T1969(normal)
>>> ffi005(optc)
>>>
>>> All of the test failures fail for me with an unmodified GHC head build as
>>> well as when the llvm patch is included, so the llvm patch isn't
>>> introducing any new failures.
>>>
>>> Installing
>>> ----------
>>>
>>> http://www.cse.unsw.edu.au/~davidt/downloads/ghc-llvmbackend-full.gz
>>>
>>> Apply the darcs patch linked above to GHC head. This will make some
>>> changes across GHC, with the bulk of the new code ending up in
>>> 'compiler/llvmGen'.
>>>
>>> To build GHC you need to add two flags to build.mk, they are:
>>>
>>> GhcWithLlvmCodeGen = YES
>>> GhcEnableTablesNextToCode = NO
>>>
>>> The llvm code generator doesn't support at this time the
>>> TABLES_NEXT_TO_CODE optimisation due to limitations with LLVM.
>>>
>>> You will also need LLVM installed on your computer to use the back-end.
>>> Version 2.6 or SVN trunk is supported. If you want to use the back-end in
>>> an unregistered ghc build, then you can use a vanilla build of LLVM.
>>> However if you want to use a registered ghc build (very likely) then you
>>> need to patch LLVM for this to work. The patch for llvm can be found
>>> here:
>>>
>>> http://www.cse.unsw.edu.au/~davidt/downloads/llvm-ghc.patch
>>>
>>> LLVM is very easy to build and install. It can be done as follows:
>>>
>>> $ svn co http://llvm.org/svn/llvm-project/llvm/trunk llvm
>>> $ cd llvm
>>> $ patch -p0 -i ~/llvm-ghc.patch
>>> $ ./configure --enable-optimized # probably also want to set --prefix
>>> $ make
>>> $ make install
>>>
>>> Just make sure this modified version of LLVM is on your path and takes
>>> precedence over any other builds.
>>>
>>> Using
>>> -----
>>>
>>> Once GHC is built, you can trigger GHC to use the LLVM back-end with the
>>> '-fllvm' flag. There is also a new '-ddump-llvm' flag which will dump out
>>> the llvm IR code generated (or use the '-keep-tmp-files' flag).
>>>
>>> 'ghc --info' should also now report that it includes the llvm code
>>> generator.
>>>
>>> Issues
>>> ------
>>> Issues that might need to be resolved before merging the patch:
>>>
>>> 1. Developed in isolation by 1 person with no Haskell knowledge at first.
>>> So the usual issues with that may apply: misused data structures, bad
>>> style, etc. Criticisms of the code are very welcome. There are some
>>> specific notes on what I think may be wrong with the code atm in
>>> 'compiler/llvmGen/NOTES'.
>>>
>>> 2. The back-end has an LLVM binding of sorts; this binding is similar in
>>> design to say the Cmm representation used in GHC. It represents the LLVM
>>> Assembly language using a collection of data types and can pretty print
>>> it out correctly. This binding lives in the 'compiler/llvmGen/Llvm'
>>> folder. Should this binding be split out into a separate library?
>>>
>>> 3. As mentioned above, LLVM needs to be patched to work with a registered
>>> build of GHC. If the llvm back-end was merged, how would this be handled?
>>> I would suggest simply carrying the patch with some instructions on how
>>> to use it in the GHC repo. People using GHC head could be expected to
>>> grab the LLVM source code and apply the patch themselves at this stage.
>>>
>>> 4. Finally this email is long. I need to put all this info into a web
>>> page as well.
>>>
>>> Cheers,
>>> David
>>>
>>> _______________________________________________
>>> Cvs-ghc mailing list
>>> Cvs-ghc@haskell.org
>>> http://www.haskell.org/mailman/listinfo/cvs-ghc
>>>
>
>

