First of all, congrats to David on getting this to the point where it can be merged. Nice work!

It's clear we want this in GHC, so let's look at what needs to be done, and the potential sticking points.

We should make LLVM optional for the time being, since it entails turning off TABLES_NEXT_TO_CODE, which impacts performance of the -fasm backend. I believe this is the main sticking point: we have to resolve this question one way or the other. Either we stop using TABLES_NEXT_TO_CODE altogether, or we add support for it to LLVM. My preference would be the latter, especially since we're already patching LLVM, but I know very little about LLVM or whether they'd be likely to accept patches to do TNTC. If the LLVM backend did TNTC, then it would probably consistently beat the NCG (albeit with longer compile times).

I'm currently running some benchmarks to see how much impact turning off TNTC has on the -fasm backend.

For the time being we can pull LLVM into ghc-tarballs as Manuel suggested, and modify GHC to use the local LLVM. This is fine for experimentation, but not as a long-term solution. It will cause problems for distros like Debian which have a policy that disallows bundled copies of libs, so the long-term strategy should be to remove the bundled LLVM, which means that any changes we make to LLVM should be potentially acceptable upstream.

I presume the calling-convention changes can be made in an acceptable way, by defining a new calling convention rather than modifying the existing C one? Does LLVM allow new calling conventions to be defined dynamically, without patching LLVM itself?

Cheers,
        Simon


On 18/02/2010 23:55, David Terei wrote:
Hi all,

Over the last 6 months I've been working on a new code generator for GHC
which targets the LLVM compiler infrastructure. Most of the work was
done as part of an honours thesis at the University of New South Wales
under the supervision of Manuel Chakravarty. This ended at the start of
November but I have continued to work on the code generator since
(although at a much reduced pace since I'm back at full-time work). It's
now at a stage where I feel pretty confident in its correctness and
would love to see it merged into GHC mainline.

The patch for the llvm back-end can be found here (should apply cleanly
to GHC head):

http://www.cse.unsw.edu.au/~davidt/downloads/ghc-llvmbackend-full.gz

This is what I would like/am requesting merged into GHC head.

The thesis paper which offers a detailed performance evaluation, as well
as the motivation and design of the back-end can be found at:

http://www.cse.unsw.edu.au/~pls/thesis/davidt-thesis.pdf

Below I'll quickly outline the important points, though. There are
also instructions on how to get started with the back-end.

Finally there are also some issues that I think may need to be sorted
out before a merge could be done. They are at the end.

Performance
-----------
(All done on linux/x86-32)

A quick summary of the results is that for the 'nofib' benchmark suite,
the llvm code generator was 3.8% slower than the NCG (the C code
generator was 6.9% slower than the NCG). The DPH project includes a
benchmark suite which I also ran and for this type of code using the
llvm back-end shortened the runtime by an average of 25% compared to the
NCG. Also, while not included in my thesis paper as I ran out of time, I
did do some benchmarking with the 'nobench' benchmark suite. It gave
performance ratios for the back-ends of around:

NCG : 1.11
C : 1.05
LLVM : 1.14


Supported Platforms & 'Correctness'
-----------------------------------

Linux x86-32/x86-64 are currently well supported. The back-end can pass
the test suite and build a working version of GHC (bootstrap test).

Mac OS X 10.5 currently has a rather nasty bug with any dynamic lib
calls (all libffi stuff) [for the curious: the stack is not 16-byte
aligned when the calls are made, as the OS X ABI requires]. The test
suite passes except for most of the ffi tests.

Other platforms haven't been tested at all. As using the back-end with a
registered build of GHC requires a modified version of LLVM, people
wanting to try it out on those platforms will need to either make the
needed changes to LLVM themselves, or use an unregistered build of GHC
which will work with a vanilla install of LLVM. (A patch for LLVM for
x86 is linked to below.)

Validate
--------

I've validated my GHC patch to make sure it won't break anything. This
is just compiling and running GHC normally but with the llvm back-end
code included. It doesn't actually test the llvm code generator, just
makes sure it hasn't broken the NCG or C code generator.

Linux/x86-32:

OVERALL SUMMARY for test run started at Do 18. Feb 11:21:48 EST 2010
2457 total tests, which gave rise to
9738 test cases, of which
0 caused framework failures
7573 were skipped

2088 expected passes
76 expected failures
0 unexpected passes
1 unexpected failures

Unexpected failures:
user001(normal)

Linux/x86-64:

OVERALL SUMMARY for test run started at Thu 18 Feb 15:28:32 EST 2010
2458 total tests, which gave rise to
9739 test cases, of which
0 caused framework failures
7574 were skipped

2087 expected passes
77 expected failures
0 unexpected passes
1 unexpected failures

Unexpected failures:
T1969(normal)

Mac OS X 10.5/x86-32:

OVERALL SUMMARY for test run started at Thu Feb 18 12:35:49 EST 2010
2458 total tests, which gave rise to
9122 test cases, of which
0 caused framework failures
6959 were skipped

2085 expected passes
76 expected failures
0 unexpected passes
2 unexpected failures

Unexpected failures:
T1969(normal)
ffi005(optc)

All of the test failures fail for me with an unmodified GHC head build as
well as when the llvm patch is included, so the llvm patch isn't
introducing any new failures.
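
As a sanity check on the three summaries above, the test cases that were not skipped should equal the sum of the four outcome counts. The following is only a sketch of that arithmetic: the helper name check_summary is hypothetical and is not part of the GHC test suite driver.

```shell
# check_summary CASES SKIPPED PASSES XFAILS XPASSES FAILS
# (hypothetical helper: compares cases actually run with the outcome counts)
check_summary() {
    ran=$(( $1 - $2 ))
    accounted=$(( $3 + $4 + $5 + $6 ))
    if [ "$ran" -eq "$accounted" ]; then
        echo "consistent: $ran test cases ran"
    else
        echo "mismatch: $ran ran, $accounted accounted for"
        return 1
    fi
}

check_summary 9738 7573 2088 76 0 1   # Linux/x86-32
check_summary 9739 7574 2087 77 0 1   # Linux/x86-64
check_summary 9122 6959 2085 76 0 2   # Mac OS X 10.5/x86-32
```

All three runs balance (2165, 2165 and 2163 test cases respectively), so the failure counts quoted above account for everything that ran.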

Installing
----------

http://www.cse.unsw.edu.au/~davidt/downloads/ghc-llvmbackend-full.gz

Apply the darcs patch linked above to GHC head. This will make some
changes across GHC, with the bulk of the new code ending up in
'compiler/llvmGen'.

To build GHC you need to add two flags to build.mk:

GhcWithLlvmCodeGen = YES
GhcEnableTablesNextToCode = NO

The llvm code generator doesn't currently support the
TABLES_NEXT_TO_CODE optimisation due to limitations with LLVM.

You will also need LLVM installed on your computer to use the back-end.
Version 2.6 or SVN trunk is supported. If you want to use the back-end
in an unregistered ghc build, then you can use a vanilla build of LLVM.
However if you want to use a registered ghc build (very likely) then you
need to patch LLVM for this to work. The patch for llvm can be found here:

http://www.cse.unsw.edu.au/~davidt/downloads/llvm-ghc.patch

LLVM is very easy to build and install. It can be done as follows:

$ svn co http://llvm.org/svn/llvm-project/llvm/trunk llvm
$ cd llvm
$ patch -p0 -i ~/llvm-ghc.patch
$ ./configure --enable-optimized # probably also want to set --prefix
$ make
$ make install

Just make sure this modified version of LLVM is on your path and takes
precedence over any other builds.
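
One way to check precedence is to ask the shell which directory it resolves the LLVM tools from. This is a rough sketch under stated assumptions: the install prefix $HOME/llvm-ghc is hypothetical (use whatever was passed to --prefix), tool_dir is a made-up helper, and llc and opt are assumed to be the tools of interest since the message does not name them.

```shell
# Hypothetical install prefix; substitute the value given to --prefix.
LLVM_PREFIX="${LLVM_PREFIX:-$HOME/llvm-ghc}"
PATH="$LLVM_PREFIX/bin:$PATH"
export PATH

# Print the directory a tool is resolved from, or "not found".
tool_dir() {
    if resolved=$(command -v "$1" 2>/dev/null); then
        dirname "$resolved"
    else
        echo "not found"
    fi
}

# Assumed tool names; adjust to whichever LLVM binaries matter.
for tool in llc opt; do
    echo "$tool: $(tool_dir "$tool")"
done
```

If either line reports a directory other than the patched build's bin directory, another LLVM install is shadowing it.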

Using
-----

Once GHC is built, you can tell GHC to use the LLVM back-end with the
'-fllvm' flag. There is also a new '-ddump-llvm' flag which will dump
out the llvm IR code generated (or use the '-keep-tmp-files' flag).

'ghc --info' should also now report that it includes the llvm code
generator.

Issues
------
Issues that might need to be resolved before merging the patch:

1. Developed in isolation by 1 person with no Haskell knowledge at
first. So the usual issues with that may apply: misused data structures,
bad style, etc. Criticisms of the code are very welcome. There are some
specific notes on what I think may be wrong with the code atm in
'compiler/llvmGen/NOTES'.

2. The back-end has an LLVM binding of sorts, this binding is similar in
design to say the Cmm representation used in GHC. It represents the LLVM
Assembly language using a collection of data types and can pretty print
it out correctly. This binding lives in the 'compiler/llvmGen/Llvm'
folder. Should this binding be split out into a separate library?

3. As mentioned above, LLVM needs to be patched to work with a
registered build of GHC. If the llvm back-end was merged, how would this
be handled? I would suggest simply carrying the patch with some
instructions on how to use it in the GHC repo. People using GHC head
could be expected to grab the LLVM source code and apply the patch
themselves at this stage.

4. Finally this email is long. I need to put all this info into a web
page as well.

Cheers,
David

_______________________________________________
Cvs-ghc mailing list
Cvs-ghc@haskell.org
http://www.haskell.org/mailman/listinfo/cvs-ghc
