Simon Marlow:
> On 22/02/2010 16:49, Simon Marlow wrote:
>> On 22/02/2010 12:34, Simon Marlow wrote:
>>
>>> I'm currently running some benchmarks to see how much impact turning off
>>> TNTC has on the -fasm backend.
>>
>> Here are the results on x86-64/Linux:
> [ snip ]
>> --------------------------------------------------------------------------------
>>
>> Min                 +4.7%     -0.0%     -0.6%     -1.7%
>> Max                 +8.9%     +0.0%    +16.9%    +13.8%
>> Geometric Mean      +6.1%     -0.0%     +4.9%     +4.2%
>
> and here are the results on x86/Linux:
>
> --------------------------------------------------------------------------------
> Program              Size    Allocs   Runtime   Elapsed
> --------------------------------------------------------------------------------
> anna                +6.9%     +0.0%     +7.1%     +7.4%
> ansi                +4.3%     +0.0%      0.00      0.00
> atom                +4.5%     +0.0%    +23.6%    +21.7%
> awards              +4.2%     +0.0%      0.00      0.00
> banner              +3.5%     +0.0%      0.00      0.00
> bernouilli          +4.2%     +0.0%     +2.7%     +1.8%
> boyer               +4.3%     +0.0%      0.10      0.11
> boyer2              +4.1%     +0.0%      0.01      0.02
> bspt                +5.5%     +0.0%      0.02      0.02
> cacheprof           +5.3%     +0.0%     +3.1%     +3.0%
> calendar            +4.2%     +0.0%      0.00      0.00
> cichelli            +4.2%     +0.0%      0.19      0.22
> circsim             +4.6%     +0.0%     +3.3%     +2.5%
> clausify            +4.3%     +0.0%      0.07      0.09
> comp_lab_zift       +4.5%     +0.0%    +15.3%    +14.4%
> compress            +4.4%     +0.0%     +4.1%     +4.3%
> compress2           +4.3%     +0.0%     +0.5%     +0.4%
> constraints         +4.5%     +0.0%     +6.4%     +5.9%
> cryptarithm1        +3.8%     +0.0%     +5.3%     +3.3%
> cryptarithm2        +4.0%     +0.0%      0.03      0.03
> cse                 +3.9%     +0.0%      0.00      0.00
> eliza               +3.6%     +0.0%      0.00      0.00
> event               +4.3%     +0.0%     +7.9%     +7.5%
> exp3_8              +4.2%     +0.0%    +17.8%    +13.3%
> expert              +4.1%     +0.0%      0.00      0.00
> fem                 +5.5%     +0.0%      0.06      0.06
> fft                 +4.6%     +0.0%      0.09      0.10
> fft2                +4.9%     +0.0%      0.22    +12.3%
> fibheaps            +4.3%     +0.0%      0.08      0.08
> fish                +4.0%     +0.0%      0.05      0.06
> fluid               +6.3%     +0.0%      0.02      0.02
> fulsom              +6.1%     +0.0%     +3.4%     +3.2%
> gamteb              +5.0%     +0.0%      0.19      0.21
> gcd                 +4.2%     +0.0%      0.06      0.07
> gen_regexps         +4.0%     +0.0%      0.00      0.00
> genfft              +4.2%     +0.0%      0.09      0.10
> gg                  +5.1%     +0.0%      0.03      0.03
> grep                +4.5%     +0.0%      0.00      0.00
> hidden              +5.7%  (stdout)  (stdout)  (stdout)
> hpg                 +5.2%     +0.0%     +6.1%     +2.0%
> ida                 +4.4%     +0.0%    +10.2%     +6.6%
> infer               +4.9%     +0.0%      0.13      0.14
> integer             +4.2%     +0.0%     +1.2%     -0.2%
> integrate           +4.6%     +0.0%     +4.9%     +5.0%
> knights             +4.6%     +0.0%      0.01      0.01
> lcss                +4.2%     +0.0%     +8.5%     +7.7%
> life                +3.8%     +0.0%    +23.8%    +19.5%
> lift                +4.5%     +0.0%      0.00      0.00
> listcompr           +3.8%     +0.0%     +5.3%     +4.7%
> listcopy            +3.8%     +0.0%     +5.7%     +6.3%
> maillist            +4.0%     +0.0%      0.15     +6.1%
> mandel              +4.5%     +0.0%     -0.6%     -2.4%
> mandel2             +3.9%     +0.0%      0.02      0.02
> minimax             +4.2%     +0.0%      0.01      0.01
> mkhprog             +4.2%     +0.0%      0.00      0.01
> multiplier          +4.4%     +0.0%    +10.0%    +10.6%
> nucleic2            +4.6%     +0.0%    +16.8%    +15.0%
> para                +4.4%     +0.0%    +11.7%     +9.7%
> paraffins           +4.3%     +0.0%     -1.9%     +0.8%
> parser              +5.0%     +0.0%      0.08      0.08
> parstof             +4.8%     +0.0%      0.02      0.02
> pic                 +5.0%     +0.0%      0.03      0.03
> power               +4.4%     +0.0%     +2.7%     +2.7%
> pretty              +4.4%     +0.0%      0.00      0.00
> primes              +4.2%     +0.0%      0.12      0.13
> primetest           +4.3%     +0.0%     -0.9%     +0.5%
> prolog              +4.2%     +0.0%      0.00      0.00
> puzzle              +4.1%     +0.0%     +8.7%     +7.8%
> queens              +4.2%     +0.0%      0.03      0.03
> reptile             +5.1%     +0.0%      0.03      0.04
> rewrite             +4.6%     +0.0%      0.02      0.03
> rfib                +4.5%     +0.0%      0.12      0.12
> rsa                 +4.3%     +0.0%      0.17      0.18
> scc                 +3.7%     +0.0%      0.00      0.00
> sched               +4.3%     +0.0%      0.05      0.05
> scs                 +5.7%     +0.0%     +2.3%     +1.3%
> simple              +6.8%     +0.0%     +5.6%     +5.8%
> solid               +4.5%     +0.0%    +11.1%     +6.6%
> sorting             +4.0%     +0.0%      0.00      0.00
> sphere              +5.3%     +0.0%    +17.2%    +12.9%
> symalg              +5.3%     +0.0%      0.10      0.10
> tak                 +4.2%     +0.0%      0.02      0.02
> transform           +4.9%     +0.0%     +2.2%     +2.1%
> treejoin            +3.7%     +0.0%     -0.4%     +2.7%
> typecheck           +4.3%     +0.0%    -23.8%    -24.1%
> veritas             +6.5%     +0.0%      0.00      0.00
> wang                +4.6%     +0.0%     +8.0%     +7.7%
> wave4main           +4.4%     +0.0%     +5.2%     +5.3%
> wheel-sieve1        +4.2%     +0.0%    +10.0%     +8.8%
> wheel-sieve2        +4.2%     +0.0%     +2.1%     +2.2%
> x2n1                +4.6%     +0.0%      0.06      0.06
> --------------------------------------------------------------------------------
> Min                 +3.5%     +0.0%    -23.8%    -24.1%
> Max                 +6.9%     +0.0%    +23.8%    +21.7%
> Geometric Mean      +4.5%     -0.0%     +6.0%     +5.3%
>
> Slightly worse than the x86_64 results, though this is an older processor.
>
> The result for typecheck is very odd. It's repeatable, but only on this
> machine - I suspect a bad cache interaction or similar. I should probably
> re-run the tests on a machine with a more recent processor.

Here are the results on an E5472 (quad-core Harpertown @ 3GHz) running
x86/MacOS. Not all tests worked for me.

--------------------------------------------------------------------------------
Program              Size    Allocs   Runtime   Elapsed
--------------------------------------------------------------------------------
ansi                +1.4%     +0.0%      0.00      0.00
atom                +1.3%     +0.0%     -4.8%     +0.0%
awards              +0.7%     +0.0%      0.00      0.00
banner              +1.6%     +0.0%      0.00      0.00
bernouilli          +1.4%     +0.0%     +0.0%     +0.0%
boyer               +0.7%     +0.0%      0.03      0.03
boyer2              +0.8%     +0.0%      0.00      0.00
calendar            +2.1%     +0.0%      0.00      0.00
cichelli            +1.7%     +0.0%      0.08      0.08
circsim             +1.9%     +0.0%     +4.4%     +3.4%
clausify            +1.4%     +0.0%      0.04      0.04
comp_lab_zift       +2.0%     +0.0%      0.17      0.17
constraints         +1.4%     +0.0%     +1.3%     +1.7%
cryptarithm1        +0.9%     +0.0%     +5.3%     +5.0%
cryptarithm2        +1.8%     +0.0%      0.01      0.01
cse                 +1.8%     +0.0%      0.00      0.00
eliza               +0.7%     +0.0%      0.00      0.00
exp3_8              +0.7%     +0.0%      0.10      0.11
expert              +1.5%     +0.0%      0.00      0.00
fft2                +1.2%     +0.0%      0.06      0.06
fibheaps            +1.4%     +0.0%      0.03      0.03
fish                +0.9%     +0.0%      0.02      0.02
gcd                 +1.4%     +0.0%      0.02      0.02
gen_regexps         +0.9%     +0.0%      0.00      0.00
integrate           +1.9%     +0.0%     +2.7%     +3.1%
nucleic2            +1.4%  (stdout)  (stdout)  (stdout)
paraffins           +2.0%     +0.0%      0.08      0.10
primes              +1.4%     +0.0%      0.05      0.05
queens              +1.4%     +0.0%      0.02      0.02
rfib                +1.3%     +0.0%      0.05      0.05
sched               +2.1%     +0.0%      0.02      0.02
solid               +1.2%     +0.0%      0.11      0.14
tak                 +1.4%     +0.0%      0.01      0.01
transform           +1.2%     +0.0%     +7.4%     +3.4%
typecheck           +1.4%     +0.0%      0.21      0.21
wang                +1.9%     +0.0%      0.08      0.09
wave4main           +1.3%     +0.0%     +6.2%     +3.0%
wheel-sieve1        +1.4%     +0.0%    +22.7%    +22.7%
wheel-sieve2        +1.4%     +0.0%      0.15      0.19
x2n1                +1.9%     +0.0%      0.02      0.02
--------------------------------------------------------------------------------
Min                 +0.7%     +0.0%     -4.8%     +0.0%
Max                 +2.1%     +0.0%    +22.7%    +22.7%
Geometric Mean      +1.4%     -0.0%     +4.8%     +4.5%

The most interesting observation seems to be that the results are largely
inconsistent (look at 'atom', for example). Even wheel-sieve1, where our
numbers go in the same direction, varies between +10% and +22.7%.
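
As a sanity check on the summary rows, here is a minimal Python sketch of how the "Geometric Mean" line can be reproduced from the per-program figures. It assumes the mean is taken over the ratios (1 + change/100) of only those programs that report a percentage, skipping rows that show absolute times because the run was too short to compare. The function name `geometric_mean_change` is made up for illustration. Under that assumption the Runtime and Elapsed columns of the x86/MacOS table come out at roughly +4.8% and +4.5%, which agrees with the table.

```python
import math

def geometric_mean_change(percent_changes):
    """Geometric mean of percentage changes, returned as a percentage.

    Each change p is turned into a ratio 1 + p/100; the ratios are
    averaged in log space and converted back to a percentage.
    """
    ratios = [1.0 + p / 100.0 for p in percent_changes]
    log_mean = sum(math.log(r) for r in ratios) / len(ratios)
    return (math.exp(log_mean) - 1.0) * 100.0

# Percentage rows of the x86/MacOS table, in table order:
# atom, bernouilli, circsim, constraints, cryptarithm1,
# integrate, transform, wave4main, wheel-sieve1
runtime = [-4.8, 0.0, 4.4, 1.3, 5.3, 2.7, 7.4, 6.2, 22.7]
elapsed = [0.0, 0.0, 3.4, 1.7, 5.0, 3.1, 3.4, 3.0, 22.7]

if __name__ == "__main__":
    print(f"Runtime: {geometric_mean_change(runtime):+.1f}%")
    print(f"Elapsed: {geometric_mean_change(elapsed):+.1f}%")
```

Only nine programs contribute to the runtime mean here, so a single outlier such as wheel-sieve1 moves the summary figure noticeably.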
My numbers are from an older HEAD (as I had nofib handy there), but I'm
still wondering where this difference comes from. Is it just that different
processors are more or less sensitive to TNTC? But then, it doesn't seem
that one processor is consistently less affected; the effect also seems to
depend on the program.

> ===============
>
> So here's a crazy idea. Why don't we post-process the assembly code coming
> out of LLVM? Before you throw up your hands in horror, consider that
>
> - it's a simple transformation, just re-ordering blocks of code
>
> - we can do it in Haskell using ByteStrings, it would probably
>   amount to a couple of hundred lines of code at the most. Perhaps
>   an Alex lexer would be the quickest way to split into blocks, then
>   a bit of Haskell to glue them back into the correct order. We may
>   have to fiddle with the .aligns a bit.
>
> - we don't care too much about compile-time performance, since LLVM is
>   a -O2 thing, we have the NCG for generating code fast
>
> - at the same time we can talk with the LLVM folks about adding
>   support for TNTC, but we'd have a way to generate code in the
>   meantime.
>
> Just a thought...

Maybe it's not that crazy, at least as an interim solution.

Manuel

_______________________________________________
Cvs-ghc mailing list
Cvs-ghc@haskell.org
http://www.haskell.org/mailman/listinfo/cvs-ghc
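
To make the block re-ordering idea quoted above a little more concrete, here is a rough sketch of the transformation. It is written in Python purely for brevity and is not the Haskell/ByteString/Alex implementation the proposal has in mind. Everything specific in it is an assumption for illustration: blocks are taken to start at any line of the form `label:`, and the table belonging to a code label `foo` is assumed to carry the hypothetical label `foo_tbl`. Section directives and `.align` handling, which the proposal says may need fiddling, are ignored.

```python
import re

# A block starts at a line that begins with "label:" (assumed syntax).
LABEL = re.compile(r"^([A-Za-z_.$][\w.$]*):")

def split_blocks(asm):
    """Split assembly text into (label, lines) blocks.

    Lines before the first label form a block whose label is None.
    """
    blocks = [(None, [])]
    for line in asm.splitlines():
        match = LABEL.match(line)
        if match:
            blocks.append((match.group(1), [line]))
        else:
            blocks[-1][1].append(line)
    return [block for block in blocks if block[1]]

def reorder(asm, table_suffix="_tbl"):
    """Move each table block so it sits directly before its code block.

    table_suffix is a hypothetical naming convention, not something
    LLVM or GHC is known to emit.
    """
    blocks = split_blocks(asm)
    labels = {label for label, _ in blocks if label}
    # Map a code label to the lines of its table, if both exist.
    tables = {}
    for label, lines in blocks:
        if label and label.endswith(table_suffix):
            code_label = label[: -len(table_suffix)]
            if code_label in labels:
                tables[code_label] = lines
    out = []
    for label, lines in blocks:
        if label and label.endswith(table_suffix) \
                and label[: -len(table_suffix)] in tables:
            continue  # already emitted next to its code block
        if label in tables:
            out.extend(tables[label])
        out.extend(lines)
    return "\n".join(out) + "\n"

example = (
    "\t.text\n"
    "foo_info:\n"
    "\tjmp bar\n"
    "foo_info_tbl:\n"
    "\t.long 1\n"
    "\t.long 2\n"
)

if __name__ == "__main__":
    print(reorder(example), end="")
```

Tables without a matching code label are left where they are, so the pass only moves text and never drops any of it.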