Simon Marlow:
> On 22/02/2010 16:49, Simon Marlow wrote:
>> On 22/02/2010 12:34, Simon Marlow wrote:
>> 
>>> I'm currently running some benchmarks to see how much impact turning off
>>> TNTC has on the -fasm backend.
>> 
>> Here are the results on x86-64/Linux:
> [ snip ]
>> --------------------------------------------------------------------------------
>> 
>> Min            +4.7% -0.0%  -0.6%  -1.7%
>> Max            +8.9% +0.0% +16.9% +13.8%
>> Geometric Mean +6.1% -0.0%  +4.9%  +4.2%
> 
> and here are the results on x86/Linux:
> 
> --------------------------------------------------------------------------------
>        Program           Size    Allocs   Runtime   Elapsed
> --------------------------------------------------------------------------------
>           anna          +6.9%     +0.0%     +7.1%     +7.4%
>           ansi          +4.3%     +0.0%      0.00      0.00
>           atom          +4.5%     +0.0%    +23.6%    +21.7%
>         awards          +4.2%     +0.0%      0.00      0.00
>         banner          +3.5%     +0.0%      0.00      0.00
>     bernouilli          +4.2%     +0.0%     +2.7%     +1.8%
>          boyer          +4.3%     +0.0%      0.10      0.11
>         boyer2          +4.1%     +0.0%      0.01      0.02
>           bspt          +5.5%     +0.0%      0.02      0.02
>      cacheprof          +5.3%     +0.0%     +3.1%     +3.0%
>       calendar          +4.2%     +0.0%      0.00      0.00
>       cichelli          +4.2%     +0.0%      0.19      0.22
>        circsim          +4.6%     +0.0%     +3.3%     +2.5%
>       clausify          +4.3%     +0.0%      0.07      0.09
>  comp_lab_zift          +4.5%     +0.0%    +15.3%    +14.4%
>       compress          +4.4%     +0.0%     +4.1%     +4.3%
>      compress2          +4.3%     +0.0%     +0.5%     +0.4%
>    constraints          +4.5%     +0.0%     +6.4%     +5.9%
>   cryptarithm1          +3.8%     +0.0%     +5.3%     +3.3%
>   cryptarithm2          +4.0%     +0.0%      0.03      0.03
>            cse          +3.9%     +0.0%      0.00      0.00
>          eliza          +3.6%     +0.0%      0.00      0.00
>          event          +4.3%     +0.0%     +7.9%     +7.5%
>         exp3_8          +4.2%     +0.0%    +17.8%    +13.3%
>         expert          +4.1%     +0.0%      0.00      0.00
>            fem          +5.5%     +0.0%      0.06      0.06
>            fft          +4.6%     +0.0%      0.09      0.10
>           fft2          +4.9%     +0.0%      0.22    +12.3%
>       fibheaps          +4.3%     +0.0%      0.08      0.08
>           fish          +4.0%     +0.0%      0.05      0.06
>          fluid          +6.3%     +0.0%      0.02      0.02
>         fulsom          +6.1%     +0.0%     +3.4%     +3.2%
>         gamteb          +5.0%     +0.0%      0.19      0.21
>            gcd          +4.2%     +0.0%      0.06      0.07
>    gen_regexps          +4.0%     +0.0%      0.00      0.00
>         genfft          +4.2%     +0.0%      0.09      0.10
>             gg          +5.1%     +0.0%      0.03      0.03
>           grep          +4.5%     +0.0%      0.00      0.00
>         hidden          +5.7%  (stdout)  (stdout)  (stdout)
>            hpg          +5.2%     +0.0%     +6.1%     +2.0%
>            ida          +4.4%     +0.0%    +10.2%     +6.6%
>          infer          +4.9%     +0.0%      0.13      0.14
>        integer          +4.2%     +0.0%     +1.2%     -0.2%
>      integrate          +4.6%     +0.0%     +4.9%     +5.0%
>        knights          +4.6%     +0.0%      0.01      0.01
>           lcss          +4.2%     +0.0%     +8.5%     +7.7%
>           life          +3.8%     +0.0%    +23.8%    +19.5%
>           lift          +4.5%     +0.0%      0.00      0.00
>      listcompr          +3.8%     +0.0%     +5.3%     +4.7%
>       listcopy          +3.8%     +0.0%     +5.7%     +6.3%
>       maillist          +4.0%     +0.0%      0.15     +6.1%
>         mandel          +4.5%     +0.0%     -0.6%     -2.4%
>        mandel2          +3.9%     +0.0%      0.02      0.02
>        minimax          +4.2%     +0.0%      0.01      0.01
>        mkhprog          +4.2%     +0.0%      0.00      0.01
>     multiplier          +4.4%     +0.0%    +10.0%    +10.6%
>       nucleic2          +4.6%     +0.0%    +16.8%    +15.0%
>           para          +4.4%     +0.0%    +11.7%     +9.7%
>      paraffins          +4.3%     +0.0%     -1.9%     +0.8%
>         parser          +5.0%     +0.0%      0.08      0.08
>        parstof          +4.8%     +0.0%      0.02      0.02
>            pic          +5.0%     +0.0%      0.03      0.03
>          power          +4.4%     +0.0%     +2.7%     +2.7%
>         pretty          +4.4%     +0.0%      0.00      0.00
>         primes          +4.2%     +0.0%      0.12      0.13
>      primetest          +4.3%     +0.0%     -0.9%     +0.5%
>         prolog          +4.2%     +0.0%      0.00      0.00
>         puzzle          +4.1%     +0.0%     +8.7%     +7.8%
>         queens          +4.2%     +0.0%      0.03      0.03
>        reptile          +5.1%     +0.0%      0.03      0.04
>        rewrite          +4.6%     +0.0%      0.02      0.03
>           rfib          +4.5%     +0.0%      0.12      0.12
>            rsa          +4.3%     +0.0%      0.17      0.18
>            scc          +3.7%     +0.0%      0.00      0.00
>          sched          +4.3%     +0.0%      0.05      0.05
>            scs          +5.7%     +0.0%     +2.3%     +1.3%
>         simple          +6.8%     +0.0%     +5.6%     +5.8%
>          solid          +4.5%     +0.0%    +11.1%     +6.6%
>        sorting          +4.0%     +0.0%      0.00      0.00
>         sphere          +5.3%     +0.0%    +17.2%    +12.9%
>         symalg          +5.3%     +0.0%      0.10      0.10
>            tak          +4.2%     +0.0%      0.02      0.02
>      transform          +4.9%     +0.0%     +2.2%     +2.1%
>       treejoin          +3.7%     +0.0%     -0.4%     +2.7%
>      typecheck          +4.3%     +0.0%    -23.8%    -24.1%
>        veritas          +6.5%     +0.0%      0.00      0.00
>           wang          +4.6%     +0.0%     +8.0%     +7.7%
>      wave4main          +4.4%     +0.0%     +5.2%     +5.3%
>   wheel-sieve1          +4.2%     +0.0%    +10.0%     +8.8%
>   wheel-sieve2          +4.2%     +0.0%     +2.1%     +2.2%
>           x2n1          +4.6%     +0.0%      0.06      0.06
> --------------------------------------------------------------------------------
>            Min          +3.5%     +0.0%    -23.8%    -24.1%
>            Max          +6.9%     +0.0%    +23.8%    +21.7%
> Geometric Mean          +4.5%     -0.0%     +6.0%     +5.3%
> 
> Slightly worse than the x86_64 results, though this is an older processor.
> 
> The result for typecheck is very odd.  It's repeatable, but only on this 
> machine - I suspect a bad cache interaction or similar.  I should probably 
> re-run the tests on a machine with a more recent processor.

Here are the results on an E5472 (quad-core Harpertown @ 3GHz) with x86/MacOS.
Not all tests worked for me.

--------------------------------------------------------------------------------
        Program           Size    Allocs   Runtime   Elapsed
--------------------------------------------------------------------------------
           ansi          +1.4%     +0.0%      0.00      0.00
           atom          +1.3%     +0.0%     -4.8%     +0.0%
         awards          +0.7%     +0.0%      0.00      0.00
         banner          +1.6%     +0.0%      0.00      0.00
     bernouilli          +1.4%     +0.0%     +0.0%     +0.0%
          boyer          +0.7%     +0.0%      0.03      0.03
         boyer2          +0.8%     +0.0%      0.00      0.00
       calendar          +2.1%     +0.0%      0.00      0.00
       cichelli          +1.7%     +0.0%      0.08      0.08
        circsim          +1.9%     +0.0%     +4.4%     +3.4%
       clausify          +1.4%     +0.0%      0.04      0.04
  comp_lab_zift          +2.0%     +0.0%      0.17      0.17
    constraints          +1.4%     +0.0%     +1.3%     +1.7%
   cryptarithm1          +0.9%     +0.0%     +5.3%     +5.0%
   cryptarithm2          +1.8%     +0.0%      0.01      0.01
            cse          +1.8%     +0.0%      0.00      0.00
          eliza          +0.7%     +0.0%      0.00      0.00
         exp3_8          +0.7%     +0.0%      0.10      0.11
         expert          +1.5%     +0.0%      0.00      0.00
           fft2          +1.2%     +0.0%      0.06      0.06
       fibheaps          +1.4%     +0.0%      0.03      0.03
           fish          +0.9%     +0.0%      0.02      0.02
            gcd          +1.4%     +0.0%      0.02      0.02
    gen_regexps          +0.9%     +0.0%      0.00      0.00
      integrate          +1.9%     +0.0%     +2.7%     +3.1%
       nucleic2          +1.4%  (stdout)  (stdout)  (stdout)
      paraffins          +2.0%     +0.0%      0.08      0.10
         primes          +1.4%     +0.0%      0.05      0.05
         queens          +1.4%     +0.0%      0.02      0.02
           rfib          +1.3%     +0.0%      0.05      0.05
          sched          +2.1%     +0.0%      0.02      0.02
          solid          +1.2%     +0.0%      0.11      0.14
            tak          +1.4%     +0.0%      0.01      0.01
      transform          +1.2%     +0.0%     +7.4%     +3.4%
      typecheck          +1.4%     +0.0%      0.21      0.21
           wang          +1.9%     +0.0%      0.08      0.09
      wave4main          +1.3%     +0.0%     +6.2%     +3.0%
   wheel-sieve1          +1.4%     +0.0%    +22.7%    +22.7%
   wheel-sieve2          +1.4%     +0.0%      0.15      0.19
           x2n1          +1.9%     +0.0%      0.02      0.02
--------------------------------------------------------------------------------
            Min          +0.7%     +0.0%     -4.8%     +0.0%
            Max          +2.1%     +0.0%    +22.7%    +22.7%
 Geometric Mean          +1.4%     -0.0%     +4.8%     +4.5%
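
For anyone wanting to recompute a summary row from a subset of programs, here
is a minimal sketch.  It assumes the "Geometric Mean" row is the geometric mean
of the per-program ratios (1 + p/100), taken only over rows that report a
percentage; rows showing an absolute time such as 0.03 are left out.  The name
geoMeanPercent is made up for this illustration and is not part of nofib.

```haskell
-- Geometric mean of a list of percentage changes.
-- ASSUMPTION: each entry p means "changed by p percent", i.e. a ratio of
-- (1 + p/100), and the summary is the geometric mean of those ratios.
-- Every entry must be greater than -100, otherwise the logarithm is undefined.
geoMeanPercent :: [Double] -> Double
geoMeanPercent [] = 0
geoMeanPercent ps = (exp (sum (map log ratios) / n) - 1) * 100
  where
    ratios = [ 1 + p / 100 | p <- ps ]
    n      = fromIntegral (length ratios)
```

For example, geoMeanPercent [4.4, 1.3, 5.3] combines the runtime changes of
circsim, constraints and cryptarithm1 from the table above.  A doubling and a
halving cancel out: geoMeanPercent [100, -50] is 0 up to rounding.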

The most interesting observation seems to be that the results are largely 
inconsistent.  Look at 'atom', for example: +23.6% runtime on your x86/Linux 
machine, but -4.8% here.  Even wheel-sieve1, where our numbers go in the same 
direction, varies between +10.0% and +22.7%.  My numbers are from an older HEAD 
(as I had nofib handy there), but I'm still wondering where this difference 
comes from.  Is it just that different processors are more or less sensitive 
to TNTC?  But then no one processor seems to be consistently less affected; 
the effect also seems to depend on the program.

> ===============
> 
> So here's a crazy idea.  Why don't we post-process the assembly code coming 
> out of LLVM?  Before you throw up your hands in horror, consider that
> 
> - it's a simple transformation, just re-ordering blocks of code
> 
> - we can do it in Haskell using ByteStrings, it would probably
>   amount to a couple of hundred lines of code at the most.  Perhaps
>   an Alex lexer would be the quickest way to split into blocks, then
>   a bit of Haskell to glue them back into the correct order.  We may
>   have to fiddle with the .aligns a bit.
> 
> - we don't care too much about compile-time performance, since LLVM is
>   a -O2 thing, we have the NCG for generating code fast
> 
> - at the same time we can talk with the LLVM folks about adding
>   support for TNTC, but we'd have a way to generate code in the
>   meantime.
> 
> Just a thought...

Maybe it's not that crazy, at least as an interim solution.
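
To make the proposal concrete, the block re-ordering step might look roughly
like the sketch below.  It is only an illustration under stated assumptions:
an info table is taken to be emitted under a label ending in "_itable" (a
naming convention invented here), a label is any unindented line ending in
':', and the names Block, splitBlocks, reorder and mangle are made up.  It
uses plain ByteString functions instead of an Alex lexer, and it ignores
section directives and .align handling, which a real pass would have to get
right.

```haskell
import qualified Data.ByteString.Char8 as B
import qualified Data.Map as M
import Data.Char (isSpace)

-- A label (without the trailing ':') plus every line up to the next label,
-- including the label line itself.
data Block = Block { blockLabel :: B.ByteString, blockLines :: [B.ByteString] }

-- ASSUMPTION: the info table for code label "foo" lives under "foo_itable".
tableSuffix :: B.ByteString
tableSuffix = B.pack "_itable"

-- Heuristic: a label line is unindented and ends in ':'.
isLabel :: B.ByteString -> Bool
isLabel l = not (B.null l) && B.last l == ':' && not (isSpace (B.head l))

-- Split the file into the lines before the first label and a list of blocks.
splitBlocks :: B.ByteString -> ([B.ByteString], [Block])
splitBlocks src = (preamble, go rest)
  where
    (preamble, rest) = break isLabel (B.lines src)
    go []       = []
    go (l : ls) = Block (B.init l) (l : body) : go more
      where (body, more) = break isLabel ls

-- If the label names a table, return the code label it belongs to.
codeLabelOf :: B.ByteString -> Maybe B.ByteString
codeLabelOf l
  | tableSuffix `B.isSuffixOf` l = Just (B.take (B.length l - B.length tableSuffix) l)
  | otherwise                    = Nothing

-- Move each table block so that it sits directly before its code block.
-- Tables without a matching code block stay where they are.
reorder :: [Block] -> [Block]
reorder blocks = concatMap place blocks
  where
    byLabel = M.fromList [ (blockLabel b, b) | b <- blocks ]
    place b
      | Just code <- codeLabelOf (blockLabel b)
      , code `M.member` byLabel = []      -- emitted together with its code
      | Just t <- M.lookup (blockLabel b `B.append` tableSuffix) byLabel = [t, b]
      | otherwise = [b]

-- Whole pass: split, re-order, and glue the lines back together.
mangle :: B.ByteString -> B.ByteString
mangle src = B.unlines (preamble ++ concatMap blockLines (reorder blocks))
  where (preamble, blocks) = splitBlocks src
```

Given blocks in the order foo_info_itable, bar, foo_info, mangle emits bar
first and then foo_info_itable immediately followed by foo_info.  The pass
is a single traversal plus one Map lookup per block, so compile-time cost
should be modest.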

Manuel

_______________________________________________
Cvs-ghc mailing list
Cvs-ghc@haskell.org
http://www.haskell.org/mailman/listinfo/cvs-ghc
