[go-nuts] Re: How to optimize as with -O2 from Gcc? (50x speedup)

pedroso Fri, 04 Aug 2017 16:35:44 -0700

Thanks, Peter.

I actually don't mind a larger code if the performance can be improved. 
Also, we probably do computations more often with smaller integers (at 
least in my applications); that's why I overlooked the case greater than 
10...


But now I'm using your (+Michael Jones and others) suggestions:
// PowP computes real raised to positive integer xⁿ
func PowP(x float64, n uint32) (r float64) {
if n == 0 {
return 1.0
}
if n == 1 {
return x
}
if n == 2 {
return x * x
}
if n == 3 {
return x * x * x
}
if n == 4 {
r = x * x
return r * r
}
if n == 5 {
r = x * x
return r * r * x
}
if n == 6 {
r = x * x * x
return r * r
}
if n == 7 {
r = x * x * x
return r * r * x
}
if n == 8 {
r = x * x * x * x
return r * r
}
if n == 9 {
r = x * x * x
return r * r * r
}
if n == 10 {
r = x * x * x
return r * r * r * x
}
r = 1.0
for i := n; i > 0; i >>= 1 {
if i&1 == 1 {
r *= x
}
x *= x
}
return
}

The output of the benchmark is:
BenchmarkPowP10-32         50000000        26.4 ns/op
BenchmarkPowP10std-32     5000000       245 ns/op
BenchmarkPowP20-32         20000000        79.2 ns/op
BenchmarkPowP20std-32     3000000       556 ns/op
BenchmarkPowP50-32         5000000       255 ns/op
BenchmarkPowP50std-32     1000000      1566 ns/op
BenchmarkPowP100-32       2000000       616 ns/op
BenchmarkPowP100std-32      500000      3594 ns/op
BenchmarkPowP200-32       1000000      1365 ns/op
BenchmarkPowP200std-32      200000      9184 ns/op

So, the speedups for each N are (more or less):
N=10: 9x
N=20: 7x
N=50: 6x
N=100: 5x
N=200: 6x

(files 
here: https://gist.github.com/cpmech/9f871df5b59fa8407221b0d3fb361a3d)

I'm thinking about looking at Go-ASM to see if I can improve this further. 
I may try to optimize the math.Sin() function too...

Thanks also to Michael Jones.






On Friday, August 4, 2017 at 10:36:16 PM UTC+10, peterGo wrote:
>
> Dorival Pedroso,
>
> PowP has a lot of code. PowG is simpler and it is modestly slower for 
> small values of n and much faster for larger values of n.
>
> func PowG(x float64, n uint32) float64 {
>     y := 1.0
>     for i := n; i > 0; i >>= 1 {
>         if i&1 == 1 {
>             y *= x
>         }
>         x *= x
>     }
>     return y
> }
>
>
> BenchmarkPowP1      1000000000        2.89 ns/op
> BenchmarkPowP5       100000000       13.6 ns/op
> BenchmarkPowP10       50000000       29.2 ns/op
> BenchmarkPowP20       20000000       89.6 ns/op
> BenchmarkPowP200        200000    11666 ns/op
> BenchmarkPowG1      1000000000        2.66 ns/op
> BenchmarkPowG5       100000000       17.9 ns/op
> BenchmarkPowG10       30000000       45.1 ns/op
> BenchmarkPowG20       20000000       97.7 ns/op
> BenchmarkPowG200        300000     3972 ns/op
>
> Peter
>
> On Friday, August 4, 2017 at 10:08:36 AM UTC, Dorival Pedroso wrote:
>>
>> Since my ASM skills are limited, for positive integers, I'm planning on 
>> using:
>>
>> // PowP computes real raised to positive integer xⁿ
>> func PowP(x float64, n uint32) (r float64) {
>> if n == 0 {
>> return 1.0
>> }
>> if n == 1 {
>> return x
>> }
>> if n == 2 {
>> return x * x
>> }
>> if n == 3 {
>> return x * x * x
>> }
>> if n == 4 {
>> r = x * x
>> return r * r
>> }
>> if n == 5 {
>> r = x * x
>> return r * r * x
>> }
>> if n == 6 {
>> r = x * x * x
>> return r * r
>> }
>> if n == 7 {
>> r = x * x * x
>> return r * r * x
>> }
>> if n == 8 {
>> r = x * x * x * x
>> return r * r
>> }
>> if n == 9 {
>> r = x * x * x
>> return r * r * r
>> }
>> if n == 10 {
>> r = x * x * x
>> return r * r * r * x
>> }
>> r = x * x * x
>> r = r * r * r * x
>> var i uint32
>> for i = 11; i <= 20; i++ {
>> r *= x
>> if n == i {
>> return
>> }
>> }
>> return math.Pow(x, float64(n))
>> }
>>
>> which gives a speedup of 5-8 times for small positive integers (<20). the 
>> case of negative integers could be handled too.
>>
>> output:
>> BenchmarkPowP10-32         50000000        30.4 ns/op
>> BenchmarkPowP10std-32     5000000       247 ns/op
>> BenchmarkPowP20-32         20000000        98.2 ns/op
>> BenchmarkPowP20std-32     3000000       561 ns/op
>> BenchmarkPowP200-32        200000     10145 ns/op
>> BenchmarkPowP200std-32      200000      9231 ns/op
>>
>> all files are here: 
>> https://gist.github.com/cpmech/9f871df5b59fa8407221b0d3fb361a3d
>>
>> I'm also looking at the Cephes library: http://www.netlib.org/cephes/ 
>> for ideas
>>
>> maybe we could have a similar function in Go (and many more optimised 
>> ones, especially for complex numbers...)
>>
>> cheers
>> d
>>
>>
>>
>> On Friday, August 4, 2017 at 4:20:41 PM UTC+10, Dorival Pedroso wrote:
>>>
>>> I've noticed that this C code:
>>>
>>> #include "math.h"
>>> int main() {
>>>         double x = 2.5;
>>>         int Nmax = 10000000;
>>>         for (int N=0; N<Nmax; N++) {
>>>                 for (int i=0; i<20; i++) {
>>>                         pow(x, i);
>>>                 }
>>>         }
>>> }
>>>
>>> can run up to 50x faster than this Go code:
>>>
>>> package main
>>>
>>> import "math"
>>>
>>> func main() {
>>> x := 2.5
>>> Nmax := 10000000
>>> for N := 0; N < Nmax; N++ {
>>> for i := 0; i < 20; i++ {
>>> math.Pow(x, float64(i))
>>> }
>>> }
>>> }
>>>
>>> The C code was compiled with: gcc -O2 ccode.c -o ccode -lm
>>> then run with time ./ccode
>>>
>>> The Go code was compiled with: go build gcode.go
>>> then run with  time ./gcode
>>>
>>> I've used the time command on Linux (Ubuntu) to get some estimate.
>>>
>>> So the question is: how can we make the Go code faster?
>>>
>>

-- 
You received this message because you are subscribed to the Google Groups 
"golang-nuts" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
For more options, visit https://groups.google.com/d/optout.

[go-nuts] Re: How to optimize as with -O2 from Gcc? (50x speedup)

Reply via email to