> Once you have some proper benchmarks, it might be fun to compare GoAWK's
> performance to that of my awk package <https://github.com/spakin/awk>.

I'm not going to do thorough benchmarks at this point, but it looks like
GoAWK is significantly faster at present. I used the example in the
https://github.com/spakin/awk README, which is equivalent to this AWK
script:

    BEGIN { FS = OFS = "," }
    { $3 = $1+$2; print }

On a file with 1M lines of random numbers, with the example as is (no
stdout buffering), GoAWK takes about 1.1 seconds, and spakin/awk takes 36
seconds! However, most of this is due to the unbuffered writes to
os.Stdout. GoAWK automatically wraps os.Stdout in a bufio.Writer (though
I'd forgotten to do this at first as well). When I added this line (before
s.Run):

    s.Output = bufio.NewWriterSize(os.Stdout, 64*1024)

it sped up spakin/awk by a factor of about 10, to 3.6 seconds. So GoAWK
is about 3x as fast for this simple (but not unrealistic) benchmark.
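
To make the buffering effect concrete, here is a minimal sketch that is not
taken from either project. It assumes a hypothetical countingWriter that
stands in for os.Stdout and counts Write calls (each of which would be a
system call on a real file), then compares writing lines directly with
writing them through a 64 KB bufio.Writer:

```go
package main

import (
	"bufio"
	"fmt"
	"io"
)

// countingWriter is a hypothetical stand-in for os.Stdout. It discards the
// data and only counts how many times Write is called.
type countingWriter struct {
	calls int
}

func (w *countingWriter) Write(p []byte) (int, error) {
	w.calls++
	return len(p), nil
}

// writeLines writes n short lines to w, one Fprintln call per line.
func writeLines(w io.Writer, n int) {
	for i := 0; i < n; i++ {
		fmt.Fprintln(w, "1,2,3")
	}
}

// countWrites returns the number of underlying Write calls made when n lines
// are written directly versus through a 64 KB bufio.Writer.
func countWrites(n int) (unbuffered, buffered int) {
	direct := &countingWriter{}
	writeLines(direct, n)

	sink := &countingWriter{}
	bw := bufio.NewWriterSize(sink, 64*1024)
	writeLines(bw, n)
	bw.Flush() // without this, the final partial buffer is never written

	return direct.calls, sink.calls
}

func main() {
	u, b := countWrites(1000000)
	fmt.Printf("unbuffered: %d writes, buffered: %d writes\n", u, b)
}
```

The unbuffered count equals the number of lines, while the buffered count
is roughly the total output size divided by the buffer size.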
I generated the 1M line random file using this Python script (guess I
should have used AWK :-):

    import random, sys

    for _ in range(int(sys.argv[1])):
        n = random.randrange(1000000)
        m = random.randrange(1000000)
        print('%d,%d' % (n, m))

So my main suggestion (for spakin/awk) would be to wrap os.Stdout in a
bufio.Writer by default (and be sure to call Flush before Run finishes). If
the user wants to pass an unbuffered writer, they still can, but at least
the default is performant.
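
One way that default could look is sketched below. The Runner type, its
Output field, and runToString are hypothetical names made up for
illustration; this is not the spakin/awk API, just the general pattern of
"buffer os.Stdout unless the caller supplied a writer, and flush before
returning":

```go
package main

import (
	"bufio"
	"fmt"
	"io"
	"os"
	"strings"
)

// Runner is a hypothetical script type with a configurable output.
// It is not the spakin/awk API.
type Runner struct {
	Output io.Writer // nil means "os.Stdout wrapped in a bufio.Writer"
}

// Run writes each line to the output. If no Output was set, it creates a
// buffered writer around os.Stdout and flushes it before returning.
func (r *Runner) Run(lines []string) error {
	out := r.Output
	var bw *bufio.Writer
	if out == nil {
		bw = bufio.NewWriter(os.Stdout)
		out = bw
	}
	for _, line := range lines {
		if _, err := fmt.Fprintln(out, line); err != nil {
			return err
		}
	}
	if bw != nil {
		return bw.Flush() // Run owns this buffer, so Run must flush it
	}
	return nil
}

// runToString runs with a caller-supplied writer, which is used as is.
func runToString(lines []string) (string, error) {
	var sb strings.Builder
	r := &Runner{Output: &sb}
	err := r.Run(lines)
	return sb.String(), err
}

func main() {
	// Default case: buffered os.Stdout, flushed at the end of Run.
	if err := (&Runner{}).Run([]string{"1,2,3", "4,5,9"}); err != nil {
		fmt.Fprintln(os.Stderr, err)
	}
	// Caller-supplied writer: no extra buffering is added.
	s, _ := runToString([]string{"a", "b"})
	fmt.Printf("%q\n", s)
}
```

In this sketch only the buffer that Run creates itself is flushed; a
caller who passes their own bufio.Writer stays responsible for flushing it.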
I also added CPU profiling to the spakin/awk script, and it looks like it's
doing a bunch more garbage collection than GoAWK, as well as some regexp
stuff. I suspect NewValue() is probably quite slow as it takes an
interface{} and does type checking. Also, strings are converted to numbers
using a regex, which is probably slower than a dedicated conversion/check
function (see parseFloatPrefix in goawk/interp/value.go).
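
For illustration, a regex-free conversion could look roughly like the
sketch below. This is not GoAWK's parseFloatPrefix; the numericPrefix and
isDigit names are made up here, and the sketch assumes only plain decimal
numbers matter (no hex, "inf" or "nan"). It scans the longest leading
decimal number by hand and passes just that slice to strconv.ParseFloat:

```go
package main

import (
	"fmt"
	"strconv"
)

func isDigit(c byte) bool { return c >= '0' && c <= '9' }

// numericPrefix is an illustrative helper (not GoAWK's actual code). It
// returns the value of the longest leading decimal number in s, skipping
// leading spaces and tabs, or 0 if s does not start with a number.
func numericPrefix(s string) float64 {
	i := 0
	for i < len(s) && (s[i] == ' ' || s[i] == '\t') {
		i++
	}
	start := i
	if i < len(s) && (s[i] == '+' || s[i] == '-') {
		i++
	}
	digits := 0
	for i < len(s) && isDigit(s[i]) {
		i++
		digits++
	}
	if i < len(s) && s[i] == '.' {
		i++
		for i < len(s) && isDigit(s[i]) {
			i++
			digits++
		}
	}
	if digits == 0 {
		return 0
	}
	// Accept an exponent only if it has at least one digit after it.
	if i < len(s) && (s[i] == 'e' || s[i] == 'E') {
		j := i + 1
		if j < len(s) && (s[j] == '+' || s[j] == '-') {
			j++
		}
		if j < len(s) && isDigit(s[j]) {
			for j < len(s) && isDigit(s[j]) {
				j++
			}
			i = j
		}
	}
	// The error is ignored: the slice is already known to be well formed,
	// and an out-of-range value comes back as +Inf or -Inf.
	f, _ := strconv.ParseFloat(s[start:i], 64)
	return f
}

func main() {
	for _, s := range []string{"12abc", "3.5e", "  -7,8", "abc", "1e3x"} {
		fmt.Printf("%q -> %v\n", s, numericPrefix(s))
	}
}
```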
See more optimization ideas in my post at
https://benhoyt.com/writings/goawk/
-Ben
On Thu, Nov 22, 2018 at 11:24 PM Tong Sun <[email protected]> wrote:
>
>
> On Tuesday, August 28, 2018 at 9:06:22 AM UTC-4, Ben Hoyt wrote:
>>
>>> Once you have some proper benchmarks, it might be fun to compare GoAWK's
>>> performance to that of my awk package <https://github.com/spakin/awk>.
>>>
>>
>> Nice -- will do!
>>
>
> Please post back when you've done that.
>
> I'm interested to know. Thx.
>
--
You received this message because you are subscribed to the Google Groups
"golang-nuts" group.
To unsubscribe from this group and stop receiving emails from it, send an email
to [email protected].
For more options, visit https://groups.google.com/d/optout.