Hi
A followup message ... I managed to increase the speed of my little prime
factorization program by a bit more than a factor of two, the performance
win is hardware-dependent.
After I learned how to profile the code, I discovered a few things:
- 70% of the time was taken in math/big.Mod function (I already knew
this)
- that function was calling QuoRem which was calling nat.divLarge,
meaning that it was calculating the Mod by actually doing the division and
then finding the remainder
- the big surprise was that of the 65% percent of the runtime taken by
divLarge and routines called by it, more than half was going to what is
essentially memory management
Part of this was going to runtime.makeslice and ultimately
runtime.mallocgc, the rest was going to sync Pool put and get calls.
divLarge needs to create temporaries, and apparently, in order to relieve
pressure on the garbage collector, divLarge makes use of a sync pool of
"big.nat" objects. I guess it's done this way (via sync pool) to make
divLarge thread-safe.
I didn't need to do the division, and I didn't need (not yet anyway) thread
safety, and my calculations are all like "a mod N" where a*a < N, in that
case one can do Barrett reduction to find a mod N, and since it's millions
of a mod N with the same N, the constants for N can be precomputed. I made
a pair of global temporaries for the function in the package, which avoids
all the calls to sync pool but now it's not thread safe I guess.
There were a number of other optimisations that had to do with avoiding
makeslice, one was to avoid reusing a particular big.Int variable for first
small and then large values ... I defined two temporaries, one to hold
values for large calcs and another to hold for smaller, avoiding work in
the "make" call as there will always be enough capacity to re-use the
variable in those cases.
A final silly one that saved 10% of the run time an eliminated almost all
garbage collection (after the previously mentioned one) was swapping
for (loop over lots of iterations) {
var tmp1 big.Int
...
for
var tmp1 big.Int
for (loop over lots of iterations) {
...
I had naively assumed that as long as I was executing the loop, it was the
same "tmp1", however it was allocating a new tmp1 on each trip around the
loop. 1.2 GB of memory allocated in a 25-second run :-)
It's been a lot of fun, thanks for the discussions, and I am very impressed
with the tooling. pprof absolutely rocks.
JT
--
You received this message because you are subscribed to the Google Groups
"golang-nuts" group.
To unsubscribe from this group and stop receiving emails from it, send an email
to [email protected].
For more options, visit https://groups.google.com/d/optout.