On Thu, Feb 10, 2011 at 11:53, eat <e.antero.ta...@gmail.com> wrote:
> Thanks Chuck,
>
> for replying. But don't you still feel very odd that dot outperforms sum in
> your machine? Just to get it simply; why sum can't outperform dot? Whatever
> architecture (computer, cache) you have, it don't make any sense at all that
> when performing significantly less instructions, you'll reach to spend more
> time ;-).
These days, the determining factor is less often instruction count than memory latency, and the optimized BLAS implementations of dot() heavily optimize the memory access patterns. Additionally, the number of instructions in your dot() probably isn't that many more than in the sum(). The sum() is pretty dumb: it just does a linear accumulation using the ufunc reduce mechanism, so (m*n-1) ADDs plus quite a few instructions for traversing the array in a generic manner.

With fused multiply-adds, the ability to assume contiguous data and skip the numpy iterator overhead, and divide-and-conquer kernels to arrange the sums (a rough sketch of that idea follows at the end of this message), the optimized dot() implementations could well have a comparable instruction count.

If you were willing to spend that amount of developer time and code complexity on platform-specific backends for sum(), you could make it go really fast, too. Typically, though, sum() just isn't important enough to make that worthwhile. One thing that might be worthwhile is implementations of sum() and cumsum() that avoid the ufunc machinery and do their iterations more quickly, at least for some common combinations of dtype and contiguity.

--
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco
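As an illustration of the divide-and-conquer arrangement mentioned above, here is a minimal sketch in Python: a pairwise summation, plus a quick timing of sum() against an equivalent BLAS-backed dot(). The pairwise_sum() function and its block size of 128 are illustrative choices for this sketch, not NumPy's actual kernel.

import timeit
import numpy as np

def pairwise_sum(a, block=128):
    # Illustrative divide-and-conquer reduction, not NumPy's kernel.
    # Recursively split the array and add the partial sums, falling
    # back to a plain linear accumulation below `block` elements
    # (128 is an arbitrary choice for this sketch).
    n = len(a)
    if n <= block:
        s = 0.0
        for x in a:
            s += x
        return s
    mid = n // 2
    return pairwise_sum(a[:mid], block) + pairwise_sum(a[mid:], block)

x = np.random.rand(10000)
print(abs(pairwise_sum(x) - x.sum()))  # same result, up to rounding

a = np.random.rand(1000000)
ones = np.ones(a.size)

# sum(): a linear reduction through the generic ufunc machinery.
t_sum = timeit.timeit(lambda: a.sum(), number=100)
# dot() against a vector of ones computes the same total, but hands
# the contiguous buffer to an optimized BLAS ddot routine.
t_dot = timeit.timeit(lambda: np.dot(a, ones), number=100)
print("a.sum():         %.4f s" % t_sum)
print("np.dot(a, ones): %.4f s" % t_dot)

Whether dot() actually wins here depends on the BLAS your NumPy is linked against, but the contrast is exactly the memory-access and kernel-arrangement effect described above.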