Re: [Numpy-discussion] odd performance of sum?

eat Thu, 10 Feb 2011 23:49:34 -0800

Hi,

On Fri, Feb 11, 2011 at 12:08 AM, Robert Kern <robert.k...@gmail.com> wrote:


>  On Thu, Feb 10, 2011 at 15:32, eat <e.antero.ta...@gmail.com> wrote:
> > Hi Robert,
> >
> > On Thu, Feb 10, 2011 at 10:58 PM, Robert Kern <robert.k...@gmail.com>
> wrote:
> >>
> >> On Thu, Feb 10, 2011 at 14:29, eat <e.antero.ta...@gmail.com> wrote:
> >> > Hi Robert,
> >> >
> >> > On Thu, Feb 10, 2011 at 8:16 PM, Robert Kern <robert.k...@gmail.com>
> >> > wrote:
> >> >>
> >> >> On Thu, Feb 10, 2011 at 11:53, eat <e.antero.ta...@gmail.com> wrote:
> >> >> > Thanks Chuck,
> >> >> >
> >> >> > for replying. But don't you still feel very odd that dot
> outperforms
> >> >> > sum
> >> >> > in
> >> >> > your machine? Just to get it simply; why sum can't outperform dot?
> >> >> > Whatever
> >> >> > architecture (computer, cache) you have, it don't make any sense at
> >> >> > all
> >> >> > that
> >> >> > when performing significantly less instructions, you'll reach to
> >> >> > spend
> >> >> > more
> >> >> > time ;-).
> >> >>
> >> >> These days, the determining factor is less often instruction count
> >> >> than memory latency, and the optimized BLAS implementations of dot()
> >> >> heavily optimize the memory access patterns.
> >> >
> >> > Can't we have this as well with simple sum?
> >>
> >> It's technically feasible to accomplish, but as I mention later, it
> >> entails quite a large cost. Those optimized BLASes represent many
> >> man-years of effort
> >
> > Yes I acknowledge this. But didn't they then  ignore them something
> simpler,
> > like sum (but which actually could benefit exactly similiar
> optimizations).
>
> Let's set aside the fact that the people who optimized the
> implementation of dot() (the authors of ATLAS or the MKL or whichever
> optimized BLAS library you linked to) are different from those who
> implemented sum() (the numpy devs). Let me repeat a reason why one
> would put a lot of effort into optimizing dot() but not sum():
>
> """
> >> However, they are frequently worth it
> >> because those operations are often bottlenecks in whole applications.
> >> sum(), even in its stupidest implementation, rarely is.
> """
>
> I don't know if I'm just not communicating very clearly, or if you
> just reply to individual statements before reading the whole email.

You are communicating very well. It's me who's not communicating so well.

>
> >> and cause substantial headaches for people
> >> building and installing numpy.
> >
> > I appreciate this. No doubt at all.
> >>
> >> However, they are frequently worth it
> >> because those operations are often bottlenecks in whole applications.
> >> sum(), even in its stupidest implementation, rarely is. In the places
> >> where it is a significant bottleneck, an ad hoc implementation in C or
> >> Cython or even FORTRAN for just that application is pretty easy to
> >> write.
> >
> > But here I have to disagree; I'll think that at least I (if not even the
> > majority of numpy users) don't like (nor I'm be capable/ or have enough
> > time/ resources) go to dwell such details.
>
> And you think we have the time and resources to do it for you?

My intention was not to suggest anything like that.

>
> > I'm sorry but I'll have to
> > restate that it's quite reasonable to expect that sum outperforms dot in
> any
> > case.
>
> You don't optimize a function just because you are capable of it. You
> optimize a function because it is taking up a significant portion of
> total runtime in your real application. Anything else is a waste of
> time.
>
> > Lets now to start make such movements, which enables sum to outperform
> > dot.
>
> Sorry, you don't get to volunteer anyone's time or set anyone's
> priorities but your own. There are some sensible things one could do
> to optimize sums or even general ufunc reductions, as outlined my
> Mark, Pauli and myself, but please stop using the accelerated-BLAS
> dot() as your benchmark. It's a completely inappropriate comparison.

OK. Lets compare sum to itself:
In []: M= randn(1e5, 1e2)
In []: timeit M.sum(0)
10 loops, best of 3: 169 ms per loop
In []: timeit M.sum(1)
10 loops, best of 3: 37.5 ms per loop
In []: timeit M.sum()
10 loops, best of 3: 18.1 ms per loop
In []: timeit sum(M.sum(0))
10 loops, best of 3: 169 ms per loop
In []: timeit sum(M.sum(1))
10 loops, best of 3: 37.7 ms per loop
In []: timeit M.T.sum(0)
10 loops, best of 3: 37.7 ms per loop
In []: timeit M.T.sum(1)
10 loops, best of 3: 171 ms per loop
In []: timeit M.T.sum()
1 loops, best of 3: 288 ms per loop
In []: timeit sum(M.T.sum(0))
10 loops, best of 3: 37.7 ms per loop
In []: timeit sum(M.T.sum(1))
10 loops, best of 3: 170 ms per loop
In []: 288./ 18.1
Out[]: 15.91160220994475

>
> >> You can gain speed by specializing to just your use case, e.g.
> >> contiguous data, summing down to one number, or summing along one axis
> >> of only 2D data, etc. There's usually no reason to try to generalize
> >> that implementation to put it back into numpy.
> >
> > Yes, I would really like to specialize into my case, but 'without going
> out
> > the python realm.'
>
> The Bottleneck project is a good place for such things. It's a nice
> middle-ground for somewhat-specialized routines that are still pretty
> common but not general enough to be in numpy yet.
>
>  http://pypi.python.org/pypi/Bottleneck
>
> If you're not willing to learn how to implement it yourself in Cython,
> I'm afraid that you're stuck waiting for someone to do it for you. But
> please don't expect anyone to feel obligated to do so.

Perhaps my bad wordings, but I do not expect anyone to do it for me. I also
think that there exists some real substance I addressed (as demonstrated
now with comparing sum to itself).


Regards,
eat

>
> --
>  Robert Kern
>
> "I have come to believe that the whole world is an enigma, a harmless
> enigma that is made terrible by our own mad attempt to interpret it as
> though it had an underlying truth."
>   -- Umberto Eco
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>

_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion

Re: [Numpy-discussion] odd performance of sum?

Reply via email to