[Numpy-discussion] poor performance of sum with sub-machine-word integer types

Zachary Pincus Tue, 21 Jun 2011 09:46:13 -0700

Hello all,

As a result of the "fast greyscale conversion" thread, I noticed an anomaly 
with numpy.ndararray.sum(): summing along certain axes is much slower with 
sum() than versus doing it explicitly, but only with integer dtypes and when 
the size of the dtype is less than the machine word. I checked in 32-bit and 
64-bit modes and in both cases only once the dtype got as large as that did the 
speed difference go away. See below...


Is this something to do with numpy or something inexorable about machine / 
memory architecture?

Zach

Timings -- 64-bit mode:
----------------------
In [2]: i = numpy.ones((1024,1024,4), numpy.int8)
In [3]: timeit i.sum(axis=-1)
10 loops, best of 3: 131 ms per loop
In [4]: timeit i[...,0]+i[...,1]+i[...,2]+i[...,3]
100 loops, best of 3: 2.57 ms per loop

In [5]: i = numpy.ones((1024,1024,4), numpy.int16)
In [6]: timeit i.sum(axis=-1)
10 loops, best of 3: 131 ms per loop
In [7]: timeit i[...,0]+i[...,1]+i[...,2]+i[...,3]
100 loops, best of 3: 4.75 ms per loop

In [8]: i = numpy.ones((1024,1024,4), numpy.int32)
In [9]: timeit i.sum(axis=-1)
10 loops, best of 3: 131 ms per loop
In [10]: timeit i[...,0]+i[...,1]+i[...,2]+i[...,3]
100 loops, best of 3: 6.37 ms per loop

In [11]: i = numpy.ones((1024,1024,4), numpy.int64)
In [12]: timeit i.sum(axis=-1)
100 loops, best of 3: 16.6 ms per loop
In [13]: timeit i[...,0]+i[...,1]+i[...,2]+i[...,3]
100 loops, best of 3: 15.1 ms per loop



Timings -- 32-bit mode:
----------------------
In [2]: i = numpy.ones((1024,1024,4), numpy.int8)
In [3]: timeit i.sum(axis=-1)
10 loops, best of 3: 138 ms per loop
In [4]: timeit i[...,0]+i[...,1]+i[...,2]+i[...,3]
100 loops, best of 3: 3.68 ms per loop

In [5]: i = numpy.ones((1024,1024,4), numpy.int16)
In [6]: timeit i.sum(axis=-1)
10 loops, best of 3: 140 ms per loop
In [7]: timeit i[...,0]+i[...,1]+i[...,2]+i[...,3]
100 loops, best of 3: 4.17 ms per loop

In [8]: i = numpy.ones((1024,1024,4), numpy.int32)
In [9]: timeit i.sum(axis=-1)
10 loops, best of 3: 22.4 ms per loop
In [10]: timeit i[...,0]+i[...,1]+i[...,2]+i[...,3]
100 loops, best of 3: 12.2 ms per loop

In [11]: i = numpy.ones((1024,1024,4), numpy.int64)
In [12]: timeit i.sum(axis=-1)
10 loops, best of 3: 29.2 ms per loop
In [13]: timeit i[...,0]+i[...,1]+i[...,2]+i[...,3]
10 loops, best of 3: 23.8 ms per loop

_______________________________________________
NumPy-Discussion mailing list
[email protected]
http://mail.scipy.org/mailman/listinfo/numpy-discussion

[Numpy-discussion] poor performance of sum with sub-machine-word integer types

Reply via email to