Hello all, As a result of the "fast greyscale conversion" thread, I noticed an anomaly with numpy.ndararray.sum(): summing along certain axes is much slower with sum() than versus doing it explicitly, but only with integer dtypes and when the size of the dtype is less than the machine word. I checked in 32-bit and 64-bit modes and in both cases only once the dtype got as large as that did the speed difference go away. See below...
Is this something to do with numpy or something inexorable about machine / memory architecture? Zach Timings -- 64-bit mode: ---------------------- In [2]: i = numpy.ones((1024,1024,4), numpy.int8) In [3]: timeit i.sum(axis=-1) 10 loops, best of 3: 131 ms per loop In [4]: timeit i[...,0]+i[...,1]+i[...,2]+i[...,3] 100 loops, best of 3: 2.57 ms per loop In [5]: i = numpy.ones((1024,1024,4), numpy.int16) In [6]: timeit i.sum(axis=-1) 10 loops, best of 3: 131 ms per loop In [7]: timeit i[...,0]+i[...,1]+i[...,2]+i[...,3] 100 loops, best of 3: 4.75 ms per loop In [8]: i = numpy.ones((1024,1024,4), numpy.int32) In [9]: timeit i.sum(axis=-1) 10 loops, best of 3: 131 ms per loop In [10]: timeit i[...,0]+i[...,1]+i[...,2]+i[...,3] 100 loops, best of 3: 6.37 ms per loop In [11]: i = numpy.ones((1024,1024,4), numpy.int64) In [12]: timeit i.sum(axis=-1) 100 loops, best of 3: 16.6 ms per loop In [13]: timeit i[...,0]+i[...,1]+i[...,2]+i[...,3] 100 loops, best of 3: 15.1 ms per loop Timings -- 32-bit mode: ---------------------- In [2]: i = numpy.ones((1024,1024,4), numpy.int8) In [3]: timeit i.sum(axis=-1) 10 loops, best of 3: 138 ms per loop In [4]: timeit i[...,0]+i[...,1]+i[...,2]+i[...,3] 100 loops, best of 3: 3.68 ms per loop In [5]: i = numpy.ones((1024,1024,4), numpy.int16) In [6]: timeit i.sum(axis=-1) 10 loops, best of 3: 140 ms per loop In [7]: timeit i[...,0]+i[...,1]+i[...,2]+i[...,3] 100 loops, best of 3: 4.17 ms per loop In [8]: i = numpy.ones((1024,1024,4), numpy.int32) In [9]: timeit i.sum(axis=-1) 10 loops, best of 3: 22.4 ms per loop In [10]: timeit i[...,0]+i[...,1]+i[...,2]+i[...,3] 100 loops, best of 3: 12.2 ms per loop In [11]: i = numpy.ones((1024,1024,4), numpy.int64) In [12]: timeit i.sum(axis=-1) 10 loops, best of 3: 29.2 ms per loop In [13]: timeit i[...,0]+i[...,1]+i[...,2]+i[...,3] 10 loops, best of 3: 23.8 ms per loop _______________________________________________ NumPy-Discussion mailing list [email protected] http://mail.scipy.org/mailman/listinfo/numpy-discussion
