On Sep 5, 12:29 pm, Francesc Altet <[EMAIL PROTECTED]> wrote: > A Wednesday 05 September 2007, George Sakkis escrigué: > > > > > I was surprised to see that an in-place modification of a 2-d array > > turns out to be slower from the respective non-mutating operation on > > 1- d arrays, although the latter creates new array objects. Here is > > the benchmarking code: > > > import timeit > > > for n in 10,100,1000,10000: > > setup = 'from numpy.random import random;' \ > > 'm=random((%d,2));' \ > > 'u1=random(%d);' \ > > 'u2=u1.reshape((u1.size,1))' % (n,n) > > timers = [timeit.Timer(stmt,setup) for stmt in > > # 1-d operations; create new arrays > > 'a0 = m[:,0]-u1; a1 = m[:,1]-u1', > > # 2-d in place operation > > 'm -= u2' > > ] > > print n, [min(timer.repeat(3,1000)) for timer in timers] > > > And some results (Python 2.5, WinXP): > > > 10 [0.010832382327921563, 0.0045706926438974782] > > 100 [0.010882668048592767, 0.021704993232380093] > > 1000 [0.018272154701226007, 0.19477587235249172] > > 10000 [0.073787590322233698, 1.9234369172618306] > > > So the 2-d in-place modification time grows linearly with the array > > size but the 1-d operations are much more efficient, despite > > allocating new arrays while doing so. What gives ? > > This seems the effect of broadcasting u2. If you were to use a > pre-computed broadcasted, you would get rid of such bottleneck: > > for n in 10,100,1000,10000: > setup = 'import numpy;' \ > 'm=numpy.random.random((%d,2));' \ > 'u1=numpy.random.random(%d);' \ > 'u2=u1[:, numpy.newaxis];' \ > 'u3=numpy.array([u1,u1]).transpose()' % (n,n) > timers = [timeit.Timer(stmt,setup) for stmt in > # 1-d operations; create new arrays > 'a0 = m[:,0]-u1; a1 = m[:,1]-u1', > # 2-d in place operation (using broadcasting) > 'm -= u2', > # 2-d in-place operation (not forcing broadcasting) > 'm -= u3' > ] > print n, [min(timer.repeat(3,1000)) for timer in timers] > > gives in my machine: > > 10 [0.03213191032409668, 0.012019872665405273, 0.0068600177764892578] > 100 [0.033048152923583984, 0.06542205810546875, 0.0076580047607421875] > 1000 [0.040294170379638672, 0.59892702102661133, 0.014600992202758789] > 10000 [0.32667303085327148, 5.9721651077270508, 0.10261106491088867]
Thank you, indeed broadcasting is the bottleneck here. I had the impression that broadcasting was a fast operation, i.e. it doesn't require allocating physically the target array of the broadcast but it seems this is not the case. George _______________________________________________ Numpy-discussion mailing list [email protected] http://projects.scipy.org/mailman/listinfo/numpy-discussion
