On Sat, Nov 12, 2011 at 11:32 AM, Warren Weckesser <[email protected]> wrote: > > > On Sat, Nov 12, 2011 at 9:59 AM, <[email protected]> wrote: >> >> On Sat, Nov 12, 2011 at 10:31 AM, Warren Weckesser >> <[email protected]> wrote: >> > >> > >> > On Sat, Nov 12, 2011 at 6:43 AM, <[email protected]> wrote: >> >> >> >> On Sat, Nov 12, 2011 at 3:36 AM, Geoffrey Zhu <[email protected]> >> >> wrote: >> >> > Hi, >> >> > >> >> > I am playing with multiple ways to speed up the following expression >> >> > (it is in the inner loop): >> >> > >> >> > >> >> > C[1:(M - 1)]=(a * C[2:] + b * C[1:(M-1)] + c * C[:(M-2)]) >> >> > >> >> > where C is an array of about 200-300 elements, M=len(C), a, b, c are >> >> > scalars. >> >> > >> >> > I played with numexpr, but it was way slower than directly using >> >> > numpy. It would be nice if I could create a Mx3 matrix without >> >> > copying >> >> > memory and so I can use dot() to calculate the whole thing. >> >> > >> >> > Can anyone help with giving some advices to make this faster? >> >> >> >> looks like a np.convolve(C, [a,b,c]) to me except for the boundary >> >> conditions. >> > >> > >> > >> > As Josef pointed out, this is a convolution. There are (at least) >> > three convolution functions in numpy+scipy that you could use: >> > numpy.convolve, scipy.signal.convolve, and scipy.ndimage.convolve1d. >> > Of these, scipy.ndimage.convolve1d is the fastest. However, it doesn't >> > beat the simple expression >> > a*x[2:] + b*x[1:-1] + c*x[:-2] >> > Your idea of forming a matrix without copying memory can be done using >> > "stride tricks", and for arrays of the size you are interested in, it >> > computes the result faster than the simple expression (see below). >> > >> > Another fast alternative is to use one of the inline code generators. >> > This example is a easy to implement with scipy.weave.blitz, and it gives >> > a big speedup. >> > >> > Here's a test script: >> > >> > #----- convolve1dtest.py ----- >> > >> > >> > import numpy as np >> > from numpy.lib.stride_tricks import as_strided >> > from scipy.ndimage import convolve1d >> > from scipy.weave import blitz >> > >> > # weighting coefficients >> > a = -0.5 >> > b = 1.0 >> > c = -0.25 >> > w = np.array((a,b,c)) >> > # Reversed w: >> > rw = w[::-1] >> > >> > # Length of C >> > n = 250 >> > >> > # The original version of the calculation: >> > # Some dummy data >> > C = np.arange(float(n)) >> > C[1:-1] = a * C[2:] + b * C[1:-1] + c * C[:-2] >> > # Save for comparison. >> > C0 = C.copy() >> > >> > # Do it again using a matrix multiplication. >> > C = np.arange(float(n)) >> > # The "virtual" matrix view of C. >> > V = as_strided(C, shape=(C.size-2, 3), strides=(C.strides[0], >> > C.strides[0])) >> > C[1:-1] = np.dot(V, rw) >> > C1 = C.copy() >> > >> > # Again, with convolve1d this time. >> > C = np.arange(float(n)) >> > C[1:-1] = convolve1d(C, w)[1:-1] >> > C2 = C.copy() >> > >> > # scipy.weave.blitz >> > C = np.arange(float(n)) >> > # Must work with a copy, D, in the formula, because blitz does not use >> > # a temporary variable. >> > D = C.copy() >> > expr = "C[1:-1] = a * D[2:] + b * D[1:-1] + c * D[:-2]" >> > blitz(expr, check_size=0) >> > C3 = C.copy() >> > >> > >> > # Verify that all the methods give the same result. >> > print np.all(C0 == C1) >> > print np.all(C0 == C2) >> > print np.all(C0 == C3) >> > >> > #----- >> > >> > And here's a snippet from an ipython session: >> > >> > In [51]: run convolve1dtest.py >> > True >> > True >> > True >> > >> > In [52]: %timeit C[1:-1] = a * C[2:] + b * C[1:-1] + c * C[:-2] >> > 100000 loops, best of 3: 16.5 us per loop >> > >> > In [53]: %timeit C[1:-1] = np.dot(V, rw) >> > 100000 loops, best of 3: 9.84 us per loop >> > >> > In [54]: %timeit C[1:-1] = convolve1d(C, w)[1:-1] >> > 100000 loops, best of 3: 18.7 us per loop >> > >> > In [55]: %timeit D = C.copy(); blitz(expr, check_size=0) >> > 100000 loops, best of 3: 4.91 us per loop >> > >> > >> > >> > scipy.weave.blitz is fastest (but note that blitz has already been >> > called >> > once, so the time shown does not include the compilation required in >> > the first call). You could also try scipy.weave.inline, cython.inline, >> > or np_inline (http://pages.cs.wisc.edu/~johnl/np_inline/). >> > >> > Also note that C[-1:1] = np.dot(V, rw) is faster than either the simple >> > expression or convolve1d. However, if you also have to set up V inside >> > your inner loop, the speed gain will be lost. The relative speeds also >> > depend on the size of C. For large C, the simple expression is faster >> > than the matrix multiplication by V (but blitz is still faster). In >> > the following, I have changed n to 2500 before running >> > convolve1dtest.py: >> > >> > In [56]: run convolve1dtest.py >> > True >> > True >> > True >> > >> > In [57]: %timeit C[1:-1] = a * C[2:] + b * C[1:-1] + c * C[:-2] >> > 10000 loops, best of 3: 29.5 us per loop >> > >> > In [58]: %timeit C[1:-1] = np.dot(V, rw) >> > 10000 loops, best of 3: 56.4 us per loop >> > >> > In [59]: %timeit C[1:-1] = convolve1d(C, w)[1:-1] >> > 10000 loops, best of 3: 37.3 us per loop >> > >> > In [60]: %timeit D = C.copy(); blitz(expr, check_size=0) >> > 100000 loops, best of 3: 10.3 us per loop >> > >> > >> > blitz wins, the simple numpy expression is a distant second, and now >> > the matrix multiplication is slowest. >> > >> > I hope that helps--I know I learned quite a bit. :) >> >> Interesting, two questions >> >> does scipy.signal convolve have a similar overhead as np.convolve1d ? > > > Did you mean np.convolve? There is no np.convolve1d. Some of the tests > that I've done with convolution functions are here: > http://www.scipy.org/Cookbook/ApplyFIRFilter > I should add np.convolve to that page. For the case considered here, > np.convolve is a bit slower than scipy.ndimage.convolve1d, but for larger > arrays, it looks like np.convolve can be much faster. > > >> >> memory: >> the blitz code doesn't include the array copy (D), so the timing might >> be a bit misleading? > > > Look again at my %timeit calls in the ipython snippets. :) >
Sorry, I was reading way too fast or selectively (skipping the imports, namespaces for example). (I thought convolve1d was numpy not ndimage) I tried out your cookbook script a while ago, it's nice, I had only a rough idea about the timing before. Compared to the current case the x is much longer, (unless I don't remember or read it correctly.) Thanks, Josef > >> >> I assume the as_strided call doesn't allocate any memory yet, so the >> timing should be correct. (or is this your comment about setting up V >> in the inner loop) >> > > > Yes, that's what I meant; if V has to be created inside the inner loop (so > as_strided is called in the loop), the time it takes to create V eliminates > the benefit of using the matrix approach. > > Warren > > >> >> Josef >> >> > >> > Warren >> > >> > >> >> >> >> Josef >> >> >> >> >> >> > >> >> > Thanks, >> >> > G >> >> > _______________________________________________ >> >> > NumPy-Discussion mailing list >> >> > [email protected] >> >> > http://mail.scipy.org/mailman/listinfo/numpy-discussion >> >> > >> >> _______________________________________________ >> >> NumPy-Discussion mailing list >> >> [email protected] >> >> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> > >> > >> > _______________________________________________ >> > NumPy-Discussion mailing list >> > [email protected] >> > http://mail.scipy.org/mailman/listinfo/numpy-discussion >> > >> > >> _______________________________________________ >> NumPy-Discussion mailing list >> [email protected] >> http://mail.scipy.org/mailman/listinfo/numpy-discussion > > > _______________________________________________ > NumPy-Discussion mailing list > [email protected] > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > _______________________________________________ NumPy-Discussion mailing list [email protected] http://mail.scipy.org/mailman/listinfo/numpy-discussion
