On Mon, Jan 24, 2011 at 6:49 PM,  <josef.p...@gmail.com> wrote:
> On Mon, Jan 24, 2011 at 4:29 PM, eat <e.antero.ta...@gmail.com> wrote:
>> Hi,
>>
>> Running on:
>> In []: np.__version__
>> Out[]: '1.5.1'
>> In []: sys.version
>> Out[]: '2.7.1 (r271:86832, Nov 27 2010, 18:30:46) [MSC v.1500 32 bit
>> (Intel)]'
>>
>> For the reference:
>> In []: X= randn(10, 125)
>> In []: timeit dot(X.T, X)
>> 10000 loops, best of 3: 170 us per loop
>> In []: X= randn(10, 250)
>> In []: timeit dot(X.T, X)
>> 1000 loops, best of 3: 671 us per loop
>> In []: X= randn(10, 500)
>> In []: timeit dot(X.T, X)
>> 100 loops, best of 3: 5.15 ms per loop
>> In []: X= randn(10, 1000)
>> In []: timeit dot(X.T, X)
>> 100 loops, best of 3: 20 ms per loop
>> In []: X= randn(10, 2000)
>> In []: timeit dot(X.T, X)
>> 10 loops, best of 3: 80.7 ms per loop
>>
>> Performance of triu_indices:
>> In []: timeit triu_indices(125)
>> 1000 loops, best of 3: 662 us per loop
>> In []: timeit triu_indices(250)
>> 100 loops, best of 3: 2.55 ms per loop
>> In []: timeit triu_indices(500)
>> 100 loops, best of 3: 15 ms per loop
>> In []: timeit triu_indices(1000)
>> 10 loops, best of 3: 59.8 ms per loop
>> In []: timeit triu_indices(2000)
>> 1 loops, best of 3: 239 ms per loop
>>
>> So the tri*_indices calculations seem to be unreasonably slow compared to,
>> for example, calculations of inner products.
>>
>> Now, just to compare, here is a very naive implementation of triu indices:
>> In []: def iut(n):
>>    ..:     r= np.empty(n* (n+ 1)/ 2, dtype= int)
>>    ..:     c= r.copy()
>>    ..:     a= np.arange(n)
>>    ..:     m= 0
>>    ..:     for i in xrange(n):
>>    ..:         ni= n- i
>>    ..:         mni= m+ ni
>>    ..:         r[m: mni]= i
>>    ..:         c[m: mni]= a[i: n]
>>    ..:         m+= ni
>>    ..:     return (r, c)
>>    ..:
>>
>> Are we really calculating the same thing?
>> In []: triu_indices(5)
>> Out[]:
>> (array([0, 0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 3, 3, 4]),
>>  array([0, 1, 2, 3, 4, 1, 2, 3, 4, 2, 3, 4, 3, 4, 4]))
>> In []: iut(5)
>> Out[]:
>> (array([0, 0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 3, 3, 4]),
>>  array([0, 1, 2, 3, 4, 1, 2, 3, 4, 2, 3, 4, 3, 4, 4]))
>>
>> Seems so, and then its performance:
>> In []: timeit iut(125)
>> 1000 loops, best of 3: 992 us per loop
>> In []: timeit iut(250)
>> 100 loops, best of 3: 2.03 ms per loop
>> In []: timeit iut(500)
>> 100 loops, best of 3: 5.3 ms per loop
>> In []: timeit iut(1000)
>> 100 loops, best of 3: 13.9 ms per loop
>> In []: timeit iut(2000)
>> 10 loops, best of 3: 39.8 ms per loop
>>
>> Even the naive implementation is very slow, but it already outperforms
>> triu_indices when n > 250!
>>
>> So finally my question is how one could substantially improve the
>> performance of the indices calculations?
>
> What's the timing of this version (taken from nitime)? It builds a
> full intermediate array:
I should have checked the numpy source first; that's exactly the
implementation of triu_indices in numpy 1.5.1.

Josef

> m = np.ones((n,n),int)
> a = np.triu(m,k)
> np.where(a != 0)
>
>>>> n=5
>>>> m = np.ones((n,n),int)
>>>> np.where(np.triu(m,0) != 0)
> (array([0, 0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 3, 3, 4]),
>  array([0, 1, 2, 3, 4, 1, 2, 3, 4, 2, 3, 4, 3, 4, 4]))
>
> or maybe variations on it that all build a full intermediate matrix
>
>>>> np.where(1-np.tri(n,n,-1))
> (array([0, 0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 3, 3, 4]),
>  array([0, 1, 2, 3, 4, 1, 2, 3, 4, 2, 3, 4, 3, 4, 4]))
>
>>>> np.where(np.subtract.outer(np.arange(n), np.arange(n))<=0)
> (array([0, 0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 3, 3, 4]),
>  array([0, 1, 2, 3, 4, 1, 2, 3, 4, 2, 3, 4, 3, 4, 4]))
>
> Josef
>
>>
>> Regards,
>> eat
>>

_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion
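[Editor's note: the following is not part of the thread. It is a minimal sketch of one possible answer to eat's question, namely building the upper-triangle indices with whole-array operations instead of either a Python-level row loop (the iut version) or a full n x n intermediate mask (the np.triu / np.where versions quoted above). The helper name triu_indices_vec is made up for illustration, and whether it actually beats the mask-based approach for a given n would have to be checked with timeit.]

    import numpy as np

    def triu_indices_vec(n):
        """Hypothetical loop-free construction of the k=0 upper-triangle
        indices of an n x n array, without a full n x n mask."""
        counts = np.arange(n, 0, -1)            # row i contributes n - i entries
        rows = np.repeat(np.arange(n), counts)  # 0 repeated n times, 1 repeated n-1 times, ...
        # Flat offset of each row's first entry, broadcast to every entry of that row.
        starts = np.repeat(np.cumsum(counts) - counts, counts)
        # Position within the row segment, shifted right by the row index.
        cols = np.arange(counts.sum()) - starts + rows
        return rows, cols

    if __name__ == "__main__":
        # Sanity check against the n = 5 example shown in the thread.
        r, c = triu_indices_vec(5)
        print(r)   # [0 0 0 0 0 1 1 1 1 2 2 2 3 3 4]
        print(c)   # [0 1 2 3 4 1 2 3 4 2 3 4 3 4 4]

        # Mask-based one-liner in the same spirit as Josef's subtract.outer
        # variant: a broadcast boolean comparison instead of np.triu(np.ones(...)).
        r2, c2 = np.nonzero(np.arange(5)[:, None] <= np.arange(5))

The vectorized version only allocates arrays of length n*(n+1)/2, so its memory use scales with the number of indices returned rather than with n**2; the mask-based one-liners still build an n x n boolean intermediate, but skip the integer ones array and the != 0 comparison used in the numpy 1.5.1 implementation quoted above.]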