On Mon, Jan 24, 2011 at 4:29 PM, eat <e.antero.ta...@gmail.com> wrote: > Hi, > > Running on: > In []: np.__version__ > Out[]: '1.5.1' > In []: sys.version > Out[]: '2.7.1 (r271:86832, Nov 27 2010, 18:30:46) [MSC v.1500 32 bit (Intel)]' > > For the reference: > In []: X= randn(10, 125) > In []: timeit dot(X.T, X) > 10000 loops, best of 3: 170 us per loop > In []: X= randn(10, 250) > In []: timeit dot(X.T, X) > 1000 loops, best of 3: 671 us per loop > In []: X= randn(10, 500) > In []: timeit dot(X.T, X) > 100 loops, best of 3: 5.15 ms per loop > In []: X= randn(10, 1000) > In []: timeit dot(X.T, X) > 100 loops, best of 3: 20 ms per loop > In []: X= randn(10, 2000) > In []: timeit dot(X.T, X) > 10 loops, best of 3: 80.7 ms per loop > > Performance of triu_indices: > In []: timeit triu_indices(125) > 1000 loops, best of 3: 662 us per loop > In []: timeit triu_indices(250) > 100 loops, best of 3: 2.55 ms per loop > In []: timeit triu_indices(500) > 100 loops, best of 3: 15 ms per loop > In []: timeit triu_indices(1000) > 10 loops, best of 3: 59.8 ms per loop > In []: timeit triu_indices(2000) > 1 loops, best of 3: 239 ms per loop > > So the tri*_indices calculations seems to be unreasonable slow compared to for > example calculations of inner products. > > Now, just to compare for a very naive implementation of triu indices. > In []: def iut(n): > ..: r= np.empty(n* (n+ 1)/ 2, dtype= int) > ..: c= r.copy() > ..: a= np.arange(n) > ..: m= 0 > ..: for i in xrange(n): > ..: ni= n- i > ..: mni= m+ ni > ..: r[m: mni]= i > ..: c[m: mni]= a[i: n] > ..: m+= ni > ..: return (r, c) > ..: > > Are we really calculating the same thing? > In []: triu_indices(5) > Out[]: > (array([0, 0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 3, 3, 4]), > array([0, 1, 2, 3, 4, 1, 2, 3, 4, 2, 3, 4, 3, 4, 4])) > In []: iut(5) > Out[]: > (array([0, 0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 3, 3, 4]), > array([0, 1, 2, 3, 4, 1, 2, 3, 4, 2, 3, 4, 3, 4, 4])) > > Seems so, and then its performance: > In []: timeit iut(125) > 1000 loops, best of 3: 992 us per loop > In []: timeit iut(250) > 100 loops, best of 3: 2.03 ms per loop > In []: timeit iut(500) > 100 loops, best of 3: 5.3 ms per loop > In []: timeit iut(1000) > 100 loops, best of 3: 13.9 ms per loop > In []: timeit iut(2000) > 10 loops, best of 3: 39.8 ms per loop > > Even the naive implementation is very slow, but allready outperforms > triu_indices, when n is > 250! > > So finally my question is how one could substantially improve the performance > of indices calculations?
What's the timing of this version (taken from nitime) ? it builds a full intermediate array m = np.ones((n,n),int) a = np.triu(m,k) np.where(a != 0) >>> n=5 >>> m = np.ones((n,n),int) >>> np.where(np.triu(m,0) != 0) (array([0, 0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 3, 3, 4]), array([0, 1, 2, 3, 4, 1, 2, 3, 4, 2, 3, 4, 3, 4, 4])) or maybe variations on it that all build a full intermediate matrix >>> np.where(1-np.tri(n,n,-1)) (array([0, 0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 3, 3, 4]), array([0, 1, 2, 3, 4, 1, 2, 3, 4, 2, 3, 4, 3, 4, 4])) >>> np.where(np.subtract.outer(np.arange(n), np.arange(n))<=0) (array([0, 0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 3, 3, 4]), array([0, 1, 2, 3, 4, 1, 2, 3, 4, 2, 3, 4, 3, 4, 4])) Josef > > > Regards, > eat > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion@scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > _______________________________________________ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion