On 22 May 2015 at 12:15, Mathieu Blondel <[email protected]> wrote:
> Right now I am using np.sum(A * B.T, axis=1) for dense data and I have
> implemented a Cython routine for sparse data.
> I haven't benched np.sum(A * B.T, axis=1) vs. np.einsum("ij,ji->i", A, B)
> yet since I am mostly interested in the sparse case right now.
>
On my system, einsum seems to be faster:
In [3]: N = 256
In [4]: A = np.random.random((N, N))
In [5]: B = np.random.random((N, N))
In [6]: %timeit np.sum(A * B.T, axis=1)
1000 loops, best of 3: 260 µs per loop
In [7]: %timeit np.einsum("ij,ji->i", A, B)
10000 loops, best of 3: 147 µs per loop
In [9]: N = 1023
In [10]: A = np.random.random((N, N))
In [11]: B = np.random.random((N, N))
In [12]: %timeit np.sum(A * B.T, axis=1)
100 loops, best of 3: 14 ms per loop
In [13]: %timeit np.einsum("ij,ji->i", A, B)
100 loops, best of 3: 10.7 ms per loop
I have ATLAS installed from the Fedora repos, so it is not tuned; but einsum
only uses one thread anyway, so it is probably not using ATLAS at all (and it
is definitely not computing the full dot product, because that alone already
takes 200 ms).
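
For reference, a minimal sketch of why the full dot is not needed: both
expressions compute only the diagonal of A.dot(B), i.e. sum_j A[i, j] * B[j, i].
The small N, the seed and the variable names are assumptions chosen for
illustration and are not part of the timings above.

```python
import numpy as np

# Small, seeded inputs so the check is cheap and repeatable (illustrative only).
rng = np.random.default_rng(0)
N = 64
A = rng.random((N, N))
B = rng.random((N, N))

# Elementwise product with the transpose, then a row sum.
via_sum = np.sum(A * B.T, axis=1)

# Same contraction written as an einsum: out[i] = sum_j A[i, j] * B[j, i].
via_einsum = np.einsum("ij,ji->i", A, B)

# Reference: the full N x N product, of which only the diagonal is kept.
via_full_dot = np.diag(A.dot(B))

all_agree = np.allclose(via_sum, via_einsum) and np.allclose(via_sum, via_full_dot)
print(all_agree)
```

The first two forms do O(N^2) work, while the full dot does O(N^3) and then
throws away everything off the diagonal.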
If B is in Fortran order, it is much faster (these timings are for N=5000):
In [25]: Bf = B.copy(order='F')
In [26]: %timeit np.einsum("ij,ji->i", A, Bf)
10 loops, best of 3: 25.7 ms per loop
In [27]: %timeit np.einsum("ij,ji->i", A, B)
1 loops, best of 3: 404 ms per loop
In [29]: %timeit np.sum(A * Bf.T, axis=1)
10 loops, best of 3: 118 ms per loop
In [30]: %timeit np.sum(A * B.T, axis=1)
1 loops, best of 3: 517 ms per loop
But the copy is not worth it, since it alone takes longer than the einsum on
the C-ordered B:
In [31]: %timeit Bf = B.copy(order='F')
1 loops, best of 3: 463 ms per loop
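
A small sketch of what the order='F' copy changes, in case it helps read the
numbers above. It only inspects the memory layout flags; the tiny 3x4 array is
an assumption for illustration, and the guess that the speedup comes from B
being traversed contiguously along j is an interpretation, not something
measured here.

```python
import numpy as np

# Tiny array for illustration; NumPy arrays are C-ordered (row-major) by default.
B = np.arange(12, dtype=float).reshape(3, 4)

# Same values, but stored column-major (Fortran order).
Bf = B.copy(order='F')

b_is_c = B.flags['C_CONTIGUOUS']
bf_is_f = Bf.flags['F_CONTIGUOUS']

# Transposing is a view that swaps the strides, so it also swaps the layout:
# B.T is Fortran-ordered, while Bf.T is C-ordered like A.
bt_is_c = B.T.flags['C_CONTIGUOUS']
bft_is_c = Bf.T.flags['C_CONTIGUOUS']

# The copy changes only the layout, not the contents.
same_values = np.array_equal(B, Bf)

print(b_is_c, bf_is_f, bt_is_c, bft_is_c, same_values)
```

With Bf, the elements B[j, i] for a fixed i sit next to each other in memory,
so A * Bf.T multiplies two arrays with the same C-ordered layout.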
/David.
_______________________________________________
NumPy-Discussion mailing list
[email protected]
http://mail.scipy.org/mailman/listinfo/numpy-discussion