Re: [Numpy-discussion] np.diag(np.dot(A, B))

Daπid Fri, 22 May 2015 03:54:33 -0700

On 22 May 2015 at 12:15, Mathieu Blondel <[email protected]> wrote:


> Right now I am using np.sum(A * B.T, axis=1) for dense data and I have
> implemented a Cython routine for sparse data.
> I haven't benched np.sum(A * B.T, axis=1) vs. np.einsum("ij,ji->i", A, B)
> yet since I am mostly interested in the sparse case right now.
>


On my system, einsum seems to be faster.


In [3]: N = 256

In [4]: A = np.random.random((N, N))

In [5]: B = np.random.random((N, N))

In [6]: %timeit np.sum(A * B.T, axis=1)
1000 loops, best of 3: 260 µs per loop

In [7]: %timeit  np.einsum("ij,ji->i", A, B)
10000 loops, best of 3: 147 µs per loop


In [9]: N = 1023

In [10]: A = np.random.random((N, N))

In [11]: B = np.random.random((N, N))

In [12]: %timeit np.sum(A * B.T, axis=1)
100 loops, best of 3: 14 ms per loop

In [13]: %timeit  np.einsum("ij,ji->i", A, B)
100 loops, best of 3: 10.7 ms per loop
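
As a sanity check, here is a small sketch showing that the three expressions in this thread give the same vector. The seed and the variable names (full, by_sum, by_einsum) are my own choices for illustration, not something from the session above.

```python
import numpy as np

np.random.seed(0)  # arbitrary seed, only so the check is repeatable
N = 256
A = np.random.random((N, N))
B = np.random.random((N, N))

# Reference: compute the full N x N product, then keep only its diagonal.
full = np.diag(np.dot(A, B))

# Element-wise product with B transposed, summed along each row.
by_sum = np.sum(A * B.T, axis=1)

# einsum: out[i] = sum over j of A[i, j] * B[j, i]; no N x N product is formed.
by_einsum = np.einsum("ij,ji->i", A, B)

print(np.allclose(full, by_sum), np.allclose(full, by_einsum))
```

The first version does O(N^3) work to get N numbers; the other two only do O(N^2), and that is what the timings compare.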


I have ATLAS installed from the Fedora repos, so it is not tuned; but einsum
only uses one thread anyway, so it is probably not using ATLAS at all (and it
is definitely not computing the full dot product, because that alone already
takes 200 ms).

If B is in Fortran (column-major) order, the computation is much faster
(timings for N=5000).

In [25]: Bf = B.copy(order='F')

In [26]: %timeit  np.einsum("ij,ji->i", A, Bf)
10 loops, best of 3: 25.7 ms per loop

In [27]: %timeit  np.einsum("ij,ji->i", A, B)
1 loops, best of 3: 404 ms per loop

In [29]: %timeit np.sum(A * Bf.T, axis=1)
10 loops, best of 3: 118 ms per loop

In [30]: %timeit np.sum(A * B.T, axis=1)
1 loops, best of 3: 517 ms per loop

But the copy is not worth it for a single call, since making it takes longer
than the time it saves:

In [31]: %timeit Bf = B.copy(order='F')
1 loops, best of 3: 463 ms per loop
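
A likely reason the memory layout matters: "ij,ji->i" reads column i of B for output element i, and columns are contiguous in memory only in Fortran order. The sketch below checks the layout flags and times both variants with the standard timeit module, for use outside IPython. N and the repeat count are assumptions picked to keep the run short, so the numbers will not match the ones above.

```python
import timeit
import numpy as np

N = 500  # assumed size, much smaller than the N=5000 used above
A = np.random.random((N, N))
B = np.random.random((N, N))   # NumPy arrays are C (row-major) ordered by default
Bf = B.copy(order='F')         # same values, column-major layout

# Confirm the layouts; transposing an F-ordered array gives a C-ordered view.
print(B.flags['C_CONTIGUOUS'], Bf.flags['F_CONTIGUOUS'], Bf.T.flags['C_CONTIGUOUS'])

# Both layouts must give the same result.
d_c = np.einsum("ij,ji->i", A, B)
d_f = np.einsum("ij,ji->i", A, Bf)

# Total seconds over `repeats` calls for each variant, plus the cost of the copy.
repeats = 20
t_c = timeit.timeit(lambda: np.einsum("ij,ji->i", A, B), number=repeats)
t_f = timeit.timeit(lambda: np.einsum("ij,ji->i", A, Bf), number=repeats)
t_copy = timeit.timeit(lambda: B.copy(order='F'), number=repeats)
print("C order: %.4f s, F order: %.4f s, copy: %.4f s" % (t_c, t_f, t_copy))
```

If B is reused across many calls, the one-time copy could pay for itself; for a single call the timings above suggest it does not.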



/David.

_______________________________________________
NumPy-Discussion mailing list
[email protected]
http://mail.scipy.org/mailman/listinfo/numpy-discussion
