A simple workaround gets the speed back:

In [11]: %timeit (X.T * A.dot(X.T)).sum(axis=0)
1 loop, best of 3: 612 ms per loop

In [12]: %timeit np.einsum('ij,ji->j', A.dot(X.T), X)
1 loop, best of 3: 414 ms per loop


If it works as advertised, the code in gh-5488 will convert the
three-argument einsum call into my two-argument version automatically.
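
For reference, a quick check that the three expressions agree. The array
shapes aren't given in the thread, so N, a and the random data below are
just placeholders for illustration:

    import numpy as np

    # Placeholder shapes/data; the actual arrays in the thread are not shown.
    rng = np.random.RandomState(0)
    N, a = 1000, 50
    X = rng.rand(N, a)   # (N, a)
    A = rng.rand(a, a)   # (a, a)

    # Three-argument einsum from the quoted message (the slow path).
    es = np.einsum('Na,ab,Nb->N', X, A, X)

    # Workaround: do the matrix product first, then a cheap reduction.
    w1 = (X.T * A.dot(X.T)).sum(axis=0)
    w2 = np.einsum('ij,ji->j', A.dot(X.T), X)

    assert np.allclose(es, w1) and np.allclose(es, w2)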

On Sun, Jun 5, 2016 at 7:44 PM, Stephan Hoyer <sho...@gmail.com> wrote:

> On Sun, Jun 5, 2016 at 5:08 PM, Mark Daoust <daoust...@gmail.com> wrote:
>
>> Here's the einsum version:
>>
>> `es = np.einsum('Na,ab,Nb->N', X, A, X)`
>>
>> But that's running ~45x slower than your version.
>>
>> OT: anyone know why einsum is so bad for this one?
>>
>
> I think einsum can create some large intermediate arrays. It certainly
> doesn't always do multiplication in the optimal order:
> https://github.com/numpy/numpy/pull/5488
>