A simple workaround gets the speed back:
In [11]: %timeit (X.T * A.dot(X.T)).sum(axis=0)
1 loop, best of 3: 612 ms per loop

In [12]: %timeit np.einsum('ij,ji->j', A.dot(X.T), X)
1 loop, best of 3: 414 ms per loop

If working as advertised, the code in gh-5488 will convert the three-argument
einsum call into my version automatically.

On Sun, Jun 5, 2016 at 7:44 PM, Stephan Hoyer <sho...@gmail.com> wrote:

> On Sun, Jun 5, 2016 at 5:08 PM, Mark Daoust <daoust...@gmail.com> wrote:
>
>> Here's the einsum version:
>>
>> `es = np.einsum('Na,ab,Nb->N',X,A,X)`
>>
>> But that's running ~45x slower than your version.
>>
>> OT: anyone know why einsum is so bad for this one?
>>
>
> I think einsum can create some large intermediate arrays. It certainly
> doesn't always do multiplication in the optimal order:
> https://github.com/numpy/numpy/pull/5488
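For reference, here is a minimal check (the shapes below are made up for
illustration; the thread doesn't say what the real ones are) that the
three-argument einsum and the two workarounds all compute the same thing,
the row-wise quadratic form diag(X A X^T):

    import numpy as np

    N, a = 1000, 50              # hypothetical sizes, illustration only
    X = np.random.rand(N, a)
    A = np.random.rand(a, a)

    # original three-argument einsum: es[n] = sum_{a,b} X[n,a] A[a,b] X[n,b]
    es = np.einsum('Na,ab,Nb->N', X, A, X)

    # workaround variants timed above
    v1 = (X.T * A.dot(X.T)).sum(axis=0)
    v2 = np.einsum('ij,ji->j', A.dot(X.T), X)

    print(np.allclose(es, v1), np.allclose(es, v2))  # True True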