I see a fundamental problem in this PR.
Jacobian(Y, W):
tensor compute.jacobian{0x165b360}[0] : float32 [32, 3000, 3000, 10000]
axes (i : [0, 31], j : [0, 2999], jac_i0 : [0, 2999], jac_i1 : [0, 9999])
Reduction
identity [0.000000f]
lhs [x.der] rhs [y.der]
combiner [(x.der + y.der)]
axes (k : [0, 9999])
condition (uint1)1
source[0] = (X(i, k)*float32(((jac_i0 == j) && (jac_i1 == k))))
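
Reading source[0] off the dump above, the entry reduces to

Jacobian(Y, W)(i, j, jac_i0, jac_i1)
  = sum_k X(i, k) * [jac_i0 == j] * [jac_i1 == k]
  = X(i, jac_i1) * [jac_i0 == j]

so the enormous 4-D tensor carries no information that is not already in X.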
This is a really, really big tensor (32 x 3000 x 3000 x 10000 float32 values is roughly 2.9e12 elements, on the order of 11.5 TB), and the approach this PR takes has a "cliff of death" performance profile.
This PR then relies on simplification to eliminate all of those tensors. If any tensor is not eliminated (which seems to be the case for more complex tensors), performance will be very bad.
Reverse-mode automatic differentiation should only ever compute vector-Jacobian products.
dW should be computed as the adjoint of Y contracted with the Jacobian of Y with respect to W (a vector-Jacobian product); the Jacobian itself should simply never be materialized.
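
As a minimal sketch of what this could look like for the example above (assuming Y = X * W^T with the shapes from the dump, X: [32, 10000], W: [3000, 10000]; the names below are illustrative, not the PR's actual API):

```python
# Illustrative TVM TE sketch: compute dW directly as a vector-Jacobian
# product instead of materializing Jacobian(Y, W).
import tvm
from tvm import te

batch, out_dim, in_dim = 32, 3000, 10000
X = te.placeholder((batch, in_dim), name="X", dtype="float32")     # [32, 10000]
W = te.placeholder((out_dim, in_dim), name="W", dtype="float32")   # [3000, 10000]
dY = te.placeholder((batch, out_dim), name="dY", dtype="float32")  # adjoint of Y, [32, 3000]

# Forward: Y(i, j) = sum_k X(i, k) * W(j, k)
k = te.reduce_axis((0, in_dim), name="k")
Y = te.compute((batch, out_dim),
               lambda i, j: te.sum(X[i, k] * W[j, k], axis=k), name="Y")

# Reverse mode: dW(j, k) = sum_i dY(i, j) * X(i, k)
# Same shape as W ([3000, 10000]); the [32, 3000, 3000, 10000] Jacobian never exists.
i = te.reduce_axis((0, batch), name="i")
dW = te.compute((out_dim, in_dim),
                lambda j, kk: te.sum(dY[i, j] * X[i, kk], axis=i), name="dW")
```

The point is that dW ends up the same size as W, so nothing has to be recovered by a later simplification pass.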
Can this be fixed so that the algorithm is not asymptotically slower even without optimization?