@sergei-grechanik that is what people do. The problem with parallel AD is that shared reads become shared writes, but you can apply optimization tricks to turn a shared write into a single write (for example, https://people.csail.mit.edu/tzumao/gradient_halide/). I think this is the smarter approach: when their optimization fails, they only need write synchronization, but when our optimization fails, we end up with giant tensors.
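For concreteness, here is a minimal NumPy sketch (not actual TVM or gradient-Halide code; the function names are just for illustration) of what "shared read becomes shared write" means, and the scatter-to-gather rewrite that turns it back into a single write per element:

```python
import numpy as np

def forward(x, w):
    # y[i, j] = w[i] * x[j] -- every i reads the same x[j] (shared read)
    return np.outer(w, x)

def grad_x_scatter(dy, w):
    # Naive parallel AD: each (i, j) contribution scatters into dx[j],
    # so the shared read of x[j] becomes a shared write to dx[j]
    # (in a parallel schedule this needs atomics or write sync).
    dx = np.zeros(dy.shape[1])
    for i in range(dy.shape[0]):
        for j in range(dy.shape[1]):
            dx[j] += w[i] * dy[i, j]
    return dx

def grad_x_gather(dy, w):
    # Scatter-to-gather rewrite: each j reduces over i and performs
    # a single write to dx[j], so there are no write conflicts.
    return (w[:, None] * dy).sum(axis=0)

x, w = np.random.rand(4), np.random.rand(3)
dy = np.ones((3, 4))
assert np.allclose(grad_x_scatter(dy, w), grad_x_gather(dy, w))
```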