Thanks @matt-arm. This is certainly quite interesting for the case of fused depthwise and 3x3 conv2d.
As a middle ground, it may be worthwhile to simply tile the output computation and use `compute_at` to compute the intermediate stages at those tiles. This would of course be less ideal than a rolling buffer, but it would still achieve a similar effect, as long as the recomputation at the tile edges is kept to a minimum. Given that the receptive field can expand quickly with 3x3 convs, it may not make sense to create very deep pathways.
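
To put rough numbers on the edge recomputation and the receptive-field growth mentioned above, here is a minimal back-of-the-envelope sketch. It is not TVM code. It assumes a chain of stride-1 convolutions with the same square kernel, square output tiles, and equal cost per element in every layer. The function names `input_tile_side` and `recompute_factor` are made up for illustration.

```python
def input_tile_side(tile, num_layers, kernel=3):
    """Side length of the input region needed to produce one tile x tile output.

    Assumes stride-1 convolutions: each layer widens the region by (kernel - 1).
    """
    return tile + num_layers * (kernel - 1)


def recompute_factor(tile, num_layers, kernel=3):
    """Ratio of elements computed with per-tile fusion vs. computing each once.

    Layer `layer` (1-based, the last one produces the output tile) has to
    cover the output tile plus the halo needed by all layers after it.
    Assumes every element costs the same in every layer.
    """
    tiled = 0
    for layer in range(1, num_layers + 1):
        side = tile + (num_layers - layer) * (kernel - 1)
        tiled += side * side
    ideal = num_layers * tile * tile
    return tiled / ideal


if __name__ == "__main__":
    # How the overhead grows with fusion depth for a 16x16 output tile.
    for depth in (1, 2, 4, 8):
        print(depth, input_tile_side(16, depth), round(recompute_factor(16, depth), 3))
```

Under these assumptions, fusing two 3x3 layers over 16x16 tiles costs about 13% extra work, while fusing four layers costs about 43% extra (and nearly 2x with 8x8 tiles). This is the reason shallow pathways with reasonably large tiles look like the practical choice here.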