Thanks @matt-arm. This is certainly quite interesting for the case of fused 
depthwise and 3x3 conv2d.

As a middle ground, perhaps it is worthwhile to simply tile the output 
computation and use `compute_at` to compute the intermediate stages at the 
tile level. It would of course be less ideal than a rolling buffer, but would 
still have a similar effect (as long as recomputation at the tile edges is 
minimal).
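
To make the tiling idea concrete, here is a minimal pure-Python sketch rather than an actual TVM schedule. It assumes a 1D, stride-1, 3-tap filter standing in for the 3x3 conv, and the helper names (`conv3`, `fused_tiled`) are made up for illustration. The intermediate is produced per output tile, which is roughly the effect `compute_at` would have, and the sketch counts how many intermediate elements end up being computed.

```python
def conv3(x):
    """Valid 3-tap 1D box filter: the output has len(x) - 2 elements."""
    return [x[i] + x[i + 1] + x[i + 2] for i in range(len(x) - 2)]


def fused_tiled(x, tile):
    """Compute conv3(conv3(x)) one output tile at a time.

    The intermediate is recomputed for each output tile (no rolling buffer),
    so elements at tile edges are produced more than once.
    """
    out_len = len(x) - 4  # two valid 3-tap stages shrink the length by 4
    out = []
    mid_computed = 0
    for start in range(0, out_len, tile):
        stop = min(start + tile, out_len)
        # Outputs [start, stop) need intermediate [start, stop + 2),
        # which in turn needs input [start, stop + 4).
        mid = conv3(x[start:stop + 4])
        mid_computed += len(mid)
        out.extend(conv3(mid))
    return out, mid_computed


x = list(range(20))
reference = conv3(conv3(x))            # unfused: full intermediate kept
tiled, mid_computed = fused_tiled(x, tile=4)
extra = mid_computed - len(conv3(x))   # intermediate elements computed twice
```

In this toy setup the tiled result matches the unfused one, and the only extra work is the two-element halo recomputed at each interior tile boundary.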

Given that the receptive field can quickly expand with 3x3 conv, it may not 
make sense to create very deep pathways.
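
As a rough back-of-the-envelope check on that point, the sketch below estimates how much input a single output tile needs as more 3x3 convs are fused. It assumes stride 1, no dilation, and no reuse between neighbouring tiles; the function names are hypothetical.

```python
def input_tile_side(out_tile, depth, kernel=3):
    """Side length of the input patch needed for an out_tile x out_tile
    output after `depth` stacked stride-1 valid convs of size `kernel`."""
    return out_tile + depth * (kernel - 1)


def halo_overhead(out_tile, depth, kernel=3):
    """Input elements read per tile divided by output elements produced."""
    side = input_tile_side(out_tile, depth, kernel)
    return (side * side) / (out_tile * out_tile)


# Overhead for an 8x8 output tile as the fused pathway gets deeper.
overheads = {depth: halo_overhead(8, depth) for depth in (1, 2, 4, 8)}
```

Each extra 3x3 layer widens the required patch by two pixels per dimension, so under these assumptions the overhead for a fixed tile size grows quadratically with depth.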

---
[Visit Topic](https://discuss.tvm.apache.org/t/rfc-cascade-scheduling/8119/2) 
to respond.
