Thanks for the feedback :) Tiling the output computations and using `compute_at` is exactly what I've been doing to prototype this, and you're right that for a sufficiently large tile the recompute overhead isn't particularly bad. I don't think rolling buffers are immediately essential, but they would be a very beneficial future optimization.
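To make the recompute trade-off concrete, here is a minimal pure-Python sketch (not TVM code). It assumes a chain of 1-D "valid" 3-tap convolutions as a stand-in for a conv cascade; `conv3`, `untiled` and `cascaded` are hypothetical helpers. `cascaded` mimics tiling the final output and `compute_at`-ing every producer under the tile loop, so each tile recomputes its own halo instead of sharing it with its neighbours.

```python
"""Sketch: cascading 1-D 3-tap convolutions by tiling the final output.

Hypothetical model, not TVM: each output tile pulls in only the slice of
every earlier stage it needs (the effect of compute_at-ing producers under
the output tile loop). Overlapping halos between tiles are recomputed, and
`work` counts how many stage outputs get evaluated in total.
"""
from typing import List, Sequence, Tuple

Kernel = Tuple[float, float, float]


def conv3(x: Sequence[float], k: Kernel) -> List[float]:
    """'Valid' 3-tap convolution (correlation): output length is len(x) - 2."""
    return [k[0] * x[i] + k[1] * x[i + 1] + k[2] * x[i + 2] for i in range(len(x) - 2)]


def untiled(x: Sequence[float], kernels: Sequence[Kernel]) -> Tuple[List[float], int]:
    """Layer-by-layer execution: every intermediate is fully materialised once."""
    buf = list(x)
    work = 0
    for k in kernels:
        buf = conv3(buf, k)
        work += len(buf)
    return buf, work


def cascaded(x: Sequence[float], kernels: Sequence[Kernel], tile: int) -> Tuple[List[float], int]:
    """Tile the last stage's output and recompute each tile's halo in earlier stages."""
    n_stages = len(kernels)
    out_len = len(x) - 2 * n_stages
    out: List[float] = []
    work = 0
    for start in range(0, out_len, tile):
        stop = min(start + tile, out_len)
        # Outputs [start, stop) of the cascade depend on inputs [start, stop + 2 * n_stages).
        buf = list(x[start : stop + 2 * n_stages])
        for k in kernels:
            buf = conv3(buf, k)  # each stage shrinks the tile's halo by 2
            work += len(buf)
        out.extend(buf)
    return out, work


if __name__ == "__main__":
    x = [float(i % 7) for i in range(130)]
    ks = [(1.0, 2.0, 1.0), (0.5, 0.0, -0.5), (1.0, 1.0, 1.0), (0.25, 0.5, 0.25), (2.0, -1.0, 0.0)]
    ref, ref_work = untiled(x, ks)
    for t in (8, 60):
        out, work = cascaded(x, ks, t)
        assert out == ref  # tiling changes the schedule, not the result
        print(f"tile={t}: {work} evaluations vs {ref_work} layer-by-layer")
```

With five stages and 120 outputs, layer-by-layer execution evaluates 620 elements; tiles of 8 evaluate 900 (about 45% extra), while tiles of 60 evaluate 640 (about 3% extra). In this model every additional tile costs N(N-1) recomputed elements for an N-stage cascade, so the overhead shrinks as tiles grow; a rolling buffer would, in principle, remove it by reusing the previous tile's halo.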
In our testing and prototyping we have found profitable cascades of 5+ ops, particularly in MobileNet-type architectures and super-resolution networks. Deciding whether extending a cascade is still profitable would be one of the jobs of the cascading algorithm.

My main concern with integrating this is that convolution-type operations always end up in their own primitive functions. For my experiments I'm currently lowering the whole graph to a single TE, but this won't work alongside the current TOPI integration, which expects 'master ops' to determine the schedule. In essence, I would like to do hierarchical scheduling: first of the cascades, and then of the ops themselves.
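The two-level split could look roughly like the following sketch. Everything in it is hypothetical: `Op`, `form_cascades`, `schedule`, the byte-budget profitability rule and the `schedule_op` callback (a stand-in for whatever per-op schedule would run inside a cascade) are illustrative assumptions, not TVM or TOPI APIs. It only shows the shape of the idea: level 1 partitions a linear chain of ops into cascades, and level 2 schedules each op within its cascade.

```python
"""Hypothetical two-level (cascade, then op) scheduling sketch for a linear op chain."""
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Op:
    name: str
    out_bytes: int  # size of this op's full output tensor (assumed known)


def form_cascades(ops: List[Op], budget: int, max_len: int) -> List[List[Op]]:
    """Level 1: greedily group consecutive ops into cascades.

    Assumed profitability rule: keep extending the current cascade while the
    tensor produced by its last op would NOT fit in `budget` bytes (so
    materialising it is expensive), up to `max_len` ops to bound halo recompute.
    """
    cascades: List[List[Op]] = []
    current: List[Op] = []
    for op in ops:
        if current and (current[-1].out_bytes <= budget or len(current) >= max_len):
            cascades.append(current)  # materialise the boundary tensor, start a new cascade
            current = []
        current.append(op)
    if current:
        cascades.append(current)
    return cascades


def schedule(
    ops: List[Op], budget: int, max_len: int, schedule_op: Callable[[Op, int], str]
) -> List[str]:
    """Level 2: schedule each op inside the cascade chosen for it at level 1."""
    plan: List[str] = []
    for ci, cascade in enumerate(form_cascades(ops, budget, max_len)):
        for op in cascade:
            plan.append(schedule_op(op, ci))
    return plan


if __name__ == "__main__":
    chain = [Op("a", 400), Op("b", 300), Op("c", 50), Op("d", 500),
             Op("e", 600), Op("f", 700), Op("g", 100)]
    print([[op.name for op in c] for c in form_cascades(chain, budget=100, max_len=3)])
    print(schedule(chain, 100, 3, lambda op, ci: f"{op.name}@cascade{ci}"))
```

In this toy run the 50-byte tensor produced by `c` fits in the budget, so it is materialised and ends the first cascade, while `max_len` forces a cut after `f`, giving `[a, b, c]`, `[d, e, f]`, `[g]`. A real cost model would presumably also weigh the recompute overhead of each extra stage.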