Glad to see a proposal with such functionality.
I have to admit that I have also done something similar to what you are proposing (at least at the TE level). One problem I had was the enlargement of the tensor iteration domains (https://discuss.tvm.apache.org/t/tvm-scheduling-split-factors-missmatch-to-original-extent/2982/2). This led either to many inner loops running mostly "out of the original domain", which was really not performant, or to a huge explosion of "program code" to statically resolve all those ifs.

Another problem you would face is the limitations of FuseOps. Without any change there, I don't know how you would "automatically" get the composed stages you will need in order to schedule them.

Nonetheless, for specific configurations of layers, I think what you propose is a reasonable way of processing. Rolling buffers would make it even sweeter.

## side note

[quote="matt-arm, post:3, topic:8119"]
Determining whether continuing a cascade is profitable would be one of the jobs of the cascading algorithm.
[/quote]

This kind of reminds me of the "graph tuner" infrastructure. Maybe there should be a more general infrastructure for doing "graph level tuning" (I think the current one only tunes w.r.t. different layout transformations).
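To make the iteration-domain enlargement mentioned earlier a bit more concrete, here is a minimal sketch in plain Python. It is not TVM code: `split_padding` and `guarded_iterations` are hypothetical helper names made up for illustration, and the model only assumes that a split by a non-dividing factor rounds the extent up to a multiple of the factor and guards the body with an `if`.

```python
def split_padding(extent, factor):
    """Model a loop split where `factor` may not divide `extent`.

    Returns (outer_extent, padded_extent, wasted_iterations).
    """
    outer = (extent + factor - 1) // factor  # ceil(extent / factor)
    padded = outer * factor                  # enlarged iteration domain
    return outer, padded, padded - extent


def guarded_iterations(extent, factor):
    """Walk the enlarged domain and count the iterations that do real work."""
    outer, _, _ = split_padding(extent, factor)
    executed = 0
    for o in range(outer):
        for i in range(factor):
            idx = o * factor + i
            if idx < extent:  # guard that keeps us inside the original domain
                executed += 1
    return executed


if __name__ == "__main__":
    # Extent 10 split by 4: 3 outer steps, 12 visited points, 2 of them wasted.
    print(split_padding(10, 4))
    print(guarded_iterations(10, 4))
```

With several split axes the padded extents multiply, so the share of guarded-out iterations grows quickly. Removing the guard statically would mean emitting separate code for the full tiles and the partial tiles, which is where the code-size growth comes from.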