The current approach used by auto_scheduler to extract tuning tasks leverages Relay op strategy. In short, auto_scheduler registers an implementation in Relay op strategy just like AutoTVM does, but instead of using a TOPI schedule function, it creates an empty schedule and extracts the lowered TE compute as a tuning task (ref: https://github.com/apache/incubator-tvm/blob/main/python/tvm/relay/op/op.py#L147).
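To make that flow more concrete, here is a minimal, illustrative sketch (not the actual op-strategy hook; the tensor names, shapes, and the conv2d+relu workload are made up) of the idea: the TE compute is kept, the schedule stays empty, and the compute is wrapped into an `auto_scheduler.ComputeDAG` to form the tuning workload:

```python
import tvm
from tvm import te, topi, auto_scheduler

# A TE compute similar to what the compile engine lowers for a conv2d+relu pattern.
data = te.placeholder((1, 3, 224, 224), name="data")
kernel = te.placeholder((32, 3, 3, 3), name="kernel")
conv = topi.nn.conv2d_nchw(data, kernel, stride=1, padding=1, dilation=1)
out = topi.nn.relu(conv)

# The auto_scheduler strategy returns an "empty" (unscheduled) schedule
# instead of calling a TOPI schedule function.
sched = te.create_schedule([out.op])

# The compute itself, not the schedule, is what becomes the tuning workload.
dag = auto_scheduler.ComputeDAG([data, kernel, out])
print(dag)
```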
However, an obvious issue of this approach is that the scope of a tuning task is limited by the Relay compile engine and op strategy. Specifically, each primitive Relay function can contain at most one complicated op (i.e., a reduce op such as conv2d). The Relay compile engine marks that op as the anchor op (ref: https://github.com/apache/incubator-tvm/blob/main/src/relay/backend/compile_engine.cc#L231) and uses the TOPI schedule of that op to schedule the entire Relay function (ref: https://github.com/apache/incubator-tvm/blob/main/src/relay/backend/compile_engine.cc#L152).

Here is a motivating example:

```
def @main(%data: Tensor[(1, 3, 224, 224), float32], %weight1: Tensor[(32, 3, 3, 3), float32], %weight2: Tensor[(32, 32, 3, 3), float32]) {
  %3 = fn (%data1: Tensor[(1, 3, 224, 224), float32], %weight11: Tensor[(32, 3, 3, 3), float32], %weight21: Tensor[(32, 32, 3, 3), float32], Primitive=1) {
    %0 = nn.conv2d(%data1, %weight11, padding=[1, 1, 1, 1], kernel_size=[3, 3]);
    %1 = nn.relu(%0);
    %2 = nn.conv2d(%1, %weight21, padding=[1, 1, 1, 1], kernel_size=[3, 3]);
    nn.relu(%2)
  };
  %3(%data, %weight1, %weight2)
}
```

As can be seen, we manually mark `%3` as primitive so that it won't be partitioned into two separate functions by the `FuseOps` pass. If we simply build this function, we get the following error message:

```
Check failed: !anchor_op_.defined() || anchor_op_pattern_ < kCommReduce == false: Cannot apply TOPI schedule to a primitive function with two complicated ops anchor=Op(nn.conv2d) current=Op(nn.conv2d)
```

The goal of this RFC is therefore to propose a mechanism that can turn the above Relay function into an auto_scheduler tuning task, and then build it with the tuning logs. The proposed mechanism is:

1. Add a mode, `use_topi_schedule`, to the Relay compile engine. When `use_topi_schedule=true`, it behaves as it does today. When `use_topi_schedule=false`, we do not check whether the function has more than one reduce op but simply invoke `auto_schedule_topi` on the entire TE compute.
2. Propagate the flag `use_topi_schedule` all the way to `GraphRuntimeCodegen` and `relay.Build`.
   1. In `auto_scheduler.extract_tasks`, we set `use_topi_schedule=false` so that it can extract tasks.
   2. In `relay.build`, we use `auto_scheduler.DispatchContext.current` to judge whether we should query the auto_scheduler schedule for the entire function, or query the TOPI schedule of the anchor op.

The draft PR is available [here](https://github.com/apache/incubator-tvm/pull/6903). Note that since we now extract auto_scheduler tasks directly via the compile engine, we have completely removed the auto_scheduler-related logic from Relay op strategy. I also provide a running script [here](https://gist.github.com/comaniac/cc10a341b7d1c2cd504a5cd5456f6b44) if you would like to play with more Relay functions; a rough end-to-end sketch of the flow is also attached at the end of this post.

Comments and suggestions are welcome :)

cc @merrymercy @tqchen @jcf94 @zhiics @haichen
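P.S. Here is the rough end-to-end sketch mentioned above, following the proposal description and the linked running script. The model construction, the log file name `tune.json`, and the exact return values of `extract_tasks` are placeholders/assumptions and may differ slightly from the draft PR; the tuning loop itself is omitted.

```python
import numpy as np
import tvm
from tvm import relay, auto_scheduler

# Rebuild the conv2d -> relu -> conv2d -> relu chain from the example above
# (without manually marking it as Primitive; the linked gist covers that part).
data = relay.var("data", shape=(1, 3, 224, 224))
weight1 = relay.var("weight1", shape=(32, 3, 3, 3))
weight2 = relay.var("weight2", shape=(32, 32, 3, 3))
out = relay.nn.relu(relay.nn.conv2d(data, weight1, padding=(1, 1), kernel_size=(3, 3)))
out = relay.nn.relu(relay.nn.conv2d(out, weight2, padding=(1, 1), kernel_size=(3, 3)))
mod = tvm.IRModule.from_expr(relay.Function([data, weight1, weight2], out))
params = {
    "weight1": np.random.uniform(size=(32, 3, 3, 3)).astype("float32"),
    "weight2": np.random.uniform(size=(32, 32, 3, 3)).astype("float32"),
}
target = tvm.target.Target("llvm")

# Task extraction: internally this corresponds to use_topi_schedule=false,
# so a primitive function with two conv2d ops becomes a single task.
tasks, task_weights = auto_scheduler.extract_tasks(mod, params, target)

# ... tune the extracted tasks and write the tuning logs to "tune.json" ...

# Build: with an active auto_scheduler DispatchContext (ApplyHistoryBest),
# the compile engine queries auto_scheduler schedules for entire functions
# instead of the anchor op's TOPI schedule.
with auto_scheduler.ApplyHistoryBest("tune.json"):
    with tvm.transform.PassContext(opt_level=3):
        lib = relay.build(mod, target=target, params=params)
```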