The current approach auto_scheduler uses to extract tuning tasks leverages the 
Relay op strategy. In short, auto_scheduler registers an implementation in the 
Relay op strategy just as AutoTVM does, but instead of using the TOPI schedule 
function, it creates an empty schedule and extracts the lowered TE compute 
function as a tuning task (ref: 
https://github.com/apache/incubator-tvm/blob/main/python/tvm/relay/op/op.py#L147).
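
Conceptually, the "schedule" registered on that path does something like the 
following simplified sketch (illustrative only; this is not the actual code at 
the link above):

```
# Illustrative sketch only: the auto_scheduler entry records the TE compute of
# the primitive function as a tuning task and returns an empty schedule.
from tvm import te, auto_scheduler

def illustrative_auto_scheduler_schedule(outs):
    # Capture the lowered TE compute as a compute DAG (the tuning task).
    task_dag = auto_scheduler.ComputeDAG(outs)
    # ...task_dag would be recorded as a tuning task here...
    # Return an empty schedule; the tuned schedule is queried from the tuning
    # log later, at build time.
    return te.create_schedule([t.op for t in outs])

# A toy elementwise compute, just to show the shape of the call.
A = te.placeholder((128,), name="A")
B = te.compute((128,), lambda i: A[i] + 1.0, name="B")
sch = illustrative_auto_scheduler_schedule([B])
```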

However, an obvious issue with this approach is that the scope of a tuning 
task is limited by the Relay compile engine and op strategy. Specifically, 
each primitive Relay function can have at most one complicated op (e.g., 
reduce ops such as conv2d). The Relay compile engine marks that op as the 
anchor op (ref: 
https://github.com/apache/incubator-tvm/blob/main/src/relay/backend/compile_engine.cc#L231) 
and uses the TOPI schedule of that op to schedule the entire Relay function 
(ref: 
https://github.com/apache/incubator-tvm/blob/main/src/relay/backend/compile_engine.cc#L152).

Here is a motivating example:

```
def @main(%data: Tensor[(1, 3, 224, 224), float32], %weight1: Tensor[(32, 3, 3, 
3), float32], %weight2: Tensor[(32, 32, 3, 3), float32]) {
  %3 = fn (%data1: Tensor[(1, 3, 224, 224), float32], %weight11: Tensor[(32, 3, 
3, 3), float32], %weight21: Tensor[(32, 32, 3, 3), float32], Primitive=1) {
    %0 = nn.conv2d(%data1, %weight11, padding=[1, 1, 1, 1], kernel_size=[3, 3]);
    %1 = nn.relu(%0);
    %2 = nn.conv2d(%1, %weight21, padding=[1, 1, 1, 1], kernel_size=[3, 3]);
    nn.relu(%2)
  };
  %3(%data, %weight1, %weight2)
}
```

As can be seen, we manually mark `%3` as primitive so that it won't be 
partitioned into two separate functions by the `FuseOps` pass. If we simply 
build this function, we get the following error message:

```
Check failed: !anchor_op_.defined() || anchor_op_pattern_ < kCommReduce == 
false: Cannot apply TOPI schedule to a primitive function with two complicated 
ops anchor=Op(nn.conv2d) current=Op(nn.conv2d)
```
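
For reference, here is a rough sketch of how such a module can be constructed 
with the Relay Python API (the variable names are mine; the essential part is 
the `Primitive=1` attribute, which keeps both conv2d ops in one function):

```
import tvm
from tvm import relay

# Inner (primitive) function: conv2d -> relu -> conv2d -> relu.
data = relay.var("data", shape=(1, 3, 224, 224))
weight1 = relay.var("weight1", shape=(32, 3, 3, 3))
weight2 = relay.var("weight2", shape=(32, 32, 3, 3))
out = relay.nn.relu(relay.nn.conv2d(data, weight1, padding=(1, 1), kernel_size=(3, 3)))
out = relay.nn.relu(relay.nn.conv2d(out, weight2, padding=(1, 1), kernel_size=(3, 3)))
inner = relay.Function([data, weight1, weight2], out)
# Mark it primitive so FuseOps keeps it as a single function.
inner = inner.with_attr("Primitive", tvm.tir.IntImm("int32", 1))

# Outer main function that calls the primitive function.
d = relay.var("d", shape=(1, 3, 224, 224))
w1 = relay.var("w1", shape=(32, 3, 3, 3))
w2 = relay.var("w2", shape=(32, 32, 3, 3))
mod = tvm.IRModule.from_expr(relay.Function([d, w1, w2], relay.Call(inner, [d, w1, w2])))

# With the current compile engine, building this triggers the check above:
# lib = relay.build(mod, target="llvm")
```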

As a result, the goal of this RFC is to propose a mechanism that can turn the 
above Relay function into an auto_scheduler tuning task, and that also lets us 
build it with the tuning logs.

The proposed mechanism is as follows (a usage sketch follows the list):
1. Add a mode, `use_topi_schedule`, to the Relay compile engine. When 
`use_topi_schedule=true`, it behaves as it does today. When 
`use_topi_schedule=false`, we do not check whether the function has more than 
one reduce op, but simply invoke `auto_schedule_topi` on the entire TE compute.
2. Propagate the flag `use_topi_schedule` all the way to `GraphRuntimeCodegen` 
and `relay.build`:
   1. In `auto_scheduler.extract_tasks`, we set `use_topi_schedule=false` so 
that it can extract tasks.
   2. In `relay.build`, we use `auto_scheduler.DispatchContext.current` to 
decide whether to query the auto_scheduler schedule for the entire function or 
the TOPI schedule of the anchor op.
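
To make the intended user-facing workflow concrete, here is a minimal sketch 
based on today's auto_scheduler APIs (`extract_tasks`, `TaskScheduler`, 
`ApplyHistoryBest`); exact signatures may evolve with this RFC, and `mod` / 
`params` stand for the Relay module and parameters to compile:

```
import tvm
from tvm import relay, auto_scheduler

target = tvm.target.Target("llvm")
log_file = "tuning.json"

# 1. Task extraction: the compile engine runs with use_topi_schedule=false,
#    so each primitive function becomes one tuning task.
tasks, task_weights = auto_scheduler.extract_tasks(mod["main"], params, target)

# 2. Tune the tasks and record the best schedules to the log file.
tuner = auto_scheduler.TaskScheduler(tasks, task_weights)
tuner.tune(auto_scheduler.TuningOptions(
    num_measure_trials=200,  # example budget
    measure_callbacks=[auto_scheduler.RecordToFile(log_file)],
))

# 3. Build: relay.build sees auto_scheduler.DispatchContext.current and
#    queries the tuned schedule for each entire function instead of the
#    anchor op's TOPI schedule.
with auto_scheduler.ApplyHistoryBest(log_file):
    lib = relay.build(mod, target=target, params=params)
```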

The draft PR is available 
[here](https://github.com/apache/incubator-tvm/pull/6903). Note that since we 
now extract auto_scheduler tasks directly via the compile engine, we have 
completely removed the auto_scheduler-related logic from the Relay op strategy.

I also provide a running script 
[here](https://gist.github.com/comaniac/cc10a341b7d1c2cd504a5cd5456f6b44) in 
case you would like to experiment with more Relay functions.

Comments and suggestions are welcome :)

cc @merrymercy @tqchen @jcf94 @zhiics @haichen
