[quote="kparzysz, post:11, topic:6844"] a composite target looks like a better solution. As the next step I suggest that we **drop the target host** completely. A function can be split into multiple parts meant for different targets. Instead of explicitly designating a certain target as a *target host* , we can use the same mechanism that assigns the individual parts to their targets to assign the “host” part to its target. This would remove the distinction between a host and a device code for the purposes of code generation. [/quote]
I agree that it is tempting to remove the target host completely. However, that does bring some trouble to our analysis in the early stages.

Specifically, when we talk about, say, a "GPU program", there are two mindsets at play. At the high level (Relay, TOPI), we treat a composite GPU kernel (e.g. softmax) as a "GPU program". The softmax could actually contain multiple kernel launches and need host code for dimension calculations, but because the code itself only reads/writes GPU memory, we view such a kernel as a GPU program. It is also useful to view it that way, because in high-level scheduling the ML kernel writer and the scheduler treat it as a GPU program rather than a heterogeneous program. At the lowest level, however, a "GPU program" refers only to the device code, not to the host code that drives it.

So the design choice really boils down to how we view a device program (see the sketch at the end of this post for a concrete comparison):

- V0: a GPU (device) program is a program that involves a single device target plus the related host code to drive that target.
- V1: a GPU (device) program is a program that involves only device code, without the host driving part.

While the V1 view is certainly easier to take from the low-level driver's PoV, the V0 view can be more useful in the following regards:

- It provides a useful device key for dispatching high-level schedules.
- It is the natural way high-level developers think about a program with a single device target.
- It offers simplicity for users who want to specify the target (e.g. they don't have to spell out cuda as a composite target).

It also acknowledges the fact that there is a difference between a single-target program (a host/device mix) and a program with multiple device targets. We can still use composite targets for the latter. That does mean, though, that such a per-target split would usually happen earlier, in the graph stage, rather than at a later stage.

As some additional food for thought, V0 and V1 also correspond to the two different mindsets advocated by the CUDA and OpenCL programming models. As we know, nvcc allows GPU kernels to blend directly into .cu files, and to programmers those .cu files become what we think of as the GPU program. The OpenCL model is closer to V1. And as we know, the CUDA model "won" GPGPU programming over the other one, in my opinion partly due to the mindset offered in V0.
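To make the two views concrete, here is a rough sketch of how a user might specify the target for a single-GPU program under each view. As above, the field names are hypothetical and only meant for illustration:

```python
# V0: the "gpu program" target bundles the device target with the host
# code needed to drive it (dimension calculations, kernel launches).
# The user just says "cuda"; the host part is implied or defaulted.
v0_target = {"kind": "cuda", "arch": "sm_75", "host": {"kind": "llvm"}}

# V1: "cuda" covers only device code, so even a single-GPU program has
# to be spelled out as a composite of a host part and a device part.
v1_target = {
    "kind": "composite",
    "targets": [
        {"kind": "llvm"},                   # host driving code
        {"kind": "cuda", "arch": "sm_75"},  # device kernels only
    ],
}
```

Under V0 the common single-device case stays simple, and composite targets are reserved for programs that genuinely span multiple device targets.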