Thanks @Menooker for the RFC. It would be great if a motivation section could be 
added (why bf16 support) for others who do not have a background on this. In 
terms of technical choices, it would be great to list the design 
choices, discuss their pros and cons, and then talk about concerns. 

### Design Choices and Pros and Cons
- B0: Do all legalization (cast and compute) in TIR
- B1: Do all legalization in the target codegen (LLVM, CUDA, etc.)
- B2: Do compute legalization in TIR, cast legalization in target codegen.

### Discussion
Given the above choices, B0 allows us to use the same legalization process for 
all target backends (e.g. CUDA, LLVM, and C if necessary); this is the main 
reason why doing it in TIR is more desirable. The implementation complexity is 
not that different, given that the main difference is moving the related 
implementations from the target to the common TIR. 

Notably, we don't have to lower the cast into a specific function. While an 
external function was used in custom data types as an example, in this case we 
can certainly lower the cast directly to a sequence of expressions (reinterpret 
then shift), which will be properly vectorized.
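To make the reinterpret-then-shift idea concrete, here is a minimal Python model of the bit manipulation the lowered expressions would compute, not actual TVM/TIR code: bf16 is the top 16 bits of an fp32, so the widening cast shifts left and reinterprets, while the narrowing cast shifts right (shown with simple truncation; a real legalization would likely use round-to-nearest-even).

```python
import struct

def f32_to_bits(x: float) -> int:
    """Reinterpret an fp32 value as its 32-bit pattern."""
    return struct.unpack("<I", struct.pack("<f", x))[0]

def bits_to_f32(b: int) -> float:
    """Reinterpret a 32-bit pattern as an fp32 value."""
    return struct.unpack("<f", struct.pack("<I", b))[0]

def f32_to_bf16_bits(x: float) -> int:
    # Narrowing cast: keep the top 16 bits (sign, exponent, 7 mantissa bits).
    # Truncation for simplicity; production code would round to nearest even.
    return f32_to_bits(x) >> 16

def bf16_bits_to_f32(b: int) -> float:
    # Widening cast: shift the bf16 pattern into the high half, reinterpret.
    return bits_to_f32((b & 0xFFFF) << 16)

# Values whose mantissa fits in 7 bits round-trip exactly.
print(bf16_bits_to_f32(f32_to_bf16_bits(1.5)))
```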

bf16 is already a "first-class type" from the moment we introduce 
`DataType::kBfloat16`. The legalization is only necessary for backends that do 
not support the type; as more backends move to support it, we can optionally 
skip the legalization step for those backends. This is another reason why it is 
important to have the bf16 support either in TIR or in the backend itself, 
instead of splitting the support into two parts.

---
[Visit Topic](https://discuss.tvm.ai/t/rfc-add-bfloat16-data-type/6778/3) to 
respond.
