Thanks @Menooker for the RFC. It would be great if a motivation section could be added (why bf16 support) for others who do not have a background on this. In terms of the technical discussion, it would be great to list the design choices, discuss their pros and cons, and then talk about concerns.
### Design Choices and Pros and Cons

- B0: Do all legalization (cast and compute) in TIR.
- B1: Do all legalization in the target codegen (LLVM, CUDA, etc.).
- B2: Do compute legalization in TIR and cast legalization in the target codegen.

### Discussion

Given the above choices, B0 allows us to use the same legalization process for all target backends (e.g. CUDA, LLVM, and C if necessary); this is the main reason why doing it in TIR is more desirable. The implementation complexity is not that different, given that the main difference is moving the related implementation from the target codegen into the common TIR pass. Notably, we don't have to lower the cast into a specific function: while an external function was used in custom data types as an example, in this case we can directly lower the cast to a sequence of expressions (reinterpret then shift, see the sketch below), which will vectorize properly.

bf16 is already a "first-class type" from the moment we introduce `DataType::kBfloat16`. The legalization is only necessary for backends that do not support the type; as more backends move to support it, we could optionally skip the legalization step for those backends. This is another reason why it is important to have the bf16 support either entirely in TIR or entirely in the backend itself, instead of splitting the support into two parts.
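To make the reinterpret-then-shift lowering concrete, here is a minimal scalar C++ sketch of what the legalized expression sequence would compute, assuming bf16 is stored as a `uint16_t` holding the upper 16 bits of an IEEE fp32. The helper names `BF16ToFloat`/`FloatToBF16` are hypothetical, not part of the proposed API; the rounding choice in the narrowing direction is likewise an assumption, not something the RFC has pinned down.

```cpp
#include <cstdint>
#include <cstring>
#include <cstdio>

// Widening cast: shift the bf16 bits into the upper half of a 32-bit
// word, then reinterpret as fp32. This direction is exact.
float BF16ToFloat(uint16_t bits) {
  uint32_t u = static_cast<uint32_t>(bits) << 16;
  float f;
  std::memcpy(&f, &u, sizeof(f));  // reinterpret, not a value conversion
  return f;
}

// Narrowing cast: reinterpret fp32 as uint32, round to nearest even on
// the 16 discarded bits, then keep the upper half. Plain truncation
// would simply be `u >> 16`. (NaN handling is omitted for brevity.)
uint16_t FloatToBF16(float f) {
  uint32_t u;
  std::memcpy(&u, &f, sizeof(u));
  uint32_t rounding_bias = 0x7FFF + ((u >> 16) & 1);  // round-to-nearest-even
  return static_cast<uint16_t>((u + rounding_bias) >> 16);
}

int main() {
  float x = 3.1415926f;
  uint16_t b = FloatToBF16(x);
  std::printf("%f -> 0x%04x -> %f\n", x, b, BF16ToFloat(b));
  return 0;
}
```

Since both directions are just shifts and reinterprets on integer lanes, a TIR-level legalization can emit them as ordinary vectorizable expressions rather than calls into an external runtime function.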