Hi all,
We have completed a workable draft of bfloat16 (bf16) support in TVM and the bf16-related codegen in LLVM.

We added bfloat16 as a new type named "bf16" in the frontend and completed the LLVM backend for generating bf16 code:

 * Use int16 as the storage type in LLVM
 * Add legalization to enable computations on bf16
 * Add runtime frontend support (e.g. allow converting a numpy uint16 array to a bf16 NDArray); a sketch of this path follows the list
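As a minimal sketch of that runtime path, assuming the dtype string is spelled "bfloat16" and using the existing `tvm.nd.empty` / `copyfrom` entry points (the exact API surface may differ from the final PR):

```python
import numpy as np
import tvm

# Truncate fp32 values to their upper 16 bits to get bf16 storage bits (uint16).
def fp32_to_bf16_bits(x):
    xf = np.ascontiguousarray(x, dtype=np.float32)
    return (xf.view(np.uint32) >> 16).astype(np.uint16)

src = np.random.rand(4).astype(np.float32)
bits = fp32_to_bf16_bits(src)                # numpy uint16 array holding bf16 bits
arr = tvm.nd.empty(bits.shape, "bfloat16")   # dtype string is an assumption
arr.copyfrom(bits)                           # reuse the uint16 buffer as bf16 storage
```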

# Details on legalization

Since most hardware has no native support for bf16 computation, we added a pass `BF16Legalization` that uses fp32 to compute on bf16 data. It inserts a `cast_to_fp32()` before each Op involving bf16 operands and uses the fp32 version of the Op to compute. Finally, it adds a `cast_to_bf16()` after each Op that is altered, e.g.

`add(a,b)` => `cast16(add(cast32(a), cast32(b)))`

We call this phase "BF16Promotion". It is a sub-pass of the `BF16Legalization` pass.
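To make the rewrite concrete, this is roughly the expression BF16Promotion produces for an `add` on bf16 operands, built by hand with `tvm.tir` (the "bfloat16" dtype string here is an assumption about the final frontend spelling):

```python
import tvm
from tvm import tir

a = tir.Var("a", "bfloat16")  # dtype string spelling is an assumption
b = tir.Var("b", "bfloat16")

# add(a, b)  =>  cast16(add(cast32(a), cast32(b)))
promoted = tir.Cast("bfloat16",
                    tir.Cast("float32", a) + tir.Cast("float32", b))
```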

Note that this introduces redundant casts, e.g.

`add(a, neg(b))` => `cast16(add(cast32(a), cast32(cast16(neg(cast32(b))))))`

The pattern `cast32(cast16(some_fp32_value))` can be simplified to 
`some_fp32_value`.

Thus, we add an optimization pass after "BF16Promotion" within the `BF16Legalization` pass to eliminate such redundant casts.
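As a toy illustration of the idea (not the actual TVM pass), a bottom-up rewrite over a tuple-encoded expression tree is enough to remove the `cast32(cast16(x))` pattern:

```python
# Expressions are nested tuples, e.g. ("cast32", ("cast16", ("neg", ("cast32", "b")))).
def eliminate_redundant_casts(expr):
    if not isinstance(expr, tuple):
        return expr
    op, *args = expr
    args = [eliminate_redundant_casts(arg) for arg in args]
    # cast32(cast16(x)) -> x, since x was an fp32 value to begin with
    if op == "cast32" and isinstance(args[0], tuple) and args[0][0] == "cast16":
        return args[0][1]
    return (op, *args)

before = ("cast16", ("add", ("cast32", "a"),
                     ("cast32", ("cast16", ("neg", ("cast32", "b"))))))
after = eliminate_redundant_casts(before)
# after == ("cast16", ("add", ("cast32", "a"), ("neg", ("cast32", "b"))))
```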

After the `BF16Legalization` pass, no bf16-related computation remains in the AST, except for casts between fp32 and bf16, bf16 value comparisons, and assignments.

# Casting between fp32 and bf16

We follow PyTorch's bf16 
[casting](https://github.com/pytorch/pytorch/blob/master/c10/util/BFloat16.h) 
implementation.
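For reference, PyTorch rounds the fp32 -> bf16 cast to nearest-even by adding a bias derived from the lowest surviving mantissa bit before truncating. A numpy sketch of that scheme (the helper names and the canonical-NaN choice here are ours, for illustration):

```python
import numpy as np

def fp32_to_bf16_rne(x):
    """fp32 -> bf16 bit pattern with round-to-nearest-even; NaN mapped to a quiet NaN."""
    xf = np.ascontiguousarray(x, dtype=np.float32)
    bits = xf.view(np.uint32)
    # Bias is 0x7FFF plus the lowest bit that survives truncation (ties-to-even).
    rounding_bias = ((bits >> 16) & np.uint32(1)) + np.uint32(0x7FFF)
    rounded = ((bits + rounding_bias) >> 16).astype(np.uint16)
    return np.where(np.isnan(xf), np.uint16(0x7FC0), rounded)

def bf16_to_fp32(bits):
    """bf16 bit pattern -> fp32 by shifting into the high 16 bits (exact)."""
    return (np.asarray(bits, dtype=np.uint16).astype(np.uint32) << 16).view(np.float32)
```

As a quick sanity check, `1.0` maps to `0x3F80`, and the halfway value `1.00390625` (fp32 bits `0x3F808000`) rounds to the even result `0x3F80`.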

# Pull request

[Here](https://github.com/apache/incubator-tvm/pull/5601)




