[PATCH] D132329: [X86][RFC] Using `__bf16` for AVX512_BF16 intrinsics

Phoebe Wang via Phabricator via cfe-commits Wed, 19 Oct 2022 06:39:15 -0700

pengfei added inline comments.


================
Comment at: clang/lib/Headers/avx512bf16intrin.h:13
 
+#ifdef __SSE2__
+
----------------
LuoYuanke wrote:
> What is this macro check used for?
`__bf16` is not available without SSE2. This is to make sure no error generated 
if user include <immintrin.h> 


================
Comment at: clang/test/CodeGen/X86/avx512bf16-error.c:14
+__bfloat16 bar(__bfloat16 a, __bfloat16 b) {
+  return a + b;
+}
----------------
LuoYuanke wrote:
> Need test for other operations (-, *, /) as well?
I don't think so. This is to check `__bfloat16` is deprecated. We should check 
them when enabling `__bf16` on SSE2.


================
Comment at: llvm/include/llvm/IR/IntrinsicsX86.td:4928
               Intrinsic<[llvm_v4f32_ty],
               [llvm_v4f32_ty, llvm_v4i32_ty, llvm_v4i32_ty], [IntrNoMem]>;
   def int_x86_avx512bf16_dpbf16ps_256:
----------------
LuoYuanke wrote:
> It seems we still use i32 to represent <2 x bf16>, but we don't have a better 
> way since 1 bit mask cover a pair of bf16 elements.
I think mask is not an issue because both the passthru and dst are <4 x float>.


================
Comment at: llvm/lib/IR/AutoUpgrade.cpp:4095
+        Intrinsic::x86_avx512bf16_mask_cvtneps2bf16_128)
+      Args[1] = Builder.CreateBitCast(
+          Args[1], FixedVectorType::get(Builder.getBFloatTy(), NumElts));
----------------
LuoYuanke wrote:
> Why there is no bitcast for the input for the other intrinsics? I expect to 
> see the bitcast from vXi16 to vXbf16.
Others don't have vXbf16 in inputs.


================
Comment at: llvm/lib/Target/X86/X86InstrAVX512.td:3916
+multiclass mask_move_lowering_f16_bf16<AVX512VLVectorVTInfo _> {
 let Predicates = [HasBWI] in {
+  def : Pat<(_.info512.VT (vselect VK32WM:$mask, (_.info512.VT VR512:$src1), 
(_.info512.VT VR512:$src0))),
----------------
LuoYuanke wrote:
> Not sure the indent is correct or not.
The format is chaos in td files, at least we have code using in this way :)


================
Comment at: llvm/test/CodeGen/X86/bfloat.ll:32
+; BF16-NEXT:    shll $16, %eax
+; BF16-NEXT:    vmovd %eax, %xmm1
+; BF16-NEXT:    vaddss %xmm0, %xmm1, %xmm0
----------------
LuoYuanke wrote:
> It seems the difference between SSE2 and BF16 is using SSE instruction or AVX 
> instruction. What do we expect to test for BF16?
This is to make sure the scalar type works under AVX512-BF16. We may optimize 
it with `vcvtneps2bf16` in future.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D132329/new/

https://reviews.llvm.org/D132329

_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[PATCH] D132329: [X86][RFC] Using `__bf16` for AVX512_BF16 intrinsics

Reply via email to