pengfei added inline comments.
================ Comment at: clang/lib/Headers/avx512bf16intrin.h:13 +#ifdef __SSE2__ + ---------------- LuoYuanke wrote: > What is this macro check used for? `__bf16` is not available without SSE2. This is to make sure no error generated if user include <immintrin.h> ================ Comment at: clang/test/CodeGen/X86/avx512bf16-error.c:14 +__bfloat16 bar(__bfloat16 a, __bfloat16 b) { + return a + b; +} ---------------- LuoYuanke wrote: > Need test for other operations (-, *, /) as well? I don't think so. This is to check `__bfloat16` is deprecated. We should check them when enabling `__bf16` on SSE2. ================ Comment at: llvm/include/llvm/IR/IntrinsicsX86.td:4928 Intrinsic<[llvm_v4f32_ty], [llvm_v4f32_ty, llvm_v4i32_ty, llvm_v4i32_ty], [IntrNoMem]>; def int_x86_avx512bf16_dpbf16ps_256: ---------------- LuoYuanke wrote: > It seems we still use i32 to represent <2 x bf16>, but we don't have a better > way since 1 bit mask cover a pair of bf16 elements. I think mask is not an issue because both the passthru and dst are <4 x float>. ================ Comment at: llvm/lib/IR/AutoUpgrade.cpp:4095 + Intrinsic::x86_avx512bf16_mask_cvtneps2bf16_128) + Args[1] = Builder.CreateBitCast( + Args[1], FixedVectorType::get(Builder.getBFloatTy(), NumElts)); ---------------- LuoYuanke wrote: > Why there is no bitcast for the input for the other intrinsics? I expect to > see the bitcast from vXi16 to vXbf16. Others don't have vXbf16 in inputs. ================ Comment at: llvm/lib/Target/X86/X86InstrAVX512.td:3916 +multiclass mask_move_lowering_f16_bf16<AVX512VLVectorVTInfo _> { let Predicates = [HasBWI] in { + def : Pat<(_.info512.VT (vselect VK32WM:$mask, (_.info512.VT VR512:$src1), (_.info512.VT VR512:$src0))), ---------------- LuoYuanke wrote: > Not sure the indent is correct or not. The format is chaos in td files, at least we have code using in this way :) ================ Comment at: llvm/test/CodeGen/X86/bfloat.ll:32 +; BF16-NEXT: shll $16, %eax +; BF16-NEXT: vmovd %eax, %xmm1 +; BF16-NEXT: vaddss %xmm0, %xmm1, %xmm0 ---------------- LuoYuanke wrote: > It seems the difference between SSE2 and BF16 is using SSE instruction or AVX > instruction. What do we expect to test for BF16? This is to make sure the scalar type works under AVX512-BF16. We may optimize it with `vcvtneps2bf16` in future. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D132329/new/ https://reviews.llvm.org/D132329 _______________________________________________ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits