pengfei added inline comments.

================
Comment at: llvm/test/CodeGen/X86/avx512bf16-intrinsics-upgrade.ll:30
 ; X64-NEXT: kmovd %edi, %k1 # encoding: [0xc5,0xfb,0x92,0xcf]
-; X64-NEXT: vcvtne2ps2bf16 %zmm1, %zmm0, %zmm0 {%k1} {z} # encoding: [0x62,0xf2,0x7f,0xc9,0x72,0xc1]
+; X64-NEXT: vmovdqu16 %zmm0, %zmm0 {%k1} {z} # encoding: [0x62,0xf1,0xff,0xc9,0x6f,0xc0]
 ; X64-NEXT: retq # encoding: [0xc3]
----------------
RKSimon wrote:
> pengfei wrote:
> > RKSimon wrote:
> > > any chance we can recover the predicated instruction?
> > It's possible, e.g., iterate all users of the intrinsic, bitcast all the
> > select operands as well; or add patterns for i16; or make vselect peek
> > through bitcast etc.
> > But I think the small performance regression is not a critical requirement
> > as the backward compatibility for the old intrinsics. It may not worth the
> > code complexity.
> OK - how come the mask_move_lowering_f16_bf16 refactoring in
> X86InstrAVX512.td didn't fix this?

The `mask_move_lowering_f16_bf16` refactoring has nothing to do with this. I think the problem is that after AutoUpgrade the IR becomes:
```
  %0 = tail call <32 x bfloat> @llvm.x86.avx512bf16.cvtne2ps2bf16.512(<16 x float> %A, <16 x float> %B)
  %1 = bitcast i32 %U to <32 x i1>
  %2 = bitcast <32 x bfloat> %0 to <32 x i16>
  %3 = select <32 x i1> %1, <32 x i16> %2, <32 x i16> zeroinitializer
  %4 = bitcast <32 x i16> %3 to <8 x i64>
  ret <8 x i64> %4
```
After the refactoring of X86InstrAVX512.td, the pattern we are able to match is:
```
  %0 = tail call <32 x bfloat> @llvm.x86.avx512bf16.cvtne2ps2bf16.512(<16 x float> %A, <16 x float> %B)
  ...
  ...
  %2 = select <32 x i1> %1, <32 x bfloat> %0, <32 x bfloat> zeroinitializer
```
So the upgraded IR, left as is, fails to match the predicated instruction.
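
To make the point concrete: the two IR shapes (select on `<32 x i16>` after a bitcast, versus select on `<32 x bfloat>`) produce identical bits, so the difference is purely in what instruction selection can pattern-match, not in semantics. Below is a minimal Python sketch of that equivalence. It is only a model under stated assumptions: each bfloat lane is represented by its raw 16-bit pattern, 4 lanes stand in for 32, and the names `BF16`, `upgraded_form`, and `matched_form` are hypothetical, not LLVM APIs.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class BF16:
    """Hypothetical model of one bfloat lane, stored as its raw 16-bit pattern."""
    bits: int


# +0.0 in bfloat16 is the all-zero bit pattern, matching `zeroinitializer`.
BF16_ZERO = BF16(0x0000)


def mask_to_lanes(mask: int, n: int) -> List[bool]:
    """Model `bitcast iN %U to <N x i1>`: bit i of the mask controls lane i."""
    return [bool((mask >> i) & 1) for i in range(n)]


def upgraded_form(mask: int, result: List[BF16]) -> List[int]:
    """Shape after AutoUpgrade: bitcast to i16 first, then select against i16 zero."""
    as_i16 = [lane.bits for lane in result]  # bitcast keeps the bits unchanged
    lanes = mask_to_lanes(mask, len(result))
    return [v if m else 0 for m, v in zip(lanes, as_i16)]


def matched_form(mask: int, result: List[BF16]) -> List[int]:
    """Shape the patterns match: select against bfloat zero, then bitcast to i16."""
    lanes = mask_to_lanes(mask, len(result))
    selected = [v if m else BF16_ZERO for m, v in zip(lanes, result)]
    return [lane.bits for lane in selected]


# Arbitrary bit patterns standing in for the intrinsic's result.
example = [BF16(0x3F80), BF16(0xC000), BF16(0x7FC0), BF16(0x0001)]
assert upgraded_form(0b0101, example) == matched_form(0b0101, example)
```

Since both forms agree for every mask, any of the options mentioned in the quoted discussion (bitcasting the select operands during upgrade, adding i16 patterns, or letting vselect peek through the bitcast) would preserve behavior; the open question is only whether the added complexity is worth it.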

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D132329/new/

https://reviews.llvm.org/D132329

_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits