pengfei added inline comments.
================ Comment at: llvm/lib/Target/X86/X86ISelLowering.cpp:616 + setOperationAction(ISD::FROUNDEVEN, MVT::f16, Promote); + setOperationAction(ISD::FP_ROUND, MVT::f16, Expand); + setOperationAction(ISD::FP_EXTEND, MVT::f32, Expand); ---------------- LuoYuanke wrote: > Just confused how to expand it. Will the expand fail and finally turns to > libcall? Yeah, we can use `LibCall` instead. ================ Comment at: llvm/lib/Target/X86/X86ISelLowering.cpp:763 + if (isTypeLegal(MVT::f16)) { + setOperationAction(ISD::FP_EXTEND, MVT::f80, Custom); + setOperationAction(ISD::STRICT_FP_EXTEND, MVT::f80, Custom); ---------------- LuoYuanke wrote: > Why f16 emulation affect f80 type? Are we checking isTypeLegal(MVT::f80)? It's in the scope of `if (UseX87)`. And we need to lower `fpext half %0 to x86_fp80`. ================ Comment at: llvm/lib/Target/X86/X86ISelLowering.cpp:22448 + if (SrcVT == MVT::f16) + return SDValue(); + ---------------- LuoYuanke wrote: > Why we don't extent to f32 here? Return `SDValue()` will extent later. This can save the code. ================ Comment at: llvm/lib/Target/X86/X86ISelLowering.cpp:22522 + if (!isScalarFPTypeInSSEReg(SrcVT) || + (SrcVT == MVT::f16 && !Subtarget.hasFP16())) return SDValue(); ---------------- LuoYuanke wrote: > Why we don't extent to f32 here? Will it be promoted finally? Yes. ================ Comment at: llvm/lib/Target/X86/X86ISelLowering.cpp:22765 + DAG.getIntPtrConstant(0, DL)); + Res = DAG.getNode(X86ISD::STRICT_CVTPS2PH, DL, {MVT::v8i16, MVT::Other}, + {Chain, Res, DAG.getTargetConstant(4, DL, MVT::i32)}); ---------------- LuoYuanke wrote: > Should MVT::v8i16 be MVT::v8f16? No. We use `MVT::v8i16` when we enabled F16C instructions. ================ Comment at: llvm/lib/Target/X86/X86ISelLowering.cpp:22775 + + Res = DAG.getNode(ISD::EXTRACT_VECTOR_ELT, DL, MVT::i16, Res, + DAG.getIntPtrConstant(0, DL)); ---------------- LuoYuanke wrote: > MVT::f16 and delete the bitcast? I don't think we have pattern to extract `f16` from `v8i16`. Besides, I think keeping the bitcast makes the flow clear. ================ Comment at: llvm/lib/Target/X86/X86InstrAVX512.td:1476 } -let Predicates = [HasFP16] in { +let Predicates = [HasBWI] in { def : Pat<(v32f16 (X86VBroadcastld16 addr:$src)), ---------------- LuoYuanke wrote: > If target don't have avx512bw feature. There is some other pattern to lower > the node or fp16 broadcast node is invalid? Good catch. Added in X86InstrSSE.td ================ Comment at: llvm/lib/Target/X86/X86InstrAVX512.td:4107 _.ExeDomain>, EVEX_4V, Sched<[SchedWriteFShuffle.XMM]>; + let Predicates = [prd] in { def rrkz : AVX512PI<0x10, MRMSrcReg, (outs _.RC:$dst), ---------------- LuoYuanke wrote: > Previous prd only apply to "def rr"? Is it a bug for previous code? No. previous code works well because no mask variants before AVX512 and no f16 before FP16. The latter is not true now. ================ Comment at: llvm/lib/Target/X86/X86InstrAVX512.td:4352 +defm : avx512_move_scalar_lowering<"VMOVSHZ", X86Movsh, fp16imm0, v8f16x_info>; +defm : avx512_store_scalar_lowering<"VMOVSHZ", avx512vl_f16_info, + (v32i1 (bitconvert (and GR32:$mask, (i32 1)))), GR32>; ---------------- LuoYuanke wrote: > Why previous code don't have predicates? Because no legal `f16` previously. ================ Comment at: llvm/lib/Target/X86/X86InstrAVX512.td:11657 +let Predicates = [HasBWI], AddedComplexity = -10 in { + def : Pat<(f16 (load addr:$src)), (COPY_TO_REGCLASS (VPINSRWZrm (v8i16 (IMPLICIT_DEF)), addr:$src, 0), FR16X)>; ---------------- LuoYuanke wrote: > Why set AddedComplexity to -10? There no such addtional complexity in > previous code. Add comments for it? We used it before, but very little. We need to make sure select FP16 instructions first if available. ================ Comment at: llvm/lib/Target/X86/X86InstrSSE.td:3970 +let Predicates = [UseSSE2], AddedComplexity = -10 in { + def : Pat<(f16 (load addr:$src)), (COPY_TO_REGCLASS (PINSRWrm (v8i16 (IMPLICIT_DEF)), addr:$src, 0), FR16)>; ---------------- LuoYuanke wrote: > Why AddedComplexity = -10? Add comments for it? This is to avoid FP16 instructions been overridden. ================ Comment at: llvm/lib/Target/X86/X86InstrSSE.td:3978 +let Predicates = [HasAVX, NoBWI], AddedComplexity = -10 in { + def : Pat<(f16 (load addr:$src)), (COPY_TO_REGCLASS (VPINSRWrm (v8i16 (IMPLICIT_DEF)), addr:$src, 0), FR16)>; + def : Pat<(i16 (bitconvert f16:$src)), (EXTRACT_SUBREG (VPEXTRWrr (v8i16 (COPY_TO_REGCLASS FR16:$src, VR128)), 0), sub_16bit)>; ---------------- LuoYuanke wrote: > Miss pattern for store? It's in line 5214. ================ Comment at: llvm/lib/Target/X86/X86InstrSSE.td:5214 + +let Predicates = [HasAVX, NoBWI] in + def : Pat<(store f16:$src, addr:$dst), (VPEXTRWmr addr:$dst, (v8i16 (COPY_TO_REGCLASS FR16:$src, VR128)), 0)>; ---------------- LuoYuanke wrote: > Why no AddedComplexity for it? We don't need it if no BWI. ================ Comment at: llvm/lib/Target/X86/X86RegisterInfo.td:540 +def FR16 : RegisterClass<"X86", [f16], 16, (add FR32)> {let Size = 32;} + ---------------- LuoYuanke wrote: > pengfei wrote: > > skan wrote: > > > The alignment is not same as the size? > > No. This is spill size instead of alignment. > When there is avx512fp16 feature, is the spill size still 32? Yes, it's more efficient to use movss that insert/extrct. And we also use `FR32X` for AVX512 targets without FP16. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D107082/new/ https://reviews.llvm.org/D107082 _______________________________________________ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits