arsenm added inline comments.
================ Comment at: llvm/lib/Target/AMDGPU/SIISelLowering.cpp:4819-4831 + // When we don't have 16 bit instructions, bf16 is illegal and gets + // softened to i16 for storage, with float being used for arithmetic. + // + // After softening, some i16 -> fp32 bf16_to_fp operations can be left over. + // Lower those to (f32 (fp_extend (f16 (bitconvert x)))) + if (!Op->getValueType(0).isFloatingPoint() || + Op->getOperand(0).getValueType() != MVT::i16) ---------------- Pierre-vh wrote: > arsenm wrote: > > Pierre-vh wrote: > > > arsenm wrote: > > > > Pierre-vh wrote: > > > > > arsenm wrote: > > > > > > Pierre-vh wrote: > > > > > > > arsenm wrote: > > > > > > > > Pierre-vh wrote: > > > > > > > > > arsenm wrote: > > > > > > > > > > The generic legalizer should have handled this? > > > > > > > > > It looks like those operations are not implemented in the > > > > > > > > > generic legalizer, e.g. I get > > > > > > > > > ``` > > > > > > > > > Do not know how to promote this operator's operand! > > > > > > > > > ``` > > > > > > > > Right, this is the code that would go there > > > > > > > Do I just copy/paste this code in that PromoteInt function, and > > > > > > > keep a copy here too in LowerOperation? (not really a fan of > > > > > > > copy-pasting code in different files, I'd rather keep it all here) > > > > > > > We need to have the lowering too AFAIK, it didn't go well when I > > > > > > > tried to remove it > > > > > > I'm not following why you need to handle it here > > > > > IIRC: > > > > > - I need to handle FP_TO_BF16 in ReplaceNodeResult because that's > > > > > what the Integer Legalizer calls (through CustomLowerNode) > > > > > - I need to handle both opcodes in LowerOperation because otherwise > > > > > they'll fail selection. They can be left over from > > > > > expanding/legalizing other operations. > > > > But why are they custom? We don't have to handle FP16_TO_FP or > > > > FP_TO_FP16 there, and they aren't custom lowered. They have the same > > > > basic properties. We have this: > > > > > > > > > > > > ``` > > > > setOperationAction(ISD::FP16_TO_FP, MVT::i16, Promote); > > > > AddPromotedToType(ISD::FP16_TO_FP, MVT::i16, MVT::i32); > > > > setOperationAction(ISD::FP_TO_FP16, MVT::i16, Promote); > > > > AddPromotedToType(ISD::FP_TO_FP16, MVT::i16, MVT::i32); > > > > ``` > > > > > > > > I'd expect the same basic pattern > > > PromoteIntegerOperand, PromoteFloatOperand and PromoteIntegerResult don't > > > handle FP_TO_BF16 and BF16_TO_FP, and unless we put a Custom lowering > > > mode it'll assert/unreachable. > > > I tried to make it work (for a while) using the default expand but I > > > can't quite get it to work. It feels like there is some legalizer work > > > missing for handling BF16 like we want to. > > > Even though it's not ideal I think the custom lowering is easiest > > What about Expand? that's where the implemented part is > Last I tried, Expand will emit a libcall in many cases that we don't handle Library call is supposed to be a distinct action now, the DAG only did about 5% of the work to migrate to using it. This code can go to the default expand action ================ Comment at: llvm/lib/Target/AMDGPU/SIISelLowering.cpp:5564 + SDLoc SL(Op); + const TargetLowering &TLI = DAG.getTargetLoweringInfo(); + ---------------- This is just this ================ Comment at: llvm/lib/Target/AMDGPU/SIISelLowering.cpp:5578 + DAG.getConstant(16, SL, + TLI.getShiftAmountTy(MVT::i32, DAG.getDataLayout()))}); + return DAG.getNode(ISD::TRUNCATE, SL, MVT::i16, Result); ---------------- can just hardcode i32 ================ Comment at: llvm/lib/Target/AMDGPU/SIISelLowering.cpp:5590 + SDLoc SL(Op); + const TargetLowering &TLI = DAG.getTargetLoweringInfo(); + ---------------- This is just this ================ Comment at: llvm/lib/Target/AMDGPU/SIISelLowering.cpp:5596 + ISD::SHL, SL, MVT::i32, + {DAG.getNode(ISD::ZERO_EXTEND, SL, MVT::i32, Op->getOperand(0)), + DAG.getConstant(16, SL, ---------------- This can be any_extend and the combiner will probably turn it into one ================ Comment at: llvm/lib/Target/AMDGPU/SIISelLowering.cpp:5598 + DAG.getConstant(16, SL, + TLI.getShiftAmountTy(MVT::i32, DAG.getDataLayout()))}); + Result = DAG.getBitcast(MVT::f32, Result); ---------------- Can just hardcode i32 ================ Comment at: llvm/test/CodeGen/AMDGPU/bf16.ll:11 +; than simple load/stores. + +define void @test_load_store(ptr addrspace(1) %in, ptr addrspace(1) %out) { ---------------- Doesn't cover the different f32/f64 conversions? ================ Comment at: llvm/test/CodeGen/AMDGPU/bf16.ll:2953 +; GFX10-NEXT: s_setpc_b64 s[30:31] + %ins.0 = insertvalue { <32 x i32>, bfloat } undef, <32 x i32> %b, 0 + %ins.1 = insertvalue { <32 x i32>, bfloat } %ins.0 ,bfloat %a, 1 ---------------- Use poison instead of undef in tests Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D139398/new/ https://reviews.llvm.org/D139398 _______________________________________________ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits