[PATCH] D139398: [AMDGPU] Add bf16 storage support

Pierre van Houtryve via Phabricator via cfe-commits Fri, 09 Dec 2022 01:32:02 -0800

Pierre-vh added inline comments.


================
Comment at: llvm/lib/Target/AMDGPU/SIISelLowering.cpp:913
+      else
+        RegisterVT = (ScalarVT == MVT::bf16 ? MVT::v2bf16 : MVT::v2f16);
       IntermediateVT = RegisterVT;
----------------
arsenm wrote:
> If you wanted the promote to i32, you could have done it here instead of in 
> the tablegen cc handling
Do you mean somewhere else in that function? Changing v2bf16 to i32 here 
doesn't fix it 
I also tried changing the function above but I kept running into asserts so I 
just left the TableGen CC for now


================
Comment at: llvm/lib/Target/AMDGPU/SIISelLowering.cpp:5563
+  return DAG.getNode(ISD::BITCAST, SL, MVT::i16,
+                     DAG.getFPExtendOrRound(Op->getOperand(0), SL, MVT::f16));
+}
----------------
arsenm wrote:
> Should be specific cast, not FPExtOrRound. I don't think the FP_ROUND case 
> would be correct
But we need to do f32 -> f16, isn't FP_ROUND used for that? I thought it's what 
we needed


================
Comment at: llvm/lib/Target/AMDGPU/SIISelLowering.cpp:5573-5576
+  SDLoc SL(Op);
+  return DAG.getNode(
+      ISD::FP_EXTEND, SL, MVT::f32,
+      DAG.getNode(ISD::BITCAST, SL, MVT::f16, Op->getOperand(0)));
----------------
arsenm wrote:
> ExpandNode covers lowering BF16_TO_FP. It also has a shift by 16-bits into 
> the high bits. Is this correct?
Ah I didn't know that, though as long as we use custom lowering, and our 
FP_TO_BF16/BF16_TO_FP methods are consistent, it should be fine, no?


================
Comment at: llvm/lib/Target/AMDGPU/SIISelLowering.cpp:4819-4831
+    // When we don't have 16 bit instructions, bf16 is illegal and gets
+    // softened to i16 for storage, with float being used for arithmetic.
+    //
+    // After softening, some i16 -> fp32 bf16_to_fp operations can be left 
over.
+    // Lower those to (f32 (fp_extend (f16 (bitconvert x))))
+    if (!Op->getValueType(0).isFloatingPoint() ||
+        Op->getOperand(0).getValueType() != MVT::i16)
----------------
arsenm wrote:
> Pierre-vh wrote:
> > arsenm wrote:
> > > Pierre-vh wrote:
> > > > arsenm wrote:
> > > > > Pierre-vh wrote:
> > > > > > arsenm wrote:
> > > > > > > The generic legalizer should have handled this?
> > > > > > It looks like those operations are not implemented in the generic 
> > > > > > legalizer, e.g. I get 
> > > > > > ``` 
> > > > > > Do not know how to promote this operator's operand!
> > > > > > ```
> > > > > Right, this is the code that would go there
> > > > Do I just copy/paste this code in that PromoteInt function, and keep a 
> > > > copy here too in LowerOperation? (not really a fan of copy-pasting code 
> > > > in different files, I'd rather keep it all here)
> > > > We need to have the lowering too AFAIK, it didn't go well when I tried 
> > > > to remove it
> > > I'm not following why you need to handle it here
> > IIRC:
> >  - I need to handle FP_TO_BF16 in ReplaceNodeResult because that's what the 
> > Integer Legalizer calls (through CustomLowerNode)
> >  - I need to handle both opcodes in LowerOperation because otherwise 
> > they'll fail selection. They can be left over from expanding/legalizing 
> > other operations.
> But why are they custom? We don't have to handle FP16_TO_FP or FP_TO_FP16 
> there, and they aren't custom lowered. They have the same basic properties. 
> We have this:
> 
> 
> ```
>     setOperationAction(ISD::FP16_TO_FP, MVT::i16, Promote);
>     AddPromotedToType(ISD::FP16_TO_FP, MVT::i16, MVT::i32);
>     setOperationAction(ISD::FP_TO_FP16, MVT::i16, Promote);
>     AddPromotedToType(ISD::FP_TO_FP16, MVT::i16, MVT::i32);
> ```
> 
> I'd expect the same basic pattern
PromoteIntegerOperand, PromoteFloatOperand and PromoteIntegerResult don't 
handle FP_TO_BF16 and BF16_TO_FP, and unless we put a Custom lowering mode 
it'll assert/unreachable.
I tried to make it work (for a while) using the default expand but I can't 
quite get it to work. It feels like there is some legalizer work missing for 
handling BF16 like we want to.
Even though it's not ideal I think the custom lowering is easiest


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D139398/new/

https://reviews.llvm.org/D139398

_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[PATCH] D139398: [AMDGPU] Add bf16 storage support

Reply via email to