[clang] [llvm] [AMDGPU] Add dot product patterns with saturating add (clamp) (PR #187945)

Matt Arsenault via cfe-commits Thu, 09 Apr 2026 10:48:34 -0700

================
@@ -2366,6 +2373,164 @@ bool 
AMDGPUCodeGenPrepareImpl::visitMbcntHi(IntrinsicInst &I) const {
   return tryReplaceWithWorkitemId(I, Wave);
 }
 
+/// Helper to match the dot4 pattern: mul(zext/sext <4 x i8>, zext/sext <4 x
+/// i8>) Returns true if pattern matches, sets A, B to the <4 x i8> sources and
+/// IsSigned based on whether sext was used.
+static bool matchDot4Pattern(Value *MulOp, Value *&A, Value *&B,
+                             bool &IsSigned) {
+  auto *Mul = dyn_cast<BinaryOperator>(MulOp);
+  if (!Mul || Mul->getOpcode() != Instruction::Mul)
+    return false;
+
+  // Check that result type is <4 x i32>
+  auto *MulTy = dyn_cast<FixedVectorType>(Mul->getType());
+  if (!MulTy || MulTy->getNumElements() != 4 ||
+      !MulTy->getElementType()->isIntegerTy(32))
+    return false;
+
+  Value *Src0 = Mul->getOperand(0);
+  Value *Src1 = Mul->getOperand(1);
+
+  // Match zext <4 x i8> or sext <4 x i8>
+  auto matchExtend = [](Value *V, Value *&Src, bool &Signed) -> bool {
+    if (auto *ZExt = dyn_cast<ZExtInst>(V)) {
+      auto *SrcTy = dyn_cast<FixedVectorType>(ZExt->getSrcTy());
+      if (SrcTy && SrcTy->getNumElements() == 4 &&
+          SrcTy->getElementType()->isIntegerTy(8)) {
----------------
arsenm wrote:


This could probably be a bit more tolerant of smaller bitwidths. Could 
computeNumSignBits, but best left for later 

https://github.com/llvm/llvm-project/pull/187945
_______________________________________________
cfe-commits mailing list
[email protected]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[clang] [llvm] [AMDGPU] Add dot product patterns with saturating add (clamp) (PR #187945)

Reply via email to