aemerson added a comment. Herald added a subscriber: jplehr. @dmgreen I've been looking at this test again trying to see what's missing. The problem now is that only a VF of 4 is chosen. In the good case, instcombine/simplifyCFG runs so that it simplifies down to an `smin` intrinsic. After this change `__SSAT()` is inlined first. We then have:
target datalayout = "e-m:e-i8:8:32-i16:16:32-i64:64-i128:128-n32:64-S128" target triple = "aarch64-linux-gnu" define void @arm_mult_q15(ptr %pSrcA, ptr %pSrcB, ptr noalias %pDst, i32 %blockSize) { entry: br label %while.cond while.cond: ; preds = %while.body, %entry %pSrcB.addr.0 = phi ptr [ %pSrcB, %entry ], [ %incdec.ptr1, %while.body ] %pDst.addr.0 = phi ptr [ %pDst, %entry ], [ %incdec.ptr4, %while.body ] %pSrcA.addr.0 = phi ptr [ %pSrcA, %entry ], [ %incdec.ptr, %while.body ] %blkCnt.0 = phi i32 [ %blockSize, %entry ], [ %dec, %while.body ] %cmp.not = icmp eq i32 %blkCnt.0, 0 br i1 %cmp.not, label %while.end, label %while.body while.body: ; preds = %while.cond %incdec.ptr = getelementptr inbounds i16, ptr %pSrcA.addr.0, i32 1 %0 = load i16, ptr %pSrcA.addr.0, align 2 %conv = sext i16 %0 to i32 %incdec.ptr1 = getelementptr inbounds i16, ptr %pSrcB.addr.0, i32 1 %1 = load i16, ptr %pSrcB.addr.0, align 2 %conv2 = sext i16 %1 to i32 %mul = mul nsw i32 %conv, %conv2 %shr = ashr i32 %mul, 15 %cmp4.i = icmp sgt i32 %shr, 32767 %switch.i = icmp ult i1 %cmp4.i, true %spec.select.i = select i1 %switch.i, i32 %shr, i32 32767 %conv3 = trunc i32 %spec.select.i to i16 %incdec.ptr4 = getelementptr inbounds i16, ptr %pDst.addr.0, i32 1 store i16 %conv3, ptr %pDst.addr.0, align 2 %dec = add i32 %blkCnt.0, -1 br label %while.cond while.end: ; preds = %while.cond ret void } These instructions are from the callee that should now be combined into `smin`: %cmp4.i = icmp sgt i32 %shr, 32767 %switch.i = icmp ult i1 %cmp4.i, true %spec.select.i = select i1 %switch.i, i32 %shr, i32 32767 ... except due to the surrounding instructions, the first icmp is optimized into `icmp sgt i32 %mul, 1073741823` by `InstCombinerImpl::foldICmpInstWithConstant()` This breaks the smin recognition. I'm not sure what the best approach is to fix this. InstCombine already has this chunk of code to try to avoid messing with compares that might form min/max patterns but it expects further simplification to fire: // Test if the ICmpInst instruction is used exclusively by a select as // part of a minimum or maximum operation. If so, refrain from doing // any other folding. This helps out other analyses which understand // non-obfuscated minimum and maximum idioms, such as ScalarEvolution // and CodeGen. And in this case, at least one of the comparison // operands has at least one user besides the compare (the select), // which would often largely negate the benefit of folding anyway. // // Do the same for the other patterns recognized by matchSelectPattern. if (I.hasOneUse()) if (SelectInst *SI = dyn_cast<SelectInst>(I.user_back())) { Value *A, *B; SelectPatternResult SPR = matchSelectPattern(SI, A, B); if (SPR.Flavor != SPF_UNKNOWN) return nullptr; } Any ideas? I'd really like to get this inliner change in because it's fundamentally a good change to have. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D143624/new/ https://reviews.llvm.org/D143624 _______________________________________________ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits