[llvm] [clang] [clang-tools-extra] [LoopVectorize] Refine runtime memory check costs when there is an outer loop (PR #76034)

2024-01-18 Thread Rin Dobrescu via cfe-commits


@@ -2076,16 +2081,61 @@ class GeneratedRTChecks {
         LLVM_DEBUG(dbgs() << "  " << C << "  for " << I << "\n");
         RTCheckCost += C;
       }
-    if (MemCheckBlock)
+    if (MemCheckBlock) {
+      InstructionCost MemCheckCost = 0;
       for (Instruction &I : *MemCheckBlock) {
         if (MemCheckBlock->getTerminator() == &I)
           continue;
         InstructionCost C =
             TTI->getInstructionCost(&I, TTI::TCK_RecipThroughput);
         LLVM_DEBUG(dbgs() << "  " << C << "  for " << I << "\n");
-        RTCheckCost += C;
+        MemCheckCost += C;
       }
 
+      // If the runtime memory checks are being created inside an outer loop
+      // we should find out if these checks are outer loop invariant. If so,
+      // the checks will likely be hoisted out and so the effective cost will
+      // reduce according to the outer loop trip count.
+      if (OuterLoop) {
+        ScalarEvolution *SE = MemCheckExp.getSE();
+        // TODO: We could refine this further by analysing every individual
+        // memory check, since there could be a mixture of loop variant and
+        // invariant checks that mean the final condition is variant. However,
+        // I think it would need further analysis to prove this is beneficial.
+        const SCEV *Cond = SE->getSCEV(MemRuntimeCheckCond);
+        if (SE->isLoopInvariant(Cond, OuterLoop)) {
+          // It seems reasonable to assume that we can reduce the effective
+          // cost of the checks even when we know nothing about the trip
+          // count. Here I've assumed that the outer loop executes at least
+          // twice.
+          unsigned BestTripCount = 2;
+
+          // If exact trip count is known use that.
+          if (unsigned SmallTC = SE->getSmallConstantTripCount(OuterLoop))
+            BestTripCount = SmallTC;
+          else if (LoopVectorizeWithBlockFrequency) {
+            // Else use profile data if available.
+            if (auto EstimatedTC = getLoopEstimatedTripCount(OuterLoop))
+              BestTripCount = *EstimatedTC;
+          }
+
+          InstructionCost NewMemCheckCost = MemCheckCost / BestTripCount;
+
+          // Let's ensure the cost is always at least 1.
+          NewMemCheckCost = std::max(*NewMemCheckCost.getValue(), (long)1);

Rin18 wrote:

There's a buildbot failure at this line. Has it been fixed? It might be worth
re-triggering the build.
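
For readers following the thread, here is a minimal standalone sketch of the cost scaling in the hunk above. It is not the patch itself: `CostType`, `pickBestTripCount` and `effectiveMemCheckCost` are hypothetical names, plain integers stand in for `InstructionCost`, and the `LoopVectorizeWithBlockFrequency` flag plus the profile lookup are folded into one optional argument. The thread does not say what the buildbot failure was; the sketch simply keeps both `std::max` operands the same type, which avoids a template deduction error on platforms where the 64-bit cost type is not `long`.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <optional>

// Hypothetical stand-in for the integer type behind InstructionCost.
using CostType = std::int64_t;

// Hypothetical helper mirroring the trip-count choice in the hunk:
// a known constant trip count wins, then a profile estimate, and
// otherwise the outer loop is assumed to run twice.
// constantTC == 0 means "no constant trip count known".
unsigned pickBestTripCount(unsigned constantTC,
                           std::optional<unsigned> profileTC) {
  unsigned best = 2;
  if (constantTC != 0)
    best = constantTC;
  else if (profileTC)
    best = *profileTC;
  return best;
}

// Divides the cost of outer-loop-invariant checks by the chosen trip
// count and clamps the result so it never drops below 1.
CostType effectiveMemCheckCost(CostType memCheckCost, unsigned constantTC,
                               std::optional<unsigned> profileTC) {
  unsigned tripCount = pickBestTripCount(constantTC, profileTC);
  // Assumption of this sketch: a zero estimate is never passed in.
  assert(tripCount >= 1 && "trip count must be positive");
  CostType scaled = memCheckCost / static_cast<CostType>(tripCount);
  // Both operands are CostType, so std::max deduces a single type.
  return std::max<CostType>(scaled, 1);
}
```

With a check cost of 10, a constant trip count of 4 gives 2, no trip count information gives 5 (the default of two iterations), and a profile estimate of 100 gives 1 because of the clamp.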

https://github.com/llvm/llvm-project/pull/76034
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[llvm] [clang] [clang-tools-extra] [LoopVectorize] Refine runtime memory check costs when there is an outer loop (PR #76034)

2024-01-18 Thread Rin Dobrescu via cfe-commits

https://github.com/Rin18 edited https://github.com/llvm/llvm-project/pull/76034


[clang] [llvm] [clang-tools-extra] [LoopVectorize] Refine runtime memory check costs when there is an outer loop (PR #76034)

2024-01-18 Thread Rin Dobrescu via cfe-commits

https://github.com/Rin18 commented:

One small comment, but otherwise LGTM! I'll leave it to someone more familiar
with the code to approve the change.

https://github.com/llvm/llvm-project/pull/76034