[llvm-branch-commits] [llvm] use default intrinsic attrs for BPF packet loads (PR #105314)

2024-08-21 Thread Nikita Popov via llvm-branch-commits

https://github.com/nikic approved this pull request.

LGTM

https://github.com/llvm/llvm-project/pull/105314
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] release/19.x: [AMDGPU] Disable inline constants for pseudo scalar transcendentals (#104395) (PR #105472)

2024-08-21 Thread Jay Foad via llvm-branch-commits

https://github.com/jayfoad approved this pull request.


https://github.com/llvm/llvm-project/pull/105472


[llvm-branch-commits] [mlir] [mlir][OpenMP] Convert reduction alloc region to LLVMIR (PR #102524)

2024-08-21 Thread Tom Eccles via llvm-branch-commits


@@ -594,45 +594,85 @@ convertOmpOrderedRegion(Operation &opInst, llvm::IRBuilderBase &builder,
 
 /// Allocate space for privatized reduction variables.
 template 
-static void allocByValReductionVars(
-T loop, ArrayRef reductionArgs, llvm::IRBuilderBase &builder,
-LLVM::ModuleTranslation &moduleTranslation,
-llvm::OpenMPIRBuilder::InsertPointTy &allocaIP,
-SmallVectorImpl &reductionDecls,
-SmallVectorImpl &privateReductionVariables,
-DenseMap &reductionVariableMap,
-llvm::ArrayRef isByRefs) {
+static LogicalResult
+allocReductionVars(T loop, ArrayRef reductionArgs,
+   llvm::IRBuilderBase &builder,
+   LLVM::ModuleTranslation &moduleTranslation,
+   llvm::OpenMPIRBuilder::InsertPointTy &allocaIP,
+   SmallVectorImpl &reductionDecls,
+   SmallVectorImpl &privateReductionVariables,
+   DenseMap &reductionVariableMap,
+   llvm::ArrayRef isByRefs) {
   llvm::IRBuilderBase::InsertPointGuard guard(builder);
   builder.SetInsertPoint(allocaIP.getBlock()->getTerminator());
 
+  // delay creating stores until after all allocas
+  SmallVector> storesToCreate;
+  storesToCreate.reserve(loop.getNumReductionVars());
+
   for (std::size_t i = 0; i < loop.getNumReductionVars(); ++i) {
-if (isByRefs[i])
-  continue;
-llvm::Value *var = builder.CreateAlloca(
-moduleTranslation.convertType(reductionDecls[i].getType()));
-moduleTranslation.mapValue(reductionArgs[i], var);
-privateReductionVariables[i] = var;
-reductionVariableMap.try_emplace(loop.getReductionVars()[i], var);
+Region &allocRegion = reductionDecls[i].getAllocRegion();
+if (isByRefs[i]) {
+  if (allocRegion.empty())

tblah wrote:

The alloc region is optional. If it is omitted, the allocation can still be
done in the initialization region as normal. This could happen, for example,
if no part of the allocation is on the stack (because we don't want a call to
malloc mixed into the middle of the allocas).
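
To illustrate the ordering concern, here is a minimal sketch in plain C++. It
is a toy model, not the MLIR or LLVM API: `ToyBlock`, `ToyReduction`, and
`emitAllocasThenStores` are hypothetical names made up for this example. It
assumes that a reduction without an alloc region leaves its allocation to the
initialization region, and it defers all stores until every alloca has been
emitted.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Toy stand-in for an instruction stream (hypothetical, not an LLVM type).
struct ToyBlock {
  std::vector<std::string> insts;
};

// Hypothetical description of one reduction variable.
struct ToyReduction {
  std::string name;
  bool hasAllocRegion; // false models a declaration that omits the alloc region
};

// Emit every stack allocation first and only then the deferred stores, so
// that no other instruction ends up between two allocas.
inline ToyBlock emitAllocasThenStores(const std::vector<ToyReduction> &reds) {
  ToyBlock block;
  std::vector<std::pair<std::string, std::string>> storesToCreate;
  storesToCreate.reserve(reds.size());

  for (std::size_t i = 0; i < reds.size(); ++i) {
    if (!reds[i].hasAllocRegion)
      continue; // assumption: allocation is handled by the init region instead
    block.insts.push_back("alloca " + reds[i].name);
    storesToCreate.emplace_back(reds[i].name + ".init", reds[i].name);
  }
  for (std::size_t i = 0; i < storesToCreate.size(); ++i)
    block.insts.push_back("store " + storesToCreate[i].first + " -> " +
                          storesToCreate[i].second);
  return block;
}
```

In this model a reduction with no alloc region contributes nothing to the
alloca block, which matches the comment's point that its allocation (for
example a heap allocation) would live elsewhere.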

https://github.com/llvm/llvm-project/pull/102524


[llvm-branch-commits] [llvm] [FixIrreducible] Use CycleInfo instead of a custom SCC traversal (PR #101386)

2024-08-21 Thread Sameer Sahasrabuddhe via llvm-branch-commits

https://github.com/ssahasra edited 
https://github.com/llvm/llvm-project/pull/101386


[llvm-branch-commits] [mlir] [MLIR][omp] Add omp.workshare op (PR #101443)

2024-08-21 Thread Tom Eccles via llvm-branch-commits

https://github.com/tblah approved this pull request.

LGTM. Thanks for the updates

https://github.com/llvm/llvm-project/pull/101443


[llvm-branch-commits] [llvm] [FixIrreducible] Use CycleInfo instead of a custom SCC traversal (PR #101386)

2024-08-21 Thread Sameer Sahasrabuddhe via llvm-branch-commits


@@ -189,6 +195,21 @@ template  class GenericCycle {
   //@{
   using const_entry_iterator =
   typename SmallVectorImpl::const_iterator;
+  const_entry_iterator entry_begin() const {
+return const_entry_iterator{Entries.begin()};

ssahasra wrote:

Fixed.

https://github.com/llvm/llvm-project/pull/101386


[llvm-branch-commits] [llvm] [FixIrreducible] Use CycleInfo instead of a custom SCC traversal (PR #101386)

2024-08-21 Thread Sameer Sahasrabuddhe via llvm-branch-commits


@@ -107,6 +107,12 @@ template  class GenericCycle {
 return is_contained(Entries, Block);
   }
 
+  /// \brief Replace all entries with \p Block as single entry.
+  void setSingleEntry(BlockT *Block) {
+Entries.clear();
+Entries.push_back(Block);

ssahasra wrote:

Fixed.

https://github.com/llvm/llvm-project/pull/101386


[llvm-branch-commits] [llvm] [FixIrreducible] Use CycleInfo instead of a custom SCC traversal (PR #101386)

2024-08-21 Thread Sameer Sahasrabuddhe via llvm-branch-commits

ssahasra wrote:

> This needs a finer method that redirects only specific edges. Either that, or 
> we let the pass destroy some cycles. But updating `CycleInfo` for these 
> missing subcycles may be a fair amount of work too, so I would rather do it 
> the right way.

This now depends on the newly refactored ControlFlowHub, which correctly 
reroutes only the relevant edges. The effect was already caught in an existing 
test with nested cycles and a common header, so no new test needs to be written 
for this.
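
As a rough illustration of what "rerouting only the relevant edges" means,
the following sketch works on a toy edge list. It is not the ControlFlowHub
interface: `Edge` and `rerouteThroughHub` are hypothetical names, and the
model assumes that the caller already knows which edges must go through the
new hub node.

```cpp
#include <cstddef>
#include <set>
#include <string>
#include <utility>
#include <vector>

using Edge = std::pair<std::string, std::string>; // (from, to)

// Redirect only the edges listed in toReroute through the hub node. Every
// other edge is copied unchanged, so unrelated cycles keep their shape.
inline std::vector<Edge> rerouteThroughHub(const std::vector<Edge> &edges,
                                           const std::set<Edge> &toReroute,
                                           const std::string &hub) {
  std::vector<Edge> result;
  std::set<std::string> hubTargets;
  for (std::size_t i = 0; i < edges.size(); ++i) {
    if (toReroute.count(edges[i])) {
      result.emplace_back(edges[i].first, hub); // source now branches to hub
      hubTargets.insert(edges[i].second);       // hub must reach old target
    } else {
      result.push_back(edges[i]);
    }
  }
  // One outgoing hub edge per distinct original target.
  for (std::set<std::string>::const_iterator it = hubTargets.begin();
       it != hubTargets.end(); ++it)
    result.emplace_back(hub, *it);
  return result;
}
```

An edge such as `C -> H` that is not in the selected set keeps pointing
directly at `H`, which is the property the quoted concern asks for.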

https://github.com/llvm/llvm-project/pull/101386


[llvm-branch-commits] [llvm] [FixIrreducible] Use CycleInfo instead of a custom SCC traversal (PR #101386)

2024-08-21 Thread Sameer Sahasrabuddhe via llvm-branch-commits

ssahasra wrote:

> Note that I have not yet finished verifying all the lit tests. I might also 
> have to add a few more tests, especially involving a mix of irreducible and 
> reducible cycles that are siblings and/or nested inside each other in various 
> combinations. Especially with some overlap in the entry and header nodes.

- New tests added that involve nesting with common header or entry nodes. 
Existing tests also covered some relevant combinations.
- Verified all tests.

https://github.com/llvm/llvm-project/pull/101386


[llvm-branch-commits] [llvm] [FixIrreducible] Use CycleInfo instead of a custom SCC traversal (PR #101386)

2024-08-21 Thread Sameer Sahasrabuddhe via llvm-branch-commits

https://github.com/ssahasra edited 
https://github.com/llvm/llvm-project/pull/101386


[llvm-branch-commits] [llvm] [FixIrreducible] Use CycleInfo instead of a custom SCC traversal (PR #101386)

2024-08-21 Thread Matt Arsenault via llvm-branch-commits

https://github.com/arsenm approved this pull request.


https://github.com/llvm/llvm-project/pull/101386


[llvm-branch-commits] [flang] [flang][omp] Emit omp.workshare in frontend (PR #101444)

2024-08-21 Thread Tom Eccles via llvm-branch-commits

tblah wrote:

> Should we have a `-use-experimental-workshare` or similar flag to facilitate 
> some temporary in-tree development as this may require more moving pieces?

A flag like that sounds appropriate, yes. The current code changes look good.

https://github.com/llvm/llvm-project/pull/101444


[llvm-branch-commits] [clang] Backport taint analysis slowdown regression fix (PR #105516)

2024-08-21 Thread Balazs Benics via llvm-branch-commits

https://github.com/steakhal created 
https://github.com/llvm/llvm-project/pull/105516

Same as the cherry-picked commit + the release notes.

From 1d10df6937e914e610da9c5818ba09ff711beb05 Mon Sep 17 00:00:00 2001
From: Balazs Benics 
Date: Wed, 21 Aug 2024 14:24:56 +0200
Subject: [PATCH 1/2] [analyzer] Limit `isTainted()` by skipping complicated
 symbols (#105493)

As discussed in

https://discourse.llvm.org/t/rfc-make-istainted-and-complex-symbols-friends/79570/10

Some `isTainted()` queries can blow up the analysis times, and
effectively halt the analysis under specific workloads.

We don't really have the time now to do a caching re-implementation of
`isTainted()`, so we need to work around the case.

The workaround with the smallest blast radius was to limit which symbols
`isTainted()` queries (by walking the SymExpr). So far, the
threshold 10 worked for us, but this value can be overridden using the
"max-tainted-symbol-complexity" config value.

This new option is "deprecated" from the get-go, as I expect this issue
to be fixed within the next few months and I don't want users to
override this value anyway. If they do, this message will let them know
that they are on their own, and the next release may break them (as we
will no longer recognize this option once we drop it).
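
The idea of the workaround can be sketched with a toy expression tree. This
is only an illustration under stated assumptions, not the analyzer's code:
`ToySym`, `leaf`, `binop`, `walkForTaint`, and `isTaintedBounded` are
hypothetical names, and complexity is assumed here to be a plain node count,
which may differ from how the analyzer computes it.

```cpp
#include <memory>
#include <utility>

// Toy symbolic expression: a leaf symbol or a binary operation.
// Simplified stand-in, not clang's SymExpr hierarchy.
struct ToySym {
  bool taintedLeaf = false;
  std::shared_ptr<ToySym> lhs, rhs; // both null for a leaf

  // Assumed complexity measure: number of nodes in the expression tree.
  unsigned computeComplexity() const {
    unsigned n = 1;
    if (lhs)
      n += lhs->computeComplexity();
    if (rhs)
      n += rhs->computeComplexity();
    return n;
  }
};

inline std::shared_ptr<ToySym> leaf(bool tainted) {
  std::shared_ptr<ToySym> s = std::make_shared<ToySym>();
  s->taintedLeaf = tainted;
  return s;
}

inline std::shared_ptr<ToySym> binop(std::shared_ptr<ToySym> l,
                                     std::shared_ptr<ToySym> r) {
  std::shared_ptr<ToySym> s = std::make_shared<ToySym>();
  s->lhs = std::move(l);
  s->rhs = std::move(r);
  return s;
}

// Unbounded walk over all sub-symbols; cost grows with the expression size.
inline bool walkForTaint(const ToySym &sym) {
  if (sym.taintedLeaf)
    return true;
  return (sym.lhs && walkForTaint(*sym.lhs)) ||
         (sym.rhs && walkForTaint(*sym.rhs));
}

// Bounded query: symbols above the threshold are reported as untainted
// without being walked, trading precision for analysis time.
inline bool isTaintedBounded(const ToySym &sym, unsigned maxComplexity) {
  if (sym.computeComplexity() > maxComplexity)
    return false;
  return walkForTaint(sym);
}
```

The trade-off is visible in the model: a large expression built on a tainted
leaf is reported as untainted once it crosses the threshold, which is the
kind of lost warning the patch's FIXME in the test file acknowledges.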

Mitigates #89720

CPP-5414

(cherry picked from commit 848658955a9d2d42ea3e319d191e2dcd5d76c837)
---
 .../StaticAnalyzer/Core/AnalyzerOptions.def   |  5 ++
 clang/lib/StaticAnalyzer/Checkers/Taint.cpp   |  7 +++
 clang/test/Analysis/analyzer-config.c |  1 +
 clang/test/Analysis/taint-generic.c   | 49 ++-
 4 files changed, 61 insertions(+), 1 deletion(-)

diff --git a/clang/include/clang/StaticAnalyzer/Core/AnalyzerOptions.def b/clang/include/clang/StaticAnalyzer/Core/AnalyzerOptions.def
index 29aa6a3b8a16e7..737bc8e86cfb6a 100644
--- a/clang/include/clang/StaticAnalyzer/Core/AnalyzerOptions.def
+++ b/clang/include/clang/StaticAnalyzer/Core/AnalyzerOptions.def
@@ -407,6 +407,11 @@ ANALYZER_OPTION(
 ANALYZER_OPTION(unsigned, MaxSymbolComplexity, "max-symbol-complexity",
 "The maximum complexity of symbolic constraint.", 35)
 
+// HACK:https://discourse.llvm.org/t/rfc-make-istainted-and-complex-symbols-friends/79570
+// Ideally, we should get rid of this option soon.
+ANALYZER_OPTION(unsigned, MaxTaintedSymbolComplexity, "max-tainted-symbol-complexity",
+"[DEPRECATED] The maximum complexity of a symbol to carry taint", 9)
+
 ANALYZER_OPTION(unsigned, MaxTimesInlineLarge, "max-times-inline-large",
 "The maximum times a large function could be inlined.", 32)
 
diff --git a/clang/lib/StaticAnalyzer/Checkers/Taint.cpp b/clang/lib/StaticAnalyzer/Checkers/Taint.cpp
index 6362c82b009d72..0bb5739db4b756 100644
--- a/clang/lib/StaticAnalyzer/Checkers/Taint.cpp
+++ b/clang/lib/StaticAnalyzer/Checkers/Taint.cpp
@@ -12,6 +12,7 @@
 
 #include "clang/StaticAnalyzer/Checkers/Taint.h"
 #include "clang/StaticAnalyzer/Core/BugReporter/BugReporter.h"
+#include "clang/StaticAnalyzer/Core/PathSensitive/AnalysisManager.h"
 #include "clang/StaticAnalyzer/Core/PathSensitive/ProgramStateTrait.h"
 #include 
 
@@ -256,6 +257,12 @@ std::vector taint::getTaintedSymbolsImpl(ProgramStateRef State,
   if (!Sym)
 return TaintedSymbols;
 
+  // HACK:https://discourse.llvm.org/t/rfc-make-istainted-and-complex-symbols-friends/79570
+  if (const auto &Opts = State->getAnalysisManager().getAnalyzerOptions();
+  Sym->computeComplexity() > Opts.MaxTaintedSymbolComplexity) {
+return {};
+  }
+
   // Traverse all the symbols this symbol depends on to see if any are tainted.
   for (SymbolRef SubSym : Sym->symbols()) {
 if (!isa(SubSym))
diff --git a/clang/test/Analysis/analyzer-config.c b/clang/test/Analysis/analyzer-config.c
index 2a4c40005a4dc0..1ee0d8e4eecebd 100644
--- a/clang/test/Analysis/analyzer-config.c
+++ b/clang/test/Analysis/analyzer-config.c
@@ -96,6 +96,7 @@
 // CHECK-NEXT: max-inlinable-size = 100
 // CHECK-NEXT: max-nodes = 225000
 // CHECK-NEXT: max-symbol-complexity = 35
+// CHECK-NEXT: max-tainted-symbol-complexity = 9
 // CHECK-NEXT: max-times-inline-large = 32
 // CHECK-NEXT: min-cfg-size-treat-functions-as-large = 14
 // CHECK-NEXT: mode = deep
diff --git a/clang/test/Analysis/taint-generic.c b/clang/test/Analysis/taint-generic.c
index b0df85f237298d..1c139312734bca 100644
--- a/clang/test/Analysis/taint-generic.c
+++ b/clang/test/Analysis/taint-generic.c
@@ -63,6 +63,7 @@ void clang_analyzer_isTainted_char(char);
 void clang_analyzer_isTainted_wchar(wchar_t);
 void clang_analyzer_isTainted_charp(char*);
 void clang_analyzer_isTainted_int(int);
+void clang_analyzer_dump_int(int);
 
 int coin();
 
@@ -459,7 +460,53 @@ unsigned radar11369570_hanging(const unsigned char *arr, int l) {
 longcmp(a, t, c);
 l -= 12;
   }
-  return 5/a; // expected-warning {{Division by a tainted value, possibly zero}}
+  return 5/a; // FIXME: Should be a "div by

[llvm-branch-commits] [clang] Backport taint analysis slowdown regression fix (PR #105516)

2024-08-21 Thread Balazs Benics via llvm-branch-commits

https://github.com/steakhal milestoned 
https://github.com/llvm/llvm-project/pull/105516


[llvm-branch-commits] [clang] Backport taint analysis slowdown regression fix (PR #105516)

2024-08-21 Thread via llvm-branch-commits

llvmbot wrote:




@llvm/pr-subscribers-clang-static-analyzer-1

Author: Balazs Benics (steakhal)


Changes

Same as the cherry-picked commit + the release notes.

---
Full diff: https://github.com/llvm/llvm-project/pull/105516.diff


5 Files Affected:

- (modified) clang/docs/ReleaseNotes.rst (+5) 
- (modified) clang/include/clang/StaticAnalyzer/Core/AnalyzerOptions.def (+5) 
- (modified) clang/lib/StaticAnalyzer/Checkers/Taint.cpp (+7) 
- (modified) clang/test/Analysis/analyzer-config.c (+1) 
- (modified) clang/test/Analysis/taint-generic.c (+48-1) 


``diff
diff --git a/clang/docs/ReleaseNotes.rst b/clang/docs/ReleaseNotes.rst
index 17ddbfe910f878..fa69fcb9aa29bf 100644
--- a/clang/docs/ReleaseNotes.rst
+++ b/clang/docs/ReleaseNotes.rst
@@ -1391,6 +1391,11 @@ Crash and bug fixes
 - Z3 crosschecking (aka. Z3 refutation) is now bounded, and can't consume
   more total time than the eymbolic execution itself. (#GH97298)
 
+- In clang-18, we regressed in terms of analysis time for projects having many
+  nested loops with buffer indexing or shifting or other binary operations.
+  For example, functions computing different hash values. Some of this slowdown
+  was attributed to taint analysis, which is fixed now. (#GH105493)
+
 - ``std::addressof``, ``std::as_const``, ``std::forward``,
   ``std::forward_like``, ``std::move``, ``std::move_if_noexcept``, are now
   modeled just like their builtin counterpart. (#GH94193)
diff --git a/clang/include/clang/StaticAnalyzer/Core/AnalyzerOptions.def b/clang/include/clang/StaticAnalyzer/Core/AnalyzerOptions.def
index 29aa6a3b8a16e7..737bc8e86cfb6a 100644
--- a/clang/include/clang/StaticAnalyzer/Core/AnalyzerOptions.def
+++ b/clang/include/clang/StaticAnalyzer/Core/AnalyzerOptions.def
@@ -407,6 +407,11 @@ ANALYZER_OPTION(
 ANALYZER_OPTION(unsigned, MaxSymbolComplexity, "max-symbol-complexity",
 "The maximum complexity of symbolic constraint.", 35)
 
+// HACK:https://discourse.llvm.org/t/rfc-make-istainted-and-complex-symbols-friends/79570
+// Ideally, we should get rid of this option soon.
+ANALYZER_OPTION(unsigned, MaxTaintedSymbolComplexity, "max-tainted-symbol-complexity",
+"[DEPRECATED] The maximum complexity of a symbol to carry taint", 9)
+
 ANALYZER_OPTION(unsigned, MaxTimesInlineLarge, "max-times-inline-large",
 "The maximum times a large function could be inlined.", 32)
 
diff --git a/clang/lib/StaticAnalyzer/Checkers/Taint.cpp b/clang/lib/StaticAnalyzer/Checkers/Taint.cpp
index 6362c82b009d72..0bb5739db4b756 100644
--- a/clang/lib/StaticAnalyzer/Checkers/Taint.cpp
+++ b/clang/lib/StaticAnalyzer/Checkers/Taint.cpp
@@ -12,6 +12,7 @@
 
 #include "clang/StaticAnalyzer/Checkers/Taint.h"
 #include "clang/StaticAnalyzer/Core/BugReporter/BugReporter.h"
+#include "clang/StaticAnalyzer/Core/PathSensitive/AnalysisManager.h"
 #include "clang/StaticAnalyzer/Core/PathSensitive/ProgramStateTrait.h"
 #include 
 
@@ -256,6 +257,12 @@ std::vector taint::getTaintedSymbolsImpl(ProgramStateRef State,
   if (!Sym)
 return TaintedSymbols;
 
+  // HACK:https://discourse.llvm.org/t/rfc-make-istainted-and-complex-symbols-friends/79570
+  if (const auto &Opts = State->getAnalysisManager().getAnalyzerOptions();
+  Sym->computeComplexity() > Opts.MaxTaintedSymbolComplexity) {
+return {};
+  }
+
   // Traverse all the symbols this symbol depends on to see if any are tainted.
   for (SymbolRef SubSym : Sym->symbols()) {
 if (!isa(SubSym))
diff --git a/clang/test/Analysis/analyzer-config.c b/clang/test/Analysis/analyzer-config.c
index 2a4c40005a4dc0..1ee0d8e4eecebd 100644
--- a/clang/test/Analysis/analyzer-config.c
+++ b/clang/test/Analysis/analyzer-config.c
@@ -96,6 +96,7 @@
 // CHECK-NEXT: max-inlinable-size = 100
 // CHECK-NEXT: max-nodes = 225000
 // CHECK-NEXT: max-symbol-complexity = 35
+// CHECK-NEXT: max-tainted-symbol-complexity = 9
 // CHECK-NEXT: max-times-inline-large = 32
 // CHECK-NEXT: min-cfg-size-treat-functions-as-large = 14
 // CHECK-NEXT: mode = deep
diff --git a/clang/test/Analysis/taint-generic.c b/clang/test/Analysis/taint-generic.c
index b0df85f237298d..1c139312734bca 100644
--- a/clang/test/Analysis/taint-generic.c
+++ b/clang/test/Analysis/taint-generic.c
@@ -63,6 +63,7 @@ void clang_analyzer_isTainted_char(char);
 void clang_analyzer_isTainted_wchar(wchar_t);
 void clang_analyzer_isTainted_charp(char*);
 void clang_analyzer_isTainted_int(int);
+void clang_analyzer_dump_int(int);
 
 int coin();
 
@@ -459,7 +460,53 @@ unsigned radar11369570_hanging(const unsigned char *arr, int l) {
 longcmp(a, t, c);
 l -= 12;
   }
-  return 5/a; // expected-warning {{Division by a tainted value, possibly zero}}
+  return 5/a; // FIXME: Should be a "div by tainted" warning here.
+}
+
+// This computation used to take a very long time.
+void complex_taint_queries(const int *p) {
+  int tainted = 0;
+  scanf("%d", &tainted);
+
+  // Make "tmp" tainted.
+  int tmp = tainted + tainted;
+  clang_a

[llvm-branch-commits] [clang] Backport taint analysis slowdown regression fix (PR #105516)

2024-08-21 Thread Balazs Benics via llvm-branch-commits

https://github.com/steakhal edited 
https://github.com/llvm/llvm-project/pull/105516


[llvm-branch-commits] [mlir] [MLIR][omp] Add omp.workshare op (PR #101443)

2024-08-21 Thread Ivan R. Ivanov via llvm-branch-commits

https://github.com/ivanradanov closed 
https://github.com/llvm/llvm-project/pull/101443


[llvm-branch-commits] [mlir] [MLIR][omp] Add omp.workshare op (PR #101443)

2024-08-21 Thread Ivan R. Ivanov via llvm-branch-commits

https://github.com/ivanradanov reopened 
https://github.com/llvm/llvm-project/pull/101443


[llvm-branch-commits] [flang] [WIP][flang] Introduce HLFIR lowerings to omp.workshare_loop_nest (PR #104748)

2024-08-21 Thread Ivan R. Ivanov via llvm-branch-commits

https://github.com/ivanradanov updated 
https://github.com/llvm/llvm-project/pull/104748

From 4b1c15bf4dcd753e35ec5c1118b107ea058c58df Mon Sep 17 00:00:00 2001
From: Ivan Radanov Ivanov 
Date: Sun, 4 Aug 2024 17:33:52 +0900
Subject: [PATCH 1/5] Add workshare loop wrapper lowerings

---
 .../lib/Optimizer/HLFIR/Transforms/BufferizeHLFIR.cpp  |  6 --
 .../HLFIR/Transforms/OptimizedBufferization.cpp| 10 +++---
 2 files changed, 11 insertions(+), 5 deletions(-)

diff --git a/flang/lib/Optimizer/HLFIR/Transforms/BufferizeHLFIR.cpp b/flang/lib/Optimizer/HLFIR/Transforms/BufferizeHLFIR.cpp
index b608677c526310..1848dbe2c7a2c2 100644
--- a/flang/lib/Optimizer/HLFIR/Transforms/BufferizeHLFIR.cpp
+++ b/flang/lib/Optimizer/HLFIR/Transforms/BufferizeHLFIR.cpp
@@ -26,12 +26,13 @@
 #include "flang/Optimizer/HLFIR/HLFIRDialect.h"
 #include "flang/Optimizer/HLFIR/HLFIROps.h"
 #include "flang/Optimizer/HLFIR/Passes.h"
+#include "flang/Optimizer/OpenMP/Passes.h"
+#include "mlir/Dialect/OpenMP/OpenMPDialect.h"
 #include "mlir/IR/Dominance.h"
 #include "mlir/IR/PatternMatch.h"
 #include "mlir/Pass/Pass.h"
 #include "mlir/Pass/PassManager.h"
 #include "mlir/Transforms/DialectConversion.h"
-#include "mlir/Dialect/OpenMP/OpenMPDialect.h"
 #include "llvm/ADT/TypeSwitch.h"
 
 namespace hlfir {
@@ -792,7 +793,8 @@ struct ElementalOpConversion
 // Generate a loop nest looping around the fir.elemental shape and clone
 // fir.elemental region inside the inner loop.
 hlfir::LoopNest loopNest =
-hlfir::genLoopNest(loc, builder, extents, !elemental.isOrdered());
+hlfir::genLoopNest(loc, builder, extents, !elemental.isOrdered(),
+   flangomp::shouldUseWorkshareLowering(elemental));
 auto insPt = builder.saveInsertionPoint();
 builder.setInsertionPointToStart(loopNest.body);
 auto yield = hlfir::inlineElementalOp(loc, builder, elemental,
diff --git a/flang/lib/Optimizer/HLFIR/Transforms/OptimizedBufferization.cpp b/flang/lib/Optimizer/HLFIR/Transforms/OptimizedBufferization.cpp
index 3a0a98dc594463..f014724861e333 100644
--- a/flang/lib/Optimizer/HLFIR/Transforms/OptimizedBufferization.cpp
+++ b/flang/lib/Optimizer/HLFIR/Transforms/OptimizedBufferization.cpp
@@ -20,6 +20,7 @@
 #include "flang/Optimizer/HLFIR/HLFIRDialect.h"
 #include "flang/Optimizer/HLFIR/HLFIROps.h"
 #include "flang/Optimizer/HLFIR/Passes.h"
+#include "flang/Optimizer/OpenMP/Passes.h"
 #include "flang/Optimizer/Transforms/Utils.h"
 #include "mlir/Dialect/Func/IR/FuncOps.h"
 #include "mlir/IR/Dominance.h"
@@ -482,7 +483,8 @@ llvm::LogicalResult ElementalAssignBufferization::matchAndRewrite(
   // Generate a loop nest looping around the hlfir.elemental shape and clone
   // hlfir.elemental region inside the inner loop
   hlfir::LoopNest loopNest =
-  hlfir::genLoopNest(loc, builder, extents, !elemental.isOrdered());
+  hlfir::genLoopNest(loc, builder, extents, !elemental.isOrdered(),
+ flangomp::shouldUseWorkshareLowering(elemental));
   builder.setInsertionPointToStart(loopNest.body);
   auto yield = hlfir::inlineElementalOp(loc, builder, elemental,
 loopNest.oneBasedIndices);
@@ -553,7 +555,8 @@ llvm::LogicalResult BroadcastAssignBufferization::matchAndRewrite(
   llvm::SmallVector extents =
   hlfir::getIndexExtents(loc, builder, shape);
   hlfir::LoopNest loopNest =
-  hlfir::genLoopNest(loc, builder, extents, /*isUnordered=*/true);
+  hlfir::genLoopNest(loc, builder, extents, /*isUnordered=*/true,
+ flangomp::shouldUseWorkshareLowering(assign));
   builder.setInsertionPointToStart(loopNest.body);
   auto arrayElement =
   hlfir::getElementAt(loc, builder, lhs, loopNest.oneBasedIndices);
@@ -648,7 +651,8 @@ llvm::LogicalResult VariableAssignBufferization::matchAndRewrite(
   llvm::SmallVector extents =
   hlfir::getIndexExtents(loc, builder, shape);
   hlfir::LoopNest loopNest =
-  hlfir::genLoopNest(loc, builder, extents, /*isUnordered=*/true);
+  hlfir::genLoopNest(loc, builder, extents, /*isUnordered=*/true,
+ flangomp::shouldUseWorkshareLowering(assign));
   builder.setInsertionPointToStart(loopNest.body);
   auto rhsArrayElement =
   hlfir::getElementAt(loc, builder, rhs, loopNest.oneBasedIndices);

From a79d7c8cee84295ef7281b0b6aabf2ea5ed50b9e Mon Sep 17 00:00:00 2001
From: Ivan Radanov Ivanov 
Date: Mon, 19 Aug 2024 15:01:31 +0900
Subject: [PATCH 2/5] Bufferize test

---
 flang/test/HLFIR/bufferize-workshare.fir | 58 
 1 file changed, 58 insertions(+)
 create mode 100644 flang/test/HLFIR/bufferize-workshare.fir

diff --git a/flang/test/HLFIR/bufferize-workshare.fir b/flang/test/HLFIR/bufferize-workshare.fir
new file mode 100644
index 00000000000000..86a2f031478dd7
--- /dev/null
+++ b/flang/test/HLFIR/bufferize-workshare.fir
@@ -0,0 +1,58 @@
+// RUN: fir-opt --bufferize-hlfir %s | FileCheck %s
+
+// CH

[llvm-branch-commits] [flang] [flang][omp] Emit omp.workshare in frontend (PR #101444)

2024-08-21 Thread Ivan R. Ivanov via llvm-branch-commits

https://github.com/ivanradanov updated 
https://github.com/llvm/llvm-project/pull/101444

From 3d1258582adc0ec506a23dc3efdba371c29612ca Mon Sep 17 00:00:00 2001
From: Ivan Radanov Ivanov 
Date: Wed, 31 Jul 2024 14:11:47 +0900
Subject: [PATCH 1/2] [flang][omp] Emit omp.workshare in frontend

---
 flang/lib/Lower/OpenMP/OpenMP.cpp | 30 ++
 1 file changed, 26 insertions(+), 4 deletions(-)

diff --git a/flang/lib/Lower/OpenMP/OpenMP.cpp b/flang/lib/Lower/OpenMP/OpenMP.cpp
index d614db8b68ef65..83c90374afa5e3 100644
--- a/flang/lib/Lower/OpenMP/OpenMP.cpp
+++ b/flang/lib/Lower/OpenMP/OpenMP.cpp
@@ -1272,6 +1272,15 @@ static void genTaskwaitClauses(lower::AbstractConverter &converter,
   loc, llvm::omp::Directive::OMPD_taskwait);
 }
 
+static void genWorkshareClauses(lower::AbstractConverter &converter,
+semantics::SemanticsContext &semaCtx,
+lower::StatementContext &stmtCtx,
+const List &clauses, mlir::Location loc,
+mlir::omp::WorkshareOperands &clauseOps) {
+  ClauseProcessor cp(converter, semaCtx, clauses);
+  cp.processNowait(clauseOps);
+}
+
 static void genTeamsClauses(lower::AbstractConverter &converter,
 semantics::SemanticsContext &semaCtx,
 lower::StatementContext &stmtCtx,
@@ -1897,6 +1906,22 @@ genTaskyieldOp(lower::AbstractConverter &converter, lower::SymMap &symTable,
   return converter.getFirOpBuilder().create(loc);
 }
 
+static mlir::omp::WorkshareOp
+genWorkshareOp(lower::AbstractConverter &converter, lower::SymMap &symTable,
+   semantics::SemanticsContext &semaCtx, lower::pft::Evaluation &eval,
+   mlir::Location loc, const ConstructQueue &queue,
+   ConstructQueue::iterator item) {
+  lower::StatementContext stmtCtx;
+  mlir::omp::WorkshareOperands clauseOps;
+  genWorkshareClauses(converter, semaCtx, stmtCtx, item->clauses, loc, clauseOps);
+
+  return genOpWithBody(
+  OpWithBodyGenInfo(converter, symTable, semaCtx, loc, eval,
+llvm::omp::Directive::OMPD_workshare)
+  .setClauses(&item->clauses),
+  queue, item, clauseOps);
+}
+
 static mlir::omp::TeamsOp
 genTeamsOp(lower::AbstractConverter &converter, lower::SymMap &symTable,
semantics::SemanticsContext &semaCtx, lower::pft::Evaluation &eval,
@@ -2309,10 +2334,7 @@ static void genOMPDispatch(lower::AbstractConverter &converter,
   llvm::omp::getOpenMPDirectiveName(dir) + ")");
   // case llvm::omp::Directive::OMPD_workdistribute:
   case llvm::omp::Directive::OMPD_workshare:
-// FIXME: Workshare is not a commonly used OpenMP construct, an
-// implementation for this feature will come later. For the codes
-// that use this construct, add a single construct for now.
-genSingleOp(converter, symTable, semaCtx, eval, loc, queue, item);
+genWorkshareOp(converter, symTable, semaCtx, eval, loc, queue, item);
 break;
   default:
 // Combined and composite constructs should have been split into a sequence

From 5e01e41362f11f2309dea217ada9026aa437433d Mon Sep 17 00:00:00 2001
From: Ivan Radanov Ivanov 
Date: Sun, 4 Aug 2024 16:02:37 +0900
Subject: [PATCH 2/2] Fix lower test for workshare

---
 flang/test/Lower/OpenMP/workshare.f90 | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/flang/test/Lower/OpenMP/workshare.f90 b/flang/test/Lower/OpenMP/workshare.f90
index 1e11677a15e1f0..8e771952f5b6da 100644
--- a/flang/test/Lower/OpenMP/workshare.f90
+++ b/flang/test/Lower/OpenMP/workshare.f90
@@ -6,7 +6,7 @@ subroutine sb1(arr)
   integer :: arr(:)
 !CHECK: omp.parallel  {
   !$omp parallel
-!CHECK: omp.single  {
+!CHECK: omp.workshare {
   !$omp workshare
 arr = 0
   !$omp end workshare
@@ -20,7 +20,7 @@ subroutine sb2(arr)
   integer :: arr(:)
 !CHECK: omp.parallel  {
   !$omp parallel
-!CHECK: omp.single nowait {
+!CHECK: omp.workshare nowait {
   !$omp workshare
 arr = 0
   !$omp end workshare nowait
@@ -33,7 +33,7 @@ subroutine sb2(arr)
 subroutine sb3(arr)
   integer :: arr(:)
 !CHECK: omp.parallel  {
-!CHECK: omp.single  {
+!CHECK: omp.workshare  {
   !$omp parallel workshare
 arr = 0
   !$omp end parallel workshare



[llvm-branch-commits] [flang] [flang] Introduce custom loop nest generation for loops in workshare construct (PR #101445)

2024-08-21 Thread Ivan R. Ivanov via llvm-branch-commits

https://github.com/ivanradanov updated 
https://github.com/llvm/llvm-project/pull/101445

From 451a9d2f26cfd8cb770d1ae35d834c63fce56b79 Mon Sep 17 00:00:00 2001
From: Ivan Radanov Ivanov 
Date: Wed, 31 Jul 2024 14:12:34 +0900
Subject: [PATCH 1/4] [flang] Introduce ws loop nest generation for HLFIR
 lowering

---
 .../flang/Optimizer/Builder/HLFIRTools.h  | 12 +++--
 flang/lib/Lower/ConvertCall.cpp   |  2 +-
 flang/lib/Lower/OpenMP/ReductionProcessor.cpp |  4 +-
 flang/lib/Optimizer/Builder/HLFIRTools.cpp| 52 ++-
 .../HLFIR/Transforms/BufferizeHLFIR.cpp   |  3 +-
 .../LowerHLFIROrderedAssignments.cpp  | 30 +--
 .../Transforms/OptimizedBufferization.cpp |  6 +--
 7 files changed, 69 insertions(+), 40 deletions(-)

diff --git a/flang/include/flang/Optimizer/Builder/HLFIRTools.h b/flang/include/flang/Optimizer/Builder/HLFIRTools.h
index 6b41025eea0780..14e42c6f358e46 100644
--- a/flang/include/flang/Optimizer/Builder/HLFIRTools.h
+++ b/flang/include/flang/Optimizer/Builder/HLFIRTools.h
@@ -357,8 +357,8 @@ hlfir::ElementalOp genElementalOp(
 
 /// Structure to describe a loop nest.
 struct LoopNest {
-  fir::DoLoopOp outerLoop;
-  fir::DoLoopOp innerLoop;
+  mlir::Operation *outerOp;
+  mlir::Block *body;
   llvm::SmallVector oneBasedIndices;
 };
 
@@ -366,11 +366,13 @@ struct LoopNest {
 /// \p isUnordered specifies whether the loops in the loop nest
 /// are unordered.
 LoopNest genLoopNest(mlir::Location loc, fir::FirOpBuilder &builder,
- mlir::ValueRange extents, bool isUnordered = false);
+ mlir::ValueRange extents, bool isUnordered = false,
+ bool emitWsLoop = false);
 inline LoopNest genLoopNest(mlir::Location loc, fir::FirOpBuilder &builder,
-mlir::Value shape, bool isUnordered = false) {
+mlir::Value shape, bool isUnordered = false,
+bool emitWsLoop = false) {
   return genLoopNest(loc, builder, getIndexExtents(loc, builder, shape),
- isUnordered);
+ isUnordered, emitWsLoop);
 }
 
 /// Inline the body of an hlfir.elemental at the current insertion point
diff --git a/flang/lib/Lower/ConvertCall.cpp b/flang/lib/Lower/ConvertCall.cpp
index fd873f55dd844e..0689d6e033dd9c 100644
--- a/flang/lib/Lower/ConvertCall.cpp
+++ b/flang/lib/Lower/ConvertCall.cpp
@@ -2128,7 +2128,7 @@ class ElementalCallBuilder {
   hlfir::genLoopNest(loc, builder, shape, !mustBeOrdered);
   mlir::ValueRange oneBasedIndices = loopNest.oneBasedIndices;
   auto insPt = builder.saveInsertionPoint();
-  builder.setInsertionPointToStart(loopNest.innerLoop.getBody());
+  builder.setInsertionPointToStart(loopNest.body);
   callContext.stmtCtx.pushScope();
   for (auto &preparedActual : loweredActuals)
 if (preparedActual)
diff --git a/flang/lib/Lower/OpenMP/ReductionProcessor.cpp b/flang/lib/Lower/OpenMP/ReductionProcessor.cpp
index c3c1f363033c27..72a90dd0d6f29d 100644
--- a/flang/lib/Lower/OpenMP/ReductionProcessor.cpp
+++ b/flang/lib/Lower/OpenMP/ReductionProcessor.cpp
@@ -375,7 +375,7 @@ static void genBoxCombiner(fir::FirOpBuilder &builder, 
mlir::Location loc,
   // know this won't miss any opportuinties for clever elemental inlining
   hlfir::LoopNest nest = hlfir::genLoopNest(
   loc, builder, shapeShift.getExtents(), /*isUnordered=*/true);
-  builder.setInsertionPointToStart(nest.innerLoop.getBody());
+  builder.setInsertionPointToStart(nest.body);
   mlir::Type refTy = fir::ReferenceType::get(seqTy.getEleTy());
   auto lhsEleAddr = builder.create(
   loc, refTy, lhs, shapeShift, /*slice=*/mlir::Value{},
@@ -389,7 +389,7 @@ static void genBoxCombiner(fir::FirOpBuilder &builder, 
mlir::Location loc,
   builder, loc, redId, refTy, lhsEle, rhsEle);
   builder.create(loc, scalarReduction, lhsEleAddr);
 
-  builder.setInsertionPointAfter(nest.outerLoop);
+  builder.setInsertionPointAfter(nest.outerOp);
   builder.create(loc, lhsAddr);
 }
 
diff --git a/flang/lib/Optimizer/Builder/HLFIRTools.cpp 
b/flang/lib/Optimizer/Builder/HLFIRTools.cpp
index 8d0ae2f195178c..cd07cb741eb4bb 100644
--- a/flang/lib/Optimizer/Builder/HLFIRTools.cpp
+++ b/flang/lib/Optimizer/Builder/HLFIRTools.cpp
@@ -20,6 +20,7 @@
 #include "mlir/IR/IRMapping.h"
 #include "mlir/Support/LLVM.h"
 #include "llvm/ADT/TypeSwitch.h"
+#include 
 #include 
 
 // Return explicit extents. If the base is a fir.box, this won't read it to
@@ -855,26 +856,51 @@ mlir::Value hlfir::inlineElementalOp(
 
 hlfir::LoopNest hlfir::genLoopNest(mlir::Location loc,
fir::FirOpBuilder &builder,
-   mlir::ValueRange extents, bool isUnordered) 
{
+   mlir::ValueRange extents, bool isUnordered,
+   bool emitWsLoop) {
   hlfir::LoopNest loopNest;
   assert(!extents.empty() && 

[llvm-branch-commits] [flang] [flang] Lower omp.workshare to other omp constructs (PR #101446)

2024-08-21 Thread Ivan R. Ivanov via llvm-branch-commits

https://github.com/ivanradanov updated 
https://github.com/llvm/llvm-project/pull/101446

error: too big or took too long to generate
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [flang] 5e5d819 - Revert "[flang][NFC] Move OpenMP related passes into a separate directory (#1…"

2024-08-21 Thread via llvm-branch-commits

Author: Ivan R. Ivanov
Date: 2024-08-21T23:15:19+09:00
New Revision: 5e5d819fa261a49a30deae95737563c807964ae5

URL: 
https://github.com/llvm/llvm-project/commit/5e5d819fa261a49a30deae95737563c807964ae5
DIFF: 
https://github.com/llvm/llvm-project/commit/5e5d819fa261a49a30deae95737563c807964ae5.diff

LOG: Revert "[flang][NFC] Move OpenMP related passes into a separate directory 
(#1…"

This reverts commit 87eeed1f0ebe57abffde560c25dd9829dc6038f3.

Added: 
flang/lib/Optimizer/Transforms/OMPFunctionFiltering.cpp
flang/lib/Optimizer/Transforms/OMPMapInfoFinalization.cpp
flang/lib/Optimizer/Transforms/OMPMarkDeclareTarget.cpp

Modified: 
flang/docs/OpenMP-declare-target.md
flang/docs/OpenMP-descriptor-management.md
flang/include/flang/Optimizer/CMakeLists.txt
flang/include/flang/Optimizer/Transforms/Passes.td
flang/include/flang/Tools/CLOptions.inc
flang/lib/Frontend/CMakeLists.txt
flang/lib/Optimizer/CMakeLists.txt
flang/lib/Optimizer/Transforms/CMakeLists.txt
flang/tools/bbc/CMakeLists.txt
flang/tools/fir-opt/CMakeLists.txt
flang/tools/fir-opt/fir-opt.cpp
flang/tools/tco/CMakeLists.txt

Removed: 
flang/include/flang/Optimizer/OpenMP/CMakeLists.txt
flang/include/flang/Optimizer/OpenMP/Passes.h
flang/include/flang/Optimizer/OpenMP/Passes.td
flang/lib/Optimizer/OpenMP/CMakeLists.txt
flang/lib/Optimizer/OpenMP/FunctionFiltering.cpp
flang/lib/Optimizer/OpenMP/MapInfoFinalization.cpp
flang/lib/Optimizer/OpenMP/MarkDeclareTarget.cpp



diff  --git a/flang/docs/OpenMP-declare-target.md 
b/flang/docs/OpenMP-declare-target.md
index 45062469007b65..d29a46807e1eaf 100644
--- a/flang/docs/OpenMP-declare-target.md
+++ b/flang/docs/OpenMP-declare-target.md
@@ -149,7 +149,7 @@ flang/lib/Lower/OpenMP.cpp function 
`genDeclareTargetIntGlobal`.
 
 There are currently two passes within Flang that are related to the processing 
 of `declare target`:
-* `MarkDeclareTarget` - This pass is in charge of marking functions captured
+* `OMPMarkDeclareTarget` - This pass is in charge of marking functions captured
 (called from) in `target` regions or other `declare target` marked functions as
 `declare target`. It does so recursively, i.e. nested calls will also be 
 implicitly marked. It currently will try to mark things as conservatively as 
@@ -157,7 +157,7 @@ possible, e.g. if captured in a `target` region it will 
apply `nohost`, unless
 it encounters a `host` `declare target` in which case it will apply the `any` 
 device type. Functions are handled similarly, except we utilise the parent's 
 device type where possible.
-* `FunctionFiltering` - This is executed after the `MarkDeclareTarget`
+* `OMPFunctionFiltering` - This is executed after the `OMPMarkDeclareTarget`
 pass, and its job is to conservatively remove host functions from
 the module where possible when compiling for the device. This helps make 
 sure that most incompatible code for the host is not lowered for the 

diff  --git a/flang/docs/OpenMP-descriptor-management.md 
b/flang/docs/OpenMP-descriptor-management.md
index 66c153914f70da..d0eb01b00f9bb9 100644
--- a/flang/docs/OpenMP-descriptor-management.md
+++ b/flang/docs/OpenMP-descriptor-management.md
@@ -44,7 +44,7 @@ Currently, Flang will lower these descriptor types in the 
OpenMP lowering (lower
 to all other map types, generating an omp.MapInfoOp containing relevant 
information required for lowering
 the OpenMP dialect to LLVM-IR during the final stages of the MLIR lowering. 
However, after 
 the lowering to FIR/HLFIR has been performed an OpenMP dialect specific pass 
for Fortran, 
-`MapInfoFinalizationPass` (Optimizer/OpenMP/MapInfoFinalization.cpp) will 
expand the 
+`OMPMapInfoFinalizationPass` (Optimizer/OMPMapInfoFinalization.cpp) will 
expand the 
 `omp.MapInfoOp`'s containing descriptors (which currently will be a `BoxType` 
or `BoxAddrOp`) into multiple 
 mappings, with one extra per pointer member in the descriptor that is 
supported on top of the original
 descriptor map operation. These pointers members are linked to the parent 
descriptor by adding them to 
@@ -53,7 +53,7 @@ owning operation's (`omp.TargetOp`, `omp.TargetDataOp` etc.) 
map operand list an
 operation is `IsolatedFromAbove`, it also inserts them as `BlockArgs` to 
canonicalize the mappings and
 simplify lowering.
 
-An example transformation by the `MapInfoFinalizationPass`:
+An example transformation by the `OMPMapInfoFinalizationPass`:
 
 ```
 

diff  --git a/flang/include/flang/Optimizer/CMakeLists.txt 
b/flang/include/flang/Optimizer/CMakeLists.txt
index 3336ac935e1012..89e43a9ee8d621 100644
--- a/flang/include/flang/Optimizer/CMakeLists.txt
+++ b/flang/include/flang/Optimizer/CMakeLists.txt
@@ -2,4 +2,3 @@ add_subdirectory(CodeGen)
 add_subdirectory(Dialect)
 add_subdirectory(HLFIR)
 add_subdirectory(Transforms)
-add_subdirectory(OpenMP)

diff  --git a/flang/incl

[llvm-branch-commits] [llvm] [ctx_prof] API to get the instrumentation of a BB (PR #105468)

2024-08-21 Thread Mircea Trofin via llvm-branch-commits

https://github.com/mtrofin updated 
https://github.com/llvm/llvm-project/pull/105468

>From c5ee379ec43215d8268219ec3cfced3f7f730fc8 Mon Sep 17 00:00:00 2001
From: Mircea Trofin 
Date: Tue, 20 Aug 2024 21:09:16 -0700
Subject: [PATCH] [ctx_prof] API to get the instrumentation of a BB

---
 llvm/include/llvm/Analysis/CtxProfAnalysis.h  |  5 +
 llvm/lib/Analysis/CtxProfAnalysis.cpp |  7 ++
 .../Analysis/CtxProfAnalysisTest.cpp  | 22 +++
 3 files changed, 34 insertions(+)

diff --git a/llvm/include/llvm/Analysis/CtxProfAnalysis.h 
b/llvm/include/llvm/Analysis/CtxProfAnalysis.h
index 23abcbe2c6e9d2..0b4dd8ae3a0dc7 100644
--- a/llvm/include/llvm/Analysis/CtxProfAnalysis.h
+++ b/llvm/include/llvm/Analysis/CtxProfAnalysis.h
@@ -95,7 +95,12 @@ class CtxProfAnalysis : public 
AnalysisInfoMixin<CtxProfAnalysis> {
 
   PGOContextualProfile run(Module &M, ModuleAnalysisManager &MAM);
 
+  /// Get the instruction instrumenting a callsite, or nullptr if that cannot 
be
+  /// found.
   static InstrProfCallsite *getCallsiteInstrumentation(CallBase &CB);
+
+  /// Get the instruction instrumenting a BB, or nullptr if not present.
+  static InstrProfIncrementInst *getBBInstrumentation(BasicBlock &BB);
 };
 
 class CtxProfAnalysisPrinterPass
diff --git a/llvm/lib/Analysis/CtxProfAnalysis.cpp 
b/llvm/lib/Analysis/CtxProfAnalysis.cpp
index fffc8de2b36c8e..46daa4a4506189 100644
--- a/llvm/lib/Analysis/CtxProfAnalysis.cpp
+++ b/llvm/lib/Analysis/CtxProfAnalysis.cpp
@@ -202,6 +202,13 @@ InstrProfCallsite 
*CtxProfAnalysis::getCallsiteInstrumentation(CallBase &CB) {
   return nullptr;
 }
 
+InstrProfIncrementInst *CtxProfAnalysis::getBBInstrumentation(BasicBlock &BB) {
+  for (auto &I : BB)
+if (auto *Incr = dyn_cast<InstrProfIncrementInst>(&I))
+  return Incr;
+  return nullptr;
+}
+
 static void
 preorderVisit(const PGOCtxProfContext::CallTargetMapTy &Profiles,
   function_ref Visitor) {
diff --git a/llvm/unittests/Analysis/CtxProfAnalysisTest.cpp 
b/llvm/unittests/Analysis/CtxProfAnalysisTest.cpp
index 5f9bf3ec540eb3..fbe3a6e45109cc 100644
--- a/llvm/unittests/Analysis/CtxProfAnalysisTest.cpp
+++ b/llvm/unittests/Analysis/CtxProfAnalysisTest.cpp
@@ -132,4 +132,26 @@ TEST_F(CtxProfAnalysisTest, GetCallsiteIDNegativeTest) {
   EXPECT_EQ(IndIns, nullptr);
 }
 
+TEST_F(CtxProfAnalysisTest, GetBBIDTest) {
+  ModulePassManager MPM;
+  MPM.addPass(PGOInstrumentationGen(PGOInstrumentationType::CTXPROF));
+  EXPECT_FALSE(MPM.run(*M, MAM).areAllPreserved());
+  auto *F = M->getFunction("foo");
+  ASSERT_NE(F, nullptr);
+  std::map<std::string, int> BBNameAndID;
+
+  for (auto &BB : *F) {
+auto *Ins = CtxProfAnalysis::getBBInstrumentation(BB);
+if (Ins)
+  BBNameAndID[BB.getName().str()] =
+  static_cast<int>(Ins->getIndex()->getZExtValue());
+else
+  BBNameAndID[BB.getName().str()] = -1;
+  }
+
+  EXPECT_THAT(BBNameAndID,
+  testing::UnorderedElementsAre(
+  testing::Pair("", 0), testing::Pair("yes", 1),
+  testing::Pair("no", -1), testing::Pair("exit", -1)));
+}
 } // namespace

___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [ctx_prof] Add support for ICP (PR #105469)

2024-08-21 Thread Mircea Trofin via llvm-branch-commits

https://github.com/mtrofin updated 
https://github.com/llvm/llvm-project/pull/105469

>From 1edbc3bed4cf6c2726394a346891409d5f548537 Mon Sep 17 00:00:00 2001
From: Mircea Trofin 
Date: Tue, 20 Aug 2024 21:32:23 -0700
Subject: [PATCH] [ctx_prof] Add support for ICP

---
 llvm/include/llvm/Analysis/CtxProfAnalysis.h  |  18 +-
 llvm/include/llvm/IR/IntrinsicInst.h  |   2 +
 .../llvm/ProfileData/PGOCtxProfReader.h   |  20 ++
 .../Transforms/Utils/CallPromotionUtils.h |   4 +
 llvm/lib/Analysis/CtxProfAnalysis.cpp |  80 +---
 llvm/lib/IR/IntrinsicInst.cpp |  10 +
 .../Transforms/Utils/CallPromotionUtils.cpp   |  86 +
 .../Utils/CallPromotionUtilsTest.cpp  | 178 ++
 8 files changed, 365 insertions(+), 33 deletions(-)

diff --git a/llvm/include/llvm/Analysis/CtxProfAnalysis.h 
b/llvm/include/llvm/Analysis/CtxProfAnalysis.h
index 0b4dd8ae3a0dc7..d6c2bb26a091af 100644
--- a/llvm/include/llvm/Analysis/CtxProfAnalysis.h
+++ b/llvm/include/llvm/Analysis/CtxProfAnalysis.h
@@ -73,6 +73,12 @@ class PGOContextualProfile {
 return 
FuncInfo.find(getDefinedFunctionGUID(F))->second.NextCallsiteIndex++;
   }
 
+  using ConstVisitor = function_ref;
+  using Visitor = function_ref;
+
+  void update(Visitor, const Function *F = nullptr);
+  void visit(ConstVisitor, const Function *F = nullptr) const;
+
   const CtxProfFlatProfile flatten() const;
 
   bool invalidate(Module &, const PreservedAnalyses &PA,
@@ -105,13 +111,18 @@ class CtxProfAnalysis : public 
AnalysisInfoMixin<CtxProfAnalysis> {
 
 class CtxProfAnalysisPrinterPass
 : public PassInfoMixin<CtxProfAnalysisPrinterPass> {
-  raw_ostream &OS;
-
 public:
-  explicit CtxProfAnalysisPrinterPass(raw_ostream &OS) : OS(OS) {}
+  enum class PrintMode { Everything, JSON };
+  explicit CtxProfAnalysisPrinterPass(raw_ostream &OS,
+  PrintMode Mode = PrintMode::Everything)
+  : OS(OS), Mode(Mode) {}
 
   PreservedAnalyses run(Module &M, ModuleAnalysisManager &MAM);
   static bool isRequired() { return true; }
+
+private:
+  raw_ostream &OS;
+  const PrintMode Mode;
 };
 
 /// Assign a GUID to functions as metadata. GUID calculation takes linkage into
@@ -134,6 +145,5 @@ class AssignGUIDPass : public PassInfoMixin<AssignGUIDPass> {
   // This should become GlobalValue::getGUID
   static uint64_t getGUID(const Function &F);
 };
-
 } // namespace llvm
 #endif // LLVM_ANALYSIS_CTXPROFANALYSIS_H
diff --git a/llvm/include/llvm/IR/IntrinsicInst.h 
b/llvm/include/llvm/IR/IntrinsicInst.h
index 2f1e2c08c3ecec..bab41efab528e2 100644
--- a/llvm/include/llvm/IR/IntrinsicInst.h
+++ b/llvm/include/llvm/IR/IntrinsicInst.h
@@ -1519,6 +1519,7 @@ class InstrProfCntrInstBase : public InstrProfInstBase {
   ConstantInt *getNumCounters() const;
   // The index of the counter that this instruction acts on.
   ConstantInt *getIndex() const;
+  void setIndex(uint32_t Idx);
 };
 
 /// This represents the llvm.instrprof.cover intrinsic.
@@ -1569,6 +1570,7 @@ class InstrProfCallsite : public InstrProfCntrInstBase {
 return isa(V) && classof(cast(V));
   }
   Value *getCallee() const;
+  void setCallee(Value *);
 };
 
 /// This represents the llvm.instrprof.timestamp intrinsic.
diff --git a/llvm/include/llvm/ProfileData/PGOCtxProfReader.h 
b/llvm/include/llvm/ProfileData/PGOCtxProfReader.h
index 190deaeeacd085..23dcc376508b39 100644
--- a/llvm/include/llvm/ProfileData/PGOCtxProfReader.h
+++ b/llvm/include/llvm/ProfileData/PGOCtxProfReader.h
@@ -57,9 +57,23 @@ class PGOCtxProfContext final {
 
   GlobalValue::GUID guid() const { return GUID; }
   const SmallVectorImpl &counters() const { return Counters; }
+  SmallVectorImpl &counters() { return Counters; }
+
+  uint64_t getEntrycount() const { return Counters[0]; }
+
   const CallsiteMapTy &callsites() const { return Callsites; }
   CallsiteMapTy &callsites() { return Callsites; }
 
+  void ingestContext(uint32_t CSId, PGOCtxProfContext &&Other) {
+auto [Iter, _] = callsites().try_emplace(CSId, CallTargetMapTy());
+Iter->second.emplace(Other.guid(), std::move(Other));
+  }
+
+  void growCounters(uint32_t Size) {
+if (Size >= Counters.size())
+  Counters.resize(Size);
+  }
+
   bool hasCallsite(uint32_t I) const {
 return Callsites.find(I) != Callsites.end();
   }
@@ -68,6 +82,12 @@ class PGOCtxProfContext final {
 assert(hasCallsite(I) && "Callsite not found");
 return Callsites.find(I)->second;
   }
+
+  CallTargetMapTy &callsite(uint32_t I) {
+assert(hasCallsite(I) && "Callsite not found");
+return Callsites.find(I)->second;
+  }
+
   void getContainedGuids(DenseSet &Guids) const;
 };
 
diff --git a/llvm/include/llvm/Transforms/Utils/CallPromotionUtils.h 
b/llvm/include/llvm/Transforms/Utils/CallPromotionUtils.h
index 385831f457038d..58af26f31417b0 100644
--- a/llvm/include/llvm/Transforms/Utils/CallPromotionUtils.h
+++ b/llvm/include/llvm/Transforms/Utils/CallPromotionUtils.h
@@ -14,6 +14,7 @@
 #ifndef LLVM_TRANSFORMS_UTILS_CALLPROMOTIONUTILS_H
 #defin

[llvm-branch-commits] [libcxx] [libc++] Implement std::move_only_function (P0288R9) (PR #94670)

2024-08-21 Thread Louis Dionne via llvm-branch-commits

https://github.com/ldionne edited 
https://github.com/llvm/llvm-project/pull/94670
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [Driver] Add -Wa, options -mmapsyms={default, implicit} (PR #104542)

2024-08-21 Thread Peter Smith via llvm-branch-commits


@@ -7131,6 +7131,8 @@ def massembler_fatal_warnings : Flag<["-"], 
"massembler-fatal-warnings">,
 def crel : Flag<["--"], "crel">,
   HelpText<"Enable CREL relocation format (ELF only)">,
   MarshallingInfoFlag>;
+def mmapsyms_implicit : Flag<["-"], "mmapsyms=implicit">,

smithp35 wrote:

I think it would be useful to have similar help text to llvm-mc 
(https://github.com/llvm/llvm-project/pull/99718/files#diff-e84c9aa6b25b1a4fe2047de3a32ab330e945d2944b14451d310e4b706a39cbafR140)
 

https://github.com/llvm/llvm-project/pull/104542
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [Driver] Add -Wa, options -mmapsyms={default, implicit} (PR #104542)

2024-08-21 Thread Peter Smith via llvm-branch-commits

https://github.com/smithp35 edited 
https://github.com/llvm/llvm-project/pull/104542
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [Driver] Add -Wa, options -mmapsyms={default, implicit} (PR #104542)

2024-08-21 Thread Peter Smith via llvm-branch-commits

https://github.com/smithp35 commented:

I think we could do with some help text. Otherwise code changes look good.

https://github.com/llvm/llvm-project/pull/104542
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [AMDGPU] GFX12 VMEM instructions can write VGPR results out of order (PR #105549)

2024-08-21 Thread Jay Foad via llvm-branch-commits

https://github.com/jayfoad created 
https://github.com/llvm/llvm-project/pull/105549

Fix SIInsertWaitcnts to account for this by adding extra waits to avoid
WAW dependencies.

>From 9a2103df4094af38f59e1adce5414b94672e6d6e Mon Sep 17 00:00:00 2001
From: Jay Foad 
Date: Wed, 21 Aug 2024 16:23:49 +0100
Subject: [PATCH] [AMDGPU] GFX12 VMEM instructions can write VGPR results out
 of order

Fix SIInsertWaitcnts to account for this by adding extra waits to avoid
WAW dependencies.
---
 llvm/lib/Target/AMDGPU/AMDGPU.td  | 23 ++-
 llvm/lib/Target/AMDGPU/GCNSubtarget.h |  3 +++
 llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp   |  7 +++---
 .../buffer-fat-pointer-atomicrmw-fadd.ll  |  3 +++
 .../buffer-fat-pointer-atomicrmw-fmax.ll  |  5 
 .../buffer-fat-pointer-atomicrmw-fmin.ll  |  5 
 amdgcn.struct.buffer.load.format.v3f16.ll |  1 +
 llvm/test/CodeGen/AMDGPU/load-constant-i16.ll | 10 +++-
 llvm/test/CodeGen/AMDGPU/load-global-i16.ll   | 10 
 llvm/test/CodeGen/AMDGPU/load-global-i32.ll   |  2 ++
 .../AMDGPU/spill-csr-frame-ptr-reg-copy.ll|  1 +
 .../CodeGen/AMDGPU/waitcnt-vmcnt-loop.mir |  8 +++
 12 files changed, 64 insertions(+), 14 deletions(-)

diff --git a/llvm/lib/Target/AMDGPU/AMDGPU.td b/llvm/lib/Target/AMDGPU/AMDGPU.td
index 7906e0ee9d7858..9efdbd751d96e3 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPU.td
+++ b/llvm/lib/Target/AMDGPU/AMDGPU.td
@@ -953,6 +953,12 @@ def FeatureRequiredExportPriority : 
SubtargetFeature<"required-export-priority",
   "Export priority must be explicitly manipulated on GFX11.5"
 >;
 
+def FeatureVmemWriteVgprInOrder : SubtargetFeature<"vmem-write-vgpr-in-order",
+  "HasVmemWriteVgprInOrder",
+  "true",
+  "VMEM instructions of the same type write VGPR results in order"
+>;
+
 //======//
 // Subtarget Features (options and debugging)
 //======//
@@ -1123,7 +1129,8 @@ def FeatureSouthernIslands : 
GCNSubtargetFeatureGeneration<"SOUTHERN_ISLANDS",
   FeatureDsSrc2Insts, FeatureLDSBankCount32, FeatureMovrel,
   FeatureTrigReducedRange, FeatureExtendedImageInsts, FeatureImageInsts,
   FeatureGDS, FeatureGWS, FeatureDefaultComponentZero,
-  FeatureAtomicFMinFMaxF32GlobalInsts, FeatureAtomicFMinFMaxF64GlobalInsts
+  FeatureAtomicFMinFMaxF32GlobalInsts, FeatureAtomicFMinFMaxF64GlobalInsts,
+  FeatureVmemWriteVgprInOrder
   ]
 >;
 
@@ -1136,7 +1143,8 @@ def FeatureSeaIslands : 
GCNSubtargetFeatureGeneration<"SEA_ISLANDS",
   FeatureDsSrc2Insts, FeatureExtendedImageInsts, FeatureUnalignedBufferAccess,
   FeatureImageInsts, FeatureGDS, FeatureGWS, FeatureDefaultComponentZero,
   FeatureAtomicFMinFMaxF32GlobalInsts, FeatureAtomicFMinFMaxF64GlobalInsts,
-  FeatureAtomicFMinFMaxF32FlatInsts, FeatureAtomicFMinFMaxF64FlatInsts
+  FeatureAtomicFMinFMaxF32FlatInsts, FeatureAtomicFMinFMaxF64FlatInsts,
+  FeatureVmemWriteVgprInOrder
   ]
 >;
 
@@ -1152,7 +1160,7 @@ def FeatureVolcanicIslands : 
GCNSubtargetFeatureGeneration<"VOLCANIC_ISLANDS",
FeatureGFX7GFX8GFX9Insts, FeatureSMemTimeInst, FeatureMadMacF32Insts,
FeatureDsSrc2Insts, FeatureExtendedImageInsts, FeatureFastDenormalF32,
FeatureUnalignedBufferAccess, FeatureImageInsts, FeatureGDS, FeatureGWS,
-   FeatureDefaultComponentZero
+   FeatureDefaultComponentZero, FeatureVmemWriteVgprInOrder
   ]
 >;
 
@@ -1170,7 +1178,8 @@ def FeatureGFX9 : GCNSubtargetFeatureGeneration<"GFX9",
FeatureScalarFlatScratchInsts, FeatureScalarAtomics, FeatureR128A16,
FeatureA16, FeatureSMemTimeInst, FeatureFastDenormalF32, 
FeatureSupportsXNACK,
FeatureUnalignedBufferAccess, FeatureUnalignedDSAccess,
-   FeatureNegativeScratchOffsetBug, FeatureGWS, FeatureDefaultComponentZero
+   FeatureNegativeScratchOffsetBug, FeatureGWS, FeatureDefaultComponentZero,
+   FeatureVmemWriteVgprInOrder
   ]
 >;
 
@@ -1193,7 +1202,8 @@ def FeatureGFX10 : GCNSubtargetFeatureGeneration<"GFX10",
FeatureGDS, FeatureGWS, FeatureDefaultComponentZero,
FeatureMaxHardClauseLength63,
FeatureAtomicFMinFMaxF32GlobalInsts, FeatureAtomicFMinFMaxF64GlobalInsts,
-   FeatureAtomicFMinFMaxF32FlatInsts, FeatureAtomicFMinFMaxF64FlatInsts
+   FeatureAtomicFMinFMaxF32FlatInsts, FeatureAtomicFMinFMaxF64FlatInsts,
+   FeatureVmemWriteVgprInOrder
   ]
 >;
 
@@ -1215,7 +1225,8 @@ def FeatureGFX11 : GCNSubtargetFeatureGeneration<"GFX11",
FeatureUnalignedBufferAccess, FeatureUnalignedDSAccess, FeatureGDS,
FeatureGWS, FeatureDefaultComponentZero,
FeatureMaxHardClauseLength32,
-   FeatureAtomicFMinFMaxF32GlobalInsts, FeatureAtomicFMinFMaxF32FlatInsts
+   FeatureAtomicFMinFMaxF32GlobalInsts, FeatureAtomicFMinFMaxF32FlatInsts,
+   FeatureVmemWriteVgprInOrder
   ]
 >;
 
diff --git a/llvm/lib/Target/AMDGPU/GCNSubtarget.h 
b/llvm/lib/Target/AMDGPU/GCNSubtarget.h
index 902f51ae358d59..9386bcf0d74b22 100644
--- a/llvm/lib/Target/AMDGPU/GCNSubtarget.h
+++ b/llvm/lib/Target/AMDGPU

[llvm-branch-commits] [llvm] [AMDGPU] Remove one case of vmcnt loop header flushing for GFX12 (PR #105550)

2024-08-21 Thread Jay Foad via llvm-branch-commits

https://github.com/jayfoad created 
https://github.com/llvm/llvm-project/pull/105550

When a loop contains a VMEM load whose result is only used outside the
loop, do not bother to flush vmcnt in the loop head on GFX12. A wait for
vmcnt will be required inside the loop anyway, because VMEM instructions
can write their VGPR results out of order.
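
As a rough illustration of the rule described above, the following stand-alone C++ sketch mirrors the two return statements visible in the `shouldFlushVmCnt` hunk of this patch. The struct names, field names and the function name `shouldFlushVmCntModel` are hypothetical and exist only for this example; they are not LLVM APIs. The sketch also assumes that GFX12 is the subtarget without `vmem-write-vgpr-in-order`, as the PR title suggests.

```cpp
// Hypothetical model of the flush decision; not the LLVM implementation.

// Facts gathered about one loop (names borrowed from the locals in the diff).
struct LoopVmemFacts {
  bool HasVMemLoad;
  bool HasVMemStore;
  bool UsesVgprLoadedOutside;
};

// Hypothetical stand-in for the two subtarget queries used in the diff:
// ST->hasVscnt() and ST->hasVmemWriteVgprInOrder().
struct SubtargetModel {
  bool HasVscnt;
  bool VmemWriteVgprInOrder;
};

// Returns true when vmcnt should be flushed before entering the loop.
bool shouldFlushVmCntModel(const LoopVmemFacts &L, const SubtargetModel &ST) {
  // First case, unchanged by the patch.
  if (!ST.HasVscnt && L.HasVMemStore && !L.HasVMemLoad &&
      L.UsesVgprLoadedOutside)
    return true;
  // Second case: after the patch this only applies when VMEM results are
  // written in order. Otherwise a wait is needed inside the loop anyway.
  return L.HasVMemLoad && L.UsesVgprLoadedOutside && ST.VmemWriteVgprInOrder;
}
```

With these assumptions, a loop that has a VMEM load and uses a VGPR loaded outside the loop no longer triggers a flush when `VmemWriteVgprInOrder` is false, which matches the `GFX12-NOT: S_WAIT_LOADCNT 0` updates in the test below.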

>From e53f75835dd0f0fc9d11b17afbe40de9b4a8a35b Mon Sep 17 00:00:00 2001
From: Jay Foad 
Date: Wed, 21 Aug 2024 16:57:24 +0100
Subject: [PATCH] [AMDGPU] Remove one case of vmcnt loop header flushing for
 GFX12

When a loop contains a VMEM load whose result is only used outside the
loop, do not bother to flush vmcnt in the loop head on GFX12. A wait for
vmcnt will be required inside the loop anyway, because VMEM instructions
can write their VGPR results out of order.
---
 llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp |  2 +-
 llvm/test/CodeGen/AMDGPU/waitcnt-vmcnt-loop.mir | 10 +-
 2 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp 
b/llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp
index 4262e7b5d9c25..eafe20be17d5b 100644
--- a/llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp
+++ b/llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp
@@ -2390,7 +2390,7 @@ bool SIInsertWaitcnts::shouldFlushVmCnt(MachineLoop *ML,
   }
   if (!ST->hasVscnt() && HasVMemStore && !HasVMemLoad && UsesVgprLoadedOutside)
 return true;
-  return HasVMemLoad && UsesVgprLoadedOutside;
+  return HasVMemLoad && UsesVgprLoadedOutside && ST->hasVmemWriteVgprInOrder();
 }
 
 bool SIInsertWaitcnts::runOnMachineFunction(MachineFunction &MF) {
diff --git a/llvm/test/CodeGen/AMDGPU/waitcnt-vmcnt-loop.mir 
b/llvm/test/CodeGen/AMDGPU/waitcnt-vmcnt-loop.mir
index bdef55ab956a0..0ddd2aa285b26 100644
--- a/llvm/test/CodeGen/AMDGPU/waitcnt-vmcnt-loop.mir
+++ b/llvm/test/CodeGen/AMDGPU/waitcnt-vmcnt-loop.mir
@@ -295,7 +295,7 @@ body: |
 # GFX12-LABEL: waitcnt_vm_loop2
 # GFX12-LABEL: bb.0:
 # GFX12: BUFFER_LOAD_FORMAT_X_IDXEN
-# GFX12: S_WAIT_LOADCNT 0
+# GFX12-NOT: S_WAIT_LOADCNT 0
 # GFX12-LABEL: bb.1:
 # GFX12: S_WAIT_LOADCNT 0
 # GFX12-LABEL: bb.2:
@@ -342,7 +342,7 @@ body: |
 # GFX12-LABEL: waitcnt_vm_loop2_store
 # GFX12-LABEL: bb.0:
 # GFX12: BUFFER_LOAD_FORMAT_X_IDXEN
-# GFX12: S_WAIT_LOADCNT 0
+# GFX12-NOT: S_WAIT_LOADCNT 0
 # GFX12-LABEL: bb.1:
 # GFX12: S_WAIT_LOADCNT 0
 # GFX12-LABEL: bb.2:
@@ -499,9 +499,9 @@ body: |
 # GFX12-LABEL: waitcnt_vm_loop2_reginterval
 # GFX12-LABEL: bb.0:
 # GFX12: GLOBAL_LOAD_DWORDX4
-# GFX12: S_WAIT_LOADCNT 0
-# GFX12-LABEL: bb.1:
 # GFX12-NOT: S_WAIT_LOADCNT 0
+# GFX12-LABEL: bb.1:
+# GFX12: S_WAIT_LOADCNT 0
 # GFX12-LABEL: bb.2:
 name:waitcnt_vm_loop2_reginterval
 body: |
@@ -600,7 +600,7 @@ body: |
 # GFX12-LABEL: bb.0:
 # GFX12: BUFFER_LOAD_FORMAT_X_IDXEN
 # GFX12: BUFFER_LOAD_FORMAT_X_IDXEN
-# GFX12: S_WAIT_LOADCNT 0
+# GFX12-NOT: S_WAIT_LOADCNT 0
 # GFX12-LABEL: bb.1:
 # GFX12: S_WAIT_LOADCNT 0
 # GFX12-LABEL: bb.2:

___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [AMDGPU] GFX12 VMEM instructions can write VGPR results out of order (PR #105549)

2024-08-21 Thread Jay Foad via llvm-branch-commits

jayfoad wrote:

> [!WARNING]
> This pull request is not mergeable via GitHub because a downstack PR is
> open. Once all requirements are satisfied, merge this PR as a stack on
> Graphite:
> https://app.graphite.dev/github/pr/llvm/llvm-project/105549?utm_source=stack-comment-downstack-mergeability-warning
> Learn more: https://graphite.dev/docs/merge-pull-requests

* **#105550** https://app.graphite.dev/github/pr/llvm/llvm-project/105550?utm_source=stack-comment-icon
* **#105549** https://app.graphite.dev/github/pr/llvm/llvm-project/105549?utm_source=stack-comment-icon 👈
* **#105548** https://app.graphite.dev/github/pr/llvm/llvm-project/105548?utm_source=stack-comment-icon
* `main`

This stack of pull requests is managed by Graphite. Learn more about
stacking: https://stacking.dev/?utm_source=stack-comment

https://github.com/llvm/llvm-project/pull/105549
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [AMDGPU] Remove one case of vmcnt loop header flushing for GFX12 (PR #105550)

2024-08-21 Thread Jay Foad via llvm-branch-commits

jayfoad wrote:

> [!WARNING]
> This pull request is not mergeable via GitHub because a downstack PR is
> open. Once all requirements are satisfied, merge this PR as a stack on
> Graphite:
> https://app.graphite.dev/github/pr/llvm/llvm-project/105550?utm_source=stack-comment-downstack-mergeability-warning
> Learn more: https://graphite.dev/docs/merge-pull-requests

* **#105550** https://app.graphite.dev/github/pr/llvm/llvm-project/105550?utm_source=stack-comment-icon 👈
* **#105549** https://app.graphite.dev/github/pr/llvm/llvm-project/105549?utm_source=stack-comment-icon
* **#105548** https://app.graphite.dev/github/pr/llvm/llvm-project/105548?utm_source=stack-comment-icon
* `main`

This stack of pull requests is managed by Graphite. Learn more about
stacking: https://stacking.dev/?utm_source=stack-comment

https://github.com/llvm/llvm-project/pull/105550
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [AMDGPU] GFX12 VMEM instructions can write VGPR results out of order (PR #105549)

2024-08-21 Thread Jay Foad via llvm-branch-commits

https://github.com/jayfoad ready_for_review 
https://github.com/llvm/llvm-project/pull/105549
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [AMDGPU] Remove one case of vmcnt loop header flushing for GFX12 (PR #105550)

2024-08-21 Thread Jay Foad via llvm-branch-commits

https://github.com/jayfoad ready_for_review 
https://github.com/llvm/llvm-project/pull/105550
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [AMDGPU] GFX12 VMEM instructions can write VGPR results out of order (PR #105549)

2024-08-21 Thread via llvm-branch-commits

llvmbot wrote:




@llvm/pr-subscribers-backend-amdgpu

Author: Jay Foad (jayfoad)


Changes

Fix SIInsertWaitcnts to account for this by adding extra waits to avoid
WAW dependencies.

---

Patch is 22.41 KiB, truncated to 20.00 KiB below, full version: 
https://github.com/llvm/llvm-project/pull/105549.diff


12 Files Affected:

- (modified) llvm/lib/Target/AMDGPU/AMDGPU.td (+17-6) 
- (modified) llvm/lib/Target/AMDGPU/GCNSubtarget.h (+3) 
- (modified) llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp (+4-3) 
- (modified) llvm/test/CodeGen/AMDGPU/buffer-fat-pointer-atomicrmw-fadd.ll (+3) 
- (modified) llvm/test/CodeGen/AMDGPU/buffer-fat-pointer-atomicrmw-fmax.ll (+5) 
- (modified) llvm/test/CodeGen/AMDGPU/buffer-fat-pointer-atomicrmw-fmin.ll (+5) 
- (modified) 
llvm/test/CodeGen/AMDGPU/llvm.amdgcn.struct.buffer.load.format.v3f16.ll (+1) 
- (modified) llvm/test/CodeGen/AMDGPU/load-constant-i16.ll (+9-1) 
- (modified) llvm/test/CodeGen/AMDGPU/load-global-i16.ll (+10) 
- (modified) llvm/test/CodeGen/AMDGPU/load-global-i32.ll (+2) 
- (modified) llvm/test/CodeGen/AMDGPU/spill-csr-frame-ptr-reg-copy.ll (+1) 
- (modified) llvm/test/CodeGen/AMDGPU/waitcnt-vmcnt-loop.mir (+4-4) 


``diff
diff --git a/llvm/lib/Target/AMDGPU/AMDGPU.td b/llvm/lib/Target/AMDGPU/AMDGPU.td
index 7906e0ee9d7858..9efdbd751d96e3 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPU.td
+++ b/llvm/lib/Target/AMDGPU/AMDGPU.td
@@ -953,6 +953,12 @@ def FeatureRequiredExportPriority : 
SubtargetFeature<"required-export-priority",
   "Export priority must be explicitly manipulated on GFX11.5"
 >;
 
+def FeatureVmemWriteVgprInOrder : SubtargetFeature<"vmem-write-vgpr-in-order",
+  "HasVmemWriteVgprInOrder",
+  "true",
+  "VMEM instructions of the same type write VGPR results in order"
+>;
+
 //======//
 // Subtarget Features (options and debugging)
 //======//
@@ -1123,7 +1129,8 @@ def FeatureSouthernIslands : 
GCNSubtargetFeatureGeneration<"SOUTHERN_ISLANDS",
   FeatureDsSrc2Insts, FeatureLDSBankCount32, FeatureMovrel,
   FeatureTrigReducedRange, FeatureExtendedImageInsts, FeatureImageInsts,
   FeatureGDS, FeatureGWS, FeatureDefaultComponentZero,
-  FeatureAtomicFMinFMaxF32GlobalInsts, FeatureAtomicFMinFMaxF64GlobalInsts
+  FeatureAtomicFMinFMaxF32GlobalInsts, FeatureAtomicFMinFMaxF64GlobalInsts,
+  FeatureVmemWriteVgprInOrder
   ]
 >;
 
@@ -1136,7 +1143,8 @@ def FeatureSeaIslands : 
GCNSubtargetFeatureGeneration<"SEA_ISLANDS",
   FeatureDsSrc2Insts, FeatureExtendedImageInsts, FeatureUnalignedBufferAccess,
   FeatureImageInsts, FeatureGDS, FeatureGWS, FeatureDefaultComponentZero,
   FeatureAtomicFMinFMaxF32GlobalInsts, FeatureAtomicFMinFMaxF64GlobalInsts,
-  FeatureAtomicFMinFMaxF32FlatInsts, FeatureAtomicFMinFMaxF64FlatInsts
+  FeatureAtomicFMinFMaxF32FlatInsts, FeatureAtomicFMinFMaxF64FlatInsts,
+  FeatureVmemWriteVgprInOrder
   ]
 >;
 
@@ -1152,7 +1160,7 @@ def FeatureVolcanicIslands : 
GCNSubtargetFeatureGeneration<"VOLCANIC_ISLANDS",
FeatureGFX7GFX8GFX9Insts, FeatureSMemTimeInst, FeatureMadMacF32Insts,
FeatureDsSrc2Insts, FeatureExtendedImageInsts, FeatureFastDenormalF32,
FeatureUnalignedBufferAccess, FeatureImageInsts, FeatureGDS, FeatureGWS,
-   FeatureDefaultComponentZero
+   FeatureDefaultComponentZero, FeatureVmemWriteVgprInOrder
   ]
 >;
 
@@ -1170,7 +1178,8 @@ def FeatureGFX9 : GCNSubtargetFeatureGeneration<"GFX9",
FeatureScalarFlatScratchInsts, FeatureScalarAtomics, FeatureR128A16,
FeatureA16, FeatureSMemTimeInst, FeatureFastDenormalF32, 
FeatureSupportsXNACK,
FeatureUnalignedBufferAccess, FeatureUnalignedDSAccess,
-   FeatureNegativeScratchOffsetBug, FeatureGWS, FeatureDefaultComponentZero
+   FeatureNegativeScratchOffsetBug, FeatureGWS, FeatureDefaultComponentZero,
+   FeatureVmemWriteVgprInOrder
   ]
 >;
 
@@ -1193,7 +1202,8 @@ def FeatureGFX10 : GCNSubtargetFeatureGeneration<"GFX10",
FeatureGDS, FeatureGWS, FeatureDefaultComponentZero,
FeatureMaxHardClauseLength63,
FeatureAtomicFMinFMaxF32GlobalInsts, FeatureAtomicFMinFMaxF64GlobalInsts,
-   FeatureAtomicFMinFMaxF32FlatInsts, FeatureAtomicFMinFMaxF64FlatInsts
+   FeatureAtomicFMinFMaxF32FlatInsts, FeatureAtomicFMinFMaxF64FlatInsts,
+   FeatureVmemWriteVgprInOrder
   ]
 >;
 
@@ -1215,7 +1225,8 @@ def FeatureGFX11 : GCNSubtargetFeatureGeneration<"GFX11",
FeatureUnalignedBufferAccess, FeatureUnalignedDSAccess, FeatureGDS,
FeatureGWS, FeatureDefaultComponentZero,
FeatureMaxHardClauseLength32,
-   FeatureAtomicFMinFMaxF32GlobalInsts, FeatureAtomicFMinFMaxF32FlatInsts
+   FeatureAtomicFMinFMaxF32GlobalInsts, FeatureAtomicFMinFMaxF32FlatInsts,
+   FeatureVmemWriteVgprInOrder
   ]
 >;
 
diff --git a/llvm/lib/Target/AMDGPU/GCNSubtarget.h 
b/llvm/lib/Target/AMDGPU/GCNSubtarget.h
index 902f51ae358d59..9386bcf0d74b22 100644
--- a/llvm/lib/Target/AMDGPU/GCNSubtarget.h
+++ b/llvm/lib/Target/AMDGPU/GCNSubtarget.h
@@ -239,6 +239,7 @@ class 

[llvm-branch-commits] [llvm] [AMDGPU] Remove one case of vmcnt loop header flushing for GFX12 (PR #105550)

2024-08-21 Thread via llvm-branch-commits

llvmbot wrote:




@llvm/pr-subscribers-backend-amdgpu

Author: Jay Foad (jayfoad)


Changes

When a loop contains a VMEM load whose result is only used outside the
loop, do not bother to flush vmcnt in the loop head on GFX12. A wait for
vmcnt will be required inside the loop anyway, because VMEM instructions
can write their VGPR results out of order.

---
Full diff: https://github.com/llvm/llvm-project/pull/105550.diff


2 Files Affected:

- (modified) llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp (+1-1) 
- (modified) llvm/test/CodeGen/AMDGPU/waitcnt-vmcnt-loop.mir (+5-5) 


```diff
diff --git a/llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp b/llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp
index 4262e7b5d9c25..eafe20be17d5b 100644
--- a/llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp
+++ b/llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp
@@ -2390,7 +2390,7 @@ bool SIInsertWaitcnts::shouldFlushVmCnt(MachineLoop *ML,
   }
   if (!ST->hasVscnt() && HasVMemStore && !HasVMemLoad && UsesVgprLoadedOutside)
     return true;
-  return HasVMemLoad && UsesVgprLoadedOutside;
+  return HasVMemLoad && UsesVgprLoadedOutside && ST->hasVmemWriteVgprInOrder();
 }
 
 bool SIInsertWaitcnts::runOnMachineFunction(MachineFunction &MF) {
diff --git a/llvm/test/CodeGen/AMDGPU/waitcnt-vmcnt-loop.mir b/llvm/test/CodeGen/AMDGPU/waitcnt-vmcnt-loop.mir
index bdef55ab956a0..0ddd2aa285b26 100644
--- a/llvm/test/CodeGen/AMDGPU/waitcnt-vmcnt-loop.mir
+++ b/llvm/test/CodeGen/AMDGPU/waitcnt-vmcnt-loop.mir
@@ -295,7 +295,7 @@ body: |
 # GFX12-LABEL: waitcnt_vm_loop2
 # GFX12-LABEL: bb.0:
 # GFX12: BUFFER_LOAD_FORMAT_X_IDXEN
-# GFX12: S_WAIT_LOADCNT 0
+# GFX12-NOT: S_WAIT_LOADCNT 0
 # GFX12-LABEL: bb.1:
 # GFX12: S_WAIT_LOADCNT 0
 # GFX12-LABEL: bb.2:
@@ -342,7 +342,7 @@ body: |
 # GFX12-LABEL: waitcnt_vm_loop2_store
 # GFX12-LABEL: bb.0:
 # GFX12: BUFFER_LOAD_FORMAT_X_IDXEN
-# GFX12: S_WAIT_LOADCNT 0
+# GFX12-NOT: S_WAIT_LOADCNT 0
 # GFX12-LABEL: bb.1:
 # GFX12: S_WAIT_LOADCNT 0
 # GFX12-LABEL: bb.2:
@@ -499,9 +499,9 @@ body: |
 # GFX12-LABEL: waitcnt_vm_loop2_reginterval
 # GFX12-LABEL: bb.0:
 # GFX12: GLOBAL_LOAD_DWORDX4
-# GFX12: S_WAIT_LOADCNT 0
-# GFX12-LABEL: bb.1:
 # GFX12-NOT: S_WAIT_LOADCNT 0
+# GFX12-LABEL: bb.1:
+# GFX12: S_WAIT_LOADCNT 0
 # GFX12-LABEL: bb.2:
 name:waitcnt_vm_loop2_reginterval
 body: |
@@ -600,7 +600,7 @@ body: |
 # GFX12-LABEL: bb.0:
 # GFX12: BUFFER_LOAD_FORMAT_X_IDXEN
 # GFX12: BUFFER_LOAD_FORMAT_X_IDXEN
-# GFX12: S_WAIT_LOADCNT 0
+# GFX12-NOT: S_WAIT_LOADCNT 0
 # GFX12-LABEL: bb.1:
 # GFX12: S_WAIT_LOADCNT 0
 # GFX12-LABEL: bb.2:

```
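
The condition changed in `shouldFlushVmCnt` can be read in isolation. The following is a minimal standalone sketch, not the LLVM implementation: `SubtargetModel`, `LoopSummary` and `shouldFlushVmCntModel` are hypothetical names, and the boolean fields only stand in for the `ST->hasVscnt()` and `ST->hasVmemWriteVgprInOrder()` queries and for the `HasVMemLoad`, `HasVMemStore` and `UsesVgprLoadedOutside` flags that appear in the diff. It mirrors just the two return statements shown in the hunk.

```cpp
// Hypothetical stand-in for the two subtarget queries used by the patch.
struct SubtargetModel {
  bool HasVscnt;             // models ST->hasVscnt()
  bool VmemWriteVgprInOrder; // models ST->hasVmemWriteVgprInOrder()
};

// Hypothetical summary of what the scan over the loop body found.
struct LoopSummary {
  bool HasVMemLoad;
  bool HasVMemStore;
  bool UsesVgprLoadedOutside;
};

// Mirrors the final two return statements of shouldFlushVmCnt after the patch.
bool shouldFlushVmCntModel(const SubtargetModel &ST, const LoopSummary &L) {
  // Store-only loops on subtargets without a separate store counter.
  if (!ST.HasVscnt && L.HasVMemStore && !L.HasVMemLoad &&
      L.UsesVgprLoadedOutside)
    return true;
  // Loops with loads: flushing is only requested when VGPR results are
  // written in order; otherwise a wait is needed inside the loop anyway.
  return L.HasVMemLoad && L.UsesVgprLoadedOutside && ST.VmemWriteVgprInOrder;
}
```

With the in-order flag cleared, which is how the description characterises GFX12, the load case no longer requests a flush in the loop head, while the store-only case on subtargets without vscnt is unchanged.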




https://github.com/llvm/llvm-project/pull/105550


[llvm-branch-commits] [clang] [llvm] [LLVM][Coroutines] Create `.noalloc` variant of switch ABI coroutine ramp functions during CoroSplit (PR #99283)

2024-08-21 Thread Yuxuan Chen via llvm-branch-commits


@@ -1967,22 +2047,13 @@ splitCoroutine(Function &F, SmallVectorImpl 
&Clones,
   for (DbgVariableRecord *DVR : DbgVariableRecords)
 coro::salvageDebugInfo(ArgToAllocaMap, *DVR, Shape.OptimizeFrame,
false /*UseEntryValue*/);
-  return Shape;
-}
 
-/// Remove calls to llvm.coro.end in the original function.
-static void removeCoroEndsFromRampFunction(const coro::Shape &Shape) {
-  if (Shape.ABI != coro::ABI::Switch) {
-for (auto *End : Shape.CoroEnds) {
-  replaceCoroEnd(End, Shape, Shape.FramePtr, /*in resume*/ false, nullptr);
-}
-  } else {
-for (llvm::AnyCoroEndInst *End : Shape.CoroEnds) {
-  auto &Context = End->getContext();
-  End->replaceAllUsesWith(ConstantInt::getFalse(Context));
-  End->eraseFromParent();
-}
+  removeCoroEndsFromRampFunction(Shape);
+
+  if (!isNoSuspendCoroutine && Shape.ABI == coro::ABI::Switch) {

yuxuanchen1997 wrote:

This turned out to be easy. I am addressing this with the next push for this 
patch. 

https://github.com/llvm/llvm-project/pull/99283


[llvm-branch-commits] [llvm] [mlir] [OpenMP]Update use_device_clause lowering (PR #101707)

2024-08-21 Thread Akash Banerjee via llvm-branch-commits


@@ -2193,80 +2197,141 @@ llvm::Value *getSizeInBytes(DataLayout &dl, const 
mlir::Type &type,
   return builder.getInt64(dl.getTypeSizeInBits(type) / 8);
 }
 
-void collectMapDataFromMapVars(MapInfoData &mapData,
-   llvm::SmallVectorImpl &mapVars,
-   LLVM::ModuleTranslation &moduleTranslation,
-   DataLayout &dl, llvm::IRBuilderBase &builder) {
-  for (mlir::Value mapValue : mapVars) {
-if (auto mapOp = mlir::dyn_cast_if_present(
-mapValue.getDefiningOp())) {
-  mlir::Value offloadPtr =
-  mapOp.getVarPtrPtr() ? mapOp.getVarPtrPtr() : mapOp.getVarPtr();
-  mapData.OriginalValue.push_back(
-  moduleTranslation.lookupValue(offloadPtr));
-  mapData.Pointers.push_back(mapData.OriginalValue.back());
-
-  if (llvm::Value *refPtr =
-  getRefPtrIfDeclareTarget(offloadPtr,
-   moduleTranslation)) { // declare target
-mapData.IsDeclareTarget.push_back(true);
-mapData.BasePointers.push_back(refPtr);
-  } else { // regular mapped variable
-mapData.IsDeclareTarget.push_back(false);
-mapData.BasePointers.push_back(mapData.OriginalValue.back());
-  }
+static void collectMapDataFromMapOperands(
+MapInfoData &mapData, SmallVectorImpl &mapVars,
+LLVM::ModuleTranslation &moduleTranslation, DataLayout &dl,
+llvm::IRBuilderBase &builder, const ArrayRef &useDevPtrOperands = 
{},
+const ArrayRef &useDevAddrOperands = {}) {
+  // Process MapOperands
+  for (Value mapValue : mapVars) {
+auto mapOp = cast(mapValue.getDefiningOp());
+Value offloadPtr =
+mapOp.getVarPtrPtr() ? mapOp.getVarPtrPtr() : mapOp.getVarPtr();
+mapData.OriginalValue.push_back(moduleTranslation.lookupValue(offloadPtr));
+mapData.Pointers.push_back(mapData.OriginalValue.back());
+
+if (llvm::Value *refPtr =
+getRefPtrIfDeclareTarget(offloadPtr,
+ moduleTranslation)) { // declare target
+  mapData.IsDeclareTarget.push_back(true);
+  mapData.BasePointers.push_back(refPtr);
+} else { // regular mapped variable
+  mapData.IsDeclareTarget.push_back(false);
+  mapData.BasePointers.push_back(mapData.OriginalValue.back());
+}
 
-  mapData.BaseType.push_back(
-  moduleTranslation.convertType(mapOp.getVarType()));
-  mapData.Sizes.push_back(
-  getSizeInBytes(dl, mapOp.getVarType(), mapOp, 
mapData.Pointers.back(),
- mapData.BaseType.back(), builder, moduleTranslation));
-  mapData.MapClause.push_back(mapOp.getOperation());
-  mapData.Types.push_back(
-  llvm::omp::OpenMPOffloadMappingFlags(mapOp.getMapType().value()));
-  mapData.Names.push_back(LLVM::createMappingInformation(
-  mapOp.getLoc(), *moduleTranslation.getOpenMPBuilder()));
-  mapData.DevicePointers.push_back(
-  llvm::OpenMPIRBuilder::DeviceInfoTy::None);
-
-  // Check if this is a member mapping and correctly assign that it is, if
-  // it is a member of a larger object.
-  // TODO: Need better handling of members, and distinguishing of members
-  // that are implicitly allocated on device vs explicitly passed in as
-  // arguments.
-  // TODO: May require some further additions to support nested record
-  // types, i.e. member maps that can have member maps.
-  mapData.IsAMember.push_back(false);
-  for (mlir::Value mapValue : mapVars) {
-if (auto map = mlir::dyn_cast_if_present(
-mapValue.getDefiningOp())) {
-  for (auto member : map.getMembers()) {
-if (member == mapOp) {
-  mapData.IsAMember.back() = true;
-}
+mapData.BaseType.push_back(
+moduleTranslation.convertType(mapOp.getVarType()));
+mapData.Sizes.push_back(
+getSizeInBytes(dl, mapOp.getVarType(), mapOp, mapData.Pointers.back(),
+   mapData.BaseType.back(), builder, moduleTranslation));
+mapData.MapClause.push_back(mapOp.getOperation());
+mapData.Types.push_back(
+llvm::omp::OpenMPOffloadMappingFlags(mapOp.getMapType().value()));
+mapData.Names.push_back(LLVM::createMappingInformation(
+mapOp.getLoc(), *moduleTranslation.getOpenMPBuilder()));
+mapData.DevicePointers.push_back(llvm::OpenMPIRBuilder::DeviceInfoTy::None);
+mapData.IsAMapping.push_back(true);
+
+// Check if this is a member mapping and correctly assign that it is, if
+// it is a member of a larger object.
+// TODO: Need better handling of members, and distinguishing of members
+// that are implicitly allocated on device vs explicitly passed in as
+// arguments.
+// TODO: May require some further additions to support nested record
+// types, i.e. member maps that can have member maps.
+mapData.IsAMember.push_back(false);
+for (Valu

[llvm-branch-commits] [llvm] [mlir] [OpenMP]Update use_device_clause lowering (PR #101707)

2024-08-21 Thread Akash Banerjee via llvm-branch-commits


@@ -2193,80 +2197,141 @@ llvm::Value *getSizeInBytes(DataLayout &dl, const 
mlir::Type &type,
   return builder.getInt64(dl.getTypeSizeInBits(type) / 8);
 }
 
-void collectMapDataFromMapVars(MapInfoData &mapData,
-   llvm::SmallVectorImpl &mapVars,
-   LLVM::ModuleTranslation &moduleTranslation,
-   DataLayout &dl, llvm::IRBuilderBase &builder) {
-  for (mlir::Value mapValue : mapVars) {
-if (auto mapOp = mlir::dyn_cast_if_present(
-mapValue.getDefiningOp())) {
-  mlir::Value offloadPtr =
-  mapOp.getVarPtrPtr() ? mapOp.getVarPtrPtr() : mapOp.getVarPtr();
-  mapData.OriginalValue.push_back(
-  moduleTranslation.lookupValue(offloadPtr));
-  mapData.Pointers.push_back(mapData.OriginalValue.back());
-
-  if (llvm::Value *refPtr =
-  getRefPtrIfDeclareTarget(offloadPtr,
-   moduleTranslation)) { // declare target
-mapData.IsDeclareTarget.push_back(true);
-mapData.BasePointers.push_back(refPtr);
-  } else { // regular mapped variable
-mapData.IsDeclareTarget.push_back(false);
-mapData.BasePointers.push_back(mapData.OriginalValue.back());
-  }
+static void collectMapDataFromMapOperands(
+MapInfoData &mapData, SmallVectorImpl &mapVars,
+LLVM::ModuleTranslation &moduleTranslation, DataLayout &dl,
+llvm::IRBuilderBase &builder, const ArrayRef &useDevPtrOperands = 
{},
+const ArrayRef &useDevAddrOperands = {}) {
+  // Process MapOperands
+  for (Value mapValue : mapVars) {
+auto mapOp = cast(mapValue.getDefiningOp());

TIFitis wrote:

Yes, I've replaced the others with cast as well.
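
For readers less familiar with the two casting helpers being swapped in the quoted hunk, here is a minimal standalone sketch of the difference in behaviour. It is modelled with standard `dynamic_cast` rather than LLVM's casting machinery, and `Operation`, `MapInfoLikeOp`, `OtherOp`, `dynCastIfPresentModel` and `castModel` are hypothetical names, not MLIR classes or functions.

```cpp
#include <cassert>

// Hypothetical miniature operation hierarchy; these are not MLIR classes.
struct Operation {
  virtual ~Operation() = default;
};
struct MapInfoLikeOp : Operation {};
struct OtherOp : Operation {};

// Models dyn_cast_if_present: tolerates a null input and returns null on a
// type mismatch, so the caller has to test the result.
template <typename To> To *dynCastIfPresentModel(Operation *Op) {
  return Op ? dynamic_cast<To *>(Op) : nullptr;
}

// Models cast: the caller promises the dynamic type, and a violation is
// treated as a programming error rather than a case to handle.
template <typename To> To *castModel(Operation *Op) {
  assert(Op && "castModel on a null operation");
  To *Result = dynamic_cast<To *>(Op);
  assert(Result && "castModel to an incompatible type");
  return Result;
}
```

The asserting form suits code where every map variable is expected to be defined by the same kind of operation, because it removes the silent skip that the `if (auto mapOp = ...)` form allowed.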

https://github.com/llvm/llvm-project/pull/101707


[llvm-branch-commits] [llvm] [mlir] [OpenMP]Update use_device_clause lowering (PR #101707)

2024-08-21 Thread Akash Banerjee via llvm-branch-commits


@@ -6357,7 +6357,7 @@ OpenMPIRBuilder::InsertPointTy 
OpenMPIRBuilder::createTargetData(
   // Disable TargetData CodeGen on Device pass.
   if (Config.IsTargetDevice.value_or(false)) {
 if (BodyGenCB)
-  Builder.restoreIP(BodyGenCB(Builder.saveIP(), BodyGenTy::NoPriv));
+  Builder.restoreIP(BodyGenCB(CodeGenIP, BodyGenTy::NoPriv));

TIFitis wrote:

It's because `CodeGenIP` hasn't been restored by the `Builder` at this point. 
Instead of passing `CodeGenIP`, I've moved the `restoreIP` call upward.
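
The point about the builder not yet sitting at `CodeGenIP` can be illustrated with a small standalone model. This is a sketch under stated assumptions, not OpenMPIRBuilder code: `InsertPointModel` and `BuilderModel` are hypothetical types that only imitate the `saveIP()` / `restoreIP()` pair of the IR builder.

```cpp
// Hypothetical model of an insertion point: a block id plus an offset in it.
struct InsertPointModel {
  int Block;
  int Offset;
  bool operator==(const InsertPointModel &Other) const {
    return Block == Other.Block && Offset == Other.Offset;
  }
};

// Hypothetical builder that only tracks where it would insert next.
class BuilderModel {
  InsertPointModel Current;

public:
  explicit BuilderModel(InsertPointModel Start) : Current(Start) {}

  // Returns the builder's current position, whatever it happens to be.
  InsertPointModel saveIP() const { return Current; }

  // Moves the builder to the given position.
  void restoreIP(InsertPointModel IP) { Current = IP; }
};
```

Until `restoreIP(CodeGenIP)` has run, `saveIP()` reports wherever the builder was left, so the two spellings in the hunk are only equivalent once the restore has been moved ahead of the callback.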

https://github.com/llvm/llvm-project/pull/101707


[llvm-branch-commits] [llvm] [ctx_prof] API to get the instrumentation of a BB (PR #105468)

2024-08-21 Thread Mircea Trofin via llvm-branch-commits

https://github.com/mtrofin updated 
https://github.com/llvm/llvm-project/pull/105468

>From f81d31c3311690826bdc1f5c83fc45b4864de035 Mon Sep 17 00:00:00 2001
From: Mircea Trofin 
Date: Tue, 20 Aug 2024 21:09:16 -0700
Subject: [PATCH] [ctx_prof] API to get the instrumentation of a BB

---
 llvm/include/llvm/Analysis/CtxProfAnalysis.h  |  5 +
 llvm/lib/Analysis/CtxProfAnalysis.cpp |  7 ++
 .../Analysis/CtxProfAnalysisTest.cpp  | 22 +++
 3 files changed, 34 insertions(+)

diff --git a/llvm/include/llvm/Analysis/CtxProfAnalysis.h 
b/llvm/include/llvm/Analysis/CtxProfAnalysis.h
index 23abcbe2c6e9d2..0b4dd8ae3a0dc7 100644
--- a/llvm/include/llvm/Analysis/CtxProfAnalysis.h
+++ b/llvm/include/llvm/Analysis/CtxProfAnalysis.h
@@ -95,7 +95,12 @@ class CtxProfAnalysis : public 
AnalysisInfoMixin {
 
   PGOContextualProfile run(Module &M, ModuleAnalysisManager &MAM);
 
+  /// Get the instruction instrumenting a callsite, or nullptr if that cannot be
+  /// found.
   static InstrProfCallsite *getCallsiteInstrumentation(CallBase &CB);
+
+  /// Get the instruction instrumenting a BB, or nullptr if not present.
+  static InstrProfIncrementInst *getBBInstrumentation(BasicBlock &BB);
 };
 
 class CtxProfAnalysisPrinterPass
diff --git a/llvm/lib/Analysis/CtxProfAnalysis.cpp 
b/llvm/lib/Analysis/CtxProfAnalysis.cpp
index ceebb2cf06d235..3fc1bc34afb97e 100644
--- a/llvm/lib/Analysis/CtxProfAnalysis.cpp
+++ b/llvm/lib/Analysis/CtxProfAnalysis.cpp
@@ -202,6 +202,13 @@ InstrProfCallsite 
*CtxProfAnalysis::getCallsiteInstrumentation(CallBase &CB) {
   return nullptr;
 }
 
+InstrProfIncrementInst *CtxProfAnalysis::getBBInstrumentation(BasicBlock &BB) {
+  for (auto &I : BB)
+if (auto *Incr = dyn_cast(&I))
+  return Incr;
+  return nullptr;
+}
+
 static void
 preorderVisit(const PGOCtxProfContext::CallTargetMapTy &Profiles,
   function_ref Visitor) {
diff --git a/llvm/unittests/Analysis/CtxProfAnalysisTest.cpp 
b/llvm/unittests/Analysis/CtxProfAnalysisTest.cpp
index 5f9bf3ec540eb3..fbe3a6e45109cc 100644
--- a/llvm/unittests/Analysis/CtxProfAnalysisTest.cpp
+++ b/llvm/unittests/Analysis/CtxProfAnalysisTest.cpp
@@ -132,4 +132,26 @@ TEST_F(CtxProfAnalysisTest, GetCallsiteIDNegativeTest) {
   EXPECT_EQ(IndIns, nullptr);
 }
 
+TEST_F(CtxProfAnalysisTest, GetBBIDTest) {
+  ModulePassManager MPM;
+  MPM.addPass(PGOInstrumentationGen(PGOInstrumentationType::CTXPROF));
+  EXPECT_FALSE(MPM.run(*M, MAM).areAllPreserved());
+  auto *F = M->getFunction("foo");
+  ASSERT_NE(F, nullptr);
+  std::map BBNameAndID;
+
+  for (auto &BB : *F) {
+auto *Ins = CtxProfAnalysis::getBBInstrumentation(BB);
+if (Ins)
+  BBNameAndID[BB.getName().str()] =
+  static_cast(Ins->getIndex()->getZExtValue());
+else
+  BBNameAndID[BB.getName().str()] = -1;
+  }
+
+  EXPECT_THAT(BBNameAndID,
+  testing::UnorderedElementsAre(
+  testing::Pair("", 0), testing::Pair("yes", 1),
+  testing::Pair("no", -1), testing::Pair("exit", -1)));
+}
 } // namespace
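
The lookup added by this patch is a first-match scan over a block's instructions, and the unit test records -1 for blocks without instrumentation. A minimal standalone sketch of that pattern follows; `Instruction`, `IncrementInst`, `OtherInst`, `BasicBlockModel` and both functions are hypothetical stand-ins, and standard `dynamic_cast` replaces LLVM's `dyn_cast`.

```cpp
#include <memory>
#include <vector>

// Hypothetical stand-ins for IR instructions; not LLVM classes.
struct Instruction {
  virtual ~Instruction() = default;
};
struct IncrementInst : Instruction {
  unsigned Index;
  explicit IncrementInst(unsigned I) : Index(I) {}
};
struct OtherInst : Instruction {};

using BasicBlockModel = std::vector<std::unique_ptr<Instruction>>;

// Returns the first increment-like instruction in the block, or nullptr.
const IncrementInst *getBBInstrumentationModel(const BasicBlockModel &BB) {
  for (const auto &I : BB)
    if (const auto *Incr = dynamic_cast<const IncrementInst *>(I.get()))
      return Incr;
  return nullptr;
}

// Maps a block to its counter index, using -1 when it is not instrumented,
// which is the convention the unit test in the patch uses.
int bbIdOrMinusOne(const BasicBlockModel &BB) {
  const IncrementInst *Ins = getBBInstrumentationModel(BB);
  return Ins ? static_cast<int>(Ins->Index) : -1;
}
```

Returning on the first match means a block with several increments reports only the earliest one; whether that can occur in practice is not stated in the patch.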



[llvm-branch-commits] [llvm] [ctx_prof] Add support for ICP (PR #105469)

2024-08-21 Thread Mircea Trofin via llvm-branch-commits

https://github.com/mtrofin updated 
https://github.com/llvm/llvm-project/pull/105469

>From de6d88788d35cfeace3f694008d446e8175421a0 Mon Sep 17 00:00:00 2001
From: Mircea Trofin 
Date: Tue, 20 Aug 2024 21:32:23 -0700
Subject: [PATCH] [ctx_prof] Add support for ICP

---
 llvm/include/llvm/Analysis/CtxProfAnalysis.h  |  18 +-
 llvm/include/llvm/IR/IntrinsicInst.h  |   2 +
 .../llvm/ProfileData/PGOCtxProfReader.h   |  20 ++
 .../Transforms/Utils/CallPromotionUtils.h |   4 +
 llvm/lib/Analysis/CtxProfAnalysis.cpp |  79 +---
 llvm/lib/IR/IntrinsicInst.cpp |  10 +
 .../Transforms/Utils/CallPromotionUtils.cpp   |  86 +
 .../Utils/CallPromotionUtilsTest.cpp  | 178 ++
 8 files changed, 364 insertions(+), 33 deletions(-)

diff --git a/llvm/include/llvm/Analysis/CtxProfAnalysis.h 
b/llvm/include/llvm/Analysis/CtxProfAnalysis.h
index 0b4dd8ae3a0dc7..d6c2bb26a091af 100644
--- a/llvm/include/llvm/Analysis/CtxProfAnalysis.h
+++ b/llvm/include/llvm/Analysis/CtxProfAnalysis.h
@@ -73,6 +73,12 @@ class PGOContextualProfile {
 return 
FuncInfo.find(getDefinedFunctionGUID(F))->second.NextCallsiteIndex++;
   }
 
+  using ConstVisitor = function_ref;
+  using Visitor = function_ref;
+
+  void update(Visitor, const Function *F = nullptr);
+  void visit(ConstVisitor, const Function *F = nullptr) const;
+
   const CtxProfFlatProfile flatten() const;
 
   bool invalidate(Module &, const PreservedAnalyses &PA,
@@ -105,13 +111,18 @@ class CtxProfAnalysis : public 
AnalysisInfoMixin {
 
 class CtxProfAnalysisPrinterPass
 : public PassInfoMixin {
-  raw_ostream &OS;
-
 public:
-  explicit CtxProfAnalysisPrinterPass(raw_ostream &OS) : OS(OS) {}
+  enum class PrintMode { Everything, JSON };
+  explicit CtxProfAnalysisPrinterPass(raw_ostream &OS,
+  PrintMode Mode = PrintMode::Everything)
+  : OS(OS), Mode(Mode) {}
 
   PreservedAnalyses run(Module &M, ModuleAnalysisManager &MAM);
   static bool isRequired() { return true; }
+
+private:
+  raw_ostream &OS;
+  const PrintMode Mode;
 };
 
 /// Assign a GUID to functions as metadata. GUID calculation takes linkage into
@@ -134,6 +145,5 @@ class AssignGUIDPass : public PassInfoMixin 
{
   // This should become GlobalValue::getGUID
   static uint64_t getGUID(const Function &F);
 };
-
 } // namespace llvm
 #endif // LLVM_ANALYSIS_CTXPROFANALYSIS_H
diff --git a/llvm/include/llvm/IR/IntrinsicInst.h 
b/llvm/include/llvm/IR/IntrinsicInst.h
index 2f1e2c08c3ecec..bab41efab528e2 100644
--- a/llvm/include/llvm/IR/IntrinsicInst.h
+++ b/llvm/include/llvm/IR/IntrinsicInst.h
@@ -1519,6 +1519,7 @@ class InstrProfCntrInstBase : public InstrProfInstBase {
   ConstantInt *getNumCounters() const;
   // The index of the counter that this instruction acts on.
   ConstantInt *getIndex() const;
+  void setIndex(uint32_t Idx);
 };
 
 /// This represents the llvm.instrprof.cover intrinsic.
@@ -1569,6 +1570,7 @@ class InstrProfCallsite : public InstrProfCntrInstBase {
 return isa(V) && classof(cast(V));
   }
   Value *getCallee() const;
+  void setCallee(Value *);
 };
 
 /// This represents the llvm.instrprof.timestamp intrinsic.
diff --git a/llvm/include/llvm/ProfileData/PGOCtxProfReader.h 
b/llvm/include/llvm/ProfileData/PGOCtxProfReader.h
index 190deaeeacd085..23dcc376508b39 100644
--- a/llvm/include/llvm/ProfileData/PGOCtxProfReader.h
+++ b/llvm/include/llvm/ProfileData/PGOCtxProfReader.h
@@ -57,9 +57,23 @@ class PGOCtxProfContext final {
 
   GlobalValue::GUID guid() const { return GUID; }
   const SmallVectorImpl &counters() const { return Counters; }
+  SmallVectorImpl &counters() { return Counters; }
+
+  uint64_t getEntrycount() const { return Counters[0]; }
+
   const CallsiteMapTy &callsites() const { return Callsites; }
   CallsiteMapTy &callsites() { return Callsites; }
 
+  void ingestContext(uint32_t CSId, PGOCtxProfContext &&Other) {
+auto [Iter, _] = callsites().try_emplace(CSId, CallTargetMapTy());
+Iter->second.emplace(Other.guid(), std::move(Other));
+  }
+
+  void growCounters(uint32_t Size) {
+if (Size >= Counters.size())
+  Counters.resize(Size);
+  }
+
   bool hasCallsite(uint32_t I) const {
 return Callsites.find(I) != Callsites.end();
   }
@@ -68,6 +82,12 @@ class PGOCtxProfContext final {
 assert(hasCallsite(I) && "Callsite not found");
 return Callsites.find(I)->second;
   }
+
+  CallTargetMapTy &callsite(uint32_t I) {
+assert(hasCallsite(I) && "Callsite not found");
+return Callsites.find(I)->second;
+  }
+
   void getContainedGuids(DenseSet &Guids) const;
 };
 
diff --git a/llvm/include/llvm/Transforms/Utils/CallPromotionUtils.h 
b/llvm/include/llvm/Transforms/Utils/CallPromotionUtils.h
index 385831f457038d..58af26f31417b0 100644
--- a/llvm/include/llvm/Transforms/Utils/CallPromotionUtils.h
+++ b/llvm/include/llvm/Transforms/Utils/CallPromotionUtils.h
@@ -14,6 +14,7 @@
 #ifndef LLVM_TRANSFORMS_UTILS_CALLPROMOTIONUTILS_H
 #defin
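
The `growCounters` helper added to `PGOCtxProfContext` in the hunk above is a grow-only resize. A minimal standalone sketch, assuming a plain `std::vector<uint64_t>` for the counters (the element type is not visible in this excerpt) and using the hypothetical name `CountersModel`:

```cpp
#include <cstdint>
#include <vector>

// Hypothetical stand-in for the counter storage of a profile context.
struct CountersModel {
  std::vector<uint64_t> Counters;

  // Grow-only: a smaller request leaves the vector untouched, and a request
  // equal to the current size resizes to the same size, which is a no-op.
  void growCounters(uint32_t Size) {
    if (Size >= Counters.size())
      Counters.resize(Size);
  }

  // Mirrors getEntrycount in the patch; assumes at least one counter exists.
  uint64_t getEntrycount() const { return Counters[0]; }
};
```

New slots are value-initialised to zero by `resize`, and existing counter values are preserved across a growth.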

[llvm-branch-commits] [clang] [Serialization] Code cleanups and polish 83233 (PR #83237)

2024-08-21 Thread Ilya Biryukov via llvm-branch-commits

ilya-biryukov wrote:

Sorry for disappearing; I had a holiday, a vacation, and some unplanned 
interruptions over the last week and the start of this week. I have made really 
good progress, and the amount of code that I need to dig through is quite 
manageable.

I should have a repro for you this week.

https://github.com/llvm/llvm-project/pull/83237


[llvm-branch-commits] [llvm] [mlir] [OpenMP]Update use_device_clause lowering (PR #101707)

2024-08-21 Thread Akash Banerjee via llvm-branch-commits

https://github.com/TIFitis updated 
https://github.com/llvm/llvm-project/pull/101707

>From 547b339b175fa996eef8d45c5df8a73967ee94c2 Mon Sep 17 00:00:00 2001
From: Akash Banerjee 
Date: Fri, 2 Aug 2024 17:11:21 +0100
Subject: [PATCH 1/3] [OpenMP]Update use_device_clause lowering

This patch updates the use_device_ptr and use_device_addr clauses to use the 
mapInfoOps for lowering. This allows all the types that are handled by the map 
clauses, such as derived types, to also be supported by the use_device_clauses.

This is patch 2/2 in a series of patches.
---
 llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp |   2 +-
 .../OpenMP/OpenMPToLLVMIRTranslation.cpp  | 284 ++
 mlir/test/Target/LLVMIR/omptarget-llvm.mlir   |  16 +-
 .../openmp-target-use-device-nested.mlir  |  27 ++
 4 files changed, 194 insertions(+), 135 deletions(-)
 create mode 100644 mlir/test/Target/LLVMIR/openmp-target-use-device-nested.mlir

diff --git a/llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp 
b/llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp
index 83fec194d73904..f5d94069ad6f4c 100644
--- a/llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp
+++ b/llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp
@@ -6357,7 +6357,7 @@ OpenMPIRBuilder::InsertPointTy 
OpenMPIRBuilder::createTargetData(
   // Disable TargetData CodeGen on Device pass.
   if (Config.IsTargetDevice.value_or(false)) {
 if (BodyGenCB)
-  Builder.restoreIP(BodyGenCB(Builder.saveIP(), BodyGenTy::NoPriv));
+  Builder.restoreIP(BodyGenCB(CodeGenIP, BodyGenTy::NoPriv));
 return Builder.saveIP();
   }
 
diff --git 
a/mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp 
b/mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp
index 458d05d5059db7..78c460c50cbe5e 100644
--- a/mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp
+++ b/mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp
@@ -2110,6 +2110,8 @@ getRefPtrIfDeclareTarget(mlir::Value value,
 struct MapInfoData : llvm::OpenMPIRBuilder::MapInfosTy {
   llvm::SmallVector IsDeclareTarget;
   llvm::SmallVector IsAMember;
+  // Identify if mapping was added by mapClause or use_device clauses.
+  llvm::SmallVector IsAMapping;
   llvm::SmallVector MapClause;
   llvm::SmallVector OriginalValue;
   // Stripped off array/pointer to get the underlying
@@ -2193,62 +2195,125 @@ llvm::Value *getSizeInBytes(DataLayout &dl, const 
mlir::Type &type,
   return builder.getInt64(dl.getTypeSizeInBits(type) / 8);
 }
 
-void collectMapDataFromMapVars(MapInfoData &mapData,
-   llvm::SmallVectorImpl &mapVars,
-   LLVM::ModuleTranslation &moduleTranslation,
-   DataLayout &dl, llvm::IRBuilderBase &builder) {
+void collectMapDataFromMapOperands(
+MapInfoData &mapData, llvm::SmallVectorImpl &mapVars,
+LLVM::ModuleTranslation &moduleTranslation, DataLayout &dl,
+llvm::IRBuilderBase &builder,
+const llvm::ArrayRef &useDevPtrOperands = {},
+const llvm::ArrayRef &useDevAddrOperands = {}) {
+  // Process MapOperands
   for (mlir::Value mapValue : mapVars) {
-if (auto mapOp = mlir::dyn_cast_if_present(
-mapValue.getDefiningOp())) {
-  mlir::Value offloadPtr =
-  mapOp.getVarPtrPtr() ? mapOp.getVarPtrPtr() : mapOp.getVarPtr();
-  mapData.OriginalValue.push_back(
-  moduleTranslation.lookupValue(offloadPtr));
-  mapData.Pointers.push_back(mapData.OriginalValue.back());
-
-  if (llvm::Value *refPtr =
-  getRefPtrIfDeclareTarget(offloadPtr,
-   moduleTranslation)) { // declare target
-mapData.IsDeclareTarget.push_back(true);
-mapData.BasePointers.push_back(refPtr);
-  } else { // regular mapped variable
-mapData.IsDeclareTarget.push_back(false);
-mapData.BasePointers.push_back(mapData.OriginalValue.back());
-  }
+auto mapOp = mlir::cast(mapValue.getDefiningOp());
+mlir::Value offloadPtr =
+mapOp.getVarPtrPtr() ? mapOp.getVarPtrPtr() : mapOp.getVarPtr();
+mapData.OriginalValue.push_back(moduleTranslation.lookupValue(offloadPtr));
+mapData.Pointers.push_back(mapData.OriginalValue.back());
+
+if (llvm::Value *refPtr =
+getRefPtrIfDeclareTarget(offloadPtr,
+ moduleTranslation)) { // declare target
+  mapData.IsDeclareTarget.push_back(true);
+  mapData.BasePointers.push_back(refPtr);
+} else { // regular mapped variable
+  mapData.IsDeclareTarget.push_back(false);
+  mapData.BasePointers.push_back(mapData.OriginalValue.back());
+}
 
-  mapData.BaseType.push_back(
-  moduleTranslation.convertType(mapOp.getVarType()));
-  mapData.Sizes.push_back(
-  getSizeInBytes(dl, mapOp.getVarType(), mapOp, 
mapData.Pointers.back(),
- mapData.BaseType.back(), builder, moduleTranslation));
-  mapData.MapClause.push_back(mapOp.getO

[llvm-branch-commits] [llvm] [mlir] [OpenMP]Update use_device_clause lowering (PR #101707)

2024-08-21 Thread Akash Banerjee via llvm-branch-commits

TIFitis wrote:

@skatrak Thanks for the review, I've addressed the comments in the latest 
revision.


https://github.com/llvm/llvm-project/pull/101707


[llvm-branch-commits] [llvm] [mlir] [OpenMP]Update use_device_clause lowering (PR #101707)

2024-08-21 Thread Akash Banerjee via llvm-branch-commits


@@ -2439,7 +2504,7 @@ static llvm::omp::OpenMPOffloadMappingFlags 
mapParentWithMembers(
   // data by the descriptor (which itself, is a structure containing
   // runtime information on the dynamically allocated data).
   auto parentClause =
-  llvm::cast(mapData.MapClause[mapDataIndex]);
+  llvm::cast(mapData.MapClause[mapDataIndex]);

TIFitis wrote:

Agreed, I'll create an NFC PR to address this.

https://github.com/llvm/llvm-project/pull/101707
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [Driver] Add -Wa, options -mmapsyms={default, implicit} (PR #104542)

2024-08-21 Thread Fangrui Song via llvm-branch-commits

https://github.com/MaskRay updated 
https://github.com/llvm/llvm-project/pull/104542




[llvm-branch-commits] [Driver] Add -Wa, options -mmapsyms={default, implicit} (PR #104542)

2024-08-21 Thread Fangrui Song via llvm-branch-commits


@@ -7131,6 +7131,8 @@ def massembler_fatal_warnings : Flag<["-"], 
"massembler-fatal-warnings">,
 def crel : Flag<["--"], "crel">,
   HelpText<"Enable CREL relocation format (ELF only)">,
   MarshallingInfoFlag>;
+def mmapsyms_implicit : Flag<["-"], "mmapsyms=implicit">,

MaskRay wrote:

Thanks. Added a help message for `clang -cc1` and `clang -cc1as`.

https://github.com/llvm/llvm-project/pull/104542


[llvm-branch-commits] [llvm] [ctx_prof] Add support for ICP (PR #105469)

2024-08-21 Thread Mircea Trofin via llvm-branch-commits

https://github.com/mtrofin updated 
https://github.com/llvm/llvm-project/pull/105469

>From d58d308957961ae7442a7b5aa0561f42dea69caf Mon Sep 17 00:00:00 2001
From: Mircea Trofin 
Date: Tue, 20 Aug 2024 21:32:23 -0700
Subject: [PATCH] [ctx_prof] Add support for ICP

---
 llvm/include/llvm/Analysis/CtxProfAnalysis.h  |  18 +-
 llvm/include/llvm/IR/IntrinsicInst.h  |   2 +
 .../llvm/ProfileData/PGOCtxProfReader.h   |  20 ++
 .../Transforms/Utils/CallPromotionUtils.h |   4 +
 llvm/lib/Analysis/CtxProfAnalysis.cpp |  79 +---
 llvm/lib/IR/IntrinsicInst.cpp |  10 +
 .../Transforms/Utils/CallPromotionUtils.cpp   |  86 +
 .../Utils/CallPromotionUtilsTest.cpp  | 178 ++
 8 files changed, 364 insertions(+), 33 deletions(-)

diff --git a/llvm/include/llvm/Analysis/CtxProfAnalysis.h 
b/llvm/include/llvm/Analysis/CtxProfAnalysis.h
index 0b4dd8ae3a0dc7..d6c2bb26a091af 100644
--- a/llvm/include/llvm/Analysis/CtxProfAnalysis.h
+++ b/llvm/include/llvm/Analysis/CtxProfAnalysis.h
@@ -73,6 +73,12 @@ class PGOContextualProfile {
 return 
FuncInfo.find(getDefinedFunctionGUID(F))->second.NextCallsiteIndex++;
   }
 
+  using ConstVisitor = function_ref;
+  using Visitor = function_ref;
+
+  void update(Visitor, const Function *F = nullptr);
+  void visit(ConstVisitor, const Function *F = nullptr) const;
+
   const CtxProfFlatProfile flatten() const;
 
   bool invalidate(Module &, const PreservedAnalyses &PA,
@@ -105,13 +111,18 @@ class CtxProfAnalysis : public AnalysisInfoMixin<CtxProfAnalysis> {
 
 class CtxProfAnalysisPrinterPass
 : public PassInfoMixin<CtxProfAnalysisPrinterPass> {
-  raw_ostream &OS;
-
 public:
-  explicit CtxProfAnalysisPrinterPass(raw_ostream &OS) : OS(OS) {}
+  enum class PrintMode { Everything, JSON };
+  explicit CtxProfAnalysisPrinterPass(raw_ostream &OS,
+  PrintMode Mode = PrintMode::Everything)
+  : OS(OS), Mode(Mode) {}
 
   PreservedAnalyses run(Module &M, ModuleAnalysisManager &MAM);
   static bool isRequired() { return true; }
+
+private:
+  raw_ostream &OS;
+  const PrintMode Mode;
 };
 
 /// Assign a GUID to functions as metadata. GUID calculation takes linkage into
@@ -134,6 +145,5 @@ class AssignGUIDPass : public PassInfoMixin<AssignGUIDPass> {
   // This should become GlobalValue::getGUID
   static uint64_t getGUID(const Function &F);
 };
-
 } // namespace llvm
 #endif // LLVM_ANALYSIS_CTXPROFANALYSIS_H
diff --git a/llvm/include/llvm/IR/IntrinsicInst.h b/llvm/include/llvm/IR/IntrinsicInst.h
index 2f1e2c08c3ecec..bab41efab528e2 100644
--- a/llvm/include/llvm/IR/IntrinsicInst.h
+++ b/llvm/include/llvm/IR/IntrinsicInst.h
@@ -1519,6 +1519,7 @@ class InstrProfCntrInstBase : public InstrProfInstBase {
   ConstantInt *getNumCounters() const;
   // The index of the counter that this instruction acts on.
   ConstantInt *getIndex() const;
+  void setIndex(uint32_t Idx);
 };
 
 /// This represents the llvm.instrprof.cover intrinsic.
@@ -1569,6 +1570,7 @@ class InstrProfCallsite : public InstrProfCntrInstBase {
 return isa<IntrinsicInst>(V) && classof(cast<IntrinsicInst>(V));
   }
   Value *getCallee() const;
+  void setCallee(Value *);
 };
 
 /// This represents the llvm.instrprof.timestamp intrinsic.
diff --git a/llvm/include/llvm/ProfileData/PGOCtxProfReader.h b/llvm/include/llvm/ProfileData/PGOCtxProfReader.h
index 190deaeeacd085..23dcc376508b39 100644
--- a/llvm/include/llvm/ProfileData/PGOCtxProfReader.h
+++ b/llvm/include/llvm/ProfileData/PGOCtxProfReader.h
@@ -57,9 +57,23 @@ class PGOCtxProfContext final {
 
   GlobalValue::GUID guid() const { return GUID; }
   const SmallVectorImpl<uint64_t> &counters() const { return Counters; }
+  SmallVectorImpl<uint64_t> &counters() { return Counters; }
+
+  uint64_t getEntrycount() const { return Counters[0]; }
+
   const CallsiteMapTy &callsites() const { return Callsites; }
   CallsiteMapTy &callsites() { return Callsites; }
 
+  void ingestContext(uint32_t CSId, PGOCtxProfContext &&Other) {
+auto [Iter, _] = callsites().try_emplace(CSId, CallTargetMapTy());
+Iter->second.emplace(Other.guid(), std::move(Other));
+  }
+
+  void growCounters(uint32_t Size) {
+if (Size >= Counters.size())
+  Counters.resize(Size);
+  }
+
   bool hasCallsite(uint32_t I) const {
 return Callsites.find(I) != Callsites.end();
   }
@@ -68,6 +82,12 @@ class PGOCtxProfContext final {
 assert(hasCallsite(I) && "Callsite not found");
 return Callsites.find(I)->second;
   }
+
+  CallTargetMapTy &callsite(uint32_t I) {
+assert(hasCallsite(I) && "Callsite not found");
+return Callsites.find(I)->second;
+  }
+
   void getContainedGuids(DenseSet<GlobalValue::GUID> &Guids) const;
 };
 
diff --git a/llvm/include/llvm/Transforms/Utils/CallPromotionUtils.h b/llvm/include/llvm/Transforms/Utils/CallPromotionUtils.h
index 385831f457038d..58af26f31417b0 100644
--- a/llvm/include/llvm/Transforms/Utils/CallPromotionUtils.h
+++ b/llvm/include/llvm/Transforms/Utils/CallPromotionUtils.h
@@ -14,6 +14,7 @@
 #ifndef LLVM_TRANSFORMS_UTILS_CALLPROMOTIONUTILS_H
 #defin
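
Note on the PGOCtxProfReader.h hunks above: `growCounters` only ever enlarges the counter vector, and `ingestContext` files a moved-in context under its callsite id and then under its GUID. The sketch below is a minimal stand-alone illustration of that behaviour. It assumes a hypothetical `ToyContext` class built on standard containers; it is not the LLVM `PGOCtxProfContext`, and child contexts are held through `std::unique_ptr` purely to keep the toy type simple.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical stand-in for a contextual profile node (not the LLVM class).
class ToyContext {
public:
  // callsite id -> (callee GUID -> child context)
  using CallTargetMap = std::map<std::uint64_t, std::unique_ptr<ToyContext>>;
  using CallsiteMap = std::map<std::uint32_t, CallTargetMap>;

  ToyContext(std::uint64_t Guid, std::vector<std::uint64_t> Counters)
      : Guid(Guid), Counters(std::move(Counters)) {}

  std::uint64_t guid() const { return Guid; }
  const std::vector<std::uint64_t> &counters() const { return Counters; }
  // By convention in the patch, counter 0 is the entry count.
  std::uint64_t entryCount() const { return Counters[0]; }
  const CallsiteMap &callsites() const { return Callsites; }

  // Attach Other under callsite CSId, keyed by Other's GUID.
  // The callsite entry is created on first use.
  void ingestContext(std::uint32_t CSId, ToyContext &&Other) {
    auto [Iter, Inserted] = Callsites.try_emplace(CSId);
    (void)Inserted;
    const std::uint64_t ChildGuid = Other.guid();
    Iter->second.emplace(ChildGuid,
                         std::make_unique<ToyContext>(std::move(Other)));
  }

  // Grow (never shrink) the counter vector; new slots are zero.
  void growCounters(std::uint32_t Size) {
    if (Size >= Counters.size())
      Counters.resize(Size);
  }

private:
  std::uint64_t Guid;
  std::vector<std::uint64_t> Counters;
  CallsiteMap Callsites;
};
```

A caller would typically grow the parent's counters first and then ingest the promoted callee's context under the new callsite id; asking for a smaller size leaves the counters untouched.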

[llvm-branch-commits] [llvm] [ctx_prof] Add support for ICP (PR #105469)

2024-08-21 Thread Mircea Trofin via llvm-branch-commits

https://github.com/mtrofin updated https://github.com/llvm/llvm-project/pull/105469

>From 0d7c720e67a0213565f0e7c141c4ffa1b91fc5b9 Mon Sep 17 00:00:00 2001
From: Mircea Trofin 
Date: Tue, 20 Aug 2024 21:09:16 -0700
Subject: [PATCH 1/2] [ctx_prof] API to get the instrumentation of a BB

---
 llvm/include/llvm/Analysis/CtxProfAnalysis.h  |  5 +
 llvm/lib/Analysis/CtxProfAnalysis.cpp |  7 ++
 .../Analysis/CtxProfAnalysisTest.cpp  | 22 +++
 3 files changed, 34 insertions(+)

diff --git a/llvm/include/llvm/Analysis/CtxProfAnalysis.h b/llvm/include/llvm/Analysis/CtxProfAnalysis.h
index 23abcbe2c6e9d2..0b4dd8ae3a0dc7 100644
--- a/llvm/include/llvm/Analysis/CtxProfAnalysis.h
+++ b/llvm/include/llvm/Analysis/CtxProfAnalysis.h
@@ -95,7 +95,12 @@ class CtxProfAnalysis : public AnalysisInfoMixin<CtxProfAnalysis> {
 
   PGOContextualProfile run(Module &M, ModuleAnalysisManager &MAM);
 
+  /// Get the instruction instrumenting a callsite, or nullptr if that cannot be
+  /// found.
   static InstrProfCallsite *getCallsiteInstrumentation(CallBase &CB);
+
+  /// Get the instruction instrumenting a BB, or nullptr if not present.
+  static InstrProfIncrementInst *getBBInstrumentation(BasicBlock &BB);
 };
 
 class CtxProfAnalysisPrinterPass
diff --git a/llvm/lib/Analysis/CtxProfAnalysis.cpp b/llvm/lib/Analysis/CtxProfAnalysis.cpp
index ceebb2cf06d235..3fc1bc34afb97e 100644
--- a/llvm/lib/Analysis/CtxProfAnalysis.cpp
+++ b/llvm/lib/Analysis/CtxProfAnalysis.cpp
@@ -202,6 +202,13 @@ InstrProfCallsite *CtxProfAnalysis::getCallsiteInstrumentation(CallBase &CB) {
   return nullptr;
 }
 
+InstrProfIncrementInst *CtxProfAnalysis::getBBInstrumentation(BasicBlock &BB) {
+  for (auto &I : BB)
+if (auto *Incr = dyn_cast<InstrProfIncrementInst>(&I))
+  return Incr;
+  return nullptr;
+}
+
 static void
 preorderVisit(const PGOCtxProfContext::CallTargetMapTy &Profiles,
   function_ref<void(const PGOCtxProfContext &)> Visitor) {
diff --git a/llvm/unittests/Analysis/CtxProfAnalysisTest.cpp b/llvm/unittests/Analysis/CtxProfAnalysisTest.cpp
index 5f9bf3ec540eb3..fbe3a6e45109cc 100644
--- a/llvm/unittests/Analysis/CtxProfAnalysisTest.cpp
+++ b/llvm/unittests/Analysis/CtxProfAnalysisTest.cpp
@@ -132,4 +132,26 @@ TEST_F(CtxProfAnalysisTest, GetCallsiteIDNegativeTest) {
   EXPECT_EQ(IndIns, nullptr);
 }
 
+TEST_F(CtxProfAnalysisTest, GetBBIDTest) {
+  ModulePassManager MPM;
+  MPM.addPass(PGOInstrumentationGen(PGOInstrumentationType::CTXPROF));
+  EXPECT_FALSE(MPM.run(*M, MAM).areAllPreserved());
+  auto *F = M->getFunction("foo");
+  ASSERT_NE(F, nullptr);
+  std::map<std::string, int> BBNameAndID;
+
+  for (auto &BB : *F) {
+auto *Ins = CtxProfAnalysis::getBBInstrumentation(BB);
+if (Ins)
+  BBNameAndID[BB.getName().str()] =
+  static_cast<int>(Ins->getIndex()->getZExtValue());
+else
+  BBNameAndID[BB.getName().str()] = -1;
+  }
+
+  EXPECT_THAT(BBNameAndID,
+  testing::UnorderedElementsAre(
+  testing::Pair("", 0), testing::Pair("yes", 1),
+  testing::Pair("no", -1), testing::Pair("exit", -1)));
+}
 } // namespace

>From 61e37e3e1657a7e85e9df2f77feb6957c304851a Mon Sep 17 00:00:00 2001
From: Mircea Trofin 
Date: Tue, 20 Aug 2024 21:32:23 -0700
Subject: [PATCH 2/2] [ctx_prof] Add support for ICP

---
 llvm/include/llvm/Analysis/CtxProfAnalysis.h  |  18 +-
 llvm/include/llvm/IR/IntrinsicInst.h  |   2 +
 .../llvm/ProfileData/PGOCtxProfReader.h   |  20 ++
 .../Transforms/Utils/CallPromotionUtils.h |   4 +
 llvm/lib/Analysis/CtxProfAnalysis.cpp |  79 +---
 llvm/lib/IR/IntrinsicInst.cpp |  10 +
 .../Transforms/Utils/CallPromotionUtils.cpp   |  86 +
 .../Utils/CallPromotionUtilsTest.cpp  | 178 ++
 8 files changed, 364 insertions(+), 33 deletions(-)

diff --git a/llvm/include/llvm/Analysis/CtxProfAnalysis.h b/llvm/include/llvm/Analysis/CtxProfAnalysis.h
index 0b4dd8ae3a0dc7..d6c2bb26a091af 100644
--- a/llvm/include/llvm/Analysis/CtxProfAnalysis.h
+++ b/llvm/include/llvm/Analysis/CtxProfAnalysis.h
@@ -73,6 +73,12 @@ class PGOContextualProfile {
 return FuncInfo.find(getDefinedFunctionGUID(F))->second.NextCallsiteIndex++;
   }
 
+  using ConstVisitor = function_ref<void(const PGOCtxProfContext &)>;
+  using Visitor = function_ref<void(PGOCtxProfContext &)>;
+
+  void update(Visitor, const Function *F = nullptr);
+  void visit(ConstVisitor, const Function *F = nullptr) const;
+
   const CtxProfFlatProfile flatten() const;
 
   bool invalidate(Module &, const PreservedAnalyses &PA,
@@ -105,13 +111,18 @@ class CtxProfAnalysis : public AnalysisInfoMixin<CtxProfAnalysis> {
 
 class CtxProfAnalysisPrinterPass
 : public PassInfoMixin<CtxProfAnalysisPrinterPass> {
-  raw_ostream &OS;
-
 public:
-  explicit CtxProfAnalysisPrinterPass(raw_ostream &OS) : OS(OS) {}
+  enum class PrintMode { Everything, JSON };
+  explicit CtxProfAnalysisPrinterPass(raw_ostream &OS,
+  PrintMode Mode = PrintMode::Everything)
+  : OS(OS), Mode(Mode) {}
 
   PreservedAnalyses run(Module &M, ModuleAnalysisManager &MAM);
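
Note on `getBBInstrumentation` from patch 1/2 in this message: it walks a basic block in order and returns the first instruction that is a counter increment, or nullptr when the block carries none. A minimal stand-alone sketch of that first-match scan follows. The `Instr`, `IncrementInstr`, and `Block` types and the `findBlockInstrumentation` name are hypothetical, and `dynamic_cast` stands in for LLVM's `dyn_cast`.

```cpp
#include <cassert>
#include <memory>
#include <vector>

// Hypothetical instruction hierarchy (not LLVM types).
struct Instr {
  virtual ~Instr() = default;
};

// Plays the role of a counter-increment instruction carrying its counter index.
struct IncrementInstr : Instr {
  explicit IncrementInstr(unsigned Index) : Index(Index) {}
  unsigned Index;
};

using Block = std::vector<std::unique_ptr<Instr>>;

// Return the first increment instruction in BB, or nullptr if there is none.
const IncrementInstr *findBlockInstrumentation(const Block &BB) {
  for (const auto &I : BB)
    if (const auto *Incr = dynamic_cast<const IncrementInstr *>(I.get()))
      return Incr;
  return nullptr;
}
```

Returning a pointer rather than an index lets the caller distinguish "not instrumented" (nullptr) from counter 0, which is how the unit test in the patch maps uninstrumented blocks to -1.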
  

[llvm-branch-commits] [mlir] [mlir][sparse] refactoring sparse_tensor.iterate lowering pattern implementation. (PR #105566)

2024-08-21 Thread Peiming Liu via llvm-branch-commits

https://github.com/PeimingLiu created https://github.com/llvm/llvm-project/pull/105566

[mlir][sparse] refactoring sparse_tensor.iterate lowering pattern implementation.

>From 1a32495b27dfd003408dd5b4f12f3db7f0b73b5a Mon Sep 17 00:00:00 2001
From: Peiming Liu 
Date: Thu, 15 Aug 2024 18:10:25 +
Subject: [PATCH] [mlir][sparse] refactoring sparse_tensor.iterate lowering
 pattern implementation.

---
 .../Transforms/SparseIterationToScf.cpp   | 118 ++
 1 file changed, 36 insertions(+), 82 deletions(-)

diff --git a/mlir/lib/Dialect/SparseTensor/Transforms/SparseIterationToScf.cpp b/mlir/lib/Dialect/SparseTensor/Transforms/SparseIterationToScf.cpp
index d6c0da4a9e4573..f7fcabb0220b50 100644
--- a/mlir/lib/Dialect/SparseTensor/Transforms/SparseIterationToScf.cpp
+++ b/mlir/lib/Dialect/SparseTensor/Transforms/SparseIterationToScf.cpp
@@ -244,88 +244,41 @@ class SparseIterateOpConverter : public OneToNOpConversionPattern<IterateOp> {
 std::unique_ptr<SparseIterator> it =
 iterSpace.extractIterator(rewriter, loc);
 
-if (it->iteratableByFor()) {
-  auto [lo, hi] = it->genForCond(rewriter, loc);
-  Value step = constantIndex(rewriter, loc, 1);
-  SmallVector<Value> ivs;
-  for (ValueRange inits : adaptor.getInitArgs())
-llvm::append_range(ivs, inits);
-  scf::ForOp forOp = rewriter.create<scf::ForOp>(loc, lo, hi, step, ivs);
-
-  Block *loopBody = op.getBody();
-  OneToNTypeMapping bodyTypeMapping(loopBody->getArgumentTypes());
-  if (failed(typeConverter->convertSignatureArgs(
-  loopBody->getArgumentTypes(), bodyTypeMapping)))
-return failure();
-  rewriter.applySignatureConversion(loopBody, bodyTypeMapping);
-
-  rewriter.eraseBlock(forOp.getBody());
-  Region &dstRegion = forOp.getRegion();
-  rewriter.inlineRegionBefore(op.getRegion(), dstRegion, dstRegion.end());
-
-  auto yieldOp =
-  llvm::cast<sparse_tensor::YieldOp>(forOp.getBody()->getTerminator());
-
-  rewriter.setInsertionPointToEnd(forOp.getBody());
-  // replace sparse_tensor.yield with scf.yield.
-  rewriter.create<scf::YieldOp>(loc, yieldOp.getResults());
-  rewriter.eraseOp(yieldOp);
-
-  const OneToNTypeMapping &resultMapping = adaptor.getResultMapping();
-  rewriter.replaceOp(op, forOp.getResults(), resultMapping);
-} else {
-  SmallVector<Value> ivs;
-  // TODO: put iterator at the end of argument list to be consistent with
-  // coiterate operation.
-  llvm::append_range(ivs, it->getCursor());
-  for (ValueRange inits : adaptor.getInitArgs())
-llvm::append_range(ivs, inits);
-
-  assert(llvm::all_of(ivs, [](Value v) { return v != nullptr; }));
-
-  TypeRange types = ValueRange(ivs).getTypes();
-  auto whileOp = rewriter.create<scf::WhileOp>(loc, types, ivs);
-  SmallVector<Location> l(types.size(), op.getIterator().getLoc());
-
-  // Generates loop conditions.
-  Block *before = rewriter.createBlock(&whileOp.getBefore(), {}, types, l);
-  rewriter.setInsertionPointToStart(before);
-  ValueRange bArgs = before->getArguments();
-  auto [whileCond, remArgs] = it->genWhileCond(rewriter, loc, bArgs);
-  assert(remArgs.size() == adaptor.getInitArgs().size());
-  rewriter.create<scf::ConditionOp>(loc, whileCond, before->getArguments());
-
-  // Generates loop body.
-  Block *loopBody = op.getBody();
-  OneToNTypeMapping bodyTypeMapping(loopBody->getArgumentTypes());
-  if (failed(typeConverter->convertSignatureArgs(
-  loopBody->getArgumentTypes(), bodyTypeMapping)))
-return failure();
-  rewriter.applySignatureConversion(loopBody, bodyTypeMapping);
-
-  Region &dstRegion = whileOp.getAfter();
-  // TODO: handle uses of coordinate!
-  rewriter.inlineRegionBefore(op.getRegion(), dstRegion, dstRegion.end());
-  ValueRange aArgs = whileOp.getAfterArguments();
-  auto yieldOp = llvm::cast<sparse_tensor::YieldOp>(
-  whileOp.getAfterBody()->getTerminator());
-
-  rewriter.setInsertionPointToEnd(whileOp.getAfterBody());
+SmallVector<Value> ivs;
+for (ValueRange inits : adaptor.getInitArgs())
+  llvm::append_range(ivs, inits);
+
+// Type conversion on iterate op block.
+OneToNTypeMapping blockTypeMapping(op.getBody()->getArgumentTypes());
+if (failed(typeConverter->convertSignatureArgs(
+op.getBody()->getArgumentTypes(), blockTypeMapping)))
+  return rewriter.notifyMatchFailure(
+  op, "failed to convert iterate region argument types");
+rewriter.applySignatureConversion(op.getBody(), blockTypeMapping);
+
+Block *block = op.getBody();
+ValueRange ret = genLoopWithIterator(
+rewriter, loc, it.get(), ivs, /*iterFirst=*/true,
+[block](PatternRewriter &rewriter, Location loc, Region &loopBody,
+SparseIterator *it, ValueRange reduc) -> SmallVector<Value> {
+  SmallVector<Value> blockArgs(it->getCursor());
+  // TODO: Also appends coordinates if used.
+  // blockArgs.push_back(it->deref(rewriter, loc));
+  llvm::a

[llvm-branch-commits] [mlir] [mlir][sparse] unify block arguments order between iterate/coiterate operations. (PR #105567)

2024-08-21 Thread Peiming Liu via llvm-branch-commits

https://github.com/PeimingLiu created https://github.com/llvm/llvm-project/pull/105567

[mlir][sparse] unify block arguments order between iterate/coiterate operations.

>From 6fd099fb7039f8fda37d50f1c44cd7afd62cafb7 Mon Sep 17 00:00:00 2001
From: Peiming Liu 
Date: Thu, 15 Aug 2024 21:10:37 +
Subject: [PATCH] [mlir][sparse] unify block arguments order between
 iterate/coiterate operations.

---
 .../SparseTensor/IR/SparseTensorOps.td|  7 ++--
 .../SparseTensor/IR/SparseTensorDialect.cpp   | 31 
 .../Transforms/SparseIterationToScf.cpp   | 36 ++-
 3 files changed, 31 insertions(+), 43 deletions(-)

diff --git a/mlir/include/mlir/Dialect/SparseTensor/IR/SparseTensorOps.td b/mlir/include/mlir/Dialect/SparseTensor/IR/SparseTensorOps.td
index 20512f972e67cd..96a61419a541f7 100644
--- a/mlir/include/mlir/Dialect/SparseTensor/IR/SparseTensorOps.td
+++ b/mlir/include/mlir/Dialect/SparseTensor/IR/SparseTensorOps.td
@@ -1644,7 +1644,7 @@ def IterateOp : SparseTensor_Op<"iterate",
   return getIterSpace().getType().getSpaceDim();
 }
 BlockArgument getIterator() {
-  return getRegion().getArguments().front();
+  return getRegion().getArguments().back();
 }
 std::optional<BlockArgument> getLvlCrd(Level lvl) {
   if (getCrdUsedLvls()[lvl]) {
@@ -1654,9 +1654,8 @@ def IterateOp : SparseTensor_Op<"iterate",
   return std::nullopt;
 }
 Block::BlockArgListType getCrds() {
-  // The first block argument is iterator, the remaining arguments are
-  // referenced coordinates.
-  return getRegion().getArguments().slice(1, getCrdUsedLvls().count());
+  // User-provided iteration arguments -> coords -> iterator.
+  return getRegion().getArguments().slice(getNumRegionIterArgs(), getCrdUsedLvls().count());
 }
 unsigned getNumRegionIterArgs() {
   return getRegion().getArguments().size() - 1 - getCrdUsedLvls().count();
diff --git a/mlir/lib/Dialect/SparseTensor/IR/SparseTensorDialect.cpp b/mlir/lib/Dialect/SparseTensor/IR/SparseTensorDialect.cpp
index 16856b958d4f13..b21bc1a93036c4 100644
--- a/mlir/lib/Dialect/SparseTensor/IR/SparseTensorDialect.cpp
+++ b/mlir/lib/Dialect/SparseTensor/IR/SparseTensorDialect.cpp
@@ -2228,9 +2228,10 @@ parseSparseIterateLoop(OpAsmParser &parser, OperationState &state,
 parser.getNameLoc(),
 "mismatch in number of sparse iterators and sparse spaces");
 
-  if (failed(parseUsedCoordList(parser, state, blockArgs)))
+  SmallVector<OpAsmParser::Argument> coords;
+  if (failed(parseUsedCoordList(parser, state, coords)))
 return failure();
-  size_t numCrds = blockArgs.size();
+  size_t numCrds = coords.size();
 
   // Parse "iter_args(%arg = %init, ...)"
   bool hasIterArgs = succeeded(parser.parseOptionalKeyword("iter_args"));
@@ -2238,6 +2239,8 @@ parseSparseIterateLoop(OpAsmParser &parser, OperationState &state,
 if (parser.parseAssignmentList(blockArgs, initArgs))
   return failure();
 
+  blockArgs.append(coords);
+
   SmallVector<Type> iterSpaceTps;
   // parse ": sparse_tensor.iter_space -> ret"
   if (parser.parseColon() || parser.parseTypeList(iterSpaceTps))
@@ -2267,7 +2270,7 @@ parseSparseIterateLoop(OpAsmParser &parser, OperationState &state,
 
   if (hasIterArgs) {
 // Strip off leading args that used for coordinates.
-MutableArrayRef args = MutableArrayRef(blockArgs).drop_front(numCrds);
+MutableArrayRef args = MutableArrayRef(blockArgs).drop_back(numCrds);
 if (args.size() != initArgs.size() || args.size() != state.types.size()) {
   return parser.emitError(
   parser.getNameLoc(),
@@ -2448,18 +2451,18 @@ void IterateOp::build(OpBuilder &builder, OperationState &odsState,
   odsState.addTypes(initArgs.getTypes());
   Block *bodyBlock = builder.createBlock(bodyRegion);
 
-  // First argument, sparse iterator
-  bodyBlock->addArgument(
-  llvm::cast<IterSpaceType>(iterSpace.getType()).getIteratorType(),
-  odsState.location);
+  // Starts with a list of user-provided loop arguments.
+  for (Value v : initArgs)
+bodyBlock->addArgument(v.getType(), v.getLoc());
 
-  // Followed by a list of used coordinates.
+  // Follows by a list of used coordinates.
   for (unsigned i = 0, e = crdUsedLvls.count(); i < e; i++)
 bodyBlock->addArgument(builder.getIndexType(), odsState.location);
 
-  // Followed by a list of user-provided loop arguments.
-  for (Value v : initArgs)
-bodyBlock->addArgument(v.getType(), v.getLoc());
+  // Ends with sparse iterator
+  bodyBlock->addArgument(
+  llvm::cast<IterSpaceType>(iterSpace.getType()).getIteratorType(),
+  odsState.location);
 }
 
 ParseResult IterateOp::parse(OpAsmParser &parser, OperationState &result) {
@@ -2473,9 +2476,9 @@ ParseResult IterateOp::parse(OpAsmParser &parser, OperationState &result) {
 return parser.emitError(parser.getNameLoc(),
 "expected only one iterator/iteration space");
 
-  iters.append(iterArgs);
+  iterArgs.append(iters);
   Region *body = result.addRegion();
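
Note on the block-argument order introduced by this patch: the region arguments are laid out as user-provided iteration arguments, then used coordinates, then the sparse iterator last. The index arithmetic in `getIterator`, `getCrds`, and `getNumRegionIterArgs` can be sketched stand-alone as below. `IterateArgLayout` and its member names are hypothetical and only mirror the arithmetic shown in the .td hunk; they are not part of MLIR.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper modelling the layout:
//   [ iter args ... | used coordinates ... | iterator ]
struct IterateArgLayout {
  std::size_t NumBlockArgs; // total number of region block arguments
  std::size_t NumUsedCrds;  // number of coordinate levels marked as used

  // Everything that is neither a coordinate nor the single iterator.
  std::size_t numRegionIterArgs() const {
    return NumBlockArgs - 1 - NumUsedCrds;
  }
  // Coordinates start right after the iteration arguments.
  std::size_t crdsBegin() const { return numRegionIterArgs(); }
  // One past the last coordinate; this is also the iterator's slot.
  std::size_t crdsEnd() const { return crdsBegin() + NumUsedCrds; }
  // The iterator is always the last block argument.
  std::size_t iteratorIndex() const { return NumBlockArgs - 1; }
};
```

With six block arguments and two used coordinates, the iteration arguments occupy indices 0 to 2, the coordinates 3 and 4, and the iterator index 5. Before the patch the iterator sat at index 0 and coordinates started at index 1.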

[llvm-branch-commits] [mlir] [mlir][sparse] refactoring sparse_tensor.iterate lowering pattern implementation. (PR #105566)

2024-08-21 Thread Peiming Liu via llvm-branch-commits

https://github.com/PeimingLiu updated https://github.com/llvm/llvm-project/pull/105566

>From 937bcd814688e7c6f88ef27b7586254006e0d050 Mon Sep 17 00:00:00 2001
From: Peiming Liu 
Date: Thu, 15 Aug 2024 18:10:25 +
Subject: [PATCH] [mlir][sparse] refactoring sparse_tensor.iterate lowering
 pattern implementation.

stack-info: PR: https://github.com/llvm/llvm-project/pull/105566, branch: users/PeimingLiu/stack/2
---
 .../Transforms/SparseIterationToScf.cpp   | 118 ++
 1 file changed, 36 insertions(+), 82 deletions(-)

diff --git a/mlir/lib/Dialect/SparseTensor/Transforms/SparseIterationToScf.cpp b/mlir/lib/Dialect/SparseTensor/Transforms/SparseIterationToScf.cpp
index d6c0da4a9e4573..f7fcabb0220b50 100644
--- a/mlir/lib/Dialect/SparseTensor/Transforms/SparseIterationToScf.cpp
+++ b/mlir/lib/Dialect/SparseTensor/Transforms/SparseIterationToScf.cpp
@@ -244,88 +244,41 @@ class SparseIterateOpConverter : public OneToNOpConversionPattern<IterateOp> {
 std::unique_ptr<SparseIterator> it =
 iterSpace.extractIterator(rewriter, loc);
 
-if (it->iteratableByFor()) {
-  auto [lo, hi] = it->genForCond(rewriter, loc);
-  Value step = constantIndex(rewriter, loc, 1);
-  SmallVector<Value> ivs;
-  for (ValueRange inits : adaptor.getInitArgs())
-llvm::append_range(ivs, inits);
-  scf::ForOp forOp = rewriter.create<scf::ForOp>(loc, lo, hi, step, ivs);
-
-  Block *loopBody = op.getBody();
-  OneToNTypeMapping bodyTypeMapping(loopBody->getArgumentTypes());
-  if (failed(typeConverter->convertSignatureArgs(
-  loopBody->getArgumentTypes(), bodyTypeMapping)))
-return failure();
-  rewriter.applySignatureConversion(loopBody, bodyTypeMapping);
-
-  rewriter.eraseBlock(forOp.getBody());
-  Region &dstRegion = forOp.getRegion();
-  rewriter.inlineRegionBefore(op.getRegion(), dstRegion, dstRegion.end());
-
-  auto yieldOp =
-  llvm::cast<sparse_tensor::YieldOp>(forOp.getBody()->getTerminator());
-
-  rewriter.setInsertionPointToEnd(forOp.getBody());
-  // replace sparse_tensor.yield with scf.yield.
-  rewriter.create<scf::YieldOp>(loc, yieldOp.getResults());
-  rewriter.eraseOp(yieldOp);
-
-  const OneToNTypeMapping &resultMapping = adaptor.getResultMapping();
-  rewriter.replaceOp(op, forOp.getResults(), resultMapping);
-} else {
-  SmallVector<Value> ivs;
-  // TODO: put iterator at the end of argument list to be consistent with
-  // coiterate operation.
-  llvm::append_range(ivs, it->getCursor());
-  for (ValueRange inits : adaptor.getInitArgs())
-llvm::append_range(ivs, inits);
-
-  assert(llvm::all_of(ivs, [](Value v) { return v != nullptr; }));
-
-  TypeRange types = ValueRange(ivs).getTypes();
-  auto whileOp = rewriter.create<scf::WhileOp>(loc, types, ivs);
-  SmallVector<Location> l(types.size(), op.getIterator().getLoc());
-
-  // Generates loop conditions.
-  Block *before = rewriter.createBlock(&whileOp.getBefore(), {}, types, l);
-  rewriter.setInsertionPointToStart(before);
-  ValueRange bArgs = before->getArguments();
-  auto [whileCond, remArgs] = it->genWhileCond(rewriter, loc, bArgs);
-  assert(remArgs.size() == adaptor.getInitArgs().size());
-  rewriter.create<scf::ConditionOp>(loc, whileCond, before->getArguments());
-
-  // Generates loop body.
-  Block *loopBody = op.getBody();
-  OneToNTypeMapping bodyTypeMapping(loopBody->getArgumentTypes());
-  if (failed(typeConverter->convertSignatureArgs(
-  loopBody->getArgumentTypes(), bodyTypeMapping)))
-return failure();
-  rewriter.applySignatureConversion(loopBody, bodyTypeMapping);
-
-  Region &dstRegion = whileOp.getAfter();
-  // TODO: handle uses of coordinate!
-  rewriter.inlineRegionBefore(op.getRegion(), dstRegion, dstRegion.end());
-  ValueRange aArgs = whileOp.getAfterArguments();
-  auto yieldOp = llvm::cast<sparse_tensor::YieldOp>(
-  whileOp.getAfterBody()->getTerminator());
-
-  rewriter.setInsertionPointToEnd(whileOp.getAfterBody());
+SmallVector<Value> ivs;
+for (ValueRange inits : adaptor.getInitArgs())
+  llvm::append_range(ivs, inits);
+
+// Type conversion on iterate op block.
+OneToNTypeMapping blockTypeMapping(op.getBody()->getArgumentTypes());
+if (failed(typeConverter->convertSignatureArgs(
+op.getBody()->getArgumentTypes(), blockTypeMapping)))
+  return rewriter.notifyMatchFailure(
+  op, "failed to convert iterate region argument types");
+rewriter.applySignatureConversion(op.getBody(), blockTypeMapping);
+
+Block *block = op.getBody();
+ValueRange ret = genLoopWithIterator(
+rewriter, loc, it.get(), ivs, /*iterFirst=*/true,
+[block](PatternRewriter &rewriter, Location loc, Region &loopBody,
+SparseIterator *it, ValueRange reduc) -> SmallVector<Value> {
+  SmallVector<Value> blockArgs(it->getCursor());
+  // TODO: Also appends coordinates if used.
+  // blockArgs.push_back(it->deref(rewriter, loc));
+ 

[llvm-branch-commits] [mlir] [mlir][sparse] unify block arguments order between iterate/coiterate operations. (PR #105567)

2024-08-21 Thread Peiming Liu via llvm-branch-commits

https://github.com/PeimingLiu updated https://github.com/llvm/llvm-project/pull/105567

>From 3f83d7a1eadc1101fb96707ecd348925e5aaed70 Mon Sep 17 00:00:00 2001
From: Peiming Liu 
Date: Thu, 15 Aug 2024 21:10:37 +
Subject: [PATCH] [mlir][sparse] unify block arguments order between
 iterate/coiterate operations.

stack-info: PR: https://github.com/llvm/llvm-project/pull/105567, branch: users/PeimingLiu/stack/3
---
 .../SparseTensor/IR/SparseTensorOps.td|  7 ++--
 .../SparseTensor/IR/SparseTensorDialect.cpp   | 31 
 .../Transforms/SparseIterationToScf.cpp   | 36 ++-
 3 files changed, 31 insertions(+), 43 deletions(-)

diff --git a/mlir/include/mlir/Dialect/SparseTensor/IR/SparseTensorOps.td b/mlir/include/mlir/Dialect/SparseTensor/IR/SparseTensorOps.td
index 20512f972e67cd..96a61419a541f7 100644
--- a/mlir/include/mlir/Dialect/SparseTensor/IR/SparseTensorOps.td
+++ b/mlir/include/mlir/Dialect/SparseTensor/IR/SparseTensorOps.td
@@ -1644,7 +1644,7 @@ def IterateOp : SparseTensor_Op<"iterate",
   return getIterSpace().getType().getSpaceDim();
 }
 BlockArgument getIterator() {
-  return getRegion().getArguments().front();
+  return getRegion().getArguments().back();
 }
 std::optional<BlockArgument> getLvlCrd(Level lvl) {
   if (getCrdUsedLvls()[lvl]) {
@@ -1654,9 +1654,8 @@ def IterateOp : SparseTensor_Op<"iterate",
   return std::nullopt;
 }
 Block::BlockArgListType getCrds() {
-  // The first block argument is iterator, the remaining arguments are
-  // referenced coordinates.
-  return getRegion().getArguments().slice(1, getCrdUsedLvls().count());
+  // User-provided iteration arguments -> coords -> iterator.
+  return getRegion().getArguments().slice(getNumRegionIterArgs(), getCrdUsedLvls().count());
 }
 unsigned getNumRegionIterArgs() {
   return getRegion().getArguments().size() - 1 - getCrdUsedLvls().count();
diff --git a/mlir/lib/Dialect/SparseTensor/IR/SparseTensorDialect.cpp b/mlir/lib/Dialect/SparseTensor/IR/SparseTensorDialect.cpp
index 16856b958d4f13..b21bc1a93036c4 100644
--- a/mlir/lib/Dialect/SparseTensor/IR/SparseTensorDialect.cpp
+++ b/mlir/lib/Dialect/SparseTensor/IR/SparseTensorDialect.cpp
@@ -2228,9 +2228,10 @@ parseSparseIterateLoop(OpAsmParser &parser, OperationState &state,
 parser.getNameLoc(),
 "mismatch in number of sparse iterators and sparse spaces");
 
-  if (failed(parseUsedCoordList(parser, state, blockArgs)))
+  SmallVector<OpAsmParser::Argument> coords;
+  if (failed(parseUsedCoordList(parser, state, coords)))
 return failure();
-  size_t numCrds = blockArgs.size();
+  size_t numCrds = coords.size();
 
   // Parse "iter_args(%arg = %init, ...)"
   bool hasIterArgs = succeeded(parser.parseOptionalKeyword("iter_args"));
@@ -2238,6 +2239,8 @@ parseSparseIterateLoop(OpAsmParser &parser, OperationState &state,
 if (parser.parseAssignmentList(blockArgs, initArgs))
   return failure();
 
+  blockArgs.append(coords);
+
   SmallVector<Type> iterSpaceTps;
   // parse ": sparse_tensor.iter_space -> ret"
   if (parser.parseColon() || parser.parseTypeList(iterSpaceTps))
@@ -2267,7 +2270,7 @@ parseSparseIterateLoop(OpAsmParser &parser, OperationState &state,
 
   if (hasIterArgs) {
 // Strip off leading args that used for coordinates.
-MutableArrayRef args = MutableArrayRef(blockArgs).drop_front(numCrds);
+MutableArrayRef args = MutableArrayRef(blockArgs).drop_back(numCrds);
 if (args.size() != initArgs.size() || args.size() != state.types.size()) {
   return parser.emitError(
   parser.getNameLoc(),
@@ -2448,18 +2451,18 @@ void IterateOp::build(OpBuilder &builder, OperationState &odsState,
   odsState.addTypes(initArgs.getTypes());
   Block *bodyBlock = builder.createBlock(bodyRegion);
 
-  // First argument, sparse iterator
-  bodyBlock->addArgument(
-  llvm::cast<IterSpaceType>(iterSpace.getType()).getIteratorType(),
-  odsState.location);
+  // Starts with a list of user-provided loop arguments.
+  for (Value v : initArgs)
+bodyBlock->addArgument(v.getType(), v.getLoc());
 
-  // Followed by a list of used coordinates.
+  // Follows by a list of used coordinates.
   for (unsigned i = 0, e = crdUsedLvls.count(); i < e; i++)
 bodyBlock->addArgument(builder.getIndexType(), odsState.location);
 
-  // Followed by a list of user-provided loop arguments.
-  for (Value v : initArgs)
-bodyBlock->addArgument(v.getType(), v.getLoc());
+  // Ends with sparse iterator
+  bodyBlock->addArgument(
+  llvm::cast<IterSpaceType>(iterSpace.getType()).getIteratorType(),
+  odsState.location);
 }
 
 ParseResult IterateOp::parse(OpAsmParser &parser, OperationState &result) {
@@ -2473,9 +2476,9 @@ ParseResult IterateOp::parse(OpAsmParser &parser, OperationState &result) {
 return parser.emitError(parser.getNameLoc(),
 "expected only one iterator/iteration space");
 
-  iters.append(iterArgs);
+  iterArgs.append(iters);
   Region *body = r

[llvm-branch-commits] [mlir] [mlir][sparse] refactoring sparse_tensor.iterate lowering pattern implementation. (PR #105566)

2024-08-21 Thread Peiming Liu via llvm-branch-commits

https://github.com/PeimingLiu updated 
https://github.com/llvm/llvm-project/pull/105566

>From 937bcd814688e7c6f88ef27b7586254006e0d050 Mon Sep 17 00:00:00 2001
From: Peiming Liu 
Date: Thu, 15 Aug 2024 18:10:25 +
Subject: [PATCH] [mlir][sparse] refactoring sparse_tensor.iterate lowering
 pattern implementation.

stack-info: PR: https://github.com/llvm/llvm-project/pull/105566, branch: 
users/PeimingLiu/stack/2
---
 .../Transforms/SparseIterationToScf.cpp   | 118 ++
 1 file changed, 36 insertions(+), 82 deletions(-)

diff --git a/mlir/lib/Dialect/SparseTensor/Transforms/SparseIterationToScf.cpp 
b/mlir/lib/Dialect/SparseTensor/Transforms/SparseIterationToScf.cpp
index d6c0da4a9e4573..f7fcabb0220b50 100644
--- a/mlir/lib/Dialect/SparseTensor/Transforms/SparseIterationToScf.cpp
+++ b/mlir/lib/Dialect/SparseTensor/Transforms/SparseIterationToScf.cpp
@@ -244,88 +244,41 @@ class SparseIterateOpConverter : public 
OneToNOpConversionPattern {
 std::unique_ptr it =
 iterSpace.extractIterator(rewriter, loc);
 
-if (it->iteratableByFor()) {
-  auto [lo, hi] = it->genForCond(rewriter, loc);
-  Value step = constantIndex(rewriter, loc, 1);
-  SmallVector ivs;
-  for (ValueRange inits : adaptor.getInitArgs())
-llvm::append_range(ivs, inits);
-  scf::ForOp forOp = rewriter.create(loc, lo, hi, step, ivs);
-
-  Block *loopBody = op.getBody();
-  OneToNTypeMapping bodyTypeMapping(loopBody->getArgumentTypes());
-  if (failed(typeConverter->convertSignatureArgs(
-  loopBody->getArgumentTypes(), bodyTypeMapping)))
-return failure();
-  rewriter.applySignatureConversion(loopBody, bodyTypeMapping);
-
-  rewriter.eraseBlock(forOp.getBody());
-  Region &dstRegion = forOp.getRegion();
-  rewriter.inlineRegionBefore(op.getRegion(), dstRegion, dstRegion.end());
-
-  auto yieldOp =
-  llvm::cast(forOp.getBody()->getTerminator());
-
-  rewriter.setInsertionPointToEnd(forOp.getBody());
-  // replace sparse_tensor.yield with scf.yield.
-  rewriter.create(loc, yieldOp.getResults());
-  rewriter.eraseOp(yieldOp);
-
-  const OneToNTypeMapping &resultMapping = adaptor.getResultMapping();
-  rewriter.replaceOp(op, forOp.getResults(), resultMapping);
-} else {
-  SmallVector ivs;
-  // TODO: put iterator at the end of argument list to be consistent with
-  // coiterate operation.
-  llvm::append_range(ivs, it->getCursor());
-  for (ValueRange inits : adaptor.getInitArgs())
-llvm::append_range(ivs, inits);
-
-  assert(llvm::all_of(ivs, [](Value v) { return v != nullptr; }));
-
-  TypeRange types = ValueRange(ivs).getTypes();
-  auto whileOp = rewriter.create(loc, types, ivs);
-  SmallVector l(types.size(), op.getIterator().getLoc());
-
-  // Generates loop conditions.
-  Block *before = rewriter.createBlock(&whileOp.getBefore(), {}, types, l);
-  rewriter.setInsertionPointToStart(before);
-  ValueRange bArgs = before->getArguments();
-  auto [whileCond, remArgs] = it->genWhileCond(rewriter, loc, bArgs);
-  assert(remArgs.size() == adaptor.getInitArgs().size());
-  rewriter.create(loc, whileCond, 
before->getArguments());
-
-  // Generates loop body.
-  Block *loopBody = op.getBody();
-  OneToNTypeMapping bodyTypeMapping(loopBody->getArgumentTypes());
-  if (failed(typeConverter->convertSignatureArgs(
-  loopBody->getArgumentTypes(), bodyTypeMapping)))
-return failure();
-  rewriter.applySignatureConversion(loopBody, bodyTypeMapping);
-
-  Region &dstRegion = whileOp.getAfter();
-  // TODO: handle uses of coordinate!
-  rewriter.inlineRegionBefore(op.getRegion(), dstRegion, dstRegion.end());
-  ValueRange aArgs = whileOp.getAfterArguments();
-  auto yieldOp = llvm::cast(
-  whileOp.getAfterBody()->getTerminator());
-
-  rewriter.setInsertionPointToEnd(whileOp.getAfterBody());
+SmallVector ivs;
+for (ValueRange inits : adaptor.getInitArgs())
+  llvm::append_range(ivs, inits);
+
+// Type conversion on iterate op block.
+OneToNTypeMapping blockTypeMapping(op.getBody()->getArgumentTypes());
+if (failed(typeConverter->convertSignatureArgs(
+op.getBody()->getArgumentTypes(), blockTypeMapping)))
+  return rewriter.notifyMatchFailure(
+  op, "failed to convert iterate region argurment types");
+rewriter.applySignatureConversion(op.getBody(), blockTypeMapping);
+
+Block *block = op.getBody();
+ValueRange ret = genLoopWithIterator(
+rewriter, loc, it.get(), ivs, /*iterFirst=*/true,
+[block](PatternRewriter &rewriter, Location loc, Region &loopBody,
+SparseIterator *it, ValueRange reduc) -> SmallVector {
+  SmallVector blockArgs(it->getCursor());
+  // TODO: Also appends coordinates if used.
+  // blockArgs.push_back(it->deref(rewriter, loc));
+ 

[llvm-branch-commits] [mlir] [mlir][sparse] unify block arguments order between iterate/coiterate operations. (PR #105567)

2024-08-21 Thread via llvm-branch-commits

llvmbot wrote:




@llvm/pr-subscribers-mlir-sparse

Author: Peiming Liu (PeimingLiu)


Changes

Stacked PRs:
 * __->__#105567
 * #105566
 * #105565


--- --- ---

### [mlir][sparse] unify block arguments order between iterate/coiterate 
operations.
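
As a rough illustration of the new ordering described in the patch below ("User-provided iteration arguments -> coords -> iterator"), the index arithmetic can be sketched in plain C++. The `IterateArgLayout` struct and `computeLayout` helper are hypothetical names made up for this sketch and are not part of MLIR. Only the arithmetic is taken from the diff: `getNumRegionIterArgs()` is `size - 1 - crdUsedLvls.count()`, the coordinates are sliced starting at that offset, and the iterator is the last block argument.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper type (not an MLIR API): half-open index ranges into the
// iterate region's block-argument list under the unified ordering
//   [ user iter_args ... | used coordinates ... | iterator ].
struct IterateArgLayout {
  std::size_t numIterArgs; // leading user-provided loop arguments
  std::size_t crdBegin;    // first used-coordinate argument
  std::size_t crdEnd;      // one past the last used-coordinate argument
  std::size_t iteratorIdx; // the sparse iterator, now the last argument
};

// Mirrors the arithmetic in the patch: the iter_args count is whatever is
// left after removing the single iterator and the used coordinates.
inline IterateArgLayout computeLayout(std::size_t numBlockArgs,
                                      std::size_t numUsedCrds) {
  assert(numBlockArgs >= numUsedCrds + 1 &&
         "need room for the coordinates and the iterator");
  std::size_t numIterArgs = numBlockArgs - 1 - numUsedCrds;
  return {numIterArgs, numIterArgs, numIterArgs + numUsedCrds,
          numBlockArgs - 1};
}
```

For a region with five block arguments and two used coordinates, this places the two iter_args at indices 0-1, the coordinates at 2-3, and the iterator at index 4.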



---
Full diff: https://github.com/llvm/llvm-project/pull/105567.diff


3 Files Affected:

- (modified) mlir/include/mlir/Dialect/SparseTensor/IR/SparseTensorOps.td 
(+3-4) 
- (modified) mlir/lib/Dialect/SparseTensor/IR/SparseTensorDialect.cpp (+17-14) 
- (modified) mlir/lib/Dialect/SparseTensor/Transforms/SparseIterationToScf.cpp 
(+11-25) 


``diff
diff --git a/mlir/include/mlir/Dialect/SparseTensor/IR/SparseTensorOps.td 
b/mlir/include/mlir/Dialect/SparseTensor/IR/SparseTensorOps.td
index 20512f972e67cd..96a61419a541f7 100644
--- a/mlir/include/mlir/Dialect/SparseTensor/IR/SparseTensorOps.td
+++ b/mlir/include/mlir/Dialect/SparseTensor/IR/SparseTensorOps.td
@@ -1644,7 +1644,7 @@ def IterateOp : SparseTensor_Op<"iterate",
   return getIterSpace().getType().getSpaceDim();
 }
 BlockArgument getIterator() {
-  return getRegion().getArguments().front();
+  return getRegion().getArguments().back();
 }
 std::optional getLvlCrd(Level lvl) {
   if (getCrdUsedLvls()[lvl]) {
@@ -1654,9 +1654,8 @@ def IterateOp : SparseTensor_Op<"iterate",
   return std::nullopt;
 }
 Block::BlockArgListType getCrds() {
-  // The first block argument is iterator, the remaining arguments are
-  // referenced coordinates.
-  return getRegion().getArguments().slice(1, getCrdUsedLvls().count());
+  // User-provided iteration arguments -> coords -> iterator.
+  return getRegion().getArguments().slice(getNumRegionIterArgs(), 
getCrdUsedLvls().count());
 }
 unsigned getNumRegionIterArgs() {
   return getRegion().getArguments().size() - 1 - getCrdUsedLvls().count();
diff --git a/mlir/lib/Dialect/SparseTensor/IR/SparseTensorDialect.cpp 
b/mlir/lib/Dialect/SparseTensor/IR/SparseTensorDialect.cpp
index 16856b958d4f13..b21bc1a93036c4 100644
--- a/mlir/lib/Dialect/SparseTensor/IR/SparseTensorDialect.cpp
+++ b/mlir/lib/Dialect/SparseTensor/IR/SparseTensorDialect.cpp
@@ -2228,9 +2228,10 @@ parseSparseIterateLoop(OpAsmParser &parser, 
OperationState &state,
 parser.getNameLoc(),
 "mismatch in number of sparse iterators and sparse spaces");
 
-  if (failed(parseUsedCoordList(parser, state, blockArgs)))
+  SmallVector coords;
+  if (failed(parseUsedCoordList(parser, state, coords)))
 return failure();
-  size_t numCrds = blockArgs.size();
+  size_t numCrds = coords.size();
 
   // Parse "iter_args(%arg = %init, ...)"
   bool hasIterArgs = succeeded(parser.parseOptionalKeyword("iter_args"));
@@ -2238,6 +2239,8 @@ parseSparseIterateLoop(OpAsmParser &parser, 
OperationState &state,
 if (parser.parseAssignmentList(blockArgs, initArgs))
   return failure();
 
+  blockArgs.append(coords);
+
   SmallVector iterSpaceTps;
   // parse ": sparse_tensor.iter_space -> ret"
   if (parser.parseColon() || parser.parseTypeList(iterSpaceTps))
@@ -2267,7 +2270,7 @@ parseSparseIterateLoop(OpAsmParser &parser, 
OperationState &state,
 
   if (hasIterArgs) {
 // Strip off leading args that used for coordinates.
-MutableArrayRef args = MutableArrayRef(blockArgs).drop_front(numCrds);
+MutableArrayRef args = MutableArrayRef(blockArgs).drop_back(numCrds);
 if (args.size() != initArgs.size() || args.size() != state.types.size()) {
   return parser.emitError(
   parser.getNameLoc(),
@@ -2448,18 +2451,18 @@ void IterateOp::build(OpBuilder &builder, 
OperationState &odsState,
   odsState.addTypes(initArgs.getTypes());
   Block *bodyBlock = builder.createBlock(bodyRegion);
 
-  // First argument, sparse iterator
-  bodyBlock->addArgument(
-  llvm::cast(iterSpace.getType()).getIteratorType(),
-  odsState.location);
+  // Starts with a list of user-provided loop arguments.
+  for (Value v : initArgs)
+bodyBlock->addArgument(v.getType(), v.getLoc());
 
-  // Followed by a list of used coordinates.
+  // Follows by a list of used coordinates.
   for (unsigned i = 0, e = crdUsedLvls.count(); i < e; i++)
 bodyBlock->addArgument(builder.getIndexType(), odsState.location);
 
-  // Followed by a list of user-provided loop arguments.
-  for (Value v : initArgs)
-bodyBlock->addArgument(v.getType(), v.getLoc());
+  // Ends with sparse iterator
+  bodyBlock->addArgument(
+  llvm::cast(iterSpace.getType()).getIteratorType(),
+  odsState.location);
 }
 
 ParseResult IterateOp::parse(OpAsmParser &parser, OperationState &result) {
@@ -2473,9 +2476,9 @@ ParseResult IterateOp::parse(OpAsmParser &parser, 
OperationState &result) {
 return parser.emitError(parser.getNameLoc(),
 "expected only one iterator/iteration space");
 
-  iters.append(iterArgs);
+  iterArgs.append(iters);
   Region *body = result.addRegion();
-  if (parser.parseRegion(*body, iters))

[llvm-branch-commits] [mlir] [mlir][sparse] refactoring sparse_tensor.iterate lowering pattern implementation. (PR #105566)

2024-08-21 Thread via llvm-branch-commits

llvmbot wrote:




@llvm/pr-subscribers-mlir-sparse

Author: Peiming Liu (PeimingLiu)


Changes

Stacked PRs:
 * #105567
 * __->__#105566
 * #105565


--- --- ---

### [mlir][sparse] refactoring sparse_tensor.iterate lowering pattern 
implementation.



---
Full diff: https://github.com/llvm/llvm-project/pull/105566.diff


1 Files Affected:

- (modified) mlir/lib/Dialect/SparseTensor/Transforms/SparseIterationToScf.cpp 
(+36-82) 


``diff
diff --git a/mlir/lib/Dialect/SparseTensor/Transforms/SparseIterationToScf.cpp 
b/mlir/lib/Dialect/SparseTensor/Transforms/SparseIterationToScf.cpp
index d6c0da4a9e457..f7fcabb0220b5 100644
--- a/mlir/lib/Dialect/SparseTensor/Transforms/SparseIterationToScf.cpp
+++ b/mlir/lib/Dialect/SparseTensor/Transforms/SparseIterationToScf.cpp
@@ -244,88 +244,41 @@ class SparseIterateOpConverter : public 
OneToNOpConversionPattern {
 std::unique_ptr it =
 iterSpace.extractIterator(rewriter, loc);
 
-if (it->iteratableByFor()) {
-  auto [lo, hi] = it->genForCond(rewriter, loc);
-  Value step = constantIndex(rewriter, loc, 1);
-  SmallVector ivs;
-  for (ValueRange inits : adaptor.getInitArgs())
-llvm::append_range(ivs, inits);
-  scf::ForOp forOp = rewriter.create(loc, lo, hi, step, ivs);
-
-  Block *loopBody = op.getBody();
-  OneToNTypeMapping bodyTypeMapping(loopBody->getArgumentTypes());
-  if (failed(typeConverter->convertSignatureArgs(
-  loopBody->getArgumentTypes(), bodyTypeMapping)))
-return failure();
-  rewriter.applySignatureConversion(loopBody, bodyTypeMapping);
-
-  rewriter.eraseBlock(forOp.getBody());
-  Region &dstRegion = forOp.getRegion();
-  rewriter.inlineRegionBefore(op.getRegion(), dstRegion, dstRegion.end());
-
-  auto yieldOp =
-  llvm::cast(forOp.getBody()->getTerminator());
-
-  rewriter.setInsertionPointToEnd(forOp.getBody());
-  // replace sparse_tensor.yield with scf.yield.
-  rewriter.create(loc, yieldOp.getResults());
-  rewriter.eraseOp(yieldOp);
-
-  const OneToNTypeMapping &resultMapping = adaptor.getResultMapping();
-  rewriter.replaceOp(op, forOp.getResults(), resultMapping);
-} else {
-  SmallVector ivs;
-  // TODO: put iterator at the end of argument list to be consistent with
-  // coiterate operation.
-  llvm::append_range(ivs, it->getCursor());
-  for (ValueRange inits : adaptor.getInitArgs())
-llvm::append_range(ivs, inits);
-
-  assert(llvm::all_of(ivs, [](Value v) { return v != nullptr; }));
-
-  TypeRange types = ValueRange(ivs).getTypes();
-  auto whileOp = rewriter.create(loc, types, ivs);
-  SmallVector l(types.size(), op.getIterator().getLoc());
-
-  // Generates loop conditions.
-  Block *before = rewriter.createBlock(&whileOp.getBefore(), {}, types, l);
-  rewriter.setInsertionPointToStart(before);
-  ValueRange bArgs = before->getArguments();
-  auto [whileCond, remArgs] = it->genWhileCond(rewriter, loc, bArgs);
-  assert(remArgs.size() == adaptor.getInitArgs().size());
-  rewriter.create(loc, whileCond, 
before->getArguments());
-
-  // Generates loop body.
-  Block *loopBody = op.getBody();
-  OneToNTypeMapping bodyTypeMapping(loopBody->getArgumentTypes());
-  if (failed(typeConverter->convertSignatureArgs(
-  loopBody->getArgumentTypes(), bodyTypeMapping)))
-return failure();
-  rewriter.applySignatureConversion(loopBody, bodyTypeMapping);
-
-  Region &dstRegion = whileOp.getAfter();
-  // TODO: handle uses of coordinate!
-  rewriter.inlineRegionBefore(op.getRegion(), dstRegion, dstRegion.end());
-  ValueRange aArgs = whileOp.getAfterArguments();
-  auto yieldOp = llvm::cast(
-  whileOp.getAfterBody()->getTerminator());
-
-  rewriter.setInsertionPointToEnd(whileOp.getAfterBody());
+SmallVector ivs;
+for (ValueRange inits : adaptor.getInitArgs())
+  llvm::append_range(ivs, inits);
+
+// Type conversion on iterate op block.
+OneToNTypeMapping blockTypeMapping(op.getBody()->getArgumentTypes());
+if (failed(typeConverter->convertSignatureArgs(
+op.getBody()->getArgumentTypes(), blockTypeMapping)))
+  return rewriter.notifyMatchFailure(
+  op, "failed to convert iterate region argurment types");
+rewriter.applySignatureConversion(op.getBody(), blockTypeMapping);
+
+Block *block = op.getBody();
+ValueRange ret = genLoopWithIterator(
+rewriter, loc, it.get(), ivs, /*iterFirst=*/true,
+[block](PatternRewriter &rewriter, Location loc, Region &loopBody,
+SparseIterator *it, ValueRange reduc) -> SmallVector {
+  SmallVector blockArgs(it->getCursor());
+  // TODO: Also appends coordinates if used.
+  // blockArgs.push_back(it->deref(rewriter, loc));
+  llvm::append_range(blockArgs, reduc);
+
+  Block *dstBlock = &loopBody.getBlocks(

[llvm-branch-commits] [llvm] [AMDGPU] GFX12 VMEM instructions can write VGPR results out of order (PR #105549)

2024-08-21 Thread Matt Arsenault via llvm-branch-commits

https://github.com/arsenm approved this pull request.


https://github.com/llvm/llvm-project/pull/105549
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [DirectX] Lower `@llvm.dx.handle.fromBinding` to DXIL ops (PR #104251)

2024-08-21 Thread Joshua Batista via llvm-branch-commits


@@ -117,6 +119,10 @@ class ResourceInfo {
 
   MSInfo MultiSample;
 
+  // We need a default constructor if we want to insert this in a MapVector.
+  ResourceInfo() {}
+  friend class MapVector;

bob80905 wrote:

Where is this being inserted in a MapVector? Is it DRM?

https://github.com/llvm/llvm-project/pull/104251
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] DAG: Check if is_fpclass is custom, instead of isLegalOrCustom (PR #105577)

2024-08-21 Thread Matt Arsenault via llvm-branch-commits

https://github.com/arsenm created 
https://github.com/llvm/llvm-project/pull/105577

For some reason, isOperationLegalOrCustom is not the same as
isOperationLegal || isOperationCustom. Unfortunately, it checks
whether the type is legal, which makes it useless for custom lowering
on non-legal types (which is always ppcf128).

Really, the DAG builder shouldn't be expanding this in the
builder; it makes it difficult to work with. It's only here to work
around the DAG requiring legal integer types the same size as
the FP type after type legalization.
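
The distinction can be sketched with a small self-contained model. This is a toy under stated assumptions, not the real TargetLoweringBase code: `ToyLowering` and `Action` are made-up stand-ins, and the only behaviour modelled is the one the description states, namely that the combined query also requires the type to be legal while the custom-only query does not.

```cpp
#include <cassert>

// Toy stand-ins for the legalize actions (illustrative only).
enum class Action { Legal, Custom, Expand };

// Hypothetical model of the per-(operation, type) state a target records.
struct ToyLowering {
  bool TypeLegal;  // is the value type itself legal for the target?
  Action OpAction; // action registered for this operation on this type

  bool isOperationLegal() const {
    return TypeLegal && OpAction == Action::Legal;
  }

  // Looks only at the registered action, not at type legality.
  bool isOperationCustom() const { return OpAction == Action::Custom; }

  // Assumed behaviour, per the description above: the type-legality check
  // guards the Custom case as well as the Legal case.
  bool isOperationLegalOrCustom() const {
    return TypeLegal &&
           (OpAction == Action::Legal || OpAction == Action::Custom);
  }

  // Early-expansion decision before and after the change in this patch.
  bool expandsEarlyOld() const { return !isOperationLegalOrCustom(); }
  bool expandsEarlyNew() const {
    return !isOperationLegal() && !isOperationCustom();
  }
};
```

With a non-legal type marked Custom (the ppcf128-like case), `expandsEarlyOld()` is true and `expandsEarlyNew()` is false, so the target's custom lowering is reached instead of the generic expansion. When the type is legal, the two conditions agree.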

>From b57fb07c93a8052805110626786a8e242213c983 Mon Sep 17 00:00:00 2001
From: Matt Arsenault 
Date: Wed, 21 Aug 2024 20:15:55 +0400
Subject: [PATCH] DAG: Check if is_fpclass is custom, instead of
 isLegalOrCustom

For some reason, isOperationLegalOrCustom is not the same as
isOperationLegal || isOperationCustom. Unfortunately, it checks
if the type is legal which makes it useless for custom lowering
on non-legal types (which is always ppcf128).

Really the DAG builder shouldn't be going to expand this in the
builder, it makes it difficult to work with. It's only here to work
around the DAG requiring legal integer types the same size as
the FP type after type legalization.
---
 .../SelectionDAG/SelectionDAGBuilder.cpp  |   3 +-
 llvm/lib/Target/AMDGPU/AMDGPUISelLowering.cpp |  17 +-
 llvm/test/CodeGen/AMDGPU/fract-match.ll   |  10 +-
 .../CodeGen/AMDGPU/llvm.is.fpclass.f16.ll | 205 +++---
 llvm/test/CodeGen/PowerPC/is_fpclass.ll   |  37 ++--
 5 files changed, 160 insertions(+), 112 deletions(-)

diff --git a/llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp 
b/llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
index 60dcb118542785..09a3def6586493 100644
--- a/llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
@@ -7032,7 +7032,8 @@ void SelectionDAGBuilder::visitIntrinsicCall(const 
CallInst &I,
 // If ISD::IS_FPCLASS should be expanded, do it right now, because the
 // expansion can use illegal types. Making expansion early allows
 // legalizing these types prior to selection.
-if (!TLI.isOperationLegalOrCustom(ISD::IS_FPCLASS, ArgVT)) {
+if (!TLI.isOperationLegal(ISD::IS_FPCLASS, ArgVT) &&
+!TLI.isOperationCustom(ISD::IS_FPCLASS, ArgVT)) {
   SDValue Result = TLI.expandIS_FPCLASS(DestVT, Op, Test, Flags, sdl, DAG);
   setValue(&I, Result);
   return;
diff --git a/llvm/lib/Target/AMDGPU/AMDGPUISelLowering.cpp 
b/llvm/lib/Target/AMDGPU/AMDGPUISelLowering.cpp
index e57c8f8b7b4835..866e04bcc7fb2d 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPUISelLowering.cpp
+++ b/llvm/lib/Target/AMDGPU/AMDGPUISelLowering.cpp
@@ -426,12 +426,17 @@ AMDGPUTargetLowering::AMDGPUTargetLowering(const 
TargetMachine &TM,
   // FIXME: These IS_FPCLASS vector fp types are marked custom so it reaches
   // scalarization code. Can be removed when IS_FPCLASS expand isn't called by
   // default unless marked custom/legal.
-  setOperationAction(
-  ISD::IS_FPCLASS,
-  {MVT::v2f16, MVT::v3f16, MVT::v4f16, MVT::v16f16, MVT::v2f32, MVT::v3f32,
-   MVT::v4f32, MVT::v5f32, MVT::v6f32, MVT::v7f32, MVT::v8f32, MVT::v16f32,
-   MVT::v2f64, MVT::v3f64, MVT::v4f64, MVT::v8f64, MVT::v16f64},
-  Custom);
+  setOperationAction(ISD::IS_FPCLASS,
+ {MVT::v2f32, MVT::v3f32, MVT::v4f32, MVT::v5f32,
+  MVT::v6f32, MVT::v7f32, MVT::v8f32, MVT::v16f32,
+  MVT::v2f64, MVT::v3f64, MVT::v4f64, MVT::v8f64,
+  MVT::v16f64},
+ Custom);
+
+  if (isTypeLegal(MVT::f16))
+setOperationAction(ISD::IS_FPCLASS,
+   {MVT::v2f16, MVT::v3f16, MVT::v4f16, MVT::v16f16},
+   Custom);
 
   // Expand to fneg + fadd.
   setOperationAction(ISD::FSUB, MVT::f64, Expand);
diff --git a/llvm/test/CodeGen/AMDGPU/fract-match.ll 
b/llvm/test/CodeGen/AMDGPU/fract-match.ll
index 1b28ddb2c58620..b212b9caf8400e 100644
--- a/llvm/test/CodeGen/AMDGPU/fract-match.ll
+++ b/llvm/test/CodeGen/AMDGPU/fract-match.ll
@@ -2135,16 +2135,16 @@ define <2 x half> @safe_math_fract_v2f16(<2 x half> %x, 
ptr addrspace(1) nocaptu
 ; GFX8-LABEL: safe_math_fract_v2f16:
 ; GFX8:   ; %bb.0: ; %entry
 ; GFX8-NEXT:s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
-; GFX8-NEXT:v_mov_b32_e32 v6, 0x204
+; GFX8-NEXT:s_movk_i32 s6, 0x204
 ; GFX8-NEXT:v_floor_f16_sdwa v3, v0 dst_sel:DWORD dst_unused:UNUSED_PAD 
src0_sel:WORD_1
 ; GFX8-NEXT:v_floor_f16_e32 v4, v0
-; GFX8-NEXT:v_cmp_class_f16_sdwa s[4:5], v0, v6 src0_sel:WORD_1 
src1_sel:DWORD
+; GFX8-NEXT:v_fract_f16_sdwa v5, v0 dst_sel:DWORD dst_unused:UNUSED_PAD 
src0_sel:WORD_1
+; GFX8-NEXT:v_cmp_class_f16_sdwa s[4:5], v0, s6 src0_sel:WORD_1 
src1_sel:DWORD
 ; GFX8-NEXT:v_pack_b32_f16 v3, v4, v3
 ; GFX8-NEXT:v_fract_f16_e32 v4, v0
-; GFX8-NEXT:v_fract_f16_sdwa v5, v0 dst_sel:DWORD dst_unused:UNUSED_

[llvm-branch-commits] [llvm] DAG: Check if is_fpclass is custom, instead of isLegalOrCustom (PR #105577)

2024-08-21 Thread Matt Arsenault via llvm-branch-commits

arsenm wrote:

> [!WARNING]
> This pull request is not mergeable via GitHub because a downstack PR is
> open. Once all requirements are satisfied, merge this PR as a stack on
> Graphite (https://app.graphite.dev/github/pr/llvm/llvm-project/105577).

* **#105577** 👈
* **#105540**
* `main`

This stack of pull requests is managed by Graphite.

https://github.com/llvm/llvm-project/pull/105577
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] DAG: Check if is_fpclass is custom, instead of isLegalOrCustom (PR #105577)

2024-08-21 Thread via llvm-branch-commits

llvmbot wrote:




@llvm/pr-subscribers-llvm-selectiondag

Author: Matt Arsenault (arsenm)


Changes

For some reason, isOperationLegalOrCustom is not the same as
isOperationLegal || isOperationCustom. Unfortunately, it checks
whether the type is legal, which makes it useless for custom lowering
on non-legal types (which is always ppcf128).

Really, the DAG builder shouldn't be expanding this in the
builder; it makes it difficult to work with. It's only here to work
around the DAG requiring legal integer types the same size as
the FP type after type legalization.

---
Full diff: https://github.com/llvm/llvm-project/pull/105577.diff


5 Files Affected:

- (modified) llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp (+2-1) 
- (modified) llvm/lib/Target/AMDGPU/AMDGPUISelLowering.cpp (+11-6) 
- (modified) llvm/test/CodeGen/AMDGPU/fract-match.ll (+5-5) 
- (modified) llvm/test/CodeGen/AMDGPU/llvm.is.fpclass.f16.ll (+128-77) 
- (modified) llvm/test/CodeGen/PowerPC/is_fpclass.ll (+14-23) 


``diff
diff --git a/llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp 
b/llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
index 60dcb118542785..09a3def6586493 100644
--- a/llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
@@ -7032,7 +7032,8 @@ void SelectionDAGBuilder::visitIntrinsicCall(const 
CallInst &I,
 // If ISD::IS_FPCLASS should be expanded, do it right now, because the
 // expansion can use illegal types. Making expansion early allows
 // legalizing these types prior to selection.
-if (!TLI.isOperationLegalOrCustom(ISD::IS_FPCLASS, ArgVT)) {
+if (!TLI.isOperationLegal(ISD::IS_FPCLASS, ArgVT) &&
+!TLI.isOperationCustom(ISD::IS_FPCLASS, ArgVT)) {
   SDValue Result = TLI.expandIS_FPCLASS(DestVT, Op, Test, Flags, sdl, DAG);
   setValue(&I, Result);
   return;
diff --git a/llvm/lib/Target/AMDGPU/AMDGPUISelLowering.cpp 
b/llvm/lib/Target/AMDGPU/AMDGPUISelLowering.cpp
index e57c8f8b7b4835..866e04bcc7fb2d 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPUISelLowering.cpp
+++ b/llvm/lib/Target/AMDGPU/AMDGPUISelLowering.cpp
@@ -426,12 +426,17 @@ AMDGPUTargetLowering::AMDGPUTargetLowering(const 
TargetMachine &TM,
   // FIXME: These IS_FPCLASS vector fp types are marked custom so it reaches
   // scalarization code. Can be removed when IS_FPCLASS expand isn't called by
   // default unless marked custom/legal.
-  setOperationAction(
-  ISD::IS_FPCLASS,
-  {MVT::v2f16, MVT::v3f16, MVT::v4f16, MVT::v16f16, MVT::v2f32, MVT::v3f32,
-   MVT::v4f32, MVT::v5f32, MVT::v6f32, MVT::v7f32, MVT::v8f32, MVT::v16f32,
-   MVT::v2f64, MVT::v3f64, MVT::v4f64, MVT::v8f64, MVT::v16f64},
-  Custom);
+  setOperationAction(ISD::IS_FPCLASS,
+ {MVT::v2f32, MVT::v3f32, MVT::v4f32, MVT::v5f32,
+  MVT::v6f32, MVT::v7f32, MVT::v8f32, MVT::v16f32,
+  MVT::v2f64, MVT::v3f64, MVT::v4f64, MVT::v8f64,
+  MVT::v16f64},
+ Custom);
+
+  if (isTypeLegal(MVT::f16))
+setOperationAction(ISD::IS_FPCLASS,
+   {MVT::v2f16, MVT::v3f16, MVT::v4f16, MVT::v16f16},
+   Custom);
 
   // Expand to fneg + fadd.
   setOperationAction(ISD::FSUB, MVT::f64, Expand);
diff --git a/llvm/test/CodeGen/AMDGPU/fract-match.ll 
b/llvm/test/CodeGen/AMDGPU/fract-match.ll
index 1b28ddb2c58620..b212b9caf8400e 100644
--- a/llvm/test/CodeGen/AMDGPU/fract-match.ll
+++ b/llvm/test/CodeGen/AMDGPU/fract-match.ll
@@ -2135,16 +2135,16 @@ define <2 x half> @safe_math_fract_v2f16(<2 x half> %x, 
ptr addrspace(1) nocaptu
 ; GFX8-LABEL: safe_math_fract_v2f16:
 ; GFX8:   ; %bb.0: ; %entry
 ; GFX8-NEXT:s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
-; GFX8-NEXT:v_mov_b32_e32 v6, 0x204
+; GFX8-NEXT:s_movk_i32 s6, 0x204
 ; GFX8-NEXT:v_floor_f16_sdwa v3, v0 dst_sel:DWORD dst_unused:UNUSED_PAD 
src0_sel:WORD_1
 ; GFX8-NEXT:v_floor_f16_e32 v4, v0
-; GFX8-NEXT:v_cmp_class_f16_sdwa s[4:5], v0, v6 src0_sel:WORD_1 
src1_sel:DWORD
+; GFX8-NEXT:v_fract_f16_sdwa v5, v0 dst_sel:DWORD dst_unused:UNUSED_PAD 
src0_sel:WORD_1
+; GFX8-NEXT:v_cmp_class_f16_sdwa s[4:5], v0, s6 src0_sel:WORD_1 
src1_sel:DWORD
 ; GFX8-NEXT:v_pack_b32_f16 v3, v4, v3
 ; GFX8-NEXT:v_fract_f16_e32 v4, v0
-; GFX8-NEXT:v_fract_f16_sdwa v5, v0 dst_sel:DWORD dst_unused:UNUSED_PAD 
src0_sel:WORD_1
-; GFX8-NEXT:v_cmp_class_f16_e32 vcc, v0, v6
 ; GFX8-NEXT:v_cndmask_b32_e64 v5, v5, 0, s[4:5]
-; GFX8-NEXT:v_cndmask_b32_e64 v0, v4, 0, vcc
+; GFX8-NEXT:v_cmp_class_f16_e64 s[4:5], v0, s6
+; GFX8-NEXT:v_cndmask_b32_e64 v0, v4, 0, s[4:5]
 ; GFX8-NEXT:v_pack_b32_f16 v0, v0, v5
 ; GFX8-NEXT:global_store_dword v[1:2], v3, off
 ; GFX8-NEXT:s_waitcnt vmcnt(0)
diff --git a/llvm/test/CodeGen/AMDGPU/llvm.is.fpclass.f16.ll 
b/llvm/test/CodeGen/AMDGPU/llvm.is.fpclass.f16.ll
index 9c248bd6e8b2aa..3d8e9e60973053 100644
--- a/llvm

[llvm-branch-commits] [llvm] DAG: Check if is_fpclass is custom, instead of isLegalOrCustom (PR #105577)

2024-08-21 Thread Matt Arsenault via llvm-branch-commits

https://github.com/arsenm ready_for_review 
https://github.com/llvm/llvm-project/pull/105577
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [clang] [llvm] [LLVM][Coroutines] Create `.noalloc` variant of switch ABI coroutine ramp functions during CoroSplit (PR #99283)

2024-08-21 Thread Yuxuan Chen via llvm-branch-commits


@@ -1455,6 +1462,62 @@ struct SwitchCoroutineSplitter {
 setCoroInfo(F, Shape, Clones);
   }
 
+  static Function *createNoAllocVariant(Function &F, coro::Shape &Shape,

yuxuanchen1997 wrote:

This is done. Thanks. 

https://github.com/llvm/llvm-project/pull/99283
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [LLVM][Coroutines] Create `.noalloc` variant of switch ABI coroutine ramp functions during CoroSplit (PR #99283)

2024-08-21 Thread Yuxuan Chen via llvm-branch-commits

https://github.com/yuxuanchen1997 updated 
https://github.com/llvm/llvm-project/pull/99283

>From be91ecd53679df7536616132b3492d53a0642ef4 Mon Sep 17 00:00:00 2001
From: Yuxuan Chen 
Date: Tue, 4 Jun 2024 23:22:00 -0700
Subject: [PATCH] [Clang] Introduce [[clang::coro_await_elidable]]

---
 llvm/docs/Coroutines.rst  |  22 +++
 llvm/lib/Transforms/Coroutines/CoroInternal.h |   4 +
 llvm/lib/Transforms/Coroutines/CoroSplit.cpp  | 142 ++
 llvm/lib/Transforms/Coroutines/Coroutines.cpp |  27 
 llvm/test/Transforms/Coroutines/ArgAddr.ll|   6 +-
 .../Transforms/Coroutines/coro-alloca-07.ll   |   2 +-
 .../coro-alloca-loop-carried-address.ll   |   2 +-
 .../Coroutines/coro-lifetime-end.ll   |   6 +-
 .../Coroutines/coro-spill-after-phi.ll|   2 +-
 .../Transforms/Coroutines/coro-split-00.ll|   9 +-
 10 files changed, 187 insertions(+), 35 deletions(-)

diff --git a/llvm/docs/Coroutines.rst b/llvm/docs/Coroutines.rst
index 36092325e536fb..13cb2d768a3bf8 100644
--- a/llvm/docs/Coroutines.rst
+++ b/llvm/docs/Coroutines.rst
@@ -2022,6 +2022,12 @@ The pass CoroSplit builds coroutine frame and outlines 
resume and destroy parts
 into separate functions. This pass also lowers `coro.await.suspend.void`_,
 `coro.await.suspend.bool`_ and `coro.await.suspend.handle`_ intrinsics.
 
+CoroAnnotationElide
+---
+This pass finds all usages of coroutines that are "must elide" and replaces
+`coro.begin` intrinsic with an address of a coroutine frame placed on its 
caller
+and replaces `coro.alloc` and `coro.free` intrinsics with `false` and `null`
+respectively to remove the deallocation code.
 
 CoroElide
 -
@@ -2049,6 +2055,22 @@ the coroutine must reach the final suspend point when it 
get destroyed.
 
 This attribute only works for switched-resume coroutines now.
 
+coro_must_elide
+---
+
+When a Call or Invoke instruction is marked with `coro_must_elide`,
+CoroAnnotationElidePass performs heap elision when possible. Note that for
+recursive or mutually recursive functions this elision is usually not possible.
+
+
+coro_gen_noalloc_ramp
+-
+
+This attribute hints CoroSplitPass to generate a `f.noalloc` ramp function for
+a given coroutine `f`. For any call or invoke instruction that calls `f` and
+attributed as `coro_must_elide`, CoroAnnotationElidePass is able to redirect
+the call to use the `.noalloc` variant.
+
 Metadata
 
 
diff --git a/llvm/lib/Transforms/Coroutines/CoroInternal.h 
b/llvm/lib/Transforms/Coroutines/CoroInternal.h
index d535ad7f85d74a..760c0bf894c9e0 100644
--- a/llvm/lib/Transforms/Coroutines/CoroInternal.h
+++ b/llvm/lib/Transforms/Coroutines/CoroInternal.h
@@ -26,6 +26,10 @@ bool declaresIntrinsics(const Module &M,
 const std::initializer_list);
 void replaceCoroFree(CoroIdInst *CoroId, bool Elide);
 
+void suppressCoroAllocs(CoroIdInst *CoroId);
+void suppressCoroAllocs(LLVMContext &Context,
+ArrayRef CoroAllocs);
+
 /// Attempts to rewrite the location operand of debug intrinsics in terms of
 /// the coroutine frame pointer, folding pointer offsets into the DIExpression
 /// of the intrinsic.
diff --git a/llvm/lib/Transforms/Coroutines/CoroSplit.cpp 
b/llvm/lib/Transforms/Coroutines/CoroSplit.cpp
index 40bc932c3e0eef..111ebf6d5163d6 100644
--- a/llvm/lib/Transforms/Coroutines/CoroSplit.cpp
+++ b/llvm/lib/Transforms/Coroutines/CoroSplit.cpp
@@ -25,6 +25,7 @@
 #include "llvm/ADT/PriorityWorklist.h"
 #include "llvm/ADT/SmallPtrSet.h"
 #include "llvm/ADT/SmallVector.h"
+#include "llvm/ADT/StringExtras.h"
 #include "llvm/ADT/StringRef.h"
 #include "llvm/ADT/Twine.h"
 #include "llvm/Analysis/CFG.h"
@@ -1177,6 +1178,14 @@ static void 
updateAsyncFuncPointerContextSize(coro::Shape &Shape) {
   Shape.AsyncLowering.AsyncFuncPointer->setInitializer(NewFuncPtrStruct);
 }
 
+static TypeSize getFrameSizeForShape(coro::Shape &Shape) {
+  // In the same function all coro.sizes should have the same result type.
+  auto *SizeIntrin = Shape.CoroSizes.back();
+  Module *M = SizeIntrin->getModule();
+  const DataLayout &DL = M->getDataLayout();
+  return DL.getTypeAllocSize(Shape.FrameTy);
+}
+
 static void replaceFrameSizeAndAlignment(coro::Shape &Shape) {
   if (Shape.ABI == coro::ABI::Async)
 updateAsyncFuncPointerContextSize(Shape);
@@ -1192,10 +1201,8 @@ static void replaceFrameSizeAndAlignment(coro::Shape 
&Shape) {
 
   // In the same function all coro.sizes should have the same result type.
   auto *SizeIntrin = Shape.CoroSizes.back();
-  Module *M = SizeIntrin->getModule();
-  const DataLayout &DL = M->getDataLayout();
-  auto Size = DL.getTypeAllocSize(Shape.FrameTy);
-  auto *SizeConstant = ConstantInt::get(SizeIntrin->getType(), Size);
+  auto *SizeConstant =
+  ConstantInt::get(SizeIntrin->getType(), getFrameSizeForShape(Shape));
 
   for (CoroSizeInst *CS : Shape.CoroSizes) {
 CS->replaceAllUsesWith(SizeConstant);
@@ -145

[llvm-branch-commits] [llvm] [LLVM][Coroutines] Transform "coro_must_elide" calls to switch ABI coroutines to the `noalloc` variant (PR #99285)

2024-08-21 Thread Yuxuan Chen via llvm-branch-commits

https://github.com/yuxuanchen1997 updated 
https://github.com/llvm/llvm-project/pull/99285

>From 7ca8d8b7d1dfd1d901721dd45f83f861068f9ea0 Mon Sep 17 00:00:00 2001
From: Yuxuan Chen 
Date: Mon, 15 Jul 2024 15:01:39 -0700
Subject: [PATCH] add CoroAnnotationElidePass

Summary:

Test Plan:

Reviewers:

Subscribers:

Tasks:

Tags:

Differential Revision: https://phabricator.intern.facebook.com/D60250514
---
 .../Coroutines/CoroAnnotationElide.h  |  36 +
 llvm/lib/Passes/PassBuilder.cpp   |   1 +
 llvm/lib/Passes/PassBuilderPipelines.cpp  |  10 +-
 llvm/lib/Passes/PassRegistry.def  |   1 +
 llvm/lib/Transforms/Coroutines/CMakeLists.txt |   1 +
 .../Coroutines/CoroAnnotationElide.cpp| 143 ++
 llvm/test/Other/new-pm-defaults.ll|   1 +
 .../Other/new-pm-thinlto-postlink-defaults.ll |   1 +
 .../new-pm-thinlto-postlink-pgo-defaults.ll   |   1 +
 ...-pm-thinlto-postlink-samplepgo-defaults.ll |   1 +
 .../Coroutines/coro-transform-must-elide.ll   |  76 ++
 11 files changed, 270 insertions(+), 2 deletions(-)
 create mode 100644 
llvm/include/llvm/Transforms/Coroutines/CoroAnnotationElide.h
 create mode 100644 llvm/lib/Transforms/Coroutines/CoroAnnotationElide.cpp
 create mode 100644 llvm/test/Transforms/Coroutines/coro-transform-must-elide.ll

diff --git a/llvm/include/llvm/Transforms/Coroutines/CoroAnnotationElide.h 
b/llvm/include/llvm/Transforms/Coroutines/CoroAnnotationElide.h
new file mode 100644
index 00..2d6e84bdd66423
--- /dev/null
+++ b/llvm/include/llvm/Transforms/Coroutines/CoroAnnotationElide.h
@@ -0,0 +1,36 @@
+//===- CoroAnnotationElide.h - Optimizing a coro_must_elide call 
--===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM 
Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===--===//
+//
+// \file
+// This pass transforms all Call or Invoke instructions that are annotated
+// "coro_must_elide" to call the `.noalloc` variant of coroutine instead.
+// The frame of the callee coroutine is allocated inside the caller. A pointer
+// to the allocated frame will be passed into the `.noalloc` ramp function.
+//
+//===--===//
+
+#ifndef LLVM_TRANSFORMS_COROUTINES_COROANNOTATIONELIDE_H
+#define LLVM_TRANSFORMS_COROUTINES_COROANNOTATIONELIDE_H
+
+#include "llvm/Analysis/CGSCCPassManager.h"
+#include "llvm/Analysis/LazyCallGraph.h"
+#include "llvm/IR/PassManager.h"
+
+namespace llvm {
+
+struct CoroAnnotationElidePass : PassInfoMixin {
+  CoroAnnotationElidePass() {}
+
+  PreservedAnalyses run(LazyCallGraph::SCC &C, CGSCCAnalysisManager &AM,
+LazyCallGraph &CG, CGSCCUpdateResult &UR);
+
+  static bool isRequired() { return false; }
+};
+} // end namespace llvm
+
+#endif // LLVM_TRANSFORMS_COROUTINES_COROANNOTATIONELIDE_H
diff --git a/llvm/lib/Passes/PassBuilder.cpp b/llvm/lib/Passes/PassBuilder.cpp
index 17eed97fd950c9..c2b99a0d1f8cea 100644
--- a/llvm/lib/Passes/PassBuilder.cpp
+++ b/llvm/lib/Passes/PassBuilder.cpp
@@ -138,6 +138,7 @@
 #include "llvm/Target/TargetMachine.h"
 #include "llvm/Transforms/AggressiveInstCombine/AggressiveInstCombine.h"
 #include "llvm/Transforms/CFGuard.h"
+#include "llvm/Transforms/Coroutines/CoroAnnotationElide.h"
 #include "llvm/Transforms/Coroutines/CoroCleanup.h"
 #include "llvm/Transforms/Coroutines/CoroConditionalWrapper.h"
 #include "llvm/Transforms/Coroutines/CoroEarly.h"
diff --git a/llvm/lib/Passes/PassBuilderPipelines.cpp 
b/llvm/lib/Passes/PassBuilderPipelines.cpp
index 1184123c7710f0..992b4fca8a6919 100644
--- a/llvm/lib/Passes/PassBuilderPipelines.cpp
+++ b/llvm/lib/Passes/PassBuilderPipelines.cpp
@@ -33,6 +33,7 @@
 #include "llvm/Support/VirtualFileSystem.h"
 #include "llvm/Target/TargetMachine.h"
 #include "llvm/Transforms/AggressiveInstCombine/AggressiveInstCombine.h"
+#include "llvm/Transforms/Coroutines/CoroAnnotationElide.h"
 #include "llvm/Transforms/Coroutines/CoroCleanup.h"
 #include "llvm/Transforms/Coroutines/CoroConditionalWrapper.h"
 #include "llvm/Transforms/Coroutines/CoroEarly.h"
@@ -984,8 +985,10 @@ PassBuilder::buildInlinerPipeline(OptimizationLevel Level,
   MainCGPipeline.addPass(createCGSCCToFunctionPassAdaptor(
   RequireAnalysisPass()));
 
-  if (Phase != ThinOrFullLTOPhase::ThinLTOPreLink)
+  if (Phase != ThinOrFullLTOPhase::ThinLTOPreLink) {
 MainCGPipeline.addPass(CoroSplitPass(Level != OptimizationLevel::O0));
+MainCGPipeline.addPass(CoroAnnotationElidePass());
+  }
 
   // Make sure we don't affect potential future NoRerun CGSCC adaptors.
   MIWP.addLateModulePass(createModuleToFunctionPassAdaptor(
@@ -1027,9 +1030,12 @@ 
PassBuilder::buildModuleInlinerPipeline(OptimizationLevel Level,
   buildFunctionSimplificationPipeline(Level, Phase),
   PT

[llvm-branch-commits] [llvm] [LLVM][Coroutines] Create `.noalloc` variant of switch ABI coroutine ramp functions during CoroSplit (PR #99283)

2024-08-21 Thread Yuxuan Chen via llvm-branch-commits

yuxuanchen1997 wrote:

@ChuanqiXu9 I have changed this patch to create the `.noalloc` variant only 
when an attribute (controlled by the frontend) requests it. Let me know if 
this is good to go.

https://github.com/llvm/llvm-project/pull/99283


[llvm-branch-commits] [llvm] [LLVM][Coroutines] Transform "coro_must_elide" calls to switch ABI coroutines to the `noalloc` variant (PR #99285)

2024-08-21 Thread Yuxuan Chen via llvm-branch-commits


@@ -968,8 +969,8 @@ PassBuilder::buildInlinerPipeline(OptimizationLevel Level,
   // it's been modified since.
   MainCGPipeline.addPass(createCGSCCToFunctionPassAdaptor(
   RequireAnalysisPass()));
-
   MainCGPipeline.addPass(CoroSplitPass(Level != OptimizationLevel::O0));
+  MainCGPipeline.addPass(CoroAnnotationElidePass());

yuxuanchen1997 wrote:

I applied this suggestion. However, looking at `buildModuleInlinerPipeline`, 
it appears to use an adaptor that runs a single CGSCC pass on every function 
in the module. That won't work well for `CoroAnnotationElidePass`, which 
depends on the callee having been split, but not the caller. 

Thinking about this, it is actually the same condition as the old 
`CoroElidePass`. Maybe the right thing to do here is to make this pass a 
function pass instead and use `createCGSCCToFunctionPassAdaptor`. What do you 
think?

https://github.com/llvm/llvm-project/pull/99285


[llvm-branch-commits] [DirectX] Lower `@llvm.dx.handle.fromBinding` to DXIL ops (PR #104251)

2024-08-21 Thread David Peixotto via llvm-branch-commits


@@ -683,6 +685,14 @@ def Dot4 :  DXILOp<56, dot4> {
   let attributes = [Attributes];
 }
 
+def CreateHandle : DXILOp<57, createHandle> {
+  let Doc = "creates the handle to a resource";
+  // ResourceClass, RangeID, Index, NonUniform
+  let arguments = [Int8Ty, Int32Ty, Int32Ty, Int1Ty];
+  let result = HandleTy;
+  let stages = [Stages];

dmpots wrote:

This should be invalid starting in DXIL_1_6 I think, right? Did we add a way to 
express that in the TD definition?

https://github.com/llvm/llvm-project/pull/104251


[llvm-branch-commits] [DirectX] Lower `@llvm.dx.handle.fromBinding` to DXIL ops (PR #104251)

2024-08-21 Thread David Peixotto via llvm-branch-commits


@@ -119,6 +123,119 @@ class OpLowerer {
 });
   }
 
+  Value *createTmpHandleCast(Value *V, Type *Ty) {
+Function *CastFn = Intrinsic::getDeclaration(&M, Intrinsic::dx_cast_handle,
+ {Ty, V->getType()});
+CallInst *Cast = OpBuilder.getIRB().CreateCall(CastFn, {V});
+CleanupCasts.push_back(Cast);
+return Cast;
+  }
+
+  void cleanupHandleCasts() {
+SmallVector ToRemove;
+SmallVector CastFns;
+
+for (CallInst *Cast : CleanupCasts) {
+  CastFns.push_back(Cast->getCalledFunction());
+  // All of the ops should be using `dx.types.Handle` at this point, so if
+  // we're not producing that we should be part of a pair. Track this so we

dmpots wrote:

It's not clear from reading what "it should be part of a pair" means and why it 
must be true. Can we expand the comment here to explain? Is there an assert we 
should add here as well?

https://github.com/llvm/llvm-project/pull/104251


[llvm-branch-commits] [DirectX] Lower `@llvm.dx.handle.fromBinding` to DXIL ops (PR #104251)

2024-08-21 Thread David Peixotto via llvm-branch-commits


@@ -119,6 +123,119 @@ class OpLowerer {
 });
   }
 
+  Value *createTmpHandleCast(Value *V, Type *Ty) {
+Function *CastFn = Intrinsic::getDeclaration(&M, Intrinsic::dx_cast_handle,
+ {Ty, V->getType()});
+CallInst *Cast = OpBuilder.getIRB().CreateCall(CastFn, {V});
+CleanupCasts.push_back(Cast);
+return Cast;
+  }
+
+  void cleanupHandleCasts() {
+SmallVector ToRemove;
+SmallVector CastFns;
+
+for (CallInst *Cast : CleanupCasts) {
+  CastFns.push_back(Cast->getCalledFunction());
+  // All of the ops should be using `dx.types.Handle` at this point, so if
+  // we're not producing that we should be part of a pair. Track this so we
+  // can remove it at the end.
+  if (Cast->getType() != OpBuilder.getHandleType()) {
+ToRemove.push_back(Cast);
+continue;
+  }
+  // Otherwise, we're the second handle in a pair. Forward the arguments and
+  // remove the (second) cast.
+  CallInst *Def = cast(Cast->getOperand(0));
+  assert(Def->getIntrinsicID() == Intrinsic::dx_cast_handle &&
+ "Unbalanced pair of temporary handle casts");
+  Cast->replaceAllUsesWith(Def->getOperand(0));
+  Cast->eraseFromParent();
+}
+for (CallInst *Cast : ToRemove) {
+  assert(Cast->user_empty() && "Temporary handle cast still has users");
+  Cast->eraseFromParent();
+}
+llvm::sort(CastFns);
+CastFns.erase(llvm::unique(CastFns), CastFns.end());
+for (Function *F : CastFns)
+  F->eraseFromParent();
+
+CleanupCasts.clear();
+  }
+
+  void lowerToCreateHandle(Function &F) {
+IRBuilder<> &IRB = OpBuilder.getIRB();
+Type *Int8Ty = IRB.getInt8Ty();
+Type *Int32Ty = IRB.getInt32Ty();
+
+replaceFunction(F, [&](CallInst *CI) -> Error {
+  IRB.SetInsertPoint(CI);
+
+  dxil::ResourceInfo &RI = DRM[CI];
+  dxil::ResourceInfo::ResourceBinding Binding = RI.getBinding();
+
+  std::array Args{
+  ConstantInt::get(Int8Ty, llvm::to_underlying(RI.getResourceClass())),
+  ConstantInt::get(Int32Ty, Binding.RecordID), CI->getArgOperand(3),
+  CI->getArgOperand(4)};
+  Expected OpCall =
+  OpBuilder.tryCreateOp(OpCode::CreateHandle, Args);
+  if (Error E = OpCall.takeError())
+return E;
+
+  Value *Cast = createTmpHandleCast(*OpCall, CI->getType());
+
+  CI->replaceAllUsesWith(Cast);
+  CI->eraseFromParent();
+  return Error::success();
+});
+  }
+
+  void lowerToBindAndAnnotateHandle(Function &F) {
+IRBuilder<> &IRB = OpBuilder.getIRB();
+
+replaceFunction(F, [&](CallInst *CI) -> Error {
+  IRB.SetInsertPoint(CI);
+
+  dxil::ResourceInfo &RI = DRM[CI];
+  dxil::ResourceInfo::ResourceBinding Binding = RI.getBinding();
+  std::pair Props = RI.getAnnotateProps();
+
+  Constant *ResBind = OpBuilder.getResBind(
+  Binding.LowerBound, Binding.LowerBound + Binding.Size - 1,

dmpots wrote:

Is this going to do the right thing for unbounded resource array size? I think 
that should have an upper bound of UINT_MAX.
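
A minimal sketch of the concern, not the patch's code. It assumes an unbounded array is recorded with `Size == UINT32_MAX`; that sentinel and the names `UnboundedSize` and `computeUpperBound` are hypothetical. Under that assumption, `LowerBound + Size - 1` in 32-bit arithmetic yields `UINT_MAX` only when `LowerBound == 1`; for example, it wraps to 0 for `LowerBound == 2`. One way to guard against that:

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Hypothetical sentinel: assumes an unbounded resource array is recorded
// with Size == UINT32_MAX. The real representation may differ.
constexpr uint32_t UnboundedSize = std::numeric_limits<uint32_t>::max();

// Returns the inclusive upper bound of a binding range, saturating at
// UINT32_MAX instead of wrapping around.
uint32_t computeUpperBound(uint32_t LowerBound, uint32_t Size) {
  assert(Size != 0 && "a binding must cover at least one register");
  constexpr uint32_t Max = std::numeric_limits<uint32_t>::max();
  if (Size == UnboundedSize)
    return Max;
  // Do the arithmetic in 64 bits so an overflow is visible, then clamp.
  uint64_t Upper = static_cast<uint64_t>(LowerBound) + Size - 1;
  return Upper > Max ? Max : static_cast<uint32_t>(Upper);
}
```

Whether saturating or reporting an error is the right behaviour for a bounded range that overflows is a separate design question; the sketch only shows that the unbounded case needs explicit handling.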

https://github.com/llvm/llvm-project/pull/104251


[llvm-branch-commits] [DirectX] Lower `@llvm.dx.handle.fromBinding` to DXIL ops (PR #104251)

2024-08-21 Thread David Peixotto via llvm-branch-commits


@@ -119,6 +123,119 @@ class OpLowerer {
 });
   }
 
+  Value *createTmpHandleCast(Value *V, Type *Ty) {
+Function *CastFn = Intrinsic::getDeclaration(&M, Intrinsic::dx_cast_handle,
+ {Ty, V->getType()});
+CallInst *Cast = OpBuilder.getIRB().CreateCall(CastFn, {V});
+CleanupCasts.push_back(Cast);
+return Cast;
+  }
+
+  void cleanupHandleCasts() {
+SmallVector ToRemove;
+SmallVector CastFns;
+
+for (CallInst *Cast : CleanupCasts) {
+  CastFns.push_back(Cast->getCalledFunction());
+  // All of the ops should be using `dx.types.Handle` at this point, so if
+  // we're not producing that we should be part of a pair. Track this so we
+  // can remove it at the end.
+  if (Cast->getType() != OpBuilder.getHandleType()) {
+ToRemove.push_back(Cast);
+continue;
+  }
+  // Otherwise, we're the second handle in a pair. Forward the arguments and
+  // remove the (second) cast.
+  CallInst *Def = cast(Cast->getOperand(0));
+  assert(Def->getIntrinsicID() == Intrinsic::dx_cast_handle &&
+ "Unbalanced pair of temporary handle casts");
+  Cast->replaceAllUsesWith(Def->getOperand(0));
+  Cast->eraseFromParent();
+}
+for (CallInst *Cast : ToRemove) {
+  assert(Cast->user_empty() && "Temporary handle cast still has users");
+  Cast->eraseFromParent();
+}
+llvm::sort(CastFns);
+CastFns.erase(llvm::unique(CastFns), CastFns.end());
+for (Function *F : CastFns)
+  F->eraseFromParent();
+
+CleanupCasts.clear();
+  }
+
+  void lowerToCreateHandle(Function &F) {
+IRBuilder<> &IRB = OpBuilder.getIRB();
+Type *Int8Ty = IRB.getInt8Ty();
+Type *Int32Ty = IRB.getInt32Ty();
+
+replaceFunction(F, [&](CallInst *CI) -> Error {
+  IRB.SetInsertPoint(CI);
+
+  dxil::ResourceInfo &RI = DRM[CI];
+  dxil::ResourceInfo::ResourceBinding Binding = RI.getBinding();
+
+  std::array Args{
+  ConstantInt::get(Int8Ty, llvm::to_underlying(RI.getResourceClass())),
+  ConstantInt::get(Int32Ty, Binding.RecordID), CI->getArgOperand(3),
+  CI->getArgOperand(4)};
+  Expected OpCall =
+  OpBuilder.tryCreateOp(OpCode::CreateHandle, Args);
+  if (Error E = OpCall.takeError())
+return E;
+
+  Value *Cast = createTmpHandleCast(*OpCall, CI->getType());
+
+  CI->replaceAllUsesWith(Cast);
+  CI->eraseFromParent();
+  return Error::success();
+});
+  }
+
+  void lowerToBindAndAnnotateHandle(Function &F) {
+IRBuilder<> &IRB = OpBuilder.getIRB();
+
+replaceFunction(F, [&](CallInst *CI) -> Error {
+  IRB.SetInsertPoint(CI);
+
+  dxil::ResourceInfo &RI = DRM[CI];
+  dxil::ResourceInfo::ResourceBinding Binding = RI.getBinding();
+  std::pair Props = RI.getAnnotateProps();
+
+  Constant *ResBind = OpBuilder.getResBind(
+  Binding.LowerBound, Binding.LowerBound + Binding.Size - 1,
+  Binding.Space, RI.getResourceClass());
+  std::array BindArgs{ResBind, CI->getArgOperand(3),
+  CI->getArgOperand(4)};
+  Expected OpBind =
+  OpBuilder.tryCreateOp(OpCode::CreateHandleFromBinding, BindArgs);
+  if (Error E = OpBind.takeError())
+return E;
+
+  std::array AnnotateArgs{
+  *OpBind, OpBuilder.getResProps(Props.first, Props.second)};
+  Expected OpAnnotate =
+  OpBuilder.tryCreateOp(OpCode::AnnotateHandle, AnnotateArgs);
+  if (Error E = OpAnnotate.takeError())
+return E;
+
+  Value *Cast = createTmpHandleCast(*OpAnnotate, CI->getType());
+
+  CI->replaceAllUsesWith(Cast);
+  CI->eraseFromParent();
+
+  return Error::success();
+});
+  }
+
+  void lowerHandleFromBinding(Function &F) {

dmpots wrote:

This seems to be a more complicated lowering than a straightforward translation 
(well, not this function, but its implementation details). Can we add a 
high-level description of what the lowering does?

Like the need for and usage of `dx_cast_handle` would be good to explain.

https://github.com/llvm/llvm-project/pull/104251
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [DirectX] Lower `@llvm.dx.handle.fromBinding` to DXIL ops (PR #104251)

2024-08-21 Thread David Peixotto via llvm-branch-commits


@@ -0,0 +1,61 @@
+; RUN: opt -S -dxil-op-lower %s | FileCheck %s

dmpots wrote:

I don't see tests for either

1. Unbounded resource arrays
2. Non-constant index into resource arrays

I think it would be good to have tests for these.

https://github.com/llvm/llvm-project/pull/104251
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [DirectX] Lower `@llvm.dx.handle.fromBinding` to DXIL ops (PR #104251)

2024-08-21 Thread David Peixotto via llvm-branch-commits


@@ -119,6 +123,119 @@ class OpLowerer {
 });
   }
 
+  Value *createTmpHandleCast(Value *V, Type *Ty) {
+Function *CastFn = Intrinsic::getDeclaration(&M, Intrinsic::dx_cast_handle,
+ {Ty, V->getType()});
+CallInst *Cast = OpBuilder.getIRB().CreateCall(CastFn, {V});
+CleanupCasts.push_back(Cast);
+return Cast;
+  }
+
+  void cleanupHandleCasts() {
+SmallVector ToRemove;
+SmallVector CastFns;
+
+for (CallInst *Cast : CleanupCasts) {
+  CastFns.push_back(Cast->getCalledFunction());
+  // All of the ops should be using `dx.types.Handle` at this point, so if
+  // we're not producing that we should be part of a pair. Track this so we
+  // can remove it at the end.
+  if (Cast->getType() != OpBuilder.getHandleType()) {
+ToRemove.push_back(Cast);
+continue;
+  }
+  // Otherwise, we're the second handle in a pair. Forward the arguments and
+  // remove the (second) cast.
+  CallInst *Def = cast(Cast->getOperand(0));
+  assert(Def->getIntrinsicID() == Intrinsic::dx_cast_handle &&
+ "Unbalanced pair of temporary handle casts");
+  Cast->replaceAllUsesWith(Def->getOperand(0));
+  Cast->eraseFromParent();
+}
+for (CallInst *Cast : ToRemove) {
+  assert(Cast->user_empty() && "Temporary handle cast still has users");
+  Cast->eraseFromParent();
+}
+llvm::sort(CastFns);
+CastFns.erase(llvm::unique(CastFns), CastFns.end());
+for (Function *F : CastFns)
+  F->eraseFromParent();

dmpots wrote:

The explanation is good, can we get that added as a comment?
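
For reference, the quoted hunk collects one `CastFns` entry per cast call, so the same declaration can appear several times; sorting and then applying unique/erase leaves each declaration once before `eraseFromParent` runs. A standalone sketch of that idiom using the standard library rather than the `llvm::` wrappers (the helper name `sortAndDedupe` is hypothetical):

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Sorts Items and removes duplicates in place, so that a later loop
// visits each distinct element exactly once.
template <typename T>
void sortAndDedupe(std::vector<T> &Items) {
  // std::unique only removes *adjacent* duplicates, so sort first.
  std::sort(Items.begin(), Items.end());
  Items.erase(std::unique(Items.begin(), Items.end()), Items.end());
  // Postcondition: no two neighbouring elements are equal.
  assert(std::adjacent_find(Items.begin(), Items.end()) == Items.end());
}
```

Erasing the same function twice would be a use-after-free, which is presumably why the deduplication happens before the final loop.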

https://github.com/llvm/llvm-project/pull/104251


[llvm-branch-commits] [llvm] [ctx_prof] Add support for ICP (PR #105469)

2024-08-21 Thread Mircea Trofin via llvm-branch-commits

https://github.com/mtrofin updated 
https://github.com/llvm/llvm-project/pull/105469

>From d58d308957961ae7442a7b5aa0561f42dea69caf Mon Sep 17 00:00:00 2001
From: Mircea Trofin 
Date: Tue, 20 Aug 2024 21:32:23 -0700
Subject: [PATCH] [ctx_prof] Add support for ICP

---
 llvm/include/llvm/Analysis/CtxProfAnalysis.h  |  18 +-
 llvm/include/llvm/IR/IntrinsicInst.h  |   2 +
 .../llvm/ProfileData/PGOCtxProfReader.h   |  20 ++
 .../Transforms/Utils/CallPromotionUtils.h |   4 +
 llvm/lib/Analysis/CtxProfAnalysis.cpp |  79 +---
 llvm/lib/IR/IntrinsicInst.cpp |  10 +
 .../Transforms/Utils/CallPromotionUtils.cpp   |  86 +
 .../Utils/CallPromotionUtilsTest.cpp  | 178 ++
 8 files changed, 364 insertions(+), 33 deletions(-)

diff --git a/llvm/include/llvm/Analysis/CtxProfAnalysis.h 
b/llvm/include/llvm/Analysis/CtxProfAnalysis.h
index 0b4dd8ae3a0dc7..d6c2bb26a091af 100644
--- a/llvm/include/llvm/Analysis/CtxProfAnalysis.h
+++ b/llvm/include/llvm/Analysis/CtxProfAnalysis.h
@@ -73,6 +73,12 @@ class PGOContextualProfile {
 return 
FuncInfo.find(getDefinedFunctionGUID(F))->second.NextCallsiteIndex++;
   }
 
+  using ConstVisitor = function_ref;
+  using Visitor = function_ref;
+
+  void update(Visitor, const Function *F = nullptr);
+  void visit(ConstVisitor, const Function *F = nullptr) const;
+
   const CtxProfFlatProfile flatten() const;
 
   bool invalidate(Module &, const PreservedAnalyses &PA,
@@ -105,13 +111,18 @@ class CtxProfAnalysis : public 
AnalysisInfoMixin {
 
 class CtxProfAnalysisPrinterPass
 : public PassInfoMixin {
-  raw_ostream &OS;
-
 public:
-  explicit CtxProfAnalysisPrinterPass(raw_ostream &OS) : OS(OS) {}
+  enum class PrintMode { Everything, JSON };
+  explicit CtxProfAnalysisPrinterPass(raw_ostream &OS,
+  PrintMode Mode = PrintMode::Everything)
+  : OS(OS), Mode(Mode) {}
 
   PreservedAnalyses run(Module &M, ModuleAnalysisManager &MAM);
   static bool isRequired() { return true; }
+
+private:
+  raw_ostream &OS;
+  const PrintMode Mode;
 };
 
 /// Assign a GUID to functions as metadata. GUID calculation takes linkage into
@@ -134,6 +145,5 @@ class AssignGUIDPass : public PassInfoMixin 
{
   // This should become GlobalValue::getGUID
   static uint64_t getGUID(const Function &F);
 };
-
 } // namespace llvm
 #endif // LLVM_ANALYSIS_CTXPROFANALYSIS_H
diff --git a/llvm/include/llvm/IR/IntrinsicInst.h 
b/llvm/include/llvm/IR/IntrinsicInst.h
index 2f1e2c08c3ecec..bab41efab528e2 100644
--- a/llvm/include/llvm/IR/IntrinsicInst.h
+++ b/llvm/include/llvm/IR/IntrinsicInst.h
@@ -1519,6 +1519,7 @@ class InstrProfCntrInstBase : public InstrProfInstBase {
   ConstantInt *getNumCounters() const;
   // The index of the counter that this instruction acts on.
   ConstantInt *getIndex() const;
+  void setIndex(uint32_t Idx);
 };
 
 /// This represents the llvm.instrprof.cover intrinsic.
@@ -1569,6 +1570,7 @@ class InstrProfCallsite : public InstrProfCntrInstBase {
 return isa(V) && classof(cast(V));
   }
   Value *getCallee() const;
+  void setCallee(Value *);
 };
 
 /// This represents the llvm.instrprof.timestamp intrinsic.
diff --git a/llvm/include/llvm/ProfileData/PGOCtxProfReader.h 
b/llvm/include/llvm/ProfileData/PGOCtxProfReader.h
index 190deaeeacd085..23dcc376508b39 100644
--- a/llvm/include/llvm/ProfileData/PGOCtxProfReader.h
+++ b/llvm/include/llvm/ProfileData/PGOCtxProfReader.h
@@ -57,9 +57,23 @@ class PGOCtxProfContext final {
 
   GlobalValue::GUID guid() const { return GUID; }
   const SmallVectorImpl &counters() const { return Counters; }
+  SmallVectorImpl &counters() { return Counters; }
+
+  uint64_t getEntrycount() const { return Counters[0]; }
+
   const CallsiteMapTy &callsites() const { return Callsites; }
   CallsiteMapTy &callsites() { return Callsites; }
 
+  void ingestContext(uint32_t CSId, PGOCtxProfContext &&Other) {
+auto [Iter, _] = callsites().try_emplace(CSId, CallTargetMapTy());
+Iter->second.emplace(Other.guid(), std::move(Other));
+  }
+
+  void growCounters(uint32_t Size) {
+if (Size >= Counters.size())
+  Counters.resize(Size);
+  }
+
   bool hasCallsite(uint32_t I) const {
 return Callsites.find(I) != Callsites.end();
   }
@@ -68,6 +82,12 @@ class PGOCtxProfContext final {
 assert(hasCallsite(I) && "Callsite not found");
 return Callsites.find(I)->second;
   }
+
+  CallTargetMapTy &callsite(uint32_t I) {
+assert(hasCallsite(I) && "Callsite not found");
+return Callsites.find(I)->second;
+  }
+
   void getContainedGuids(DenseSet &Guids) const;
 };
 
diff --git a/llvm/include/llvm/Transforms/Utils/CallPromotionUtils.h 
b/llvm/include/llvm/Transforms/Utils/CallPromotionUtils.h
index 385831f457038d..58af26f31417b0 100644
--- a/llvm/include/llvm/Transforms/Utils/CallPromotionUtils.h
+++ b/llvm/include/llvm/Transforms/Utils/CallPromotionUtils.h
@@ -14,6 +14,7 @@
 #ifndef LLVM_TRANSFORMS_UTILS_CALLPROMOTIONUTILS_H
 #defin

[llvm-branch-commits] [clang] [misexpect] Support missing-annotations diagnostics from frontend profile data (PR #96524)

2024-08-21 Thread Paul Kirth via llvm-branch-commits

https://github.com/ilovepi updated 
https://github.com/llvm/llvm-project/pull/96524

>From 49aabf7bbc1cf30274c034b1cf2babc1fd851b31 Mon Sep 17 00:00:00 2001
From: Paul Kirth 
Date: Thu, 22 Aug 2024 00:19:28 +
Subject: [PATCH] Use split-file in test, and add test for switch statements

Created using spr 1.3.4
---
 .../missing-annotations-branch.proftext   | 17 -
 .../test/Profile/missing-annotations-branch.c | 62 ++
 .../test/Profile/missing-annotations-switch.c | 64 +++
 clang/test/Profile/missing-annotations.c  | 44 -
 4 files changed, 126 insertions(+), 61 deletions(-)
 delete mode 100644 
clang/test/Profile/Inputs/missing-annotations-branch.proftext
 create mode 100644 clang/test/Profile/missing-annotations-branch.c
 create mode 100644 clang/test/Profile/missing-annotations-switch.c
 delete mode 100644 clang/test/Profile/missing-annotations.c

diff --git a/clang/test/Profile/Inputs/missing-annotations-branch.proftext 
b/clang/test/Profile/Inputs/missing-annotations-branch.proftext
deleted file mode 100644
index 81c857b9a84fb3..00
--- a/clang/test/Profile/Inputs/missing-annotations-branch.proftext
+++ /dev/null
@@ -1,17 +0,0 @@
-bar
-# Func Hash:
-11262309464
-# Num Counters:
-2
-# Counter Values:
-2000
-0
-
-fizz
-# Func Hash:
-11262309464
-# Num Counters:
-2
-# Counter Values:
-0
-100
diff --git a/clang/test/Profile/missing-annotations-branch.c 
b/clang/test/Profile/missing-annotations-branch.c
new file mode 100644
index 00..fa764d9238c8a7
--- /dev/null
+++ b/clang/test/Profile/missing-annotations-branch.c
@@ -0,0 +1,62 @@
+// RUN: rm -rf %t && mkdir -p %t
+// RUN: split-file %s %t
+
+/// Test that missing-annotations detects branches that are hot, but not 
annotated.
+// RUN: llvm-profdata merge %t/a.proftext -o %t/profdata
+// RUN: %clang_cc1 %t/a.c -O2 -o - -emit-llvm 
-fprofile-instrument-use-path=%t/profdata -verify -mllvm 
-pgo-missing-annotations -Rpass=missing-annotations  
-fdiagnostics-misexpect-tolerance=10
+
+/// Test that we don't report any diagnostics, if the threshold isn't exceeded.
+// RUN: %clang_cc1 %t/a.c -O2 -o - -emit-llvm 
-fprofile-instrument-use-path=%t/profdata -mllvm -pgo-missing-annotations 
-Rpass=missing-annotations  2>&1 | FileCheck -implicit-check-not=remark %s
+
+//--- a.c
+// foo-no-diagnostics
+#define UNLIKELY(x) __builtin_expect(!!(x), 0)
+
+int foo(int);
+int baz(int);
+int buzz(void);
+
+const int inner_loop = 100;
+const int outer_loop = 2000;
+
+int bar(void) { //  imprecise-remark-re {{Extremely hot condition. Consider 
adding llvm.expect intrinsic{{.*
+  int a = buzz();
+  int x = 0;
+  if (a % (outer_loop * inner_loop) == 0) { // expected-remark {{Extremely hot 
condition. Consider adding llvm.expect intrinsic}}
+x = baz(a);
+  } else {
+x = foo(50);
+  }
+  return x;
+}
+
+int fizz(void) {
+  int a = buzz();
+  int x = 0;
+  if ((a % (outer_loop * inner_loop) == 0)) { // expected-remark-re 
{{Extremely hot condition. Consider adding llvm.expect intrinsic{{.*}
+x = baz(a);
+  } else {
+x = foo(50);
+  }
+  return x;
+}
+
+//--- a.proftext
+bar
+# Func Hash:
+11262309464
+# Num Counters:
+2
+# Counter Values:
+1901
+99
+
+fizz
+# Func Hash:
+11262309464
+# Num Counters:
+2
+# Counter Values:
+1901
+99
+
diff --git a/clang/test/Profile/missing-annotations-switch.c 
b/clang/test/Profile/missing-annotations-switch.c
new file mode 100644
index 00..2d7ea0865ac8a1
--- /dev/null
+++ b/clang/test/Profile/missing-annotations-switch.c
@@ -0,0 +1,64 @@
+// RUN: rm -rf %t && mkdir -p %t
+// RUN: split-file %s %t
+
+/// Test that missing-annotations detects switch conditions that are hot, but 
not annotated.
+// RUN: llvm-profdata merge %t/a.proftext -o %t/profdata
+// RUN: %clang_cc1 %t/a.c -O2 -o - -emit-llvm 
-fprofile-instrument-use-path=%t/profdata -verify -mllvm 
-pgo-missing-annotations -Rpass=missing-annotations 
-fdiagnostics-misexpect-tolerance=10
+
+/// Test that we don't report any diagnostics, if the threshold isn't exceeded.
+// RUN: %clang_cc1 %t/a.c -O2 -o - -emit-llvm 
-fprofile-instrument-use-path=%t/profdata -mllvm -pgo-missing-annotations 
-Rpass=missing-annotations  2>&1 | FileCheck -implicit-check-not=remark %s
+
+//--- a.c
+#define inner_loop 1000
+#define outer_loop 20
+#define arry_size 25
+
+int arry[arry_size] = {0};
+
+int rand(void);
+int sum(int *buff, int size);
+int random_sample(int *buff, int size);
+
+int main(void) {
+  int val = 0;
+
+  int j, k;
+  for (j = 0; j < outer_loop; ++j) {
+for (k = 0; k < inner_loop; ++k) {
+  unsigned condition = rand() % 1;
+  switch (condition) { // expected-remark {{Extremely hot condition. 
Consider adding llvm.expect intrinsic}}
+
+  case 0:
+val += sum(arry, arry_size);
+break;
+  case 1:
+  case 2:
+  case 3:
+break;
+  default:
+val += random_sample(arry, arry_size);
+break;
+  } // end sw

[llvm-branch-commits] [clang] [misexpect] Support missing-annotations diagnostics from frontend profile data (PR #96524)

2024-08-21 Thread Paul Kirth via llvm-branch-commits

https://github.com/ilovepi edited 
https://github.com/llvm/llvm-project/pull/96524


[llvm-branch-commits] [clang][misexpect] Add support to clang for profitable annotation diagnostics (PR #96525)

2024-08-21 Thread Paul Kirth via llvm-branch-commits

https://github.com/ilovepi updated 
https://github.com/llvm/llvm-project/pull/96525




[llvm-branch-commits] Reland "[asan] Remove debug tracing from `report_globals` (#104404)" (PR #105601)

2024-08-21 Thread Vitaly Buka via llvm-branch-commits

https://github.com/vitalybuka created 
https://github.com/llvm/llvm-project/pull/105601

This reverts commit 2704b804bec50c2b016bf678bd534c330ec655b6
and relands #104404.

Darwin should not fail after #105599.





[llvm-branch-commits] Reland "[asan] Remove debug tracing from `report_globals` (#104404)" (PR #105601)

2024-08-21 Thread via llvm-branch-commits

llvmbot wrote:




@llvm/pr-subscribers-compiler-rt-sanitizer

Author: Vitaly Buka (vitalybuka)


Changes

This reverts commit 2704b804bec50c2b016bf678bd534c330ec655b6
and relands #104404.

Darwin should not fail after #105599.


---
Full diff: https://github.com/llvm/llvm-project/pull/105601.diff


10 Files Affected:

- (modified) compiler-rt/lib/asan/asan_flags.inc (+2-5) 
- (modified) compiler-rt/lib/asan/asan_globals.cpp (+8-11) 
- (modified) compiler-rt/test/asan/TestCases/Linux/initialization-nobug-lld.cpp 
(+1-1) 
- (modified) compiler-rt/test/asan/TestCases/Linux/odr_indicator_unregister.cpp 
(+1-1) 
- (modified) compiler-rt/test/asan/TestCases/Linux/odr_indicators.cpp (+2-2) 
- (modified) compiler-rt/test/asan/TestCases/Windows/dll_global_dead_strip.c 
(+2-2) 
- (modified) 
compiler-rt/test/asan/TestCases/Windows/dll_report_globals_symbolization_at_startup.cpp
 (+1-1) 
- (modified) compiler-rt/test/asan/TestCases/Windows/global_dead_strip.c (+2-2) 
- (modified) 
compiler-rt/test/asan/TestCases/Windows/report_globals_vs_freelibrary.cpp 
(+1-1) 
- (modified) compiler-rt/test/asan/TestCases/initialization-nobug.cpp (+4-4) 


``diff
diff --git a/compiler-rt/lib/asan/asan_flags.inc 
b/compiler-rt/lib/asan/asan_flags.inc
index fad1577d912a5e..5e0ced9706e664 100644
--- a/compiler-rt/lib/asan/asan_flags.inc
+++ b/compiler-rt/lib/asan/asan_flags.inc
@@ -36,11 +36,8 @@ ASAN_FLAG(int, max_redzone, 2048,
 ASAN_FLAG(
 bool, debug, false,
 "If set, prints some debugging information and does additional checks.")
-ASAN_FLAG(
-int, report_globals, 1,
-"Controls the way to handle globals (0 - don't detect buffer overflow on "
-"globals, 1 - detect buffer overflow, 2 - print data about registered "
-"globals).")
+ASAN_FLAG(bool, report_globals, true,
+  "If set, detect and report errors on globals .")
 ASAN_FLAG(bool, check_initialization_order, false,
   "If set, attempts to catch initialization order issues.")
 ASAN_FLAG(
diff --git a/compiler-rt/lib/asan/asan_globals.cpp 
b/compiler-rt/lib/asan/asan_globals.cpp
index c83b782cb85f89..a1211430b1268a 100644
--- a/compiler-rt/lib/asan/asan_globals.cpp
+++ b/compiler-rt/lib/asan/asan_globals.cpp
@@ -22,6 +22,7 @@
 #include "asan_thread.h"
 #include "sanitizer_common/sanitizer_common.h"
 #include "sanitizer_common/sanitizer_dense_map.h"
+#include "sanitizer_common/sanitizer_internal_defs.h"
 #include "sanitizer_common/sanitizer_list.h"
 #include "sanitizer_common/sanitizer_mutex.h"
 #include "sanitizer_common/sanitizer_placement_new.h"
@@ -179,7 +180,7 @@ int GetGlobalsForAddress(uptr addr, Global *globals, u32 
*reg_sites,
   int res = 0;
   for (const auto &l : list_of_all_globals) {
 const Global &g = *l.g;
-if (flags()->report_globals >= 2)
+if (UNLIKELY(common_flags()->verbosity >= 3))
   ReportGlobal(g, "Search");
 if (IsAddressNearGlobal(addr, g)) {
   internal_memcpy(&globals[res], &g, sizeof(g));
@@ -270,7 +271,7 @@ static inline bool UseODRIndicator(const Global *g) {
 // so we store the globals in a map.
 static void RegisterGlobal(const Global *g) SANITIZER_REQUIRES(mu_for_globals) 
{
   CHECK(AsanInited());
-  if (flags()->report_globals >= 2)
+  if (UNLIKELY(common_flags()->verbosity >= 3))
 ReportGlobal(*g, "Added");
   CHECK(flags()->report_globals);
   CHECK(AddrIsInMem(g->beg));
@@ -307,7 +308,7 @@ static void RegisterGlobal(const Global *g) 
SANITIZER_REQUIRES(mu_for_globals) {
 static void UnregisterGlobal(const Global *g)
 SANITIZER_REQUIRES(mu_for_globals) {
   CHECK(AsanInited());
-  if (flags()->report_globals >= 2)
+  if (UNLIKELY(common_flags()->verbosity >= 3))
 ReportGlobal(*g, "Removed");
   CHECK(flags()->report_globals);
   CHECK(AddrIsInMem(g->beg));
@@ -438,7 +439,7 @@ void __asan_register_globals(__asan_global *globals, uptr 
n) {
   }
   GlobalRegistrationSite site = {stack_id, &globals[0], &globals[n - 1]};
   global_registration_site_vector->push_back(site);
-  if (flags()->report_globals >= 2) {
+  if (UNLIKELY(common_flags()->verbosity >= 3)) {
 PRINT_CURRENT_STACK();
 Printf("=== ID %d; %p %p\n", stack_id, (void *)&globals[0],
(void *)&globals[n - 1]);
@@ -497,9 +498,7 @@ void __asan_before_dynamic_init(const char *module_name) {
   Lock lock(&mu_for_globals);
   if (current_dynamic_init_module_name == module_name)
 return;
-  if (flags()->report_globals >= 3)
-Printf("DynInitPoison module: %s\n", module_name);
-
+  VPrintf(2, "DynInitPoison module: %s\n", module_name);
   if (current_dynamic_init_module_name == nullptr) {
 // First call, poison all globals from other modules.
 DynInitGlobals().forEach([&](auto &kv) {
@@ -545,8 +544,7 @@ static void UnpoisonBeforeMain(void) {
   return;
 allow_after_dynamic_init = true;
   }
-  if (flags()->report_globals >= 3)
-Printf("UnpoisonBeforeMain\n");
+  VPrintf(2, "UnpoisonBeforeMain\n");
   __asan_after_dynamic_init();
 }
 
@@ -570,8 +568,7 @@ void __

[llvm-branch-commits] [DirectX] Lower `@llvm.dx.handle.fromBinding` to DXIL ops (PR #104251)

2024-08-21 Thread Justin Bogner via llvm-branch-commits

https://github.com/bogner updated 
https://github.com/llvm/llvm-project/pull/104251


___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [DirectX] Implement metadata lowering for resources (PR #104447)

2024-08-21 Thread Justin Bogner via llvm-branch-commits

https://github.com/bogner updated 
https://github.com/llvm/llvm-project/pull/104447


___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [DirectX] Add resource handling to the DXIL pretty printer (PR #104448)

2024-08-21 Thread Justin Bogner via llvm-branch-commits

https://github.com/bogner updated 
https://github.com/llvm/llvm-project/pull/104448


___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [DirectX] Implement metadata lowering for resources (PR #104447)

2024-08-21 Thread Justin Bogner via llvm-branch-commits


@@ -13,27 +13,52 @@
 #include "DXILShaderFlags.h"
 #include "DirectX.h"
 #include "llvm/ADT/StringSet.h"
+#include "llvm/Analysis/DXILResource.h"
 #include "llvm/IR/Constants.h"
 #include "llvm/IR/Metadata.h"
 #include "llvm/IR/Module.h"
+#include "llvm/InitializePasses.h"
 #include "llvm/Pass.h"
 #include "llvm/TargetParser/Triple.h"
 
 using namespace llvm;
 using namespace llvm::dxil;
 
-static void emitResourceMetadata(Module &M,
+static void emitResourceMetadata(Module &M, const DXILResourceMap &DRM,
  const dxil::Resources &MDResources) {
-  Metadata *SRVMD = nullptr, *UAVMD = nullptr, *CBufMD = nullptr,
-   *SmpMD = nullptr;
-  bool HasResources = false;
+  LLVMContext &Context = M.getContext();
+
+  SmallVector SRVs, UAVs, CBufs, Smps;
+  for (auto [_, RI] : DRM) {
+switch (RI.getResourceClass()) {

bogner wrote:

I've updated the API in #105602 - you can now iterate over the unique 
resources, and there are helpers to just iterate `.srvs()`, `.uavs()`, etc.

https://github.com/llvm/llvm-project/pull/104447
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [LLVM][Coroutines] Create `.noalloc` variant of switch ABI coroutine ramp functions during CoroSplit (PR #99283)

2024-08-21 Thread Adrian Vogelsgesang via llvm-branch-commits


@@ -2049,6 +2055,22 @@ the coroutine must reach the final suspend point when it 
get destroyed.
 
 This attribute only works for switched-resume coroutines now.
 
+coro_must_elide
+---
+
+When a Call or Invoke instruction is marked with `coro_must_elide`,
+CoroAnnotationElidePass performs heap elision when possible. Note that for

vogelsgesang wrote:

I think the name `coro_must_elide` is a misnomer. "must elide" sounds as if it 
would be a compilation error if elision fails. However, this is not the case.

https://github.com/llvm/llvm-project/pull/99283
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [LLVM][Coroutines] Create `.noalloc` variant of switch ABI coroutine ramp functions during CoroSplit (PR #99283)

2024-08-21 Thread Yuxuan Chen via llvm-branch-commits


@@ -2049,6 +2055,22 @@ the coroutine must reach the final suspend point when it 
get destroyed.
 
 This attribute only works for switched-resume coroutines now.
 
+coro_must_elide
+---
+
+When a Call or Invoke instruction is marked with `coro_must_elide`,
+CoroAnnotationElidePass performs heap elision when possible. Note that for

yuxuanchen1997 wrote:

What about `coro_elide_safe`?

https://github.com/llvm/llvm-project/pull/99283
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [LLVM][Coroutines] Create `.noalloc` variant of switch ABI coroutine ramp functions during CoroSplit (PR #99283)

2024-08-21 Thread Adrian Vogelsgesang via llvm-branch-commits


@@ -2049,6 +2055,22 @@ the coroutine must reach the final suspend point when it 
get destroyed.
 
 This attribute only works for switched-resume coroutines now.
 
+coro_must_elide
+---
+
+When a Call or Invoke instruction is marked with `coro_must_elide`,
+CoroAnnotationElidePass performs heap elision when possible. Note that for

vogelsgesang wrote:

love it!

https://github.com/llvm/llvm-project/pull/99283
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [DirectX] Add resource handling to the DXIL pretty printer (PR #104448)

2024-08-21 Thread Farzon Lotfi via llvm-branch-commits


@@ -10,23 +10,235 @@
 #include "DXILResourceAnalysis.h"
 #include "DirectX.h"
 #include "llvm/ADT/StringRef.h"
+#include "llvm/Analysis/DXILResource.h"
 #include "llvm/IR/PassManager.h"
+#include "llvm/InitializePasses.h"
 #include "llvm/Pass.h"
+#include "llvm/Support/FormatAdapters.h"
 #include "llvm/Support/FormatVariadic.h"
 #include "llvm/Support/raw_ostream.h"
 
 using namespace llvm;
 
-static void prettyPrintResources(raw_ostream &OS,
+static constexpr StringRef getRCName(dxil::ResourceClass RC) {

farzonl wrote:

Feel free to ignore. I was thinking of a different way to do this that would 
couple the names and prefixes more tightly:

```cpp
struct ResourceClassInfo {
  const StringRef name;
  const StringRef prefix;
};

llvm::DenseMap<dxil::ResourceClass, ResourceClassInfo>
createResourceClassMap() {
  return {
      {dxil::ResourceClass::SRV, {"SRV", "t"}},
      {dxil::ResourceClass::UAV, {"UAV", "u"}},
      {dxil::ResourceClass::CBuffer, {"cbuffer", "cb"}},
      {dxil::ResourceClass::Sampler, {"sampler", "s"}}
  };
}

static const llvm::DenseMap<dxil::ResourceClass, ResourceClassInfo>
    ResourceClassMap = createResourceClassMap();

StringRef getRCName(dxil::ResourceClass RC) {
  return ResourceClassMap.lookup(RC).name;
}

StringRef getRCPrefix(dxil::ResourceClass RC) {
  return ResourceClassMap.lookup(RC).prefix;
}
```
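
For comparison, the same name/prefix coupling can be sketched without a map at all. This is only an illustration under stated assumptions: `ResourceClass` here is a hypothetical stand-in for `dxil::ResourceClass`, its enumerator order is assumed so that the value can index the table, and `getRCInfo` is a made-up helper. None of this is taken from the PR.

```cpp
#include <array>
#include <cstddef>
#include <string_view>

// Hypothetical stand-in for dxil::ResourceClass. The enumerator order is an
// assumption chosen so the enum value can index the table below directly.
enum class ResourceClass { SRV = 0, UAV, CBuffer, Sampler };

struct ResourceClassInfo {
  std::string_view Name;
  std::string_view Prefix;
};

// One row per resource class, so a name cannot drift away from its prefix.
constexpr std::array<ResourceClassInfo, 4> ResourceClassTable = {{
    {"SRV", "t"},
    {"UAV", "u"},
    {"cbuffer", "cb"},
    {"sampler", "s"},
}};

// Hypothetical helper: look up the row for a resource class.
constexpr const ResourceClassInfo &getRCInfo(ResourceClass RC) {
  return ResourceClassTable[static_cast<std::size_t>(RC)];
}

constexpr std::string_view getRCName(ResourceClass RC) {
  return getRCInfo(RC).Name;
}

constexpr std::string_view getRCPrefix(ResourceClass RC) {
  return getRCInfo(RC).Prefix;
}

// The lookups are usable in constant expressions (C++17).
static_assert(getRCPrefix(ResourceClass::CBuffer) == "cb");
```

A table like this keeps the lookups `constexpr`, which the map-based version above does not, at the cost of depending on the enumerator values.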

https://github.com/llvm/llvm-project/pull/104448
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [DirectX] Add resource handling to the DXIL pretty printer (PR #104448)

2024-08-21 Thread Farzon Lotfi via llvm-branch-commits

https://github.com/farzonl edited 
https://github.com/llvm/llvm-project/pull/104448
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [DirectX] Add resource handling to the DXIL pretty printer (PR #104448)

2024-08-21 Thread Farzon Lotfi via llvm-branch-commits

https://github.com/farzonl approved this pull request.

LGTM

https://github.com/llvm/llvm-project/pull/104448
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [flang] [flang][omp] Emit omp.workshare in frontend (PR #101444)

2024-08-21 Thread Ivan R. Ivanov via llvm-branch-commits

https://github.com/ivanradanov updated 
https://github.com/llvm/llvm-project/pull/101444

>From e5789180a3dd1fd8c46a5d7dfc446921110642ca Mon Sep 17 00:00:00 2001
From: Ivan Radanov Ivanov 
Date: Wed, 31 Jul 2024 14:11:47 +0900
Subject: [PATCH 1/2] [flang][omp] Emit omp.workshare in frontend

---
 flang/lib/Lower/OpenMP/OpenMP.cpp | 30 ++
 1 file changed, 26 insertions(+), 4 deletions(-)

diff --git a/flang/lib/Lower/OpenMP/OpenMP.cpp 
b/flang/lib/Lower/OpenMP/OpenMP.cpp
index d614db8b68ef65..83c90374afa5e3 100644
--- a/flang/lib/Lower/OpenMP/OpenMP.cpp
+++ b/flang/lib/Lower/OpenMP/OpenMP.cpp
@@ -1272,6 +1272,15 @@ static void genTaskwaitClauses(lower::AbstractConverter 
&converter,
   loc, llvm::omp::Directive::OMPD_taskwait);
 }
 
+static void genWorkshareClauses(lower::AbstractConverter &converter,
+semantics::SemanticsContext &semaCtx,
+lower::StatementContext &stmtCtx,
+const List &clauses, mlir::Location 
loc,
+mlir::omp::WorkshareOperands &clauseOps) {
+  ClauseProcessor cp(converter, semaCtx, clauses);
+  cp.processNowait(clauseOps);
+}
+
 static void genTeamsClauses(lower::AbstractConverter &converter,
 semantics::SemanticsContext &semaCtx,
 lower::StatementContext &stmtCtx,
@@ -1897,6 +1906,22 @@ genTaskyieldOp(lower::AbstractConverter &converter, 
lower::SymMap &symTable,
   return converter.getFirOpBuilder().create(loc);
 }
 
+static mlir::omp::WorkshareOp
+genWorkshareOp(lower::AbstractConverter &converter, lower::SymMap &symTable,
+   semantics::SemanticsContext &semaCtx, lower::pft::Evaluation &eval,
+   mlir::Location loc, const ConstructQueue &queue,
+   ConstructQueue::iterator item) {
+  lower::StatementContext stmtCtx;
+  mlir::omp::WorkshareOperands clauseOps;
+  genWorkshareClauses(converter, semaCtx, stmtCtx, item->clauses, loc, 
clauseOps);
+
+  return genOpWithBody(
+  OpWithBodyGenInfo(converter, symTable, semaCtx, loc, eval,
+llvm::omp::Directive::OMPD_workshare)
+  .setClauses(&item->clauses),
+  queue, item, clauseOps);
+}
+
 static mlir::omp::TeamsOp
 genTeamsOp(lower::AbstractConverter &converter, lower::SymMap &symTable,
semantics::SemanticsContext &semaCtx, lower::pft::Evaluation &eval,
@@ -2309,10 +2334,7 @@ static void genOMPDispatch(lower::AbstractConverter 
&converter,
   llvm::omp::getOpenMPDirectiveName(dir) + ")");
   // case llvm::omp::Directive::OMPD_workdistribute:
   case llvm::omp::Directive::OMPD_workshare:
-// FIXME: Workshare is not a commonly used OpenMP construct, an
-// implementation for this feature will come later. For the codes
-// that use this construct, add a single construct for now.
-genSingleOp(converter, symTable, semaCtx, eval, loc, queue, item);
+genWorkshareOp(converter, symTable, semaCtx, eval, loc, queue, item);
 break;
   default:
 // Combined and composite constructs should have been split into a sequence

>From 70daa016c0c39861926b1b82e31b96db005cfba1 Mon Sep 17 00:00:00 2001
From: Ivan Radanov Ivanov 
Date: Sun, 4 Aug 2024 16:02:37 +0900
Subject: [PATCH 2/2] Fix lower test for workshare

---
 flang/test/Lower/OpenMP/workshare.f90 | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/flang/test/Lower/OpenMP/workshare.f90 
b/flang/test/Lower/OpenMP/workshare.f90
index 1e11677a15e1f0..8e771952f5b6da 100644
--- a/flang/test/Lower/OpenMP/workshare.f90
+++ b/flang/test/Lower/OpenMP/workshare.f90
@@ -6,7 +6,7 @@ subroutine sb1(arr)
   integer :: arr(:)
 !CHECK: omp.parallel  {
   !$omp parallel
-!CHECK: omp.single  {
+!CHECK: omp.workshare {
   !$omp workshare
 arr = 0
   !$omp end workshare
@@ -20,7 +20,7 @@ subroutine sb2(arr)
   integer :: arr(:)
 !CHECK: omp.parallel  {
   !$omp parallel
-!CHECK: omp.single nowait {
+!CHECK: omp.workshare nowait {
   !$omp workshare
 arr = 0
   !$omp end workshare nowait
@@ -33,7 +33,7 @@ subroutine sb2(arr)
 subroutine sb3(arr)
   integer :: arr(:)
 !CHECK: omp.parallel  {
-!CHECK: omp.single  {
+!CHECK: omp.workshare  {
   !$omp parallel workshare
 arr = 0
   !$omp end parallel workshare

___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [flang] [flang] Introduce custom loop nest generation for loops in workshare construct (PR #101445)

2024-08-21 Thread Ivan R. Ivanov via llvm-branch-commits

https://github.com/ivanradanov updated 
https://github.com/llvm/llvm-project/pull/101445

>From 81606df746e9862c330681ed8ae9113a43e577a2 Mon Sep 17 00:00:00 2001
From: Ivan Radanov Ivanov 
Date: Wed, 31 Jul 2024 14:12:34 +0900
Subject: [PATCH 1/4] [flang] Introduce ws loop nest generation for HLFIR
 lowering

---
 .../flang/Optimizer/Builder/HLFIRTools.h  | 12 +++--
 flang/lib/Lower/ConvertCall.cpp   |  2 +-
 flang/lib/Lower/OpenMP/ReductionProcessor.cpp |  4 +-
 flang/lib/Optimizer/Builder/HLFIRTools.cpp| 52 ++-
 .../HLFIR/Transforms/BufferizeHLFIR.cpp   |  3 +-
 .../LowerHLFIROrderedAssignments.cpp  | 30 +--
 .../Transforms/OptimizedBufferization.cpp |  6 +--
 7 files changed, 69 insertions(+), 40 deletions(-)

diff --git a/flang/include/flang/Optimizer/Builder/HLFIRTools.h 
b/flang/include/flang/Optimizer/Builder/HLFIRTools.h
index 6b41025eea0780..14e42c6f358e46 100644
--- a/flang/include/flang/Optimizer/Builder/HLFIRTools.h
+++ b/flang/include/flang/Optimizer/Builder/HLFIRTools.h
@@ -357,8 +357,8 @@ hlfir::ElementalOp genElementalOp(
 
 /// Structure to describe a loop nest.
 struct LoopNest {
-  fir::DoLoopOp outerLoop;
-  fir::DoLoopOp innerLoop;
+  mlir::Operation *outerOp;
+  mlir::Block *body;
   llvm::SmallVector oneBasedIndices;
 };
 
@@ -366,11 +366,13 @@ struct LoopNest {
 /// \p isUnordered specifies whether the loops in the loop nest
 /// are unordered.
 LoopNest genLoopNest(mlir::Location loc, fir::FirOpBuilder &builder,
- mlir::ValueRange extents, bool isUnordered = false);
+ mlir::ValueRange extents, bool isUnordered = false,
+ bool emitWsLoop = false);
 inline LoopNest genLoopNest(mlir::Location loc, fir::FirOpBuilder &builder,
-mlir::Value shape, bool isUnordered = false) {
+mlir::Value shape, bool isUnordered = false,
+bool emitWsLoop = false) {
   return genLoopNest(loc, builder, getIndexExtents(loc, builder, shape),
- isUnordered);
+ isUnordered, emitWsLoop);
 }
 
 /// Inline the body of an hlfir.elemental at the current insertion point
diff --git a/flang/lib/Lower/ConvertCall.cpp b/flang/lib/Lower/ConvertCall.cpp
index fd873f55dd844e..0689d6e033dd9c 100644
--- a/flang/lib/Lower/ConvertCall.cpp
+++ b/flang/lib/Lower/ConvertCall.cpp
@@ -2128,7 +2128,7 @@ class ElementalCallBuilder {
   hlfir::genLoopNest(loc, builder, shape, !mustBeOrdered);
   mlir::ValueRange oneBasedIndices = loopNest.oneBasedIndices;
   auto insPt = builder.saveInsertionPoint();
-  builder.setInsertionPointToStart(loopNest.innerLoop.getBody());
+  builder.setInsertionPointToStart(loopNest.body);
   callContext.stmtCtx.pushScope();
   for (auto &preparedActual : loweredActuals)
 if (preparedActual)
diff --git a/flang/lib/Lower/OpenMP/ReductionProcessor.cpp 
b/flang/lib/Lower/OpenMP/ReductionProcessor.cpp
index c3c1f363033c27..72a90dd0d6f29d 100644
--- a/flang/lib/Lower/OpenMP/ReductionProcessor.cpp
+++ b/flang/lib/Lower/OpenMP/ReductionProcessor.cpp
@@ -375,7 +375,7 @@ static void genBoxCombiner(fir::FirOpBuilder &builder, 
mlir::Location loc,
   // know this won't miss any opportuinties for clever elemental inlining
   hlfir::LoopNest nest = hlfir::genLoopNest(
   loc, builder, shapeShift.getExtents(), /*isUnordered=*/true);
-  builder.setInsertionPointToStart(nest.innerLoop.getBody());
+  builder.setInsertionPointToStart(nest.body);
   mlir::Type refTy = fir::ReferenceType::get(seqTy.getEleTy());
   auto lhsEleAddr = builder.create(
   loc, refTy, lhs, shapeShift, /*slice=*/mlir::Value{},
@@ -389,7 +389,7 @@ static void genBoxCombiner(fir::FirOpBuilder &builder, 
mlir::Location loc,
   builder, loc, redId, refTy, lhsEle, rhsEle);
   builder.create(loc, scalarReduction, lhsEleAddr);
 
-  builder.setInsertionPointAfter(nest.outerLoop);
+  builder.setInsertionPointAfter(nest.outerOp);
   builder.create(loc, lhsAddr);
 }
 
diff --git a/flang/lib/Optimizer/Builder/HLFIRTools.cpp 
b/flang/lib/Optimizer/Builder/HLFIRTools.cpp
index 8d0ae2f195178c..cd07cb741eb4bb 100644
--- a/flang/lib/Optimizer/Builder/HLFIRTools.cpp
+++ b/flang/lib/Optimizer/Builder/HLFIRTools.cpp
@@ -20,6 +20,7 @@
 #include "mlir/IR/IRMapping.h"
 #include "mlir/Support/LLVM.h"
 #include "llvm/ADT/TypeSwitch.h"
+#include 
 #include 
 
 // Return explicit extents. If the base is a fir.box, this won't read it to
@@ -855,26 +856,51 @@ mlir::Value hlfir::inlineElementalOp(
 
 hlfir::LoopNest hlfir::genLoopNest(mlir::Location loc,
fir::FirOpBuilder &builder,
-   mlir::ValueRange extents, bool isUnordered) 
{
+   mlir::ValueRange extents, bool isUnordered,
+   bool emitWsLoop) {
   hlfir::LoopNest loopNest;
   assert(!extents.empty() && 

[llvm-branch-commits] [lld] release/19.x: [lld-macho] Fix crash: ObjC category merge + relative method lists (#104081) (PR #105615)

2024-08-21 Thread via llvm-branch-commits

https://github.com/llvmbot created 
https://github.com/llvm/llvm-project/pull/105615

Backport 0df91893efc752a76c7bbe6b063d66c8a2fa0d55

Requested by: @alx32

>From 643fd0b1a2a2fb73ea54f4f2ac6e6bb61238b99e Mon Sep 17 00:00:00 2001
From: alx32 <103613512+al...@users.noreply.github.com>
Date: Wed, 14 Aug 2024 19:30:41 -0700
Subject: [PATCH] [lld-macho] Fix crash: ObjC category merge + relative method
 lists (#104081)

A crash was happening when both ObjC Category Merging and Relative
method lists were enabled.

ObjC Category Merging creates new data sections and adds them by calling
`addInputSection`. `addInputSection` uses the symbols within the added
section to determine which container to actually add the section to.

The issue is that ObjC Category merging is calling `addInputSection`
before actually adding the relevant symbols to the added section. This
causes `addInputSection` to add the `InputSection` to the wrong
container, eventually resulting in a crash.

To fix this, we ensure that ObjC Category Merging calls
`addInputSection` only after the symbols have been added to the
`InputSection`.
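
The ordering problem can be illustrated with a small standalone sketch. All
names below (`Section`, `Registry`, `add`) are hypothetical stand-ins and not
lld's real types, and "has any symbols" is only an assumed placeholder for
whatever criterion `addInputSection` actually applies:

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for an input section that owns a list of symbols.
struct Section {
  std::vector<std::string> Symbols;
};

// Hypothetical stand-in for the set of containers a section can land in.
struct Registry {
  std::vector<const Section *> WithSymbols;
  std::vector<const Section *> WithoutSymbols;

  // The destination is decided from the symbols present at the moment of the
  // call and is never revisited afterwards.
  void add(const Section &S) {
    (S.Symbols.empty() ? WithoutSymbols : WithSymbols).push_back(&S);
  }
};

// Buggy order: register the section first, attach the symbol afterwards.
inline bool registeredCorrectlyWhenAddedFirst() {
  Section S;
  Registry R;
  R.add(S);                          // sees no symbols yet
  S.Symbols.push_back("catListSym"); // too late to affect the choice
  return R.WithSymbols.size() == 1;
}

// Fixed order: attach the symbol, then register the section.
inline bool registeredCorrectlyWhenAddedLast() {
  Section S;
  Registry R;
  S.Symbols.push_back("catListSym");
  R.add(S);                          // sees the symbol
  return R.WithSymbols.size() == 1;
}
```

In this sketch the first function returns false and the second returns true,
which mirrors why the patch moves each `addInputSection` call below the
matching `symbols.push_back`.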

(cherry picked from commit 0df91893efc752a76c7bbe6b063d66c8a2fa0d55)
---
 lld/MachO/ObjC.cpp| 10 +-
 .../MachO/objc-category-merging-minimal.s | 20 +--
 2 files changed, 15 insertions(+), 15 deletions(-)

diff --git a/lld/MachO/ObjC.cpp b/lld/MachO/ObjC.cpp
index 9c056f40aa943f..39d885188d34ac 100644
--- a/lld/MachO/ObjC.cpp
+++ b/lld/MachO/ObjC.cpp
@@ -873,7 +873,6 @@ Defined *ObjcCategoryMerger::emitAndLinkProtocolList(
   infoCategoryWriter.catPtrListInfo.align);
   listSec->parent = infoCategoryWriter.catPtrListInfo.outputSection;
   listSec->live = true;
-  addInputSection(listSec);
 
   listSec->parent = infoCategoryWriter.catPtrListInfo.outputSection;
 
@@ -889,6 +888,7 @@ Defined *ObjcCategoryMerger::emitAndLinkProtocolList(
 
   ptrListSym->used = true;
   parentSym->getObjectFile()->symbols.push_back(ptrListSym);
+  addInputSection(listSec);
 
   createSymbolReference(parentSym, ptrListSym, linkAtOffset,
 infoCategoryWriter.catBodyInfo.relocTemplate);
@@ -933,7 +933,6 @@ void ObjcCategoryMerger::emitAndLinkPointerList(
   infoCategoryWriter.catPtrListInfo.align);
   listSec->parent = infoCategoryWriter.catPtrListInfo.outputSection;
   listSec->live = true;
-  addInputSection(listSec);
 
   listSec->parent = infoCategoryWriter.catPtrListInfo.outputSection;
 
@@ -949,6 +948,7 @@ void ObjcCategoryMerger::emitAndLinkPointerList(
 
   ptrListSym->used = true;
   parentSym->getObjectFile()->symbols.push_back(ptrListSym);
+  addInputSection(listSec);
 
   createSymbolReference(parentSym, ptrListSym, linkAtOffset,
 infoCategoryWriter.catBodyInfo.relocTemplate);
@@ -974,7 +974,6 @@ ObjcCategoryMerger::emitCatListEntrySec(const std::string 
&forCategoryName,
bodyData, infoCategoryWriter.catListInfo.align);
   newCatList->parent = infoCategoryWriter.catListInfo.outputSection;
   newCatList->live = true;
-  addInputSection(newCatList);
 
   newCatList->parent = infoCategoryWriter.catListInfo.outputSection;
 
@@ -990,6 +989,7 @@ ObjcCategoryMerger::emitCatListEntrySec(const std::string 
&forCategoryName,
 
   catListSym->used = true;
   objFile->symbols.push_back(catListSym);
+  addInputSection(newCatList);
   return catListSym;
 }
 
@@ -1012,7 +1012,6 @@ Defined *ObjcCategoryMerger::emitCategoryBody(const 
std::string &name,
bodyData, infoCategoryWriter.catBodyInfo.align);
   newBodySec->parent = infoCategoryWriter.catBodyInfo.outputSection;
   newBodySec->live = true;
-  addInputSection(newBodySec);
 
   std::string symName =
   objc::symbol_names::category + baseClassName + "(" + name + ")";
@@ -1025,6 +1024,7 @@ Defined *ObjcCategoryMerger::emitCategoryBody(const 
std::string &name,
 
   catBodySym->used = true;
   objFile->symbols.push_back(catBodySym);
+  addInputSection(newBodySec);
 
   createSymbolReference(catBodySym, nameSym, catLayout.nameOffset,
 infoCategoryWriter.catBodyInfo.relocTemplate);
@@ -1245,7 +1245,6 @@ void 
ObjcCategoryMerger::generateCatListForNonErasedCategories(
   infoCategoryWriter.catListInfo.align);
   listSec->parent = infoCategoryWriter.catListInfo.outputSection;
   listSec->live = true;
-  addInputSection(listSec);
 
   std::string slotSymName = "<__objc_catlist slot for category ";
   slotSymName += nonErasedCatBody->getName();
@@ -1260,6 +1259,7 @@ void 
ObjcCategoryMerger::generateCatListForNonErasedCategories(
 
   catListSlotSym->used = true;
   objFile->symbols.push_back(catListSlotSym);
+  addInputSection(listSec);
 
   // Now link the category body into the newly created slot
   createSymbolReference(catListSlotSym, nonErasedCatBody, 0,
diff --git a/lld/test/MachO/objc-category-merging-minimal.s 
b/lld/test/MachO/objc-categ
