[llvm-branch-commits] [llvm] use default intrinsic attrs for BPF packet loads (PR #105314)
https://github.com/nikic approved this pull request. LGTM https://github.com/llvm/llvm-project/pull/105314 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] release/19.x: [AMDGPU] Disable inline constants for pseudo scalar transcendentals (#104395) (PR #105472)
https://github.com/jayfoad approved this pull request. https://github.com/llvm/llvm-project/pull/105472
[llvm-branch-commits] [mlir] [mlir][OpenMP] Convert reduction alloc region to LLVMIR (PR #102524)
@@ -594,45 +594,85 @@ convertOmpOrderedRegion(Operation &opInst, llvm::IRBuilderBase &builder, /// Allocate space for privatized reduction variables. template -static void allocByValReductionVars( -T loop, ArrayRef reductionArgs, llvm::IRBuilderBase &builder, -LLVM::ModuleTranslation &moduleTranslation, -llvm::OpenMPIRBuilder::InsertPointTy &allocaIP, -SmallVectorImpl &reductionDecls, -SmallVectorImpl &privateReductionVariables, -DenseMap &reductionVariableMap, -llvm::ArrayRef isByRefs) { +static LogicalResult +allocReductionVars(T loop, ArrayRef reductionArgs, + llvm::IRBuilderBase &builder, + LLVM::ModuleTranslation &moduleTranslation, + llvm::OpenMPIRBuilder::InsertPointTy &allocaIP, + SmallVectorImpl &reductionDecls, + SmallVectorImpl &privateReductionVariables, + DenseMap &reductionVariableMap, + llvm::ArrayRef isByRefs) { llvm::IRBuilderBase::InsertPointGuard guard(builder); builder.SetInsertPoint(allocaIP.getBlock()->getTerminator()); + // delay creating stores until after all allocas + SmallVector> storesToCreate; + storesToCreate.reserve(loop.getNumReductionVars()); + for (std::size_t i = 0; i < loop.getNumReductionVars(); ++i) { -if (isByRefs[i]) - continue; -llvm::Value *var = builder.CreateAlloca( -moduleTranslation.convertType(reductionDecls[i].getType())); -moduleTranslation.mapValue(reductionArgs[i], var); -privateReductionVariables[i] = var; -reductionVariableMap.try_emplace(loop.getReductionVars()[i], var); +Region &allocRegion = reductionDecls[i].getAllocRegion(); +if (isByRefs[i]) { + if (allocRegion.empty()) tblah wrote: The alloc region is optional. If it is absent, the allocation can still be done in the initialization region as normal. This can happen, for example, when no part of the allocation lives on the stack (because we don't want a call to malloc mixed into the middle of the allocas).
https://github.com/llvm/llvm-project/pull/102524
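The two-phase pattern visible in the diff above (emit every alloca first while queueing the initial stores, then emit the stores afterwards) can be sketched outside of MLIR as follows. `Insn` and `emitAllocasThenStores` are illustrative stand-ins, not the real `llvm::IRBuilderBase` API:

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Toy stand-in for an emitted IR instruction; the real code builds LLVM IR
// through llvm::IRBuilderBase.
struct Insn {
  std::string text;
};

// Emit all allocas first while queueing the initial stores, then emit the
// stores afterwards, so no store lands between two allocas.
std::vector<Insn> emitAllocasThenStores(const std::vector<std::string> &vars) {
  std::vector<Insn> out;
  std::vector<std::pair<std::string, std::string>> storesToCreate;
  storesToCreate.reserve(vars.size());
  for (const std::string &v : vars) {
    out.push_back({"alloca " + v});
    storesToCreate.emplace_back("init", v); // delayed until all allocas exist
  }
  for (const auto &[val, ptr] : storesToCreate)
    out.push_back({"store " + val + ", " + ptr});
  return out;
}
```

Keeping the entry block as an uninterrupted run of allocas is what makes room for by-ref alloc regions (which may, for instance, call malloc) without splitting that run.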
[llvm-branch-commits] [llvm] [FixIrreducible] Use CycleInfo instead of a custom SCC traversal (PR #101386)
https://github.com/ssahasra edited https://github.com/llvm/llvm-project/pull/101386
[llvm-branch-commits] [mlir] [MLIR][omp] Add omp.workshare op (PR #101443)
https://github.com/tblah approved this pull request. LGTM. Thanks for the updates https://github.com/llvm/llvm-project/pull/101443
[llvm-branch-commits] [llvm] [FixIrreducible] Use CycleInfo instead of a custom SCC traversal (PR #101386)
@@ -189,6 +195,21 @@ template class GenericCycle { //@{ using const_entry_iterator = typename SmallVectorImpl::const_iterator; + const_entry_iterator entry_begin() const { +return const_entry_iterator{Entries.begin()}; ssahasra wrote: Fixed. https://github.com/llvm/llvm-project/pull/101386
[llvm-branch-commits] [llvm] [FixIrreducible] Use CycleInfo instead of a custom SCC traversal (PR #101386)
@@ -107,6 +107,12 @@ template class GenericCycle { return is_contained(Entries, Block); } + /// \brief Replace all entries with \p Block as single entry. + void setSingleEntry(BlockT *Block) { +Entries.clear(); +Entries.push_back(Block); ssahasra wrote: Fixed. https://github.com/llvm/llvm-project/pull/101386
[llvm-branch-commits] [llvm] [FixIrreducible] Use CycleInfo instead of a custom SCC traversal (PR #101386)
ssahasra wrote: > This needs a finer method that redirects only specific edges. Either that, or > we let the pass destroy some cycles. But updating `CycleInfo` for these > missing subcycles may be a fair amount of work too, so I would rather do it > the right way. This now depends on the newly refactored ControlFlowHub, which correctly reroutes only the relevant edges. The effect was already caught in an existing test with nested cycles and a common header, so no new test needs to be written for this. https://github.com/llvm/llvm-project/pull/101386
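Conceptually, the ControlFlowHub redirects only a chosen subset of edges through one new block while leaving every other edge of the CFG untouched. A toy model of that rerouting (block names and the `rerouteThroughHub` helper are hypothetical, not the LLVM API):

```cpp
#include <cassert>
#include <set>
#include <string>
#include <utility>
#include <vector>

// An edge in a toy CFG, from block name to block name.
using Edge = std::pair<std::string, std::string>;

// Redirect only the edges listed in `toReroute` so they target `hub`,
// recording each original destination as an outgoing edge of the hub.
// Every other edge is left untouched.
std::vector<Edge> rerouteThroughHub(std::vector<Edge> edges,
                                    const std::set<Edge> &toReroute,
                                    const std::string &hub,
                                    std::vector<Edge> &hubExits) {
  for (Edge &e : edges) {
    if (toReroute.count(e)) {
      hubExits.push_back({hub, e.second});
      e.second = hub;
    }
  }
  return edges;
}
```

Rerouting only the selected entry edges is what avoids accidentally destroying sibling or nested cycles whose edges happen to share a header block.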
[llvm-branch-commits] [llvm] [FixIrreducible] Use CycleInfo instead of a custom SCC traversal (PR #101386)
ssahasra wrote: > Note that I have not yet finished verifying all the lit tests. I might also > have to add a few more tests, especially involving a mix of irreducible and > reducible cycles that are siblings and/or nested inside each other in various > combinations. Especially with some overlap in the entry and header nodes. - New tests added that involve nesting with common header or entry nodes. Existing tests also covered some relevant combinations. - Verified all tests. https://github.com/llvm/llvm-project/pull/101386
[llvm-branch-commits] [llvm] [FixIrreducible] Use CycleInfo instead of a custom SCC traversal (PR #101386)
https://github.com/ssahasra edited https://github.com/llvm/llvm-project/pull/101386
[llvm-branch-commits] [llvm] [FixIrreducible] Use CycleInfo instead of a custom SCC traversal (PR #101386)
https://github.com/arsenm approved this pull request. https://github.com/llvm/llvm-project/pull/101386
[llvm-branch-commits] [flang] [flang][omp] Emit omp.workshare in frontend (PR #101444)
tblah wrote: > Should we have a `-use-experimental-workshare` or similar flag to facilitate > some temporary in-tree development as this may require more moving pieces? A flag like that sounds appropriate, yes. The current code changes look good. https://github.com/llvm/llvm-project/pull/101444
[llvm-branch-commits] [clang] Backport taint analysis slowdown regression fix (PR #105516)
https://github.com/steakhal created https://github.com/llvm/llvm-project/pull/105516 Same as the cherry-picked commit + the release notes. >From 1d10df6937e914e610da9c5818ba09ff711beb05 Mon Sep 17 00:00:00 2001 From: Balazs Benics Date: Wed, 21 Aug 2024 14:24:56 +0200 Subject: [PATCH 1/2] [analyzer] Limit `isTainted()` by skipping complicated symbols (#105493) As discussed in https://discourse.llvm.org/t/rfc-make-istainted-and-complex-symbols-friends/79570/10 Some `isTainted()` queries can blow up the analysis times, and effectively halt the analysis under specific workloads. We don't really have the time now to do a caching re-implementation of `isTainted()`, so we need to work around the case. The workaround with the smallest blast radius was to limit which symbols `isTainted()` will query (by walking the SymExpr). So far, the threshold 10 worked for us, but this value can be overridden using the "max-tainted-symbol-complexity" config value. This new option is "deprecated" from the get-go, as I expect this issue to be fixed within the next few months and I don't want users to override this value anyway. If they do, this message will let them know that they are on their own, and the next release may break them (as we no longer recognize this option if we drop it).
Mitigates #89720 CPP-5414 (cherry picked from commit 848658955a9d2d42ea3e319d191e2dcd5d76c837) --- .../StaticAnalyzer/Core/AnalyzerOptions.def | 5 ++ clang/lib/StaticAnalyzer/Checkers/Taint.cpp | 7 +++ clang/test/Analysis/analyzer-config.c | 1 + clang/test/Analysis/taint-generic.c | 49 ++- 4 files changed, 61 insertions(+), 1 deletion(-) diff --git a/clang/include/clang/StaticAnalyzer/Core/AnalyzerOptions.def b/clang/include/clang/StaticAnalyzer/Core/AnalyzerOptions.def index 29aa6a3b8a16e7..737bc8e86cfb6a 100644 --- a/clang/include/clang/StaticAnalyzer/Core/AnalyzerOptions.def +++ b/clang/include/clang/StaticAnalyzer/Core/AnalyzerOptions.def @@ -407,6 +407,11 @@ ANALYZER_OPTION( ANALYZER_OPTION(unsigned, MaxSymbolComplexity, "max-symbol-complexity", "The maximum complexity of symbolic constraint.", 35) +// HACK:https://discourse.llvm.org/t/rfc-make-istainted-and-complex-symbols-friends/79570 +// Ideally, we should get rid of this option soon. +ANALYZER_OPTION(unsigned, MaxTaintedSymbolComplexity, "max-tainted-symbol-complexity", +"[DEPRECATED] The maximum complexity of a symbol to carry taint", 9) + ANALYZER_OPTION(unsigned, MaxTimesInlineLarge, "max-times-inline-large", "The maximum times a large function could be inlined.", 32) diff --git a/clang/lib/StaticAnalyzer/Checkers/Taint.cpp b/clang/lib/StaticAnalyzer/Checkers/Taint.cpp index 6362c82b009d72..0bb5739db4b756 100644 --- a/clang/lib/StaticAnalyzer/Checkers/Taint.cpp +++ b/clang/lib/StaticAnalyzer/Checkers/Taint.cpp @@ -12,6 +12,7 @@ #include "clang/StaticAnalyzer/Checkers/Taint.h" #include "clang/StaticAnalyzer/Core/BugReporter/BugReporter.h" +#include "clang/StaticAnalyzer/Core/PathSensitive/AnalysisManager.h" #include "clang/StaticAnalyzer/Core/PathSensitive/ProgramStateTrait.h" #include @@ -256,6 +257,12 @@ std::vector taint::getTaintedSymbolsImpl(ProgramStateRef State, if (!Sym) return TaintedSymbols; + // HACK:https://discourse.llvm.org/t/rfc-make-istainted-and-complex-symbols-friends/79570 + if (const 
auto &Opts = State->getAnalysisManager().getAnalyzerOptions(); + Sym->computeComplexity() > Opts.MaxTaintedSymbolComplexity) { +return {}; + } + // Traverse all the symbols this symbol depends on to see if any are tainted. for (SymbolRef SubSym : Sym->symbols()) { if (!isa(SubSym)) diff --git a/clang/test/Analysis/analyzer-config.c b/clang/test/Analysis/analyzer-config.c index 2a4c40005a4dc0..1ee0d8e4eecebd 100644 --- a/clang/test/Analysis/analyzer-config.c +++ b/clang/test/Analysis/analyzer-config.c @@ -96,6 +96,7 @@ // CHECK-NEXT: max-inlinable-size = 100 // CHECK-NEXT: max-nodes = 225000 // CHECK-NEXT: max-symbol-complexity = 35 +// CHECK-NEXT: max-tainted-symbol-complexity = 9 // CHECK-NEXT: max-times-inline-large = 32 // CHECK-NEXT: min-cfg-size-treat-functions-as-large = 14 // CHECK-NEXT: mode = deep diff --git a/clang/test/Analysis/taint-generic.c b/clang/test/Analysis/taint-generic.c index b0df85f237298d..1c139312734bca 100644 --- a/clang/test/Analysis/taint-generic.c +++ b/clang/test/Analysis/taint-generic.c @@ -63,6 +63,7 @@ void clang_analyzer_isTainted_char(char); void clang_analyzer_isTainted_wchar(wchar_t); void clang_analyzer_isTainted_charp(char*); void clang_analyzer_isTainted_int(int); +void clang_analyzer_dump_int(int); int coin(); @@ -459,7 +460,53 @@ unsigned radar11369570_hanging(const unsigned char *arr, int l) { longcmp(a, t, c); l -= 12; } - return 5/a; // expected-warning {{Division by a tainted value, possibly zero}} + return 5/a; // FIXME: Should be a "div by
[llvm-branch-commits] [clang] Backport taint analysis slowdown regression fix (PR #105516)
https://github.com/steakhal milestoned https://github.com/llvm/llvm-project/pull/105516
[llvm-branch-commits] [clang] Backport taint analysis slowdown regression fix (PR #105516)
llvmbot wrote: @llvm/pr-subscribers-clang-static-analyzer-1 Author: Balazs Benics (steakhal) Changes Same as the cherry-picked commit + the release notes. --- Full diff: https://github.com/llvm/llvm-project/pull/105516.diff 5 Files Affected: - (modified) clang/docs/ReleaseNotes.rst (+5) - (modified) clang/include/clang/StaticAnalyzer/Core/AnalyzerOptions.def (+5) - (modified) clang/lib/StaticAnalyzer/Checkers/Taint.cpp (+7) - (modified) clang/test/Analysis/analyzer-config.c (+1) - (modified) clang/test/Analysis/taint-generic.c (+48-1) ``diff diff --git a/clang/docs/ReleaseNotes.rst b/clang/docs/ReleaseNotes.rst index 17ddbfe910f878..fa69fcb9aa29bf 100644 --- a/clang/docs/ReleaseNotes.rst +++ b/clang/docs/ReleaseNotes.rst @@ -1391,6 +1391,11 @@ Crash and bug fixes - Z3 crosschecking (aka. Z3 refutation) is now bounded, and can't consume more total time than the symbolic execution itself. (#GH97298) +- In clang-18, we regressed in terms of analysis time for projects having many + nested loops with buffer indexing or shifting or other binary operations. + For example, functions computing different hash values. Some of this slowdown + was attributed to taint analysis, which is fixed now. (#GH105493) + - ``std::addressof``, ``std::as_const``, ``std::forward``, ``std::forward_like``, ``std::move``, ``std::move_if_noexcept``, are now modeled just like their builtin counterpart.
(#GH94193) diff --git a/clang/include/clang/StaticAnalyzer/Core/AnalyzerOptions.def b/clang/include/clang/StaticAnalyzer/Core/AnalyzerOptions.def index 29aa6a3b8a16e7..737bc8e86cfb6a 100644 --- a/clang/include/clang/StaticAnalyzer/Core/AnalyzerOptions.def +++ b/clang/include/clang/StaticAnalyzer/Core/AnalyzerOptions.def @@ -407,6 +407,11 @@ ANALYZER_OPTION( ANALYZER_OPTION(unsigned, MaxSymbolComplexity, "max-symbol-complexity", "The maximum complexity of symbolic constraint.", 35) +// HACK:https://discourse.llvm.org/t/rfc-make-istainted-and-complex-symbols-friends/79570 +// Ideally, we should get rid of this option soon. +ANALYZER_OPTION(unsigned, MaxTaintedSymbolComplexity, "max-tainted-symbol-complexity", +"[DEPRECATED] The maximum complexity of a symbol to carry taint", 9) + ANALYZER_OPTION(unsigned, MaxTimesInlineLarge, "max-times-inline-large", "The maximum times a large function could be inlined.", 32) diff --git a/clang/lib/StaticAnalyzer/Checkers/Taint.cpp b/clang/lib/StaticAnalyzer/Checkers/Taint.cpp index 6362c82b009d72..0bb5739db4b756 100644 --- a/clang/lib/StaticAnalyzer/Checkers/Taint.cpp +++ b/clang/lib/StaticAnalyzer/Checkers/Taint.cpp @@ -12,6 +12,7 @@ #include "clang/StaticAnalyzer/Checkers/Taint.h" #include "clang/StaticAnalyzer/Core/BugReporter/BugReporter.h" +#include "clang/StaticAnalyzer/Core/PathSensitive/AnalysisManager.h" #include "clang/StaticAnalyzer/Core/PathSensitive/ProgramStateTrait.h" #include @@ -256,6 +257,12 @@ std::vector taint::getTaintedSymbolsImpl(ProgramStateRef State, if (!Sym) return TaintedSymbols; + // HACK:https://discourse.llvm.org/t/rfc-make-istainted-and-complex-symbols-friends/79570 + if (const auto &Opts = State->getAnalysisManager().getAnalyzerOptions(); + Sym->computeComplexity() > Opts.MaxTaintedSymbolComplexity) { +return {}; + } + // Traverse all the symbols this symbol depends on to see if any are tainted. 
for (SymbolRef SubSym : Sym->symbols()) { if (!isa(SubSym)) diff --git a/clang/test/Analysis/analyzer-config.c b/clang/test/Analysis/analyzer-config.c index 2a4c40005a4dc0..1ee0d8e4eecebd 100644 --- a/clang/test/Analysis/analyzer-config.c +++ b/clang/test/Analysis/analyzer-config.c @@ -96,6 +96,7 @@ // CHECK-NEXT: max-inlinable-size = 100 // CHECK-NEXT: max-nodes = 225000 // CHECK-NEXT: max-symbol-complexity = 35 +// CHECK-NEXT: max-tainted-symbol-complexity = 9 // CHECK-NEXT: max-times-inline-large = 32 // CHECK-NEXT: min-cfg-size-treat-functions-as-large = 14 // CHECK-NEXT: mode = deep diff --git a/clang/test/Analysis/taint-generic.c b/clang/test/Analysis/taint-generic.c index b0df85f237298d..1c139312734bca 100644 --- a/clang/test/Analysis/taint-generic.c +++ b/clang/test/Analysis/taint-generic.c @@ -63,6 +63,7 @@ void clang_analyzer_isTainted_char(char); void clang_analyzer_isTainted_wchar(wchar_t); void clang_analyzer_isTainted_charp(char*); void clang_analyzer_isTainted_int(int); +void clang_analyzer_dump_int(int); int coin(); @@ -459,7 +460,53 @@ unsigned radar11369570_hanging(const unsigned char *arr, int l) { longcmp(a, t, c); l -= 12; } - return 5/a; // expected-warning {{Division by a tainted value, possibly zero}} + return 5/a; // FIXME: Should be a "div by tainted" warning here. +} + +// This computation used to take a very long time. +void complex_taint_queries(const int *p) { + int tainted = 0; + scanf("%d", &tainted); + + // Make "tmp" tainted. + int tmp = tainted + tainted; + clang_a
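The guard this patch adds is easy to model: compute the symbol's complexity first, and bail out of the taint query when it exceeds the configured threshold (default 9). The `Sym` tree below is a toy stand-in for clang's `SymExpr`, and `isTaintedBounded` a sketch of the bounded query, not the analyzer's actual API:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical toy symbol tree; the real code walks clang's SymExpr.
struct Sym {
  std::vector<const Sym *> children;
  bool tainted = false;
  // Loosely mirrors SymExpr::computeComplexity(): a leaf counts 1, an
  // interior node adds up its operands.
  std::size_t complexity() const {
    std::size_t c = 1;
    for (const Sym *ch : children)
      c += ch->complexity();
    return c;
  }
};

// Sketch of the guard: refuse to answer the taint query for symbols above
// the complexity threshold, rather than walk a huge expression tree.
bool isTaintedBounded(const Sym &s, std::size_t maxComplexity = 9) {
  if (s.complexity() > maxComplexity)
    return false; // give up; the patch returns an empty result set here
  if (s.tainted)
    return true;
  for (const Sym *ch : s.children)
    if (isTaintedBounded(*ch, maxComplexity))
      return true;
  return false;
}
```

The trade-off is deliberate: an over-complex symbol is reported as untainted even when a subsymbol carries taint, which is why the FIXME in taint-generic.c notes a lost "div by tainted" warning.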
[llvm-branch-commits] [clang] Backport taint analysis slowdown regression fix (PR #105516)
https://github.com/steakhal edited https://github.com/llvm/llvm-project/pull/105516
[llvm-branch-commits] [mlir] [MLIR][omp] Add omp.workshare op (PR #101443)
https://github.com/ivanradanov closed https://github.com/llvm/llvm-project/pull/101443
[llvm-branch-commits] [mlir] [MLIR][omp] Add omp.workshare op (PR #101443)
https://github.com/ivanradanov reopened https://github.com/llvm/llvm-project/pull/101443
[llvm-branch-commits] [flang] [WIP][flang] Introduce HLFIR lowerings to omp.workshare_loop_nest (PR #104748)
https://github.com/ivanradanov updated https://github.com/llvm/llvm-project/pull/104748 >From 4b1c15bf4dcd753e35ec5c1118b107ea058c58df Mon Sep 17 00:00:00 2001 From: Ivan Radanov Ivanov Date: Sun, 4 Aug 2024 17:33:52 +0900 Subject: [PATCH 1/5] Add workshare loop wrapper lowerings --- .../lib/Optimizer/HLFIR/Transforms/BufferizeHLFIR.cpp | 6 -- .../HLFIR/Transforms/OptimizedBufferization.cpp| 10 +++--- 2 files changed, 11 insertions(+), 5 deletions(-) diff --git a/flang/lib/Optimizer/HLFIR/Transforms/BufferizeHLFIR.cpp b/flang/lib/Optimizer/HLFIR/Transforms/BufferizeHLFIR.cpp index b608677c526310..1848dbe2c7a2c2 100644 --- a/flang/lib/Optimizer/HLFIR/Transforms/BufferizeHLFIR.cpp +++ b/flang/lib/Optimizer/HLFIR/Transforms/BufferizeHLFIR.cpp @@ -26,12 +26,13 @@ #include "flang/Optimizer/HLFIR/HLFIRDialect.h" #include "flang/Optimizer/HLFIR/HLFIROps.h" #include "flang/Optimizer/HLFIR/Passes.h" +#include "flang/Optimizer/OpenMP/Passes.h" +#include "mlir/Dialect/OpenMP/OpenMPDialect.h" #include "mlir/IR/Dominance.h" #include "mlir/IR/PatternMatch.h" #include "mlir/Pass/Pass.h" #include "mlir/Pass/PassManager.h" #include "mlir/Transforms/DialectConversion.h" -#include "mlir/Dialect/OpenMP/OpenMPDialect.h" #include "llvm/ADT/TypeSwitch.h" namespace hlfir { @@ -792,7 +793,8 @@ struct ElementalOpConversion // Generate a loop nest looping around the fir.elemental shape and clone // fir.elemental region inside the inner loop. 
hlfir::LoopNest loopNest = -hlfir::genLoopNest(loc, builder, extents, !elemental.isOrdered()); +hlfir::genLoopNest(loc, builder, extents, !elemental.isOrdered(), + flangomp::shouldUseWorkshareLowering(elemental)); auto insPt = builder.saveInsertionPoint(); builder.setInsertionPointToStart(loopNest.body); auto yield = hlfir::inlineElementalOp(loc, builder, elemental, diff --git a/flang/lib/Optimizer/HLFIR/Transforms/OptimizedBufferization.cpp b/flang/lib/Optimizer/HLFIR/Transforms/OptimizedBufferization.cpp index 3a0a98dc594463..f014724861e333 100644 --- a/flang/lib/Optimizer/HLFIR/Transforms/OptimizedBufferization.cpp +++ b/flang/lib/Optimizer/HLFIR/Transforms/OptimizedBufferization.cpp @@ -20,6 +20,7 @@ #include "flang/Optimizer/HLFIR/HLFIRDialect.h" #include "flang/Optimizer/HLFIR/HLFIROps.h" #include "flang/Optimizer/HLFIR/Passes.h" +#include "flang/Optimizer/OpenMP/Passes.h" #include "flang/Optimizer/Transforms/Utils.h" #include "mlir/Dialect/Func/IR/FuncOps.h" #include "mlir/IR/Dominance.h" @@ -482,7 +483,8 @@ llvm::LogicalResult ElementalAssignBufferization::matchAndRewrite( // Generate a loop nest looping around the hlfir.elemental shape and clone // hlfir.elemental region inside the inner loop hlfir::LoopNest loopNest = - hlfir::genLoopNest(loc, builder, extents, !elemental.isOrdered()); + hlfir::genLoopNest(loc, builder, extents, !elemental.isOrdered(), + flangomp::shouldUseWorkshareLowering(elemental)); builder.setInsertionPointToStart(loopNest.body); auto yield = hlfir::inlineElementalOp(loc, builder, elemental, loopNest.oneBasedIndices); @@ -553,7 +555,8 @@ llvm::LogicalResult BroadcastAssignBufferization::matchAndRewrite( llvm::SmallVector extents = hlfir::getIndexExtents(loc, builder, shape); hlfir::LoopNest loopNest = - hlfir::genLoopNest(loc, builder, extents, /*isUnordered=*/true); + hlfir::genLoopNest(loc, builder, extents, /*isUnordered=*/true, + flangomp::shouldUseWorkshareLowering(assign)); builder.setInsertionPointToStart(loopNest.body); auto 
arrayElement = hlfir::getElementAt(loc, builder, lhs, loopNest.oneBasedIndices); @@ -648,7 +651,8 @@ llvm::LogicalResult VariableAssignBufferization::matchAndRewrite( llvm::SmallVector extents = hlfir::getIndexExtents(loc, builder, shape); hlfir::LoopNest loopNest = - hlfir::genLoopNest(loc, builder, extents, /*isUnordered=*/true); + hlfir::genLoopNest(loc, builder, extents, /*isUnordered=*/true, + flangomp::shouldUseWorkshareLowering(assign)); builder.setInsertionPointToStart(loopNest.body); auto rhsArrayElement = hlfir::getElementAt(loc, builder, rhs, loopNest.oneBasedIndices); >From a79d7c8cee84295ef7281b0b6aabf2ea5ed50b9e Mon Sep 17 00:00:00 2001 From: Ivan Radanov Ivanov Date: Mon, 19 Aug 2024 15:01:31 +0900 Subject: [PATCH 2/5] Bufferize test --- flang/test/HLFIR/bufferize-workshare.fir | 58 1 file changed, 58 insertions(+) create mode 100644 flang/test/HLFIR/bufferize-workshare.fir diff --git a/flang/test/HLFIR/bufferize-workshare.fir b/flang/test/HLFIR/bufferize-workshare.fir new file mode 100644 index 00..86a2f031478dd7 --- /dev/null +++ b/flang/test/HLFIR/bufferize-workshare.fir @@ -0,0 +1,58 @@ +// RUN: fir-opt --bufferize-hlfir %s | FileCheck %s + +// CH
[llvm-branch-commits] [flang] [flang][omp] Emit omp.workshare in frontend (PR #101444)
https://github.com/ivanradanov updated https://github.com/llvm/llvm-project/pull/101444 >From 3d1258582adc0ec506a23dc3efdba371c29612ca Mon Sep 17 00:00:00 2001 From: Ivan Radanov Ivanov Date: Wed, 31 Jul 2024 14:11:47 +0900 Subject: [PATCH 1/2] [flang][omp] Emit omp.workshare in frontend --- flang/lib/Lower/OpenMP/OpenMP.cpp | 30 ++ 1 file changed, 26 insertions(+), 4 deletions(-) diff --git a/flang/lib/Lower/OpenMP/OpenMP.cpp b/flang/lib/Lower/OpenMP/OpenMP.cpp index d614db8b68ef65..83c90374afa5e3 100644 --- a/flang/lib/Lower/OpenMP/OpenMP.cpp +++ b/flang/lib/Lower/OpenMP/OpenMP.cpp @@ -1272,6 +1272,15 @@ static void genTaskwaitClauses(lower::AbstractConverter &converter, loc, llvm::omp::Directive::OMPD_taskwait); } +static void genWorkshareClauses(lower::AbstractConverter &converter, +semantics::SemanticsContext &semaCtx, +lower::StatementContext &stmtCtx, +const List &clauses, mlir::Location loc, +mlir::omp::WorkshareOperands &clauseOps) { + ClauseProcessor cp(converter, semaCtx, clauses); + cp.processNowait(clauseOps); +} + static void genTeamsClauses(lower::AbstractConverter &converter, semantics::SemanticsContext &semaCtx, lower::StatementContext &stmtCtx, @@ -1897,6 +1906,22 @@ genTaskyieldOp(lower::AbstractConverter &converter, lower::SymMap &symTable, return converter.getFirOpBuilder().create(loc); } +static mlir::omp::WorkshareOp +genWorkshareOp(lower::AbstractConverter &converter, lower::SymMap &symTable, + semantics::SemanticsContext &semaCtx, lower::pft::Evaluation &eval, + mlir::Location loc, const ConstructQueue &queue, + ConstructQueue::iterator item) { + lower::StatementContext stmtCtx; + mlir::omp::WorkshareOperands clauseOps; + genWorkshareClauses(converter, semaCtx, stmtCtx, item->clauses, loc, clauseOps); + + return genOpWithBody( + OpWithBodyGenInfo(converter, symTable, semaCtx, loc, eval, +llvm::omp::Directive::OMPD_workshare) + .setClauses(&item->clauses), + queue, item, clauseOps); +} + static mlir::omp::TeamsOp 
genTeamsOp(lower::AbstractConverter &converter, lower::SymMap &symTable, semantics::SemanticsContext &semaCtx, lower::pft::Evaluation &eval, @@ -2309,10 +2334,7 @@ static void genOMPDispatch(lower::AbstractConverter &converter, llvm::omp::getOpenMPDirectiveName(dir) + ")"); // case llvm::omp::Directive::OMPD_workdistribute: case llvm::omp::Directive::OMPD_workshare: -// FIXME: Workshare is not a commonly used OpenMP construct, an -// implementation for this feature will come later. For the codes -// that use this construct, add a single construct for now. -genSingleOp(converter, symTable, semaCtx, eval, loc, queue, item); +genWorkshareOp(converter, symTable, semaCtx, eval, loc, queue, item); break; default: // Combined and composite constructs should have been split into a sequence >From 5e01e41362f11f2309dea217ada9026aa437433d Mon Sep 17 00:00:00 2001 From: Ivan Radanov Ivanov Date: Sun, 4 Aug 2024 16:02:37 +0900 Subject: [PATCH 2/2] Fix lower test for workshare --- flang/test/Lower/OpenMP/workshare.f90 | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/flang/test/Lower/OpenMP/workshare.f90 b/flang/test/Lower/OpenMP/workshare.f90 index 1e11677a15e1f0..8e771952f5b6da 100644 --- a/flang/test/Lower/OpenMP/workshare.f90 +++ b/flang/test/Lower/OpenMP/workshare.f90 @@ -6,7 +6,7 @@ subroutine sb1(arr) integer :: arr(:) !CHECK: omp.parallel { !$omp parallel -!CHECK: omp.single { +!CHECK: omp.workshare { !$omp workshare arr = 0 !$omp end workshare @@ -20,7 +20,7 @@ subroutine sb2(arr) integer :: arr(:) !CHECK: omp.parallel { !$omp parallel -!CHECK: omp.single nowait { +!CHECK: omp.workshare nowait { !$omp workshare arr = 0 !$omp end workshare nowait @@ -33,7 +33,7 @@ subroutine sb2(arr) subroutine sb3(arr) integer :: arr(:) !CHECK: omp.parallel { -!CHECK: omp.single { +!CHECK: omp.workshare { !$omp parallel workshare arr = 0 !$omp end parallel workshare
[llvm-branch-commits] [flang] [flang] Introduce custom loop nest generation for loops in workshare construct (PR #101445)
https://github.com/ivanradanov updated https://github.com/llvm/llvm-project/pull/101445 >From 451a9d2f26cfd8cb770d1ae35d834c63fce56b79 Mon Sep 17 00:00:00 2001 From: Ivan Radanov Ivanov Date: Wed, 31 Jul 2024 14:12:34 +0900 Subject: [PATCH 1/4] [flang] Introduce ws loop nest generation for HLFIR lowering --- .../flang/Optimizer/Builder/HLFIRTools.h | 12 +++-- flang/lib/Lower/ConvertCall.cpp | 2 +- flang/lib/Lower/OpenMP/ReductionProcessor.cpp | 4 +- flang/lib/Optimizer/Builder/HLFIRTools.cpp| 52 ++- .../HLFIR/Transforms/BufferizeHLFIR.cpp | 3 +- .../LowerHLFIROrderedAssignments.cpp | 30 +-- .../Transforms/OptimizedBufferization.cpp | 6 +-- 7 files changed, 69 insertions(+), 40 deletions(-) diff --git a/flang/include/flang/Optimizer/Builder/HLFIRTools.h b/flang/include/flang/Optimizer/Builder/HLFIRTools.h index 6b41025eea0780..14e42c6f358e46 100644 --- a/flang/include/flang/Optimizer/Builder/HLFIRTools.h +++ b/flang/include/flang/Optimizer/Builder/HLFIRTools.h @@ -357,8 +357,8 @@ hlfir::ElementalOp genElementalOp( /// Structure to describe a loop nest. struct LoopNest { - fir::DoLoopOp outerLoop; - fir::DoLoopOp innerLoop; + mlir::Operation *outerOp; + mlir::Block *body; llvm::SmallVector oneBasedIndices; }; @@ -366,11 +366,13 @@ struct LoopNest { /// \p isUnordered specifies whether the loops in the loop nest /// are unordered. 
LoopNest genLoopNest(mlir::Location loc, fir::FirOpBuilder &builder, - mlir::ValueRange extents, bool isUnordered = false); + mlir::ValueRange extents, bool isUnordered = false, + bool emitWsLoop = false); inline LoopNest genLoopNest(mlir::Location loc, fir::FirOpBuilder &builder, -mlir::Value shape, bool isUnordered = false) { +mlir::Value shape, bool isUnordered = false, +bool emitWsLoop = false) { return genLoopNest(loc, builder, getIndexExtents(loc, builder, shape), - isUnordered); + isUnordered, emitWsLoop); } /// Inline the body of an hlfir.elemental at the current insertion point diff --git a/flang/lib/Lower/ConvertCall.cpp b/flang/lib/Lower/ConvertCall.cpp index fd873f55dd844e..0689d6e033dd9c 100644 --- a/flang/lib/Lower/ConvertCall.cpp +++ b/flang/lib/Lower/ConvertCall.cpp @@ -2128,7 +2128,7 @@ class ElementalCallBuilder { hlfir::genLoopNest(loc, builder, shape, !mustBeOrdered); mlir::ValueRange oneBasedIndices = loopNest.oneBasedIndices; auto insPt = builder.saveInsertionPoint(); - builder.setInsertionPointToStart(loopNest.innerLoop.getBody()); + builder.setInsertionPointToStart(loopNest.body); callContext.stmtCtx.pushScope(); for (auto &preparedActual : loweredActuals) if (preparedActual) diff --git a/flang/lib/Lower/OpenMP/ReductionProcessor.cpp b/flang/lib/Lower/OpenMP/ReductionProcessor.cpp index c3c1f363033c27..72a90dd0d6f29d 100644 --- a/flang/lib/Lower/OpenMP/ReductionProcessor.cpp +++ b/flang/lib/Lower/OpenMP/ReductionProcessor.cpp @@ -375,7 +375,7 @@ static void genBoxCombiner(fir::FirOpBuilder &builder, mlir::Location loc, // know this won't miss any opportuinties for clever elemental inlining hlfir::LoopNest nest = hlfir::genLoopNest( loc, builder, shapeShift.getExtents(), /*isUnordered=*/true); - builder.setInsertionPointToStart(nest.innerLoop.getBody()); + builder.setInsertionPointToStart(nest.body); mlir::Type refTy = fir::ReferenceType::get(seqTy.getEleTy()); auto lhsEleAddr = builder.create( loc, refTy, lhs, shapeShift, 
/*slice=*/mlir::Value{}, @@ -389,7 +389,7 @@ static void genBoxCombiner(fir::FirOpBuilder &builder, mlir::Location loc, builder, loc, redId, refTy, lhsEle, rhsEle); builder.create(loc, scalarReduction, lhsEleAddr); - builder.setInsertionPointAfter(nest.outerLoop); + builder.setInsertionPointAfter(nest.outerOp); builder.create(loc, lhsAddr); } diff --git a/flang/lib/Optimizer/Builder/HLFIRTools.cpp b/flang/lib/Optimizer/Builder/HLFIRTools.cpp index 8d0ae2f195178c..cd07cb741eb4bb 100644 --- a/flang/lib/Optimizer/Builder/HLFIRTools.cpp +++ b/flang/lib/Optimizer/Builder/HLFIRTools.cpp @@ -20,6 +20,7 @@ #include "mlir/IR/IRMapping.h" #include "mlir/Support/LLVM.h" #include "llvm/ADT/TypeSwitch.h" +#include #include // Return explicit extents. If the base is a fir.box, this won't read it to @@ -855,26 +856,51 @@ mlir::Value hlfir::inlineElementalOp( hlfir::LoopNest hlfir::genLoopNest(mlir::Location loc, fir::FirOpBuilder &builder, - mlir::ValueRange extents, bool isUnordered) { + mlir::ValueRange extents, bool isUnordered, + bool emitWsLoop) { hlfir::LoopNest loopNest; assert(!extents.empty() &&
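The direction of this refactoring shows up in the struct change: `LoopNest` now exposes a generic outer operation and body block instead of `fir::DoLoopOp` members, so `genLoopNest` can produce either a `fir.do_loop` nest or a workshare loop wrapper behind the same interface. A toy sketch of that shape (the `Operation`/`Block` types here are stand-ins, not the MLIR classes):

```cpp
#include <cassert>
#include <string>

// Toy stand-ins for the MLIR types involved; the real code uses
// mlir::Operation, mlir::Block, fir::DoLoopOp, and the workshare loop ops.
struct Block {
  std::string label;
};
struct Operation {
  std::string name;
  Block body;
};

// The refactored LoopNest no longer assumes fir.do_loop: callers get an
// opaque outer op plus the innermost body block.
struct LoopNest {
  Operation *outerOp = nullptr;
  Block *body = nullptr;
};

// Sketch of genLoopNest's new switch: emit either a do-loop nest or a
// workshare loop wrapper, returning the same generic handle either way.
LoopNest genLoopNest(Operation &storage, bool emitWsLoop) {
  storage.name = emitWsLoop ? "omp.workshare_loop_nest" : "fir.do_loop";
  storage.body.label = "loop.body";
  return {&storage, &storage.body};
}
```

Callers such as `ElementalOpConversion` then only touch `loopNest.body` and `loopNest.outerOp`, which is why the diff replaces `loopNest.innerLoop.getBody()` and `nest.outerLoop` with the generic members.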
[llvm-branch-commits] [flang] [flang] Lower omp.workshare to other omp constructs (PR #101446)
https://github.com/ivanradanov updated https://github.com/llvm/llvm-project/pull/101446 error: too big or took too long to generate
[llvm-branch-commits] [flang] 5e5d819 - Revert "[flang][NFC] Move OpenMP related passes into a separate directory (#1…"
Author: Ivan R. Ivanov Date: 2024-08-21T23:15:19+09:00 New Revision: 5e5d819fa261a49a30deae95737563c807964ae5 URL: https://github.com/llvm/llvm-project/commit/5e5d819fa261a49a30deae95737563c807964ae5 DIFF: https://github.com/llvm/llvm-project/commit/5e5d819fa261a49a30deae95737563c807964ae5.diff LOG: Revert "[flang][NFC] Move OpenMP related passes into a separate directory (#1…" This reverts commit 87eeed1f0ebe57abffde560c25dd9829dc6038f3. Added: flang/lib/Optimizer/Transforms/OMPFunctionFiltering.cpp flang/lib/Optimizer/Transforms/OMPMapInfoFinalization.cpp flang/lib/Optimizer/Transforms/OMPMarkDeclareTarget.cpp Modified: flang/docs/OpenMP-declare-target.md flang/docs/OpenMP-descriptor-management.md flang/include/flang/Optimizer/CMakeLists.txt flang/include/flang/Optimizer/Transforms/Passes.td flang/include/flang/Tools/CLOptions.inc flang/lib/Frontend/CMakeLists.txt flang/lib/Optimizer/CMakeLists.txt flang/lib/Optimizer/Transforms/CMakeLists.txt flang/tools/bbc/CMakeLists.txt flang/tools/fir-opt/CMakeLists.txt flang/tools/fir-opt/fir-opt.cpp flang/tools/tco/CMakeLists.txt Removed: flang/include/flang/Optimizer/OpenMP/CMakeLists.txt flang/include/flang/Optimizer/OpenMP/Passes.h flang/include/flang/Optimizer/OpenMP/Passes.td flang/lib/Optimizer/OpenMP/CMakeLists.txt flang/lib/Optimizer/OpenMP/FunctionFiltering.cpp flang/lib/Optimizer/OpenMP/MapInfoFinalization.cpp flang/lib/Optimizer/OpenMP/MarkDeclareTarget.cpp diff --git a/flang/docs/OpenMP-declare-target.md b/flang/docs/OpenMP-declare-target.md index 45062469007b65..d29a46807e1eaf 100644 --- a/flang/docs/OpenMP-declare-target.md +++ b/flang/docs/OpenMP-declare-target.md @@ -149,7 +149,7 @@ flang/lib/Lower/OpenMP.cpp function `genDeclareTargetIntGlobal`. 
There are currently two passes within Flang that are related to the processing of `declare target`: -* `MarkDeclareTarget` - This pass is in charge of marking functions captured +* `OMPMarkDeclareTarget` - This pass is in charge of marking functions captured (called from) in `target` regions or other `declare target` marked functions as `declare target`. It does so recursively, i.e. nested calls will also be implicitly marked. It currently will try to mark things as conservatively as @@ -157,7 +157,7 @@ possible, e.g. if captured in a `target` region it will apply `nohost`, unless it encounters a `host` `declare target` in which case it will apply the `any` device type. Functions are handled similarly, except we utilise the parent's device type where possible. -* `FunctionFiltering` - This is executed after the `MarkDeclareTarget` +* `OMPFunctionFiltering` - This is executed after the `OMPMarkDeclareTarget` pass, and its job is to conservatively remove host functions from the module where possible when compiling for the device. This helps make sure that most incompatible code for the host is not lowered for the diff --git a/flang/docs/OpenMP-descriptor-management.md b/flang/docs/OpenMP-descriptor-management.md index 66c153914f70da..d0eb01b00f9bb9 100644 --- a/flang/docs/OpenMP-descriptor-management.md +++ b/flang/docs/OpenMP-descriptor-management.md @@ -44,7 +44,7 @@ Currently, Flang will lower these descriptor types in the OpenMP lowering (lower to all other map types, generating an omp.MapInfoOp containing relevant information required for lowering the OpenMP dialect to LLVM-IR during the final stages of the MLIR lowering. 
However, after the lowering to FIR/HLFIR has been performed an OpenMP dialect specific pass for Fortran, -`MapInfoFinalizationPass` (Optimizer/OpenMP/MapInfoFinalization.cpp) will expand the +`OMPMapInfoFinalizationPass` (Optimizer/OMPMapInfoFinalization.cpp) will expand the `omp.MapInfoOp`'s containing descriptors (which currently will be a `BoxType` or `BoxAddrOp`) into multiple mappings, with one extra per pointer member in the descriptor that is supported on top of the original descriptor map operation. These pointers members are linked to the parent descriptor by adding them to @@ -53,7 +53,7 @@ owning operation's (`omp.TargetOp`, `omp.TargetDataOp` etc.) map operand list an operation is `IsolatedFromAbove`, it also inserts them as `BlockArgs` to canonicalize the mappings and simplify lowering. -An example transformation by the `MapInfoFinalizationPass`: +An example transformation by the `OMPMapInfoFinalizationPass`: ``` diff --git a/flang/include/flang/Optimizer/CMakeLists.txt b/flang/include/flang/Optimizer/CMakeLists.txt index 3336ac935e1012..89e43a9ee8d621 100644 --- a/flang/include/flang/Optimizer/CMakeLists.txt +++ b/flang/include/flang/Optimizer/CMakeLists.txt @@ -2,4 +2,3 @@ add_subdirectory(CodeGen) add_subdirectory(Dialect) add_subdirectory(HLFIR) add_subdirectory(Transforms) -add_subdirectory(OpenMP) diff --git a/flang/incl
[llvm-branch-commits] [llvm] [ctx_prof] API to get the instrumentation of a BB (PR #105468)
https://github.com/mtrofin updated https://github.com/llvm/llvm-project/pull/105468 >From c5ee379ec43215d8268219ec3cfced3f7f730fc8 Mon Sep 17 00:00:00 2001 From: Mircea Trofin Date: Tue, 20 Aug 2024 21:09:16 -0700 Subject: [PATCH] [ctx_prof] API to get the instrumentation of a BB --- llvm/include/llvm/Analysis/CtxProfAnalysis.h | 5 + llvm/lib/Analysis/CtxProfAnalysis.cpp | 7 ++ .../Analysis/CtxProfAnalysisTest.cpp | 22 +++ 3 files changed, 34 insertions(+) diff --git a/llvm/include/llvm/Analysis/CtxProfAnalysis.h b/llvm/include/llvm/Analysis/CtxProfAnalysis.h index 23abcbe2c6e9d2..0b4dd8ae3a0dc7 100644 --- a/llvm/include/llvm/Analysis/CtxProfAnalysis.h +++ b/llvm/include/llvm/Analysis/CtxProfAnalysis.h @@ -95,7 +95,12 @@ class CtxProfAnalysis : public AnalysisInfoMixin { PGOContextualProfile run(Module &M, ModuleAnalysisManager &MAM); + /// Get the instruction instrumenting a callsite, or nullptr if that cannot be + /// found. static InstrProfCallsite *getCallsiteInstrumentation(CallBase &CB); + + /// Get the instruction instrumenting a BB, or nullptr if not present. 
+ static InstrProfIncrementInst *getBBInstrumentation(BasicBlock &BB); }; class CtxProfAnalysisPrinterPass diff --git a/llvm/lib/Analysis/CtxProfAnalysis.cpp b/llvm/lib/Analysis/CtxProfAnalysis.cpp index fffc8de2b36c8e..46daa4a4506189 100644 --- a/llvm/lib/Analysis/CtxProfAnalysis.cpp +++ b/llvm/lib/Analysis/CtxProfAnalysis.cpp @@ -202,6 +202,13 @@ InstrProfCallsite *CtxProfAnalysis::getCallsiteInstrumentation(CallBase &CB) { return nullptr; } +InstrProfIncrementInst *CtxProfAnalysis::getBBInstrumentation(BasicBlock &BB) { + for (auto &I : BB) +if (auto *Incr = dyn_cast(&I)) + return Incr; + return nullptr; +} + static void preorderVisit(const PGOCtxProfContext::CallTargetMapTy &Profiles, function_ref Visitor) { diff --git a/llvm/unittests/Analysis/CtxProfAnalysisTest.cpp b/llvm/unittests/Analysis/CtxProfAnalysisTest.cpp index 5f9bf3ec540eb3..fbe3a6e45109cc 100644 --- a/llvm/unittests/Analysis/CtxProfAnalysisTest.cpp +++ b/llvm/unittests/Analysis/CtxProfAnalysisTest.cpp @@ -132,4 +132,26 @@ TEST_F(CtxProfAnalysisTest, GetCallsiteIDNegativeTest) { EXPECT_EQ(IndIns, nullptr); } +TEST_F(CtxProfAnalysisTest, GetBBIDTest) { + ModulePassManager MPM; + MPM.addPass(PGOInstrumentationGen(PGOInstrumentationType::CTXPROF)); + EXPECT_FALSE(MPM.run(*M, MAM).areAllPreserved()); + auto *F = M->getFunction("foo"); + ASSERT_NE(F, nullptr); + std::map BBNameAndID; + + for (auto &BB : *F) { +auto *Ins = CtxProfAnalysis::getBBInstrumentation(BB); +if (Ins) + BBNameAndID[BB.getName().str()] = + static_cast(Ins->getIndex()->getZExtValue()); +else + BBNameAndID[BB.getName().str()] = -1; + } + + EXPECT_THAT(BBNameAndID, + testing::UnorderedElementsAre( + testing::Pair("", 0), testing::Pair("yes", 1), + testing::Pair("no", -1), testing::Pair("exit", -1))); +} } // namespace ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
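The new getBBInstrumentation helper is a first-match scan over a block's instructions. A minimal sketch of that pattern, using plain RTTI in place of LLVM's dyn_cast and toy class names rather than the real IR hierarchy:

```cpp
#include <cassert>
#include <memory>
#include <vector>

struct Instruction {
  virtual ~Instruction() = default;
};

// Toy counterpart of llvm.instrprof.increment carrying its counter index.
struct InstrProfIncrement : Instruction {
  int Index;
  explicit InstrProfIncrement(int I) : Index(I) {}
};

using BasicBlock = std::vector<std::unique_ptr<Instruction>>;

// Return the first instrumentation instruction in BB, or nullptr if the
// block carries none -- mirroring the loop added in CtxProfAnalysis.cpp.
InstrProfIncrement *getBBInstrumentation(BasicBlock &BB) {
  for (auto &I : BB)
    if (auto *Incr = dynamic_cast<InstrProfIncrement *>(I.get()))
      return Incr;
  return nullptr;
}
```

The unit test in the patch leans on exactly this nullptr contract: blocks such as `no` and `exit` that carry no increment map to -1 in the test's name-to-ID table.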
[llvm-branch-commits] [llvm] [ctx_prof] Add support for ICP (PR #105469)
https://github.com/mtrofin updated https://github.com/llvm/llvm-project/pull/105469 >From 1edbc3bed4cf6c2726394a346891409d5f548537 Mon Sep 17 00:00:00 2001 From: Mircea Trofin Date: Tue, 20 Aug 2024 21:32:23 -0700 Subject: [PATCH] [ctx_prof] Add support for ICP --- llvm/include/llvm/Analysis/CtxProfAnalysis.h | 18 +- llvm/include/llvm/IR/IntrinsicInst.h | 2 + .../llvm/ProfileData/PGOCtxProfReader.h | 20 ++ .../Transforms/Utils/CallPromotionUtils.h | 4 + llvm/lib/Analysis/CtxProfAnalysis.cpp | 80 +--- llvm/lib/IR/IntrinsicInst.cpp | 10 + .../Transforms/Utils/CallPromotionUtils.cpp | 86 + .../Utils/CallPromotionUtilsTest.cpp | 178 ++ 8 files changed, 365 insertions(+), 33 deletions(-) diff --git a/llvm/include/llvm/Analysis/CtxProfAnalysis.h b/llvm/include/llvm/Analysis/CtxProfAnalysis.h index 0b4dd8ae3a0dc7..d6c2bb26a091af 100644 --- a/llvm/include/llvm/Analysis/CtxProfAnalysis.h +++ b/llvm/include/llvm/Analysis/CtxProfAnalysis.h @@ -73,6 +73,12 @@ class PGOContextualProfile { return FuncInfo.find(getDefinedFunctionGUID(F))->second.NextCallsiteIndex++; } + using ConstVisitor = function_ref; + using Visitor = function_ref; + + void update(Visitor, const Function *F = nullptr); + void visit(ConstVisitor, const Function *F = nullptr) const; + const CtxProfFlatProfile flatten() const; bool invalidate(Module &, const PreservedAnalyses &PA, @@ -105,13 +111,18 @@ class CtxProfAnalysis : public AnalysisInfoMixin { class CtxProfAnalysisPrinterPass : public PassInfoMixin { - raw_ostream &OS; - public: - explicit CtxProfAnalysisPrinterPass(raw_ostream &OS) : OS(OS) {} + enum class PrintMode { Everything, JSON }; + explicit CtxProfAnalysisPrinterPass(raw_ostream &OS, + PrintMode Mode = PrintMode::Everything) + : OS(OS), Mode(Mode) {} PreservedAnalyses run(Module &M, ModuleAnalysisManager &MAM); static bool isRequired() { return true; } + +private: + raw_ostream &OS; + const PrintMode Mode; }; /// Assign a GUID to functions as metadata. 
GUID calculation takes linkage into @@ -134,6 +145,5 @@ class AssignGUIDPass : public PassInfoMixin { // This should become GlobalValue::getGUID static uint64_t getGUID(const Function &F); }; - } // namespace llvm #endif // LLVM_ANALYSIS_CTXPROFANALYSIS_H diff --git a/llvm/include/llvm/IR/IntrinsicInst.h b/llvm/include/llvm/IR/IntrinsicInst.h index 2f1e2c08c3ecec..bab41efab528e2 100644 --- a/llvm/include/llvm/IR/IntrinsicInst.h +++ b/llvm/include/llvm/IR/IntrinsicInst.h @@ -1519,6 +1519,7 @@ class InstrProfCntrInstBase : public InstrProfInstBase { ConstantInt *getNumCounters() const; // The index of the counter that this instruction acts on. ConstantInt *getIndex() const; + void setIndex(uint32_t Idx); }; /// This represents the llvm.instrprof.cover intrinsic. @@ -1569,6 +1570,7 @@ class InstrProfCallsite : public InstrProfCntrInstBase { return isa(V) && classof(cast(V)); } Value *getCallee() const; + void setCallee(Value *); }; /// This represents the llvm.instrprof.timestamp intrinsic. 
diff --git a/llvm/include/llvm/ProfileData/PGOCtxProfReader.h b/llvm/include/llvm/ProfileData/PGOCtxProfReader.h index 190deaeeacd085..23dcc376508b39 100644 --- a/llvm/include/llvm/ProfileData/PGOCtxProfReader.h +++ b/llvm/include/llvm/ProfileData/PGOCtxProfReader.h @@ -57,9 +57,23 @@ class PGOCtxProfContext final { GlobalValue::GUID guid() const { return GUID; } const SmallVectorImpl &counters() const { return Counters; } + SmallVectorImpl &counters() { return Counters; } + + uint64_t getEntrycount() const { return Counters[0]; } + const CallsiteMapTy &callsites() const { return Callsites; } CallsiteMapTy &callsites() { return Callsites; } + void ingestContext(uint32_t CSId, PGOCtxProfContext &&Other) { +auto [Iter, _] = callsites().try_emplace(CSId, CallTargetMapTy()); +Iter->second.emplace(Other.guid(), std::move(Other)); + } + + void growCounters(uint32_t Size) { +if (Size >= Counters.size()) + Counters.resize(Size); + } + bool hasCallsite(uint32_t I) const { return Callsites.find(I) != Callsites.end(); } @@ -68,6 +82,12 @@ class PGOCtxProfContext final { assert(hasCallsite(I) && "Callsite not found"); return Callsites.find(I)->second; } + + CallTargetMapTy &callsite(uint32_t I) { +assert(hasCallsite(I) && "Callsite not found"); +return Callsites.find(I)->second; + } + void getContainedGuids(DenseSet &Guids) const; }; diff --git a/llvm/include/llvm/Transforms/Utils/CallPromotionUtils.h b/llvm/include/llvm/Transforms/Utils/CallPromotionUtils.h index 385831f457038d..58af26f31417b0 100644 --- a/llvm/include/llvm/Transforms/Utils/CallPromotionUtils.h +++ b/llvm/include/llvm/Transforms/Utils/CallPromotionUtils.h @@ -14,6 +14,7 @@ #ifndef LLVM_TRANSFORMS_UTILS_CALLPROMOTIONUTILS_H #defin
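Two of the small PGOCtxProfContext additions above are worth spelling out: growCounters only ever widens the counter vector, and the entry count is by convention counter 0. A hedged sketch, with std::vector standing in for the SmallVector the real class uses:

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Widen Counters so that indices up to Size - 1 are addressable; never
// shrink. Note the patch guards with `Size >= Counters.size()`, so an
// equal size triggers a harmless no-op resize.
void growCounters(std::vector<uint64_t> &Counters, uint32_t Size) {
  if (Size >= Counters.size())
    Counters.resize(Size);
}

// The entry count is, by convention, counter 0.
uint64_t getEntrycount(const std::vector<uint64_t> &Counters) {
  return Counters[0];
}
```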
[llvm-branch-commits] [libcxx] [libc++] Implement std::move_only_function (P0288R9) (PR #94670)
https://github.com/ldionne edited https://github.com/llvm/llvm-project/pull/94670 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [Driver] Add -Wa, options -mmapsyms={default,implicit} (PR #104542)
@@ -7131,6 +7131,8 @@ def massembler_fatal_warnings : Flag<["-"], "massembler-fatal-warnings">, def crel : Flag<["--"], "crel">, HelpText<"Enable CREL relocation format (ELF only)">, MarshallingInfoFlag>; +def mmapsyms_implicit : Flag<["-"], "mmapsyms=implicit">, smithp35 wrote: I think it would be useful to have similar help text to llvm-mc (https://github.com/llvm/llvm-project/pull/99718/files#diff-e84c9aa6b25b1a4fe2047de3a32ab330e945d2944b14451d310e4b706a39cbafR140) https://github.com/llvm/llvm-project/pull/104542 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [Driver] Add -Wa, options -mmapsyms={default,implicit} (PR #104542)
https://github.com/smithp35 edited https://github.com/llvm/llvm-project/pull/104542 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [Driver] Add -Wa, options -mmapsyms={default,implicit} (PR #104542)
https://github.com/smithp35 commented: I think we could do with some help text. Otherwise code changes look good. https://github.com/llvm/llvm-project/pull/104542 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [AMDGPU] GFX12 VMEM instructions can write VGPR results out of order (PR #105549)
https://github.com/jayfoad created https://github.com/llvm/llvm-project/pull/105549 Fix SIInsertWaitcnts to account for this by adding extra waits to avoid WAW dependencies. >From 9a2103df4094af38f59e1adce5414b94672e6d6e Mon Sep 17 00:00:00 2001 From: Jay Foad Date: Wed, 21 Aug 2024 16:23:49 +0100 Subject: [PATCH] [AMDGPU] GFX12 VMEM instructions can write VGPR results out of order Fix SIInsertWaitcnts to account for this by adding extra waits to avoid WAW dependencies. --- llvm/lib/Target/AMDGPU/AMDGPU.td | 23 ++- llvm/lib/Target/AMDGPU/GCNSubtarget.h | 3 +++ llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp | 7 +++--- .../buffer-fat-pointer-atomicrmw-fadd.ll | 3 +++ .../buffer-fat-pointer-atomicrmw-fmax.ll | 5 .../buffer-fat-pointer-atomicrmw-fmin.ll | 5 amdgcn.struct.buffer.load.format.v3f16.ll | 1 + llvm/test/CodeGen/AMDGPU/load-constant-i16.ll | 10 +++- llvm/test/CodeGen/AMDGPU/load-global-i16.ll | 10 llvm/test/CodeGen/AMDGPU/load-global-i32.ll | 2 ++ .../AMDGPU/spill-csr-frame-ptr-reg-copy.ll| 1 + .../CodeGen/AMDGPU/waitcnt-vmcnt-loop.mir | 8 +++ 12 files changed, 64 insertions(+), 14 deletions(-) diff --git a/llvm/lib/Target/AMDGPU/AMDGPU.td b/llvm/lib/Target/AMDGPU/AMDGPU.td index 7906e0ee9d7858..9efdbd751d96e3 100644 --- a/llvm/lib/Target/AMDGPU/AMDGPU.td +++ b/llvm/lib/Target/AMDGPU/AMDGPU.td @@ -953,6 +953,12 @@ def FeatureRequiredExportPriority : SubtargetFeature<"required-export-priority", "Export priority must be explicitly manipulated on GFX11.5" >; +def FeatureVmemWriteVgprInOrder : SubtargetFeature<"vmem-write-vgpr-in-order", + "HasVmemWriteVgprInOrder", + "true", + "VMEM instructions of the same type write VGPR results in order" +>; + //======// // Subtarget Features (options and debugging) //======// @@ -1123,7 +1129,8 @@ def FeatureSouthernIslands : GCNSubtargetFeatureGeneration<"SOUTHERN_ISLANDS", FeatureDsSrc2Insts, FeatureLDSBankCount32, FeatureMovrel, FeatureTrigReducedRange, FeatureExtendedImageInsts, FeatureImageInsts, FeatureGDS, FeatureGWS, 
FeatureDefaultComponentZero, - FeatureAtomicFMinFMaxF32GlobalInsts, FeatureAtomicFMinFMaxF64GlobalInsts + FeatureAtomicFMinFMaxF32GlobalInsts, FeatureAtomicFMinFMaxF64GlobalInsts, + FeatureVmemWriteVgprInOrder ] >; @@ -1136,7 +1143,8 @@ def FeatureSeaIslands : GCNSubtargetFeatureGeneration<"SEA_ISLANDS", FeatureDsSrc2Insts, FeatureExtendedImageInsts, FeatureUnalignedBufferAccess, FeatureImageInsts, FeatureGDS, FeatureGWS, FeatureDefaultComponentZero, FeatureAtomicFMinFMaxF32GlobalInsts, FeatureAtomicFMinFMaxF64GlobalInsts, - FeatureAtomicFMinFMaxF32FlatInsts, FeatureAtomicFMinFMaxF64FlatInsts + FeatureAtomicFMinFMaxF32FlatInsts, FeatureAtomicFMinFMaxF64FlatInsts, + FeatureVmemWriteVgprInOrder ] >; @@ -1152,7 +1160,7 @@ def FeatureVolcanicIslands : GCNSubtargetFeatureGeneration<"VOLCANIC_ISLANDS", FeatureGFX7GFX8GFX9Insts, FeatureSMemTimeInst, FeatureMadMacF32Insts, FeatureDsSrc2Insts, FeatureExtendedImageInsts, FeatureFastDenormalF32, FeatureUnalignedBufferAccess, FeatureImageInsts, FeatureGDS, FeatureGWS, - FeatureDefaultComponentZero + FeatureDefaultComponentZero, FeatureVmemWriteVgprInOrder ] >; @@ -1170,7 +1178,8 @@ def FeatureGFX9 : GCNSubtargetFeatureGeneration<"GFX9", FeatureScalarFlatScratchInsts, FeatureScalarAtomics, FeatureR128A16, FeatureA16, FeatureSMemTimeInst, FeatureFastDenormalF32, FeatureSupportsXNACK, FeatureUnalignedBufferAccess, FeatureUnalignedDSAccess, - FeatureNegativeScratchOffsetBug, FeatureGWS, FeatureDefaultComponentZero + FeatureNegativeScratchOffsetBug, FeatureGWS, FeatureDefaultComponentZero, + FeatureVmemWriteVgprInOrder ] >; @@ -1193,7 +1202,8 @@ def FeatureGFX10 : GCNSubtargetFeatureGeneration<"GFX10", FeatureGDS, FeatureGWS, FeatureDefaultComponentZero, FeatureMaxHardClauseLength63, FeatureAtomicFMinFMaxF32GlobalInsts, FeatureAtomicFMinFMaxF64GlobalInsts, - FeatureAtomicFMinFMaxF32FlatInsts, FeatureAtomicFMinFMaxF64FlatInsts + FeatureAtomicFMinFMaxF32FlatInsts, FeatureAtomicFMinFMaxF64FlatInsts, + FeatureVmemWriteVgprInOrder ] >; 
@@ -1215,7 +1225,8 @@ def FeatureGFX11 : GCNSubtargetFeatureGeneration<"GFX11", FeatureUnalignedBufferAccess, FeatureUnalignedDSAccess, FeatureGDS, FeatureGWS, FeatureDefaultComponentZero, FeatureMaxHardClauseLength32, - FeatureAtomicFMinFMaxF32GlobalInsts, FeatureAtomicFMinFMaxF32FlatInsts + FeatureAtomicFMinFMaxF32GlobalInsts, FeatureAtomicFMinFMaxF32FlatInsts, + FeatureVmemWriteVgprInOrder ] >; diff --git a/llvm/lib/Target/AMDGPU/GCNSubtarget.h b/llvm/lib/Target/AMDGPU/GCNSubtarget.h index 902f51ae358d59..9386bcf0d74b22 100644 --- a/llvm/lib/Target/AMDGPU/GCNSubtarget.h +++ b/llvm/lib/Target/AMDGPU
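The pattern in the patch above is a generation-level subtarget feature: every generation through GFX11 adds FeatureVmemWriteVgprInOrder, and GFX12 — whose VMEM instructions can retire VGPR writes out of order — omits it. A simplified model of the resulting predicate (the enum and struct are illustrative; the real plumbing is TableGen-generated):

```cpp
#include <cassert>

enum class Generation { SouthernIslands, SeaIslands, VolcanicIslands,
                        GFX9, GFX10, GFX11, GFX12 };

struct GCNSubtargetModel {
  Generation Gen;
  // All generations listed in the patch set the feature; GFX12 does not.
  bool hasVmemWriteVgprInOrder() const { return Gen != Generation::GFX12; }
};
```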
[llvm-branch-commits] [llvm] [AMDGPU] Remove one case of vmcnt loop header flushing for GFX12 (PR #105550)
https://github.com/jayfoad created https://github.com/llvm/llvm-project/pull/105550

When a loop contains a VMEM load whose result is only used outside the loop, do not bother to flush vmcnt in the loop head on GFX12. A wait for vmcnt will be required inside the loop anyway, because VMEM instructions can write their VGPR results out of order.

>From e53f75835dd0f0fc9d11b17afbe40de9b4a8a35b Mon Sep 17 00:00:00 2001
From: Jay Foad
Date: Wed, 21 Aug 2024 16:57:24 +0100
Subject: [PATCH] [AMDGPU] Remove one case of vmcnt loop header flushing for
 GFX12

When a loop contains a VMEM load whose result is only used outside the
loop, do not bother to flush vmcnt in the loop head on GFX12. A wait for
vmcnt will be required inside the loop anyway, because VMEM instructions
can write their VGPR results out of order.
---
 llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp     |  2 +-
 llvm/test/CodeGen/AMDGPU/waitcnt-vmcnt-loop.mir | 10 +-
 2 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp b/llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp
index 4262e7b5d9c25..eafe20be17d5b 100644
--- a/llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp
+++ b/llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp
@@ -2390,7 +2390,7 @@ bool SIInsertWaitcnts::shouldFlushVmCnt(MachineLoop *ML,
   }
   if (!ST->hasVscnt() && HasVMemStore && !HasVMemLoad && UsesVgprLoadedOutside)
     return true;
-  return HasVMemLoad && UsesVgprLoadedOutside;
+  return HasVMemLoad && UsesVgprLoadedOutside && ST->hasVmemWriteVgprInOrder();
 }
 
 bool SIInsertWaitcnts::runOnMachineFunction(MachineFunction &MF) {
diff --git a/llvm/test/CodeGen/AMDGPU/waitcnt-vmcnt-loop.mir b/llvm/test/CodeGen/AMDGPU/waitcnt-vmcnt-loop.mir
index bdef55ab956a0..0ddd2aa285b26 100644
--- a/llvm/test/CodeGen/AMDGPU/waitcnt-vmcnt-loop.mir
+++ b/llvm/test/CodeGen/AMDGPU/waitcnt-vmcnt-loop.mir
@@ -295,7 +295,7 @@ body: |
 # GFX12-LABEL: waitcnt_vm_loop2
 # GFX12-LABEL: bb.0:
 # GFX12: BUFFER_LOAD_FORMAT_X_IDXEN
-# GFX12: S_WAIT_LOADCNT 0
+# GFX12-NOT: S_WAIT_LOADCNT 0
 # GFX12-LABEL: bb.1:
 # GFX12: S_WAIT_LOADCNT 0
 # GFX12-LABEL: bb.2:
@@ -342,7 +342,7 @@ body: |
 # GFX12-LABEL: waitcnt_vm_loop2_store
 # GFX12-LABEL: bb.0:
 # GFX12: BUFFER_LOAD_FORMAT_X_IDXEN
-# GFX12: S_WAIT_LOADCNT 0
+# GFX12-NOT: S_WAIT_LOADCNT 0
 # GFX12-LABEL: bb.1:
 # GFX12: S_WAIT_LOADCNT 0
 # GFX12-LABEL: bb.2:
@@ -499,9 +499,9 @@ body: |
 # GFX12-LABEL: waitcnt_vm_loop2_reginterval
 # GFX12-LABEL: bb.0:
 # GFX12: GLOBAL_LOAD_DWORDX4
-# GFX12: S_WAIT_LOADCNT 0
-# GFX12-LABEL: bb.1:
 # GFX12-NOT: S_WAIT_LOADCNT 0
+# GFX12-LABEL: bb.1:
+# GFX12: S_WAIT_LOADCNT 0
 # GFX12-LABEL: bb.2:
 name:    waitcnt_vm_loop2_reginterval
 body:    |
@@ -600,7 +600,7 @@ body: |
 # GFX12-LABEL: bb.0:
 # GFX12: BUFFER_LOAD_FORMAT_X_IDXEN
 # GFX12: BUFFER_LOAD_FORMAT_X_IDXEN
-# GFX12: S_WAIT_LOADCNT 0
+# GFX12-NOT: S_WAIT_LOADCNT 0
 # GFX12-LABEL: bb.1:
 # GFX12: S_WAIT_LOADCNT 0
 # GFX12-LABEL: bb.2:

___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
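Extracted as a pure function, the changed tail condition of shouldFlushVmCnt reads as below — a sketch of the boolean logic only, ignoring the earlier store-only special case:

```cpp
#include <cassert>

// On subtargets where same-type VMEM instructions retire their VGPR
// writes in order, flushing vmcnt in the loop preheader pays off; on
// GFX12 it does not, because an in-loop wait is required regardless.
bool shouldFlushVmCnt(bool hasVMemLoad, bool usesVgprLoadedOutside,
                      bool hasVmemWriteVgprInOrder) {
  return hasVMemLoad && usesVgprLoadedOutside && hasVmemWriteVgprInOrder;
}
```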
[llvm-branch-commits] [llvm] [AMDGPU] GFX12 VMEM instructions can write VGPR results out of order (PR #105549)
jayfoad wrote:

> [!WARNING]
> This pull request is not mergeable via GitHub because a downstack PR is open. Once all requirements are satisfied, merge this PR as a stack on Graphite. Learn more: https://graphite.dev/docs/merge-pull-requests

* **#105550**
* **#105549** 👈 (this PR)
* **#105548**
* `main`

This stack of pull requests is managed by Graphite. Learn more about stacking: https://stacking.dev/

https://github.com/llvm/llvm-project/pull/105549
[llvm-branch-commits] [llvm] [AMDGPU] Remove one case of vmcnt loop header flushing for GFX12 (PR #105550)
jayfoad wrote:

> [!WARNING]
> This pull request is not mergeable via GitHub because a downstack PR is open. Once all requirements are satisfied, merge this PR as a stack on Graphite. Learn more: https://graphite.dev/docs/merge-pull-requests

* **#105550** 👈 (this PR)
* **#105549**
* **#105548**
* `main`

This stack of pull requests is managed by Graphite. Learn more about stacking: https://stacking.dev/

https://github.com/llvm/llvm-project/pull/105550
[llvm-branch-commits] [llvm] [AMDGPU] GFX12 VMEM instructions can write VGPR results out of order (PR #105549)
https://github.com/jayfoad ready_for_review https://github.com/llvm/llvm-project/pull/105549 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [AMDGPU] Remove one case of vmcnt loop header flushing for GFX12 (PR #105550)
https://github.com/jayfoad ready_for_review https://github.com/llvm/llvm-project/pull/105550 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [AMDGPU] GFX12 VMEM instructions can write VGPR results out of order (PR #105549)
llvmbot wrote:

@llvm/pr-subscribers-backend-amdgpu

Author: Jay Foad (jayfoad)

Changes: Fix SIInsertWaitcnts to account for this by adding extra waits to avoid WAW dependencies.

Patch is 22.41 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/105549.diff

12 Files Affected:
- (modified) llvm/lib/Target/AMDGPU/AMDGPU.td (+17-6)
- (modified) llvm/lib/Target/AMDGPU/GCNSubtarget.h (+3)
- (modified) llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp (+4-3)
- (modified) llvm/test/CodeGen/AMDGPU/buffer-fat-pointer-atomicrmw-fadd.ll (+3)
- (modified) llvm/test/CodeGen/AMDGPU/buffer-fat-pointer-atomicrmw-fmax.ll (+5)
- (modified) llvm/test/CodeGen/AMDGPU/buffer-fat-pointer-atomicrmw-fmin.ll (+5)
- (modified) llvm/test/CodeGen/AMDGPU/llvm.amdgcn.struct.buffer.load.format.v3f16.ll (+1)
- (modified) llvm/test/CodeGen/AMDGPU/load-constant-i16.ll (+9-1)
- (modified) llvm/test/CodeGen/AMDGPU/load-global-i16.ll (+10)
- (modified) llvm/test/CodeGen/AMDGPU/load-global-i32.ll (+2)
- (modified) llvm/test/CodeGen/AMDGPU/spill-csr-frame-ptr-reg-copy.ll (+1)
- (modified) llvm/test/CodeGen/AMDGPU/waitcnt-vmcnt-loop.mir (+4-4)

https://github.com/llvm/llvm-project/pull/105549
[llvm-branch-commits] [llvm] [AMDGPU] Remove one case of vmcnt loop header flushing for GFX12 (PR #105550)
llvmbot wrote:

@llvm/pr-subscribers-backend-amdgpu

Author: Jay Foad (jayfoad)

Changes: When a loop contains a VMEM load whose result is only used outside the loop, do not bother to flush vmcnt in the loop head on GFX12. A wait for vmcnt will be required inside the loop anyway, because VMEM instructions can write their VGPR results out of order.

Full diff: https://github.com/llvm/llvm-project/pull/105550.diff

2 Files Affected:
- (modified) llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp (+1-1)
- (modified) llvm/test/CodeGen/AMDGPU/waitcnt-vmcnt-loop.mir (+5-5)

https://github.com/llvm/llvm-project/pull/105550
[llvm-branch-commits] [clang] [llvm] [LLVM][Coroutines] Create `.noalloc` variant of switch ABI coroutine ramp functions during CoroSplit (PR #99283)
@@ -1967,22 +2047,13 @@ splitCoroutine(Function &F, SmallVectorImpl &Clones, for (DbgVariableRecord *DVR : DbgVariableRecords) coro::salvageDebugInfo(ArgToAllocaMap, *DVR, Shape.OptimizeFrame, false /*UseEntryValue*/); - return Shape; -} -/// Remove calls to llvm.coro.end in the original function. -static void removeCoroEndsFromRampFunction(const coro::Shape &Shape) { - if (Shape.ABI != coro::ABI::Switch) { -for (auto *End : Shape.CoroEnds) { - replaceCoroEnd(End, Shape, Shape.FramePtr, /*in resume*/ false, nullptr); -} - } else { -for (llvm::AnyCoroEndInst *End : Shape.CoroEnds) { - auto &Context = End->getContext(); - End->replaceAllUsesWith(ConstantInt::getFalse(Context)); - End->eraseFromParent(); -} + removeCoroEndsFromRampFunction(Shape); + + if (!isNoSuspendCoroutine && Shape.ABI == coro::ABI::Switch) { yuxuanchen1997 wrote: This turned out to be easy. I am addressing this with the next push for this patch. https://github.com/llvm/llvm-project/pull/99283
[llvm-branch-commits] [llvm] [mlir] [OpenMP]Update use_device_clause lowering (PR #101707)
@@ -2193,80 +2197,141 @@ llvm::Value *getSizeInBytes(DataLayout &dl, const mlir::Type &type, return builder.getInt64(dl.getTypeSizeInBits(type) / 8); } -void collectMapDataFromMapVars(MapInfoData &mapData, - llvm::SmallVectorImpl &mapVars, - LLVM::ModuleTranslation &moduleTranslation, - DataLayout &dl, llvm::IRBuilderBase &builder) { - for (mlir::Value mapValue : mapVars) { -if (auto mapOp = mlir::dyn_cast_if_present( -mapValue.getDefiningOp())) { - mlir::Value offloadPtr = - mapOp.getVarPtrPtr() ? mapOp.getVarPtrPtr() : mapOp.getVarPtr(); - mapData.OriginalValue.push_back( - moduleTranslation.lookupValue(offloadPtr)); - mapData.Pointers.push_back(mapData.OriginalValue.back()); - - if (llvm::Value *refPtr = - getRefPtrIfDeclareTarget(offloadPtr, - moduleTranslation)) { // declare target -mapData.IsDeclareTarget.push_back(true); -mapData.BasePointers.push_back(refPtr); - } else { // regular mapped variable -mapData.IsDeclareTarget.push_back(false); -mapData.BasePointers.push_back(mapData.OriginalValue.back()); - } +static void collectMapDataFromMapOperands( +MapInfoData &mapData, SmallVectorImpl &mapVars, +LLVM::ModuleTranslation &moduleTranslation, DataLayout &dl, +llvm::IRBuilderBase &builder, const ArrayRef &useDevPtrOperands = {}, +const ArrayRef &useDevAddrOperands = {}) { + // Process MapOperands + for (Value mapValue : mapVars) { +auto mapOp = cast(mapValue.getDefiningOp()); +Value offloadPtr = +mapOp.getVarPtrPtr() ? 
mapOp.getVarPtrPtr() : mapOp.getVarPtr(); +mapData.OriginalValue.push_back(moduleTranslation.lookupValue(offloadPtr)); +mapData.Pointers.push_back(mapData.OriginalValue.back()); + +if (llvm::Value *refPtr = +getRefPtrIfDeclareTarget(offloadPtr, + moduleTranslation)) { // declare target + mapData.IsDeclareTarget.push_back(true); + mapData.BasePointers.push_back(refPtr); +} else { // regular mapped variable + mapData.IsDeclareTarget.push_back(false); + mapData.BasePointers.push_back(mapData.OriginalValue.back()); +} - mapData.BaseType.push_back( - moduleTranslation.convertType(mapOp.getVarType())); - mapData.Sizes.push_back( - getSizeInBytes(dl, mapOp.getVarType(), mapOp, mapData.Pointers.back(), - mapData.BaseType.back(), builder, moduleTranslation)); - mapData.MapClause.push_back(mapOp.getOperation()); - mapData.Types.push_back( - llvm::omp::OpenMPOffloadMappingFlags(mapOp.getMapType().value())); - mapData.Names.push_back(LLVM::createMappingInformation( - mapOp.getLoc(), *moduleTranslation.getOpenMPBuilder())); - mapData.DevicePointers.push_back( - llvm::OpenMPIRBuilder::DeviceInfoTy::None); - - // Check if this is a member mapping and correctly assign that it is, if - // it is a member of a larger object. - // TODO: Need better handling of members, and distinguishing of members - // that are implicitly allocated on device vs explicitly passed in as - // arguments. - // TODO: May require some further additions to support nested record - // types, i.e. member maps that can have member maps. 
- mapData.IsAMember.push_back(false); - for (mlir::Value mapValue : mapVars) { -if (auto map = mlir::dyn_cast_if_present( -mapValue.getDefiningOp())) { - for (auto member : map.getMembers()) { -if (member == mapOp) { - mapData.IsAMember.back() = true; -} +mapData.BaseType.push_back( +moduleTranslation.convertType(mapOp.getVarType())); +mapData.Sizes.push_back( +getSizeInBytes(dl, mapOp.getVarType(), mapOp, mapData.Pointers.back(), + mapData.BaseType.back(), builder, moduleTranslation)); +mapData.MapClause.push_back(mapOp.getOperation()); +mapData.Types.push_back( +llvm::omp::OpenMPOffloadMappingFlags(mapOp.getMapType().value())); +mapData.Names.push_back(LLVM::createMappingInformation( +mapOp.getLoc(), *moduleTranslation.getOpenMPBuilder())); + mapData.DevicePointers.push_back(llvm::OpenMPIRBuilder::DeviceInfoTy::None); +mapData.IsAMapping.push_back(true); + +// Check if this is a member mapping and correctly assign that it is, if +// it is a member of a larger object. +// TODO: Need better handling of members, and distinguishing of members +// that are implicitly allocated on device vs explicitly passed in as +// arguments. +// TODO: May require some further additions to support nested record +// types, i.e. member maps that can have member maps. +mapData.IsAMember.push_back(false); +for (Valu
[llvm-branch-commits] [llvm] [mlir] [OpenMP]Update use_device_clause lowering (PR #101707)
@@ -2193,80 +2197,141 @@ llvm::Value *getSizeInBytes(DataLayout &dl, const mlir::Type &type, return builder.getInt64(dl.getTypeSizeInBits(type) / 8); } -void collectMapDataFromMapVars(MapInfoData &mapData, - llvm::SmallVectorImpl &mapVars, - LLVM::ModuleTranslation &moduleTranslation, - DataLayout &dl, llvm::IRBuilderBase &builder) { - for (mlir::Value mapValue : mapVars) { -if (auto mapOp = mlir::dyn_cast_if_present( -mapValue.getDefiningOp())) { - mlir::Value offloadPtr = - mapOp.getVarPtrPtr() ? mapOp.getVarPtrPtr() : mapOp.getVarPtr(); - mapData.OriginalValue.push_back( - moduleTranslation.lookupValue(offloadPtr)); - mapData.Pointers.push_back(mapData.OriginalValue.back()); - - if (llvm::Value *refPtr = - getRefPtrIfDeclareTarget(offloadPtr, - moduleTranslation)) { // declare target -mapData.IsDeclareTarget.push_back(true); -mapData.BasePointers.push_back(refPtr); - } else { // regular mapped variable -mapData.IsDeclareTarget.push_back(false); -mapData.BasePointers.push_back(mapData.OriginalValue.back()); - } +static void collectMapDataFromMapOperands( +MapInfoData &mapData, SmallVectorImpl &mapVars, +LLVM::ModuleTranslation &moduleTranslation, DataLayout &dl, +llvm::IRBuilderBase &builder, const ArrayRef &useDevPtrOperands = {}, +const ArrayRef &useDevAddrOperands = {}) { + // Process MapOperands + for (Value mapValue : mapVars) { +auto mapOp = cast(mapValue.getDefiningOp()); TIFitis wrote: Yes, I've replaced the others with cast as well. https://github.com/llvm/llvm-project/pull/101707
[llvm-branch-commits] [llvm] [mlir] [OpenMP]Update use_device_clause lowering (PR #101707)
@@ -6357,7 +6357,7 @@ OpenMPIRBuilder::InsertPointTy OpenMPIRBuilder::createTargetData( // Disable TargetData CodeGen on Device pass. if (Config.IsTargetDevice.value_or(false)) { if (BodyGenCB) - Builder.restoreIP(BodyGenCB(Builder.saveIP(), BodyGenTy::NoPriv)); + Builder.restoreIP(BodyGenCB(CodeGenIP, BodyGenTy::NoPriv)); TIFitis wrote: It's because `CodeGenIP` hasn't been restored by the `Builder` at this point. Instead of passing `CodeGenIP`, I've moved the `restoreIP` call upward. https://github.com/llvm/llvm-project/pull/101707
[llvm-branch-commits] [llvm] [ctx_prof] API to get the instrumentation of a BB (PR #105468)
https://github.com/mtrofin updated https://github.com/llvm/llvm-project/pull/105468 >From f81d31c3311690826bdc1f5c83fc45b4864de035 Mon Sep 17 00:00:00 2001 From: Mircea Trofin Date: Tue, 20 Aug 2024 21:09:16 -0700 Subject: [PATCH] [ctx_prof] API to get the instrumentation of a BB --- llvm/include/llvm/Analysis/CtxProfAnalysis.h | 5 + llvm/lib/Analysis/CtxProfAnalysis.cpp | 7 ++ .../Analysis/CtxProfAnalysisTest.cpp | 22 +++ 3 files changed, 34 insertions(+) diff --git a/llvm/include/llvm/Analysis/CtxProfAnalysis.h b/llvm/include/llvm/Analysis/CtxProfAnalysis.h index 23abcbe2c6e9d2..0b4dd8ae3a0dc7 100644 --- a/llvm/include/llvm/Analysis/CtxProfAnalysis.h +++ b/llvm/include/llvm/Analysis/CtxProfAnalysis.h @@ -95,7 +95,12 @@ class CtxProfAnalysis : public AnalysisInfoMixin { PGOContextualProfile run(Module &M, ModuleAnalysisManager &MAM); + /// Get the instruction instrumenting a callsite, or nullptr if that cannot be + /// found. static InstrProfCallsite *getCallsiteInstrumentation(CallBase &CB); + + /// Get the instruction instrumenting a BB, or nullptr if not present. 
+ static InstrProfIncrementInst *getBBInstrumentation(BasicBlock &BB); }; class CtxProfAnalysisPrinterPass diff --git a/llvm/lib/Analysis/CtxProfAnalysis.cpp b/llvm/lib/Analysis/CtxProfAnalysis.cpp index ceebb2cf06d235..3fc1bc34afb97e 100644 --- a/llvm/lib/Analysis/CtxProfAnalysis.cpp +++ b/llvm/lib/Analysis/CtxProfAnalysis.cpp @@ -202,6 +202,13 @@ InstrProfCallsite *CtxProfAnalysis::getCallsiteInstrumentation(CallBase &CB) { return nullptr; } +InstrProfIncrementInst *CtxProfAnalysis::getBBInstrumentation(BasicBlock &BB) { + for (auto &I : BB) +if (auto *Incr = dyn_cast(&I)) + return Incr; + return nullptr; +} + static void preorderVisit(const PGOCtxProfContext::CallTargetMapTy &Profiles, function_ref Visitor) { diff --git a/llvm/unittests/Analysis/CtxProfAnalysisTest.cpp b/llvm/unittests/Analysis/CtxProfAnalysisTest.cpp index 5f9bf3ec540eb3..fbe3a6e45109cc 100644 --- a/llvm/unittests/Analysis/CtxProfAnalysisTest.cpp +++ b/llvm/unittests/Analysis/CtxProfAnalysisTest.cpp @@ -132,4 +132,26 @@ TEST_F(CtxProfAnalysisTest, GetCallsiteIDNegativeTest) { EXPECT_EQ(IndIns, nullptr); } +TEST_F(CtxProfAnalysisTest, GetBBIDTest) { + ModulePassManager MPM; + MPM.addPass(PGOInstrumentationGen(PGOInstrumentationType::CTXPROF)); + EXPECT_FALSE(MPM.run(*M, MAM).areAllPreserved()); + auto *F = M->getFunction("foo"); + ASSERT_NE(F, nullptr); + std::map BBNameAndID; + + for (auto &BB : *F) { +auto *Ins = CtxProfAnalysis::getBBInstrumentation(BB); +if (Ins) + BBNameAndID[BB.getName().str()] = + static_cast(Ins->getIndex()->getZExtValue()); +else + BBNameAndID[BB.getName().str()] = -1; + } + + EXPECT_THAT(BBNameAndID, + testing::UnorderedElementsAre( + testing::Pair("", 0), testing::Pair("yes", 1), + testing::Pair("no", -1), testing::Pair("exit", -1))); +} } // namespace
[llvm-branch-commits] [llvm] [ctx_prof] Add support for ICP (PR #105469)
https://github.com/mtrofin updated https://github.com/llvm/llvm-project/pull/105469 >From de6d88788d35cfeace3f694008d446e8175421a0 Mon Sep 17 00:00:00 2001 From: Mircea Trofin Date: Tue, 20 Aug 2024 21:32:23 -0700 Subject: [PATCH] [ctx_prof] Add support for ICP --- llvm/include/llvm/Analysis/CtxProfAnalysis.h | 18 +- llvm/include/llvm/IR/IntrinsicInst.h | 2 + .../llvm/ProfileData/PGOCtxProfReader.h | 20 ++ .../Transforms/Utils/CallPromotionUtils.h | 4 + llvm/lib/Analysis/CtxProfAnalysis.cpp | 79 +--- llvm/lib/IR/IntrinsicInst.cpp | 10 + .../Transforms/Utils/CallPromotionUtils.cpp | 86 + .../Utils/CallPromotionUtilsTest.cpp | 178 ++ 8 files changed, 364 insertions(+), 33 deletions(-) diff --git a/llvm/include/llvm/Analysis/CtxProfAnalysis.h b/llvm/include/llvm/Analysis/CtxProfAnalysis.h index 0b4dd8ae3a0dc7..d6c2bb26a091af 100644 --- a/llvm/include/llvm/Analysis/CtxProfAnalysis.h +++ b/llvm/include/llvm/Analysis/CtxProfAnalysis.h @@ -73,6 +73,12 @@ class PGOContextualProfile { return FuncInfo.find(getDefinedFunctionGUID(F))->second.NextCallsiteIndex++; } + using ConstVisitor = function_ref; + using Visitor = function_ref; + + void update(Visitor, const Function *F = nullptr); + void visit(ConstVisitor, const Function *F = nullptr) const; + const CtxProfFlatProfile flatten() const; bool invalidate(Module &, const PreservedAnalyses &PA, @@ -105,13 +111,18 @@ class CtxProfAnalysis : public AnalysisInfoMixin { class CtxProfAnalysisPrinterPass : public PassInfoMixin { - raw_ostream &OS; - public: - explicit CtxProfAnalysisPrinterPass(raw_ostream &OS) : OS(OS) {} + enum class PrintMode { Everything, JSON }; + explicit CtxProfAnalysisPrinterPass(raw_ostream &OS, + PrintMode Mode = PrintMode::Everything) + : OS(OS), Mode(Mode) {} PreservedAnalyses run(Module &M, ModuleAnalysisManager &MAM); static bool isRequired() { return true; } + +private: + raw_ostream &OS; + const PrintMode Mode; }; /// Assign a GUID to functions as metadata. 
GUID calculation takes linkage into @@ -134,6 +145,5 @@ class AssignGUIDPass : public PassInfoMixin { // This should become GlobalValue::getGUID static uint64_t getGUID(const Function &F); }; - } // namespace llvm #endif // LLVM_ANALYSIS_CTXPROFANALYSIS_H diff --git a/llvm/include/llvm/IR/IntrinsicInst.h b/llvm/include/llvm/IR/IntrinsicInst.h index 2f1e2c08c3ecec..bab41efab528e2 100644 --- a/llvm/include/llvm/IR/IntrinsicInst.h +++ b/llvm/include/llvm/IR/IntrinsicInst.h @@ -1519,6 +1519,7 @@ class InstrProfCntrInstBase : public InstrProfInstBase { ConstantInt *getNumCounters() const; // The index of the counter that this instruction acts on. ConstantInt *getIndex() const; + void setIndex(uint32_t Idx); }; /// This represents the llvm.instrprof.cover intrinsic. @@ -1569,6 +1570,7 @@ class InstrProfCallsite : public InstrProfCntrInstBase { return isa(V) && classof(cast(V)); } Value *getCallee() const; + void setCallee(Value *); }; /// This represents the llvm.instrprof.timestamp intrinsic. 
diff --git a/llvm/include/llvm/ProfileData/PGOCtxProfReader.h b/llvm/include/llvm/ProfileData/PGOCtxProfReader.h index 190deaeeacd085..23dcc376508b39 100644 --- a/llvm/include/llvm/ProfileData/PGOCtxProfReader.h +++ b/llvm/include/llvm/ProfileData/PGOCtxProfReader.h @@ -57,9 +57,23 @@ class PGOCtxProfContext final { GlobalValue::GUID guid() const { return GUID; } const SmallVectorImpl &counters() const { return Counters; } + SmallVectorImpl &counters() { return Counters; } + + uint64_t getEntrycount() const { return Counters[0]; } + const CallsiteMapTy &callsites() const { return Callsites; } CallsiteMapTy &callsites() { return Callsites; } + void ingestContext(uint32_t CSId, PGOCtxProfContext &&Other) { +auto [Iter, _] = callsites().try_emplace(CSId, CallTargetMapTy()); +Iter->second.emplace(Other.guid(), std::move(Other)); + } + + void growCounters(uint32_t Size) { +if (Size >= Counters.size()) + Counters.resize(Size); + } + bool hasCallsite(uint32_t I) const { return Callsites.find(I) != Callsites.end(); } @@ -68,6 +82,12 @@ class PGOCtxProfContext final { assert(hasCallsite(I) && "Callsite not found"); return Callsites.find(I)->second; } + + CallTargetMapTy &callsite(uint32_t I) { +assert(hasCallsite(I) && "Callsite not found"); +return Callsites.find(I)->second; + } + void getContainedGuids(DenseSet &Guids) const; }; diff --git a/llvm/include/llvm/Transforms/Utils/CallPromotionUtils.h b/llvm/include/llvm/Transforms/Utils/CallPromotionUtils.h index 385831f457038d..58af26f31417b0 100644 --- a/llvm/include/llvm/Transforms/Utils/CallPromotionUtils.h +++ b/llvm/include/llvm/Transforms/Utils/CallPromotionUtils.h @@ -14,6 +14,7 @@ #ifndef LLVM_TRANSFORMS_UTILS_CALLPROMOTIONUTILS_H #defin
[llvm-branch-commits] [clang] [Serialization] Code cleanups and polish 83233 (PR #83237)
ilya-biryukov wrote: Sorry for disappearing, I had a holiday, vacation and some unplanned interruptions over the last week and the start of this week. I have made really good progress and the amount of code that I need to dig through is quite manageable. I should have a repro for you this week. https://github.com/llvm/llvm-project/pull/83237
[llvm-branch-commits] [llvm] [mlir] [OpenMP]Update use_device_clause lowering (PR #101707)
https://github.com/TIFitis updated https://github.com/llvm/llvm-project/pull/101707 >From 547b339b175fa996eef8d45c5df8a73967ee94c2 Mon Sep 17 00:00:00 2001 From: Akash Banerjee Date: Fri, 2 Aug 2024 17:11:21 +0100 Subject: [PATCH 1/3] [OpenMP]Update use_device_clause lowering This patch updates the use_device_ptr and use_device_addr clauses to use the mapInfoOps for lowering. This allows all the types that are handle by the map clauses such as derived types to also be supported by the use_device_clauses. This is patch 2/2 in a series of patches. --- llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp | 2 +- .../OpenMP/OpenMPToLLVMIRTranslation.cpp | 284 ++ mlir/test/Target/LLVMIR/omptarget-llvm.mlir | 16 +- .../openmp-target-use-device-nested.mlir | 27 ++ 4 files changed, 194 insertions(+), 135 deletions(-) create mode 100644 mlir/test/Target/LLVMIR/openmp-target-use-device-nested.mlir diff --git a/llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp b/llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp index 83fec194d73904..f5d94069ad6f4c 100644 --- a/llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp +++ b/llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp @@ -6357,7 +6357,7 @@ OpenMPIRBuilder::InsertPointTy OpenMPIRBuilder::createTargetData( // Disable TargetData CodeGen on Device pass. 
if (Config.IsTargetDevice.value_or(false)) { if (BodyGenCB) - Builder.restoreIP(BodyGenCB(Builder.saveIP(), BodyGenTy::NoPriv)); + Builder.restoreIP(BodyGenCB(CodeGenIP, BodyGenTy::NoPriv)); return Builder.saveIP(); } diff --git a/mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp b/mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp index 458d05d5059db7..78c460c50cbe5e 100644 --- a/mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp +++ b/mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp @@ -2110,6 +2110,8 @@ getRefPtrIfDeclareTarget(mlir::Value value, struct MapInfoData : llvm::OpenMPIRBuilder::MapInfosTy { llvm::SmallVector IsDeclareTarget; llvm::SmallVector IsAMember; + // Identify if mapping was added by mapClause or use_device clauses. + llvm::SmallVector IsAMapping; llvm::SmallVector MapClause; llvm::SmallVector OriginalValue; // Stripped off array/pointer to get the underlying @@ -2193,62 +2195,125 @@ llvm::Value *getSizeInBytes(DataLayout &dl, const mlir::Type &type, return builder.getInt64(dl.getTypeSizeInBits(type) / 8); } -void collectMapDataFromMapVars(MapInfoData &mapData, - llvm::SmallVectorImpl &mapVars, - LLVM::ModuleTranslation &moduleTranslation, - DataLayout &dl, llvm::IRBuilderBase &builder) { +void collectMapDataFromMapOperands( +MapInfoData &mapData, llvm::SmallVectorImpl &mapVars, +LLVM::ModuleTranslation &moduleTranslation, DataLayout &dl, +llvm::IRBuilderBase &builder, +const llvm::ArrayRef &useDevPtrOperands = {}, +const llvm::ArrayRef &useDevAddrOperands = {}) { + // Process MapOperands for (mlir::Value mapValue : mapVars) { -if (auto mapOp = mlir::dyn_cast_if_present( -mapValue.getDefiningOp())) { - mlir::Value offloadPtr = - mapOp.getVarPtrPtr() ? 
mapOp.getVarPtrPtr() : mapOp.getVarPtr(); - mapData.OriginalValue.push_back( - moduleTranslation.lookupValue(offloadPtr)); - mapData.Pointers.push_back(mapData.OriginalValue.back()); - - if (llvm::Value *refPtr = - getRefPtrIfDeclareTarget(offloadPtr, - moduleTranslation)) { // declare target -mapData.IsDeclareTarget.push_back(true); -mapData.BasePointers.push_back(refPtr); - } else { // regular mapped variable -mapData.IsDeclareTarget.push_back(false); -mapData.BasePointers.push_back(mapData.OriginalValue.back()); - } +auto mapOp = mlir::cast(mapValue.getDefiningOp()); +mlir::Value offloadPtr = +mapOp.getVarPtrPtr() ? mapOp.getVarPtrPtr() : mapOp.getVarPtr(); +mapData.OriginalValue.push_back(moduleTranslation.lookupValue(offloadPtr)); +mapData.Pointers.push_back(mapData.OriginalValue.back()); + +if (llvm::Value *refPtr = +getRefPtrIfDeclareTarget(offloadPtr, + moduleTranslation)) { // declare target + mapData.IsDeclareTarget.push_back(true); + mapData.BasePointers.push_back(refPtr); +} else { // regular mapped variable + mapData.IsDeclareTarget.push_back(false); + mapData.BasePointers.push_back(mapData.OriginalValue.back()); +} - mapData.BaseType.push_back( - moduleTranslation.convertType(mapOp.getVarType())); - mapData.Sizes.push_back( - getSizeInBytes(dl, mapOp.getVarType(), mapOp, mapData.Pointers.back(), - mapData.BaseType.back(), builder, moduleTranslation)); - mapData.MapClause.push_back(mapOp.getO
[llvm-branch-commits] [llvm] [mlir] [OpenMP]Update use_device_clause lowering (PR #101707)
TIFitis wrote: @skatrak Thanks for the review, I've addressed the comments in the latest revision. https://github.com/llvm/llvm-project/pull/101707
[llvm-branch-commits] [llvm] [mlir] [OpenMP]Update use_device_clause lowering (PR #101707)
@@ -2439,7 +2504,7 @@ static llvm::omp::OpenMPOffloadMappingFlags mapParentWithMembers( // data by the descriptor (which itself, is a structure containing // runtime information on the dynamically allocated data). auto parentClause = - llvm::cast(mapData.MapClause[mapDataIndex]); + llvm::cast(mapData.MapClause[mapDataIndex]); TIFitis wrote: Agreed, I'll create an NFC PR to address this. https://github.com/llvm/llvm-project/pull/101707
[llvm-branch-commits] [Driver] Add -Wa, options -mmapsyms={default, implicit} (PR #104542)
https://github.com/MaskRay updated https://github.com/llvm/llvm-project/pull/104542
[llvm-branch-commits] [Driver] Add -Wa, options -mmapsyms={default, implicit} (PR #104542)
@@ -7131,6 +7131,8 @@ def massembler_fatal_warnings : Flag<["-"], "massembler-fatal-warnings">, def crel : Flag<["--"], "crel">, HelpText<"Enable CREL relocation format (ELF only)">, MarshallingInfoFlag>; +def mmapsyms_implicit : Flag<["-"], "mmapsyms=implicit">, MaskRay wrote: Thx. Added help message for clang -cc1 and clang -cc1as https://github.com/llvm/llvm-project/pull/104542
[llvm-branch-commits] [llvm] [ctx_prof] Add support for ICP (PR #105469)
https://github.com/mtrofin updated https://github.com/llvm/llvm-project/pull/105469 >From d58d308957961ae7442a7b5aa0561f42dea69caf Mon Sep 17 00:00:00 2001 From: Mircea Trofin Date: Tue, 20 Aug 2024 21:32:23 -0700 Subject: [PATCH] [ctx_prof] Add support for ICP --- llvm/include/llvm/Analysis/CtxProfAnalysis.h | 18 +- llvm/include/llvm/IR/IntrinsicInst.h | 2 + .../llvm/ProfileData/PGOCtxProfReader.h | 20 ++ .../Transforms/Utils/CallPromotionUtils.h | 4 + llvm/lib/Analysis/CtxProfAnalysis.cpp | 79 +--- llvm/lib/IR/IntrinsicInst.cpp | 10 + .../Transforms/Utils/CallPromotionUtils.cpp | 86 + .../Utils/CallPromotionUtilsTest.cpp | 178 ++ 8 files changed, 364 insertions(+), 33 deletions(-) diff --git a/llvm/include/llvm/Analysis/CtxProfAnalysis.h b/llvm/include/llvm/Analysis/CtxProfAnalysis.h index 0b4dd8ae3a0dc7..d6c2bb26a091af 100644 --- a/llvm/include/llvm/Analysis/CtxProfAnalysis.h +++ b/llvm/include/llvm/Analysis/CtxProfAnalysis.h @@ -73,6 +73,12 @@ class PGOContextualProfile { return FuncInfo.find(getDefinedFunctionGUID(F))->second.NextCallsiteIndex++; } + using ConstVisitor = function_ref; + using Visitor = function_ref; + + void update(Visitor, const Function *F = nullptr); + void visit(ConstVisitor, const Function *F = nullptr) const; + const CtxProfFlatProfile flatten() const; bool invalidate(Module &, const PreservedAnalyses &PA, @@ -105,13 +111,18 @@ class CtxProfAnalysis : public AnalysisInfoMixin { class CtxProfAnalysisPrinterPass : public PassInfoMixin { - raw_ostream &OS; - public: - explicit CtxProfAnalysisPrinterPass(raw_ostream &OS) : OS(OS) {} + enum class PrintMode { Everything, JSON }; + explicit CtxProfAnalysisPrinterPass(raw_ostream &OS, + PrintMode Mode = PrintMode::Everything) + : OS(OS), Mode(Mode) {} PreservedAnalyses run(Module &M, ModuleAnalysisManager &MAM); static bool isRequired() { return true; } + +private: + raw_ostream &OS; + const PrintMode Mode; }; /// Assign a GUID to functions as metadata. 
GUID calculation takes linkage into @@ -134,6 +145,5 @@ class AssignGUIDPass : public PassInfoMixin { // This should become GlobalValue::getGUID static uint64_t getGUID(const Function &F); }; - } // namespace llvm #endif // LLVM_ANALYSIS_CTXPROFANALYSIS_H diff --git a/llvm/include/llvm/IR/IntrinsicInst.h b/llvm/include/llvm/IR/IntrinsicInst.h index 2f1e2c08c3ecec..bab41efab528e2 100644 --- a/llvm/include/llvm/IR/IntrinsicInst.h +++ b/llvm/include/llvm/IR/IntrinsicInst.h @@ -1519,6 +1519,7 @@ class InstrProfCntrInstBase : public InstrProfInstBase { ConstantInt *getNumCounters() const; // The index of the counter that this instruction acts on. ConstantInt *getIndex() const; + void setIndex(uint32_t Idx); }; /// This represents the llvm.instrprof.cover intrinsic. @@ -1569,6 +1570,7 @@ class InstrProfCallsite : public InstrProfCntrInstBase { return isa(V) && classof(cast(V)); } Value *getCallee() const; + void setCallee(Value *); }; /// This represents the llvm.instrprof.timestamp intrinsic. 
diff --git a/llvm/include/llvm/ProfileData/PGOCtxProfReader.h b/llvm/include/llvm/ProfileData/PGOCtxProfReader.h index 190deaeeacd085..23dcc376508b39 100644 --- a/llvm/include/llvm/ProfileData/PGOCtxProfReader.h +++ b/llvm/include/llvm/ProfileData/PGOCtxProfReader.h @@ -57,9 +57,23 @@ class PGOCtxProfContext final { GlobalValue::GUID guid() const { return GUID; } const SmallVectorImpl &counters() const { return Counters; } + SmallVectorImpl &counters() { return Counters; } + + uint64_t getEntrycount() const { return Counters[0]; } + const CallsiteMapTy &callsites() const { return Callsites; } CallsiteMapTy &callsites() { return Callsites; } + void ingestContext(uint32_t CSId, PGOCtxProfContext &&Other) { +auto [Iter, _] = callsites().try_emplace(CSId, CallTargetMapTy()); +Iter->second.emplace(Other.guid(), std::move(Other)); + } + + void growCounters(uint32_t Size) { +if (Size >= Counters.size()) + Counters.resize(Size); + } + bool hasCallsite(uint32_t I) const { return Callsites.find(I) != Callsites.end(); } @@ -68,6 +82,12 @@ class PGOCtxProfContext final { assert(hasCallsite(I) && "Callsite not found"); return Callsites.find(I)->second; } + + CallTargetMapTy &callsite(uint32_t I) { +assert(hasCallsite(I) && "Callsite not found"); +return Callsites.find(I)->second; + } + void getContainedGuids(DenseSet &Guids) const; }; diff --git a/llvm/include/llvm/Transforms/Utils/CallPromotionUtils.h b/llvm/include/llvm/Transforms/Utils/CallPromotionUtils.h index 385831f457038d..58af26f31417b0 100644 --- a/llvm/include/llvm/Transforms/Utils/CallPromotionUtils.h +++ b/llvm/include/llvm/Transforms/Utils/CallPromotionUtils.h @@ -14,6 +14,7 @@ #ifndef LLVM_TRANSFORMS_UTILS_CALLPROMOTIONUTILS_H #defin
[llvm-branch-commits] [llvm] [ctx_prof] Add support for ICP (PR #105469)
https://github.com/mtrofin updated https://github.com/llvm/llvm-project/pull/105469 >From 0d7c720e67a0213565f0e7c141c4ffa1b91fc5b9 Mon Sep 17 00:00:00 2001 From: Mircea Trofin Date: Tue, 20 Aug 2024 21:09:16 -0700 Subject: [PATCH 1/2] [ctx_prof] API to get the instrumentation of a BB --- llvm/include/llvm/Analysis/CtxProfAnalysis.h | 5 + llvm/lib/Analysis/CtxProfAnalysis.cpp | 7 ++ .../Analysis/CtxProfAnalysisTest.cpp | 22 +++ 3 files changed, 34 insertions(+) diff --git a/llvm/include/llvm/Analysis/CtxProfAnalysis.h b/llvm/include/llvm/Analysis/CtxProfAnalysis.h index 23abcbe2c6e9d2..0b4dd8ae3a0dc7 100644 --- a/llvm/include/llvm/Analysis/CtxProfAnalysis.h +++ b/llvm/include/llvm/Analysis/CtxProfAnalysis.h @@ -95,7 +95,12 @@ class CtxProfAnalysis : public AnalysisInfoMixin { PGOContextualProfile run(Module &M, ModuleAnalysisManager &MAM); + /// Get the instruction instrumenting a callsite, or nullptr if that cannot be + /// found. static InstrProfCallsite *getCallsiteInstrumentation(CallBase &CB); + + /// Get the instruction instrumenting a BB, or nullptr if not present. 
+ static InstrProfIncrementInst *getBBInstrumentation(BasicBlock &BB); }; class CtxProfAnalysisPrinterPass diff --git a/llvm/lib/Analysis/CtxProfAnalysis.cpp b/llvm/lib/Analysis/CtxProfAnalysis.cpp index ceebb2cf06d235..3fc1bc34afb97e 100644 --- a/llvm/lib/Analysis/CtxProfAnalysis.cpp +++ b/llvm/lib/Analysis/CtxProfAnalysis.cpp @@ -202,6 +202,13 @@ InstrProfCallsite *CtxProfAnalysis::getCallsiteInstrumentation(CallBase &CB) { return nullptr; } +InstrProfIncrementInst *CtxProfAnalysis::getBBInstrumentation(BasicBlock &BB) { + for (auto &I : BB) +if (auto *Incr = dyn_cast(&I)) + return Incr; + return nullptr; +} + static void preorderVisit(const PGOCtxProfContext::CallTargetMapTy &Profiles, function_ref Visitor) { diff --git a/llvm/unittests/Analysis/CtxProfAnalysisTest.cpp b/llvm/unittests/Analysis/CtxProfAnalysisTest.cpp index 5f9bf3ec540eb3..fbe3a6e45109cc 100644 --- a/llvm/unittests/Analysis/CtxProfAnalysisTest.cpp +++ b/llvm/unittests/Analysis/CtxProfAnalysisTest.cpp @@ -132,4 +132,26 @@ TEST_F(CtxProfAnalysisTest, GetCallsiteIDNegativeTest) { EXPECT_EQ(IndIns, nullptr); } +TEST_F(CtxProfAnalysisTest, GetBBIDTest) { + ModulePassManager MPM; + MPM.addPass(PGOInstrumentationGen(PGOInstrumentationType::CTXPROF)); + EXPECT_FALSE(MPM.run(*M, MAM).areAllPreserved()); + auto *F = M->getFunction("foo"); + ASSERT_NE(F, nullptr); + std::map BBNameAndID; + + for (auto &BB : *F) { +auto *Ins = CtxProfAnalysis::getBBInstrumentation(BB); +if (Ins) + BBNameAndID[BB.getName().str()] = + static_cast(Ins->getIndex()->getZExtValue()); +else + BBNameAndID[BB.getName().str()] = -1; + } + + EXPECT_THAT(BBNameAndID, + testing::UnorderedElementsAre( + testing::Pair("", 0), testing::Pair("yes", 1), + testing::Pair("no", -1), testing::Pair("exit", -1))); +} } // namespace >From 61e37e3e1657a7e85e9df2f77feb6957c304851a Mon Sep 17 00:00:00 2001 From: Mircea Trofin Date: Tue, 20 Aug 2024 21:32:23 -0700 Subject: [PATCH 2/2] [ctx_prof] Add support for ICP --- 
llvm/include/llvm/Analysis/CtxProfAnalysis.h | 18 +- llvm/include/llvm/IR/IntrinsicInst.h | 2 + .../llvm/ProfileData/PGOCtxProfReader.h | 20 ++ .../Transforms/Utils/CallPromotionUtils.h | 4 + llvm/lib/Analysis/CtxProfAnalysis.cpp | 79 +--- llvm/lib/IR/IntrinsicInst.cpp | 10 + .../Transforms/Utils/CallPromotionUtils.cpp | 86 + .../Utils/CallPromotionUtilsTest.cpp | 178 ++ 8 files changed, 364 insertions(+), 33 deletions(-) diff --git a/llvm/include/llvm/Analysis/CtxProfAnalysis.h b/llvm/include/llvm/Analysis/CtxProfAnalysis.h index 0b4dd8ae3a0dc7..d6c2bb26a091af 100644 --- a/llvm/include/llvm/Analysis/CtxProfAnalysis.h +++ b/llvm/include/llvm/Analysis/CtxProfAnalysis.h @@ -73,6 +73,12 @@ class PGOContextualProfile { return FuncInfo.find(getDefinedFunctionGUID(F))->second.NextCallsiteIndex++; } + using ConstVisitor = function_ref; + using Visitor = function_ref; + + void update(Visitor, const Function *F = nullptr); + void visit(ConstVisitor, const Function *F = nullptr) const; + const CtxProfFlatProfile flatten() const; bool invalidate(Module &, const PreservedAnalyses &PA, @@ -105,13 +111,18 @@ class CtxProfAnalysis : public AnalysisInfoMixin { class CtxProfAnalysisPrinterPass : public PassInfoMixin { - raw_ostream &OS; - public: - explicit CtxProfAnalysisPrinterPass(raw_ostream &OS) : OS(OS) {} + enum class PrintMode { Everything, JSON }; + explicit CtxProfAnalysisPrinterPass(raw_ostream &OS, + PrintMode Mode = PrintMode::Everything) + : OS(OS), Mode(Mode) {} PreservedAnalyses run(Module &M, ModuleAnalysisManager &MAM);
[llvm-branch-commits] [mlir] [mlir][sparse] refactoring sparse_tensor.iterate lowering pattern implementation. (PR #105566)
https://github.com/PeimingLiu created https://github.com/llvm/llvm-project/pull/105566 [mlir][sparse] refactoring sparse_tensor.iterate lowering pattern implementation. >From 1a32495b27dfd003408dd5b4f12f3db7f0b73b5a Mon Sep 17 00:00:00 2001 From: Peiming Liu Date: Thu, 15 Aug 2024 18:10:25 + Subject: [PATCH] [mlir][sparse] refactoring sparse_tensor.iterate lowering pattern implementation. --- .../Transforms/SparseIterationToScf.cpp | 118 ++ 1 file changed, 36 insertions(+), 82 deletions(-) diff --git a/mlir/lib/Dialect/SparseTensor/Transforms/SparseIterationToScf.cpp b/mlir/lib/Dialect/SparseTensor/Transforms/SparseIterationToScf.cpp index d6c0da4a9e4573..f7fcabb0220b50 100644 --- a/mlir/lib/Dialect/SparseTensor/Transforms/SparseIterationToScf.cpp +++ b/mlir/lib/Dialect/SparseTensor/Transforms/SparseIterationToScf.cpp @@ -244,88 +244,41 @@ class SparseIterateOpConverter : public OneToNOpConversionPattern { std::unique_ptr it = iterSpace.extractIterator(rewriter, loc); -if (it->iteratableByFor()) { - auto [lo, hi] = it->genForCond(rewriter, loc); - Value step = constantIndex(rewriter, loc, 1); - SmallVector ivs; - for (ValueRange inits : adaptor.getInitArgs()) -llvm::append_range(ivs, inits); - scf::ForOp forOp = rewriter.create(loc, lo, hi, step, ivs); - - Block *loopBody = op.getBody(); - OneToNTypeMapping bodyTypeMapping(loopBody->getArgumentTypes()); - if (failed(typeConverter->convertSignatureArgs( - loopBody->getArgumentTypes(), bodyTypeMapping))) -return failure(); - rewriter.applySignatureConversion(loopBody, bodyTypeMapping); - - rewriter.eraseBlock(forOp.getBody()); - Region &dstRegion = forOp.getRegion(); - rewriter.inlineRegionBefore(op.getRegion(), dstRegion, dstRegion.end()); - - auto yieldOp = - llvm::cast(forOp.getBody()->getTerminator()); - - rewriter.setInsertionPointToEnd(forOp.getBody()); - // replace sparse_tensor.yield with scf.yield. 
- rewriter.create(loc, yieldOp.getResults()); - rewriter.eraseOp(yieldOp); - - const OneToNTypeMapping &resultMapping = adaptor.getResultMapping(); - rewriter.replaceOp(op, forOp.getResults(), resultMapping); -} else { - SmallVector ivs; - // TODO: put iterator at the end of argument list to be consistent with - // coiterate operation. - llvm::append_range(ivs, it->getCursor()); - for (ValueRange inits : adaptor.getInitArgs()) -llvm::append_range(ivs, inits); - - assert(llvm::all_of(ivs, [](Value v) { return v != nullptr; })); - - TypeRange types = ValueRange(ivs).getTypes(); - auto whileOp = rewriter.create(loc, types, ivs); - SmallVector l(types.size(), op.getIterator().getLoc()); - - // Generates loop conditions. - Block *before = rewriter.createBlock(&whileOp.getBefore(), {}, types, l); - rewriter.setInsertionPointToStart(before); - ValueRange bArgs = before->getArguments(); - auto [whileCond, remArgs] = it->genWhileCond(rewriter, loc, bArgs); - assert(remArgs.size() == adaptor.getInitArgs().size()); - rewriter.create(loc, whileCond, before->getArguments()); - - // Generates loop body. - Block *loopBody = op.getBody(); - OneToNTypeMapping bodyTypeMapping(loopBody->getArgumentTypes()); - if (failed(typeConverter->convertSignatureArgs( - loopBody->getArgumentTypes(), bodyTypeMapping))) -return failure(); - rewriter.applySignatureConversion(loopBody, bodyTypeMapping); - - Region &dstRegion = whileOp.getAfter(); - // TODO: handle uses of coordinate! - rewriter.inlineRegionBefore(op.getRegion(), dstRegion, dstRegion.end()); - ValueRange aArgs = whileOp.getAfterArguments(); - auto yieldOp = llvm::cast( - whileOp.getAfterBody()->getTerminator()); - - rewriter.setInsertionPointToEnd(whileOp.getAfterBody()); +SmallVector ivs; +for (ValueRange inits : adaptor.getInitArgs()) + llvm::append_range(ivs, inits); + +// Type conversion on iterate op block. 
+OneToNTypeMapping blockTypeMapping(op.getBody()->getArgumentTypes()); +if (failed(typeConverter->convertSignatureArgs( +op.getBody()->getArgumentTypes(), blockTypeMapping))) + return rewriter.notifyMatchFailure( + op, "failed to convert iterate region argurment types"); +rewriter.applySignatureConversion(op.getBody(), blockTypeMapping); + +Block *block = op.getBody(); +ValueRange ret = genLoopWithIterator( +rewriter, loc, it.get(), ivs, /*iterFirst=*/true, +[block](PatternRewriter &rewriter, Location loc, Region &loopBody, +SparseIterator *it, ValueRange reduc) -> SmallVector { + SmallVector blockArgs(it->getCursor()); + // TODO: Also appends coordinates if used. + // blockArgs.push_back(it->deref(rewriter, loc)); + llvm::a
[llvm-branch-commits] [mlir] [mlir][sparse] unify block arguments order between iterate/coiterate operations. (PR #105567)
https://github.com/PeimingLiu created https://github.com/llvm/llvm-project/pull/105567 [mlir][sparse] unify block arguments order between iterate/coiterate operations. >From 6fd099fb7039f8fda37d50f1c44cd7afd62cafb7 Mon Sep 17 00:00:00 2001 From: Peiming Liu Date: Thu, 15 Aug 2024 21:10:37 + Subject: [PATCH] [mlir][sparse] unify block arguments order between iterate/coiterate operations. --- .../SparseTensor/IR/SparseTensorOps.td| 7 ++-- .../SparseTensor/IR/SparseTensorDialect.cpp | 31 .../Transforms/SparseIterationToScf.cpp | 36 ++- 3 files changed, 31 insertions(+), 43 deletions(-) diff --git a/mlir/include/mlir/Dialect/SparseTensor/IR/SparseTensorOps.td b/mlir/include/mlir/Dialect/SparseTensor/IR/SparseTensorOps.td index 20512f972e67cd..96a61419a541f7 100644 --- a/mlir/include/mlir/Dialect/SparseTensor/IR/SparseTensorOps.td +++ b/mlir/include/mlir/Dialect/SparseTensor/IR/SparseTensorOps.td @@ -1644,7 +1644,7 @@ def IterateOp : SparseTensor_Op<"iterate", return getIterSpace().getType().getSpaceDim(); } BlockArgument getIterator() { - return getRegion().getArguments().front(); + return getRegion().getArguments().back(); } std::optional getLvlCrd(Level lvl) { if (getCrdUsedLvls()[lvl]) { @@ -1654,9 +1654,8 @@ def IterateOp : SparseTensor_Op<"iterate", return std::nullopt; } Block::BlockArgListType getCrds() { - // The first block argument is iterator, the remaining arguments are - // referenced coordinates. - return getRegion().getArguments().slice(1, getCrdUsedLvls().count()); + // User-provided iteration arguments -> coords -> iterator. 
+ return getRegion().getArguments().slice(getNumRegionIterArgs(), getCrdUsedLvls().count()); } unsigned getNumRegionIterArgs() { return getRegion().getArguments().size() - 1 - getCrdUsedLvls().count(); diff --git a/mlir/lib/Dialect/SparseTensor/IR/SparseTensorDialect.cpp b/mlir/lib/Dialect/SparseTensor/IR/SparseTensorDialect.cpp index 16856b958d4f13..b21bc1a93036c4 100644 --- a/mlir/lib/Dialect/SparseTensor/IR/SparseTensorDialect.cpp +++ b/mlir/lib/Dialect/SparseTensor/IR/SparseTensorDialect.cpp @@ -2228,9 +2228,10 @@ parseSparseIterateLoop(OpAsmParser &parser, OperationState &state, parser.getNameLoc(), "mismatch in number of sparse iterators and sparse spaces"); - if (failed(parseUsedCoordList(parser, state, blockArgs))) + SmallVector coords; + if (failed(parseUsedCoordList(parser, state, coords))) return failure(); - size_t numCrds = blockArgs.size(); + size_t numCrds = coords.size(); // Parse "iter_args(%arg = %init, ...)" bool hasIterArgs = succeeded(parser.parseOptionalKeyword("iter_args")); @@ -2238,6 +2239,8 @@ parseSparseIterateLoop(OpAsmParser &parser, OperationState &state, if (parser.parseAssignmentList(blockArgs, initArgs)) return failure(); + blockArgs.append(coords); + SmallVector iterSpaceTps; // parse ": sparse_tensor.iter_space -> ret" if (parser.parseColon() || parser.parseTypeList(iterSpaceTps)) @@ -2267,7 +2270,7 @@ parseSparseIterateLoop(OpAsmParser &parser, OperationState &state, if (hasIterArgs) { // Strip off leading args that used for coordinates. 
-MutableArrayRef args = MutableArrayRef(blockArgs).drop_front(numCrds); +MutableArrayRef args = MutableArrayRef(blockArgs).drop_back(numCrds); if (args.size() != initArgs.size() || args.size() != state.types.size()) { return parser.emitError( parser.getNameLoc(), @@ -2448,18 +2451,18 @@ void IterateOp::build(OpBuilder &builder, OperationState &odsState, odsState.addTypes(initArgs.getTypes()); Block *bodyBlock = builder.createBlock(bodyRegion); - // First argument, sparse iterator - bodyBlock->addArgument( - llvm::cast(iterSpace.getType()).getIteratorType(), - odsState.location); + // Starts with a list of user-provided loop arguments. + for (Value v : initArgs) +bodyBlock->addArgument(v.getType(), v.getLoc()); - // Followed by a list of used coordinates. + // Follows by a list of used coordinates. for (unsigned i = 0, e = crdUsedLvls.count(); i < e; i++) bodyBlock->addArgument(builder.getIndexType(), odsState.location); - // Followed by a list of user-provided loop arguments. - for (Value v : initArgs) -bodyBlock->addArgument(v.getType(), v.getLoc()); + // Ends with sparse iterator + bodyBlock->addArgument( + llvm::cast(iterSpace.getType()).getIteratorType(), + odsState.location); } ParseResult IterateOp::parse(OpAsmParser &parser, OperationState &result) { @@ -2473,9 +2476,9 @@ ParseResult IterateOp::parse(OpAsmParser &parser, OperationState &result) { return parser.emitError(parser.getNameLoc(), "expected only one iterator/iteration space"); - iters.append(iterArgs); + iterArgs.append(iters); Region *body = result.addRegion();
[llvm-branch-commits] [mlir] [mlir][sparse] refactoring sparse_tensor.iterate lowering pattern implementation. (PR #105566)
https://github.com/PeimingLiu updated https://github.com/llvm/llvm-project/pull/105566 >From 937bcd814688e7c6f88ef27b7586254006e0d050 Mon Sep 17 00:00:00 2001 From: Peiming Liu Date: Thu, 15 Aug 2024 18:10:25 + Subject: [PATCH] [mlir][sparse] refactoring sparse_tensor.iterate lowering pattern implementation. stack-info: PR: https://github.com/llvm/llvm-project/pull/105566, branch: users/PeimingLiu/stack/2 --- .../Transforms/SparseIterationToScf.cpp | 118 ++ 1 file changed, 36 insertions(+), 82 deletions(-) diff --git a/mlir/lib/Dialect/SparseTensor/Transforms/SparseIterationToScf.cpp b/mlir/lib/Dialect/SparseTensor/Transforms/SparseIterationToScf.cpp index d6c0da4a9e4573..f7fcabb0220b50 100644 --- a/mlir/lib/Dialect/SparseTensor/Transforms/SparseIterationToScf.cpp +++ b/mlir/lib/Dialect/SparseTensor/Transforms/SparseIterationToScf.cpp @@ -244,88 +244,41 @@ class SparseIterateOpConverter : public OneToNOpConversionPattern { std::unique_ptr it = iterSpace.extractIterator(rewriter, loc); -if (it->iteratableByFor()) { - auto [lo, hi] = it->genForCond(rewriter, loc); - Value step = constantIndex(rewriter, loc, 1); - SmallVector ivs; - for (ValueRange inits : adaptor.getInitArgs()) -llvm::append_range(ivs, inits); - scf::ForOp forOp = rewriter.create(loc, lo, hi, step, ivs); - - Block *loopBody = op.getBody(); - OneToNTypeMapping bodyTypeMapping(loopBody->getArgumentTypes()); - if (failed(typeConverter->convertSignatureArgs( - loopBody->getArgumentTypes(), bodyTypeMapping))) -return failure(); - rewriter.applySignatureConversion(loopBody, bodyTypeMapping); - - rewriter.eraseBlock(forOp.getBody()); - Region &dstRegion = forOp.getRegion(); - rewriter.inlineRegionBefore(op.getRegion(), dstRegion, dstRegion.end()); - - auto yieldOp = - llvm::cast(forOp.getBody()->getTerminator()); - - rewriter.setInsertionPointToEnd(forOp.getBody()); - // replace sparse_tensor.yield with scf.yield. 
- rewriter.create(loc, yieldOp.getResults()); - rewriter.eraseOp(yieldOp); - - const OneToNTypeMapping &resultMapping = adaptor.getResultMapping(); - rewriter.replaceOp(op, forOp.getResults(), resultMapping); -} else { - SmallVector ivs; - // TODO: put iterator at the end of argument list to be consistent with - // coiterate operation. - llvm::append_range(ivs, it->getCursor()); - for (ValueRange inits : adaptor.getInitArgs()) -llvm::append_range(ivs, inits); - - assert(llvm::all_of(ivs, [](Value v) { return v != nullptr; })); - - TypeRange types = ValueRange(ivs).getTypes(); - auto whileOp = rewriter.create(loc, types, ivs); - SmallVector l(types.size(), op.getIterator().getLoc()); - - // Generates loop conditions. - Block *before = rewriter.createBlock(&whileOp.getBefore(), {}, types, l); - rewriter.setInsertionPointToStart(before); - ValueRange bArgs = before->getArguments(); - auto [whileCond, remArgs] = it->genWhileCond(rewriter, loc, bArgs); - assert(remArgs.size() == adaptor.getInitArgs().size()); - rewriter.create(loc, whileCond, before->getArguments()); - - // Generates loop body. - Block *loopBody = op.getBody(); - OneToNTypeMapping bodyTypeMapping(loopBody->getArgumentTypes()); - if (failed(typeConverter->convertSignatureArgs( - loopBody->getArgumentTypes(), bodyTypeMapping))) -return failure(); - rewriter.applySignatureConversion(loopBody, bodyTypeMapping); - - Region &dstRegion = whileOp.getAfter(); - // TODO: handle uses of coordinate! - rewriter.inlineRegionBefore(op.getRegion(), dstRegion, dstRegion.end()); - ValueRange aArgs = whileOp.getAfterArguments(); - auto yieldOp = llvm::cast( - whileOp.getAfterBody()->getTerminator()); - - rewriter.setInsertionPointToEnd(whileOp.getAfterBody()); +SmallVector ivs; +for (ValueRange inits : adaptor.getInitArgs()) + llvm::append_range(ivs, inits); + +// Type conversion on iterate op block. 
+OneToNTypeMapping blockTypeMapping(op.getBody()->getArgumentTypes()); +if (failed(typeConverter->convertSignatureArgs( +op.getBody()->getArgumentTypes(), blockTypeMapping))) + return rewriter.notifyMatchFailure( + op, "failed to convert iterate region argurment types"); +rewriter.applySignatureConversion(op.getBody(), blockTypeMapping); + +Block *block = op.getBody(); +ValueRange ret = genLoopWithIterator( +rewriter, loc, it.get(), ivs, /*iterFirst=*/true, +[block](PatternRewriter &rewriter, Location loc, Region &loopBody, +SparseIterator *it, ValueRange reduc) -> SmallVector { + SmallVector blockArgs(it->getCursor()); + // TODO: Also appends coordinates if used. + // blockArgs.push_back(it->deref(rewriter, loc)); +
[llvm-branch-commits] [mlir] [mlir][sparse] unify block arguments order between iterate/coiterate operations. (PR #105567)
https://github.com/PeimingLiu updated https://github.com/llvm/llvm-project/pull/105567 >From 3f83d7a1eadc1101fb96707ecd348925e5aaed70 Mon Sep 17 00:00:00 2001 From: Peiming Liu Date: Thu, 15 Aug 2024 21:10:37 + Subject: [PATCH] [mlir][sparse] unify block arguments order between iterate/coiterate operations. stack-info: PR: https://github.com/llvm/llvm-project/pull/105567, branch: users/PeimingLiu/stack/3 --- .../SparseTensor/IR/SparseTensorOps.td| 7 ++-- .../SparseTensor/IR/SparseTensorDialect.cpp | 31 .../Transforms/SparseIterationToScf.cpp | 36 ++- 3 files changed, 31 insertions(+), 43 deletions(-) diff --git a/mlir/include/mlir/Dialect/SparseTensor/IR/SparseTensorOps.td b/mlir/include/mlir/Dialect/SparseTensor/IR/SparseTensorOps.td index 20512f972e67cd..96a61419a541f7 100644 --- a/mlir/include/mlir/Dialect/SparseTensor/IR/SparseTensorOps.td +++ b/mlir/include/mlir/Dialect/SparseTensor/IR/SparseTensorOps.td @@ -1644,7 +1644,7 @@ def IterateOp : SparseTensor_Op<"iterate", return getIterSpace().getType().getSpaceDim(); } BlockArgument getIterator() { - return getRegion().getArguments().front(); + return getRegion().getArguments().back(); } std::optional getLvlCrd(Level lvl) { if (getCrdUsedLvls()[lvl]) { @@ -1654,9 +1654,8 @@ def IterateOp : SparseTensor_Op<"iterate", return std::nullopt; } Block::BlockArgListType getCrds() { - // The first block argument is iterator, the remaining arguments are - // referenced coordinates. - return getRegion().getArguments().slice(1, getCrdUsedLvls().count()); + // User-provided iteration arguments -> coords -> iterator. 
+ return getRegion().getArguments().slice(getNumRegionIterArgs(), getCrdUsedLvls().count()); } unsigned getNumRegionIterArgs() { return getRegion().getArguments().size() - 1 - getCrdUsedLvls().count(); diff --git a/mlir/lib/Dialect/SparseTensor/IR/SparseTensorDialect.cpp b/mlir/lib/Dialect/SparseTensor/IR/SparseTensorDialect.cpp index 16856b958d4f13..b21bc1a93036c4 100644 --- a/mlir/lib/Dialect/SparseTensor/IR/SparseTensorDialect.cpp +++ b/mlir/lib/Dialect/SparseTensor/IR/SparseTensorDialect.cpp @@ -2228,9 +2228,10 @@ parseSparseIterateLoop(OpAsmParser &parser, OperationState &state, parser.getNameLoc(), "mismatch in number of sparse iterators and sparse spaces"); - if (failed(parseUsedCoordList(parser, state, blockArgs))) + SmallVector coords; + if (failed(parseUsedCoordList(parser, state, coords))) return failure(); - size_t numCrds = blockArgs.size(); + size_t numCrds = coords.size(); // Parse "iter_args(%arg = %init, ...)" bool hasIterArgs = succeeded(parser.parseOptionalKeyword("iter_args")); @@ -2238,6 +2239,8 @@ parseSparseIterateLoop(OpAsmParser &parser, OperationState &state, if (parser.parseAssignmentList(blockArgs, initArgs)) return failure(); + blockArgs.append(coords); + SmallVector iterSpaceTps; // parse ": sparse_tensor.iter_space -> ret" if (parser.parseColon() || parser.parseTypeList(iterSpaceTps)) @@ -2267,7 +2270,7 @@ parseSparseIterateLoop(OpAsmParser &parser, OperationState &state, if (hasIterArgs) { // Strip off leading args that used for coordinates. 
-MutableArrayRef args = MutableArrayRef(blockArgs).drop_front(numCrds); +MutableArrayRef args = MutableArrayRef(blockArgs).drop_back(numCrds); if (args.size() != initArgs.size() || args.size() != state.types.size()) { return parser.emitError( parser.getNameLoc(), @@ -2448,18 +2451,18 @@ void IterateOp::build(OpBuilder &builder, OperationState &odsState, odsState.addTypes(initArgs.getTypes()); Block *bodyBlock = builder.createBlock(bodyRegion); - // First argument, sparse iterator - bodyBlock->addArgument( - llvm::cast(iterSpace.getType()).getIteratorType(), - odsState.location); + // Starts with a list of user-provided loop arguments. + for (Value v : initArgs) +bodyBlock->addArgument(v.getType(), v.getLoc()); - // Followed by a list of used coordinates. + // Follows by a list of used coordinates. for (unsigned i = 0, e = crdUsedLvls.count(); i < e; i++) bodyBlock->addArgument(builder.getIndexType(), odsState.location); - // Followed by a list of user-provided loop arguments. - for (Value v : initArgs) -bodyBlock->addArgument(v.getType(), v.getLoc()); + // Ends with sparse iterator + bodyBlock->addArgument( + llvm::cast(iterSpace.getType()).getIteratorType(), + odsState.location); } ParseResult IterateOp::parse(OpAsmParser &parser, OperationState &result) { @@ -2473,9 +2476,9 @@ ParseResult IterateOp::parse(OpAsmParser &parser, OperationState &result) { return parser.emitError(parser.getNameLoc(), "expected only one iterator/iteration space"); - iters.append(iterArgs); + iterArgs.append(iters); Region *body = r
[llvm-branch-commits] [mlir] [mlir][sparse] unify block arguments order between iterate/coiterate operations. (PR #105567)
llvmbot wrote: @llvm/pr-subscribers-mlir-sparse Author: Peiming Liu (PeimingLiu) Changes Stacked PRs: * __->__#105567 * #105566 * #105565 --- --- --- ### [mlir][sparse] unify block arguments order between iterate/coiterate operations. --- Full diff: https://github.com/llvm/llvm-project/pull/105567.diff 3 Files Affected: - (modified) mlir/include/mlir/Dialect/SparseTensor/IR/SparseTensorOps.td (+3-4) - (modified) mlir/lib/Dialect/SparseTensor/IR/SparseTensorDialect.cpp (+17-14) - (modified) mlir/lib/Dialect/SparseTensor/Transforms/SparseIterationToScf.cpp (+11-25) ``diff diff --git a/mlir/include/mlir/Dialect/SparseTensor/IR/SparseTensorOps.td b/mlir/include/mlir/Dialect/SparseTensor/IR/SparseTensorOps.td index 20512f972e67cd..96a61419a541f7 100644 --- a/mlir/include/mlir/Dialect/SparseTensor/IR/SparseTensorOps.td +++ b/mlir/include/mlir/Dialect/SparseTensor/IR/SparseTensorOps.td @@ -1644,7 +1644,7 @@ def IterateOp : SparseTensor_Op<"iterate", return getIterSpace().getType().getSpaceDim(); } BlockArgument getIterator() { - return getRegion().getArguments().front(); + return getRegion().getArguments().back(); } std::optional getLvlCrd(Level lvl) { if (getCrdUsedLvls()[lvl]) { @@ -1654,9 +1654,8 @@ def IterateOp : SparseTensor_Op<"iterate", return std::nullopt; } Block::BlockArgListType getCrds() { - // The first block argument is iterator, the remaining arguments are - // referenced coordinates. - return getRegion().getArguments().slice(1, getCrdUsedLvls().count()); + // User-provided iteration arguments -> coords -> iterator. 
+ return getRegion().getArguments().slice(getNumRegionIterArgs(), getCrdUsedLvls().count()); } unsigned getNumRegionIterArgs() { return getRegion().getArguments().size() - 1 - getCrdUsedLvls().count(); diff --git a/mlir/lib/Dialect/SparseTensor/IR/SparseTensorDialect.cpp b/mlir/lib/Dialect/SparseTensor/IR/SparseTensorDialect.cpp index 16856b958d4f13..b21bc1a93036c4 100644 --- a/mlir/lib/Dialect/SparseTensor/IR/SparseTensorDialect.cpp +++ b/mlir/lib/Dialect/SparseTensor/IR/SparseTensorDialect.cpp @@ -2228,9 +2228,10 @@ parseSparseIterateLoop(OpAsmParser &parser, OperationState &state, parser.getNameLoc(), "mismatch in number of sparse iterators and sparse spaces"); - if (failed(parseUsedCoordList(parser, state, blockArgs))) + SmallVector coords; + if (failed(parseUsedCoordList(parser, state, coords))) return failure(); - size_t numCrds = blockArgs.size(); + size_t numCrds = coords.size(); // Parse "iter_args(%arg = %init, ...)" bool hasIterArgs = succeeded(parser.parseOptionalKeyword("iter_args")); @@ -2238,6 +2239,8 @@ parseSparseIterateLoop(OpAsmParser &parser, OperationState &state, if (parser.parseAssignmentList(blockArgs, initArgs)) return failure(); + blockArgs.append(coords); + SmallVector iterSpaceTps; // parse ": sparse_tensor.iter_space -> ret" if (parser.parseColon() || parser.parseTypeList(iterSpaceTps)) @@ -2267,7 +2270,7 @@ parseSparseIterateLoop(OpAsmParser &parser, OperationState &state, if (hasIterArgs) { // Strip off leading args that used for coordinates. 
-MutableArrayRef args = MutableArrayRef(blockArgs).drop_front(numCrds); +MutableArrayRef args = MutableArrayRef(blockArgs).drop_back(numCrds); if (args.size() != initArgs.size() || args.size() != state.types.size()) { return parser.emitError( parser.getNameLoc(), @@ -2448,18 +2451,18 @@ void IterateOp::build(OpBuilder &builder, OperationState &odsState, odsState.addTypes(initArgs.getTypes()); Block *bodyBlock = builder.createBlock(bodyRegion); - // First argument, sparse iterator - bodyBlock->addArgument( - llvm::cast(iterSpace.getType()).getIteratorType(), - odsState.location); + // Starts with a list of user-provided loop arguments. + for (Value v : initArgs) +bodyBlock->addArgument(v.getType(), v.getLoc()); - // Followed by a list of used coordinates. + // Follows by a list of used coordinates. for (unsigned i = 0, e = crdUsedLvls.count(); i < e; i++) bodyBlock->addArgument(builder.getIndexType(), odsState.location); - // Followed by a list of user-provided loop arguments. - for (Value v : initArgs) -bodyBlock->addArgument(v.getType(), v.getLoc()); + // Ends with sparse iterator + bodyBlock->addArgument( + llvm::cast(iterSpace.getType()).getIteratorType(), + odsState.location); } ParseResult IterateOp::parse(OpAsmParser &parser, OperationState &result) { @@ -2473,9 +2476,9 @@ ParseResult IterateOp::parse(OpAsmParser &parser, OperationState &result) { return parser.emitError(parser.getNameLoc(), "expected only one iterator/iteration space"); - iters.append(iterArgs); + iterArgs.append(iters); Region *body = result.addRegion(); - if (parser.parseRegion(*body, iters))
[llvm-branch-commits] [mlir] [mlir][sparse] refactoring sparse_tensor.iterate lowering pattern implementation. (PR #105566)
llvmbot wrote: @llvm/pr-subscribers-mlir-sparse Author: Peiming Liu (PeimingLiu) Changes Stacked PRs: * #105567 * __->__#105566 * #105565 --- --- --- ### [mlir][sparse] refactoring sparse_tensor.iterate lowering pattern implementation. --- Full diff: https://github.com/llvm/llvm-project/pull/105566.diff 1 Files Affected: - (modified) mlir/lib/Dialect/SparseTensor/Transforms/SparseIterationToScf.cpp (+36-82) ``diff diff --git a/mlir/lib/Dialect/SparseTensor/Transforms/SparseIterationToScf.cpp b/mlir/lib/Dialect/SparseTensor/Transforms/SparseIterationToScf.cpp index d6c0da4a9e457..f7fcabb0220b5 100644 --- a/mlir/lib/Dialect/SparseTensor/Transforms/SparseIterationToScf.cpp +++ b/mlir/lib/Dialect/SparseTensor/Transforms/SparseIterationToScf.cpp @@ -244,88 +244,41 @@ class SparseIterateOpConverter : public OneToNOpConversionPattern { std::unique_ptr it = iterSpace.extractIterator(rewriter, loc); -if (it->iteratableByFor()) { - auto [lo, hi] = it->genForCond(rewriter, loc); - Value step = constantIndex(rewriter, loc, 1); - SmallVector ivs; - for (ValueRange inits : adaptor.getInitArgs()) -llvm::append_range(ivs, inits); - scf::ForOp forOp = rewriter.create(loc, lo, hi, step, ivs); - - Block *loopBody = op.getBody(); - OneToNTypeMapping bodyTypeMapping(loopBody->getArgumentTypes()); - if (failed(typeConverter->convertSignatureArgs( - loopBody->getArgumentTypes(), bodyTypeMapping))) -return failure(); - rewriter.applySignatureConversion(loopBody, bodyTypeMapping); - - rewriter.eraseBlock(forOp.getBody()); - Region &dstRegion = forOp.getRegion(); - rewriter.inlineRegionBefore(op.getRegion(), dstRegion, dstRegion.end()); - - auto yieldOp = - llvm::cast(forOp.getBody()->getTerminator()); - - rewriter.setInsertionPointToEnd(forOp.getBody()); - // replace sparse_tensor.yield with scf.yield. 
- rewriter.create(loc, yieldOp.getResults()); - rewriter.eraseOp(yieldOp); - - const OneToNTypeMapping &resultMapping = adaptor.getResultMapping(); - rewriter.replaceOp(op, forOp.getResults(), resultMapping); -} else { - SmallVector ivs; - // TODO: put iterator at the end of argument list to be consistent with - // coiterate operation. - llvm::append_range(ivs, it->getCursor()); - for (ValueRange inits : adaptor.getInitArgs()) -llvm::append_range(ivs, inits); - - assert(llvm::all_of(ivs, [](Value v) { return v != nullptr; })); - - TypeRange types = ValueRange(ivs).getTypes(); - auto whileOp = rewriter.create(loc, types, ivs); - SmallVector l(types.size(), op.getIterator().getLoc()); - - // Generates loop conditions. - Block *before = rewriter.createBlock(&whileOp.getBefore(), {}, types, l); - rewriter.setInsertionPointToStart(before); - ValueRange bArgs = before->getArguments(); - auto [whileCond, remArgs] = it->genWhileCond(rewriter, loc, bArgs); - assert(remArgs.size() == adaptor.getInitArgs().size()); - rewriter.create(loc, whileCond, before->getArguments()); - - // Generates loop body. - Block *loopBody = op.getBody(); - OneToNTypeMapping bodyTypeMapping(loopBody->getArgumentTypes()); - if (failed(typeConverter->convertSignatureArgs( - loopBody->getArgumentTypes(), bodyTypeMapping))) -return failure(); - rewriter.applySignatureConversion(loopBody, bodyTypeMapping); - - Region &dstRegion = whileOp.getAfter(); - // TODO: handle uses of coordinate! - rewriter.inlineRegionBefore(op.getRegion(), dstRegion, dstRegion.end()); - ValueRange aArgs = whileOp.getAfterArguments(); - auto yieldOp = llvm::cast( - whileOp.getAfterBody()->getTerminator()); - - rewriter.setInsertionPointToEnd(whileOp.getAfterBody()); +SmallVector ivs; +for (ValueRange inits : adaptor.getInitArgs()) + llvm::append_range(ivs, inits); + +// Type conversion on iterate op block. 
+OneToNTypeMapping blockTypeMapping(op.getBody()->getArgumentTypes()); +if (failed(typeConverter->convertSignatureArgs( +op.getBody()->getArgumentTypes(), blockTypeMapping))) + return rewriter.notifyMatchFailure( + op, "failed to convert iterate region argurment types"); +rewriter.applySignatureConversion(op.getBody(), blockTypeMapping); + +Block *block = op.getBody(); +ValueRange ret = genLoopWithIterator( +rewriter, loc, it.get(), ivs, /*iterFirst=*/true, +[block](PatternRewriter &rewriter, Location loc, Region &loopBody, +SparseIterator *it, ValueRange reduc) -> SmallVector { + SmallVector blockArgs(it->getCursor()); + // TODO: Also appends coordinates if used. + // blockArgs.push_back(it->deref(rewriter, loc)); + llvm::append_range(blockArgs, reduc); + + Block *dstBlock = &loopBody.getBlocks(
[llvm-branch-commits] [llvm] [AMDGPU] GFX12 VMEM instructions can write VGPR results out of order (PR #105549)
https://github.com/arsenm approved this pull request. https://github.com/llvm/llvm-project/pull/105549
[llvm-branch-commits] [DirectX] Lower `@llvm.dx.handle.fromBinding` to DXIL ops (PR #104251)
@@ -117,6 +119,10 @@ class ResourceInfo { MSInfo MultiSample; + // We need a default constructor if we want to insert this in a MapVector. + ResourceInfo() {} + friend class MapVector; bob80905 wrote: Where is this being inserted in a MapVector? Is it DRM? https://github.com/llvm/llvm-project/pull/104251
[llvm-branch-commits] [llvm] DAG: Check if is_fpclass is custom, instead of isLegalOrCustom (PR #105577)
https://github.com/arsenm created https://github.com/llvm/llvm-project/pull/105577 For some reason, isOperationLegalOrCustom is not the same as isOperationLegal || isOperationCustom. Unfortunately, it checks if the type is legal, which makes it useless for custom lowering on non-legal types (which is always ppcf128). Really the DAG builder shouldn't be going to expand this in the builder, it makes it difficult to work with. It's only here to work around the DAG requiring legal integer types the same size as the FP type after type legalization. >From b57fb07c93a8052805110626786a8e242213c983 Mon Sep 17 00:00:00 2001 From: Matt Arsenault Date: Wed, 21 Aug 2024 20:15:55 +0400 Subject: [PATCH] DAG: Check if is_fpclass is custom, instead of isLegalOrCustom For some reason, isOperationLegalOrCustom is not the same as isOperationLegal || isOperationCustom. Unfortunately, it checks if the type is legal, which makes it useless for custom lowering on non-legal types (which is always ppcf128). Really the DAG builder shouldn't be going to expand this in the builder, it makes it difficult to work with. It's only here to work around the DAG requiring legal integer types the same size as the FP type after type legalization.
--- .../SelectionDAG/SelectionDAGBuilder.cpp | 3 +- llvm/lib/Target/AMDGPU/AMDGPUISelLowering.cpp | 17 +- llvm/test/CodeGen/AMDGPU/fract-match.ll | 10 +- .../CodeGen/AMDGPU/llvm.is.fpclass.f16.ll | 205 +++--- llvm/test/CodeGen/PowerPC/is_fpclass.ll | 37 ++-- 5 files changed, 160 insertions(+), 112 deletions(-) diff --git a/llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp b/llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp index 60dcb118542785..09a3def6586493 100644 --- a/llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp +++ b/llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp @@ -7032,7 +7032,8 @@ void SelectionDAGBuilder::visitIntrinsicCall(const CallInst &I, // If ISD::IS_FPCLASS should be expanded, do it right now, because the // expansion can use illegal types. Making expansion early allows // legalizing these types prior to selection. -if (!TLI.isOperationLegalOrCustom(ISD::IS_FPCLASS, ArgVT)) { +if (!TLI.isOperationLegal(ISD::IS_FPCLASS, ArgVT) && +!TLI.isOperationCustom(ISD::IS_FPCLASS, ArgVT)) { SDValue Result = TLI.expandIS_FPCLASS(DestVT, Op, Test, Flags, sdl, DAG); setValue(&I, Result); return; diff --git a/llvm/lib/Target/AMDGPU/AMDGPUISelLowering.cpp b/llvm/lib/Target/AMDGPU/AMDGPUISelLowering.cpp index e57c8f8b7b4835..866e04bcc7fb2d 100644 --- a/llvm/lib/Target/AMDGPU/AMDGPUISelLowering.cpp +++ b/llvm/lib/Target/AMDGPU/AMDGPUISelLowering.cpp @@ -426,12 +426,17 @@ AMDGPUTargetLowering::AMDGPUTargetLowering(const TargetMachine &TM, // FIXME: These IS_FPCLASS vector fp types are marked custom so it reaches // scalarization code. Can be removed when IS_FPCLASS expand isn't called by // default unless marked custom/legal. 
- setOperationAction( - ISD::IS_FPCLASS, - {MVT::v2f16, MVT::v3f16, MVT::v4f16, MVT::v16f16, MVT::v2f32, MVT::v3f32, - MVT::v4f32, MVT::v5f32, MVT::v6f32, MVT::v7f32, MVT::v8f32, MVT::v16f32, - MVT::v2f64, MVT::v3f64, MVT::v4f64, MVT::v8f64, MVT::v16f64}, - Custom); + setOperationAction(ISD::IS_FPCLASS, + {MVT::v2f32, MVT::v3f32, MVT::v4f32, MVT::v5f32, + MVT::v6f32, MVT::v7f32, MVT::v8f32, MVT::v16f32, + MVT::v2f64, MVT::v3f64, MVT::v4f64, MVT::v8f64, + MVT::v16f64}, + Custom); + + if (isTypeLegal(MVT::f16)) +setOperationAction(ISD::IS_FPCLASS, + {MVT::v2f16, MVT::v3f16, MVT::v4f16, MVT::v16f16}, + Custom); // Expand to fneg + fadd. setOperationAction(ISD::FSUB, MVT::f64, Expand); diff --git a/llvm/test/CodeGen/AMDGPU/fract-match.ll b/llvm/test/CodeGen/AMDGPU/fract-match.ll index 1b28ddb2c58620..b212b9caf8400e 100644 --- a/llvm/test/CodeGen/AMDGPU/fract-match.ll +++ b/llvm/test/CodeGen/AMDGPU/fract-match.ll @@ -2135,16 +2135,16 @@ define <2 x half> @safe_math_fract_v2f16(<2 x half> %x, ptr addrspace(1) nocaptu ; GFX8-LABEL: safe_math_fract_v2f16: ; GFX8: ; %bb.0: ; %entry ; GFX8-NEXT:s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0) -; GFX8-NEXT:v_mov_b32_e32 v6, 0x204 +; GFX8-NEXT:s_movk_i32 s6, 0x204 ; GFX8-NEXT:v_floor_f16_sdwa v3, v0 dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:WORD_1 ; GFX8-NEXT:v_floor_f16_e32 v4, v0 -; GFX8-NEXT:v_cmp_class_f16_sdwa s[4:5], v0, v6 src0_sel:WORD_1 src1_sel:DWORD +; GFX8-NEXT:v_fract_f16_sdwa v5, v0 dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:WORD_1 +; GFX8-NEXT:v_cmp_class_f16_sdwa s[4:5], v0, s6 src0_sel:WORD_1 src1_sel:DWORD ; GFX8-NEXT:v_pack_b32_f16 v3, v4, v3 ; GFX8-NEXT:v_fract_f16_e32 v4, v0 -; GFX8-NEXT:v_fract_f16_sdwa v5, v0 dst_sel:DWORD dst_unused:UNUSED_
[llvm-branch-commits] [llvm] DAG: Check if is_fpclass is custom, instead of isLegalOrCustom (PR #105577)
arsenm wrote: > [!WARNING] > This pull request is not mergeable via GitHub because a downstack PR is open. Once all requirements are satisfied, merge this PR as a stack on Graphite. Learn more: https://graphite.dev/docs/merge-pull-requests * **#105577** 👈 (this PR) * **#105540** * `main` This stack of pull requests is managed by Graphite. Learn more about stacking: https://stacking.dev https://github.com/llvm/llvm-project/pull/105577
[llvm-branch-commits] [llvm] DAG: Check if is_fpclass is custom, instead of isLegalOrCustom (PR #105577)
llvmbot wrote: @llvm/pr-subscribers-llvm-selectiondag Author: Matt Arsenault (arsenm) Changes For some reason, isOperationLegalOrCustom is not the same as isOperationLegal || isOperationCustom. Unfortunately, it checks if the type is legal which makes it uesless for custom lowering on non-legal types (which is always ppcf128). Really the DAG builder shouldn't be going to expand this in the builder, it makes it difficult to work with. It's only here to work around the DAG requiring legal integer types the same size as the FP type after type legalization. --- Full diff: https://github.com/llvm/llvm-project/pull/105577.diff 5 Files Affected: - (modified) llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp (+2-1) - (modified) llvm/lib/Target/AMDGPU/AMDGPUISelLowering.cpp (+11-6) - (modified) llvm/test/CodeGen/AMDGPU/fract-match.ll (+5-5) - (modified) llvm/test/CodeGen/AMDGPU/llvm.is.fpclass.f16.ll (+128-77) - (modified) llvm/test/CodeGen/PowerPC/is_fpclass.ll (+14-23) ``diff diff --git a/llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp b/llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp index 60dcb118542785..09a3def6586493 100644 --- a/llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp +++ b/llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp @@ -7032,7 +7032,8 @@ void SelectionDAGBuilder::visitIntrinsicCall(const CallInst &I, // If ISD::IS_FPCLASS should be expanded, do it right now, because the // expansion can use illegal types. Making expansion early allows // legalizing these types prior to selection. 
-if (!TLI.isOperationLegalOrCustom(ISD::IS_FPCLASS, ArgVT)) { +if (!TLI.isOperationLegal(ISD::IS_FPCLASS, ArgVT) && +!TLI.isOperationCustom(ISD::IS_FPCLASS, ArgVT)) { SDValue Result = TLI.expandIS_FPCLASS(DestVT, Op, Test, Flags, sdl, DAG); setValue(&I, Result); return; diff --git a/llvm/lib/Target/AMDGPU/AMDGPUISelLowering.cpp b/llvm/lib/Target/AMDGPU/AMDGPUISelLowering.cpp index e57c8f8b7b4835..866e04bcc7fb2d 100644 --- a/llvm/lib/Target/AMDGPU/AMDGPUISelLowering.cpp +++ b/llvm/lib/Target/AMDGPU/AMDGPUISelLowering.cpp @@ -426,12 +426,17 @@ AMDGPUTargetLowering::AMDGPUTargetLowering(const TargetMachine &TM, // FIXME: These IS_FPCLASS vector fp types are marked custom so it reaches // scalarization code. Can be removed when IS_FPCLASS expand isn't called by // default unless marked custom/legal. - setOperationAction( - ISD::IS_FPCLASS, - {MVT::v2f16, MVT::v3f16, MVT::v4f16, MVT::v16f16, MVT::v2f32, MVT::v3f32, - MVT::v4f32, MVT::v5f32, MVT::v6f32, MVT::v7f32, MVT::v8f32, MVT::v16f32, - MVT::v2f64, MVT::v3f64, MVT::v4f64, MVT::v8f64, MVT::v16f64}, - Custom); + setOperationAction(ISD::IS_FPCLASS, + {MVT::v2f32, MVT::v3f32, MVT::v4f32, MVT::v5f32, + MVT::v6f32, MVT::v7f32, MVT::v8f32, MVT::v16f32, + MVT::v2f64, MVT::v3f64, MVT::v4f64, MVT::v8f64, + MVT::v16f64}, + Custom); + + if (isTypeLegal(MVT::f16)) +setOperationAction(ISD::IS_FPCLASS, + {MVT::v2f16, MVT::v3f16, MVT::v4f16, MVT::v16f16}, + Custom); // Expand to fneg + fadd. 
setOperationAction(ISD::FSUB, MVT::f64, Expand); diff --git a/llvm/test/CodeGen/AMDGPU/fract-match.ll b/llvm/test/CodeGen/AMDGPU/fract-match.ll index 1b28ddb2c58620..b212b9caf8400e 100644 --- a/llvm/test/CodeGen/AMDGPU/fract-match.ll +++ b/llvm/test/CodeGen/AMDGPU/fract-match.ll @@ -2135,16 +2135,16 @@ define <2 x half> @safe_math_fract_v2f16(<2 x half> %x, ptr addrspace(1) nocaptu ; GFX8-LABEL: safe_math_fract_v2f16: ; GFX8: ; %bb.0: ; %entry ; GFX8-NEXT:s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0) -; GFX8-NEXT:v_mov_b32_e32 v6, 0x204 +; GFX8-NEXT:s_movk_i32 s6, 0x204 ; GFX8-NEXT:v_floor_f16_sdwa v3, v0 dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:WORD_1 ; GFX8-NEXT:v_floor_f16_e32 v4, v0 -; GFX8-NEXT:v_cmp_class_f16_sdwa s[4:5], v0, v6 src0_sel:WORD_1 src1_sel:DWORD +; GFX8-NEXT:v_fract_f16_sdwa v5, v0 dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:WORD_1 +; GFX8-NEXT:v_cmp_class_f16_sdwa s[4:5], v0, s6 src0_sel:WORD_1 src1_sel:DWORD ; GFX8-NEXT:v_pack_b32_f16 v3, v4, v3 ; GFX8-NEXT:v_fract_f16_e32 v4, v0 -; GFX8-NEXT:v_fract_f16_sdwa v5, v0 dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:WORD_1 -; GFX8-NEXT:v_cmp_class_f16_e32 vcc, v0, v6 ; GFX8-NEXT:v_cndmask_b32_e64 v5, v5, 0, s[4:5] -; GFX8-NEXT:v_cndmask_b32_e64 v0, v4, 0, vcc +; GFX8-NEXT:v_cmp_class_f16_e64 s[4:5], v0, s6 +; GFX8-NEXT:v_cndmask_b32_e64 v0, v4, 0, s[4:5] ; GFX8-NEXT:v_pack_b32_f16 v0, v0, v5 ; GFX8-NEXT:global_store_dword v[1:2], v3, off ; GFX8-NEXT:s_waitcnt vmcnt(0) diff --git a/llvm/test/CodeGen/AMDGPU/llvm.is.fpclass.f16.ll b/llvm/test/CodeGen/AMDGPU/llvm.is.fpclass.f16.ll index 9c248bd6e8b2aa..3d8e9e60973053 100644 --- a/llvm
[llvm-branch-commits] [llvm] DAG: Check if is_fpclass is custom, instead of isLegalOrCustom (PR #105577)
https://github.com/arsenm marked this pull request as ready for review. https://github.com/llvm/llvm-project/pull/105577
[llvm-branch-commits] [clang] [llvm] [LLVM][Coroutines] Create `.noalloc` variant of switch ABI coroutine ramp functions during CoroSplit (PR #99283)
@@ -1455,6 +1462,62 @@ struct SwitchCoroutineSplitter { setCoroInfo(F, Shape, Clones); } + static Function *createNoAllocVariant(Function &F, coro::Shape &Shape, yuxuanchen1997 wrote: This is done. Thanks. https://github.com/llvm/llvm-project/pull/99283
[llvm-branch-commits] [llvm] [LLVM][Coroutines] Create `.noalloc` variant of switch ABI coroutine ramp functions during CoroSplit (PR #99283)
https://github.com/yuxuanchen1997 updated https://github.com/llvm/llvm-project/pull/99283 >From be91ecd53679df7536616132b3492d53a0642ef4 Mon Sep 17 00:00:00 2001 From: Yuxuan Chen Date: Tue, 4 Jun 2024 23:22:00 -0700 Subject: [PATCH] [Clang] Introduce [[clang::coro_await_elidable]] --- llvm/docs/Coroutines.rst | 22 +++ llvm/lib/Transforms/Coroutines/CoroInternal.h | 4 + llvm/lib/Transforms/Coroutines/CoroSplit.cpp | 142 ++ llvm/lib/Transforms/Coroutines/Coroutines.cpp | 27 llvm/test/Transforms/Coroutines/ArgAddr.ll| 6 +- .../Transforms/Coroutines/coro-alloca-07.ll | 2 +- .../coro-alloca-loop-carried-address.ll | 2 +- .../Coroutines/coro-lifetime-end.ll | 6 +- .../Coroutines/coro-spill-after-phi.ll| 2 +- .../Transforms/Coroutines/coro-split-00.ll| 9 +- 10 files changed, 187 insertions(+), 35 deletions(-) diff --git a/llvm/docs/Coroutines.rst b/llvm/docs/Coroutines.rst index 36092325e536fb..13cb2d768a3bf8 100644 --- a/llvm/docs/Coroutines.rst +++ b/llvm/docs/Coroutines.rst @@ -2022,6 +2022,12 @@ The pass CoroSplit builds coroutine frame and outlines resume and destroy parts into separate functions. This pass also lowers `coro.await.suspend.void`_, `coro.await.suspend.bool`_ and `coro.await.suspend.handle`_ intrinsics. +CoroAnnotationElide +--- +This pass finds all usages of coroutines that are "must elide" and replaces +`coro.begin` intrinsic with an address of a coroutine frame placed on its caller +and replaces `coro.alloc` and `coro.free` intrinsics with `false` and `null` +respectively to remove the deallocation code. CoroElide - @@ -2049,6 +2055,22 @@ the coroutine must reach the final suspend point when it get destroyed. This attribute only works for switched-resume coroutines now. +coro_must_elide +--- + +When a Call or Invoke instruction is marked with `coro_must_elide`, +CoroAnnotationElidePass performs heap elision when possible. Note that for +recursive or mutually recursive functions this elision is usually not possible. 
+ + +coro_gen_noalloc_ramp +- + +This attribute hints CoroSplitPass to generate a `f.noalloc` ramp function for +a given coroutine `f`. For any call or invoke instruction that calls `f` and +attributed as `coro_must_elide`, CoroAnnotationElidePass is able to redirect +the call to use the `.noalloc` variant. + Metadata diff --git a/llvm/lib/Transforms/Coroutines/CoroInternal.h b/llvm/lib/Transforms/Coroutines/CoroInternal.h index d535ad7f85d74a..760c0bf894c9e0 100644 --- a/llvm/lib/Transforms/Coroutines/CoroInternal.h +++ b/llvm/lib/Transforms/Coroutines/CoroInternal.h @@ -26,6 +26,10 @@ bool declaresIntrinsics(const Module &M, const std::initializer_list); void replaceCoroFree(CoroIdInst *CoroId, bool Elide); +void suppressCoroAllocs(CoroIdInst *CoroId); +void suppressCoroAllocs(LLVMContext &Context, +ArrayRef CoroAllocs); + /// Attempts to rewrite the location operand of debug intrinsics in terms of /// the coroutine frame pointer, folding pointer offsets into the DIExpression /// of the intrinsic. diff --git a/llvm/lib/Transforms/Coroutines/CoroSplit.cpp b/llvm/lib/Transforms/Coroutines/CoroSplit.cpp index 40bc932c3e0eef..111ebf6d5163d6 100644 --- a/llvm/lib/Transforms/Coroutines/CoroSplit.cpp +++ b/llvm/lib/Transforms/Coroutines/CoroSplit.cpp @@ -25,6 +25,7 @@ #include "llvm/ADT/PriorityWorklist.h" #include "llvm/ADT/SmallPtrSet.h" #include "llvm/ADT/SmallVector.h" +#include "llvm/ADT/StringExtras.h" #include "llvm/ADT/StringRef.h" #include "llvm/ADT/Twine.h" #include "llvm/Analysis/CFG.h" @@ -1177,6 +1178,14 @@ static void updateAsyncFuncPointerContextSize(coro::Shape &Shape) { Shape.AsyncLowering.AsyncFuncPointer->setInitializer(NewFuncPtrStruct); } +static TypeSize getFrameSizeForShape(coro::Shape &Shape) { + // In the same function all coro.sizes should have the same result type. 
+ auto *SizeIntrin = Shape.CoroSizes.back(); + Module *M = SizeIntrin->getModule(); + const DataLayout &DL = M->getDataLayout(); + return DL.getTypeAllocSize(Shape.FrameTy); +} + static void replaceFrameSizeAndAlignment(coro::Shape &Shape) { if (Shape.ABI == coro::ABI::Async) updateAsyncFuncPointerContextSize(Shape); @@ -1192,10 +1201,8 @@ static void replaceFrameSizeAndAlignment(coro::Shape &Shape) { // In the same function all coro.sizes should have the same result type. auto *SizeIntrin = Shape.CoroSizes.back(); - Module *M = SizeIntrin->getModule(); - const DataLayout &DL = M->getDataLayout(); - auto Size = DL.getTypeAllocSize(Shape.FrameTy); - auto *SizeConstant = ConstantInt::get(SizeIntrin->getType(), Size); + auto *SizeConstant = + ConstantInt::get(SizeIntrin->getType(), getFrameSizeForShape(Shape)); for (CoroSizeInst *CS : Shape.CoroSizes) { CS->replaceAllUsesWith(SizeConstant); @@ -145
[llvm-branch-commits] [llvm] [LLVM][Coroutines] Transform "coro_must_elide" calls to switch ABI coroutines to the `noalloc` variant (PR #99285)
https://github.com/yuxuanchen1997 updated https://github.com/llvm/llvm-project/pull/99285 >From 7ca8d8b7d1dfd1d901721dd45f83f861068f9ea0 Mon Sep 17 00:00:00 2001 From: Yuxuan Chen Date: Mon, 15 Jul 2024 15:01:39 -0700 Subject: [PATCH] add CoroAnnotationElidePass Summary: Test Plan: Reviewers: Subscribers: Tasks: Tags: Differential Revision: https://phabricator.intern.facebook.com/D60250514 --- .../Coroutines/CoroAnnotationElide.h | 36 + llvm/lib/Passes/PassBuilder.cpp | 1 + llvm/lib/Passes/PassBuilderPipelines.cpp | 10 +- llvm/lib/Passes/PassRegistry.def | 1 + llvm/lib/Transforms/Coroutines/CMakeLists.txt | 1 + .../Coroutines/CoroAnnotationElide.cpp| 143 ++ llvm/test/Other/new-pm-defaults.ll| 1 + .../Other/new-pm-thinlto-postlink-defaults.ll | 1 + .../new-pm-thinlto-postlink-pgo-defaults.ll | 1 + ...-pm-thinlto-postlink-samplepgo-defaults.ll | 1 + .../Coroutines/coro-transform-must-elide.ll | 76 ++ 11 files changed, 270 insertions(+), 2 deletions(-) create mode 100644 llvm/include/llvm/Transforms/Coroutines/CoroAnnotationElide.h create mode 100644 llvm/lib/Transforms/Coroutines/CoroAnnotationElide.cpp create mode 100644 llvm/test/Transforms/Coroutines/coro-transform-must-elide.ll diff --git a/llvm/include/llvm/Transforms/Coroutines/CoroAnnotationElide.h b/llvm/include/llvm/Transforms/Coroutines/CoroAnnotationElide.h new file mode 100644 index 00..2d6e84bdd66423 --- /dev/null +++ b/llvm/include/llvm/Transforms/Coroutines/CoroAnnotationElide.h @@ -0,0 +1,36 @@ +//===- CoroAnnotationElide.h - Optimizing a coro_must_elide call --===// +// +// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions. +// See https://llvm.org/LICENSE.txt for license information. +// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception +// +//===--===// +// +// \file +// This pass transforms all Call or Invoke instructions that are annotated +// "coro_must_elide" to call the `.noalloc` variant of coroutine instead. 
+// The frame of the callee coroutine is allocated inside the caller. A pointer +// to the allocated frame will be passed into the `.noalloc` ramp function. +// +//===--===// + +#ifndef LLVM_TRANSFORMS_COROUTINES_COROANNOTATIONELIDE_H +#define LLVM_TRANSFORMS_COROUTINES_COROANNOTATIONELIDE_H + +#include "llvm/Analysis/CGSCCPassManager.h" +#include "llvm/Analysis/LazyCallGraph.h" +#include "llvm/IR/PassManager.h" + +namespace llvm { + +struct CoroAnnotationElidePass : PassInfoMixin { + CoroAnnotationElidePass() {} + + PreservedAnalyses run(LazyCallGraph::SCC &C, CGSCCAnalysisManager &AM, +LazyCallGraph &CG, CGSCCUpdateResult &UR); + + static bool isRequired() { return false; } +}; +} // end namespace llvm + +#endif // LLVM_TRANSFORMS_COROUTINES_COROANNOTATIONELIDE_H diff --git a/llvm/lib/Passes/PassBuilder.cpp b/llvm/lib/Passes/PassBuilder.cpp index 17eed97fd950c9..c2b99a0d1f8cea 100644 --- a/llvm/lib/Passes/PassBuilder.cpp +++ b/llvm/lib/Passes/PassBuilder.cpp @@ -138,6 +138,7 @@ #include "llvm/Target/TargetMachine.h" #include "llvm/Transforms/AggressiveInstCombine/AggressiveInstCombine.h" #include "llvm/Transforms/CFGuard.h" +#include "llvm/Transforms/Coroutines/CoroAnnotationElide.h" #include "llvm/Transforms/Coroutines/CoroCleanup.h" #include "llvm/Transforms/Coroutines/CoroConditionalWrapper.h" #include "llvm/Transforms/Coroutines/CoroEarly.h" diff --git a/llvm/lib/Passes/PassBuilderPipelines.cpp b/llvm/lib/Passes/PassBuilderPipelines.cpp index 1184123c7710f0..992b4fca8a6919 100644 --- a/llvm/lib/Passes/PassBuilderPipelines.cpp +++ b/llvm/lib/Passes/PassBuilderPipelines.cpp @@ -33,6 +33,7 @@ #include "llvm/Support/VirtualFileSystem.h" #include "llvm/Target/TargetMachine.h" #include "llvm/Transforms/AggressiveInstCombine/AggressiveInstCombine.h" +#include "llvm/Transforms/Coroutines/CoroAnnotationElide.h" #include "llvm/Transforms/Coroutines/CoroCleanup.h" #include "llvm/Transforms/Coroutines/CoroConditionalWrapper.h" #include 
"llvm/Transforms/Coroutines/CoroEarly.h" @@ -984,8 +985,10 @@ PassBuilder::buildInlinerPipeline(OptimizationLevel Level, MainCGPipeline.addPass(createCGSCCToFunctionPassAdaptor( RequireAnalysisPass())); - if (Phase != ThinOrFullLTOPhase::ThinLTOPreLink) + if (Phase != ThinOrFullLTOPhase::ThinLTOPreLink) { MainCGPipeline.addPass(CoroSplitPass(Level != OptimizationLevel::O0)); +MainCGPipeline.addPass(CoroAnnotationElidePass()); + } // Make sure we don't affect potential future NoRerun CGSCC adaptors. MIWP.addLateModulePass(createModuleToFunctionPassAdaptor( @@ -1027,9 +1030,12 @@ PassBuilder::buildModuleInlinerPipeline(OptimizationLevel Level, buildFunctionSimplificationPipeline(Level, Phase), PT
[llvm-branch-commits] [llvm] [LLVM][Coroutines] Create `.noalloc` variant of switch ABI coroutine ramp functions during CoroSplit (PR #99283)
yuxuanchen1997 wrote: @ChuanqiXu9 I have changed this patch to only conditionally create the `.noalloc` variant based on an attribute (which is controlled by FE). Let me know if this is good to go. https://github.com/llvm/llvm-project/pull/99283
[llvm-branch-commits] [llvm] [LLVM][Coroutines] Transform "coro_must_elide" calls to switch ABI coroutines to the `noalloc` variant (PR #99285)
@@ -968,8 +969,8 @@ PassBuilder::buildInlinerPipeline(OptimizationLevel Level, // it's been modified since. MainCGPipeline.addPass(createCGSCCToFunctionPassAdaptor( RequireAnalysisPass())); - MainCGPipeline.addPass(CoroSplitPass(Level != OptimizationLevel::O0)); + MainCGPipeline.addPass(CoroAnnotationElidePass()); yuxuanchen1997 wrote: I applied this suggestion. However, looking at `buildModuleInlinerPipeline`, it looks like it uses an adapter that runs a single CGSCC pass on every function in the module. This won't work well for `CoroAnnotationElidePass`, actually: it depends on the callee being split, but not the caller. Thinking about this, this is actually the same condition as the old `CoroElidePass`. Maybe the right thing to do here is to make this pass a function pass instead and use `createCGSCCToFunctionPassAdaptor`. What do you think? https://github.com/llvm/llvm-project/pull/99285
[llvm-branch-commits] [DirectX] Lower `@llvm.dx.handle.fromBinding` to DXIL ops (PR #104251)
@@ -683,6 +685,14 @@ def Dot4 : DXILOp<56, dot4> { let attributes = [Attributes]; } +def CreateHandle : DXILOp<57, createHandle> { + let Doc = "creates the handle to a resource"; + // ResourceClass, RangeID, Index, NonUniform + let arguments = [Int8Ty, Int32Ty, Int32Ty, Int1Ty]; + let result = HandleTy; + let stages = [Stages]; dmpots wrote: This should be invalid starting in DXIL_1_6 I think, right? Did we add a way to express that in the TD definition? https://github.com/llvm/llvm-project/pull/104251
[llvm-branch-commits] [DirectX] Lower `@llvm.dx.handle.fromBinding` to DXIL ops (PR #104251)
@@ -119,6 +123,119 @@ class OpLowerer { }); } + Value *createTmpHandleCast(Value *V, Type *Ty) { +Function *CastFn = Intrinsic::getDeclaration(&M, Intrinsic::dx_cast_handle, + {Ty, V->getType()}); +CallInst *Cast = OpBuilder.getIRB().CreateCall(CastFn, {V}); +CleanupCasts.push_back(Cast); +return Cast; + } + + void cleanupHandleCasts() { +SmallVector ToRemove; +SmallVector CastFns; + +for (CallInst *Cast : CleanupCasts) { + CastFns.push_back(Cast->getCalledFunction()); + // All of the ops should be using `dx.types.Handle` at this point, so if + // we're not producing that we should be part of a pair. Track this so we dmpots wrote: It's not clear from reading what "it should be part of a pair" means and why it must be true. Can we expand the comment here to explain? Is there an assert we should add here as well? https://github.com/llvm/llvm-project/pull/104251
[llvm-branch-commits] [DirectX] Lower `@llvm.dx.handle.fromBinding` to DXIL ops (PR #104251)
@@ -119,6 +123,119 @@ class OpLowerer { }); } + Value *createTmpHandleCast(Value *V, Type *Ty) { +Function *CastFn = Intrinsic::getDeclaration(&M, Intrinsic::dx_cast_handle, + {Ty, V->getType()}); +CallInst *Cast = OpBuilder.getIRB().CreateCall(CastFn, {V}); +CleanupCasts.push_back(Cast); +return Cast; + } + + void cleanupHandleCasts() { +SmallVector ToRemove; +SmallVector CastFns; + +for (CallInst *Cast : CleanupCasts) { + CastFns.push_back(Cast->getCalledFunction()); + // All of the ops should be using `dx.types.Handle` at this point, so if + // we're not producing that we should be part of a pair. Track this so we + // can remove it at the end. + if (Cast->getType() != OpBuilder.getHandleType()) { +ToRemove.push_back(Cast); +continue; + } + // Otherwise, we're the second handle in a pair. Forward the arguments and + // remove the (second) cast. + CallInst *Def = cast(Cast->getOperand(0)); + assert(Def->getIntrinsicID() == Intrinsic::dx_cast_handle && + "Unbalanced pair of temporary handle casts"); + Cast->replaceAllUsesWith(Def->getOperand(0)); + Cast->eraseFromParent(); +} +for (CallInst *Cast : ToRemove) { + assert(Cast->user_empty() && "Temporary handle cast still has users"); + Cast->eraseFromParent(); +} +llvm::sort(CastFns); +CastFns.erase(llvm::unique(CastFns), CastFns.end()); +for (Function *F : CastFns) + F->eraseFromParent(); + +CleanupCasts.clear(); + } + + void lowerToCreateHandle(Function &F) { +IRBuilder<> &IRB = OpBuilder.getIRB(); +Type *Int8Ty = IRB.getInt8Ty(); +Type *Int32Ty = IRB.getInt32Ty(); + +replaceFunction(F, [&](CallInst *CI) -> Error { + IRB.SetInsertPoint(CI); + + dxil::ResourceInfo &RI = DRM[CI]; + dxil::ResourceInfo::ResourceBinding Binding = RI.getBinding(); + + std::array Args{ + ConstantInt::get(Int8Ty, llvm::to_underlying(RI.getResourceClass())), + ConstantInt::get(Int32Ty, Binding.RecordID), CI->getArgOperand(3), + CI->getArgOperand(4)}; + Expected OpCall = + OpBuilder.tryCreateOp(OpCode::CreateHandle, Args); + if (Error E = 
OpCall.takeError()) +return E; + + Value *Cast = createTmpHandleCast(*OpCall, CI->getType()); + + CI->replaceAllUsesWith(Cast); + CI->eraseFromParent(); + return Error::success(); +}); + } + + void lowerToBindAndAnnotateHandle(Function &F) { +IRBuilder<> &IRB = OpBuilder.getIRB(); + +replaceFunction(F, [&](CallInst *CI) -> Error { + IRB.SetInsertPoint(CI); + + dxil::ResourceInfo &RI = DRM[CI]; + dxil::ResourceInfo::ResourceBinding Binding = RI.getBinding(); + std::pair Props = RI.getAnnotateProps(); + + Constant *ResBind = OpBuilder.getResBind( + Binding.LowerBound, Binding.LowerBound + Binding.Size - 1, dmpots wrote: Is this going to do the right thing for unbounded resource array size? I think that should have an upper bound of UINT_MAX. https://github.com/llvm/llvm-project/pull/104251 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
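The concern raised above is easy to demonstrate with a small model of the binding-range computation (Python for brevity; the convention that an unbounded resource array is encoded with size `UINT32_MAX` is an assumption here, not something stated in the PR):

```python
UINT32_MAX = 0xFFFFFFFF

def upper_bound(lower_bound: int, size: int) -> int:
    """Inclusive upper bound of a binding range, clamping unbounded arrays."""
    if size == UINT32_MAX:  # assumed sentinel for an unbounded array
        return UINT32_MAX
    return lower_bound + size - 1

# A bounded range starting at register 3 with size 4 covers registers 3..6.
print(upper_bound(3, 4))  # 6

# Naively computing LowerBound + Size - 1 for an unbounded array wraps in
# 32-bit arithmetic instead of clamping to UINT32_MAX:
print((3 + UINT32_MAX - 1) & UINT32_MAX)  # 1, not UINT32_MAX
print(upper_bound(3, UINT32_MAX))         # 4294967295
```

If the lowering adopts a sentinel like this, the special case needs to be handled before the subtraction, exactly as the reviewer suggests.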
[llvm-branch-commits] [DirectX] Lower `@llvm.dx.handle.fromBinding` to DXIL ops (PR #104251)
@@ -119,6 +123,119 @@ class OpLowerer { }); } + Value *createTmpHandleCast(Value *V, Type *Ty) { +Function *CastFn = Intrinsic::getDeclaration(&M, Intrinsic::dx_cast_handle, + {Ty, V->getType()}); +CallInst *Cast = OpBuilder.getIRB().CreateCall(CastFn, {V}); +CleanupCasts.push_back(Cast); +return Cast; + } + + void cleanupHandleCasts() { +SmallVector ToRemove; +SmallVector CastFns; + +for (CallInst *Cast : CleanupCasts) { + CastFns.push_back(Cast->getCalledFunction()); + // All of the ops should be using `dx.types.Handle` at this point, so if + // we're not producing that we should be part of a pair. Track this so we + // can remove it at the end. + if (Cast->getType() != OpBuilder.getHandleType()) { +ToRemove.push_back(Cast); +continue; + } + // Otherwise, we're the second handle in a pair. Forward the arguments and + // remove the (second) cast. + CallInst *Def = cast(Cast->getOperand(0)); + assert(Def->getIntrinsicID() == Intrinsic::dx_cast_handle && + "Unbalanced pair of temporary handle casts"); + Cast->replaceAllUsesWith(Def->getOperand(0)); + Cast->eraseFromParent(); +} +for (CallInst *Cast : ToRemove) { + assert(Cast->user_empty() && "Temporary handle cast still has users"); + Cast->eraseFromParent(); +} +llvm::sort(CastFns); +CastFns.erase(llvm::unique(CastFns), CastFns.end()); +for (Function *F : CastFns) + F->eraseFromParent(); + +CleanupCasts.clear(); + } + + void lowerToCreateHandle(Function &F) { +IRBuilder<> &IRB = OpBuilder.getIRB(); +Type *Int8Ty = IRB.getInt8Ty(); +Type *Int32Ty = IRB.getInt32Ty(); + +replaceFunction(F, [&](CallInst *CI) -> Error { + IRB.SetInsertPoint(CI); + + dxil::ResourceInfo &RI = DRM[CI]; + dxil::ResourceInfo::ResourceBinding Binding = RI.getBinding(); + + std::array Args{ + ConstantInt::get(Int8Ty, llvm::to_underlying(RI.getResourceClass())), + ConstantInt::get(Int32Ty, Binding.RecordID), CI->getArgOperand(3), + CI->getArgOperand(4)}; + Expected OpCall = + OpBuilder.tryCreateOp(OpCode::CreateHandle, Args); + if (Error E = 
OpCall.takeError()) +return E; + + Value *Cast = createTmpHandleCast(*OpCall, CI->getType()); + + CI->replaceAllUsesWith(Cast); + CI->eraseFromParent(); + return Error::success(); +}); + } + + void lowerToBindAndAnnotateHandle(Function &F) { +IRBuilder<> &IRB = OpBuilder.getIRB(); + +replaceFunction(F, [&](CallInst *CI) -> Error { + IRB.SetInsertPoint(CI); + + dxil::ResourceInfo &RI = DRM[CI]; + dxil::ResourceInfo::ResourceBinding Binding = RI.getBinding(); + std::pair Props = RI.getAnnotateProps(); + + Constant *ResBind = OpBuilder.getResBind( + Binding.LowerBound, Binding.LowerBound + Binding.Size - 1, + Binding.Space, RI.getResourceClass()); + std::array BindArgs{ResBind, CI->getArgOperand(3), + CI->getArgOperand(4)}; + Expected OpBind = + OpBuilder.tryCreateOp(OpCode::CreateHandleFromBinding, BindArgs); + if (Error E = OpBind.takeError()) +return E; + + std::array AnnotateArgs{ + *OpBind, OpBuilder.getResProps(Props.first, Props.second)}; + Expected OpAnnotate = + OpBuilder.tryCreateOp(OpCode::AnnotateHandle, AnnotateArgs); + if (Error E = OpAnnotate.takeError()) +return E; + + Value *Cast = createTmpHandleCast(*OpAnnotate, CI->getType()); + + CI->replaceAllUsesWith(Cast); + CI->eraseFromParent(); + + return Error::success(); +}); + } + + void lowerHandleFromBinding(Function &F) { dmpots wrote: This seems to be a more complicated lowering that a straightforward translation (well not this function but its implementation details). Can we add a high-level description of what the lowering does? Like the need for and usage of `dx_cast_handle` would be good to explain. https://github.com/llvm/llvm-project/pull/104251 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [DirectX] Lower `@llvm.dx.handle.fromBinding` to DXIL ops (PR #104251)
@@ -0,0 +1,61 @@ +; RUN: opt -S -dxil-op-lower %s | FileCheck %s dmpots wrote: I don't see tests for either 1. Unbounded resource arrays 2. Non-constant index into resource arrays I think it would be good to have tests for these. https://github.com/llvm/llvm-project/pull/104251 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [DirectX] Lower `@llvm.dx.handle.fromBinding` to DXIL ops (PR #104251)
@@ -119,6 +123,119 @@ class OpLowerer { }); } + Value *createTmpHandleCast(Value *V, Type *Ty) { +Function *CastFn = Intrinsic::getDeclaration(&M, Intrinsic::dx_cast_handle, + {Ty, V->getType()}); +CallInst *Cast = OpBuilder.getIRB().CreateCall(CastFn, {V}); +CleanupCasts.push_back(Cast); +return Cast; + } + + void cleanupHandleCasts() { +SmallVector ToRemove; +SmallVector CastFns; + +for (CallInst *Cast : CleanupCasts) { + CastFns.push_back(Cast->getCalledFunction()); + // All of the ops should be using `dx.types.Handle` at this point, so if + // we're not producing that we should be part of a pair. Track this so we + // can remove it at the end. + if (Cast->getType() != OpBuilder.getHandleType()) { +ToRemove.push_back(Cast); +continue; + } + // Otherwise, we're the second handle in a pair. Forward the arguments and + // remove the (second) cast. + CallInst *Def = cast(Cast->getOperand(0)); + assert(Def->getIntrinsicID() == Intrinsic::dx_cast_handle && + "Unbalanced pair of temporary handle casts"); + Cast->replaceAllUsesWith(Def->getOperand(0)); + Cast->eraseFromParent(); +} +for (CallInst *Cast : ToRemove) { + assert(Cast->user_empty() && "Temporary handle cast still has users"); + Cast->eraseFromParent(); +} +llvm::sort(CastFns); +CastFns.erase(llvm::unique(CastFns), CastFns.end()); +for (Function *F : CastFns) + F->eraseFromParent(); dmpots wrote: The explanation is good, can we get that added as a comment? https://github.com/llvm/llvm-project/pull/104251 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
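As a rough sketch of the pairing invariant under discussion (an abstract Python model, not the actual pass — the real code operates on `dx.cast_handle` `CallInst`s and LLVM types): a cast whose result type *is* the DXIL handle type is treated as the second member of a pair and is forwarded to the original value feeding its defining cast, after which both casts can be erased.

```python
HANDLE = "dx.types.Handle"

class Cast:
    """A temporary handle cast: wraps an operand and records a result type."""
    def __init__(self, operand, ty):
        self.operand = operand  # either another Cast or a plain SSA value name
        self.ty = ty

def cleanup(casts):
    """Return {second-of-pair cast: replacement value}, mirroring the
    forwarding step in cleanupHandleCasts()."""
    replacements = {}
    for cast in casts:
        if cast.ty != HANDLE:
            continue  # first of a pair; erased later once it has no users
        def_cast = cast.operand
        assert isinstance(def_cast, Cast), "unbalanced pair of casts"
        replacements[cast] = def_cast.operand  # forward the original value
    return replacements

# %h is the original handle; the first cast converts it to the target type,
# and the second cast converts back to dx.types.Handle for a DXIL op.
first = Cast("%h", "target.handle")
second = Cast(first, HANDLE)
repl = cleanup([first, second])
print(repl[second])  # %h -- the pair cancels out
```

Under this model the invariant the assert checks is that every handle-typed cast is fed directly by another temporary cast, never by an arbitrary value.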
[llvm-branch-commits] [llvm] [ctx_prof] Add support for ICP (PR #105469)
https://github.com/mtrofin updated https://github.com/llvm/llvm-project/pull/105469 >From d58d308957961ae7442a7b5aa0561f42dea69caf Mon Sep 17 00:00:00 2001 From: Mircea Trofin Date: Tue, 20 Aug 2024 21:32:23 -0700 Subject: [PATCH] [ctx_prof] Add support for ICP --- llvm/include/llvm/Analysis/CtxProfAnalysis.h | 18 +- llvm/include/llvm/IR/IntrinsicInst.h | 2 + .../llvm/ProfileData/PGOCtxProfReader.h | 20 ++ .../Transforms/Utils/CallPromotionUtils.h | 4 + llvm/lib/Analysis/CtxProfAnalysis.cpp | 79 +--- llvm/lib/IR/IntrinsicInst.cpp | 10 + .../Transforms/Utils/CallPromotionUtils.cpp | 86 + .../Utils/CallPromotionUtilsTest.cpp | 178 ++ 8 files changed, 364 insertions(+), 33 deletions(-) diff --git a/llvm/include/llvm/Analysis/CtxProfAnalysis.h b/llvm/include/llvm/Analysis/CtxProfAnalysis.h index 0b4dd8ae3a0dc7..d6c2bb26a091af 100644 --- a/llvm/include/llvm/Analysis/CtxProfAnalysis.h +++ b/llvm/include/llvm/Analysis/CtxProfAnalysis.h @@ -73,6 +73,12 @@ class PGOContextualProfile { return FuncInfo.find(getDefinedFunctionGUID(F))->second.NextCallsiteIndex++; } + using ConstVisitor = function_ref; + using Visitor = function_ref; + + void update(Visitor, const Function *F = nullptr); + void visit(ConstVisitor, const Function *F = nullptr) const; + const CtxProfFlatProfile flatten() const; bool invalidate(Module &, const PreservedAnalyses &PA, @@ -105,13 +111,18 @@ class CtxProfAnalysis : public AnalysisInfoMixin { class CtxProfAnalysisPrinterPass : public PassInfoMixin { - raw_ostream &OS; - public: - explicit CtxProfAnalysisPrinterPass(raw_ostream &OS) : OS(OS) {} + enum class PrintMode { Everything, JSON }; + explicit CtxProfAnalysisPrinterPass(raw_ostream &OS, + PrintMode Mode = PrintMode::Everything) + : OS(OS), Mode(Mode) {} PreservedAnalyses run(Module &M, ModuleAnalysisManager &MAM); static bool isRequired() { return true; } + +private: + raw_ostream &OS; + const PrintMode Mode; }; /// Assign a GUID to functions as metadata. 
GUID calculation takes linkage into @@ -134,6 +145,5 @@ class AssignGUIDPass : public PassInfoMixin { // This should become GlobalValue::getGUID static uint64_t getGUID(const Function &F); }; - } // namespace llvm #endif // LLVM_ANALYSIS_CTXPROFANALYSIS_H diff --git a/llvm/include/llvm/IR/IntrinsicInst.h b/llvm/include/llvm/IR/IntrinsicInst.h index 2f1e2c08c3ecec..bab41efab528e2 100644 --- a/llvm/include/llvm/IR/IntrinsicInst.h +++ b/llvm/include/llvm/IR/IntrinsicInst.h @@ -1519,6 +1519,7 @@ class InstrProfCntrInstBase : public InstrProfInstBase { ConstantInt *getNumCounters() const; // The index of the counter that this instruction acts on. ConstantInt *getIndex() const; + void setIndex(uint32_t Idx); }; /// This represents the llvm.instrprof.cover intrinsic. @@ -1569,6 +1570,7 @@ class InstrProfCallsite : public InstrProfCntrInstBase { return isa(V) && classof(cast(V)); } Value *getCallee() const; + void setCallee(Value *); }; /// This represents the llvm.instrprof.timestamp intrinsic. 
diff --git a/llvm/include/llvm/ProfileData/PGOCtxProfReader.h b/llvm/include/llvm/ProfileData/PGOCtxProfReader.h index 190deaeeacd085..23dcc376508b39 100644 --- a/llvm/include/llvm/ProfileData/PGOCtxProfReader.h +++ b/llvm/include/llvm/ProfileData/PGOCtxProfReader.h @@ -57,9 +57,23 @@ class PGOCtxProfContext final { GlobalValue::GUID guid() const { return GUID; } const SmallVectorImpl &counters() const { return Counters; } + SmallVectorImpl &counters() { return Counters; } + + uint64_t getEntrycount() const { return Counters[0]; } + const CallsiteMapTy &callsites() const { return Callsites; } CallsiteMapTy &callsites() { return Callsites; } + void ingestContext(uint32_t CSId, PGOCtxProfContext &&Other) { +auto [Iter, _] = callsites().try_emplace(CSId, CallTargetMapTy()); +Iter->second.emplace(Other.guid(), std::move(Other)); + } + + void growCounters(uint32_t Size) { +if (Size >= Counters.size()) + Counters.resize(Size); + } + bool hasCallsite(uint32_t I) const { return Callsites.find(I) != Callsites.end(); } @@ -68,6 +82,12 @@ class PGOCtxProfContext final { assert(hasCallsite(I) && "Callsite not found"); return Callsites.find(I)->second; } + + CallTargetMapTy &callsite(uint32_t I) { +assert(hasCallsite(I) && "Callsite not found"); +return Callsites.find(I)->second; + } + void getContainedGuids(DenseSet &Guids) const; }; diff --git a/llvm/include/llvm/Transforms/Utils/CallPromotionUtils.h b/llvm/include/llvm/Transforms/Utils/CallPromotionUtils.h index 385831f457038d..58af26f31417b0 100644 --- a/llvm/include/llvm/Transforms/Utils/CallPromotionUtils.h +++ b/llvm/include/llvm/Transforms/Utils/CallPromotionUtils.h @@ -14,6 +14,7 @@ #ifndef LLVM_TRANSFORMS_UTILS_CALLPROMOTIONUTILS_H #defin
[llvm-branch-commits] [clang] [misexpect] Support missing-annotations diagnostics from frontend profile data (PR #96524)
https://github.com/ilovepi updated https://github.com/llvm/llvm-project/pull/96524 >From 49aabf7bbc1cf30274c034b1cf2babc1fd851b31 Mon Sep 17 00:00:00 2001 From: Paul Kirth Date: Thu, 22 Aug 2024 00:19:28 + Subject: [PATCH] Use split-file in test, and add test for switch statements Created using spr 1.3.4 --- .../missing-annotations-branch.proftext | 17 - .../test/Profile/missing-annotations-branch.c | 62 ++ .../test/Profile/missing-annotations-switch.c | 64 +++ clang/test/Profile/missing-annotations.c | 44 - 4 files changed, 126 insertions(+), 61 deletions(-) delete mode 100644 clang/test/Profile/Inputs/missing-annotations-branch.proftext create mode 100644 clang/test/Profile/missing-annotations-branch.c create mode 100644 clang/test/Profile/missing-annotations-switch.c delete mode 100644 clang/test/Profile/missing-annotations.c diff --git a/clang/test/Profile/Inputs/missing-annotations-branch.proftext b/clang/test/Profile/Inputs/missing-annotations-branch.proftext deleted file mode 100644 index 81c857b9a84fb3..00 --- a/clang/test/Profile/Inputs/missing-annotations-branch.proftext +++ /dev/null @@ -1,17 +0,0 @@ -bar -# Func Hash: -11262309464 -# Num Counters: -2 -# Counter Values: -2000 -0 - -fizz -# Func Hash: -11262309464 -# Num Counters: -2 -# Counter Values: -0 -100 diff --git a/clang/test/Profile/missing-annotations-branch.c b/clang/test/Profile/missing-annotations-branch.c new file mode 100644 index 00..fa764d9238c8a7 --- /dev/null +++ b/clang/test/Profile/missing-annotations-branch.c @@ -0,0 +1,62 @@ +// RUN: rm -rf %t && mkdir -p %t +// RUN: split-file %s %t + +/// Test that missing-annotations detects branches that are hot, but not annotated. 
+// RUN: llvm-profdata merge %t/a.proftext -o %t/profdata +// RUN: %clang_cc1 %t/a.c -O2 -o - -emit-llvm -fprofile-instrument-use-path=%t/profdata -verify -mllvm -pgo-missing-annotations -Rpass=missing-annotations -fdiagnostics-misexpect-tolerance=10 + +/// Test that we don't report any diagnostics, if the threshold isn't exceeded. +// RUN: %clang_cc1 %t/a.c -O2 -o - -emit-llvm -fprofile-instrument-use-path=%t/profdata -mllvm -pgo-missing-annotations -Rpass=missing-annotations 2>&1 | FileCheck -implicit-check-not=remark %s + +//--- a.c +// foo-no-diagnostics +#define UNLIKELY(x) __builtin_expect(!!(x), 0) + +int foo(int); +int baz(int); +int buzz(void); + +const int inner_loop = 100; +const int outer_loop = 2000; + +int bar(void) { // imprecise-remark-re {{Extremely hot condition. Consider adding llvm.expect intrinsic{{.* + int a = buzz(); + int x = 0; + if (a % (outer_loop * inner_loop) == 0) { // expected-remark {{Extremely hot condition. Consider adding llvm.expect intrinsic}} +x = baz(a); + } else { +x = foo(50); + } + return x; +} + +int fizz(void) { + int a = buzz(); + int x = 0; + if ((a % (outer_loop * inner_loop) == 0)) { // expected-remark-re {{Extremely hot condition. Consider adding llvm.expect intrinsic{{.*} +x = baz(a); + } else { +x = foo(50); + } + return x; +} + +//--- a.proftext +bar +# Func Hash: +11262309464 +# Num Counters: +2 +# Counter Values: +1901 +99 + +fizz +# Func Hash: +11262309464 +# Num Counters: +2 +# Counter Values: +1901 +99 + diff --git a/clang/test/Profile/missing-annotations-switch.c b/clang/test/Profile/missing-annotations-switch.c new file mode 100644 index 00..2d7ea0865ac8a1 --- /dev/null +++ b/clang/test/Profile/missing-annotations-switch.c @@ -0,0 +1,64 @@ +// RUN: rm -rf %t && mkdir -p %t +// RUN: split-file %s %t + +/// Test that missing-annotations detects switch conditions that are hot, but not annotated. 
+// RUN: llvm-profdata merge %t/a.proftext -o %t/profdata +// RUN: %clang_cc1 %t/a.c -O2 -o - -emit-llvm -fprofile-instrument-use-path=%t/profdata -verify -mllvm -pgo-missing-annotations -Rpass=missing-annotations -fdiagnostics-misexpect-tolerance=10 + +/// Test that we don't report any diagnostics, if the threshold isn't exceeded. +// RUN: %clang_cc1 %t/a.c -O2 -o - -emit-llvm -fprofile-instrument-use-path=%t/profdata -mllvm -pgo-missing-annotations -Rpass=missing-annotations 2>&1 | FileCheck -implicit-check-not=remark %s + +//--- a.c +#define inner_loop 1000 +#define outer_loop 20 +#define arry_size 25 + +int arry[arry_size] = {0}; + +int rand(void); +int sum(int *buff, int size); +int random_sample(int *buff, int size); + +int main(void) { + int val = 0; + + int j, k; + for (j = 0; j < outer_loop; ++j) { +for (k = 0; k < inner_loop; ++k) { + unsigned condition = rand() % 1; + switch (condition) { // expected-remark {{Extremely hot condition. Consider adding llvm.expect intrinsic}} + + case 0: +val += sum(arry, arry_size); +break; + case 1: + case 2: + case 3: +break; + default: +val += random_sample(arry, arry_size); +break; + } // end sw
[llvm-branch-commits] [clang] [misexpect] Support missing-annotations diagnostics from frontend profile data (PR #96524)
https://github.com/ilovepi edited https://github.com/llvm/llvm-project/pull/96524
[llvm-branch-commits] [clang][misexpect] Add support to clang for profitable annotation diagnostics (PR #96525)
https://github.com/ilovepi updated https://github.com/llvm/llvm-project/pull/96525
[llvm-branch-commits] Reland "[asan] Remove debug tracing from `report_globals` (#104404)" (PR #105601)
https://github.com/vitalybuka created https://github.com/llvm/llvm-project/pull/105601 This reverts commit 2704b804bec50c2b016bf678bd534c330ec655b6 and relands #104404. Darwin should not fail after #105599.
[llvm-branch-commits] Reland "[asan] Remove debug tracing from `report_globals` (#104404)" (PR #105601)
llvmbot wrote: @llvm/pr-subscribers-compiler-rt-sanitizer Author: Vitaly Buka (vitalybuka) Changes This reverts commit 2704b804bec50c2b016bf678bd534c330ec655b6 and relands #104404. The Darwin should not fail after #105599. --- Full diff: https://github.com/llvm/llvm-project/pull/105601.diff 10 Files Affected: - (modified) compiler-rt/lib/asan/asan_flags.inc (+2-5) - (modified) compiler-rt/lib/asan/asan_globals.cpp (+8-11) - (modified) compiler-rt/test/asan/TestCases/Linux/initialization-nobug-lld.cpp (+1-1) - (modified) compiler-rt/test/asan/TestCases/Linux/odr_indicator_unregister.cpp (+1-1) - (modified) compiler-rt/test/asan/TestCases/Linux/odr_indicators.cpp (+2-2) - (modified) compiler-rt/test/asan/TestCases/Windows/dll_global_dead_strip.c (+2-2) - (modified) compiler-rt/test/asan/TestCases/Windows/dll_report_globals_symbolization_at_startup.cpp (+1-1) - (modified) compiler-rt/test/asan/TestCases/Windows/global_dead_strip.c (+2-2) - (modified) compiler-rt/test/asan/TestCases/Windows/report_globals_vs_freelibrary.cpp (+1-1) - (modified) compiler-rt/test/asan/TestCases/initialization-nobug.cpp (+4-4) ``diff diff --git a/compiler-rt/lib/asan/asan_flags.inc b/compiler-rt/lib/asan/asan_flags.inc index fad1577d912a5e..5e0ced9706e664 100644 --- a/compiler-rt/lib/asan/asan_flags.inc +++ b/compiler-rt/lib/asan/asan_flags.inc @@ -36,11 +36,8 @@ ASAN_FLAG(int, max_redzone, 2048, ASAN_FLAG( bool, debug, false, "If set, prints some debugging information and does additional checks.") -ASAN_FLAG( -int, report_globals, 1, -"Controls the way to handle globals (0 - don't detect buffer overflow on " -"globals, 1 - detect buffer overflow, 2 - print data about registered " -"globals).") +ASAN_FLAG(bool, report_globals, true, + "If set, detect and report errors on globals .") ASAN_FLAG(bool, check_initialization_order, false, "If set, attempts to catch initialization order issues.") ASAN_FLAG( diff --git a/compiler-rt/lib/asan/asan_globals.cpp b/compiler-rt/lib/asan/asan_globals.cpp 
index c83b782cb85f89..a1211430b1268a 100644 --- a/compiler-rt/lib/asan/asan_globals.cpp +++ b/compiler-rt/lib/asan/asan_globals.cpp @@ -22,6 +22,7 @@ #include "asan_thread.h" #include "sanitizer_common/sanitizer_common.h" #include "sanitizer_common/sanitizer_dense_map.h" +#include "sanitizer_common/sanitizer_internal_defs.h" #include "sanitizer_common/sanitizer_list.h" #include "sanitizer_common/sanitizer_mutex.h" #include "sanitizer_common/sanitizer_placement_new.h" @@ -179,7 +180,7 @@ int GetGlobalsForAddress(uptr addr, Global *globals, u32 *reg_sites, int res = 0; for (const auto &l : list_of_all_globals) { const Global &g = *l.g; -if (flags()->report_globals >= 2) +if (UNLIKELY(common_flags()->verbosity >= 3)) ReportGlobal(g, "Search"); if (IsAddressNearGlobal(addr, g)) { internal_memcpy(&globals[res], &g, sizeof(g)); @@ -270,7 +271,7 @@ static inline bool UseODRIndicator(const Global *g) { // so we store the globals in a map. static void RegisterGlobal(const Global *g) SANITIZER_REQUIRES(mu_for_globals) { CHECK(AsanInited()); - if (flags()->report_globals >= 2) + if (UNLIKELY(common_flags()->verbosity >= 3)) ReportGlobal(*g, "Added"); CHECK(flags()->report_globals); CHECK(AddrIsInMem(g->beg)); @@ -307,7 +308,7 @@ static void RegisterGlobal(const Global *g) SANITIZER_REQUIRES(mu_for_globals) { static void UnregisterGlobal(const Global *g) SANITIZER_REQUIRES(mu_for_globals) { CHECK(AsanInited()); - if (flags()->report_globals >= 2) + if (UNLIKELY(common_flags()->verbosity >= 3)) ReportGlobal(*g, "Removed"); CHECK(flags()->report_globals); CHECK(AddrIsInMem(g->beg)); @@ -438,7 +439,7 @@ void __asan_register_globals(__asan_global *globals, uptr n) { } GlobalRegistrationSite site = {stack_id, &globals[0], &globals[n - 1]}; global_registration_site_vector->push_back(site); - if (flags()->report_globals >= 2) { + if (UNLIKELY(common_flags()->verbosity >= 3)) { PRINT_CURRENT_STACK(); Printf("=== ID %d; %p %p\n", stack_id, (void *)&globals[0], (void *)&globals[n - 1]); 
@@ -497,9 +498,7 @@ void __asan_before_dynamic_init(const char *module_name) { Lock lock(&mu_for_globals); if (current_dynamic_init_module_name == module_name) return; - if (flags()->report_globals >= 3) -Printf("DynInitPoison module: %s\n", module_name); - + VPrintf(2, "DynInitPoison module: %s\n", module_name); if (current_dynamic_init_module_name == nullptr) { // First call, poison all globals from other modules. DynInitGlobals().forEach([&](auto &kv) { @@ -545,8 +544,7 @@ static void UnpoisonBeforeMain(void) { return; allow_after_dynamic_init = true; } - if (flags()->report_globals >= 3) -Printf("UnpoisonBeforeMain\n"); + VPrintf(2, "UnpoisonBeforeMain\n"); __asan_after_dynamic_init(); } @@ -570,8 +568,7 @@ void __
[llvm-branch-commits] [DirectX] Lower `@llvm.dx.handle.fromBinding` to DXIL ops (PR #104251)
https://github.com/bogner updated https://github.com/llvm/llvm-project/pull/104251
[llvm-branch-commits] [DirectX] Implement metadata lowering for resources (PR #104447)
https://github.com/bogner updated https://github.com/llvm/llvm-project/pull/104447
[llvm-branch-commits] [DirectX] Add resource handling to the DXIL pretty printer (PR #104448)
https://github.com/bogner updated https://github.com/llvm/llvm-project/pull/104448
[llvm-branch-commits] [DirectX] Implement metadata lowering for resources (PR #104447)
@@ -13,27 +13,52 @@ #include "DXILShaderFlags.h" #include "DirectX.h" #include "llvm/ADT/StringSet.h" +#include "llvm/Analysis/DXILResource.h" #include "llvm/IR/Constants.h" #include "llvm/IR/Metadata.h" #include "llvm/IR/Module.h" +#include "llvm/InitializePasses.h" #include "llvm/Pass.h" #include "llvm/TargetParser/Triple.h" using namespace llvm; using namespace llvm::dxil; -static void emitResourceMetadata(Module &M, +static void emitResourceMetadata(Module &M, const DXILResourceMap &DRM, const dxil::Resources &MDResources) { - Metadata *SRVMD = nullptr, *UAVMD = nullptr, *CBufMD = nullptr, - *SmpMD = nullptr; - bool HasResources = false; + LLVMContext &Context = M.getContext(); + + SmallVector SRVs, UAVs, CBufs, Smps; + for (auto [_, RI] : DRM) { +switch (RI.getResourceClass()) { bogner wrote: I've updated the API in #105602 - you can now iterate over the unique resources, and there are helpers to just iterate `.srvs()`, `.uavs()`, etc. https://github.com/llvm/llvm-project/pull/104447 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [LLVM][Coroutines] Create `.noalloc` variant of switch ABI coroutine ramp functions during CoroSplit (PR #99283)
@@ -2049,6 +2055,22 @@ the coroutine must reach the final suspend point when it get destroyed. This attribute only works for switched-resume coroutines now. +coro_must_elide +--- + +When a Call or Invoke instruction is marked with `coro_must_elide`, +CoroAnnotationElidePass performs heap elision when possible. Note that for vogelsgesang wrote: I think the name `coro_must_elide` is a misnomer. "must elide" sounds as if it would be a compilation error if elision fails. However, this is not the case. https://github.com/llvm/llvm-project/pull/99283
[llvm-branch-commits] [llvm] [LLVM][Coroutines] Create `.noalloc` variant of switch ABI coroutine ramp functions during CoroSplit (PR #99283)
@@ -2049,6 +2055,22 @@ the coroutine must reach the final suspend point when it get destroyed. This attribute only works for switched-resume coroutines now. +coro_must_elide +--- + +When a Call or Invoke instruction is marked with `coro_must_elide`, +CoroAnnotationElidePass performs heap elision when possible. Note that for yuxuanchen1997 wrote: What about `coro_elide_safe`? https://github.com/llvm/llvm-project/pull/99283 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [LLVM][Coroutines] Create `.noalloc` variant of switch ABI coroutine ramp functions during CoroSplit (PR #99283)
@@ -2049,6 +2055,22 @@ the coroutine must reach the final suspend point when it get destroyed. This attribute only works for switched-resume coroutines now. +coro_must_elide +--- + +When a Call or Invoke instruction is marked with `coro_must_elide`, +CoroAnnotationElidePass performs heap elision when possible. Note that for vogelsgesang wrote: love it! https://github.com/llvm/llvm-project/pull/99283 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [DirectX] Add resource handling to the DXIL pretty printer (PR #104448)
@@ -10,23 +10,235 @@ #include "DXILResourceAnalysis.h" #include "DirectX.h" #include "llvm/ADT/StringRef.h" +#include "llvm/Analysis/DXILResource.h" #include "llvm/IR/PassManager.h" +#include "llvm/InitializePasses.h" #include "llvm/Pass.h" +#include "llvm/Support/FormatAdapters.h" #include "llvm/Support/FormatVariadic.h" #include "llvm/Support/raw_ostream.h" using namespace llvm; -static void prettyPrintResources(raw_ostream &OS, +static constexpr StringRef getRCName(dxil::ResourceClass RC) { farzonl wrote: Feel free to ignore, I was thinking of a different way to do this that would have a tighter coupling of Names and prefixes: ```cpp struct ResourceClassInfo { const StringRef name; const StringRef prefix; }; llvm::DenseMap createResourceClassMap() { return { {dxil::ResourceClass::SRV, {"SRV", "t"}}, {dxil::ResourceClass::UAV, {"UAV", "u"}}, {dxil::ResourceClass::CBuffer, {"cbuffer", "cb"}}, {dxil::ResourceClass::Sampler, {"sampler", "s"}} }; } static const llvm::DenseMap ResourceClassMap = createResourceClassMap(); StringRef getRCName(dxil::ResourceClass RC) { return ResourceClassMap.lookup(RC).name; } StringRef getRCPrefix(dxil::ResourceClass RC) { return ResourceClassMap.lookup(RC).prefix; } ``` https://github.com/llvm/llvm-project/pull/104448 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [DirectX] Add resource handling to the DXIL pretty printer (PR #104448)
https://github.com/farzonl edited https://github.com/llvm/llvm-project/pull/104448
[llvm-branch-commits] [DirectX] Add resource handling to the DXIL pretty printer (PR #104448)
https://github.com/farzonl approved this pull request. LGTM https://github.com/llvm/llvm-project/pull/104448
[llvm-branch-commits] [flang] [flang][omp] Emit omp.workshare in frontend (PR #101444)
https://github.com/ivanradanov updated https://github.com/llvm/llvm-project/pull/101444 >From e5789180a3dd1fd8c46a5d7dfc446921110642ca Mon Sep 17 00:00:00 2001 From: Ivan Radanov Ivanov Date: Wed, 31 Jul 2024 14:11:47 +0900 Subject: [PATCH 1/2] [flang][omp] Emit omp.workshare in frontend --- flang/lib/Lower/OpenMP/OpenMP.cpp | 30 ++ 1 file changed, 26 insertions(+), 4 deletions(-) diff --git a/flang/lib/Lower/OpenMP/OpenMP.cpp b/flang/lib/Lower/OpenMP/OpenMP.cpp index d614db8b68ef65..83c90374afa5e3 100644 --- a/flang/lib/Lower/OpenMP/OpenMP.cpp +++ b/flang/lib/Lower/OpenMP/OpenMP.cpp @@ -1272,6 +1272,15 @@ static void genTaskwaitClauses(lower::AbstractConverter &converter, loc, llvm::omp::Directive::OMPD_taskwait); } +static void genWorkshareClauses(lower::AbstractConverter &converter, +semantics::SemanticsContext &semaCtx, +lower::StatementContext &stmtCtx, +const List &clauses, mlir::Location loc, +mlir::omp::WorkshareOperands &clauseOps) { + ClauseProcessor cp(converter, semaCtx, clauses); + cp.processNowait(clauseOps); +} + static void genTeamsClauses(lower::AbstractConverter &converter, semantics::SemanticsContext &semaCtx, lower::StatementContext &stmtCtx, @@ -1897,6 +1906,22 @@ genTaskyieldOp(lower::AbstractConverter &converter, lower::SymMap &symTable, return converter.getFirOpBuilder().create(loc); } +static mlir::omp::WorkshareOp +genWorkshareOp(lower::AbstractConverter &converter, lower::SymMap &symTable, + semantics::SemanticsContext &semaCtx, lower::pft::Evaluation &eval, + mlir::Location loc, const ConstructQueue &queue, + ConstructQueue::iterator item) { + lower::StatementContext stmtCtx; + mlir::omp::WorkshareOperands clauseOps; + genWorkshareClauses(converter, semaCtx, stmtCtx, item->clauses, loc, clauseOps); + + return genOpWithBody( + OpWithBodyGenInfo(converter, symTable, semaCtx, loc, eval, +llvm::omp::Directive::OMPD_workshare) + .setClauses(&item->clauses), + queue, item, clauseOps); +} + static mlir::omp::TeamsOp 
genTeamsOp(lower::AbstractConverter &converter, lower::SymMap &symTable, semantics::SemanticsContext &semaCtx, lower::pft::Evaluation &eval, @@ -2309,10 +2334,7 @@ static void genOMPDispatch(lower::AbstractConverter &converter, llvm::omp::getOpenMPDirectiveName(dir) + ")"); // case llvm::omp::Directive::OMPD_workdistribute: case llvm::omp::Directive::OMPD_workshare: -// FIXME: Workshare is not a commonly used OpenMP construct, an -// implementation for this feature will come later. For the codes -// that use this construct, add a single construct for now. -genSingleOp(converter, symTable, semaCtx, eval, loc, queue, item); +genWorkshareOp(converter, symTable, semaCtx, eval, loc, queue, item); break; default: // Combined and composite constructs should have been split into a sequence >From 70daa016c0c39861926b1b82e31b96db005cfba1 Mon Sep 17 00:00:00 2001 From: Ivan Radanov Ivanov Date: Sun, 4 Aug 2024 16:02:37 +0900 Subject: [PATCH 2/2] Fix lower test for workshare --- flang/test/Lower/OpenMP/workshare.f90 | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/flang/test/Lower/OpenMP/workshare.f90 b/flang/test/Lower/OpenMP/workshare.f90 index 1e11677a15e1f0..8e771952f5b6da 100644 --- a/flang/test/Lower/OpenMP/workshare.f90 +++ b/flang/test/Lower/OpenMP/workshare.f90 @@ -6,7 +6,7 @@ subroutine sb1(arr) integer :: arr(:) !CHECK: omp.parallel { !$omp parallel -!CHECK: omp.single { +!CHECK: omp.workshare { !$omp workshare arr = 0 !$omp end workshare @@ -20,7 +20,7 @@ subroutine sb2(arr) integer :: arr(:) !CHECK: omp.parallel { !$omp parallel -!CHECK: omp.single nowait { +!CHECK: omp.workshare nowait { !$omp workshare arr = 0 !$omp end workshare nowait @@ -33,7 +33,7 @@ subroutine sb2(arr) subroutine sb3(arr) integer :: arr(:) !CHECK: omp.parallel { -!CHECK: omp.single { +!CHECK: omp.workshare { !$omp parallel workshare arr = 0 !$omp end parallel workshare ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org 
[llvm-branch-commits] [flang] [flang] Introduce custom loop nest generation for loops in workshare construct (PR #101445)
https://github.com/ivanradanov updated https://github.com/llvm/llvm-project/pull/101445 >From 81606df746e9862c330681ed8ae9113a43e577a2 Mon Sep 17 00:00:00 2001 From: Ivan Radanov Ivanov Date: Wed, 31 Jul 2024 14:12:34 +0900 Subject: [PATCH 1/4] [flang] Introduce ws loop nest generation for HLFIR lowering --- .../flang/Optimizer/Builder/HLFIRTools.h | 12 +++-- flang/lib/Lower/ConvertCall.cpp | 2 +- flang/lib/Lower/OpenMP/ReductionProcessor.cpp | 4 +- flang/lib/Optimizer/Builder/HLFIRTools.cpp| 52 ++- .../HLFIR/Transforms/BufferizeHLFIR.cpp | 3 +- .../LowerHLFIROrderedAssignments.cpp | 30 +-- .../Transforms/OptimizedBufferization.cpp | 6 +-- 7 files changed, 69 insertions(+), 40 deletions(-) diff --git a/flang/include/flang/Optimizer/Builder/HLFIRTools.h b/flang/include/flang/Optimizer/Builder/HLFIRTools.h index 6b41025eea0780..14e42c6f358e46 100644 --- a/flang/include/flang/Optimizer/Builder/HLFIRTools.h +++ b/flang/include/flang/Optimizer/Builder/HLFIRTools.h @@ -357,8 +357,8 @@ hlfir::ElementalOp genElementalOp( /// Structure to describe a loop nest. struct LoopNest { - fir::DoLoopOp outerLoop; - fir::DoLoopOp innerLoop; + mlir::Operation *outerOp; + mlir::Block *body; llvm::SmallVector oneBasedIndices; }; @@ -366,11 +366,13 @@ struct LoopNest { /// \p isUnordered specifies whether the loops in the loop nest /// are unordered. 
LoopNest genLoopNest(mlir::Location loc, fir::FirOpBuilder &builder, - mlir::ValueRange extents, bool isUnordered = false); + mlir::ValueRange extents, bool isUnordered = false, + bool emitWsLoop = false); inline LoopNest genLoopNest(mlir::Location loc, fir::FirOpBuilder &builder, -mlir::Value shape, bool isUnordered = false) { +mlir::Value shape, bool isUnordered = false, +bool emitWsLoop = false) { return genLoopNest(loc, builder, getIndexExtents(loc, builder, shape), - isUnordered); + isUnordered, emitWsLoop); } /// Inline the body of an hlfir.elemental at the current insertion point diff --git a/flang/lib/Lower/ConvertCall.cpp b/flang/lib/Lower/ConvertCall.cpp index fd873f55dd844e..0689d6e033dd9c 100644 --- a/flang/lib/Lower/ConvertCall.cpp +++ b/flang/lib/Lower/ConvertCall.cpp @@ -2128,7 +2128,7 @@ class ElementalCallBuilder { hlfir::genLoopNest(loc, builder, shape, !mustBeOrdered); mlir::ValueRange oneBasedIndices = loopNest.oneBasedIndices; auto insPt = builder.saveInsertionPoint(); - builder.setInsertionPointToStart(loopNest.innerLoop.getBody()); + builder.setInsertionPointToStart(loopNest.body); callContext.stmtCtx.pushScope(); for (auto &preparedActual : loweredActuals) if (preparedActual) diff --git a/flang/lib/Lower/OpenMP/ReductionProcessor.cpp b/flang/lib/Lower/OpenMP/ReductionProcessor.cpp index c3c1f363033c27..72a90dd0d6f29d 100644 --- a/flang/lib/Lower/OpenMP/ReductionProcessor.cpp +++ b/flang/lib/Lower/OpenMP/ReductionProcessor.cpp @@ -375,7 +375,7 @@ static void genBoxCombiner(fir::FirOpBuilder &builder, mlir::Location loc, // know this won't miss any opportuinties for clever elemental inlining hlfir::LoopNest nest = hlfir::genLoopNest( loc, builder, shapeShift.getExtents(), /*isUnordered=*/true); - builder.setInsertionPointToStart(nest.innerLoop.getBody()); + builder.setInsertionPointToStart(nest.body); mlir::Type refTy = fir::ReferenceType::get(seqTy.getEleTy()); auto lhsEleAddr = builder.create( loc, refTy, lhs, shapeShift, 
/*slice=*/mlir::Value{}, @@ -389,7 +389,7 @@ static void genBoxCombiner(fir::FirOpBuilder &builder, mlir::Location loc, builder, loc, redId, refTy, lhsEle, rhsEle); builder.create(loc, scalarReduction, lhsEleAddr); - builder.setInsertionPointAfter(nest.outerLoop); + builder.setInsertionPointAfter(nest.outerOp); builder.create(loc, lhsAddr); } diff --git a/flang/lib/Optimizer/Builder/HLFIRTools.cpp b/flang/lib/Optimizer/Builder/HLFIRTools.cpp index 8d0ae2f195178c..cd07cb741eb4bb 100644 --- a/flang/lib/Optimizer/Builder/HLFIRTools.cpp +++ b/flang/lib/Optimizer/Builder/HLFIRTools.cpp @@ -20,6 +20,7 @@ #include "mlir/IR/IRMapping.h" #include "mlir/Support/LLVM.h" #include "llvm/ADT/TypeSwitch.h" +#include #include // Return explicit extents. If the base is a fir.box, this won't read it to @@ -855,26 +856,51 @@ mlir::Value hlfir::inlineElementalOp( hlfir::LoopNest hlfir::genLoopNest(mlir::Location loc, fir::FirOpBuilder &builder, - mlir::ValueRange extents, bool isUnordered) { + mlir::ValueRange extents, bool isUnordered, + bool emitWsLoop) { hlfir::LoopNest loopNest; assert(!extents.empty() &&
[llvm-branch-commits] [lld] release/19.x: [lld-macho] Fix crash: ObjC category merge + relative method lists (#104081) (PR #105615)
https://github.com/llvmbot created https://github.com/llvm/llvm-project/pull/105615 Backport 0df91893efc752a76c7bbe6b063d66c8a2fa0d55 Requested by: @alx32 >From 643fd0b1a2a2fb73ea54f4f2ac6e6bb61238b99e Mon Sep 17 00:00:00 2001 From: alx32 <103613512+al...@users.noreply.github.com> Date: Wed, 14 Aug 2024 19:30:41 -0700 Subject: [PATCH] [lld-macho] Fix crash: ObjC category merge + relative method lists (#104081) A crash was happening when both ObjC Category Merging and Relative Method Lists were enabled. ObjC Category Merging creates new data sections and adds them by calling `addInputSection`. `addInputSection` uses the symbols within the added section to determine which container to actually add the section to. The issue is that ObjC Category Merging was calling `addInputSection` before actually adding the relevant symbols to the added section. This caused `addInputSection` to add the `InputSection` to the wrong container, eventually resulting in a crash. To fix this, we ensure that ObjC Category Merging calls `addInputSection` only after the symbols have been added to the `InputSection`.
(cherry picked from commit 0df91893efc752a76c7bbe6b063d66c8a2fa0d55) --- lld/MachO/ObjC.cpp| 10 +- .../MachO/objc-category-merging-minimal.s | 20 +-- 2 files changed, 15 insertions(+), 15 deletions(-) diff --git a/lld/MachO/ObjC.cpp b/lld/MachO/ObjC.cpp index 9c056f40aa943f..39d885188d34ac 100644 --- a/lld/MachO/ObjC.cpp +++ b/lld/MachO/ObjC.cpp @@ -873,7 +873,6 @@ Defined *ObjcCategoryMerger::emitAndLinkProtocolList( infoCategoryWriter.catPtrListInfo.align); listSec->parent = infoCategoryWriter.catPtrListInfo.outputSection; listSec->live = true; - addInputSection(listSec); listSec->parent = infoCategoryWriter.catPtrListInfo.outputSection; @@ -889,6 +888,7 @@ Defined *ObjcCategoryMerger::emitAndLinkProtocolList( ptrListSym->used = true; parentSym->getObjectFile()->symbols.push_back(ptrListSym); + addInputSection(listSec); createSymbolReference(parentSym, ptrListSym, linkAtOffset, infoCategoryWriter.catBodyInfo.relocTemplate); @@ -933,7 +933,6 @@ void ObjcCategoryMerger::emitAndLinkPointerList( infoCategoryWriter.catPtrListInfo.align); listSec->parent = infoCategoryWriter.catPtrListInfo.outputSection; listSec->live = true; - addInputSection(listSec); listSec->parent = infoCategoryWriter.catPtrListInfo.outputSection; @@ -949,6 +948,7 @@ void ObjcCategoryMerger::emitAndLinkPointerList( ptrListSym->used = true; parentSym->getObjectFile()->symbols.push_back(ptrListSym); + addInputSection(listSec); createSymbolReference(parentSym, ptrListSym, linkAtOffset, infoCategoryWriter.catBodyInfo.relocTemplate); @@ -974,7 +974,6 @@ ObjcCategoryMerger::emitCatListEntrySec(const std::string &forCategoryName, bodyData, infoCategoryWriter.catListInfo.align); newCatList->parent = infoCategoryWriter.catListInfo.outputSection; newCatList->live = true; - addInputSection(newCatList); newCatList->parent = infoCategoryWriter.catListInfo.outputSection; @@ -990,6 +989,7 @@ ObjcCategoryMerger::emitCatListEntrySec(const std::string &forCategoryName, catListSym->used = true; 
objFile->symbols.push_back(catListSym); + addInputSection(newCatList); return catListSym; } @@ -1012,7 +1012,6 @@ Defined *ObjcCategoryMerger::emitCategoryBody(const std::string &name, bodyData, infoCategoryWriter.catBodyInfo.align); newBodySec->parent = infoCategoryWriter.catBodyInfo.outputSection; newBodySec->live = true; - addInputSection(newBodySec); std::string symName = objc::symbol_names::category + baseClassName + "(" + name + ")"; @@ -1025,6 +1024,7 @@ Defined *ObjcCategoryMerger::emitCategoryBody(const std::string &name, catBodySym->used = true; objFile->symbols.push_back(catBodySym); + addInputSection(newBodySec); createSymbolReference(catBodySym, nameSym, catLayout.nameOffset, infoCategoryWriter.catBodyInfo.relocTemplate); @@ -1245,7 +1245,6 @@ void ObjcCategoryMerger::generateCatListForNonErasedCategories( infoCategoryWriter.catListInfo.align); listSec->parent = infoCategoryWriter.catListInfo.outputSection; listSec->live = true; - addInputSection(listSec); std::string slotSymName = "<__objc_catlist slot for category "; slotSymName += nonErasedCatBody->getName(); @@ -1260,6 +1259,7 @@ void ObjcCategoryMerger::generateCatListForNonErasedCategories( catListSlotSym->used = true; objFile->symbols.push_back(catListSlotSym); + addInputSection(listSec); // Now link the category body into the newly created slot createSymbolReference(catListSlotSym, nonErasedCatBody, 0, diff --git a/lld/test/MachO/objc-category-merging-minimal.s b/lld/test/MachO/objc-categ