[llvm-branch-commits] [llvm] release/19.x: [LICM] allow MemoryAccess creation failure (#116813) (PR #117082)
https://github.com/nikic approved this pull request. LGTM https://github.com/llvm/llvm-project/pull/117082 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] release/19.x: [LICM] allow MemoryAccess creation failure (#116813) (PR #117082)
https://github.com/DianQK updated https://github.com/llvm/llvm-project/pull/117082

>From d7c9977e092ee48d8bee2a2787af0d23b75cfee5 Mon Sep 17 00:00:00 2001
From: DianQK
Date: Wed, 20 Nov 2024 19:52:51 +0800
Subject: [PATCH] [LICM] allow MemoryAccess creation failure (#116813)

Fixes #116809.

After running some passes (SimpleLoopUnswitch, LoopInstSimplify, etc.), MemorySSA might be outdated, and the instruction `I` may have become a non-memory touching instruction. LICM has already handled this, but it does not pass `CreationMustSucceed=false` to `createDefinedAccess`.

(cherry picked from commit 18b02bbf441660683df7f3925946984203d49bab)
---
 llvm/include/llvm/Analysis/MemorySSAUpdater.h |  5 ++
 llvm/lib/Analysis/MemorySSAUpdater.cpp        | 13 -
 llvm/lib/Transforms/Scalar/LICM.cpp           |  5 +-
 .../LICM/PR116813-memoryssa-outdated.ll       | 50 +++
 4 files changed, 70 insertions(+), 3 deletions(-)
 create mode 100644 llvm/test/Transforms/LICM/PR116813-memoryssa-outdated.ll

diff --git a/llvm/include/llvm/Analysis/MemorySSAUpdater.h b/llvm/include/llvm/Analysis/MemorySSAUpdater.h
index d4da3ef1146db7..f598dedea75fd6 100644
--- a/llvm/include/llvm/Analysis/MemorySSAUpdater.h
+++ b/llvm/include/llvm/Analysis/MemorySSAUpdater.h
@@ -192,6 +192,11 @@ class MemorySSAUpdater {
                                        const BasicBlock *BB,
                                        MemorySSA::InsertionPlace Point);

+  MemoryAccess *createMemoryAccessInBB(Instruction *I, MemoryAccess *Definition,
+                                       const BasicBlock *BB,
+                                       MemorySSA::InsertionPlace Point,
+                                       bool CreationMustSucceed);
+
   /// Create a MemoryAccess in MemorySSA before an existing MemoryAccess.
   ///
   /// See createMemoryAccessInBB() for usage details.
diff --git a/llvm/lib/Analysis/MemorySSAUpdater.cpp b/llvm/lib/Analysis/MemorySSAUpdater.cpp
index aa550f0b6a7bfd..94061c949b7f85 100644
--- a/llvm/lib/Analysis/MemorySSAUpdater.cpp
+++ b/llvm/lib/Analysis/MemorySSAUpdater.cpp
@@ -1404,8 +1404,17 @@ void MemorySSAUpdater::changeToUnreachable(const Instruction *I) {
 MemoryAccess *MemorySSAUpdater::createMemoryAccessInBB(
     Instruction *I, MemoryAccess *Definition, const BasicBlock *BB,
     MemorySSA::InsertionPlace Point) {
-  MemoryUseOrDef *NewAccess = MSSA->createDefinedAccess(I, Definition);
-  MSSA->insertIntoListsForBlock(NewAccess, BB, Point);
+  return createMemoryAccessInBB(I, Definition, BB, Point,
+                                /*CreationMustSucceed=*/true);
+}
+
+MemoryAccess *MemorySSAUpdater::createMemoryAccessInBB(
+    Instruction *I, MemoryAccess *Definition, const BasicBlock *BB,
+    MemorySSA::InsertionPlace Point, bool CreationMustSucceed) {
+  MemoryUseOrDef *NewAccess = MSSA->createDefinedAccess(
+      I, Definition, /*Template=*/nullptr, CreationMustSucceed);
+  if (NewAccess)
+    MSSA->insertIntoListsForBlock(NewAccess, BB, Point);
   return NewAccess;
 }

diff --git a/llvm/lib/Transforms/Scalar/LICM.cpp b/llvm/lib/Transforms/Scalar/LICM.cpp
index 91ef2b4b7c1839..ca03eff7a4e25f 100644
--- a/llvm/lib/Transforms/Scalar/LICM.cpp
+++ b/llvm/lib/Transforms/Scalar/LICM.cpp
@@ -1464,8 +1464,11 @@ static Instruction *cloneInstructionInExitBlock(
   if (MSSAU.getMemorySSA()->getMemoryAccess(&I)) {
     // Create a new MemoryAccess and let MemorySSA set its defining access.
+    // After running some passes, MemorySSA might be outdated, and the
+    // instruction `I` may have become a non-memory touching instruction.
     MemoryAccess *NewMemAcc = MSSAU.createMemoryAccessInBB(
-        New, nullptr, New->getParent(), MemorySSA::Beginning);
+        New, nullptr, New->getParent(), MemorySSA::Beginning,
+        /*CreationMustSucceed=*/false);
     if (NewMemAcc) {
       if (auto *MemDef = dyn_cast<MemoryDef>(NewMemAcc))
         MSSAU.insertDef(MemDef, /*RenameUses=*/true);
diff --git a/llvm/test/Transforms/LICM/PR116813-memoryssa-outdated.ll b/llvm/test/Transforms/LICM/PR116813-memoryssa-outdated.ll
new file mode 100644
index 00..a040c3cc6947c6
--- /dev/null
+++ b/llvm/test/Transforms/LICM/PR116813-memoryssa-outdated.ll
@@ -0,0 +1,50 @@
+; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --version 5
+; RUN: opt -passes='loop-mssa(simple-loop-unswitch,licm)' -verify-memoryssa -S < %s | FileCheck %s
+
+; Check that running LICM after SimpleLoopUnswitch does not result in a crash.
+
+define i32 @foo(i1 %arg, ptr %arg1) {
+; CHECK-LABEL: define i32 @foo(
+; CHECK-SAME: i1 [[ARG:%.*]], ptr [[ARG1:%.*]]) {
+; CHECK-NEXT:  [[START:.*:]]
+; CHECK-NEXT:    [[ARG_FR:%.*]] = freeze i1 [[ARG]]
+; CHECK-NEXT:    br i1 [[ARG_FR]], label %[[START_SPLIT_US:.*]], label %[[START_SPLIT:.*]]
+; CHECK:       [[START_SPLIT_US]]:
+; CHECK-NEXT:    br label %[[LOOP_US:.*]]
+; CHECK:       [[LOOP_US]]:
+; CHECK-NEXT:    br label %[[BB0:.*]]
+; CHECK:       [[BB0]
[llvm-branch-commits] [llvm] release/19.x: [LICM] allow MemoryAccess creation failure (#116813) (PR #117082)
@@ -192,6 +192,12 @@ class MemorySSAUpdater {
                                        const BasicBlock *BB,
                                        MemorySSA::InsertionPlace Point);

+  MemoryAccess *createMemoryAccessInBB2(Instruction *I,
+                                        MemoryAccess *Definition,
+                                        const BasicBlock *BB,
+                                        MemorySSA::InsertionPlace Point,
+                                        bool CreationMustSucceed = true);

DianQK wrote:

Ah, yes! :3

https://github.com/llvm/llvm-project/pull/117082
[llvm-branch-commits] [llvm] release/19.x: [LICM] allow MemoryAccess creation failure (#116813) (PR #117082)
@@ -192,6 +192,12 @@ class MemorySSAUpdater {
                                        const BasicBlock *BB,
                                        MemorySSA::InsertionPlace Point);

+  MemoryAccess *createMemoryAccessInBB2(Instruction *I,
+                                        MemoryAccess *Definition,
+                                        const BasicBlock *BB,
+                                        MemorySSA::InsertionPlace Point,
+                                        bool CreationMustSucceed = true);

nikic wrote:

```suggestion
  MemoryAccess *createMemoryAccessInBB(Instruction *I, MemoryAccess *Definition,
                                       const BasicBlock *BB,
                                       MemorySSA::InsertionPlace Point,
                                       bool CreationMustSucceed);
```

This can be an overload with the extra parameter, no need to use a different name.

https://github.com/llvm/llvm-project/pull/117082
[llvm-branch-commits] [flang] [MLIR][OpenMP] Add Lowering support for OpenMP Declare Mapper directive (PR #117046)
@@ -21,7 +21,7 @@ subroutine declare_mapper_1
   type (my_type2):: t
   real :: x, y(nvals)
   !$omp declare mapper (my_type :: var) map (var, var%values (1:var%num_vals))
-!CHECK: not yet implemented: OpenMPDeclareMapperConstruct
+!CHECK: not yet implemented: lowering symbol to HLFIR

tblah wrote:

I'm surprised to see this TODO come up. Please could you fix this before merging so that we can maintain a helpful error message for the user.

https://github.com/llvm/llvm-project/pull/117046
[llvm-branch-commits] [flang] [MLIR][OpenMP] Add Lowering support for OpenMP Declare Mapper directive (PR #117046)
@@ -2701,7 +2702,39 @@ static void
 genOMP(lower::AbstractConverter &converter, lower::SymMap &symTable,
        semantics::SemanticsContext &semaCtx, lower::pft::Evaluation &eval,
        const parser::OpenMPDeclareMapperConstruct &declareMapperConstruct) {
-  TODO(converter.getCurrentLocation(), "OpenMPDeclareMapperConstruct");
+  fir::FirOpBuilder &firOpBuilder = converter.getFirOpBuilder();
+  lower::StatementContext stmtCtx;
+  const auto &spec =
+      std::get(declareMapperConstruct.t);
+  const auto &mapperName{std::get>(spec.t)};
+  const auto &varType{std::get(spec.t)};
+  const auto &varName{std::get(spec.t)};
+  std::stringstream mapperNameStr;
+  if (mapperName.has_value()) {
+    mapperNameStr << mapperName->ToString();
+  } else {
+    mapperNameStr << "default_"
+                  << varType.declTypeSpec->derivedTypeSpec().name().ToString();
+  }

tblah wrote:

Two nits. Feel free to ignore number 2.

1. Flang **lowering** follows the MLIR style guide, which in this case matches LLVM: https://llvm.org/docs/CodingStandards.html#don-t-use-braces-on-simple-single-statement-bodies-of-if-else-loop-statements
2. To me, a `std::stringstream` feels like overkill here. You could use a `std::string` with the concatenation in the else branch handled by an implicit `Twine` (https://llvm.org/docs/ProgrammersManual.html#llvm-adt-twine-h)

https://github.com/llvm/llvm-project/pull/117046
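Both nits can be illustrated together with a small sketch. This is not the Flang code: `mapperNameFor` is a hypothetical helper, the parser types are replaced by `std::optional<std::string>` and `std::string`, and plain `std::string` concatenation stands in for `llvm::Twine` so the example compiles without LLVM headers.

```cpp
#include <cassert>
#include <optional>
#include <string>

// Hypothetical helper (not part of Flang): returns the explicit mapper name
// if one was given, otherwise "default_" followed by the derived type name.
inline std::string
mapperNameFor(const std::optional<std::string> &MapperName,
              const std::string &DerivedTypeName) {
  // Single-statement bodies, so no braces (LLVM/MLIR coding style).
  if (MapperName)
    return *MapperName;
  return "default_" + DerivedTypeName;
}

// Small self-check of both branches.
inline void checkMapperNames() {
  assert(mapperNameFor(std::string("my_mapper"), "my_type") == "my_mapper");
  assert(mapperNameFor(std::nullopt, "my_type") == "default_my_type");
}
```

Compared with a `std::stringstream`, this builds the result in one expression per branch and needs no intermediate stream object.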
[llvm-branch-commits] [llvm] release/19.x: [SCEV] Fix sext handling for `getConstantMultiple` (#117093) (PR #117136)
llvmbot wrote:

@llvm/pr-subscribers-llvm-transforms

Author: None (llvmbot)

Changes

Backport 458dfbd855806461b4508bf8845cafe0411dbfd4

Requested by: @dtcxzyw

---
Full diff: https://github.com/llvm/llvm-project/pull/117136.diff

3 Files Affected:

- (modified) llvm/lib/Analysis/ScalarEvolution.cpp (+3-1)
- (added) llvm/test/Analysis/ScalarEvolution/pr116483.ll (+26)
- (added) llvm/test/Transforms/IndVarSimplify/pr116483.ll (+36)

```diff
diff --git a/llvm/lib/Analysis/ScalarEvolution.cpp b/llvm/lib/Analysis/ScalarEvolution.cpp
index 51cffac8087689..412cfe73d3e559 100644
--- a/llvm/lib/Analysis/ScalarEvolution.cpp
+++ b/llvm/lib/Analysis/ScalarEvolution.cpp
@@ -6313,8 +6313,10 @@ APInt ScalarEvolution::getConstantMultipleImpl(const SCEV *S) {
     return getConstantMultiple(Z->getOperand()).zext(BitWidth);
   }
   case scSignExtend: {
+    // Only multiples that are a power of 2 will hold after sext.
     const SCEVSignExtendExpr *E = cast<SCEVSignExtendExpr>(S);
-    return getConstantMultiple(E->getOperand()).sext(BitWidth);
+    uint32_t TZ = getMinTrailingZeros(E->getOperand());
+    return GetShiftedByZeros(TZ);
   }
   case scMulExpr: {
     const SCEVMulExpr *M = cast<SCEVMulExpr>(S);
diff --git a/llvm/test/Analysis/ScalarEvolution/pr116483.ll b/llvm/test/Analysis/ScalarEvolution/pr116483.ll
new file mode 100644
index 00..cc2334e9c64f92
--- /dev/null
+++ b/llvm/test/Analysis/ScalarEvolution/pr116483.ll
@@ -0,0 +1,26 @@
+; NOTE: Assertions have been autogenerated by utils/update_analyze_test_checks.py UTC_ARGS: --version 5
+; RUN: opt -S -disable-output "-passes=print<scalar-evolution>" < %s 2>&1 | FileCheck %s
+
+define i16 @test() {
+; CHECK-LABEL: 'test'
+; CHECK-NEXT:  Classifying expressions for: @test
+; CHECK-NEXT:    %xor = xor i32 0, 3
+; CHECK-NEXT:    --> %xor U: [3,4) S: [3,4)
+; CHECK-NEXT:    %mul = mul i32 %xor, 329
+; CHECK-NEXT:    --> (329 * %xor) U: [987,988) S: [987,988)
+; CHECK-NEXT:    %conv = trunc i32 %mul to i16
+; CHECK-NEXT:    --> (329 * (trunc i32 %xor to i16)) U: [987,988) S: [987,988)
+; CHECK-NEXT:    %sext = shl i16 %conv, 8
+; CHECK-NEXT:    --> (18688 * (trunc i32 %xor to i16)) U: [-9472,-9471) S: [-9472,-9471)
+; CHECK-NEXT:    %conv1 = ashr i16 %sext, 8
+; CHECK-NEXT:    --> (sext i8 (73 * (trunc i32 %xor to i8)) to i16) U: [-37,-36) S: [-37,-36)
+; CHECK-NEXT:  Determining loop execution counts for: @test
+;
+entry:
+  %xor = xor i32 0, 3
+  %mul = mul i32 %xor, 329
+  %conv = trunc i32 %mul to i16
+  %sext = shl i16 %conv, 8
+  %conv1 = ashr i16 %sext, 8
+  ret i16 %conv1
+}
diff --git a/llvm/test/Transforms/IndVarSimplify/pr116483.ll b/llvm/test/Transforms/IndVarSimplify/pr116483.ll
new file mode 100644
index 00..ae108a525223e0
--- /dev/null
+++ b/llvm/test/Transforms/IndVarSimplify/pr116483.ll
@@ -0,0 +1,36 @@
+; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --version 5
+; RUN: opt -S -passes=indvars < %s | FileCheck %s
+
+define i32 @test() {
+; CHECK-LABEL: define i32 @test() {
+; CHECK-NEXT:  [[ENTRY:.*:]]
+; CHECK-NEXT:    [[XOR:%.*]] = xor i32 0, 3
+; CHECK-NEXT:    [[MUL:%.*]] = mul i32 [[XOR]], 329
+; CHECK-NEXT:    [[CONV:%.*]] = trunc i32 [[MUL]] to i16
+; CHECK-NEXT:    [[SEXT:%.*]] = shl i16 [[CONV]], 8
+; CHECK-NEXT:    [[CONV1:%.*]] = ashr i16 [[SEXT]], 8
+; CHECK-NEXT:    br label %[[LOOP_BODY:.*]]
+; CHECK:       [[LOOP_BODY]]:
+; CHECK-NEXT:    br i1 true, label %[[EXIT:.*]], label %[[LOOP_BODY]]
+; CHECK:       [[EXIT]]:
+; CHECK-NEXT:    [[CONV3:%.*]] = zext i16 [[CONV1]] to i32
+; CHECK-NEXT:    ret i32 [[CONV3]]
+;
+entry:
+  %xor = xor i32 0, 3
+  %mul = mul i32 %xor, 329
+  %conv = trunc i32 %mul to i16
+  %sext = shl i16 %conv, 8
+  %conv1 = ashr i16 %sext, 8
+  %conv3 = zext i16 %conv1 to i32
+  br label %loop.body
+
+loop.body:
+  %indvar = phi i32 [ %indvar.inc, %loop.body ], [ 1, %entry ]
+  %indvar.inc = add nuw i32 %indvar, 1
+  %exitcond = icmp eq i32 %indvar, %conv3
+  br i1 %exitcond, label %exit, label %loop.body
+
+exit:
+  ret i32 %conv3
+}
```

https://github.com/llvm/llvm-project/pull/117136
[llvm-branch-commits] [llvm] release/19.x: [ConstraintElim] Bail out on non-dedicated exits when adding exiting conditions (#116627) (PR #117137)
llvmbot wrote:

@llvm/pr-subscribers-llvm-transforms

Author: None (llvmbot)

Changes

Backport 52361d0368b79841be12156bf03cf8c1851e5df7

Requested by: @antoniofrighetto

---
Full diff: https://github.com/llvm/llvm-project/pull/117137.diff

2 Files Affected:

- (modified) llvm/lib/Transforms/Scalar/ConstraintElimination.cpp (+8-5)
- (modified) llvm/test/Transforms/ConstraintElimination/induction-condition-in-loop-exit.ll (+44)

```diff
diff --git a/llvm/lib/Transforms/Scalar/ConstraintElimination.cpp b/llvm/lib/Transforms/Scalar/ConstraintElimination.cpp
index 37022104d0a9bd..d1c80aa6712433 100644
--- a/llvm/lib/Transforms/Scalar/ConstraintElimination.cpp
+++ b/llvm/lib/Transforms/Scalar/ConstraintElimination.cpp
@@ -1033,9 +1033,9 @@ void State::addInfoForInductions(BasicBlock &BB) {
         DTN, CmpInst::ICMP_SLT, PN, B,
         ConditionTy(CmpInst::ICMP_SLE, StartValue, B)));

-  // Try to add condition from header to the exit blocks. When exiting either
-  // with EQ or NE in the header, we know that the induction value must be u<=
-  // B, as other exits may only exit earlier.
+  // Try to add condition from header to the dedicated exit blocks. When exiting
+  // either with EQ or NE in the header, we know that the induction value must
+  // be u<= B, as other exits may only exit earlier.
   assert(!StepOffset.isNegative() && "induction must be increasing");
   assert((Pred == CmpInst::ICMP_EQ || Pred == CmpInst::ICMP_NE) &&
          "unsupported predicate");
@@ -1043,8 +1043,11 @@ void State::addInfoForInductions(BasicBlock &BB) {
   SmallVector ExitBBs;
   L->getExitBlocks(ExitBBs);
   for (BasicBlock *EB : ExitBBs) {
-    WorkList.emplace_back(FactOrCheck::getConditionFact(
-        DT.getNode(EB), CmpInst::ICMP_ULE, A, B, Precond));
+    // Bail out on non-dedicated exits.
+    if (DT.dominates(&BB, EB)) {
+      WorkList.emplace_back(FactOrCheck::getConditionFact(
+          DT.getNode(EB), CmpInst::ICMP_ULE, A, B, Precond));
+    }
   }
 }

diff --git a/llvm/test/Transforms/ConstraintElimination/induction-condition-in-loop-exit.ll b/llvm/test/Transforms/ConstraintElimination/induction-condition-in-loop-exit.ll
index 15e1d843726278..a04b06e1bf0a52 100644
--- a/llvm/test/Transforms/ConstraintElimination/induction-condition-in-loop-exit.ll
+++ b/llvm/test/Transforms/ConstraintElimination/induction-condition-in-loop-exit.ll
@@ -763,3 +763,47 @@ exit.2:
   %t.2 = icmp ult i32 %iv, %N
   ret i1 %t.2
 }
+
+define i1 @test_non_dedicated_exit(i16 %n) {
+; CHECK-LABEL: define i1 @test_non_dedicated_exit(
+; CHECK-SAME: i16 [[N:%.*]]) {
+; CHECK-NEXT:  [[ENTRY:.*:]]
+; CHECK-NEXT:    [[COND:%.*]] = icmp slt i16 [[N]], 1
+; CHECK-NEXT:    br i1 [[COND]], label %[[EXIT:.*]], label %[[LOOP_PREHEADER:.*]]
+; CHECK:       [[LOOP_PREHEADER]]:
+; CHECK-NEXT:    [[SUB:%.*]] = add nsw i16 [[N]], -1
+; CHECK-NEXT:    [[EXT:%.*]] = zext nneg i16 [[SUB]] to i32
+; CHECK-NEXT:    br label %[[LOOP:.*]]
+; CHECK:       [[LOOP]]:
+; CHECK-NEXT:    [[INDVAR:%.*]] = phi i32 [ [[INDVAR_INC:%.*]], %[[LOOP_LATCH:.*]] ], [ 0, %[[LOOP_PREHEADER]] ]
+; CHECK-NEXT:    [[EXITCOND:%.*]] = icmp eq i32 [[INDVAR]], [[EXT]]
+; CHECK-NEXT:    br i1 [[EXITCOND]], label %[[EXIT]], label %[[LOOP_LATCH]]
+; CHECK:       [[LOOP_LATCH]]:
+; CHECK-NEXT:    [[INDVAR_INC]] = add nuw nsw i32 [[INDVAR]], 1
+; CHECK-NEXT:    br label %[[LOOP]]
+; CHECK:       [[EXIT]]:
+; CHECK-NEXT:    [[CMP:%.*]] = icmp sgt i16 [[N]], 0
+; CHECK-NEXT:    ret i1 [[CMP]]
+;
+entry:
+  %cond = icmp slt i16 %n, 1
+  br i1 %cond, label %exit, label %loop.preheader
+
+loop.preheader:
+  %sub = add nsw i16 %n, -1
+  %ext = zext nneg i16 %sub to i32
+  br label %loop
+
+loop:
+  %indvar = phi i32 [ %indvar.inc, %loop.latch ], [ 0, %loop.preheader ]
+  %exitcond = icmp eq i32 %indvar, %ext
+  br i1 %exitcond, label %exit, label %loop.latch
+
+loop.latch:
+  %indvar.inc = add nuw nsw i32 %indvar, 1
+  br label %loop
+
+exit:
+  %cmp = icmp sgt i16 %n, 0
+  ret i1 %cmp
+}
```

https://github.com/llvm/llvm-project/pull/117137
[llvm-branch-commits] [llvm] release/19.x: [ConstraintElim] Bail out on non-dedicated exits when adding exiting conditions (#116627) (PR #117137)
https://github.com/llvmbot milestoned https://github.com/llvm/llvm-project/pull/117137
[llvm-branch-commits] [llvm] release/19.x: [ConstraintElim] Bail out on non-dedicated exits when adding exiting conditions (#116627) (PR #117137)
https://github.com/llvmbot created https://github.com/llvm/llvm-project/pull/117137

Backport 52361d0368b79841be12156bf03cf8c1851e5df7

Requested by: @antoniofrighetto

>From 4e3f5191928641fdf7298ee21fdf09ab0f17a53e Mon Sep 17 00:00:00 2001
From: Yingwei Zheng
Date: Mon, 18 Nov 2024 23:41:04 +0800
Subject: [PATCH] [ConstraintElim] Bail out on non-dedicated exits when adding exiting conditions (#116627)

This patch bails out non-dedicated exits to avoid adding exiting conditions to invalid context.

Closes https://github.com/llvm/llvm-project/issues/116553.

(cherry picked from commit 52361d0368b79841be12156bf03cf8c1851e5df7)
---
 .../Scalar/ConstraintElimination.cpp          | 13 +++---
 .../induction-condition-in-loop-exit.ll       | 44 +++
 2 files changed, 52 insertions(+), 5 deletions(-)

diff --git a/llvm/lib/Transforms/Scalar/ConstraintElimination.cpp b/llvm/lib/Transforms/Scalar/ConstraintElimination.cpp
index 37022104d0a9bd..d1c80aa6712433 100644
--- a/llvm/lib/Transforms/Scalar/ConstraintElimination.cpp
+++ b/llvm/lib/Transforms/Scalar/ConstraintElimination.cpp
@@ -1033,9 +1033,9 @@ void State::addInfoForInductions(BasicBlock &BB) {
         DTN, CmpInst::ICMP_SLT, PN, B,
         ConditionTy(CmpInst::ICMP_SLE, StartValue, B)));

-  // Try to add condition from header to the exit blocks. When exiting either
-  // with EQ or NE in the header, we know that the induction value must be u<=
-  // B, as other exits may only exit earlier.
+  // Try to add condition from header to the dedicated exit blocks. When exiting
+  // either with EQ or NE in the header, we know that the induction value must
+  // be u<= B, as other exits may only exit earlier.
   assert(!StepOffset.isNegative() && "induction must be increasing");
   assert((Pred == CmpInst::ICMP_EQ || Pred == CmpInst::ICMP_NE) &&
          "unsupported predicate");
@@ -1043,8 +1043,11 @@ void State::addInfoForInductions(BasicBlock &BB) {
   SmallVector ExitBBs;
   L->getExitBlocks(ExitBBs);
   for (BasicBlock *EB : ExitBBs) {
-    WorkList.emplace_back(FactOrCheck::getConditionFact(
-        DT.getNode(EB), CmpInst::ICMP_ULE, A, B, Precond));
+    // Bail out on non-dedicated exits.
+    if (DT.dominates(&BB, EB)) {
+      WorkList.emplace_back(FactOrCheck::getConditionFact(
+          DT.getNode(EB), CmpInst::ICMP_ULE, A, B, Precond));
+    }
   }
 }

diff --git a/llvm/test/Transforms/ConstraintElimination/induction-condition-in-loop-exit.ll b/llvm/test/Transforms/ConstraintElimination/induction-condition-in-loop-exit.ll
index 15e1d843726278..a04b06e1bf0a52 100644
--- a/llvm/test/Transforms/ConstraintElimination/induction-condition-in-loop-exit.ll
+++ b/llvm/test/Transforms/ConstraintElimination/induction-condition-in-loop-exit.ll
@@ -763,3 +763,47 @@ exit.2:
   %t.2 = icmp ult i32 %iv, %N
   ret i1 %t.2
 }
+
+define i1 @test_non_dedicated_exit(i16 %n) {
+; CHECK-LABEL: define i1 @test_non_dedicated_exit(
+; CHECK-SAME: i16 [[N:%.*]]) {
+; CHECK-NEXT:  [[ENTRY:.*:]]
+; CHECK-NEXT:    [[COND:%.*]] = icmp slt i16 [[N]], 1
+; CHECK-NEXT:    br i1 [[COND]], label %[[EXIT:.*]], label %[[LOOP_PREHEADER:.*]]
+; CHECK:       [[LOOP_PREHEADER]]:
+; CHECK-NEXT:    [[SUB:%.*]] = add nsw i16 [[N]], -1
+; CHECK-NEXT:    [[EXT:%.*]] = zext nneg i16 [[SUB]] to i32
+; CHECK-NEXT:    br label %[[LOOP:.*]]
+; CHECK:       [[LOOP]]:
+; CHECK-NEXT:    [[INDVAR:%.*]] = phi i32 [ [[INDVAR_INC:%.*]], %[[LOOP_LATCH:.*]] ], [ 0, %[[LOOP_PREHEADER]] ]
+; CHECK-NEXT:    [[EXITCOND:%.*]] = icmp eq i32 [[INDVAR]], [[EXT]]
+; CHECK-NEXT:    br i1 [[EXITCOND]], label %[[EXIT]], label %[[LOOP_LATCH]]
+; CHECK:       [[LOOP_LATCH]]:
+; CHECK-NEXT:    [[INDVAR_INC]] = add nuw nsw i32 [[INDVAR]], 1
+; CHECK-NEXT:    br label %[[LOOP]]
+; CHECK:       [[EXIT]]:
+; CHECK-NEXT:    [[CMP:%.*]] = icmp sgt i16 [[N]], 0
+; CHECK-NEXT:    ret i1 [[CMP]]
+;
+entry:
+  %cond = icmp slt i16 %n, 1
+  br i1 %cond, label %exit, label %loop.preheader
+
+loop.preheader:
+  %sub = add nsw i16 %n, -1
+  %ext = zext nneg i16 %sub to i32
+  br label %loop
+
+loop:
+  %indvar = phi i32 [ %indvar.inc, %loop.latch ], [ 0, %loop.preheader ]
+  %exitcond = icmp eq i32 %indvar, %ext
+  br i1 %exitcond, label %exit, label %loop.latch
+
+loop.latch:
+  %indvar.inc = add nuw nsw i32 %indvar, 1
+  br label %loop
+
+exit:
+  %cmp = icmp sgt i16 %n, 0
+  ret i1 %cmp
+}
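The effect of the new `DT.dominates(&BB, EB)` guard can be checked by hand on the CFG of `@test_non_dedicated_exit`. The sketch below is only an illustration under stated assumptions: it does not use LLVM's `DominatorTree`, the block numbering and the helper names (`predecessors`, `computeDominators`, `dominates`) are hypothetical, and dominators are computed with the textbook iterative set algorithm. It shows that the loop header `loop` does not dominate `exit`, because `entry` branches to `exit` directly, so a fact derived in the header must not be added for that exit block.

```cpp
#include <bitset>
#include <cassert>
#include <vector>

// Block numbering for the CFG of @test_non_dedicated_exit. The numbering is
// an assumption made for this sketch; LLVM does not number blocks this way.
enum Block { Entry, Preheader, Loop, Latch, Exit, NumBlocks };

using BlockSet = std::bitset<NumBlocks>;

// Predecessor lists read off the branch instructions in the test IR.
inline std::vector<std::vector<int>> predecessors() {
  std::vector<std::vector<int>> Preds(NumBlocks);
  Preds[Preheader] = {Entry};
  Preds[Loop] = {Preheader, Latch};
  Preds[Latch] = {Loop};
  Preds[Exit] = {Entry, Loop}; // entry reaches exit without entering the loop
  return Preds;
}

// Textbook iterative dominator computation:
//   Dom(entry) = {entry}
//   Dom(b)     = {b} united with the intersection of Dom(p) over all preds p
inline std::vector<BlockSet> computeDominators() {
  const auto Preds = predecessors();
  std::vector<BlockSet> Dom(NumBlocks, BlockSet().set()); // start from "all"
  Dom[Entry].reset();
  Dom[Entry].set(Entry);
  for (bool Changed = true; Changed;) {
    Changed = false;
    for (int B = 0; B < NumBlocks; ++B) {
      if (B == Entry)
        continue;
      BlockSet New = BlockSet().set();
      for (int P : Preds[B])
        New &= Dom[P];
      New.set(B);
      if (New != Dom[B]) {
        Dom[B] = New;
        Changed = true;
      }
    }
  }
  return Dom;
}

// True if block A dominates block B.
inline bool dominates(Block A, Block B) {
  return computeDominators()[B].test(A);
}

// Self-check: the header dominates the latch but not the shared exit block.
inline void checkNonDedicatedExit() {
  assert(dominates(Loop, Latch));
  assert(!dominates(Loop, Exit));
}
```

With this CFG, `dominates(Loop, Exit)` is false, which corresponds to the case the patch skips.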
[llvm-branch-commits] [llvm] release/19.x: [ConstraintElim] Bail out on non-dedicated exits when adding exiting conditions (#116627) (PR #117137)
llvmbot wrote:

@fhahn What do you think about merging this PR to the release branch?

https://github.com/llvm/llvm-project/pull/117137
[llvm-branch-commits] [llvm] [LoongArch] Fix GOT usage for `non-dso_local` function calls in large code model (PR #117134)
https://github.com/wangleiat milestoned https://github.com/llvm/llvm-project/pull/117134
[llvm-branch-commits] [llvm] release/19.x: [SCEV] Fix sext handling for `getConstantMultiple` (#117093) (PR #117136)
https://github.com/llvmbot created https://github.com/llvm/llvm-project/pull/117136

Backport 458dfbd855806461b4508bf8845cafe0411dbfd4

Requested by: @dtcxzyw

>From f6c67ad7a20fe7bb535242c78b8f06cacc48d521 Mon Sep 17 00:00:00 2001
From: Yingwei Zheng
Date: Thu, 21 Nov 2024 17:23:04 +0800
Subject: [PATCH] [SCEV] Fix sext handling for `getConstantMultiple` (#117093)

Counterexample: 219 is a multiple of 73. But `sext i8 219 to i16 = 65499` is not.

Fixes https://github.com/llvm/llvm-project/issues/116483.

(cherry picked from commit 458dfbd855806461b4508bf8845cafe0411dbfd4)
---
 llvm/lib/Analysis/ScalarEvolution.cpp         |  4 ++-
 .../test/Analysis/ScalarEvolution/pr116483.ll | 26 ++
 .../Transforms/IndVarSimplify/pr116483.ll     | 36 +++
 3 files changed, 65 insertions(+), 1 deletion(-)
 create mode 100644 llvm/test/Analysis/ScalarEvolution/pr116483.ll
 create mode 100644 llvm/test/Transforms/IndVarSimplify/pr116483.ll

diff --git a/llvm/lib/Analysis/ScalarEvolution.cpp b/llvm/lib/Analysis/ScalarEvolution.cpp
index 51cffac8087689..412cfe73d3e559 100644
--- a/llvm/lib/Analysis/ScalarEvolution.cpp
+++ b/llvm/lib/Analysis/ScalarEvolution.cpp
@@ -6313,8 +6313,10 @@ APInt ScalarEvolution::getConstantMultipleImpl(const SCEV *S) {
     return getConstantMultiple(Z->getOperand()).zext(BitWidth);
   }
   case scSignExtend: {
+    // Only multiples that are a power of 2 will hold after sext.
     const SCEVSignExtendExpr *E = cast<SCEVSignExtendExpr>(S);
-    return getConstantMultiple(E->getOperand()).sext(BitWidth);
+    uint32_t TZ = getMinTrailingZeros(E->getOperand());
+    return GetShiftedByZeros(TZ);
   }
   case scMulExpr: {
     const SCEVMulExpr *M = cast<SCEVMulExpr>(S);
diff --git a/llvm/test/Analysis/ScalarEvolution/pr116483.ll b/llvm/test/Analysis/ScalarEvolution/pr116483.ll
new file mode 100644
index 00..cc2334e9c64f92
--- /dev/null
+++ b/llvm/test/Analysis/ScalarEvolution/pr116483.ll
@@ -0,0 +1,26 @@
+; NOTE: Assertions have been autogenerated by utils/update_analyze_test_checks.py UTC_ARGS: --version 5
+; RUN: opt -S -disable-output "-passes=print<scalar-evolution>" < %s 2>&1 | FileCheck %s
+
+define i16 @test() {
+; CHECK-LABEL: 'test'
+; CHECK-NEXT:  Classifying expressions for: @test
+; CHECK-NEXT:    %xor = xor i32 0, 3
+; CHECK-NEXT:    --> %xor U: [3,4) S: [3,4)
+; CHECK-NEXT:    %mul = mul i32 %xor, 329
+; CHECK-NEXT:    --> (329 * %xor) U: [987,988) S: [987,988)
+; CHECK-NEXT:    %conv = trunc i32 %mul to i16
+; CHECK-NEXT:    --> (329 * (trunc i32 %xor to i16)) U: [987,988) S: [987,988)
+; CHECK-NEXT:    %sext = shl i16 %conv, 8
+; CHECK-NEXT:    --> (18688 * (trunc i32 %xor to i16)) U: [-9472,-9471) S: [-9472,-9471)
+; CHECK-NEXT:    %conv1 = ashr i16 %sext, 8
+; CHECK-NEXT:    --> (sext i8 (73 * (trunc i32 %xor to i8)) to i16) U: [-37,-36) S: [-37,-36)
+; CHECK-NEXT:  Determining loop execution counts for: @test
+;
+entry:
+  %xor = xor i32 0, 3
+  %mul = mul i32 %xor, 329
+  %conv = trunc i32 %mul to i16
+  %sext = shl i16 %conv, 8
+  %conv1 = ashr i16 %sext, 8
+  ret i16 %conv1
+}
diff --git a/llvm/test/Transforms/IndVarSimplify/pr116483.ll b/llvm/test/Transforms/IndVarSimplify/pr116483.ll
new file mode 100644
index 00..ae108a525223e0
--- /dev/null
+++ b/llvm/test/Transforms/IndVarSimplify/pr116483.ll
@@ -0,0 +1,36 @@
+; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --version 5
+; RUN: opt -S -passes=indvars < %s | FileCheck %s
+
+define i32 @test() {
+; CHECK-LABEL: define i32 @test() {
+; CHECK-NEXT:  [[ENTRY:.*:]]
+; CHECK-NEXT:    [[XOR:%.*]] = xor i32 0, 3
+; CHECK-NEXT:    [[MUL:%.*]] = mul i32 [[XOR]], 329
+; CHECK-NEXT:    [[CONV:%.*]] = trunc i32 [[MUL]] to i16
+; CHECK-NEXT:    [[SEXT:%.*]] = shl i16 [[CONV]], 8
+; CHECK-NEXT:    [[CONV1:%.*]] = ashr i16 [[SEXT]], 8
+; CHECK-NEXT:    br label %[[LOOP_BODY:.*]]
+; CHECK:       [[LOOP_BODY]]:
+; CHECK-NEXT:    br i1 true, label %[[EXIT:.*]], label %[[LOOP_BODY]]
+; CHECK:       [[EXIT]]:
+; CHECK-NEXT:    [[CONV3:%.*]] = zext i16 [[CONV1]] to i32
+; CHECK-NEXT:    ret i32 [[CONV3]]
+;
+entry:
+  %xor = xor i32 0, 3
+  %mul = mul i32 %xor, 329
+  %conv = trunc i32 %mul to i16
+  %sext = shl i16 %conv, 8
+  %conv1 = ashr i16 %sext, 8
+  %conv3 = zext i16 %conv1 to i32
+  br label %loop.body
+
+loop.body:
+  %indvar = phi i32 [ %indvar.inc, %loop.body ], [ 1, %entry ]
+  %indvar.inc = add nuw i32 %indvar, 1
+  %exitcond = icmp eq i32 %indvar, %conv3
+  br i1 %exitcond, label %exit, label %loop.body
+
+exit:
+  ret i32 %conv3
+}
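The counterexample in the commit message can be reproduced with plain integer arithmetic. The sketch below is an illustration only and does not use `APInt` or SCEV; the helper names (`sext8to16`, `isMultipleOf`, `powerOfTwoFactor`) are hypothetical. It checks that 219 is a multiple of 73 while its sign extension from 8 to 16 bits is not, and that the power-of-two factor (the trailing zero bits, which is what the fix keeps) still divides the sign-extended value for every nonzero 8-bit pattern.

```cpp
#include <cassert>
#include <cstdint>

// Sign-extend an 8-bit pattern to 16 bits, returning the unsigned bit pattern.
constexpr uint16_t sext8to16(uint8_t V) {
  return (V & 0x80u) ? static_cast<uint16_t>(0xFF00u | V) : V;
}

// True if the unsigned value V is a multiple of the nonzero value M.
constexpr bool isMultipleOf(uint32_t V, uint32_t M) { return V % M == 0; }

// Largest power of two dividing the nonzero value V (isolates the lowest set bit).
constexpr uint32_t powerOfTwoFactor(uint32_t V) { return V & (0u - V); }

// Checks that the power-of-two factor survives sext for all nonzero i8 values.
inline bool powerOfTwoFactorSurvivesSext() {
  for (uint32_t V = 1; V < 256; ++V)
    if (!isMultipleOf(sext8to16(static_cast<uint8_t>(V)), powerOfTwoFactor(V)))
      return false;
  return true;
}

// The counterexample from the commit message, as runtime checks.
inline void checkCounterexample() {
  assert(isMultipleOf(219, 73));             // 219 = 3 * 73
  assert(sext8to16(219) == 65499);           // 0xDB sign-extends to 0xFFDB
  assert(!isMultipleOf(sext8to16(219), 73)); // 65499 = 897 * 73 + 18
  assert(powerOfTwoFactorSurvivesSext());
}
```

Sign extension only changes the high bits, so the low trailing zeros, and with them divisibility by a power of two, are preserved; an odd factor such as 73 is not.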
[llvm-branch-commits] [llvm] release/19.x: [SCEV] Fix sext handling for `getConstantMultiple` (#117093) (PR #117136)
llvmbot wrote:

@antoniofrighetto What do you think about merging this PR to the release branch?

https://github.com/llvm/llvm-project/pull/117136
[llvm-branch-commits] [llvm] release/19.x: [SCEV] Fix sext handling for `getConstantMultiple` (#117093) (PR #117136)
https://github.com/llvmbot milestoned https://github.com/llvm/llvm-project/pull/117136
[llvm-branch-commits] [llvm] release/19.x: [SCEV] Fix sext handling for `getConstantMultiple` (#117093) (PR #117136)
llvmbot wrote:

@llvm/pr-subscribers-llvm-analysis

Author: None (llvmbot)

Changes

Backport 458dfbd855806461b4508bf8845cafe0411dbfd4

Requested by: @dtcxzyw

---
Full diff: https://github.com/llvm/llvm-project/pull/117136.diff

3 Files Affected:

- (modified) llvm/lib/Analysis/ScalarEvolution.cpp (+3-1)
- (added) llvm/test/Analysis/ScalarEvolution/pr116483.ll (+26)
- (added) llvm/test/Transforms/IndVarSimplify/pr116483.ll (+36)

https://github.com/llvm/llvm-project/pull/117136
[llvm-branch-commits] [llvm] release/19.x: [SCEV] Fix sext handling for `getConstantMultiple` (#117093) (PR #117136)
https://github.com/nikic approved this pull request. https://github.com/llvm/llvm-project/pull/117136 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] release/19.x: [LICM] allow MemoryAccess creation failure (#116813) (PR #117082)
@@ -190,7 +190,8 @@ class MemorySSAUpdater { /// inaccessible and it *must* have removeMemoryAccess called on it. MemoryAccess *createMemoryAccessInBB(Instruction *I, MemoryAccess *Definition, const BasicBlock *BB, - MemorySSA::InsertionPlace Point); + MemorySSA::InsertionPlace Point, + bool CreationMustSucceed = true); nikic wrote: This is an ABI-breaking change. Instead of an optional argument, you need to add two functions and forward one to the other. https://github.com/llvm/llvm-project/pull/117082 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
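The reason the default argument breaks the ABI: it changes the member function's signature, so the mangled symbol that already-built clients of the release branch link against would no longer exist. A minimal sketch of the suggested shape follows, assuming hypothetical stand-in types (`Access`, `Updater`, `createAccess`) rather than the real MemorySSA classes; the toy rule that a negative id means "does not touch memory" is an assumption made purely for illustration.

```cpp
#include <cassert>
#include <memory>
#include <vector>

// Hypothetical stand-in for a memory access node; not the LLVM class.
struct Access {
  int Id;
};

// Hypothetical stand-in for the updater; not the LLVM class.
class Updater {
  std::vector<std::unique_ptr<Access>> Accesses;

public:
  // Existing entry point. Its parameter list is untouched, so the symbol
  // that old binaries reference stays the same. It only forwards.
  Access *createAccess(int Id) {
    return createAccess(Id, /*CreationMustSucceed=*/true);
  }

  // New entry point carrying the extra flag. It is a separate overload
  // (a separate symbol), added next to the old one instead of replacing it.
  Access *createAccess(int Id, bool CreationMustSucceed) {
    bool TouchesMemory = Id >= 0; // toy rule, assumption for this sketch
    if (!TouchesMemory) {
      assert(!CreationMustSucceed && "creation was required to succeed");
      return nullptr; // the caller opted in to handling failure
    }
    Accesses.push_back(std::make_unique<Access>(Access{Id}));
    return Accesses.back().get();
  }
};
```

A caller that can tolerate failure would pass `false` and check the result for null before using it, which mirrors what the LICM call site in this PR does with `NewMemAcc`.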
[llvm-branch-commits] [lld] [llvm] release/19.x: [MC][LoongArch] Change default cpu in `MCSubtargetInfo`. (#114922) (PR #117105)
heiher wrote: > Some tests need to be fixed. > > ``` > Failed Tests (3): > LLVM :: CodeGen/LoongArch/e_flags.ll > lld :: ELF/emulation-loongarch.s > lld :: ELF/loongarch-interlink.test > ``` Fixed. https://github.com/llvm/llvm-project/pull/117105 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [LoongArch] Fix GOT usage for `non-dso_local` function calls in large code model (PR #117134)
https://github.com/wangleiat created https://github.com/llvm/llvm-project/pull/117134 This commit fixes an issue in the large code model where non-dso_local function calls did not use the GOT as expected in PIC mode. Instead, direct PC-relative access was incorrectly applied, leading to linker errors when building shared libraries. For `ExternalSymbol`, it is not possible to determine whether it is dso_local during pseudo-instruction expansion. We use target flags to differentiate whether GOT should be used. Cherry-picked from #117099 to fix linker errors when building shared libraries with the large code model. >From 9616c8b70c9c272af93191624129dbf1f8992e41 Mon Sep 17 00:00:00 2001 From: wanglei Date: Thu, 21 Nov 2024 09:31:12 +0800 Subject: [PATCH] [LoongArch] Fix GOT usage for `non-dso_local` function calls in large code model This commit fixes an issue in the large code model where non-dso_local function calls did not use the GOT as expected in PIC mode. Instead, direct PC-relative access was incorrectly applied, leading to linker errors when building shared libraries. For `ExternalSymbol`, it is not possible to determine whether it is dso_local during pseudo-instruction expansion. We use target flags to differentiate whether GOT should be used. Cherry-picked from #117099 to fix linker errors when building shared libraries with the large code model. 
--- .../LoongArch/LoongArchExpandPseudoInsts.cpp | 2 +- llvm/test/CodeGen/LoongArch/code-models.ll| 10 ++--- .../LoongArch/machinelicm-address-pseudos.ll | 20 +- .../LoongArch/psabi-restricted-scheduling.ll | 40 +-- llvm/test/CodeGen/LoongArch/tls-models.ll | 20 +- 5 files changed, 46 insertions(+), 46 deletions(-) diff --git a/llvm/lib/Target/LoongArch/LoongArchExpandPseudoInsts.cpp b/llvm/lib/Target/LoongArch/LoongArchExpandPseudoInsts.cpp index c136f5b3e515d7..e680dda7374d07 100644 --- a/llvm/lib/Target/LoongArch/LoongArchExpandPseudoInsts.cpp +++ b/llvm/lib/Target/LoongArch/LoongArchExpandPseudoInsts.cpp @@ -721,7 +721,7 @@ bool LoongArchExpandPseudo::expandFunctionCALL( IsTailCall ? LoongArch::PseudoJIRL_TAIL : LoongArch::PseudoJIRL_CALL; Register AddrReg = IsTailCall ? LoongArch::R19 : LoongArch::R1; -bool UseGOT = Func.isGlobal() && !Func.getGlobal()->isDSOLocal(); +bool UseGOT = Func.getTargetFlags() == LoongArchII::MO_CALL_PLT; unsigned MO = UseGOT ? LoongArchII::MO_GOT_PC_HI : LoongArchII::MO_PCREL_LO; unsigned LAOpcode = UseGOT ? 
LoongArch::LDX_D : LoongArch::ADD_D; expandLargeAddressLoad(MBB, MBBI, NextMBBI, LAOpcode, MO, Func, AddrReg, diff --git a/llvm/test/CodeGen/LoongArch/code-models.ll b/llvm/test/CodeGen/LoongArch/code-models.ll index 4b2b72afaee171..4eb1e5e596fd3f 100644 --- a/llvm/test/CodeGen/LoongArch/code-models.ll +++ b/llvm/test/CodeGen/LoongArch/code-models.ll @@ -82,11 +82,11 @@ define void @call_external_sym(ptr %dst) { ; LARGE-NEXT:.cfi_offset 1, -8 ; LARGE-NEXT:ori $a2, $zero, 1000 ; LARGE-NEXT:move $a1, $zero -; LARGE-NEXT:pcalau12i $ra, %pc_hi20(memset) -; LARGE-NEXT:addi.d $t8, $zero, %pc_lo12(memset) -; LARGE-NEXT:lu32i.d $t8, %pc64_lo20(memset) -; LARGE-NEXT:lu52i.d $t8, $t8, %pc64_hi12(memset) -; LARGE-NEXT:add.d $ra, $t8, $ra +; LARGE-NEXT:pcalau12i $ra, %got_pc_hi20(memset) +; LARGE-NEXT:addi.d $t8, $zero, %got_pc_lo12(memset) +; LARGE-NEXT:lu32i.d $t8, %got64_pc_lo20(memset) +; LARGE-NEXT:lu52i.d $t8, $t8, %got64_pc_hi12(memset) +; LARGE-NEXT:ldx.d $ra, $t8, $ra ; LARGE-NEXT:jirl $ra, $ra, 0 ; LARGE-NEXT:ld.d $ra, $sp, 8 # 8-byte Folded Reload ; LARGE-NEXT:addi.d $sp, $sp, 16 diff --git a/llvm/test/CodeGen/LoongArch/machinelicm-address-pseudos.ll b/llvm/test/CodeGen/LoongArch/machinelicm-address-pseudos.ll index ed1a24e82b4e46..29348fe0d641ed 100644 --- a/llvm/test/CodeGen/LoongArch/machinelicm-address-pseudos.ll +++ b/llvm/test/CodeGen/LoongArch/machinelicm-address-pseudos.ll @@ -282,11 +282,11 @@ define void @test_la_tls_ld(i32 signext %n) { ; LA64LARGE-NEXT: .LBB3_1: # %loop ; LA64LARGE-NEXT:# =>This Inner Loop Header: Depth=1 ; LA64LARGE-NEXT:move $a0, $s0 -; LA64LARGE-NEXT:pcalau12i $ra, %pc_hi20(__tls_get_addr) -; LA64LARGE-NEXT:addi.d $t8, $zero, %pc_lo12(__tls_get_addr) -; LA64LARGE-NEXT:lu32i.d $t8, %pc64_lo20(__tls_get_addr) -; LA64LARGE-NEXT:lu52i.d $t8, $t8, %pc64_hi12(__tls_get_addr) -; LA64LARGE-NEXT:add.d $ra, $t8, $ra +; LA64LARGE-NEXT:pcalau12i $ra, %got_pc_hi20(__tls_get_addr) +; LA64LARGE-NEXT:addi.d $t8, $zero, %got_pc_lo12(__tls_get_addr) +; 
LA64LARGE-NEXT:lu32i.d $t8, %got64_pc_lo20(__tls_get_addr) +; LA64LARGE-NEXT:lu52i.d $t8, $t8, %got64_pc_hi12(__tls_get_addr) +; LA64LARGE-NEXT:ldx.d $ra, $t8, $ra ; LA64LARGE-NEXT:jirl $ra, $ra, 0 ; LA64LARGE-NEXT:ld.w $zero, $a0, 0 ; LA64LARGE-NEXT:addi.w $s1, $s1, 1 @@ -448,11 +448,11 @@ define void @test_la_t
[llvm-branch-commits] [llvm] [LoongArch] Fix GOT usage for `non-dso_local` function calls in large code model (PR #117134)
llvmbot wrote: @llvm/pr-subscribers-backend-loongarch Author: wanglei (wangleiat) Changes This commit fixes an issue in the large code model where non-dso_local function calls did not use the GOT as expected in PIC mode. Instead, direct PC-relative access was incorrectly applied, leading to linker errors when building shared libraries. For `ExternalSymbol`, it is not possible to determine whether it is dso_local during pseudo-instruction expansion. We use target flags to differentiate whether GOT should be used. Cherry-picked from #117099 to fix linker errors when building shared libraries with the large code model. --- Full diff: https://github.com/llvm/llvm-project/pull/117134.diff 5 Files Affected: - (modified) llvm/lib/Target/LoongArch/LoongArchExpandPseudoInsts.cpp (+1-1) - (modified) llvm/test/CodeGen/LoongArch/code-models.ll (+5-5) - (modified) llvm/test/CodeGen/LoongArch/machinelicm-address-pseudos.ll (+10-10) - (modified) llvm/test/CodeGen/LoongArch/psabi-restricted-scheduling.ll (+20-20) - (modified) llvm/test/CodeGen/LoongArch/tls-models.ll (+10-10) ``diff diff --git a/llvm/lib/Target/LoongArch/LoongArchExpandPseudoInsts.cpp b/llvm/lib/Target/LoongArch/LoongArchExpandPseudoInsts.cpp index c136f5b3e515d7..e680dda7374d07 100644 --- a/llvm/lib/Target/LoongArch/LoongArchExpandPseudoInsts.cpp +++ b/llvm/lib/Target/LoongArch/LoongArchExpandPseudoInsts.cpp @@ -721,7 +721,7 @@ bool LoongArchExpandPseudo::expandFunctionCALL( IsTailCall ? LoongArch::PseudoJIRL_TAIL : LoongArch::PseudoJIRL_CALL; Register AddrReg = IsTailCall ? LoongArch::R19 : LoongArch::R1; -bool UseGOT = Func.isGlobal() && !Func.getGlobal()->isDSOLocal(); +bool UseGOT = Func.getTargetFlags() == LoongArchII::MO_CALL_PLT; unsigned MO = UseGOT ? LoongArchII::MO_GOT_PC_HI : LoongArchII::MO_PCREL_LO; unsigned LAOpcode = UseGOT ? 
LoongArch::LDX_D : LoongArch::ADD_D; expandLargeAddressLoad(MBB, MBBI, NextMBBI, LAOpcode, MO, Func, AddrReg, diff --git a/llvm/test/CodeGen/LoongArch/code-models.ll b/llvm/test/CodeGen/LoongArch/code-models.ll index 4b2b72afaee171..4eb1e5e596fd3f 100644 --- a/llvm/test/CodeGen/LoongArch/code-models.ll +++ b/llvm/test/CodeGen/LoongArch/code-models.ll @@ -82,11 +82,11 @@ define void @call_external_sym(ptr %dst) { ; LARGE-NEXT:.cfi_offset 1, -8 ; LARGE-NEXT:ori $a2, $zero, 1000 ; LARGE-NEXT:move $a1, $zero -; LARGE-NEXT:pcalau12i $ra, %pc_hi20(memset) -; LARGE-NEXT:addi.d $t8, $zero, %pc_lo12(memset) -; LARGE-NEXT:lu32i.d $t8, %pc64_lo20(memset) -; LARGE-NEXT:lu52i.d $t8, $t8, %pc64_hi12(memset) -; LARGE-NEXT:add.d $ra, $t8, $ra +; LARGE-NEXT:pcalau12i $ra, %got_pc_hi20(memset) +; LARGE-NEXT:addi.d $t8, $zero, %got_pc_lo12(memset) +; LARGE-NEXT:lu32i.d $t8, %got64_pc_lo20(memset) +; LARGE-NEXT:lu52i.d $t8, $t8, %got64_pc_hi12(memset) +; LARGE-NEXT:ldx.d $ra, $t8, $ra ; LARGE-NEXT:jirl $ra, $ra, 0 ; LARGE-NEXT:ld.d $ra, $sp, 8 # 8-byte Folded Reload ; LARGE-NEXT:addi.d $sp, $sp, 16 diff --git a/llvm/test/CodeGen/LoongArch/machinelicm-address-pseudos.ll b/llvm/test/CodeGen/LoongArch/machinelicm-address-pseudos.ll index ed1a24e82b4e46..29348fe0d641ed 100644 --- a/llvm/test/CodeGen/LoongArch/machinelicm-address-pseudos.ll +++ b/llvm/test/CodeGen/LoongArch/machinelicm-address-pseudos.ll @@ -282,11 +282,11 @@ define void @test_la_tls_ld(i32 signext %n) { ; LA64LARGE-NEXT: .LBB3_1: # %loop ; LA64LARGE-NEXT:# =>This Inner Loop Header: Depth=1 ; LA64LARGE-NEXT:move $a0, $s0 -; LA64LARGE-NEXT:pcalau12i $ra, %pc_hi20(__tls_get_addr) -; LA64LARGE-NEXT:addi.d $t8, $zero, %pc_lo12(__tls_get_addr) -; LA64LARGE-NEXT:lu32i.d $t8, %pc64_lo20(__tls_get_addr) -; LA64LARGE-NEXT:lu52i.d $t8, $t8, %pc64_hi12(__tls_get_addr) -; LA64LARGE-NEXT:add.d $ra, $t8, $ra +; LA64LARGE-NEXT:pcalau12i $ra, %got_pc_hi20(__tls_get_addr) +; LA64LARGE-NEXT:addi.d $t8, $zero, %got_pc_lo12(__tls_get_addr) +; 
LA64LARGE-NEXT:lu32i.d $t8, %got64_pc_lo20(__tls_get_addr) +; LA64LARGE-NEXT:lu52i.d $t8, $t8, %got64_pc_hi12(__tls_get_addr) +; LA64LARGE-NEXT:ldx.d $ra, $t8, $ra ; LA64LARGE-NEXT:jirl $ra, $ra, 0 ; LA64LARGE-NEXT:ld.w $zero, $a0, 0 ; LA64LARGE-NEXT:addi.w $s1, $s1, 1 @@ -448,11 +448,11 @@ define void @test_la_tls_gd(i32 signext %n) nounwind { ; LA64LARGE-NEXT: .LBB5_1: # %loop ; LA64LARGE-NEXT:# =>This Inner Loop Header: Depth=1 ; LA64LARGE-NEXT:move $a0, $s0 -; LA64LARGE-NEXT:pcalau12i $ra, %pc_hi20(__tls_get_addr) -; LA64LARGE-NEXT:addi.d $t8, $zero, %pc_lo12(__tls_get_addr) -; LA64LARGE-NEXT:lu32i.d $t8, %pc64_lo20(__tls_get_addr) -; LA64LARGE-NEXT:lu52i.d $t8, $t8, %pc64_hi12(__tls_get_addr) -; LA64LARGE-NEXT:add.d $ra, $t8, $ra +; LA64LARGE-NEXT:pcalau12i $ra, %got_pc_hi20(__tls_get_addr) +; LA64LARGE-NEXT:addi.d $t8, $zero, %got_pc_lo12(__tls_get_addr) +; LA64LARGE-NEXT:lu32i.d $t8, %got64_
[llvm-branch-commits] [llvm] [LoongArch] Fix GOT usage for `non-dso_local` function calls in large code model (PR #117134)
https://github.com/wangleiat edited https://github.com/llvm/llvm-project/pull/117134 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
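To make the one-line change in this PR easier to follow, here is a rough sketch contrasting the old and new `UseGOT` predicates. The types and names (`CallTarget`, `OperandKind`, `TargetFlag`, `useGOTOld`, `useGOTNew`) are hypothetical stand-ins, not the LoongArch backend's classes, and the sketch assumes the target flag was attached to the operand earlier, during call lowering, where dso_local information is still available.

```cpp
#include <cassert>

// Hypothetical stand-ins for the two kinds of call operand discussed above.
enum class OperandKind { Global, ExternalSymbol };

// Hypothetical stand-ins for the target flags carried on the operand.
enum class TargetFlag { Call, CallPLT };

struct CallTarget {
  OperandKind Kind;
  bool IsDSOLocal; // only knowable when Kind == Global
  TargetFlag Flag; // assumed to be set before pseudo expansion
};

// Old predicate: only a global value can be inspected, so an external
// symbol such as memset never selects the GOT sequence.
bool useGOTOld(const CallTarget &T) {
  return T.Kind == OperandKind::Global && !T.IsDSOLocal;
}

// New predicate: rely on the flag alone, which works for both operand kinds.
bool useGOTNew(const CallTarget &T) {
  return T.Flag == TargetFlag::CallPLT;
}
```

Under these assumptions, an external symbol flagged as a PLT-style call takes the PC-relative path with the old predicate and the GOT path with the new one, which matches the `%pc_hi20(memset)` to `%got_pc_hi20(memset)` change in the updated tests.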
[llvm-branch-commits] [llvm] release/19.x: [LICM] allow MemoryAccess creation failure (#116813) (PR #117082)
https://github.com/DianQK updated https://github.com/llvm/llvm-project/pull/117082 >From e3364b6e56999488106d990b5f0f907823afa42c Mon Sep 17 00:00:00 2001 From: DianQK Date: Wed, 20 Nov 2024 19:52:51 +0800 Subject: [PATCH] [LICM] allow MemoryAccess creation failure (#116813) Fixes #116809. After running some passes (SimpleLoopUnswitch, LoopInstSimplify, etc.), MemorySSA might be outdated, and the instruction `I` may have become a non-memory touching instruction. LICM has already handled this, but it does not pass `CreationMustSucceed=false` to `createDefinedAccess`. (cherry picked from commit 18b02bbf441660683df7f3925946984203d49bab) --- llvm/include/llvm/Analysis/MemorySSAUpdater.h | 6 +++ llvm/lib/Analysis/MemorySSAUpdater.cpp| 12 - llvm/lib/Transforms/Scalar/LICM.cpp | 7 ++- .../LICM/PR116813-memoryssa-outdated.ll | 50 +++ 4 files changed, 71 insertions(+), 4 deletions(-) create mode 100644 llvm/test/Transforms/LICM/PR116813-memoryssa-outdated.ll diff --git a/llvm/include/llvm/Analysis/MemorySSAUpdater.h b/llvm/include/llvm/Analysis/MemorySSAUpdater.h index d4da3ef1146db7..015a652f309c56 100644 --- a/llvm/include/llvm/Analysis/MemorySSAUpdater.h +++ b/llvm/include/llvm/Analysis/MemorySSAUpdater.h @@ -192,6 +192,12 @@ class MemorySSAUpdater { const BasicBlock *BB, MemorySSA::InsertionPlace Point); + MemoryAccess *createMemoryAccessInBB2(Instruction *I, +MemoryAccess *Definition, +const BasicBlock *BB, +MemorySSA::InsertionPlace Point, +bool CreationMustSucceed = true); + /// Create a MemoryAccess in MemorySSA before an existing MemoryAccess. /// /// See createMemoryAccessInBB() for usage details. 
diff --git a/llvm/lib/Analysis/MemorySSAUpdater.cpp b/llvm/lib/Analysis/MemorySSAUpdater.cpp index aa550f0b6a7bfd..c84b31a3a9374d 100644 --- a/llvm/lib/Analysis/MemorySSAUpdater.cpp +++ b/llvm/lib/Analysis/MemorySSAUpdater.cpp @@ -1404,8 +1404,16 @@ void MemorySSAUpdater::changeToUnreachable(const Instruction *I) { MemoryAccess *MemorySSAUpdater::createMemoryAccessInBB( Instruction *I, MemoryAccess *Definition, const BasicBlock *BB, MemorySSA::InsertionPlace Point) { - MemoryUseOrDef *NewAccess = MSSA->createDefinedAccess(I, Definition); - MSSA->insertIntoListsForBlock(NewAccess, BB, Point); + return createMemoryAccessInBB2(I, Definition, BB, Point); +} + +MemoryAccess *MemorySSAUpdater::createMemoryAccessInBB2( +Instruction *I, MemoryAccess *Definition, const BasicBlock *BB, +MemorySSA::InsertionPlace Point, bool CreationMustSucceed) { + MemoryUseOrDef *NewAccess = MSSA->createDefinedAccess( + I, Definition, /*Template=*/nullptr, CreationMustSucceed); + if (NewAccess) +MSSA->insertIntoListsForBlock(NewAccess, BB, Point); return NewAccess; } diff --git a/llvm/lib/Transforms/Scalar/LICM.cpp b/llvm/lib/Transforms/Scalar/LICM.cpp index 91ef2b4b7c1839..102a5bd5bbb88b 100644 --- a/llvm/lib/Transforms/Scalar/LICM.cpp +++ b/llvm/lib/Transforms/Scalar/LICM.cpp @@ -1464,8 +1464,11 @@ static Instruction *cloneInstructionInExitBlock( if (MSSAU.getMemorySSA()->getMemoryAccess(&I)) { // Create a new MemoryAccess and let MemorySSA set its defining access. -MemoryAccess *NewMemAcc = MSSAU.createMemoryAccessInBB( -New, nullptr, New->getParent(), MemorySSA::Beginning); +// After running some passes, MemorySSA might be outdated, and the +// instruction `I` may have become a non-memory touching instruction. 
+MemoryAccess *NewMemAcc = MSSAU.createMemoryAccessInBB2( +New, nullptr, New->getParent(), MemorySSA::Beginning, +/*CreationMustSucceed=*/false); if (NewMemAcc) { if (auto *MemDef = dyn_cast<MemoryDef>(NewMemAcc)) MSSAU.insertDef(MemDef, /*RenameUses=*/true); diff --git a/llvm/test/Transforms/LICM/PR116813-memoryssa-outdated.ll b/llvm/test/Transforms/LICM/PR116813-memoryssa-outdated.ll new file mode 100644 index 00..a040c3cc6947c6 --- /dev/null +++ b/llvm/test/Transforms/LICM/PR116813-memoryssa-outdated.ll @@ -0,0 +1,50 @@ +; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --version 5 +; RUN: opt -passes='loop-mssa(simple-loop-unswitch,licm)' -verify-memoryssa -S < %s | FileCheck %s + +; Check that running LICM after SimpleLoopUnswitch does not result in a crash. + +define i32 @foo(i1 %arg, ptr %arg1) { +; CHECK-LABEL: define i32 @foo( +; CHECK-SAME: i1 [[ARG:%.*]], ptr [[ARG1:%.*]]) { +; CHECK-NEXT: [[START:.*:]] +; CHECK-NEXT:[[ARG_FR:%.*]] = freeze i1 [[ARG]] +; CHECK-NEXT:br i1 [[ARG_FR]], label %[[START_SPLIT_US:.*]], label %[[START_SPLIT:.*]] +; CHECK: [[START_SPLIT_US]]: +; CHECK-NEXT:br label %[[LOOP_US:.*]] +; CHECK: [[LOOP_US]]: +; CHEC
[llvm-branch-commits] [llvm] release/19.x: [ConstraintElim] Bail out on non-dedicated exits when adding exiting conditions (#116627) (PR #117137)
https://github.com/fhahn approved this pull request. LGTM to cherry pick, thanks! https://github.com/llvm/llvm-project/pull/117137 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [clang-tools-extra] 2a4a50d - Revert "[NFC] Explicitly pass a VFS when creating DiagnosticsEngine (#115852)"
Author: Sylvestre Ledru Date: 2024-11-21T07:04:23-05:00 New Revision: 2a4a50d85689bb2ac51258c485fceb64dfb6cd73 URL: https://github.com/llvm/llvm-project/commit/2a4a50d85689bb2ac51258c485fceb64dfb6cd73 DIFF: https://github.com/llvm/llvm-project/commit/2a4a50d85689bb2ac51258c485fceb64dfb6cd73.diff LOG: Revert "[NFC] Explicitly pass a VFS when creating DiagnosticsEngine (#115852)" This reverts commit bdd10d9d249bd1c2a45e3de56a5accd97e953458. Added: Modified: clang-tools-extra/clang-include-fixer/IncludeFixer.cpp clang-tools-extra/clangd/Compiler.cpp clang-tools-extra/clangd/ModulesBuilder.cpp clang-tools-extra/clangd/Preamble.cpp clang-tools-extra/include-cleaner/unittests/RecordTest.cpp clang/include/clang/Frontend/CompilerInstance.h clang/lib/Frontend/CompilerInstance.cpp clang/lib/Frontend/CreateInvocationFromCommandLine.cpp clang/lib/Frontend/Rewrite/FrontendActions.cpp clang/lib/Interpreter/Interpreter.cpp clang/lib/StaticAnalyzer/Frontend/ModelInjector.cpp clang/lib/Testing/TestAST.cpp clang/lib/Tooling/DependencyScanning/DependencyScanningWorker.cpp clang/lib/Tooling/Tooling.cpp clang/tools/c-index-test/core_main.cpp clang/tools/clang-import-test/clang-import-test.cpp clang/tools/clang-installapi/ClangInstallAPI.cpp clang/tools/clang-scan-deps/ClangScanDeps.cpp clang/tools/diagtool/ShowEnabledWarnings.cpp clang/tools/driver/cc1_main.cpp clang/tools/libclang/CIndex.cpp clang/tools/libclang/Indexing.cpp clang/unittests/AST/ExternalASTSourceTest.cpp clang/unittests/CodeGen/TestCompiler.h clang/unittests/Driver/DXCModeTest.cpp clang/unittests/Driver/ToolChainTest.cpp clang/unittests/Frontend/ASTUnitTest.cpp clang/unittests/Frontend/CodeGenActionTest.cpp clang/unittests/Frontend/CompilerInstanceTest.cpp clang/unittests/Frontend/CompilerInvocationTest.cpp clang/unittests/Frontend/FrontendActionTest.cpp clang/unittests/Frontend/OutputStreamTest.cpp clang/unittests/Frontend/PCHPreambleTest.cpp clang/unittests/Frontend/ReparseWorkingDirTest.cpp 
clang/unittests/Frontend/UtilsTest.cpp clang/unittests/Sema/SemaNoloadLookupTest.cpp clang/unittests/Serialization/ForceCheckFileInputTest.cpp clang/unittests/Serialization/ModuleCacheTest.cpp clang/unittests/Serialization/NoCommentsTest.cpp clang/unittests/Serialization/PreambleInNamedModulesTest.cpp clang/unittests/Serialization/VarDeclConstantInitTest.cpp clang/unittests/Support/TimeProfilerTest.cpp clang/unittests/Tooling/DependencyScanning/DependencyScannerTest.cpp clang/unittests/Tooling/ToolingTest.cpp Removed: diff --git a/clang-tools-extra/clang-include-fixer/IncludeFixer.cpp b/clang-tools-extra/clang-include-fixer/IncludeFixer.cpp index bba8f8acc77da9..354f35cbadbeb9 100644 --- a/clang-tools-extra/clang-include-fixer/IncludeFixer.cpp +++ b/clang-tools-extra/clang-include-fixer/IncludeFixer.cpp @@ -95,8 +95,7 @@ bool IncludeFixerActionFactory::runInvocation( // Create the compiler's actual diagnostics engine. We want to drop all // diagnostics here. - Compiler.createDiagnostics(Files->getVirtualFileSystem(), - new clang::IgnoringDiagConsumer, + Compiler.createDiagnostics(new clang::IgnoringDiagConsumer, /*ShouldOwnClient=*/true); Compiler.createSourceManager(*Files); diff --git a/clang-tools-extra/clangd/Compiler.cpp b/clang-tools-extra/clangd/Compiler.cpp index 161cc9ae0ca365..c60ab8e1b8062a 100644 --- a/clang-tools-extra/clangd/Compiler.cpp +++ b/clang-tools-extra/clangd/Compiler.cpp @@ -110,8 +110,8 @@ buildCompilerInvocation(const ParseInputs &Inputs, clang::DiagnosticConsumer &D, CIOpts.VFS = Inputs.TFS->view(Inputs.CompileCommand.Directory); CIOpts.CC1Args = CC1Args; CIOpts.RecoverOnError = true; - CIOpts.Diags = CompilerInstance::createDiagnostics( - *CIOpts.VFS, new DiagnosticOptions, &D, false); + CIOpts.Diags = + CompilerInstance::createDiagnostics(new DiagnosticOptions, &D, false); CIOpts.ProbePrecompiled = false; std::unique_ptr CI = createInvocation(ArgStrs, CIOpts); if (!CI) @@ -148,7 +148,7 @@ prepareCompilerInstance(std::unique_ptr CI, auto 
Clang = std::make_unique( std::make_shared()); Clang->setInvocation(std::move(CI)); - Clang->createDiagnostics(*VFS, &DiagsClient, false); + Clang->createDiagnostics(&DiagsClient, false); if (auto VFSWithRemapping = createVFSFromCompilerInvocation( Clang->getInvocation(), Clang->getDiagnostics(), VFS)) diff --git a/clang-tools-extra/clangd/ModulesBuilder.cpp b/clang-tools-extra/clangd/ModulesBuilder.cpp index 29508901f85bba..2bce3a20825616 100644 --- a/clang-tools-extra/clangd/ModulesBuilder.cpp +++ b/clang-tools-extra/clangd/ModulesBuilder.cpp @@ -188,8 +188,7 @@ bool IsModuleFileUpToDate(Pa
[llvm-branch-commits] [flang] [flang] handle fir.call in AliasAnalysis::getModRef (PR #117164)
github-actions[bot] wrote:

:warning: Python code formatter, darker found issues in your code. :warning:

You can test this locally with the following command:

```bash
darker --check --diff -r c12869e010d892caf93d153c187db846ba995a9e...84c95d6c816004abe6c01eb754688fb35a666ffc flang/test/Analysis/AliasAnalysis/gen_mod_ref_test.py
```

View the diff from darker here.

```diff
--- gen_mod_ref_test.py 2024-11-21 13:14:25.00 +
+++ gen_mod_ref_test.py 2024-11-21 14:15:26.444588 +
@@ -11,8 +11,16 @@
 import sys
 import re
 for line in sys.stdin:
-  line = re.sub(r'(fir.call @_\w*P)(test_effect_\w*)(\(.*) : ', r'\1\2\3 {test.ptr ="\2"} : ', line)
-  line = re.sub(r'(hlfir.declare .*uniq_name =.*E)(test_var_\w*)"', r'\1\2", test.ptr ="\2"', line)
-  sys.stdout.write(line)
+    line = re.sub(
+        r"(fir.call @_\w*P)(test_effect_\w*)(\(.*) : ",
+        r'\1\2\3 {test.ptr ="\2"} : ',
+        line,
+    )
+    line = re.sub(
+        r'(hlfir.declare .*uniq_name =.*E)(test_var_\w*)"',
+        r'\1\2", test.ptr ="\2"',
+        line,
+    )
+    sys.stdout.write(line)
```

https://github.com/llvm/llvm-project/pull/117164 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] release/19.x: [MachineLICM] Don't allow hoisting invariant loads across mem barrier. (#116987) (PR #117154)
https://github.com/llvmbot created https://github.com/llvm/llvm-project/pull/117154 Backport a9b3ec154d7ab2d0896ac5c9f1e9a1266a37be80 ef102b4a6333a304e36dc623d5381257a7ef1ed6 Requested by: @fhahn >From fccca51f3cdf8f918643b2afa0d410590e3acf95 Mon Sep 17 00:00:00 2001 From: Florian Hahn Date: Wed, 20 Nov 2024 15:10:19 + Subject: [PATCH 1/2] [MachineLICM] Add test case showing load hoisted across memory barrier. (cherry picked from commit a9b3ec154d7ab2d0896ac5c9f1e9a1266a37be80) --- .../AArch64/machine-licm-hoist-load.ll| 29 +++ 1 file changed, 29 insertions(+) diff --git a/llvm/test/CodeGen/AArch64/machine-licm-hoist-load.ll b/llvm/test/CodeGen/AArch64/machine-licm-hoist-load.ll index e8dafd5e8fbabe..932a5af264a000 100644 --- a/llvm/test/CodeGen/AArch64/machine-licm-hoist-load.ll +++ b/llvm/test/CodeGen/AArch64/machine-licm-hoist-load.ll @@ -497,6 +497,35 @@ for.exit: ; preds = %for.body ret i64 %spec.select } +@a = external local_unnamed_addr global i32, align 4 + +; FIXME: Load hoisted out of the loop across memory barriers. 
+define i32 @load_between_memory_barriers() { +; CHECK-LABEL: load_between_memory_barriers: +; CHECK: // %bb.0: +; CHECK-NEXT:adrp x8, :got:a +; CHECK-NEXT:ldr x8, [x8, :got_lo12:a] +; CHECK-NEXT:ldr w0, [x8] +; CHECK-NEXT: .LBB8_1: // %loop +; CHECK-NEXT:// =>This Inner Loop Header: Depth=1 +; CHECK-NEXT://MEMBARRIER +; CHECK-NEXT://MEMBARRIER +; CHECK-NEXT:cbz w0, .LBB8_1 +; CHECK-NEXT: // %bb.2: // %exit +; CHECK-NEXT:ret + br label %loop + +loop: + fence syncscope("singlethread") acq_rel + %l = load i32, ptr @a, align 4 + fence syncscope("singlethread") acq_rel + %c = icmp eq i32 %l, 0 + br i1 %c, label %loop, label %exit + +exit: + ret i32 %l +} + declare i32 @bcmp(ptr, ptr, i64) declare i32 @memcmp(ptr, ptr, i64) declare void @func() >From 4ed7f75167fc2979e1e63f33389bc6fdb617ea71 Mon Sep 17 00:00:00 2001 From: Florian Hahn Date: Thu, 21 Nov 2024 10:25:04 + Subject: [PATCH 2/2] [MachineLICM] Don't allow hoisting invariant loads across mem barrier. (#116987) The improvements in 63917e1 / #70796 do not check for memory barriers/unmodelled side effects, which means we may incorrectly hoist loads across memory barriers. Fix this by checking whether any machine instruction in the loop is a load-fold barrier. 
PR: https://github.com/llvm/llvm-project/pull/116987 (cherry picked from commit ef102b4a6333a304e36dc623d5381257a7ef1ed6) --- llvm/lib/CodeGen/MachineLICM.cpp | 2 +- llvm/test/CodeGen/AArch64/machine-licm-hoist-load.ll | 4 ++-- llvm/test/CodeGen/Mips/lcb5.ll | 4 ++-- 3 files changed, 5 insertions(+), 5 deletions(-) diff --git a/llvm/lib/CodeGen/MachineLICM.cpp b/llvm/lib/CodeGen/MachineLICM.cpp index f24ab187ef4005..21a02a6f094784 100644 --- a/llvm/lib/CodeGen/MachineLICM.cpp +++ b/llvm/lib/CodeGen/MachineLICM.cpp @@ -1474,7 +1474,7 @@ void MachineLICMBase::InitializeLoadsHoistableLoops() { if (!AllowedToHoistLoads[Loop]) continue; for (auto &MI : *MBB) { -if (!MI.mayStore() && !MI.isCall() && +if (!MI.isLoadFoldBarrier() && !MI.mayStore() && !MI.isCall() && !(MI.mayLoad() && MI.hasOrderedMemoryRef())) continue; for (MachineLoop *L = Loop; L != nullptr; L = L->getParentLoop()) diff --git a/llvm/test/CodeGen/AArch64/machine-licm-hoist-load.ll b/llvm/test/CodeGen/AArch64/machine-licm-hoist-load.ll index 932a5af264a000..17f8263560430d 100644 --- a/llvm/test/CodeGen/AArch64/machine-licm-hoist-load.ll +++ b/llvm/test/CodeGen/AArch64/machine-licm-hoist-load.ll @@ -499,16 +499,16 @@ for.exit: ; preds = %for.body @a = external local_unnamed_addr global i32, align 4 -; FIXME: Load hoisted out of the loop across memory barriers. +; Make sure the load is not hoisted out of the loop across memory barriers. 
define i32 @load_between_memory_barriers() { ; CHECK-LABEL: load_between_memory_barriers: ; CHECK: // %bb.0: ; CHECK-NEXT:adrp x8, :got:a ; CHECK-NEXT:ldr x8, [x8, :got_lo12:a] -; CHECK-NEXT:ldr w0, [x8] ; CHECK-NEXT: .LBB8_1: // %loop ; CHECK-NEXT:// =>This Inner Loop Header: Depth=1 ; CHECK-NEXT://MEMBARRIER +; CHECK-NEXT:ldr w0, [x8] ; CHECK-NEXT://MEMBARRIER ; CHECK-NEXT:cbz w0, .LBB8_1 ; CHECK-NEXT: // %bb.2: // %exit diff --git a/llvm/test/CodeGen/Mips/lcb5.ll b/llvm/test/CodeGen/Mips/lcb5.ll index f320f6fc5660ce..bb059f1ee8453e 100644 --- a/llvm/test/CodeGen/Mips/lcb5.ll +++ b/llvm/test/CodeGen/Mips/lcb5.ll @@ -186,7 +186,7 @@ if.end: ; preds = %if.then, %entry } ; ci: .entz3 -; ci: bteqz $BB6_3 +; ci: bteqz $BB6_2 ; ci: .endz3 ; Function Attrs: nounwind optsize @@ -210,7 +210,7 @@ if.end: ; preds = %if.then, %entry ; ci: .entz4 ; c
[llvm-branch-commits] [llvm] release/19.x: [MachineLICM] Don't allow hoisting invariant loads across mem barrier. (#116987) (PR #117154)
llvmbot wrote: @david-arm What do you think about merging this PR to the release branch? https://github.com/llvm/llvm-project/pull/117154 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] release/19.x: [MachineLICM] Don't allow hoisting invariant loads across mem barrier. (#116987) (PR #117154)
llvmbot wrote: @llvm/pr-subscribers-backend-aarch64 Author: None (llvmbot) Changes Backport a9b3ec154d7ab2d0896ac5c9f1e9a1266a37be80 ef102b4a6333a304e36dc623d5381257a7ef1ed6 Requested by: @fhahn --- Full diff: https://github.com/llvm/llvm-project/pull/117154.diff 3 Files Affected: - (modified) llvm/lib/CodeGen/MachineLICM.cpp (+1-1) - (modified) llvm/test/CodeGen/AArch64/machine-licm-hoist-load.ll (+29) - (modified) llvm/test/CodeGen/Mips/lcb5.ll (+2-2) ``diff diff --git a/llvm/lib/CodeGen/MachineLICM.cpp b/llvm/lib/CodeGen/MachineLICM.cpp index f24ab187ef4005..21a02a6f094784 100644 --- a/llvm/lib/CodeGen/MachineLICM.cpp +++ b/llvm/lib/CodeGen/MachineLICM.cpp @@ -1474,7 +1474,7 @@ void MachineLICMBase::InitializeLoadsHoistableLoops() { if (!AllowedToHoistLoads[Loop]) continue; for (auto &MI : *MBB) { -if (!MI.mayStore() && !MI.isCall() && +if (!MI.isLoadFoldBarrier() && !MI.mayStore() && !MI.isCall() && !(MI.mayLoad() && MI.hasOrderedMemoryRef())) continue; for (MachineLoop *L = Loop; L != nullptr; L = L->getParentLoop()) diff --git a/llvm/test/CodeGen/AArch64/machine-licm-hoist-load.ll b/llvm/test/CodeGen/AArch64/machine-licm-hoist-load.ll index e8dafd5e8fbabe..17f8263560430d 100644 --- a/llvm/test/CodeGen/AArch64/machine-licm-hoist-load.ll +++ b/llvm/test/CodeGen/AArch64/machine-licm-hoist-load.ll @@ -497,6 +497,35 @@ for.exit: ; preds = %for.body ret i64 %spec.select } +@a = external local_unnamed_addr global i32, align 4 + +; Make sure the load is not hoisted out of the loop across memory barriers. 
+define i32 @load_between_memory_barriers() { +; CHECK-LABEL: load_between_memory_barriers: +; CHECK: // %bb.0: +; CHECK-NEXT:adrp x8, :got:a +; CHECK-NEXT:ldr x8, [x8, :got_lo12:a] +; CHECK-NEXT: .LBB8_1: // %loop +; CHECK-NEXT:// =>This Inner Loop Header: Depth=1 +; CHECK-NEXT://MEMBARRIER +; CHECK-NEXT:ldr w0, [x8] +; CHECK-NEXT://MEMBARRIER +; CHECK-NEXT:cbz w0, .LBB8_1 +; CHECK-NEXT: // %bb.2: // %exit +; CHECK-NEXT:ret + br label %loop + +loop: + fence syncscope("singlethread") acq_rel + %l = load i32, ptr @a, align 4 + fence syncscope("singlethread") acq_rel + %c = icmp eq i32 %l, 0 + br i1 %c, label %loop, label %exit + +exit: + ret i32 %l +} + declare i32 @bcmp(ptr, ptr, i64) declare i32 @memcmp(ptr, ptr, i64) declare void @func() diff --git a/llvm/test/CodeGen/Mips/lcb5.ll b/llvm/test/CodeGen/Mips/lcb5.ll index f320f6fc5660ce..bb059f1ee8453e 100644 --- a/llvm/test/CodeGen/Mips/lcb5.ll +++ b/llvm/test/CodeGen/Mips/lcb5.ll @@ -186,7 +186,7 @@ if.end: ; preds = %if.then, %entry } ; ci: .entz3 -; ci: bteqz $BB6_3 +; ci: bteqz $BB6_2 ; ci: .endz3 ; Function Attrs: nounwind optsize @@ -210,7 +210,7 @@ if.end: ; preds = %if.then, %entry ; ci: .entz4 ; ci: btnez $BB7_1 # 16 bit inst -; ci: jal $BB7_3 # branch +; ci: jal $BB7_2 # branch ; ci: nop ; ci: $BB7_1: ; ci: .p2align2 `` https://github.com/llvm/llvm-project/pull/117154 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] release/19.x: [MachineLICM] Don't allow hoisting invariant loads across mem barrier. (#116987) (PR #117154)
https://github.com/llvmbot milestoned https://github.com/llvm/llvm-project/pull/117154 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
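The effect of the extra `isLoadFoldBarrier()` check in this backport can be sketched as a scan over the loop body. What follows is a simplified model, not the MachineLICM implementation: `Inst`, `blocksLoadHoisting`, and `loopAllowsLoadHoisting` are hypothetical names, each instruction property is reduced to a plain boolean, and a fence is assumed to set only the `LoadFoldBarrier` flag.

```cpp
#include <cassert>
#include <vector>

// Hypothetical, flattened view of the instruction properties the check reads.
struct Inst {
  bool LoadFoldBarrier; // assumed true for fences / unmodelled side effects
  bool MayStore;
  bool IsCall;
  bool MayLoad;
  bool OrderedMemRef;
};

// Returns true if this instruction forbids hoisting invariant loads out of
// the loop. CheckBarrier toggles the condition added by the patch.
bool blocksLoadHoisting(const Inst &MI, bool CheckBarrier) {
  return (CheckBarrier && MI.LoadFoldBarrier) || MI.MayStore || MI.IsCall ||
         (MI.MayLoad && MI.OrderedMemRef);
}

// A loop allows load hoisting only if no instruction in its body blocks it.
bool loopAllowsLoadHoisting(const std::vector<Inst> &Body, bool CheckBarrier) {
  for (const Inst &MI : Body)
    if (blocksLoadHoisting(MI, CheckBarrier))
      return false;
  return true;
}
```

With a body shaped like the `load_between_memory_barriers` test (fence, plain load, fence), this model permits hoisting when the barrier check is off and forbids it when the check is on, which corresponds to the `ldr w0, [x8]` moving back inside the loop in the updated CHECK lines.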
[llvm-branch-commits] [llvm] [CodeGen][NewPM] Port SpillPlacement analysis to NPM (PR #116618)
https://github.com/optimisan updated https://github.com/llvm/llvm-project/pull/116618 >From 6408bcec55deafbf767a417684c2bfe3dd251068 Mon Sep 17 00:00:00 2001 From: Akshat Oke Date: Mon, 18 Nov 2024 12:42:00 + Subject: [PATCH 1/3] [CodeGen][NewPM] Port SpillPlacement analysis to NPM --- llvm/include/llvm/InitializePasses.h | 2 +- llvm/lib/CodeGen/RegAllocGreedy.cpp | 6 +- llvm/lib/CodeGen/SpillPlacement.cpp | 91 ++-- llvm/lib/CodeGen/SpillPlacement.h| 52 +--- 4 files changed, 104 insertions(+), 47 deletions(-) diff --git a/llvm/include/llvm/InitializePasses.h b/llvm/include/llvm/InitializePasses.h index e883aae2758688..88bca2c75c9498 100644 --- a/llvm/include/llvm/InitializePasses.h +++ b/llvm/include/llvm/InitializePasses.h @@ -289,7 +289,7 @@ void initializeSinkingLegacyPassPass(PassRegistry &); void initializeSjLjEHPreparePass(PassRegistry &); void initializeSlotIndexesWrapperPassPass(PassRegistry &); void initializeSpeculativeExecutionLegacyPassPass(PassRegistry &); -void initializeSpillPlacementPass(PassRegistry &); +void initializeSpillPlacementWrapperLegacyPass(PassRegistry &); void initializeStackColoringLegacyPass(PassRegistry &); void initializeStackFrameLayoutAnalysisPassPass(PassRegistry &); void initializeStackMapLivenessPass(PassRegistry &); diff --git a/llvm/lib/CodeGen/RegAllocGreedy.cpp b/llvm/lib/CodeGen/RegAllocGreedy.cpp index 3542bfe18af46f..3fdf2d6e07a75f 100644 --- a/llvm/lib/CodeGen/RegAllocGreedy.cpp +++ b/llvm/lib/CodeGen/RegAllocGreedy.cpp @@ -162,7 +162,7 @@ INITIALIZE_PASS_DEPENDENCY(MachineLoopInfoWrapperPass) INITIALIZE_PASS_DEPENDENCY(VirtRegMapWrapperLegacy) INITIALIZE_PASS_DEPENDENCY(LiveRegMatrixWrapperLegacy) INITIALIZE_PASS_DEPENDENCY(EdgeBundlesWrapperLegacy) -INITIALIZE_PASS_DEPENDENCY(SpillPlacement) +INITIALIZE_PASS_DEPENDENCY(SpillPlacementWrapperLegacy) INITIALIZE_PASS_DEPENDENCY(MachineOptimizationRemarkEmitterPass) INITIALIZE_PASS_DEPENDENCY(RegAllocEvictionAdvisorAnalysis) 
INITIALIZE_PASS_DEPENDENCY(RegAllocPriorityAdvisorAnalysis) @@ -217,7 +217,7 @@ void RAGreedy::getAnalysisUsage(AnalysisUsage &AU) const { AU.addRequired(); AU.addPreserved(); AU.addRequired(); - AU.addRequired(); + AU.addRequired(); AU.addRequired(); AU.addRequired(); AU.addRequired(); @@ -2731,7 +2731,7 @@ bool RAGreedy::runOnMachineFunction(MachineFunction &mf) { ORE = &getAnalysis().getORE(); Loops = &getAnalysis().getLI(); Bundles = &getAnalysis().getEdgeBundles(); - SpillPlacer = &getAnalysis(); + SpillPlacer = &getAnalysis().getResult(); DebugVars = &getAnalysis(); initializeCSRCost(); diff --git a/llvm/lib/CodeGen/SpillPlacement.cpp b/llvm/lib/CodeGen/SpillPlacement.cpp index 318e2b19322bb4..c9baabf6161d3a 100644 --- a/llvm/lib/CodeGen/SpillPlacement.cpp +++ b/llvm/lib/CodeGen/SpillPlacement.cpp @@ -44,17 +44,17 @@ using namespace llvm; #define DEBUG_TYPE "spill-code-placement" -char SpillPlacement::ID = 0; +char SpillPlacementWrapperLegacy::ID = 0; -char &llvm::SpillPlacementID = SpillPlacement::ID; +char &llvm::SpillPlacementID = SpillPlacementWrapperLegacy::ID; -INITIALIZE_PASS_BEGIN(SpillPlacement, DEBUG_TYPE, +INITIALIZE_PASS_BEGIN(SpillPlacementWrapperLegacy, DEBUG_TYPE, "Spill Code Placement Analysis", true, true) INITIALIZE_PASS_DEPENDENCY(EdgeBundlesWrapperLegacy) -INITIALIZE_PASS_END(SpillPlacement, DEBUG_TYPE, +INITIALIZE_PASS_END(SpillPlacementWrapperLegacy, DEBUG_TYPE, "Spill Code Placement Analysis", true, true) -void SpillPlacement::getAnalysisUsage(AnalysisUsage &AU) const { +void SpillPlacementWrapperLegacy::getAnalysisUsage(AnalysisUsage &AU) const { AU.setPreservesAll(); AU.addRequired(); AU.addRequiredTransitive(); @@ -189,32 +189,57 @@ struct SpillPlacement::Node { } }; -bool SpillPlacement::runOnMachineFunction(MachineFunction &mf) { +bool SpillPlacementWrapperLegacy::runOnMachineFunction(MachineFunction &MF) { + auto *Bundles = &getAnalysis().getEdgeBundles(); + auto *MBFI = &getAnalysis().getMBFI(); + + Impl.reset(new 
SpillPlacement(Bundles, MBFI)); + Impl->run(MF); + return false; +} + +AnalysisKey SpillPlacementAnalysis::Key; + +SpillPlacement +SpillPlacementAnalysis::run(MachineFunction &MF, +MachineFunctionAnalysisManager &MFAM) { + auto *Bundles = &MFAM.getResult(MF); + auto *MBFI = &MFAM.getResult(MF); + SpillPlacement Impl(Bundles, MBFI); + Impl.run(MF); + return Impl; +} + +bool SpillPlacementAnalysis::Result::invalidate( +MachineFunction &MF, const PreservedAnalyses &PA, +MachineFunctionAnalysisManager::Invalidator &Inv) { + auto PAC = PA.getChecker(); + return !(PAC.preserved() || + PAC.preservedSet>()) || + Inv.invalidate(MF, PA) || + Inv.invalidate(MF, PA); +} + +void SpillPlacement::arrayDeleter(Node *N) { + if (N) +delete[] N; +} + +void SpillPlacement::run(MachineFunction &mf) { MF = &m
[llvm-branch-commits] [flang] [flang] handle fir.call in AliasAnalysis::getModRef (PR #117164)
@@ -329,14 +341,92 @@ AliasResult AliasAnalysis::alias(Source lhsSrc, Source rhsSrc, mlir::Value lhs, // AliasAnalysis: getModRef //===--===// +static bool isSavedLocal(const fir::AliasAnalysis::Source &src) { + if (auto symRef = llvm::dyn_cast(src.origin.u)) { +auto [nameKind, deconstruct] = +fir::NameUniquer::deconstruct(symRef.getLeafReference().getValue()); +return nameKind == fir::NameUniquer::NameKind::VARIABLE && + !deconstruct.procs.empty(); + } + return false; +} + +static bool isCallToFortranUserProcedure(fir::CallOp call) { + // TODO: indirect calls are excluded by these checks. Maybe some attribute is + // needed to flag user calls in this case. + if (fir::hasBindcAttr(call)) +return true; + if (std::optional callee = call.getCallee()) +return fir::NameUniquer::deconstruct(callee->getLeafReference().getValue()) + .first == fir::NameUniquer::NameKind::PROCEDURE; + return false; +} + +static ModRefResult getCallModRef(fir::CallOp call, mlir::Value var) { + // TODO: limit to Fortran functions?? + // 1. Detect variables that can be accessed indirectly. + fir::AliasAnalysis aliasAnalysis; + fir::AliasAnalysis::Source varSrc = aliasAnalysis.getSource(var); + // If the variable is not a user variable, we cannot safely assume that + // Fortran semantics apply (e.g., a bare alloca/allocmem result may very well + // be placed in an allocatable/pointer descriptor and escape). + + // All the logic bellows are based on Fortran semantics and only holds if this + // is a call to a procedure form the Fortran source and this is a variable + // from the Fortran source. Compiler generated temporaries or functions may + // not adhere to this semantic. + // TODO: add some opt-in or op-out mechanism for compiler generated temps. + // An example of something currently problematic is the allocmem generated for + // ALLOCATE of allocatable target. It currently does not have the target + // attribute, which would lead this analysis to believe it cannot escape. 
+ if (!varSrc.isFortranUserVariable() || !isCallToFortranUserProcedure(call)) +return ModRefResult::getModAndRef(); + // Pointer and target may have been captured. + if (varSrc.isTargetOrPointer()) +return ModRefResult::getModAndRef(); + // Host associated variables may be addressed indirectly via an internal + // function call, whether the call is in the parent or an internal procedure. + // Note that the host associated/internal procedure may be referenced + // indirectly inside calls to non internal procedure. This is because internal + // procedures may be captured or passed. As this is tricky to analyze, always + // consider such variables may be accessed in any calls. + if (varSrc.kind == fir::AliasAnalysis::SourceKind::HostAssoc || + varSrc.isCapturedInInternalProcedure) +return ModRefResult::getModAndRef(); + // At that stage, it has been ruled out that local (including the saved ones) + // and dummy cannot be indirectly accessed in the call. + if (varSrc.kind != fir::AliasAnalysis::SourceKind::Allocate && + !varSrc.isDummyArgument()) { +if (varSrc.kind != fir::AliasAnalysis::SourceKind::Global || +!isSavedLocal(varSrc)) + return ModRefResult::getModAndRef(); + } + // 2. Check if the variable is passed via the arguments. + for (auto arg : call.getArgs()) { +if (fir::conformsWithPassByRef(arg.getType()) && +!aliasAnalysis.alias(arg, var).isNo()) { + // TODO: intent(in) would allow returning Ref here. This can be obtained + // in the func.func attributes for direct calls, but the module lookup is + // linear with the number of MLIR symbols, which would introduce a pseudo + // quadratic behavior num_calls * num_func. tblah wrote: I believe lookups in an `mlir::SymbolTable` are constant time. Constructing a SymbolTable is linear, but perhaps one could be re-used from a calling context. Or `fir::AliasAnalysis` could have a `LazySymbolTable` (`AbstractResult.cpp`). 
It is fine by me to leave this as a TODO in this PR and only attempt this if the optimization turns out to be useful on some real code. https://github.com/llvm/llvm-project/pull/117164
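The "build once, look up many times" idea in the comment above can be sketched in plain C++. This is only an illustration under assumptions: `FuncInfo` and `LazyFuncIndex` are hypothetical names invented here, the hash map stands in for a symbol table, and nothing below uses the real `mlir::SymbolTable` or flang `LazySymbolTable` APIs.

```cpp
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical description of a function symbol: its mangled name and,
// for each dummy argument, whether it is INTENT(IN).
struct FuncInfo {
  std::string name;
  std::vector<bool> argIsIntentIn;
};

// Hypothetical lazily built index over the functions of a module.
// The index is constructed on the first lookup (linear in the number of
// symbols) and reused afterwards (average constant time per lookup), so
// many call queries do not each pay for a linear scan.
class LazyFuncIndex {
public:
  // The caller must keep `funcs` alive and unmodified while the index is used.
  explicit LazyFuncIndex(const std::vector<FuncInfo> &funcs) : funcs(funcs) {}

  // Returns the function with the given name, or nullptr if there is none.
  const FuncInfo *lookup(const std::string &name) {
    if (!built) {
      for (const FuncInfo &f : funcs)
        index.emplace(f.name, &f);
      built = true;
      ++buildCount;
    }
    auto it = index.find(name);
    return it == index.end() ? nullptr : it->second;
  }

  // Number of times the index was constructed (0 or 1).
  int timesBuilt() const { return buildCount; }

private:
  const std::vector<FuncInfo> &funcs;
  std::unordered_map<std::string, const FuncInfo *> index;
  bool built = false;
  int buildCount = 0;
};
```

If no call is ever queried, the index is never built, which keeps the cost at zero for functions where the analysis is not used.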
[llvm-branch-commits] [flang] [flang] handle fir.call in AliasAnalysis::getModRef (PR #117164)
https://github.com/tblah edited https://github.com/llvm/llvm-project/pull/117164
[llvm-branch-commits] [flang] [flang] handle fir.call in AliasAnalysis::getModRef (PR #117164)
@@ -329,14 +341,92 @@ AliasResult AliasAnalysis::alias(Source lhsSrc, Source rhsSrc, mlir::Value lhs, // AliasAnalysis: getModRef //===--===// +static bool isSavedLocal(const fir::AliasAnalysis::Source &src) { + if (auto symRef = llvm::dyn_cast(src.origin.u)) { +auto [nameKind, deconstruct] = +fir::NameUniquer::deconstruct(symRef.getLeafReference().getValue()); +return nameKind == fir::NameUniquer::NameKind::VARIABLE && + !deconstruct.procs.empty(); + } + return false; +} + +static bool isCallToFortranUserProcedure(fir::CallOp call) { + // TODO: indirect calls are excluded by these checks. Maybe some attribute is + // needed to flag user calls in this case. + if (fir::hasBindcAttr(call)) +return true; + if (std::optional callee = call.getCallee()) +return fir::NameUniquer::deconstruct(callee->getLeafReference().getValue()) + .first == fir::NameUniquer::NameKind::PROCEDURE; + return false; +} + +static ModRefResult getCallModRef(fir::CallOp call, mlir::Value var) { + // TODO: limit to Fortran functions?? + // 1. Detect variables that can be accessed indirectly. + fir::AliasAnalysis aliasAnalysis; + fir::AliasAnalysis::Source varSrc = aliasAnalysis.getSource(var); + // If the variable is not a user variable, we cannot safely assume that + // Fortran semantics apply (e.g., a bare alloca/allocmem result may very well + // be placed in an allocatable/pointer descriptor and escape). + + // All the logic bellows are based on Fortran semantics and only holds if this tblah wrote: ```suggestion // All the logic bellow is based on Fortran semantics and only holds if this ``` nit https://github.com/llvm/llvm-project/pull/117164
[llvm-branch-commits] [flang] [flang] handle fir.call in AliasAnalysis::getModRef (PR #117164)
@@ -329,14 +341,92 @@ AliasResult AliasAnalysis::alias(Source lhsSrc, Source rhsSrc, mlir::Value lhs, // AliasAnalysis: getModRef //===--===// +static bool isSavedLocal(const fir::AliasAnalysis::Source &src) { + if (auto symRef = llvm::dyn_cast(src.origin.u)) { +auto [nameKind, deconstruct] = +fir::NameUniquer::deconstruct(symRef.getLeafReference().getValue()); +return nameKind == fir::NameUniquer::NameKind::VARIABLE && + !deconstruct.procs.empty(); + } + return false; +} + +static bool isCallToFortranUserProcedure(fir::CallOp call) { + // TODO: indirect calls are excluded by these checks. Maybe some attribute is + // needed to flag user calls in this case. + if (fir::hasBindcAttr(call)) +return true; + if (std::optional callee = call.getCallee()) +return fir::NameUniquer::deconstruct(callee->getLeafReference().getValue()) + .first == fir::NameUniquer::NameKind::PROCEDURE; + return false; +} + +static ModRefResult getCallModRef(fir::CallOp call, mlir::Value var) { + // TODO: limit to Fortran functions?? + // 1. Detect variables that can be accessed indirectly. + fir::AliasAnalysis aliasAnalysis; + fir::AliasAnalysis::Source varSrc = aliasAnalysis.getSource(var); + // If the variable is not a user variable, we cannot safely assume that + // Fortran semantics apply (e.g., a bare alloca/allocmem result may very well + // be placed in an allocatable/pointer descriptor and escape). + + // All the logic bellows are based on Fortran semantics and only holds if this + // is a call to a procedure form the Fortran source and this is a variable tblah wrote: ```suggestion // is a call to a procedure from the Fortran source and this is a variable ``` nit https://github.com/llvm/llvm-project/pull/117164
[llvm-branch-commits] [flang] [flang] handle fir.call in AliasAnalysis::getModRef (PR #117164)
https://github.com/tblah approved this pull request. Looks great to me. I have reviewed that this does implement the language rules you mentioned in the description (which match my understanding). Please wait for Peter to check those before merging. https://github.com/llvm/llvm-project/pull/117164
[llvm-branch-commits] [flang] [flang][OpenMP] Use new modifier code in ORDER and SCHEDULE clauses (PR #117081)
https://github.com/kparzysz updated https://github.com/llvm/llvm-project/pull/117081 >From 43bdfcdb48328fcdfe762734bd5a4c1df3987c4b Mon Sep 17 00:00:00 2001 From: Krzysztof Parzyszek Date: Mon, 18 Nov 2024 13:01:30 -0600 Subject: [PATCH 1/2] [flang][OpenMP] Use new modifier code in ORDER and SCHEDULE clauses This actually simplifies the AST node for the schedule clause: the two allowed modifiers can be easily classified as the ordering-modifier and the chunk-modifier during parsing without the need to create additional classes. --- flang/examples/FeatureList/FeatureList.cpp| 13 ++- .../FlangOmpReport/FlangOmpReportVisitor.cpp | 10 ++- .../FlangOmpReport/FlangOmpReportVisitor.h| 3 +- flang/include/flang/Parser/dump-parse-tree.h | 17 ++-- flang/include/flang/Parser/parse-tree.h | 81 --- .../flang/Semantics/openmp-modifiers.h| 6 ++ flang/lib/Lower/OpenMP/Clauses.cpp| 75 ++--- flang/lib/Lower/OpenMP/Clauses.h | 10 +++ flang/lib/Parser/openmp-parsers.cpp | 71 flang/lib/Parser/unparse.cpp | 23 +++--- flang/lib/Semantics/check-omp-structure.cpp | 58 ++--- flang/lib/Semantics/check-omp-structure.h | 2 - flang/lib/Semantics/openmp-modifiers.cpp | 48 +++ flang/test/Parser/OpenMP/order-clause01.f90 | 50 ++-- 14 files changed, 263 insertions(+), 204 deletions(-) diff --git a/flang/examples/FeatureList/FeatureList.cpp b/flang/examples/FeatureList/FeatureList.cpp index 753ecb918a9ccb..e1c42586c62c94 100644 --- a/flang/examples/FeatureList/FeatureList.cpp +++ b/flang/examples/FeatureList/FeatureList.cpp @@ -505,9 +505,9 @@ struct NodeVisitor { READ_FEATURE(OmpObject) READ_FEATURE(OmpObjectList) READ_FEATURE(OmpOrderClause) - READ_FEATURE(OmpOrderClause::Type) + READ_FEATURE(OmpOrderClause::Ordering) READ_FEATURE(OmpOrderModifier) - READ_FEATURE(OmpOrderModifier::Kind) + READ_FEATURE(OmpOrderModifier::Value) READ_FEATURE(OmpProcBindClause) READ_FEATURE(OmpProcBindClause::Type) READ_FEATURE(OmpReductionClause) @@ -527,11 +527,10 @@ struct NodeVisitor { 
READ_FEATURE(OmpDeviceClause::DeviceModifier) READ_FEATURE(OmpDeviceTypeClause) READ_FEATURE(OmpDeviceTypeClause::Type) - READ_FEATURE(OmpScheduleModifier) - READ_FEATURE(OmpScheduleModifier::Modifier1) - READ_FEATURE(OmpScheduleModifier::Modifier2) - READ_FEATURE(OmpScheduleModifierType) - READ_FEATURE(OmpScheduleModifierType::ModType) + READ_FEATURE(OmpChunkModifier) + READ_FEATURE(OmpChunkModifier::Value) + READ_FEATURE(OmpOrderingModifier) + READ_FEATURE(OmpOrderingModifier::Value) READ_FEATURE(OmpSectionBlocks) READ_FEATURE(OmpSectionsDirective) READ_FEATURE(OmpSimpleStandaloneDirective) diff --git a/flang/examples/FlangOmpReport/FlangOmpReportVisitor.cpp b/flang/examples/FlangOmpReport/FlangOmpReportVisitor.cpp index a9ff163f8243ce..a3d9b0cfdc79b8 100644 --- a/flang/examples/FlangOmpReport/FlangOmpReportVisitor.cpp +++ b/flang/examples/FlangOmpReport/FlangOmpReportVisitor.cpp @@ -213,14 +213,18 @@ void OpenMPCounterVisitor::Post(const OmpVariableCategory::Value &c) { "variable_category=" + std::string{OmpVariableCategory::EnumToString(c)} + ";"; } -void OpenMPCounterVisitor::Post(const OmpScheduleModifierType::ModType &c) { +void OpenMPCounterVisitor::Post(const OmpChunkModifier::Value &c) { clauseDetails += - "modifier=" + std::string{OmpScheduleModifierType::EnumToString(c)} + ";"; + "modifier=" + std::string{OmpChunkModifier::EnumToString(c)} + ";"; } void OpenMPCounterVisitor::Post(const OmpLinearModifier::Value &c) { clauseDetails += "modifier=" + std::string{OmpLinearModifier::EnumToString(c)} + ";"; } +void OpenMPCounterVisitor::Post(const OmpOrderingModifier::Value &c) { + clauseDetails += + "modifier=" + std::string{OmpOrderingModifier::EnumToString(c)} + ";"; +} void OpenMPCounterVisitor::Post(const OmpTaskDependenceType::Value &c) { clauseDetails += "type=" + std::string{OmpTaskDependenceType::EnumToString(c)} + ";"; @@ -228,7 +232,7 @@ void OpenMPCounterVisitor::Post(const OmpTaskDependenceType::Value &c) { void OpenMPCounterVisitor::Post(const 
OmpMapClause::Type &c) { clauseDetails += "type=" + std::string{OmpMapClause::EnumToString(c)} + ";"; } -void OpenMPCounterVisitor::Post(const OmpScheduleClause::ScheduleType &c) { +void OpenMPCounterVisitor::Post(const OmpScheduleClause::Kind &c) { clauseDetails += "type=" + std::string{OmpScheduleClause::EnumToString(c)} + ";"; } diff --git a/flang/examples/FlangOmpReport/FlangOmpReportVisitor.h b/flang/examples/FlangOmpReport/FlangOmpReportVisitor.h index 83bd3644577e1c..608cb5a2241b83 100644 --- a/flang/examples/FlangOmpReport/FlangOmpReportVisitor.h +++ b/flang/examples/FlangOmpReport/FlangOmpReportVisitor.h @@ -71,8 +71,9 @@ struct OpenMPCounterVisitor { void Post(const OmpDefaultmapClause::Implici
[llvm-branch-commits] [flang] [flang] handle fir.call in AliasAnalysis::getModRef (PR #117164)
https://github.com/razvanlupusoru approved this pull request. Looks amazing! I agree with the various limitations and as far as I can tell - the non-implemented TODOs are not a correctness problem - just a limitation. Do you have plans to add support for Fortran runtime calls also? I think a similar approach as your check for escaping args would work conservatively for them as well. https://github.com/llvm/llvm-project/pull/117164
[llvm-branch-commits] [flang] [flang] handle fir.call in AliasAnalysis::getModRef (PR #117164)
https://github.com/jeanPerier created https://github.com/llvm/llvm-project/pull/117164

fir.call side effects are hard to describe in a useful way using `MemoryEffectOpInterface` because it is impossible to list which memory locations a user procedure reads or writes without doing a data flow analysis of its body (even PURE procedures may read from any module variable; Fortran SIMPLE procedures from F2023 will allow that, but they are far from common at this point). While doing a data flow analysis is likely unavoidable at some point, it will not address cases where the procedure body is not available in the current compilation unit, and it will be rather expensive to do.

Luckily, the Fortran language specification allows the compiler to deduce that a procedure call cannot access a variable in many cases (this mainly stems from the 15.5.2.14 restrictions about dummy arguments, and the inability to capture variables that do not have the TARGET attribute). MLIR provides the perfect interface to leverage that: `AliasAnalysis::getModRef(mlir::Operation*op, mlir::Value location)`. This interface allows telling whether `op` may reference or modify the memory "location".

This patch extends `fir::AliasAnalysis::getModRef` to deal with fir.call. The cost is reasonable: "number of arguments" * "average(memory SSA defining-op chain depth)".

It is currently very conservative and will only apply Fortran rules if:
1. It was able to find an [hl]fir.declare for a Fortran variable from the source in the SSA defining-op chain starting from "location".
2. The fir.call is a direct call to a procedure from the Fortran source (not a runtime or compiler generated function).

It then:
1. Tries to rule out any indirect access to "location" inside the procedure (location must not: have the POINTER/TARGET attributes, or be a host procedure variable used in an internal procedure, or be a module variable, or be in a common block).
2. Tries to rule out any access via the arguments (the location must not alias with any of the arguments. The cases where the access would be made via some pointer inside the data passed by argument are covered by the fact that the location must not be a POINTER/TARGET).

Currently, it always replies "ModRef" (may be referenced or modified) or "NoModRef" (may be neither referenced nor modified). This could be refined in the future to reply "Ref" for the cases where the only access is made via an "Intent(IN)" argument. It also inherits a lot of "false positive cases" coming from current alias analysis limitations (e.g., any copy-in/out on an argument will make it return "ModRef" because alias analysis currently does not handle hlfir.copy_in in the SSA chain). These will be improved with time.

@klausler, I am adding you as a reviewer for the Fortran test (not the implementation) because it is very important that I am getting the language specifications correct here. Any `! CHECK: function_name -> variable_name#0 : ModRef` lines in the test are verifying that the optimizer considers that, in the FIR representation of the Fortran code right above, `call function_name()` may access/modify the variable `variable_name` (from the scope of the call). If `NoModRef` is used instead of `ModRef`, the optimizer considers the variable cannot be accessed/modified. Please flag any expectations where you disagree (especially bad `NoModRef`, which would be bugs, while bad "ModRef" will only cause missing optimization opportunities).

This will allow implementing "array = array_function()" optimization in a future patch.
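The decision procedure described above can be sketched as a small, self-contained C++ model. This is only an illustration under assumptions: `ModRefKind`, `VarKind`, `VarInfo` and `CallInfo` are hypothetical stand-ins invented here for the information the analysis derives from the IR, and the function below is not the actual FIR implementation.

```cpp
#include <vector>

// Hypothetical two-valued result, mirroring the "ModRef"/"NoModRef" replies.
enum class ModRefKind { NoModRef, ModRef };

// Hypothetical classification of where a variable comes from.
enum class VarKind { Local, SavedLocal, Dummy, ModuleOrCommon, Unknown };

// Hypothetical summary of what is known about the queried "location".
struct VarInfo {
  bool isFortranUserVariable = false;   // an [hl]fir.declare was found for it
  bool isTargetOrPointer = false;       // has the POINTER or TARGET attribute
  bool usedInInternalProcedure = false; // host variable used by an internal procedure
  VarKind kind = VarKind::Unknown;
};

// Hypothetical summary of the call being queried.
struct CallInfo {
  bool isDirectFortranUserCall = false; // not a runtime or generated function
  std::vector<bool> argMayAliasVar;     // one entry per by-reference argument
};

ModRefKind getCallModRef(const CallInfo &call, const VarInfo &var) {
  // Fortran rules only apply to user variables and direct user calls.
  if (!var.isFortranUserVariable || !call.isDirectFortranUserCall)
    return ModRefKind::ModRef;
  // Step 1: rule out indirect access inside the callee.
  if (var.isTargetOrPointer)
    return ModRefKind::ModRef; // may have been captured through a pointer
  if (var.usedInInternalProcedure)
    return ModRefKind::ModRef; // may be reached via host association
  if (var.kind != VarKind::Local && var.kind != VarKind::SavedLocal &&
      var.kind != VarKind::Dummy)
    return ModRefKind::ModRef; // module or common block variables, or unknown
  // Step 2: rule out access through the arguments.
  for (bool mayAlias : call.argMayAliasVar)
    if (mayAlias)
      return ModRefKind::ModRef;
  return ModRefKind::NoModRef;
}
```

Every early exit returns the conservative answer, so in this model an imprecise input can only cost an optimization opportunity; only the final `NoModRef` relies on all the Fortran rules holding.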
>From 84c95d6c816004abe6c01eb754688fb35a666ffc Mon Sep 17 00:00:00 2001 From: Jean Perier Date: Wed, 20 Nov 2024 05:44:28 -0800 Subject: [PATCH] [flang] handle fir.call in getModRef --- .../flang/Optimizer/Analysis/AliasAnalysis.h | 11 +- .../Dialect/FortranVariableInterface.td | 7 + .../lib/Optimizer/Analysis/AliasAnalysis.cpp | 111 +- flang/lib/Optimizer/Analysis/CMakeLists.txt | 1 + .../lib/Optimizer/Transforms/AddAliasTags.cpp | 5 +- .../AliasAnalysis/gen_mod_ref_test.py | 18 +++ .../modref-call-after-inlining.fir| 45 ++ .../AliasAnalysis/modref-call-args.f90| 62 .../AliasAnalysis/modref-call-dummies.f90 | 53 +++ .../AliasAnalysis/modref-call-equivalence.f90 | 34 + .../AliasAnalysis/modref-call-globals.f90 | 68 + .../modref-call-internal-proc.f90 | 135 ++ .../AliasAnalysis/modref-call-locals.f90 | 52 +++ .../AliasAnalysis/modref-call-not-fortran.fir | 25 14 files changed, 614 insertions(+), 13 deletions(-) create mode 100755 flang/test/Analysis/AliasAnalysis/gen_mod_ref_test.py create mode 100644 flang/test/Analysis/AliasAnalysis/modref-call-after-inlining.fir create mode 100644 flang/test/Analysis/AliasAnalysis/modref-call-args.f90 create mode 100644 flang/test/Analysis/AliasAnalysis/modref-call-dummies.f90 create mode 1
[llvm-branch-commits] [flang] [flang] handle fir.call in AliasAnalysis::getModRef (PR #117164)
llvmbot wrote: @llvm/pr-subscribers-flang-fir-hlfir Author: None (jeanPerier) Changes

fir.call side effects are hard to describe in a useful way using `MemoryEffectOpInterface` because it is impossible to list which memory locations a user procedure reads or writes without doing a data flow analysis of its body (even PURE procedures may read from any module variable; Fortran SIMPLE procedures from F2023 will allow that, but they are far from common at this point). While doing a data flow analysis is likely unavoidable at some point, it will not address cases where the procedure body is not available in the current compilation unit, and it will be rather expensive to do.

Luckily, the Fortran language specification allows the compiler to deduce that a procedure call cannot access a variable in many cases (this mainly stems from the 15.5.2.14 restrictions about dummy arguments, and the inability to capture variables that do not have the TARGET attribute). MLIR provides the perfect interface to leverage that: `AliasAnalysis::getModRef(mlir::Operation*op, mlir::Value location)`. This interface allows telling whether `op` may reference or modify the memory "location".

This patch extends `fir::AliasAnalysis::getModRef` to deal with fir.call. The cost is reasonable: "number of arguments" * "average(memory SSA defining-op chain depth)".

It is currently very conservative and will only apply Fortran rules if:
1. It was able to find an [hl]fir.declare for a Fortran variable from the source in the SSA defining-op chain starting from "location".
2. The fir.call is a direct call to a procedure from the Fortran source (not a runtime or compiler generated function).

It then:
1. Tries to rule out any indirect access to "location" inside the procedure (location must not: have the POINTER/TARGET attributes, or be a host procedure variable used in an internal procedure, or be a module variable, or be in a common block).
2. Tries to rule out any access via the arguments (the location must not alias with any of the arguments. The cases where the access would be made via some pointer inside the data passed by argument are covered by the fact that the location must not be a POINTER/TARGET).

Currently, it always replies "ModRef" (may be referenced or modified) or "NoModRef" (may be neither referenced nor modified). This could be refined in the future to reply "Ref" for the cases where the only access is made via an "Intent(IN)" argument. It also inherits a lot of "false positive cases" coming from current alias analysis limitations (e.g., any copy-in/out on an argument will make it return "ModRef" because alias analysis currently does not handle hlfir.copy_in in the SSA chain). These will be improved with time.

@klausler, I am adding you as a reviewer for the Fortran test (not the implementation) because it is very important that I am getting the language specifications correct here. Any `! CHECK: function_name -> variable_name#0 : ModRef` lines in the test are verifying that the optimizer considers that, in the FIR representation of the Fortran code right above, `call function_name()` may access/modify the variable `variable_name` (from the scope of the call). If `NoModRef` is used instead of `ModRef`, the optimizer considers the variable cannot be accessed/modified. Please flag any expectations where you disagree (especially bad `NoModRef`, which would be bugs, while bad "ModRef" will only cause missing optimization opportunities).

This will allow implementing "array = array_function()" optimization in a future patch.
--- Patch is 33.65 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/117164.diff 14 Files Affected: - (modified) flang/include/flang/Optimizer/Analysis/AliasAnalysis.h (+8-3) - (modified) flang/include/flang/Optimizer/Dialect/FortranVariableInterface.td (+7) - (modified) flang/lib/Optimizer/Analysis/AliasAnalysis.cpp (+104-7) - (modified) flang/lib/Optimizer/Analysis/CMakeLists.txt (+1) - (modified) flang/lib/Optimizer/Transforms/AddAliasTags.cpp (+2-3) - (added) flang/test/Analysis/AliasAnalysis/gen_mod_ref_test.py (+18) - (added) flang/test/Analysis/AliasAnalysis/modref-call-after-inlining.fir (+45) - (added) flang/test/Analysis/AliasAnalysis/modref-call-args.f90 (+62) - (added) flang/test/Analysis/AliasAnalysis/modref-call-dummies.f90 (+53) - (added) flang/test/Analysis/AliasAnalysis/modref-call-equivalence.f90 (+34) - (added) flang/test/Analysis/AliasAnalysis/modref-call-globals.f90 (+68) - (added) flang/test/Analysis/AliasAnalysis/modref-call-internal-proc.f90 (+135) - (added) flang/test/Analysis/AliasAnalysis/modref-call-locals.f90 (+52) - (added) flang/test/Analysis/AliasAnalysis/modref-call-not-fortran.fir (+25) ``diff diff --git a/flang/include/flang/Optimizer/Analysis/AliasAnalysis.h b/flang/include/flang/Optimizer/Analysis/AliasAnalysis.h index d9953f580f401d..e410831c0fc3eb 100644 --- a/flang/include
[llvm-branch-commits] [clang] [llvm] AMDGPU: Shrink used number of registers for mfma scale based on format (PR #117047)
arsenm wrote: ### Merge activity * **Nov 21, 11:47 AM EST**: A user started a stack merge that includes this pull request via [Graphite](https://app.graphite.dev/github/pr/llvm/llvm-project/117047). https://github.com/llvm/llvm-project/pull/117047
[llvm-branch-commits] [llvm] AMDGPU: Optimize mfma_scale intrinsics with 0 inputs (PR #116724)
arsenm wrote: ### Merge activity * **Nov 21, 11:47 AM EST**: A user started a stack merge that includes this pull request via [Graphite](https://app.graphite.dev/github/pr/llvm/llvm-project/116724). https://github.com/llvm/llvm-project/pull/116724
[llvm-branch-commits] [flang] [flang] handle fir.call in AliasAnalysis::getModRef (PR #117164)
https://github.com/klausler approved this pull request. https://github.com/llvm/llvm-project/pull/117164
[llvm-branch-commits] [flang] [flang] handle fir.call in AliasAnalysis::getModRef (PR #117164)
@@ -0,0 +1,135 @@ +! RUN: bbc -emit-hlfir %s -o - | %python %S/gen_mod_ref_test.py | \ +! RUN: fir-opt -pass-pipeline='builtin.module(func.func(test-fir-alias-analysis-modref))' \ +! RUN: --mlir-disable-threading -o /dev/null 2>&1 | FileCheck %s + +! Test fir.call modref with internal procedures + +subroutine simple_modref_test(test_var_x) + implicit none + real :: test_var_x + call test_effect_internal() +contains + subroutine test_effect_internal() +test_var_x = 0. + end subroutine +end subroutine +! CHECK-LABEL: Testing : "_QPsimple_modref_test" +! CHECK: test_effect_internal -> test_var_x#0: ModRef + +subroutine simple_nomodref_test(test_var_x) + implicit none + real :: test_var_x + call test_effect_internal() +contains + subroutine test_effect_internal() +call some_external() + end subroutine +end subroutine +! CHECK-LABEL: Testing : "_QPsimple_nomodref_test" +! CHECK: test_effect_internal -> test_var_x#0: NoModRef + +! Test that effects on captured variable are propagated to associated variables +! in associate construct. + +subroutine test_associate() + implicit none + real :: test_var_x(10) + associate (test_var_y=>test_var_x) +test_var_y = test_effect_internal() klausler wrote: Is it necessary for this test that `test_var_y` be the LHS of the assignment statement? You would expect ModRef even if it were not modified by/after the call. Might be more clear if the result were stored elsewhere. https://github.com/llvm/llvm-project/pull/117164
[llvm-branch-commits] [flang] [MLIR][OpenMP] Add Lowering support for OpenMP Declare Mapper directive (PR #117046)
@@ -21,7 +21,7 @@ subroutine declare_mapper_1 type (my_type2):: t real :: x, y(nvals) !$omp declare mapper (my_type :: var) map (var, var%values (1:var%num_vals)) -!CHECK: not yet implemented: OpenMPDeclareMapperConstruct +!CHECK: not yet implemented: lowering symbol to HLFIR TIFitis wrote: This error is now from an unhandled form of map clause rather than declare mapper. As such, I believe it's out of scope for this PR. I will however subsequently look into fixing it in a separate PR, hope that doesn't hold up this PR. https://github.com/llvm/llvm-project/pull/117046
[llvm-branch-commits] [llvm] AMDGPU: Add a baseline, non-comprehensive test for scaled mfma hazards (PR #117055)
https://github.com/arsenm updated https://github.com/llvm/llvm-project/pull/117055 >From a0485e65e1c41a3113b68b7c4c3456f7d9337f97 Mon Sep 17 00:00:00 2001 From: Matt Arsenault Date: Mon, 4 Mar 2024 17:36:33 +0530 Subject: [PATCH] AMDGPU: Add a baseline, non-comprehensive test for scaled mfma hazards Add some tests which will demonstrate that we treat the number of cycles differently depending on whether the first matrix uses an f8 format. --- .../CodeGen/AMDGPU/mai-hazards-gfx940.mir | 2 +- .../AMDGPU/mai-hazards-mfma-scale.gfx950.mir | 274 ++ 2 files changed, 275 insertions(+), 1 deletion(-) create mode 100644 llvm/test/CodeGen/AMDGPU/mai-hazards-mfma-scale.gfx950.mir diff --git a/llvm/test/CodeGen/AMDGPU/mai-hazards-gfx940.mir b/llvm/test/CodeGen/AMDGPU/mai-hazards-gfx940.mir index a98b02d792d984..9681b01f334f9a 100644 --- a/llvm/test/CodeGen/AMDGPU/mai-hazards-gfx940.mir +++ b/llvm/test/CodeGen/AMDGPU/mai-hazards-gfx940.mir @@ -2199,7 +2199,7 @@ name: xdl_mfma_4pass_write_vgpr_sgemm_mfma_read_overlap_srcb body: | bb.0: $vgpr0_vgpr1_vgpr2_vgpr3 = V_MFMA_F32_16X16X16F16_vgprcd_e64 $vgpr4_vgpr5, $vgpr6_vgpr7, $vgpr0_vgpr1_vgpr2_vgpr3, 1, 2, 3, implicit $mode, implicit $exec -$vgpr0_vgpr1_vgpr2_vgpr3 = V_MFMA_F32_4X4X1F32_vgprcd_e64 $vgpr8, $vgpr1, $vgpr6_vgpr7_vgpr8_vgpr9, 0, 0, 0, implicit $mode, implicit $exec +$vgpr0_vgpr1_vgpr2_vgpr3 = V_MFMA_F32_4X4X1F32_vgprcd_e64 $vgpr8, $vgpr1, $vgpr2_vgpr3_vgpr4_vgpr5, 0, 0, 0, implicit $mode, implicit $exec ... 
diff --git a/llvm/test/CodeGen/AMDGPU/mai-hazards-mfma-scale.gfx950.mir b/llvm/test/CodeGen/AMDGPU/mai-hazards-mfma-scale.gfx950.mir new file mode 100644 index 00..c0f0482debbcb3 --- /dev/null +++ b/llvm/test/CodeGen/AMDGPU/mai-hazards-mfma-scale.gfx950.mir @@ -0,0 +1,274 @@ +# NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py UTC_ARGS: --version 4 +# RUN: llc -march=amdgcn -mcpu=gfx950 -verify-machineinstrs -run-pass post-RA-hazard-rec %s -o - | FileCheck -check-prefix=GCN %s + +# Immediate operand order = cbsz, abid, blgp + +# First MFMA uses f8 format, so should be treated as 32 cycles +--- +name: V_MFMA_F32_16X16X128_F8F6F4_vgprcd_e64___xdl_write_vgpr__cbsz0_blgp0xdl_read_overlap_vgpr_srcC +tracksRegLiveness: true +body: | + bb.0: +liveins: $vgpr0, $vgpr1, $vgpr2, $vgpr3, $vgpr4, $vgpr5, $vgpr6, $vgpr7, $vgpr8, $vgpr9, $vgpr10, $vgpr11, $vgpr12, $vgpr13, $vgpr14, $vgpr15, $vgpr16, $vgpr17, $vgpr18, $vgpr19, $vgpr20, $vgpr21 + +; GCN-LABEL: name: V_MFMA_F32_16X16X128_F8F6F4_vgprcd_e64___xdl_write_vgpr__cbsz0_blgp0xdl_read_overlap_vgpr_srcC +; GCN: liveins: $vgpr0, $vgpr1, $vgpr2, $vgpr3, $vgpr4, $vgpr5, $vgpr6, $vgpr7, $vgpr8, $vgpr9, $vgpr10, $vgpr11, $vgpr12, $vgpr13, $vgpr14, $vgpr15, $vgpr16, $vgpr17, $vgpr18, $vgpr19, $vgpr20, $vgpr21 +; GCN-NEXT: {{ $}} +; GCN-NEXT: renamable $vgpr0_vgpr1_vgpr2_vgpr3 = nofpexcept V_MFMA_F32_16X16X128_F8F6F4_vgprcd_e64 $vgpr4_vgpr5_vgpr6_vgpr7_vgpr8_vgpr9_vgpr10_vgpr11, $vgpr12_vgpr13_vgpr14_vgpr15_vgpr16_vgpr17_vgpr18_vgpr19, killed $vgpr0_vgpr1_vgpr2_vgpr3, 0, 0, 0, implicit $mode, implicit $exec +; GCN-NEXT: S_NOP 1 +; GCN-NEXT: renamable $vgpr0_vgpr1_vgpr2_vgpr3 = nofpexcept V_MFMA_F32_16X16X128_F8F6F4_vgprcd_e64 killed $vgpr4_vgpr5_vgpr6_vgpr7_vgpr8_vgpr9_vgpr10_vgpr11, killed $vgpr12_vgpr13_vgpr14_vgpr15_vgpr16_vgpr17_vgpr18_vgpr19, killed $vgpr2_vgpr3_vgpr4_vgpr5, 0, 0, 0, implicit $mode, implicit $exec +; GCN-NEXT: S_SETPC_B64_return undef $sgpr30_sgpr31, implicit $vgpr0, implicit $vgpr1, 
implicit $vgpr2, implicit $vgpr3 +renamable $vgpr0_vgpr1_vgpr2_vgpr3 = nofpexcept V_MFMA_F32_16X16X128_F8F6F4_vgprcd_e64 $vgpr4_vgpr5_vgpr6_vgpr7_vgpr8_vgpr9_vgpr10_vgpr11, $vgpr12_vgpr13_vgpr14_vgpr15_vgpr16_vgpr17_vgpr18_vgpr19, killed $vgpr0_vgpr1_vgpr2_vgpr3, 0, 0, 0, implicit $mode, implicit $exec +renamable $vgpr0_vgpr1_vgpr2_vgpr3 = nofpexcept V_MFMA_F32_16X16X128_F8F6F4_vgprcd_e64 killed $vgpr4_vgpr5_vgpr6_vgpr7_vgpr8_vgpr9_vgpr10_vgpr11, killed $vgpr12_vgpr13_vgpr14_vgpr15_vgpr16_vgpr17_vgpr18_vgpr19, killed $vgpr2_vgpr3_vgpr4_vgpr5, 0, 0, 0, implicit $mode, implicit $exec +S_SETPC_B64_return undef $sgpr30_sgpr31, implicit $vgpr0, implicit $vgpr1, implicit $vgpr2, implicit $vgpr3 + +... + +# First MFMA uses f8 format, so should be treated as 32 cycles +--- +name: V_MFMA_F32_16X16X128_F8F6F4_vgprcd_e64___xdl_write_vgpr__cbsz1_blgp1xdl_read_overlap_vgpr_srcC +tracksRegLiveness: true +body: | + bb.0: +liveins: $vgpr0, $vgpr1, $vgpr2, $vgpr3, $vgpr4, $vgpr5, $vgpr6, $vgpr7, $vgpr8, $vgpr9, $vgpr10, $vgpr11, $vgpr12, $vgpr13, $vgpr14, $vgpr15, $vgpr16, $vgpr17, $vgpr18, $vgpr19, $vgpr20, $vgpr21 + +; GCN-LABEL: name: V_MFMA_F32_16X16X128_F8F6F4_vgprcd_e64___xdl_write_vgpr__cbsz1_blgp1xdl_read_overlap_vgpr_srcC +; GCN: liveins: $vgpr0, $vgpr1, $vgpr2,
[llvm-branch-commits] [flang] [flang] handle fir.call in AliasAnalysis::getModRef (PR #117164)
@@ -0,0 +1,68 @@
+! RUN: bbc -emit-hlfir %s -o - | %python %S/gen_mod_ref_test.py | \
+! RUN: fir-opt -pass-pipeline='builtin.module(func.func(test-fir-alias-analysis-modref))' \
+! RUN: --mlir-disable-threading -o /dev/null 2>&1 | FileCheck %s
+
+! Test fir.call modref for global variables (module, saved, common).
+
+
+module somemod
+ implicit none
+ real :: test_var_xmod
+ interface
+subroutine may_capture(x)
+ real, target :: x
+end subroutine
+ end interface
+end module
+
+subroutine test_module
+ use somemod, only : test_var_xmod
+ implicit none
+ call test_effect_external()
+end subroutine
+! CHECK-LABEL: Testing : "_QPtest_module"
+! CHECK: test_effect_external -> test_var_xmod#0: ModRef
+
+subroutine test_saved_local
+ use somemod, only : may_capture
+ implicit none
+ real, save :: test_var_xsaved
+ ! Capture is invalid after the call because test_var_xsaved does not have the
+ ! target attribute.
+ call may_capture(test_var_xsaved)
+ call test_effect_external()
+end subroutine
+! CHECK-LABEL: Testing : "_QPtest_saved_local"
+! CHECK: test_effect_external -> test_var_xsaved#0: NoModRef
+
+subroutine test_saved_target
+ use somemod, only : may_capture
+ implicit none
+ real, save, target :: test_var_target_xsaved

klausler wrote:

The 'save' attribute shouldn't matter; the result would be ModRef with and without `save`, yes?

https://github.com/llvm/llvm-project/pull/117164
[llvm-branch-commits] [clang] [llvm] AMDGPU: Add v_mfma_f32_16x16x32_bf16 for gfx950 (PR #117053)
https://github.com/arsenm updated https://github.com/llvm/llvm-project/pull/117053 >From 84c3383558d5962f78086b64244997ca7a2b8c01 Mon Sep 17 00:00:00 2001 From: Matt Arsenault Date: Mon, 29 Jan 2024 18:16:52 +0530 Subject: [PATCH] AMDGPU: Add v_mfma_f32_16x16x32_bf16 for gfx950 --- .../CodeGenOpenCL/builtins-amdgcn-mfma.cl | 7 + .../builtins-amdgcn-error-gfx950-param.cl | 7 + .../builtins-amdgcn-error-gfx950.cl | 1 + llvm/include/llvm/IR/IntrinsicsAMDGPU.td | 2 +- .../Target/AMDGPU/AMDGPURegisterBankInfo.cpp | 3 +- llvm/lib/Target/AMDGPU/SIInstrInfo.td | 1 + llvm/lib/Target/AMDGPU/VOP3PInstructions.td | 5 + .../UniformityAnalysis/AMDGPU/intrinsics.ll | 9 + .../CodeGen/AMDGPU/llvm.amdgcn.mfma.gfx950.ll | 198 ++ llvm/test/MC/AMDGPU/mai-gfx950.s | 56 + .../MC/Disassembler/AMDGPU/gfx950_mai.txt | 34 +++ llvm/test/tools/llvm-mca/AMDGPU/gfx950.s | 10 +- 12 files changed, 328 insertions(+), 5 deletions(-) diff --git a/clang/test/CodeGenOpenCL/builtins-amdgcn-mfma.cl b/clang/test/CodeGenOpenCL/builtins-amdgcn-mfma.cl index b21394b6982631..bfe2901ee962a3 100644 --- a/clang/test/CodeGenOpenCL/builtins-amdgcn-mfma.cl +++ b/clang/test/CodeGenOpenCL/builtins-amdgcn-mfma.cl @@ -458,4 +458,11 @@ v16i test_mfma_i32_32x32x32_i8(v4i a, v4i b, v16i c) { return __builtin_amdgcn_mfma_i32_32x32x32_i8(a, b, c, 1, 2, 3); } +// CHECK-GFX950-LABEL: @test_mfma_f32_16x16x32_bf16( +// CHECK-GFX950: tail call <4 x float> @llvm.amdgcn.mfma.f32.16x16x32.bf16(<8 x bfloat> %a, <8 x bfloat> %b, <4 x float> %c, i32 1, i32 2, i32 3) +v4f test_mfma_f32_16x16x32_bf16(v8bf16 a, v8bf16 b, v4f c) +{ + return __builtin_amdgcn_mfma_f32_16x16x32_bf16(a, b, c, 1, 2, 3); +} + #endif diff --git a/clang/test/SemaOpenCL/builtins-amdgcn-error-gfx950-param.cl b/clang/test/SemaOpenCL/builtins-amdgcn-error-gfx950-param.cl index 9c14c0541ff3b8..acaa20090dfcba 100644 --- a/clang/test/SemaOpenCL/builtins-amdgcn-error-gfx950-param.cl +++ b/clang/test/SemaOpenCL/builtins-amdgcn-error-gfx950-param.cl @@ -55,3 +55,10 @@ void 
test_mfma_i32_32x32x32_i8(__global int16* out, int4 a, int4 b, int16 c, int *out = __builtin_amdgcn_mfma_i32_32x32x32_i8(a, b, c, 0, X, 0); // expected-error{{argument to '__builtin_amdgcn_mfma_i32_32x32x32_i8' must be a constant integer}} *out = __builtin_amdgcn_mfma_i32_32x32x32_i8(a, b, c, 0, 0, X); // expected-error{{argument to '__builtin_amdgcn_mfma_i32_32x32x32_i8' must be a constant integer}} } + +void test_mfma_f32_16x16x32_bf16(__global float4* out, bfloat8 a, bfloat8 b, float4 c, int X) { + + *out = __builtin_amdgcn_mfma_f32_16x16x32_bf16(a, b, c, X, 0, 0); // expected-error{{argument to '__builtin_amdgcn_mfma_f32_16x16x32_bf16' must be a constant integer}} + *out = __builtin_amdgcn_mfma_f32_16x16x32_bf16(a, b, c, 0, X, 0); // expected-error{{argument to '__builtin_amdgcn_mfma_f32_16x16x32_bf16' must be a constant integer}} + *out = __builtin_amdgcn_mfma_f32_16x16x32_bf16(a, b, c, 0, 0, X); // expected-error{{argument to '__builtin_amdgcn_mfma_f32_16x16x32_bf16' must be a constant integer}} +} diff --git a/clang/test/SemaOpenCL/builtins-amdgcn-error-gfx950.cl b/clang/test/SemaOpenCL/builtins-amdgcn-error-gfx950.cl index 71a110066342cb..6bf76b3cba0f59 100644 --- a/clang/test/SemaOpenCL/builtins-amdgcn-error-gfx950.cl +++ b/clang/test/SemaOpenCL/builtins-amdgcn-error-gfx950.cl @@ -33,6 +33,7 @@ void test(__global float4* out0, half8 a0, half8 b0, float4 c0, *out2 = __builtin_amdgcn_mfma_f32_32x32x16_bf16(a2, b2, c2, 0, 0, 0); // expected-error{{'__builtin_amdgcn_mfma_f32_32x32x16_bf16' needs target feature gfx950-insts}} *out3 = __builtin_amdgcn_mfma_i32_16x16x64_i8(a3, b3, c3, 0, 0, 0); // expected-error{{'__builtin_amdgcn_mfma_i32_16x16x64_i8' needs target feature gfx950-insts}} *out4 = __builtin_amdgcn_mfma_i32_32x32x32_i8(a4, b4, c4, 0, 0, 0); // expected-error{{'__builtin_amdgcn_mfma_i32_32x32x32_i8' needs target feature gfx950-insts}} + *out5 = __builtin_amdgcn_mfma_f32_16x16x32_bf16(a5, b5, c5, 0, 0, 0); // 
expected-error{{'__builtin_amdgcn_mfma_f32_16x16x32_bf16' needs target feature gfx950-insts}} *out14 = __builtin_amdgcn_mfma_scale_f32_16x16x128_f8f6f4(a14, b14, c14, 0, 0, 0, d14, 0, e14); // expected-error{{'__builtin_amdgcn_mfma_scale_f32_16x16x128_f8f6f4' needs target feature gfx950-insts}} *out15 = __builtin_amdgcn_mfma_scale_f32_32x32x64_f8f6f4(a15, b15, c15, 0, 0, 0, d15, 0, e15); // expected-error{{'__builtin_amdgcn_mfma_scale_f32_32x32x64_f8f6f4' needs target feature gfx950-insts}} } diff --git a/llvm/include/llvm/IR/IntrinsicsAMDGPU.td b/llvm/include/llvm/IR/IntrinsicsAMDGPU.td index b5d5eae0c7cd7e..479120f9c202bf 100644 --- a/llvm/include/llvm/IR/IntrinsicsAMDGPU.td +++ b/llvm/include/llvm/IR/IntrinsicsAMDGPU.td @@ -3148,7 +3148,7 @@ def int_amdgcn_mfma_f32_16x16x32_f16 : AMDGPUMfmaIntrinsic; def int_amdgcn_mfma_i32_16x16x64_i8 : AM
[llvm-branch-commits] [llvm] AMDGPU: Add a baseline, non-comprehensive test for scaled mfma hazards (PR #117055)
https://github.com/arsenm updated https://github.com/llvm/llvm-project/pull/117055 >From a5ed11b07ab7ac28d304db851abf01c6b1230c24 Mon Sep 17 00:00:00 2001 From: Matt Arsenault Date: Mon, 4 Mar 2024 17:36:33 +0530 Subject: [PATCH] AMDGPU: Add a baseline, non-comprehensive test for scaled mfma hazards Add some tests which will demonstrate that we treat the number of cycles differently depending on whether the first matrix uses an f8 format. --- .../CodeGen/AMDGPU/mai-hazards-gfx940.mir | 2 +- .../AMDGPU/mai-hazards-mfma-scale.gfx950.mir | 274 ++ 2 files changed, 275 insertions(+), 1 deletion(-) create mode 100644 llvm/test/CodeGen/AMDGPU/mai-hazards-mfma-scale.gfx950.mir diff --git a/llvm/test/CodeGen/AMDGPU/mai-hazards-gfx940.mir b/llvm/test/CodeGen/AMDGPU/mai-hazards-gfx940.mir index a98b02d792d984..9681b01f334f9a 100644 --- a/llvm/test/CodeGen/AMDGPU/mai-hazards-gfx940.mir +++ b/llvm/test/CodeGen/AMDGPU/mai-hazards-gfx940.mir @@ -2199,7 +2199,7 @@ name: xdl_mfma_4pass_write_vgpr_sgemm_mfma_read_overlap_srcb body: | bb.0: $vgpr0_vgpr1_vgpr2_vgpr3 = V_MFMA_F32_16X16X16F16_vgprcd_e64 $vgpr4_vgpr5, $vgpr6_vgpr7, $vgpr0_vgpr1_vgpr2_vgpr3, 1, 2, 3, implicit $mode, implicit $exec -$vgpr0_vgpr1_vgpr2_vgpr3 = V_MFMA_F32_4X4X1F32_vgprcd_e64 $vgpr8, $vgpr1, $vgpr6_vgpr7_vgpr8_vgpr9, 0, 0, 0, implicit $mode, implicit $exec +$vgpr0_vgpr1_vgpr2_vgpr3 = V_MFMA_F32_4X4X1F32_vgprcd_e64 $vgpr8, $vgpr1, $vgpr2_vgpr3_vgpr4_vgpr5, 0, 0, 0, implicit $mode, implicit $exec ... 
diff --git a/llvm/test/CodeGen/AMDGPU/mai-hazards-mfma-scale.gfx950.mir b/llvm/test/CodeGen/AMDGPU/mai-hazards-mfma-scale.gfx950.mir new file mode 100644 index 00..c0f0482debbcb3 --- /dev/null +++ b/llvm/test/CodeGen/AMDGPU/mai-hazards-mfma-scale.gfx950.mir @@ -0,0 +1,274 @@ +# NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py UTC_ARGS: --version 4 +# RUN: llc -march=amdgcn -mcpu=gfx950 -verify-machineinstrs -run-pass post-RA-hazard-rec %s -o - | FileCheck -check-prefix=GCN %s + +# Immediate operand order = cbsz, abid, blgp + +# First MFMA uses f8 format, so should be treated as 32 cycles +--- +name: V_MFMA_F32_16X16X128_F8F6F4_vgprcd_e64___xdl_write_vgpr__cbsz0_blgp0xdl_read_overlap_vgpr_srcC +tracksRegLiveness: true +body: | + bb.0: +liveins: $vgpr0, $vgpr1, $vgpr2, $vgpr3, $vgpr4, $vgpr5, $vgpr6, $vgpr7, $vgpr8, $vgpr9, $vgpr10, $vgpr11, $vgpr12, $vgpr13, $vgpr14, $vgpr15, $vgpr16, $vgpr17, $vgpr18, $vgpr19, $vgpr20, $vgpr21 + +; GCN-LABEL: name: V_MFMA_F32_16X16X128_F8F6F4_vgprcd_e64___xdl_write_vgpr__cbsz0_blgp0xdl_read_overlap_vgpr_srcC +; GCN: liveins: $vgpr0, $vgpr1, $vgpr2, $vgpr3, $vgpr4, $vgpr5, $vgpr6, $vgpr7, $vgpr8, $vgpr9, $vgpr10, $vgpr11, $vgpr12, $vgpr13, $vgpr14, $vgpr15, $vgpr16, $vgpr17, $vgpr18, $vgpr19, $vgpr20, $vgpr21 +; GCN-NEXT: {{ $}} +; GCN-NEXT: renamable $vgpr0_vgpr1_vgpr2_vgpr3 = nofpexcept V_MFMA_F32_16X16X128_F8F6F4_vgprcd_e64 $vgpr4_vgpr5_vgpr6_vgpr7_vgpr8_vgpr9_vgpr10_vgpr11, $vgpr12_vgpr13_vgpr14_vgpr15_vgpr16_vgpr17_vgpr18_vgpr19, killed $vgpr0_vgpr1_vgpr2_vgpr3, 0, 0, 0, implicit $mode, implicit $exec +; GCN-NEXT: S_NOP 1 +; GCN-NEXT: renamable $vgpr0_vgpr1_vgpr2_vgpr3 = nofpexcept V_MFMA_F32_16X16X128_F8F6F4_vgprcd_e64 killed $vgpr4_vgpr5_vgpr6_vgpr7_vgpr8_vgpr9_vgpr10_vgpr11, killed $vgpr12_vgpr13_vgpr14_vgpr15_vgpr16_vgpr17_vgpr18_vgpr19, killed $vgpr2_vgpr3_vgpr4_vgpr5, 0, 0, 0, implicit $mode, implicit $exec +; GCN-NEXT: S_SETPC_B64_return undef $sgpr30_sgpr31, implicit $vgpr0, implicit $vgpr1, 
implicit $vgpr2, implicit $vgpr3 +renamable $vgpr0_vgpr1_vgpr2_vgpr3 = nofpexcept V_MFMA_F32_16X16X128_F8F6F4_vgprcd_e64 $vgpr4_vgpr5_vgpr6_vgpr7_vgpr8_vgpr9_vgpr10_vgpr11, $vgpr12_vgpr13_vgpr14_vgpr15_vgpr16_vgpr17_vgpr18_vgpr19, killed $vgpr0_vgpr1_vgpr2_vgpr3, 0, 0, 0, implicit $mode, implicit $exec +renamable $vgpr0_vgpr1_vgpr2_vgpr3 = nofpexcept V_MFMA_F32_16X16X128_F8F6F4_vgprcd_e64 killed $vgpr4_vgpr5_vgpr6_vgpr7_vgpr8_vgpr9_vgpr10_vgpr11, killed $vgpr12_vgpr13_vgpr14_vgpr15_vgpr16_vgpr17_vgpr18_vgpr19, killed $vgpr2_vgpr3_vgpr4_vgpr5, 0, 0, 0, implicit $mode, implicit $exec +S_SETPC_B64_return undef $sgpr30_sgpr31, implicit $vgpr0, implicit $vgpr1, implicit $vgpr2, implicit $vgpr3 + +... + +# First MFMA uses f8 format, so should be treated as 32 cycles +--- +name: V_MFMA_F32_16X16X128_F8F6F4_vgprcd_e64___xdl_write_vgpr__cbsz1_blgp1xdl_read_overlap_vgpr_srcC +tracksRegLiveness: true +body: | + bb.0: +liveins: $vgpr0, $vgpr1, $vgpr2, $vgpr3, $vgpr4, $vgpr5, $vgpr6, $vgpr7, $vgpr8, $vgpr9, $vgpr10, $vgpr11, $vgpr12, $vgpr13, $vgpr14, $vgpr15, $vgpr16, $vgpr17, $vgpr18, $vgpr19, $vgpr20, $vgpr21 + +; GCN-LABEL: name: V_MFMA_F32_16X16X128_F8F6F4_vgprcd_e64___xdl_write_vgpr__cbsz1_blgp1xdl_read_overlap_vgpr_srcC +; GCN: liveins: $vgpr0, $vgpr1, $vgpr2,
[llvm-branch-commits] [clang] [llvm] AMDGPU: Add v_smfmac_f32_32x32x64_fp8_fp8 for gfx950 (PR #117259)
https://github.com/arsenm updated https://github.com/llvm/llvm-project/pull/117259 >From d36a1301eb84377617c35c125e136230327eb3e9 Mon Sep 17 00:00:00 2001 From: Matt Arsenault Date: Sat, 3 Feb 2024 21:43:00 +0530 Subject: [PATCH] AMDGPU: Add v_smfmac_f32_32x32x64_fp8_fp8 for gfx950 --- clang/include/clang/Basic/BuiltinsAMDGPU.def | 1 + .../CodeGenOpenCL/builtins-amdgcn-mfma.cl | 7 + .../builtins-amdgcn-error-gfx950-param.cl | 6 + .../builtins-amdgcn-error-gfx950.cl | 1 + llvm/include/llvm/IR/IntrinsicsAMDGPU.td | 1 + .../AMDGPU/AMDGPUInstructionSelector.cpp | 4 + .../Target/AMDGPU/AMDGPURegisterBankInfo.cpp | 3 +- llvm/lib/Target/AMDGPU/VOP3PInstructions.td | 2 + .../UniformityAnalysis/AMDGPU/intrinsics.ll | 9 + .../AMDGPU/llvm.amdgcn.smfmac.gfx950.ll | 414 ++ llvm/test/MC/AMDGPU/mai-gfx950.s | 36 ++ .../MC/Disassembler/AMDGPU/gfx950_mai.txt | 22 + 12 files changed, 505 insertions(+), 1 deletion(-) diff --git a/clang/include/clang/Basic/BuiltinsAMDGPU.def b/clang/include/clang/Basic/BuiltinsAMDGPU.def index f90af7000e3196..51a5b1dbad495c 100644 --- a/clang/include/clang/Basic/BuiltinsAMDGPU.def +++ b/clang/include/clang/Basic/BuiltinsAMDGPU.def @@ -457,6 +457,7 @@ TARGET_BUILTIN(__builtin_amdgcn_smfmac_f32_16x16x128_fp8_fp8, "V4fV4iV8iV4fiIiIi TARGET_BUILTIN(__builtin_amdgcn_smfmac_f32_32x32x64_bf8_bf8, "V16fV4iV8iV16fiIiIi", "nc", "gfx950-insts") TARGET_BUILTIN(__builtin_amdgcn_smfmac_f32_32x32x64_bf8_fp8, "V16fV4iV8iV16fiIiIi", "nc", "gfx950-insts") TARGET_BUILTIN(__builtin_amdgcn_smfmac_f32_32x32x64_fp8_bf8, "V16fV4iV8iV16fiIiIi", "nc", "gfx950-insts") +TARGET_BUILTIN(__builtin_amdgcn_smfmac_f32_32x32x64_fp8_fp8, "V16fV4iV8iV16fiIiIi", "nc", "gfx950-insts") //===--===// // GFX12+ only builtins. 
diff --git a/clang/test/CodeGenOpenCL/builtins-amdgcn-mfma.cl b/clang/test/CodeGenOpenCL/builtins-amdgcn-mfma.cl index 33b60d53f11cc8..00346baa6ff84d 100644 --- a/clang/test/CodeGenOpenCL/builtins-amdgcn-mfma.cl +++ b/clang/test/CodeGenOpenCL/builtins-amdgcn-mfma.cl @@ -559,4 +559,11 @@ void test_smfmac_f32_32x32x64_fp8_bf8(global v16f* out, v4i a, v8i b, v16f c, in *out = __builtin_amdgcn_smfmac_f32_32x32x64_fp8_bf8(a, b, c, idx, 0, 0); } +// CHECK-GFX950-LABEL: @test_smfmac_f32_32x32x64_fp8_fp8 +// CHECK-GFX950: call <16 x float> @llvm.amdgcn.smfmac.f32.32x32x64.fp8.fp8(<4 x i32> %a, <8 x i32> %b, <16 x float> %c, i32 %idx, i32 0, i32 0) +void test_smfmac_f32_32x32x64_fp8_fp8(global v16f* out, v4i a, v8i b, v16f c, int idx) +{ + *out = __builtin_amdgcn_smfmac_f32_32x32x64_fp8_fp8(a, b, c, idx, 0, 0); +} + #endif diff --git a/clang/test/SemaOpenCL/builtins-amdgcn-error-gfx950-param.cl b/clang/test/SemaOpenCL/builtins-amdgcn-error-gfx950-param.cl index c53ca8a7c3513f..b3b359a1e0c65b 100644 --- a/clang/test/SemaOpenCL/builtins-amdgcn-error-gfx950-param.cl +++ b/clang/test/SemaOpenCL/builtins-amdgcn-error-gfx950-param.cl @@ -142,3 +142,9 @@ void test_smfmac_f32_32x32x64_fp8_bf8(global float16* out, int4 a, int8 b, float *out = __builtin_amdgcn_smfmac_f32_32x32x64_fp8_bf8(a, b, c, idx, d, 0); // expected-error{{argument to '__builtin_amdgcn_smfmac_f32_32x32x64_fp8_bf8' must be a constant integer}} *out = __builtin_amdgcn_smfmac_f32_32x32x64_fp8_bf8(a, b, c, idx, 0, d); // expected-error{{argument to '__builtin_amdgcn_smfmac_f32_32x32x64_fp8_bf8' must be a constant integer}} } + +void test_smfmac_f32_32x32x64_fp8_fp8(global float16* out, int4 a, int8 b, float16 c, int idx, int d) +{ + *out = __builtin_amdgcn_smfmac_f32_32x32x64_fp8_fp8(a, b, c, idx, d, 0); // expected-error{{argument to '__builtin_amdgcn_smfmac_f32_32x32x64_fp8_fp8' must be a constant integer}} + *out = __builtin_amdgcn_smfmac_f32_32x32x64_fp8_fp8(a, b, c, idx, 0, d); // expected-error{{argument to 
'__builtin_amdgcn_smfmac_f32_32x32x64_fp8_fp8' must be a constant integer}} +} diff --git a/clang/test/SemaOpenCL/builtins-amdgcn-error-gfx950.cl b/clang/test/SemaOpenCL/builtins-amdgcn-error-gfx950.cl index 9e563a7b0bd64c..57523cf0af1b18 100644 --- a/clang/test/SemaOpenCL/builtins-amdgcn-error-gfx950.cl +++ b/clang/test/SemaOpenCL/builtins-amdgcn-error-gfx950.cl @@ -47,6 +47,7 @@ void test(__global float4* out0, half8 a0, half8 b0, float4 c0, *out13 = __builtin_amdgcn_smfmac_f32_32x32x64_bf8_bf8(a13, b13, c13, 0, 0, 0); // expected-error{{'__builtin_amdgcn_smfmac_f32_32x32x64_bf8_bf8' needs target feature gfx950-insts}} *out13 = __builtin_amdgcn_smfmac_f32_32x32x64_bf8_fp8(a13, b13, c13, 0, 0, 0); // expected-error{{'__builtin_amdgcn_smfmac_f32_32x32x64_bf8_fp8' needs target feature gfx950-insts}} *out13 = __builtin_amdgcn_smfmac_f32_32x32x64_fp8_bf8(a13, b13, c13, 0, 0, 0); // expected-error{{'__builtin_amdgcn_smfmac_f32_32x32x64_fp8_bf8' needs target feature gfx950-insts}} + *out13 = __builtin_amdgcn_smfmac_f32_32
[llvm-branch-commits] [clang] [llvm] AMDGPU: Add v_smfmac_f32_32x32x64_bf8_fp8 for gfx950 (PR #117257)
https://github.com/arsenm updated https://github.com/llvm/llvm-project/pull/117257 >From 73f8fed93b6fd985cf79d384fee64fc506ceb062 Mon Sep 17 00:00:00 2001 From: Matt Arsenault Date: Sat, 3 Feb 2024 21:09:21 +0530 Subject: [PATCH] AMDGPU: Add v_smfmac_f32_32x32x64_bf8_fp8 for gfx950 --- clang/include/clang/Basic/BuiltinsAMDGPU.def | 1 + .../CodeGenOpenCL/builtins-amdgcn-mfma.cl | 7 + .../builtins-amdgcn-error-gfx950-param.cl | 6 + .../builtins-amdgcn-error-gfx950.cl | 1 + llvm/include/llvm/IR/IntrinsicsAMDGPU.td | 1 + .../AMDGPU/AMDGPUInstructionSelector.cpp | 4 + .../Target/AMDGPU/AMDGPURegisterBankInfo.cpp | 3 +- llvm/lib/Target/AMDGPU/VOP3PInstructions.td | 2 + .../UniformityAnalysis/AMDGPU/intrinsics.ll | 9 + .../AMDGPU/llvm.amdgcn.smfmac.gfx950.ll | 414 ++ llvm/test/MC/AMDGPU/mai-gfx950.s | 36 ++ .../MC/Disassembler/AMDGPU/gfx950_mai.txt | 22 + 12 files changed, 505 insertions(+), 1 deletion(-) diff --git a/clang/include/clang/Basic/BuiltinsAMDGPU.def b/clang/include/clang/Basic/BuiltinsAMDGPU.def index 8abfcf496b7d73..d6123fa41ca8b8 100644 --- a/clang/include/clang/Basic/BuiltinsAMDGPU.def +++ b/clang/include/clang/Basic/BuiltinsAMDGPU.def @@ -455,6 +455,7 @@ TARGET_BUILTIN(__builtin_amdgcn_smfmac_f32_16x16x128_bf8_fp8, "V4fV4iV8iV4fiIiIi TARGET_BUILTIN(__builtin_amdgcn_smfmac_f32_16x16x128_fp8_bf8, "V4fV4iV8iV4fiIiIi", "nc", "gfx950-insts") TARGET_BUILTIN(__builtin_amdgcn_smfmac_f32_16x16x128_fp8_fp8, "V4fV4iV8iV4fiIiIi", "nc", "gfx950-insts") TARGET_BUILTIN(__builtin_amdgcn_smfmac_f32_32x32x64_bf8_bf8, "V16fV4iV8iV16fiIiIi", "nc", "gfx950-insts") +TARGET_BUILTIN(__builtin_amdgcn_smfmac_f32_32x32x64_bf8_fp8, "V16fV4iV8iV16fiIiIi", "nc", "gfx950-insts") //===--===// // GFX12+ only builtins. 
diff --git a/clang/test/CodeGenOpenCL/builtins-amdgcn-mfma.cl b/clang/test/CodeGenOpenCL/builtins-amdgcn-mfma.cl index fdaedc1f92bede..d79ca36f003c5e 100644 --- a/clang/test/CodeGenOpenCL/builtins-amdgcn-mfma.cl +++ b/clang/test/CodeGenOpenCL/builtins-amdgcn-mfma.cl @@ -545,4 +545,11 @@ void test_smfmac_f32_32x32x64_bf8_bf8(global v16f* out, v4i a, v8i b, v16f c, in *out = __builtin_amdgcn_smfmac_f32_32x32x64_bf8_bf8(a, b, c, idx, 0, 0); } +// CHECK-GFX950-LABEL: @test_smfmac_f32_32x32x64_bf8_fp8 +// CHECK-GFX950: call <16 x float> @llvm.amdgcn.smfmac.f32.32x32x64.bf8.fp8(<4 x i32> %a, <8 x i32> %b, <16 x float> %c, i32 %idx, i32 0, i32 0) +void test_smfmac_f32_32x32x64_bf8_fp8(global v16f* out, v4i a, v8i b, v16f c, int idx) +{ + *out = __builtin_amdgcn_smfmac_f32_32x32x64_bf8_fp8(a, b, c, idx, 0, 0); +} + #endif diff --git a/clang/test/SemaOpenCL/builtins-amdgcn-error-gfx950-param.cl b/clang/test/SemaOpenCL/builtins-amdgcn-error-gfx950-param.cl index 9e0c46b8777533..d1751a6af15463 100644 --- a/clang/test/SemaOpenCL/builtins-amdgcn-error-gfx950-param.cl +++ b/clang/test/SemaOpenCL/builtins-amdgcn-error-gfx950-param.cl @@ -130,3 +130,9 @@ void test_smfmac_f32_32x32x64_bf8_bf8(global float16* out, int4 a, int8 b, float *out = __builtin_amdgcn_smfmac_f32_32x32x64_bf8_bf8(a, b, c, idx, d, 0); // expected-error{{argument to '__builtin_amdgcn_smfmac_f32_32x32x64_bf8_bf8' must be a constant integer}} *out = __builtin_amdgcn_smfmac_f32_32x32x64_bf8_bf8(a, b, c, idx, 0, d); // expected-error{{argument to '__builtin_amdgcn_smfmac_f32_32x32x64_bf8_bf8' must be a constant integer}} } + +void test_smfmac_f32_32x32x64_bf8_fp8(global float16* out, int4 a, int8 b, float16 c, int idx, int d) +{ + *out = __builtin_amdgcn_smfmac_f32_32x32x64_bf8_fp8(a, b, c, idx, d, 0); // expected-error{{argument to '__builtin_amdgcn_smfmac_f32_32x32x64_bf8_fp8' must be a constant integer}} + *out = __builtin_amdgcn_smfmac_f32_32x32x64_bf8_fp8(a, b, c, idx, 0, d); // expected-error{{argument to 
'__builtin_amdgcn_smfmac_f32_32x32x64_bf8_fp8' must be a constant integer}} +} diff --git a/clang/test/SemaOpenCL/builtins-amdgcn-error-gfx950.cl b/clang/test/SemaOpenCL/builtins-amdgcn-error-gfx950.cl index a0955b290c9830..f8ac3399d2b64b 100644 --- a/clang/test/SemaOpenCL/builtins-amdgcn-error-gfx950.cl +++ b/clang/test/SemaOpenCL/builtins-amdgcn-error-gfx950.cl @@ -45,6 +45,7 @@ void test(__global float4* out0, half8 a0, half8 b0, float4 c0, *out12 = __builtin_amdgcn_smfmac_f32_16x16x128_fp8_bf8(a12, b12, c12, 0, 0, 0); // expected-error{{'__builtin_amdgcn_smfmac_f32_16x16x128_fp8_bf8' needs target feature gfx950-insts}} *out12 = __builtin_amdgcn_smfmac_f32_16x16x128_fp8_fp8(a12, b12, c12, 0, 0, 0); // expected-error{{'__builtin_amdgcn_smfmac_f32_16x16x128_fp8_fp8' needs target feature gfx950-insts}} *out13 = __builtin_amdgcn_smfmac_f32_32x32x64_bf8_bf8(a13, b13, c13, 0, 0, 0); // expected-error{{'__builtin_amdgcn_smfmac_f32_32x32x64_bf8_bf8' needs target feature gfx950-insts}} + *out13 = __builtin_amdgcn_smfmac_f32_
[llvm-branch-commits] [flang] [flang][OpenMP] Use new modifier code in ORDER and SCHEDULE clauses (PR #117081)
@@ -153,6 +153,16 @@ std::optional maybeApply(FuncTy &&func,
 return std::move(func(*arg));
 }
 
+template <
+typename FuncTy, //
+typename ArgTy, //
+typename ResultTy = std::invoke_result_t>
+std::optional maybeApplyToV(FuncTy &&func, const ArgTy *arg) {
+ if (!arg)
+return std::nullopt;
+ return std::move(func(arg->v));

tblah wrote:

nit: I don't think this `std::move` is necessary. In the uses I can see here `ResultTy` is not a reference. Therefore, the function result is a prvalue and so will be moved automatically.

https://github.com/llvm/llvm-project/pull/117081
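The review point about `std::move` on a prvalue can be illustrated with a minimal, self-contained sketch. It is not the code from the PR: `Wrapped`, `boxValue`, and `maybeApplyToVSketch` are hypothetical names, and the default for `ResultTy` is an assumption made here so the example compiles. It only shows that returning the call result directly still works with a move-only result type.

```cpp
#include <cassert>
#include <memory>
#include <optional>
#include <type_traits>
#include <utility>

// Hypothetical stand-in for a node that stores its payload in a member `v`.
struct Wrapped {
  int v;
};

// Sketch of a maybeApplyToV-style helper (name and signature are
// illustrative): apply func to arg->v, or return nullopt for a null arg.
template <typename FuncTy, typename ArgTy,
          typename ResultTy = std::invoke_result_t<
              FuncTy, decltype((std::declval<const ArgTy &>().v))>>
std::optional<ResultTy> maybeApplyToVSketch(FuncTy &&func, const ArgTy *arg) {
  if (!arg)
    return std::nullopt;
  // func(arg->v) is a prvalue when ResultTy is not a reference, so it is
  // moved into the optional without an explicit std::move.
  return func(arg->v);
}

// A move-only result type: if a copy were required, this would not compile.
inline std::unique_ptr<int> boxValue(const int &x) {
  return std::make_unique<int>(x);
}

// Small usage example.
inline void usageExample() {
  Wrapped w{7};
  std::optional<std::unique_ptr<int>> boxed = maybeApplyToVSketch(boxValue, &w);
  assert(boxed && **boxed == 7);
}
```

If `ResultTy` were a reference type, the call expression would no longer be a prvalue, and the reasoning in the comment would need to be revisited.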
[llvm-branch-commits] [flang] [flang][OpenMP] Use new modifier code in ORDER and SCHEDULE clauses (PR #117081)
https://github.com/tblah edited https://github.com/llvm/llvm-project/pull/117081
[llvm-branch-commits] [clang] [llvm] AMDGPU: Add v_smfmac_i32_32x32x64_i8 for gfx950 (PR #117214)
https://github.com/srpande approved this pull request. LGTM https://github.com/llvm/llvm-project/pull/117214
[llvm-branch-commits] [clang] [llvm] AMDGPU: Add v_smfmac_f32_16x16x64_bf16 for gfx950 (PR #117211)
https://github.com/arsenm ready_for_review https://github.com/llvm/llvm-project/pull/117211
[llvm-branch-commits] [clang] [llvm] AMDGPU: Add v_smfmac_f32_16x16x128_fp8_fp8 for gfx950 (PR #117235)
arsenm wrote:

### Merge activity

* **Nov 21, 7:53 PM EST**: A user started a stack merge that includes this pull request via [Graphite](https://app.graphite.dev/github/pr/llvm/llvm-project/117235).

https://github.com/llvm/llvm-project/pull/117235
[llvm-branch-commits] [llvm] 14b474b - Revert "[LV][VPlan] Remove any-of reduction from precomputeCost. NFC (#117109)"
Author: Elvis Wang Date: 2024-11-22T11:32:12+08:00 New Revision: 14b474be36144527a55b5d49954379a3484c5f84 URL: https://github.com/llvm/llvm-project/commit/14b474be36144527a55b5d49954379a3484c5f84 DIFF: https://github.com/llvm/llvm-project/commit/14b474be36144527a55b5d49954379a3484c5f84.diff LOG: Revert "[LV][VPlan] Remove any-of reduction from precomputeCost. NFC (#117109)" This reverts commit ce66b56865426fc1760b5a090ca2748c046094f5. Added: Modified: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp Removed: diff --git a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp index 5b556058cc762c..d13770a35c108f 100644 --- a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp +++ b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp @@ -7303,14 +7303,34 @@ LoopVectorizationPlanner::precomputeCosts(VPlan &Plan, ElementCount VF, // The legacy cost model has special logic to compute the cost of in-loop // reductions, which may be smaller than the sum of all instructions involved - // in the reduction. + // in the reduction. For AnyOf reductions, VPlan codegen may remove the select + // which the legacy cost model uses to assign cost. Pre-compute their costs + // for now. // TODO: Switch to costing based on VPlan once the logic has been ported. for (const auto &[RedPhi, RdxDesc] : Legal->getReductionVars()) { if (ForceTargetInstructionCost.getNumOccurrences()) continue; -if (!CM.isInLoopReduction(RedPhi)) +if (!CM.isInLoopReduction(RedPhi) && +!RecurrenceDescriptor::isAnyOfRecurrenceKind( +RdxDesc.getRecurrenceKind())) + continue; + +// AnyOf reduction codegen may remove the select. To match the legacy cost +// model, pre-compute the cost for AnyOf reductions here. 
+if (RecurrenceDescriptor::isAnyOfRecurrenceKind( +RdxDesc.getRecurrenceKind())) { + auto *Select = cast(*find_if( + RedPhi->users(), [](User *U) { return isa(U); })); + assert(!CostCtx.SkipCostComputation.contains(Select) && + "reduction op visited multiple times"); + CostCtx.SkipCostComputation.insert(Select); + auto ReductionCost = CostCtx.getLegacyCost(Select, VF); + LLVM_DEBUG(dbgs() << "Cost of " << ReductionCost << " for VF " << VF +<< ":\n any-of reduction " << *Select << "\n"); + Cost += ReductionCost; continue; +} const auto &ChainOps = RdxDesc.getReductionOpChain(RedPhi, OrigLoop); SetVector ChainOpsAndOperands(ChainOps.begin(), ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] AMDGPU: Handle v_mfma_f64_16x16x4_f64 srcc write VGPR hazard change for gfx950 (PR #117283)
llvmbot wrote:

@llvm/pr-subscribers-backend-amdgpu

Author: Matt Arsenault (arsenm)

Changes

Read by sgemm/dgemm in srcc after v_mfma_f64_16x16x4_f64 increases from 9 to 17 wait states.

---
Full diff: https://github.com/llvm/llvm-project/pull/117283.diff

2 Files Affected:

- (modified) llvm/lib/Target/AMDGPU/GCNHazardRecognizer.cpp (+5-1)
- (modified) llvm/test/CodeGen/AMDGPU/mai-hazards-gfx940.mir (+33-12)

```diff
diff --git a/llvm/lib/Target/AMDGPU/GCNHazardRecognizer.cpp b/llvm/lib/Target/AMDGPU/GCNHazardRecognizer.cpp
index be0936ce74835f..4a4c9788b3d881 100644
--- a/llvm/lib/Target/AMDGPU/GCNHazardRecognizer.cpp
+++ b/llvm/lib/Target/AMDGPU/GCNHazardRecognizer.cpp
@@ -2302,6 +2302,7 @@ int GCNHazardRecognizer::checkMAIHazards90A(MachineInstr *MI) {
     const int SMFMA16x16WritesVGPROverlappedDMFMASrcCWaitStates = 9;
     const int SMFMA32x32WritesVGPROverlappedDMFMASrcCWaitStates = 17;
     const int DMFMA16x16WritesVGPROverlappedSrcCWaitStates = 9;
+    const int GFX950_DMFMA16x16WritesVGPROverlappedSrcCWaitStates = 17;
     const int DMFMA4x4WritesVGPROverlappedSrcCWaitStates = 4;
     const int SMFMA4x4WritesVGPROverlappedSrcABWaitStates = 5;
     const int SMFMA16x16WritesVGPROverlappedSrcABWaitStates = 11;
@@ -2359,7 +2360,10 @@ int GCNHazardRecognizer::checkMAIHazards90A(MachineInstr *MI) {
         case AMDGPU::V_MFMA_F64_16X16X4F64_mac_e64:
         case AMDGPU::V_MFMA_F64_16X16X4F64_mac_vgprcd_e64:
           if (!isXDL(ST, *MI))
-            NeedWaitStates = DMFMA16x16WritesVGPROverlappedSrcCWaitStates;
+            NeedWaitStates =
+                ST.hasGFX950Insts()
+                    ? GFX950_DMFMA16x16WritesVGPROverlappedSrcCWaitStates
+                    : DMFMA16x16WritesVGPROverlappedSrcCWaitStates;
           break;
         case AMDGPU::V_MFMA_F64_4X4X4F64_e64:
         case AMDGPU::V_MFMA_F64_4X4X4F64_vgprcd_e64:
diff --git a/llvm/test/CodeGen/AMDGPU/mai-hazards-gfx940.mir b/llvm/test/CodeGen/AMDGPU/mai-hazards-gfx940.mir
index b9135dbd46fc1f..1499fd4907a181 100644
--- a/llvm/test/CodeGen/AMDGPU/mai-hazards-gfx940.mir
+++ b/llvm/test/CodeGen/AMDGPU/mai-hazards-gfx940.mir
@@ -298,8 +298,12 @@ body:             |
 ...
 # GCN-LABEL: name: dgemm16x16_mfma_write_vgpr_mfma_read_overlap
 # GCN: V_MFMA
-# GCN-NEXT: S_NOP 7
-# GCN-NEXT: S_NOP 0
+# GFX940-NEXT: S_NOP 7
+# GFX940-NEXT: S_NOP 0
+
+# GFX950-NEXT: S_NOP 7
+# GFX950-NEXT: S_NOP 7
+# GFX950-NEXT: S_NOP 0
 # GCN-NEXT: V_MFMA
 name:            dgemm16x16_mfma_write_vgpr_mfma_read_overlap
 body:             |
@@ -319,8 +323,12 @@ body:             |
 ...
 # GCN-LABEL: name: dgemm16x16_mfma_write_vgpr_sgemm_mfma_read_overlap
 # GCN: V_MFMA
-# GCN-NEXT: S_NOP 7
-# GCN-NEXT: S_NOP 0
+# GFX940-NEXT: S_NOP 7
+# GFX940-NEXT: S_NOP 0
+
+# GFX950-NEXT: S_NOP 7
+# GFX950-NEXT: S_NOP 7
+# GFX950-NEXT: S_NOP 0
 # GCN-NEXT: V_MFMA
 name:            dgemm16x16_mfma_write_vgpr_sgemm_mfma_read_overlap
 body:             |
@@ -549,8 +557,12 @@ body:             |
 ...
 # GCN-LABEL: name: dgemm16x16_mfma_write_vgpr_sgemm_mfma_srca_read_overlap
 # GCN: V_MFMA
-# GCN-NEXT: S_NOP 7
-# GCN-NEXT: S_NOP 2
+# GFX940-NEXT: S_NOP 7
+# GFX940-NEXT: S_NOP 2
+
+# GFX950-NEXT: S_NOP 7
+# GFX950-NEXT: S_NOP 7
+# GFX950-NEXT: S_NOP 0
 # GCN-NEXT: V_MFMA
 name:            dgemm16x16_mfma_write_vgpr_sgemm_mfma_srca_read_overlap
 body:             |
@@ -1333,8 +1345,12 @@ body:             |
 ...
 # GCN-LABEL: name: dgemm16x16_mfma_write_agpr_mfma_read_overlap
 # GCN: V_MFMA
-# GCN-NEXT: S_NOP 7
-# GCN-NEXT: S_NOP 0
+# GFX940-NEXT: S_NOP 7
+# GFX940-NEXT: S_NOP 0
+
+# GFX950-NEXT: S_NOP 7
+# GFX950-NEXT: S_NOP 7
+# GFX950-NEXT: S_NOP 0
 # GCN-NEXT: V_MFMA
 name:            dgemm16x16_mfma_write_agpr_mfma_read_overlap
 body:             |
@@ -1354,8 +1370,13 @@ body:             |
 ...
 # GCN-LABEL: name: dgemm16x16_mfma_write_agpr_sgemm_mfma_read_overlap
 # GCN: V_MFMA
-# GCN-NEXT: S_NOP 7
-# GCN-NEXT: S_NOP 0
+# GFX940-NEXT: S_NOP 7
+# GFX940-NEXT: S_NOP 0
+
+# GFX950-NEXT: S_NOP 7
+# GFX950-NEXT: S_NOP 7
+# GFX950-NEXT: S_NOP 0
+
 # GCN-NEXT: V_MFMA
 name:            dgemm16x16_mfma_write_agpr_sgemm_mfma_read_overlap
 body:             |
@@ -2502,8 +2523,8 @@ body:             |
 ...
 # GCN-LABEL: name: xdl_4pass_mfma_write_agpr_smfmac_read_overlap_srcc
 # GCN: V_MFMA
-# GFX940: S_NOP 4
-# GFX950: S_NOP 5
+# GFX940-NEXT: S_NOP 4
+# GFX950-NEXT: S_NOP 5
 # GCN-NEXT: V_SMFMAC_
 name:            xdl_4pass_mfma_write_agpr_smfmac_read_overlap_srcc
 body:             |
```

https://github.com/llvm/llvm-project/pull/117283
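The `S_NOP` sequences in the updated checks follow from the wait-state totals. As a rough sketch only (this is not the LLVM implementation), and assuming that `S_NOP imm` covers `imm + 1` wait states with at most 8 per instruction, which is what the check lines above imply, a required count could be split as follows. `splitIntoSNopImmediates` and `MaxPerNop` are hypothetical names chosen for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical helper, not part of LLVM: split a wait-state count into the
// immediates of consecutive S_NOP instructions.
// Assumption: `S_NOP imm` covers imm + 1 wait states, capped at MaxPerNop.
std::vector<int> splitIntoSNopImmediates(int WaitStates, int MaxPerNop = 8) {
  assert(WaitStates >= 0 && MaxPerNop > 0 && "counts must be non-negative");
  std::vector<int> Imms;
  while (WaitStates > 0) {
    int Chunk = std::min(WaitStates, MaxPerNop);
    Imms.push_back(Chunk - 1); // the immediate encodes "count - 1"
    WaitStates -= Chunk;
  }
  return Imms;
}
```

Under these assumptions, 9 wait states give the immediates 7 and 0, and 17 wait states give 7, 7 and 0, which lines up with the GFX940 and GFX950 check lines for the dgemm16x16 srcC cases.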
[llvm-branch-commits] [llvm] AMDGPU: Handle gfx950 change in mfma_f64_16x16x4 + valu hazard (PR #117262)
https://github.com/arsenm updated https://github.com/llvm/llvm-project/pull/117262

>From 06412577e65e05abf3edc1a884edc8640b924933 Mon Sep 17 00:00:00 2001
From: Matt Arsenault
Date: Thu, 7 Mar 2024 15:01:08 +0530
Subject: [PATCH] AMDGPU: Handle gfx950 change in mfma_f64_16x16x4 + valu hazard

Increase from 11 wait states to 19
---
 .../lib/Target/AMDGPU/GCNHazardRecognizer.cpp | 10 +--
 .../CodeGen/AMDGPU/mai-hazards-gfx940.mir     | 28 ++-
 2 files changed, 28 insertions(+), 10 deletions(-)

diff --git a/llvm/lib/Target/AMDGPU/GCNHazardRecognizer.cpp b/llvm/lib/Target/AMDGPU/GCNHazardRecognizer.cpp
index 44afccb0690d0d..99a176731599cc 100644
--- a/llvm/lib/Target/AMDGPU/GCNHazardRecognizer.cpp
+++ b/llvm/lib/Target/AMDGPU/GCNHazardRecognizer.cpp
@@ -2603,6 +2603,7 @@ int GCNHazardRecognizer::checkMAIVALUHazards(MachineInstr *MI) {
     const int DMFMA16x16WriteVgprMemExpReadWaitStates = 18;
     const int DMFMA4x4WriteVgprVALUReadWaitStates = 6;
     const int DMFMA16x16WriteVgprVALUReadWaitStates = 11;
+    const int GFX950_DMFMA16x16WriteVgprVALUReadWaitStates = 19;
     const int DotWriteSameDotReadSrcAB = 3;
     const int DotWriteDifferentVALURead = 3;
     const int DMFMABetweenVALUWriteVMEMRead = 2;
@@ -2663,9 +2664,12 @@ int GCNHazardRecognizer::checkMAIVALUHazards(MachineInstr *MI) {
           break;
         case 8:
         case 16:
-          NeedWaitStates = IsMemOrExport
-                               ? DMFMA16x16WriteVgprMemExpReadWaitStates
-                               : DMFMA16x16WriteVgprVALUReadWaitStates;
+          NeedWaitStates =
+              IsMemOrExport
+                  ? DMFMA16x16WriteVgprMemExpReadWaitStates
+                  : (ST.hasGFX950Insts()
+                         ? GFX950_DMFMA16x16WriteVgprVALUReadWaitStates
+                         : DMFMA16x16WriteVgprVALUReadWaitStates);
           break;
         default:
           llvm_unreachable("unexpected dgemm");
diff --git a/llvm/test/CodeGen/AMDGPU/mai-hazards-gfx940.mir b/llvm/test/CodeGen/AMDGPU/mai-hazards-gfx940.mir
index 9681b01f334f9a..d2b2f226404da8 100644
--- a/llvm/test/CodeGen/AMDGPU/mai-hazards-gfx940.mir
+++ b/llvm/test/CodeGen/AMDGPU/mai-hazards-gfx940.mir
@@ -1,4 +1,5 @@
-# RUN: llc -mtriple=amdgcn -mcpu=gfx940 -verify-machineinstrs -run-pass post-RA-hazard-rec %s -o - | FileCheck -check-prefix=GCN %s
+# RUN: llc -mtriple=amdgcn -mcpu=gfx940 -verify-machineinstrs -run-pass post-RA-hazard-rec %s -o - | FileCheck -check-prefixes=GCN,GFX940 %s
+# RUN: llc -mtriple=amdgcn -mcpu=gfx950 -verify-machineinstrs -run-pass post-RA-hazard-rec %s -o - | FileCheck -check-prefixes=GCN,GFX950 %s
 
 # GCN-LABEL: name: valu_write_vgpr_sgemm_mfma_read
 # GCN: V_MOV_B32
@@ -803,8 +804,12 @@ body:             |
 ...
 # GCN-LABEL: name: dmfma16x16_write_vgpr_valu_read
 # GCN: V_MFMA
-# GCN-NEXT: S_NOP 7
-# GCN-NEXT: S_NOP 2
+# GFX940-NEXT: S_NOP 7
+# GFX940-NEXT: S_NOP 2
+
+# GFX950-NEXT: S_NOP 7
+# GFX950-NEXT: S_NOP 7
+# GFX950-NEXT: S_NOP 2
 # GCN-NEXT: V_MOV_B32
 name:            dmfma16x16_write_vgpr_valu_read
 body:             |
@@ -867,8 +872,13 @@ body:             |
 ...
 # GCN-LABEL: name: dmfma16x16_write_vgpr_dot_read
 # GCN: V_MFMA
-# GCN-NEXT: S_NOP 7
-# GCN-NEXT: S_NOP 2
+# GFX940-NEXT: S_NOP 7
+# GFX940-NEXT: S_NOP 2
+
+# GFX950-NEXT: S_NOP 7
+# GFX950-NEXT: S_NOP 7
+# GFX950-NEXT: S_NOP 2
+
 # GCN-NEXT: V_DOT
 name:            dmfma16x16_write_vgpr_dot_read
 body:             |
@@ -1505,8 +1515,12 @@ body:             |
 ...
 # GCN-LABEL: name: dmfma16x16_write_agpr_valu_read
 # GCN: V_MFMA
-# GCN-NEXT: S_NOP 7
-# GCN-NEXT: S_NOP 2
+# GFX940-NEXT: S_NOP 7
+# GFX940-NEXT: S_NOP 2
+
+# GFX950-NEXT: S_NOP 7
+# GFX950-NEXT: S_NOP 7
+# GFX950-NEXT: S_NOP 2
 # GCN-NEXT: V_ACCVGPR_READ_B32_e64
 name:            dmfma16x16_write_agpr_valu_read
 body:             |
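To make the nested conditional in this patch easier to follow, here is a stand-alone sketch of the same selection. The three constants are copied from the diff; the function name and its boolean parameters are hypothetical stand-ins for the `IsMemOrExport` flag and the `ST.hasGFX950Insts()` query, so this is an illustration rather than the code in `GCNHazardRecognizer.cpp`.

```cpp
// Hypothetical stand-alone version of the selection in the patch.
// Constants come from the diff; everything else is illustrative.
int dmfma16x16WriteVgprReadWaitStates(bool IsMemOrExport, bool HasGFX950Insts) {
  const int MemExpRead = 18;     // DMFMA16x16WriteVgprMemExpReadWaitStates
  const int VALURead = 11;       // DMFMA16x16WriteVgprVALUReadWaitStates
  const int GFX950VALURead = 19; // GFX950_DMFMA16x16WriteVgprVALUReadWaitStates

  // Memory and export readers are unchanged by this patch.
  if (IsMemOrExport)
    return MemExpRead;
  // Only the VALU-read case differs between gfx940 and gfx950.
  return HasGFX950Insts ? GFX950VALURead : VALURead;
}
```

The memory/export path returns 18 on both subtargets, so only the VALU-read tests gain an extra `S_NOP 7` under the GFX950 prefix.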
[llvm-branch-commits] [llvm] AMDGPU: Handle vcmpx+permalane gfx950 hazard (PR #117286)
github-actions[bot] wrote:

:warning: C/C++ code formatter, clang-format found issues in your code. :warning:

You can test this locally with the following command:

```bash
git-clang-format --diff 52f540df160ad84aef090acb35c9372c270d758b 0cbee40e03bff1514abbf1e879522a4808175c1a --extensions cpp,h -- llvm/lib/Target/AMDGPU/GCNHazardRecognizer.cpp llvm/lib/Target/AMDGPU/GCNHazardRecognizer.h
```

View the diff from clang-format here.

```diff
diff --git a/llvm/lib/Target/AMDGPU/GCNHazardRecognizer.cpp b/llvm/lib/Target/AMDGPU/GCNHazardRecognizer.cpp
index 45ff1f4a63..9799556084 100644
--- a/llvm/lib/Target/AMDGPU/GCNHazardRecognizer.cpp
+++ b/llvm/lib/Target/AMDGPU/GCNHazardRecognizer.cpp
@@ -1207,12 +1207,11 @@ void GCNHazardRecognizer::fixHazards(MachineInstr *MI) {
   fixRequiredExportPriority(MI);
 }
 
-static bool isVCmpXWritesExec(const SIInstrInfo &TII,
-                              const SIRegisterInfo &TRI,
+static bool isVCmpXWritesExec(const SIInstrInfo &TII, const SIRegisterInfo &TRI,
                               const MachineInstr &MI) {
   return (TII.isVOPC(MI) ||
           (MI.isCompare() && (TII.isVOP3(MI) || TII.isSDWA(MI)))) &&
-        MI.modifiesRegister(AMDGPU::EXEC, &TRI);
+         MI.modifiesRegister(AMDGPU::EXEC, &TRI);
 }
 
 bool GCNHazardRecognizer::fixVcmpxPermlaneHazards(MachineInstr *MI) {
```

https://github.com/llvm/llvm-project/pull/117286
[llvm-branch-commits] [llvm] AMDGPU: Refine gfx950 xdl-write-vgpr hazard cases (PR #117285)
llvmbot wrote:

@llvm/pr-subscribers-backend-amdgpu

Author: Matt Arsenault (arsenm)

Changes

The 2-pass XDL write VGPR, read by non-XDL SGEMM/DGEMM case was 1 wait state overly conservative. Previously, for gfx940, the XDL/non-XDL cases happened to have the same number of cycles in all cases. Now the XDL consumer case has an additional state for 2 pass sources.

---
Full diff: https://github.com/llvm/llvm-project/pull/117285.diff

2 Files Affected:

- (modified) llvm/lib/Target/AMDGPU/GCNHazardRecognizer.cpp (+18-4)
- (modified) llvm/test/CodeGen/AMDGPU/mai-hazards-gfx940.mir (+5-10)

```diff
diff --git a/llvm/lib/Target/AMDGPU/GCNHazardRecognizer.cpp b/llvm/lib/Target/AMDGPU/GCNHazardRecognizer.cpp
index 218f487f7e12ce..8008b5f7bcc991 100644
--- a/llvm/lib/Target/AMDGPU/GCNHazardRecognizer.cpp
+++ b/llvm/lib/Target/AMDGPU/GCNHazardRecognizer.cpp
@@ -2232,8 +2232,8 @@ int GCNHazardRecognizer::checkMAIHazards908(MachineInstr *MI) {
 }
 
 static int
-GFX940_XDL_N_PassWritesVGPROverlappedSMFMASrcCWaitStates(int NumPasses,
-                                                         bool IsGFX950) {
+GFX940_XDL_N_PassWritesVGPROverlappedXDLOrSMFMASrcCWaitStates(int NumPasses,
+                                                              bool IsGFX950) {
   // xdl def cycles | gfx940 | gfx950
   // 2 pass         |  3        4
   // 4 pass         |  5        6
@@ -2242,6 +2242,17 @@ GFX940_XDL_N_PassWritesVGPROverlappedSMFMASrcCWaitStates(int NumPasses,
   return NumPasses + 1 + IsGFX950;
 }
 
+static int
+GFX940_XDL_N_PassWritesVGPROverlappedSGEMMDGEMMSrcCWaitStates(int NumPasses,
+                                                              bool IsGFX950) {
+  // xdl def cycles | gfx940 | gfx950
+  // 2 pass         |  3        3
+  // 4 pass         |  5        6
+  // 8 pass         |  9        10
+  // 16 pass        |  17       18
+  return NumPasses + 1 + (NumPasses != 2 && IsGFX950);
+}
+
 static int
 GFX940_SMFMA_N_PassWritesVGPROverlappedSMFMASrcCWaitStates(int NumPasses) {
   // 2 pass -> 2
@@ -2379,8 +2390,11 @@ int GCNHazardRecognizer::checkMAIHazards90A(MachineInstr *MI) {
 
             NeedWaitStates =
                 isXDL(ST, *MI1)
-                    ? GFX940_XDL_N_PassWritesVGPROverlappedSMFMASrcCWaitStates(
-                          NumPasses, ST.hasGFX950Insts())
+                    ? (isXDL(ST, *MI)
+                           ? GFX940_XDL_N_PassWritesVGPROverlappedXDLOrSMFMASrcCWaitStates(
+                                 NumPasses, ST.hasGFX950Insts())
+                           : GFX940_XDL_N_PassWritesVGPROverlappedSGEMMDGEMMSrcCWaitStates(
+                                 NumPasses, ST.hasGFX950Insts()))
                     : GFX940_SMFMA_N_PassWritesVGPROverlappedSMFMASrcCWaitStates(
                           NumPasses);
             break;
diff --git a/llvm/test/CodeGen/AMDGPU/mai-hazards-gfx940.mir b/llvm/test/CodeGen/AMDGPU/mai-hazards-gfx940.mir
index 2ba873f55a1eb0..d59bcfb16eece2 100644
--- a/llvm/test/CodeGen/AMDGPU/mai-hazards-gfx940.mir
+++ b/llvm/test/CodeGen/AMDGPU/mai-hazards-gfx940.mir
@@ -156,8 +156,7 @@ body:             |
 ...
 # GCN-LABEL: name: sgemm4x4_mfma_write_vgpr_mfma_read_overlap
 # GCN: V_MFMA
-# GFX940-NEXT: S_NOP 2
-# GFX950-NEXT: S_NOP 3
+# GCN-NEXT: S_NOP 2
 # GCN-NEXT: V_MFMA
 name:            sgemm4x4_mfma_write_vgpr_mfma_read_overlap
 body:             |
@@ -348,8 +347,7 @@ body:             |
 ...
 # GCN-LABEL: name: sgemm4x4_mfma_write_vgpr_dgemm_mfma_read_overlap
 # GCN: V_MFMA
-# GFX940-NEXT: S_NOP 2
-# GFX950-NEXT: S_NOP 3
+# GCN-NEXT: S_NOP 2
 # GCN-NEXT: V_MFMA
 name:            sgemm4x4_mfma_write_vgpr_dgemm_mfma_read_overlap
 body:             |
@@ -1403,8 +1401,7 @@ body:             |
 ...
 # GCN-LABEL: name: sgemm4x4_mfma_write_agpr_dgemm_mfma_read_overlap
 # GCN: V_MFMA
-# GFX940-NEXT: S_NOP 2
-# GFX950-NEXT: S_NOP 3
+# GCN-NEXT: S_NOP 2
 # GCN-NEXT: V_MFMA
 name:            sgemm4x4_mfma_write_agpr_dgemm_mfma_read_overlap
 body:             |
@@ -1885,8 +1882,7 @@ body:             |
 ...
 # GCN-LABEL: name: xdl_sgemm4x4_mfma_write_agpr_mfma_read_overlap
 # GCN: V_MFMA
-# GFX940-NEXT: S_NOP 2
-# GFX950-NEXT: S_NOP 3
+# GCN-NEXT: S_NOP 2
 # GCN-NEXT: V_MFMA
 name:            xdl_sgemm4x4_mfma_write_agpr_mfma_read_overlap
 body:             |
@@ -2220,8 +2216,7 @@ body:             |
 # 2 pass source
 # GCN-LABEL: name: xdl_mfma_2pass_write_vgpr_sgemm_mfma_read_overlap_srcc
 # GCN: V_MFMA
-# GFX940-NEXT: S_NOP 2
-# GFX950-NEXT: S_NOP 3
+# GCN-NEXT: S_NOP 2
 # GCN-NEXT: V_MFMA
 name:            xdl_mfma_2pass_write_vgpr_sgemm_mfma_read_overlap_srcc
 body:             |
```

https://github.com/llvm/llvm-project/pull/117285
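The difference between the two helper functions in this change is easiest to see side by side. The sketch below restates the two formulas from the diff as free functions; the shortened names are hypothetical, and the `bool` to `int` conversion that the patch relies on implicitly is written out.

```cpp
// Hypothetical short names; the formulas are the ones shown in the diff.

// XDL write to VGPR, read as srcC by an XDL or SMFMA consumer:
// gfx950 needs one extra wait state for every pass count.
int xdlWriteReadByXdlOrSmfmaSrcC(int NumPasses, bool IsGFX950) {
  return NumPasses + 1 + (IsGFX950 ? 1 : 0);
}

// XDL write to VGPR, read as srcC by a non-XDL SGEMM/DGEMM consumer:
// gfx950 needs the extra wait state except for 2-pass sources.
int xdlWriteReadBySgemmDgemmSrcC(int NumPasses, bool IsGFX950) {
  return NumPasses + 1 + ((NumPasses != 2 && IsGFX950) ? 1 : 0);
}
```

For a 2-pass source on gfx950 the first function gives 4 and the second gives 3, which is why the affected tests go back to a shared `GCN-NEXT: S_NOP 2` line; for 4, 8 and 16 passes both functions agree.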
[llvm-branch-commits] [clang] [llvm] AMDGPU: Add v_smfmac_f32_32x32x64_bf8_fp8 for gfx950 (PR #117257)
https://github.com/shiltian approved this pull request. https://github.com/llvm/llvm-project/pull/117257
[llvm-branch-commits] [clang] [llvm] AMDGPU: Add v_smfmac_f32_32x32x64_bf8_fp8 for gfx950 (PR #117257)
https://github.com/shiltian edited https://github.com/llvm/llvm-project/pull/117257
[llvm-branch-commits] [clang] [llvm] AMDGPU: Add v_permlane16_swap_b32 and v_permlane32_swap_b32 for gfx950 (PR #117260)
https://github.com/arsenm updated https://github.com/llvm/llvm-project/pull/117260 >From 426d5baaf7d373a6d35ead2af4515e108a6eb8b8 Mon Sep 17 00:00:00 2001 From: Matt Arsenault Date: Mon, 22 Jan 2024 12:40:54 +0700 Subject: [PATCH] AMDGPU: Add v_permlane16_swap_b32 and v_permlane32_swap_b32 for gfx950 This was a bit annoying because these introduce a new special case encoding usage. op_sel is repurposed as a subset of dpp controls, and is eligible for VOP3->VOP1 shrinking. For some reason fi also uses an enum value, so we need to convert the raw boolean to 1 instead of -1. The 2 registers are swapped, so this has 2 defs. Ideally the builtin would return a pair, but that's difficult so return a vector instead. This would make a hypothetical builtin that supports v2f16 directly uglier. --- clang/include/clang/Basic/BuiltinsAMDGPU.def | 3 + clang/lib/CodeGen/CGBuiltin.cpp | 26 clang/test/CodeGenOpenCL/amdgpu-features.cl | 2 +- .../builtins-amdgcn-gfx950-err.cl | 6 +- .../CodeGenOpenCL/builtins-amdgcn-gfx950.cl | 87 + .../builtins-amdgcn-error-gfx950-param.cl | 10 ++ .../builtins-amdgcn-error-gfx950.cl | 5 +- llvm/docs/AMDGPUUsage.rst | 13 ++ llvm/include/llvm/IR/IntrinsicsAMDGPU.td | 14 ++ llvm/lib/Target/AMDGPU/AMDGPU.td | 23 +++- llvm/lib/Target/AMDGPU/AMDGPUGISel.td | 3 + llvm/lib/Target/AMDGPU/AMDGPUISelDAGToDAG.cpp | 25 .../AMDGPU/AMDGPUInstructionSelector.cpp | 32 + .../Target/AMDGPU/AMDGPUInstructionSelector.h | 3 + .../Target/AMDGPU/AMDGPURegisterBankInfo.cpp | 9 ++ .../Target/AMDGPU/AMDGPUSearchableTables.td | 2 + llvm/lib/Target/AMDGPU/GCNSubtarget.h | 8 +- llvm/lib/Target/AMDGPU/SIInstrInfo.cpp| 6 +- llvm/lib/Target/AMDGPU/SIInstrInfo.td | 4 + llvm/lib/Target/AMDGPU/VOP1Instructions.td| 46 +++ llvm/lib/Target/AMDGPU/VOPInstructions.td | 12 ++ llvm/lib/TargetParser/TargetParser.cpp| 2 + .../UniformityAnalysis/AMDGPU/intrinsics.ll | 16 +++ .../AMDGPU/llvm.amdgcn.permlane16.swap.ll | 121 ++ .../AMDGPU/llvm.amdgcn.permlane32.swap.ll | 121 ++ 
llvm/test/MC/AMDGPU/gfx950_asm_features.s | 82 llvm/test/MC/AMDGPU/gfx950_err.s | 31 + llvm/test/MC/Disassembler/AMDGPU/gfx950.txt | 32 + 28 files changed, 737 insertions(+), 7 deletions(-) create mode 100644 llvm/test/CodeGen/AMDGPU/llvm.amdgcn.permlane16.swap.ll create mode 100644 llvm/test/CodeGen/AMDGPU/llvm.amdgcn.permlane32.swap.ll create mode 100644 llvm/test/MC/AMDGPU/gfx950_err.s diff --git a/clang/include/clang/Basic/BuiltinsAMDGPU.def b/clang/include/clang/Basic/BuiltinsAMDGPU.def index 51a5b1dbad495c..548bcc8ad55f48 100644 --- a/clang/include/clang/Basic/BuiltinsAMDGPU.def +++ b/clang/include/clang/Basic/BuiltinsAMDGPU.def @@ -459,6 +459,9 @@ TARGET_BUILTIN(__builtin_amdgcn_smfmac_f32_32x32x64_bf8_fp8, "V16fV4iV8iV16fiIiI TARGET_BUILTIN(__builtin_amdgcn_smfmac_f32_32x32x64_fp8_bf8, "V16fV4iV8iV16fiIiIi", "nc", "gfx950-insts") TARGET_BUILTIN(__builtin_amdgcn_smfmac_f32_32x32x64_fp8_fp8, "V16fV4iV8iV16fiIiIi", "nc", "gfx950-insts") +TARGET_BUILTIN(__builtin_amdgcn_permlane16_swap, "V2UiUiUiIbIb", "nc", "permlane16-swap") +TARGET_BUILTIN(__builtin_amdgcn_permlane32_swap, "V2UiUiUiIbIb", "nc", "permlane32-swap") + //===--===// // GFX12+ only builtins. //===--===// diff --git a/clang/lib/CodeGen/CGBuiltin.cpp b/clang/lib/CodeGen/CGBuiltin.cpp index ff7132fd8bc1e7..3b3c46b56868cf 100644 --- a/clang/lib/CodeGen/CGBuiltin.cpp +++ b/clang/lib/CodeGen/CGBuiltin.cpp @@ -20162,6 +20162,32 @@ Value *CodeGenFunction::EmitAMDGPUBuiltinExpr(unsigned BuiltinID, CGM.getIntrinsic(Intrinsic::amdgcn_s_sendmsg_rtn, {ResultType}); return Builder.CreateCall(F, {Arg}); } + case AMDGPU::BI__builtin_amdgcn_permlane16_swap: + case AMDGPU::BI__builtin_amdgcn_permlane32_swap: { +// Because builtin types are limited, and the intrinsic uses a struct/pair +// output, marshal the pair-of-i32 to <2 x i32>. 
+Value *VDstOld = EmitScalarExpr(E->getArg(0)); +Value *VSrcOld = EmitScalarExpr(E->getArg(1)); +Value *FI = EmitScalarExpr(E->getArg(2)); +Value *BoundCtrl = EmitScalarExpr(E->getArg(3)); +Function *F = +CGM.getIntrinsic(BuiltinID == AMDGPU::BI__builtin_amdgcn_permlane16_swap + ? Intrinsic::amdgcn_permlane16_swap + : Intrinsic::amdgcn_permlane32_swap); +llvm::CallInst *Call = +Builder.CreateCall(F, {VDstOld, VSrcOld, FI, BoundCtrl}); + +llvm::Value *Elt0 = Builder.CreateExtractValue(Call, 0); +llvm::Value *Elt1 = Builder.CreateExtractValue(Call, 1); + +llvm::Type *ResultType = Con
[llvm-branch-commits] [clang] [llvm] AMDGPU: Add v_smfmac_f32_32x32x64_bf8_fp8 for gfx950 (PR #117257)
https://github.com/arsenm updated https://github.com/llvm/llvm-project/pull/117257 >From 698095bb278b20ff853018b997a563a2387eeca6 Mon Sep 17 00:00:00 2001 From: Matt Arsenault Date: Sat, 3 Feb 2024 21:09:21 +0530 Subject: [PATCH] AMDGPU: Add v_smfmac_f32_32x32x64_bf8_fp8 for gfx950 --- clang/include/clang/Basic/BuiltinsAMDGPU.def | 1 + .../CodeGenOpenCL/builtins-amdgcn-mfma.cl | 7 + .../builtins-amdgcn-error-gfx950-param.cl | 6 + .../builtins-amdgcn-error-gfx950.cl | 1 + llvm/include/llvm/IR/IntrinsicsAMDGPU.td | 1 + .../AMDGPU/AMDGPUInstructionSelector.cpp | 4 + .../Target/AMDGPU/AMDGPURegisterBankInfo.cpp | 3 +- llvm/lib/Target/AMDGPU/VOP3PInstructions.td | 2 + .../UniformityAnalysis/AMDGPU/intrinsics.ll | 9 + .../AMDGPU/llvm.amdgcn.smfmac.gfx950.ll | 414 ++ llvm/test/MC/AMDGPU/mai-gfx950.s | 36 ++ .../MC/Disassembler/AMDGPU/gfx950_mai.txt | 22 + 12 files changed, 505 insertions(+), 1 deletion(-) diff --git a/clang/include/clang/Basic/BuiltinsAMDGPU.def b/clang/include/clang/Basic/BuiltinsAMDGPU.def index 8abfcf496b7d73..d6123fa41ca8b8 100644 --- a/clang/include/clang/Basic/BuiltinsAMDGPU.def +++ b/clang/include/clang/Basic/BuiltinsAMDGPU.def @@ -455,6 +455,7 @@ TARGET_BUILTIN(__builtin_amdgcn_smfmac_f32_16x16x128_bf8_fp8, "V4fV4iV8iV4fiIiIi TARGET_BUILTIN(__builtin_amdgcn_smfmac_f32_16x16x128_fp8_bf8, "V4fV4iV8iV4fiIiIi", "nc", "gfx950-insts") TARGET_BUILTIN(__builtin_amdgcn_smfmac_f32_16x16x128_fp8_fp8, "V4fV4iV8iV4fiIiIi", "nc", "gfx950-insts") TARGET_BUILTIN(__builtin_amdgcn_smfmac_f32_32x32x64_bf8_bf8, "V16fV4iV8iV16fiIiIi", "nc", "gfx950-insts") +TARGET_BUILTIN(__builtin_amdgcn_smfmac_f32_32x32x64_bf8_fp8, "V16fV4iV8iV16fiIiIi", "nc", "gfx950-insts") //===--===// // GFX12+ only builtins. 
diff --git a/clang/test/CodeGenOpenCL/builtins-amdgcn-mfma.cl b/clang/test/CodeGenOpenCL/builtins-amdgcn-mfma.cl index fdaedc1f92bede..d79ca36f003c5e 100644 --- a/clang/test/CodeGenOpenCL/builtins-amdgcn-mfma.cl +++ b/clang/test/CodeGenOpenCL/builtins-amdgcn-mfma.cl @@ -545,4 +545,11 @@ void test_smfmac_f32_32x32x64_bf8_bf8(global v16f* out, v4i a, v8i b, v16f c, in *out = __builtin_amdgcn_smfmac_f32_32x32x64_bf8_bf8(a, b, c, idx, 0, 0); } +// CHECK-GFX950-LABEL: @test_smfmac_f32_32x32x64_bf8_fp8 +// CHECK-GFX950: call <16 x float> @llvm.amdgcn.smfmac.f32.32x32x64.bf8.fp8(<4 x i32> %a, <8 x i32> %b, <16 x float> %c, i32 %idx, i32 0, i32 0) +void test_smfmac_f32_32x32x64_bf8_fp8(global v16f* out, v4i a, v8i b, v16f c, int idx) +{ + *out = __builtin_amdgcn_smfmac_f32_32x32x64_bf8_fp8(a, b, c, idx, 0, 0); +} + #endif diff --git a/clang/test/SemaOpenCL/builtins-amdgcn-error-gfx950-param.cl b/clang/test/SemaOpenCL/builtins-amdgcn-error-gfx950-param.cl index 9e0c46b8777533..d1751a6af15463 100644 --- a/clang/test/SemaOpenCL/builtins-amdgcn-error-gfx950-param.cl +++ b/clang/test/SemaOpenCL/builtins-amdgcn-error-gfx950-param.cl @@ -130,3 +130,9 @@ void test_smfmac_f32_32x32x64_bf8_bf8(global float16* out, int4 a, int8 b, float *out = __builtin_amdgcn_smfmac_f32_32x32x64_bf8_bf8(a, b, c, idx, d, 0); // expected-error{{argument to '__builtin_amdgcn_smfmac_f32_32x32x64_bf8_bf8' must be a constant integer}} *out = __builtin_amdgcn_smfmac_f32_32x32x64_bf8_bf8(a, b, c, idx, 0, d); // expected-error{{argument to '__builtin_amdgcn_smfmac_f32_32x32x64_bf8_bf8' must be a constant integer}} } + +void test_smfmac_f32_32x32x64_bf8_fp8(global float16* out, int4 a, int8 b, float16 c, int idx, int d) +{ + *out = __builtin_amdgcn_smfmac_f32_32x32x64_bf8_fp8(a, b, c, idx, d, 0); // expected-error{{argument to '__builtin_amdgcn_smfmac_f32_32x32x64_bf8_fp8' must be a constant integer}} + *out = __builtin_amdgcn_smfmac_f32_32x32x64_bf8_fp8(a, b, c, idx, 0, d); // expected-error{{argument to 
'__builtin_amdgcn_smfmac_f32_32x32x64_bf8_fp8' must be a constant integer}} +} diff --git a/clang/test/SemaOpenCL/builtins-amdgcn-error-gfx950.cl b/clang/test/SemaOpenCL/builtins-amdgcn-error-gfx950.cl index a0955b290c9830..f8ac3399d2b64b 100644 --- a/clang/test/SemaOpenCL/builtins-amdgcn-error-gfx950.cl +++ b/clang/test/SemaOpenCL/builtins-amdgcn-error-gfx950.cl @@ -45,6 +45,7 @@ void test(__global float4* out0, half8 a0, half8 b0, float4 c0, *out12 = __builtin_amdgcn_smfmac_f32_16x16x128_fp8_bf8(a12, b12, c12, 0, 0, 0); // expected-error{{'__builtin_amdgcn_smfmac_f32_16x16x128_fp8_bf8' needs target feature gfx950-insts}} *out12 = __builtin_amdgcn_smfmac_f32_16x16x128_fp8_fp8(a12, b12, c12, 0, 0, 0); // expected-error{{'__builtin_amdgcn_smfmac_f32_16x16x128_fp8_fp8' needs target feature gfx950-insts}} *out13 = __builtin_amdgcn_smfmac_f32_32x32x64_bf8_bf8(a13, b13, c13, 0, 0, 0); // expected-error{{'__builtin_amdgcn_smfmac_f32_32x32x64_bf8_bf8' needs target feature gfx950-insts}} + *out13 = __builtin_amdgcn_smfmac_f32_
[llvm-branch-commits] [clang] [llvm] AMDGPU: Add v_smfmac_f32_32x32x64_fp8_fp8 for gfx950 (PR #117259)
https://github.com/arsenm updated https://github.com/llvm/llvm-project/pull/117259 >From d5b3bb6210d19c81a935790c5267c3d97125a00d Mon Sep 17 00:00:00 2001 From: Matt Arsenault Date: Sat, 3 Feb 2024 21:43:00 +0530 Subject: [PATCH] AMDGPU: Add v_smfmac_f32_32x32x64_fp8_fp8 for gfx950 --- clang/include/clang/Basic/BuiltinsAMDGPU.def | 1 + .../CodeGenOpenCL/builtins-amdgcn-mfma.cl | 7 + .../builtins-amdgcn-error-gfx950-param.cl | 6 + .../builtins-amdgcn-error-gfx950.cl | 1 + llvm/include/llvm/IR/IntrinsicsAMDGPU.td | 1 + .../AMDGPU/AMDGPUInstructionSelector.cpp | 4 + .../Target/AMDGPU/AMDGPURegisterBankInfo.cpp | 3 +- llvm/lib/Target/AMDGPU/VOP3PInstructions.td | 2 + .../UniformityAnalysis/AMDGPU/intrinsics.ll | 9 + .../AMDGPU/llvm.amdgcn.smfmac.gfx950.ll | 414 ++ llvm/test/MC/AMDGPU/mai-gfx950.s | 36 ++ .../MC/Disassembler/AMDGPU/gfx950_mai.txt | 22 + 12 files changed, 505 insertions(+), 1 deletion(-) diff --git a/clang/include/clang/Basic/BuiltinsAMDGPU.def b/clang/include/clang/Basic/BuiltinsAMDGPU.def index f90af7000e3196..51a5b1dbad495c 100644 --- a/clang/include/clang/Basic/BuiltinsAMDGPU.def +++ b/clang/include/clang/Basic/BuiltinsAMDGPU.def @@ -457,6 +457,7 @@ TARGET_BUILTIN(__builtin_amdgcn_smfmac_f32_16x16x128_fp8_fp8, "V4fV4iV8iV4fiIiIi TARGET_BUILTIN(__builtin_amdgcn_smfmac_f32_32x32x64_bf8_bf8, "V16fV4iV8iV16fiIiIi", "nc", "gfx950-insts") TARGET_BUILTIN(__builtin_amdgcn_smfmac_f32_32x32x64_bf8_fp8, "V16fV4iV8iV16fiIiIi", "nc", "gfx950-insts") TARGET_BUILTIN(__builtin_amdgcn_smfmac_f32_32x32x64_fp8_bf8, "V16fV4iV8iV16fiIiIi", "nc", "gfx950-insts") +TARGET_BUILTIN(__builtin_amdgcn_smfmac_f32_32x32x64_fp8_fp8, "V16fV4iV8iV16fiIiIi", "nc", "gfx950-insts") //===--===// // GFX12+ only builtins. 
diff --git a/clang/test/CodeGenOpenCL/builtins-amdgcn-mfma.cl b/clang/test/CodeGenOpenCL/builtins-amdgcn-mfma.cl index 33b60d53f11cc8..00346baa6ff84d 100644 --- a/clang/test/CodeGenOpenCL/builtins-amdgcn-mfma.cl +++ b/clang/test/CodeGenOpenCL/builtins-amdgcn-mfma.cl @@ -559,4 +559,11 @@ void test_smfmac_f32_32x32x64_fp8_bf8(global v16f* out, v4i a, v8i b, v16f c, in *out = __builtin_amdgcn_smfmac_f32_32x32x64_fp8_bf8(a, b, c, idx, 0, 0); } +// CHECK-GFX950-LABEL: @test_smfmac_f32_32x32x64_fp8_fp8 +// CHECK-GFX950: call <16 x float> @llvm.amdgcn.smfmac.f32.32x32x64.fp8.fp8(<4 x i32> %a, <8 x i32> %b, <16 x float> %c, i32 %idx, i32 0, i32 0) +void test_smfmac_f32_32x32x64_fp8_fp8(global v16f* out, v4i a, v8i b, v16f c, int idx) +{ + *out = __builtin_amdgcn_smfmac_f32_32x32x64_fp8_fp8(a, b, c, idx, 0, 0); +} + #endif diff --git a/clang/test/SemaOpenCL/builtins-amdgcn-error-gfx950-param.cl b/clang/test/SemaOpenCL/builtins-amdgcn-error-gfx950-param.cl index c53ca8a7c3513f..b3b359a1e0c65b 100644 --- a/clang/test/SemaOpenCL/builtins-amdgcn-error-gfx950-param.cl +++ b/clang/test/SemaOpenCL/builtins-amdgcn-error-gfx950-param.cl @@ -142,3 +142,9 @@ void test_smfmac_f32_32x32x64_fp8_bf8(global float16* out, int4 a, int8 b, float *out = __builtin_amdgcn_smfmac_f32_32x32x64_fp8_bf8(a, b, c, idx, d, 0); // expected-error{{argument to '__builtin_amdgcn_smfmac_f32_32x32x64_fp8_bf8' must be a constant integer}} *out = __builtin_amdgcn_smfmac_f32_32x32x64_fp8_bf8(a, b, c, idx, 0, d); // expected-error{{argument to '__builtin_amdgcn_smfmac_f32_32x32x64_fp8_bf8' must be a constant integer}} } + +void test_smfmac_f32_32x32x64_fp8_fp8(global float16* out, int4 a, int8 b, float16 c, int idx, int d) +{ + *out = __builtin_amdgcn_smfmac_f32_32x32x64_fp8_fp8(a, b, c, idx, d, 0); // expected-error{{argument to '__builtin_amdgcn_smfmac_f32_32x32x64_fp8_fp8' must be a constant integer}} + *out = __builtin_amdgcn_smfmac_f32_32x32x64_fp8_fp8(a, b, c, idx, 0, d); // expected-error{{argument to 
'__builtin_amdgcn_smfmac_f32_32x32x64_fp8_fp8' must be a constant integer}} +} diff --git a/clang/test/SemaOpenCL/builtins-amdgcn-error-gfx950.cl b/clang/test/SemaOpenCL/builtins-amdgcn-error-gfx950.cl index 9e563a7b0bd64c..57523cf0af1b18 100644 --- a/clang/test/SemaOpenCL/builtins-amdgcn-error-gfx950.cl +++ b/clang/test/SemaOpenCL/builtins-amdgcn-error-gfx950.cl @@ -47,6 +47,7 @@ void test(__global float4* out0, half8 a0, half8 b0, float4 c0, *out13 = __builtin_amdgcn_smfmac_f32_32x32x64_bf8_bf8(a13, b13, c13, 0, 0, 0); // expected-error{{'__builtin_amdgcn_smfmac_f32_32x32x64_bf8_bf8' needs target feature gfx950-insts}} *out13 = __builtin_amdgcn_smfmac_f32_32x32x64_bf8_fp8(a13, b13, c13, 0, 0, 0); // expected-error{{'__builtin_amdgcn_smfmac_f32_32x32x64_bf8_fp8' needs target feature gfx950-insts}} *out13 = __builtin_amdgcn_smfmac_f32_32x32x64_fp8_bf8(a13, b13, c13, 0, 0, 0); // expected-error{{'__builtin_amdgcn_smfmac_f32_32x32x64_fp8_bf8' needs target feature gfx950-insts}} + *out13 = __builtin_amdgcn_smfmac_f32_32
[llvm-branch-commits] [clang] [llvm] AMDGPU: Add v_smfmac_f32_32x32x32x64_fp8_bf8 for gfx950 (PR #117258)
https://github.com/arsenm updated https://github.com/llvm/llvm-project/pull/117258 >From 32ccf3950258693e8ca7be1c7ecc6670debc2bf7 Mon Sep 17 00:00:00 2001 From: Matt Arsenault Date: Sat, 3 Feb 2024 21:25:33 +0530 Subject: [PATCH] AMDGPU: Add v_smfmac_f32_32x32x32x64_fp8_bf8 for gfx950 --- clang/include/clang/Basic/BuiltinsAMDGPU.def | 1 + .../CodeGenOpenCL/builtins-amdgcn-mfma.cl | 7 + .../builtins-amdgcn-error-gfx950-param.cl | 6 + .../builtins-amdgcn-error-gfx950.cl | 1 + llvm/include/llvm/IR/IntrinsicsAMDGPU.td | 1 + .../AMDGPU/AMDGPUInstructionSelector.cpp | 4 + .../Target/AMDGPU/AMDGPURegisterBankInfo.cpp | 3 +- llvm/lib/Target/AMDGPU/VOP3PInstructions.td | 2 + .../UniformityAnalysis/AMDGPU/intrinsics.ll | 9 + .../AMDGPU/llvm.amdgcn.smfmac.gfx950.ll | 414 ++ llvm/test/MC/AMDGPU/mai-gfx950.s | 36 ++ .../MC/Disassembler/AMDGPU/gfx950_mai.txt | 22 + 12 files changed, 505 insertions(+), 1 deletion(-) diff --git a/clang/include/clang/Basic/BuiltinsAMDGPU.def b/clang/include/clang/Basic/BuiltinsAMDGPU.def index d6123fa41ca8b8..f90af7000e3196 100644 --- a/clang/include/clang/Basic/BuiltinsAMDGPU.def +++ b/clang/include/clang/Basic/BuiltinsAMDGPU.def @@ -456,6 +456,7 @@ TARGET_BUILTIN(__builtin_amdgcn_smfmac_f32_16x16x128_fp8_bf8, "V4fV4iV8iV4fiIiIi TARGET_BUILTIN(__builtin_amdgcn_smfmac_f32_16x16x128_fp8_fp8, "V4fV4iV8iV4fiIiIi", "nc", "gfx950-insts") TARGET_BUILTIN(__builtin_amdgcn_smfmac_f32_32x32x64_bf8_bf8, "V16fV4iV8iV16fiIiIi", "nc", "gfx950-insts") TARGET_BUILTIN(__builtin_amdgcn_smfmac_f32_32x32x64_bf8_fp8, "V16fV4iV8iV16fiIiIi", "nc", "gfx950-insts") +TARGET_BUILTIN(__builtin_amdgcn_smfmac_f32_32x32x64_fp8_bf8, "V16fV4iV8iV16fiIiIi", "nc", "gfx950-insts") //===--===// // GFX12+ only builtins. 
diff --git a/clang/test/CodeGenOpenCL/builtins-amdgcn-mfma.cl b/clang/test/CodeGenOpenCL/builtins-amdgcn-mfma.cl index d79ca36f003c5e..33b60d53f11cc8 100644 --- a/clang/test/CodeGenOpenCL/builtins-amdgcn-mfma.cl +++ b/clang/test/CodeGenOpenCL/builtins-amdgcn-mfma.cl @@ -552,4 +552,11 @@ void test_smfmac_f32_32x32x64_bf8_fp8(global v16f* out, v4i a, v8i b, v16f c, in *out = __builtin_amdgcn_smfmac_f32_32x32x64_bf8_fp8(a, b, c, idx, 0, 0); } +// CHECK-GFX950-LABEL: @test_smfmac_f32_32x32x64_fp8_bf8 +// CHECK-GFX950: call <16 x float> @llvm.amdgcn.smfmac.f32.32x32x64.fp8.bf8(<4 x i32> %a, <8 x i32> %b, <16 x float> %c, i32 %idx, i32 0, i32 0) +void test_smfmac_f32_32x32x64_fp8_bf8(global v16f* out, v4i a, v8i b, v16f c, int idx) +{ + *out = __builtin_amdgcn_smfmac_f32_32x32x64_fp8_bf8(a, b, c, idx, 0, 0); +} + #endif diff --git a/clang/test/SemaOpenCL/builtins-amdgcn-error-gfx950-param.cl b/clang/test/SemaOpenCL/builtins-amdgcn-error-gfx950-param.cl index d1751a6af15463..c53ca8a7c3513f 100644 --- a/clang/test/SemaOpenCL/builtins-amdgcn-error-gfx950-param.cl +++ b/clang/test/SemaOpenCL/builtins-amdgcn-error-gfx950-param.cl @@ -136,3 +136,9 @@ void test_smfmac_f32_32x32x64_bf8_fp8(global float16* out, int4 a, int8 b, float *out = __builtin_amdgcn_smfmac_f32_32x32x64_bf8_fp8(a, b, c, idx, d, 0); // expected-error{{argument to '__builtin_amdgcn_smfmac_f32_32x32x64_bf8_fp8' must be a constant integer}} *out = __builtin_amdgcn_smfmac_f32_32x32x64_bf8_fp8(a, b, c, idx, 0, d); // expected-error{{argument to '__builtin_amdgcn_smfmac_f32_32x32x64_bf8_fp8' must be a constant integer}} } + +void test_smfmac_f32_32x32x64_fp8_bf8(global float16* out, int4 a, int8 b, float16 c, int idx, int d) +{ + *out = __builtin_amdgcn_smfmac_f32_32x32x64_fp8_bf8(a, b, c, idx, d, 0); // expected-error{{argument to '__builtin_amdgcn_smfmac_f32_32x32x64_fp8_bf8' must be a constant integer}} + *out = __builtin_amdgcn_smfmac_f32_32x32x64_fp8_bf8(a, b, c, idx, 0, d); // expected-error{{argument to 
'__builtin_amdgcn_smfmac_f32_32x32x64_fp8_bf8' must be a constant integer}} +} diff --git a/clang/test/SemaOpenCL/builtins-amdgcn-error-gfx950.cl b/clang/test/SemaOpenCL/builtins-amdgcn-error-gfx950.cl index f8ac3399d2b64b..9e563a7b0bd64c 100644 --- a/clang/test/SemaOpenCL/builtins-amdgcn-error-gfx950.cl +++ b/clang/test/SemaOpenCL/builtins-amdgcn-error-gfx950.cl @@ -46,6 +46,7 @@ void test(__global float4* out0, half8 a0, half8 b0, float4 c0, *out12 = __builtin_amdgcn_smfmac_f32_16x16x128_fp8_fp8(a12, b12, c12, 0, 0, 0); // expected-error{{'__builtin_amdgcn_smfmac_f32_16x16x128_fp8_fp8' needs target feature gfx950-insts}} *out13 = __builtin_amdgcn_smfmac_f32_32x32x64_bf8_bf8(a13, b13, c13, 0, 0, 0); // expected-error{{'__builtin_amdgcn_smfmac_f32_32x32x64_bf8_bf8' needs target feature gfx950-insts}} *out13 = __builtin_amdgcn_smfmac_f32_32x32x64_bf8_fp8(a13, b13, c13, 0, 0, 0); // expected-error{{'__builtin_amdgcn_smfmac_f32_32x32x64_bf8_fp8' needs target feature gfx950-insts}} + *out13 = __builtin_amdgcn_smfmac_f3
[llvm-branch-commits] [llvm] AMDGPU: Handle gfx950 XDL-write-overlapped-smfma-src-c wait state change (PR #117263)
https://github.com/arsenm updated https://github.com/llvm/llvm-project/pull/117263 >From 087117bc3dc327237d52746813e932d4c8f0b8bc Mon Sep 17 00:00:00 2001 From: Matt Arsenault Date: Wed, 6 Mar 2024 19:51:00 +0530 Subject: [PATCH] AMDGPU: Handle gfx950 XDL-write-overlapped-smfma-src-c wait state change These have an additional wait state compared to gfx940. --- .../lib/Target/AMDGPU/GCNHazardRecognizer.cpp | 16 ++- .../CodeGen/AMDGPU/mai-hazards-gfx940.mir | 129 -- .../AMDGPU/mai-hazards-mfma-scale.gfx950.mir | 22 +-- 3 files changed, 107 insertions(+), 60 deletions(-) diff --git a/llvm/lib/Target/AMDGPU/GCNHazardRecognizer.cpp b/llvm/lib/Target/AMDGPU/GCNHazardRecognizer.cpp index 99a176731599cc..be0936ce74835f 100644 --- a/llvm/lib/Target/AMDGPU/GCNHazardRecognizer.cpp +++ b/llvm/lib/Target/AMDGPU/GCNHazardRecognizer.cpp @@ -2232,12 +2232,14 @@ int GCNHazardRecognizer::checkMAIHazards908(MachineInstr *MI) { } static int -GFX940_XDL_N_PassWritesVGPROverlappedSMFMASrcCWaitStates(int NumPasses) { - // 2 pass -> 3 - // 4 pass -> 5 - // 8 pass -> 9 - // 16 pass -> 17 - return NumPasses + 1; +GFX940_XDL_N_PassWritesVGPROverlappedSMFMASrcCWaitStates(int NumPasses, + bool IsGFX950) { + // xdl def cycles | gfx940 | gfx950 + // 2 pass | 34 + // 4 pass | 56 + // 8 pass | 910 + // 16 pass| 17 18 + return NumPasses + 1 + IsGFX950; } static int @@ -2373,7 +2375,7 @@ int GCNHazardRecognizer::checkMAIHazards90A(MachineInstr *MI) { NeedWaitStates = isXDL(ST, *MI1) ? GFX940_XDL_N_PassWritesVGPROverlappedSMFMASrcCWaitStates( - NumPasses) + NumPasses, ST.hasGFX950Insts()) : GFX940_SMFMA_N_PassWritesVGPROverlappedSMFMASrcCWaitStates( NumPasses); break; diff --git a/llvm/test/CodeGen/AMDGPU/mai-hazards-gfx940.mir b/llvm/test/CodeGen/AMDGPU/mai-hazards-gfx940.mir index d2b2f226404da8..b9135dbd46fc1f 100644 --- a/llvm/test/CodeGen/AMDGPU/mai-hazards-gfx940.mir +++ b/llvm/test/CodeGen/AMDGPU/mai-hazards-gfx940.mir @@ -145,7 +145,8 @@ body: | ... 
# GCN-LABEL: name: sgemm4x4_mfma_write_agpr_mfma_read_overlap # GCN: V_MFMA -# GCN-NEXT: S_NOP 2 +# GFX940-NEXT: S_NOP 2 +# GFX950-NEXT: S_NOP 3 # GCN-NEXT: V_MFMA name:sgemm4x4_mfma_write_agpr_mfma_read_overlap body: | @@ -155,7 +156,8 @@ body: | ... # GCN-LABEL: name: sgemm4x4_mfma_write_vgpr_mfma_read_overlap # GCN: V_MFMA -# GCN-NEXT: S_NOP 2 +# GFX940-NEXT: S_NOP 2 +# GFX950-NEXT: S_NOP 3 # GCN-NEXT: V_MFMA name:sgemm4x4_mfma_write_vgpr_mfma_read_overlap body: | @@ -165,7 +167,8 @@ body: | ... # GCN-LABEL: name: sgemm4x4_mfma_write_agpr_smfmac_read_overlap # GCN: V_MFMA -# GCN-NEXT: S_NOP 2 +# GFX940-NEXT: S_NOP 2 +# GFX950-NEXT: S_NOP 3 # GCN-NEXT: V_SMFMAC name:sgemm4x4_mfma_write_agpr_smfmac_read_overlap body: | @@ -175,8 +178,11 @@ body: | ... # GCN-LABEL: name: xdl_sgemm16x16_mfma_write_agpr_mfma_read_overlap # GCN: V_MFMA -# GCN-NEXT: S_NOP 7 -# GCN-NEXT: S_NOP 0 +# GFX940-NEXT: S_NOP 7 +# GFX940-NEXT: S_NOP 0 + +# GFX950-NEXT: S_NOP 7 +# GFX950-NEXT: S_NOP 1 # GCN-NEXT: V_MFMA name:xdl_sgemm16x16_mfma_write_agpr_mfma_read_overlap body: | @@ -186,8 +192,11 @@ body: | ... # GCN-LABEL: name: xdl_sgemm16x16_mfma_write_vgpr_mfma_read_overlap # GCN: V_MFMA -# GCN-NEXT: S_NOP 7 -# GCN-NEXT: S_NOP 0 +# GFX940-NEXT: S_NOP 7 +# GFX940-NEXT: S_NOP 0 + +# GFX950-NEXT: S_NOP 7 +# GFX950-NEXT: S_NOP 1 # GCN-NEXT: V_MFMA name:xdl_sgemm16x16_mfma_write_vgpr_mfma_read_overlap body: | @@ -216,8 +225,11 @@ body: | ... 
# GCN-LABEL: name: xdl_sgemm16x16_mfma_write_agpr_smfmac_read_overlap # GCN: V_MFMA -# GCN-NEXT: S_NOP 7 -# GCN-NEXT: S_NOP 0 +# GFX940-NEXT: S_NOP 7 +# GFX940-NEXT: S_NOP 0 + +# GFX950-NEXT: S_NOP 7 +# GFX950-NEXT: S_NOP 1 # GCN-NEXT: V_SMFMAC name:xdl_sgemm16x16_mfma_write_agpr_smfmac_read_overlap body: | @@ -229,7 +241,8 @@ body: | # GCN: V_MFMA # GCN-NEXT: S_NOP 7 # GCN-NEXT: S_NOP 7 -# GCN-NEXT: S_NOP 0 +# GFX940-NEXT: S_NOP 0 +# GFX950-NEXT: S_NOP 1 # GCN-NEXT: V_MFMA name:xdl_sgemm32x32_mfma_write_agpr_mfma_read_overlap body: | @@ -241,7 +254,8 @@ body: | # GCN: V_MFMA # GCN-NEXT: S_NOP 7 # GCN-NEXT: S_NOP 7 -# GCN-NEXT: S_NOP 0 +# GFX940-NEXT: S_NOP 0 +# GFX950-NEXT: S_NOP 1 # GCN-NEXT: V_MFMA name:xdl_sgemm32x32_mfma_write_vgpr_mfma_read_overlap body: | @@ -273,7 +287,8 @@ body: | # GCN: V_MFMA # GCN-NEXT: S_NOP 7 # GCN-NEXT: S_NOP 7 -# GCN-NEXT: S_
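For readers skimming the hunk above, here is a minimal standalone sketch of the wait-state rule it introduces. The function name `xdlWriteOverlappedSrcCWaitStates` is shortened for illustration and is not the name used in GCNHazardRecognizer.cpp; only the arithmetic (`NumPasses + 1`, plus one more on gfx950) is taken from the patch.

```cpp
// Illustrative restatement of the rule in the patch; not the LLVM function.
// gfx940 needs NumPasses + 1 wait states, gfx950 needs one more than that.
constexpr int xdlWriteOverlappedSrcCWaitStates(int NumPasses, bool IsGFX950) {
  return NumPasses + 1 + (IsGFX950 ? 1 : 0);
}

// The table from the patch comment, checked at compile time.
static_assert(xdlWriteOverlappedSrcCWaitStates(2, false) == 3, "gfx940, 2 pass");
static_assert(xdlWriteOverlappedSrcCWaitStates(2, true) == 4, "gfx950, 2 pass");
static_assert(xdlWriteOverlappedSrcCWaitStates(16, false) == 17, "gfx940, 16 pass");
static_assert(xdlWriteOverlappedSrcCWaitStates(16, true) == 18, "gfx950, 16 pass");
```

The patch itself adds the `bool` directly (`NumPasses + 1 + IsGFX950`); the explicit conditional here only spells out the implicit conversion.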
[llvm-branch-commits] [clang] [llvm] AMDGPU: Add v_smfmac_f32_16x16x128_bf8_fp8 for gfx950 (PR #117233)
arsenm wrote: ### Merge activity * **Nov 21, 7:53 PM EST**: A user started a stack merge that includes this pull request via [Graphite](https://app.graphite.dev/github/pr/llvm/llvm-project/117233). https://github.com/llvm/llvm-project/pull/117233
[llvm-branch-commits] [clang] [llvm] AMDGPU: Add v_smfmac_f32_16x16x128_fp8_bf8 for gfx950 (PR #117234)
arsenm wrote: ### Merge activity * **Nov 21, 7:53 PM EST**: A user started a stack merge that includes this pull request via [Graphite](https://app.graphite.dev/github/pr/llvm/llvm-project/117234). https://github.com/llvm/llvm-project/pull/117234
[llvm-branch-commits] [llvm] AMDGPU: Handle vcmpx+permlane gfx950 hazard (PR #117286)
llvmbot wrote: @llvm/pr-subscribers-backend-amdgpu Author: Matt Arsenault (arsenm) Changes Confusingly, this is a different hazard to the one on gfx10 with a subtarget feature. --- Full diff: https://github.com/llvm/llvm-project/pull/117286.diff 3 Files Affected: - (modified) llvm/lib/Target/AMDGPU/GCNHazardRecognizer.cpp (+31-4) - (modified) llvm/lib/Target/AMDGPU/GCNHazardRecognizer.h (+1) - (added) llvm/test/CodeGen/AMDGPU/hazards-gfx950.mir (+144) ``diff diff --git a/llvm/lib/Target/AMDGPU/GCNHazardRecognizer.cpp b/llvm/lib/Target/AMDGPU/GCNHazardRecognizer.cpp index 8008b5f7bcc991..45ff1f4a63cf03 100644 --- a/llvm/lib/Target/AMDGPU/GCNHazardRecognizer.cpp +++ b/llvm/lib/Target/AMDGPU/GCNHazardRecognizer.cpp @@ -168,7 +168,11 @@ static bool isPermlane(const MachineInstr &MI) { Opcode == AMDGPU::V_PERMLANE64_B32 || Opcode == AMDGPU::V_PERMLANEX16_B32_e64 || Opcode == AMDGPU::V_PERMLANE16_VAR_B32_e64 || - Opcode == AMDGPU::V_PERMLANEX16_VAR_B32_e64; + Opcode == AMDGPU::V_PERMLANEX16_VAR_B32_e64 || + Opcode == AMDGPU::V_PERMLANE16_SWAP_B32_e32 || + Opcode == AMDGPU::V_PERMLANE16_SWAP_B32_e64 || + Opcode == AMDGPU::V_PERMLANE32_SWAP_B32_e32 || + Opcode == AMDGPU::V_PERMLANE32_SWAP_B32_e64; } static bool isLdsDma(const MachineInstr &MI) { @@ -395,6 +399,9 @@ unsigned GCNHazardRecognizer::PreEmitNoopsCommon(MachineInstr *MI) { SIInstrInfo::isDS(*MI)) return std::max(WaitStates, checkMAILdStHazards(MI)); + if (ST.hasGFX950Insts() && isPermlane(*MI)) +return std::max(WaitStates, checkPermlaneHazards(MI)); + return WaitStates; } @@ -1200,6 +1207,14 @@ void GCNHazardRecognizer::fixHazards(MachineInstr *MI) { fixRequiredExportPriority(MI); } +static bool isVCmpXWritesExec(const SIInstrInfo &TII, + const SIRegisterInfo &TRI, + const MachineInstr &MI) { + return (TII.isVOPC(MI) || + (MI.isCompare() && (TII.isVOP3(MI) || TII.isSDWA(MI)))) && +MI.modifiesRegister(AMDGPU::EXEC, &TRI); +} + bool GCNHazardRecognizer::fixVcmpxPermlaneHazards(MachineInstr *MI) { if
(!ST.hasVcmpxPermlaneHazard() || !isPermlane(*MI)) return false; @@ -1207,9 +1222,7 @@ bool GCNHazardRecognizer::fixVcmpxPermlaneHazards(MachineInstr *MI) { const SIInstrInfo *TII = ST.getInstrInfo(); const SIRegisterInfo *TRI = ST.getRegisterInfo(); auto IsHazardFn = [TII, TRI](const MachineInstr &MI) { -return (TII->isVOPC(MI) || -((TII->isVOP3(MI) || TII->isSDWA(MI)) && MI.isCompare())) && - MI.modifiesRegister(AMDGPU::EXEC, TRI); +return isVCmpXWritesExec(*TII, *TRI, MI); }; auto IsExpiredFn = [](const MachineInstr &MI, int) { @@ -2529,6 +2542,20 @@ int GCNHazardRecognizer::checkMAILdStHazards(MachineInstr *MI) { return WaitStatesNeeded; } +int GCNHazardRecognizer::checkPermlaneHazards(MachineInstr *MI) { + assert(!ST.hasVcmpxPermlaneHazard() && + "this is a different vcmpx+permlane hazard"); + const SIRegisterInfo *TRI = ST.getRegisterInfo(); + const SIInstrInfo *TII = ST.getInstrInfo(); + + auto IsVCmpXWritesExecFn = [TII, TRI](const MachineInstr &MI) { +return isVCmpXWritesExec(*TII, *TRI, MI); + }; + + const int NumWaitStates = 4; + return NumWaitStates - getWaitStatesSince(IsVCmpXWritesExecFn, NumWaitStates); +} + static int GFX940_SMFMA_N_PassWriteVgprVALUWawWaitStates(int NumPasses) { // 2 pass -> 4 // 4 pass -> 6 diff --git a/llvm/lib/Target/AMDGPU/GCNHazardRecognizer.h b/llvm/lib/Target/AMDGPU/GCNHazardRecognizer.h index adb2278c48eebe..83ce100c58f0a6 100644 --- a/llvm/lib/Target/AMDGPU/GCNHazardRecognizer.h +++ b/llvm/lib/Target/AMDGPU/GCNHazardRecognizer.h @@ -134,6 +134,7 @@ class GCNHazardRecognizer final : public ScheduleHazardRecognizer { int checkMFMAPadding(MachineInstr *MI); int checkMAIVALUHazards(MachineInstr *MI); int checkMAILdStHazards(MachineInstr *MI); + int checkPermlaneHazards(MachineInstr *MI); public: GCNHazardRecognizer(const MachineFunction &MF); diff --git a/llvm/test/CodeGen/AMDGPU/hazards-gfx950.mir b/llvm/test/CodeGen/AMDGPU/hazards-gfx950.mir new file mode 100644 index 00..97bef7be711ff2 --- /dev/null +++ 
b/llvm/test/CodeGen/AMDGPU/hazards-gfx950.mir @@ -0,0 +1,144 @@ +# RUN: llc -mtriple=amdgcn -mcpu=gfx950 -verify-machineinstrs -run-pass=post-RA-hazard-rec %s -o - | FileCheck -check-prefix=GCN %s + +--- +# GCN-LABEL: name: vcmpx_vopc_write_exec_permlane16_swap_vop1 +# GCN: V_CMPX_EQ_I32_e32 +# GCN-NEXT: S_NOP 3 +# GCN-NEXT: V_PERMLANE +name:vcmpx_vopc_write_exec_permlane16_swap_vop1 +body: | + bb.0: +liveins: $vgpr0, $vgpr1 +V_CMPX_EQ_I32_e32 $vgpr0, $vgpr1, implicit-def $exec, implicit-def $vcc, implicit $exec +renamable $vgpr0, renamable $vgpr1 = V_PERMLANE16_SWAP_B32_e32 killed $vgpr0, killed $vgpr1, implicit $exec +... + +--- +# GCN-LABEL: nam
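As a rough illustration of how `checkPermlaneHazards` and the `std::max` in `PreEmitNoopsCommon` combine, the sketch below computes the number of wait states still owed before a permlane instruction. `permlaneWaitStatesNeeded` is a hypothetical helper, and the clamp to zero stands in for the `std::max(WaitStates, ...)` call in the patch; only the constant 4 and the subtraction come from the diff.

```cpp
#include <algorithm>

// Hypothetical helper: WaitStatesSinceVCmpX plays the role of the value
// returned by getWaitStatesSince() for the most recent EXEC-writing v_cmpx.
inline int permlaneWaitStatesNeeded(int WaitStatesSinceVCmpX) {
  const int NumWaitStates = 4; // constant taken from the patch
  // The subtraction can go negative once enough wait states have elapsed;
  // clamp to zero, as the caller's std::max effectively does.
  return std::max(0, NumWaitStates - WaitStatesSinceVCmpX);
}
```

With the v_cmpx immediately before the permlane (zero wait states elapsed) this yields 4, which is consistent with the `S_NOP 3` expected by the first test case, assuming an `S_NOP` with immediate N accounts for N + 1 wait states.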
[llvm-branch-commits] [llvm] AMDGPU: Handle gfx950 XDL-write-overlapped-smfma-src-c wait state change (PR #117263)
https://github.com/arsenm updated https://github.com/llvm/llvm-project/pull/117263 >From 736d914241979efb46b506fb45cee79e73bbd20e Mon Sep 17 00:00:00 2001 From: Matt Arsenault Date: Wed, 6 Mar 2024 19:51:00 +0530 Subject: [PATCH] AMDGPU: Handle gfx950 XDL-write-overlapped-smfma-src-c wait state change These have an additional wait state compared to gfx940. --- .../lib/Target/AMDGPU/GCNHazardRecognizer.cpp | 16 ++- .../CodeGen/AMDGPU/mai-hazards-gfx940.mir | 129 -- .../AMDGPU/mai-hazards-mfma-scale.gfx950.mir | 22 +-- 3 files changed, 107 insertions(+), 60 deletions(-) diff --git a/llvm/lib/Target/AMDGPU/GCNHazardRecognizer.cpp b/llvm/lib/Target/AMDGPU/GCNHazardRecognizer.cpp index 99a176731599cc..be0936ce74835f 100644 --- a/llvm/lib/Target/AMDGPU/GCNHazardRecognizer.cpp +++ b/llvm/lib/Target/AMDGPU/GCNHazardRecognizer.cpp @@ -2232,12 +2232,14 @@ int GCNHazardRecognizer::checkMAIHazards908(MachineInstr *MI) { } static int -GFX940_XDL_N_PassWritesVGPROverlappedSMFMASrcCWaitStates(int NumPasses) { - // 2 pass -> 3 - // 4 pass -> 5 - // 8 pass -> 9 - // 16 pass -> 17 - return NumPasses + 1; +GFX940_XDL_N_PassWritesVGPROverlappedSMFMASrcCWaitStates(int NumPasses, + bool IsGFX950) { + // xdl def cycles | gfx940 | gfx950 + // 2 pass | 3 4 + // 4 pass | 5 6 + // 8 pass | 9 10 + // 16 pass | 17 18 + return NumPasses + 1 + IsGFX950; } static int @@ -2373,7 +2375,7 @@ int GCNHazardRecognizer::checkMAIHazards90A(MachineInstr *MI) { NeedWaitStates = isXDL(ST, *MI1) ? GFX940_XDL_N_PassWritesVGPROverlappedSMFMASrcCWaitStates( - NumPasses) + NumPasses, ST.hasGFX950Insts()) : GFX940_SMFMA_N_PassWritesVGPROverlappedSMFMASrcCWaitStates( NumPasses); break; diff --git a/llvm/test/CodeGen/AMDGPU/mai-hazards-gfx940.mir b/llvm/test/CodeGen/AMDGPU/mai-hazards-gfx940.mir index d2b2f226404da8..b9135dbd46fc1f 100644 --- a/llvm/test/CodeGen/AMDGPU/mai-hazards-gfx940.mir +++ b/llvm/test/CodeGen/AMDGPU/mai-hazards-gfx940.mir @@ -145,7 +145,8 @@ body: | ...
# GCN-LABEL: name: sgemm4x4_mfma_write_agpr_mfma_read_overlap # GCN: V_MFMA -# GCN-NEXT: S_NOP 2 +# GFX940-NEXT: S_NOP 2 +# GFX950-NEXT: S_NOP 3 # GCN-NEXT: V_MFMA name:sgemm4x4_mfma_write_agpr_mfma_read_overlap body: | @@ -155,7 +156,8 @@ body: | ... # GCN-LABEL: name: sgemm4x4_mfma_write_vgpr_mfma_read_overlap # GCN: V_MFMA -# GCN-NEXT: S_NOP 2 +# GFX940-NEXT: S_NOP 2 +# GFX950-NEXT: S_NOP 3 # GCN-NEXT: V_MFMA name:sgemm4x4_mfma_write_vgpr_mfma_read_overlap body: | @@ -165,7 +167,8 @@ body: | ... # GCN-LABEL: name: sgemm4x4_mfma_write_agpr_smfmac_read_overlap # GCN: V_MFMA -# GCN-NEXT: S_NOP 2 +# GFX940-NEXT: S_NOP 2 +# GFX950-NEXT: S_NOP 3 # GCN-NEXT: V_SMFMAC name:sgemm4x4_mfma_write_agpr_smfmac_read_overlap body: | @@ -175,8 +178,11 @@ body: | ... # GCN-LABEL: name: xdl_sgemm16x16_mfma_write_agpr_mfma_read_overlap # GCN: V_MFMA -# GCN-NEXT: S_NOP 7 -# GCN-NEXT: S_NOP 0 +# GFX940-NEXT: S_NOP 7 +# GFX940-NEXT: S_NOP 0 + +# GFX950-NEXT: S_NOP 7 +# GFX950-NEXT: S_NOP 1 # GCN-NEXT: V_MFMA name:xdl_sgemm16x16_mfma_write_agpr_mfma_read_overlap body: | @@ -186,8 +192,11 @@ body: | ... # GCN-LABEL: name: xdl_sgemm16x16_mfma_write_vgpr_mfma_read_overlap # GCN: V_MFMA -# GCN-NEXT: S_NOP 7 -# GCN-NEXT: S_NOP 0 +# GFX940-NEXT: S_NOP 7 +# GFX940-NEXT: S_NOP 0 + +# GFX950-NEXT: S_NOP 7 +# GFX950-NEXT: S_NOP 1 # GCN-NEXT: V_MFMA name:xdl_sgemm16x16_mfma_write_vgpr_mfma_read_overlap body: | @@ -216,8 +225,11 @@ body: | ... 
# GCN-LABEL: name: xdl_sgemm16x16_mfma_write_agpr_smfmac_read_overlap # GCN: V_MFMA -# GCN-NEXT: S_NOP 7 -# GCN-NEXT: S_NOP 0 +# GFX940-NEXT: S_NOP 7 +# GFX940-NEXT: S_NOP 0 + +# GFX950-NEXT: S_NOP 7 +# GFX950-NEXT: S_NOP 1 # GCN-NEXT: V_SMFMAC name:xdl_sgemm16x16_mfma_write_agpr_smfmac_read_overlap body: | @@ -229,7 +241,8 @@ body: | # GCN: V_MFMA # GCN-NEXT: S_NOP 7 # GCN-NEXT: S_NOP 7 -# GCN-NEXT: S_NOP 0 +# GFX940-NEXT: S_NOP 0 +# GFX950-NEXT: S_NOP 1 # GCN-NEXT: V_MFMA name:xdl_sgemm32x32_mfma_write_agpr_mfma_read_overlap body: | @@ -241,7 +254,8 @@ body: | # GCN: V_MFMA # GCN-NEXT: S_NOP 7 # GCN-NEXT: S_NOP 7 -# GCN-NEXT: S_NOP 0 +# GFX940-NEXT: S_NOP 0 +# GFX950-NEXT: S_NOP 1 # GCN-NEXT: V_MFMA name:xdl_sgemm32x32_mfma_write_vgpr_mfma_read_overlap body: | @@ -273,7 +287,8 @@ body: | # GCN: V_MFMA # GCN-NEXT: S_NOP 7 # GCN-NEXT: S_NOP 7 -# GCN-NEXT: S_
[llvm-branch-commits] [llvm] AMDGPU: Handle gfx950 change in mfma_f64_16x16x4 + valu hazard (PR #117262)
https://github.com/arsenm updated https://github.com/llvm/llvm-project/pull/117262 >From fc9424bd9d0d54a931f4059ff9a6f657f1c5a2dd Mon Sep 17 00:00:00 2001 From: Matt Arsenault Date: Thu, 7 Mar 2024 15:01:08 +0530 Subject: [PATCH] AMDGPU: Handle gfx950 change in mfma_f64_16x16x4 + valu hazard Increase from 11 wait states to 19 --- .../lib/Target/AMDGPU/GCNHazardRecognizer.cpp | 10 +-- .../CodeGen/AMDGPU/mai-hazards-gfx940.mir | 28 ++- 2 files changed, 28 insertions(+), 10 deletions(-) diff --git a/llvm/lib/Target/AMDGPU/GCNHazardRecognizer.cpp b/llvm/lib/Target/AMDGPU/GCNHazardRecognizer.cpp index 44afccb0690d0d..99a176731599cc 100644 --- a/llvm/lib/Target/AMDGPU/GCNHazardRecognizer.cpp +++ b/llvm/lib/Target/AMDGPU/GCNHazardRecognizer.cpp @@ -2603,6 +2603,7 @@ int GCNHazardRecognizer::checkMAIVALUHazards(MachineInstr *MI) { const int DMFMA16x16WriteVgprMemExpReadWaitStates = 18; const int DMFMA4x4WriteVgprVALUReadWaitStates = 6; const int DMFMA16x16WriteVgprVALUReadWaitStates = 11; +const int GFX950_DMFMA16x16WriteVgprVALUReadWaitStates = 19; const int DotWriteSameDotReadSrcAB = 3; const int DotWriteDifferentVALURead = 3; const int DMFMABetweenVALUWriteVMEMRead = 2; @@ -2663,9 +2664,12 @@ int GCNHazardRecognizer::checkMAIVALUHazards(MachineInstr *MI) { break; case 8: case 16: - NeedWaitStates = IsMemOrExport - ? DMFMA16x16WriteVgprMemExpReadWaitStates - : DMFMA16x16WriteVgprVALUReadWaitStates; + NeedWaitStates = + IsMemOrExport + ? DMFMA16x16WriteVgprMemExpReadWaitStates + : (ST.hasGFX950Insts() + ? 
GFX950_DMFMA16x16WriteVgprVALUReadWaitStates + : DMFMA16x16WriteVgprVALUReadWaitStates); break; default: llvm_unreachable("unexpected dgemm"); diff --git a/llvm/test/CodeGen/AMDGPU/mai-hazards-gfx940.mir b/llvm/test/CodeGen/AMDGPU/mai-hazards-gfx940.mir index 9681b01f334f9a..d2b2f226404da8 100644 --- a/llvm/test/CodeGen/AMDGPU/mai-hazards-gfx940.mir +++ b/llvm/test/CodeGen/AMDGPU/mai-hazards-gfx940.mir @@ -1,4 +1,5 @@ -# RUN: llc -mtriple=amdgcn -mcpu=gfx940 -verify-machineinstrs -run-pass post-RA-hazard-rec %s -o - | FileCheck -check-prefix=GCN %s +# RUN: llc -mtriple=amdgcn -mcpu=gfx940 -verify-machineinstrs -run-pass post-RA-hazard-rec %s -o - | FileCheck -check-prefixes=GCN,GFX940 %s +# RUN: llc -mtriple=amdgcn -mcpu=gfx950 -verify-machineinstrs -run-pass post-RA-hazard-rec %s -o - | FileCheck -check-prefixes=GCN,GFX950 %s # GCN-LABEL: name: valu_write_vgpr_sgemm_mfma_read # GCN: V_MOV_B32 @@ -803,8 +804,12 @@ body: | ... # GCN-LABEL: name: dmfma16x16_write_vgpr_valu_read # GCN: V_MFMA -# GCN-NEXT: S_NOP 7 -# GCN-NEXT: S_NOP 2 +# GFX940-NEXT: S_NOP 7 +# GFX940-NEXT: S_NOP 2 + +# GFX950-NEXT: S_NOP 7 +# GFX950-NEXT: S_NOP 7 +# GFX950-NEXT: S_NOP 2 # GCN-NEXT: V_MOV_B32 name:dmfma16x16_write_vgpr_valu_read body: | @@ -867,8 +872,13 @@ body: | ... # GCN-LABEL: name: dmfma16x16_write_vgpr_dot_read # GCN: V_MFMA -# GCN-NEXT: S_NOP 7 -# GCN-NEXT: S_NOP 2 +# GFX940-NEXT: S_NOP 7 +# GFX940-NEXT: S_NOP 2 + +# GFX950-NEXT: S_NOP 7 +# GFX950-NEXT: S_NOP 7 +# GFX950-NEXT: S_NOP 2 + # GCN-NEXT: V_DOT name:dmfma16x16_write_vgpr_dot_read body: | @@ -1505,8 +1515,12 @@ body: | ... 
# GCN-LABEL: name: dmfma16x16_write_agpr_valu_read # GCN: V_MFMA -# GCN-NEXT: S_NOP 7 -# GCN-NEXT: S_NOP 2 +# GFX940-NEXT: S_NOP 7 +# GFX940-NEXT: S_NOP 2 + +# GFX950-NEXT: S_NOP 7 +# GFX950-NEXT: S_NOP 7 +# GFX950-NEXT: S_NOP 2 # GCN-NEXT: V_ACCVGPR_READ_B32_e64 name:dmfma16x16_write_agpr_valu_read body: |
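The CHECK lines above encode the 11 and 19 wait-state requirements as sequences of `S_NOP`. A small sketch of that mapping follows, assuming each `S_NOP` immediate N covers N + 1 wait states with at most 8 per instruction (which is what the expected output implies). `nopImmediates` is a hypothetical name, not an LLVM function.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical helper: split a wait-state count into S_NOP immediates,
// assuming S_NOP N stalls for N + 1 wait states and N is at most 7.
inline std::vector<int> nopImmediates(int WaitStates) {
  std::vector<int> Imms;
  while (WaitStates > 0) {
    int Covered = std::min(WaitStates, 8); // one S_NOP covers up to 8
    Imms.push_back(Covered - 1);           // immediate is wait states minus 1
    WaitStates -= Covered;
  }
  return Imms;
}
```

Under that assumption, 11 wait states become `S_NOP 7` then `S_NOP 2` (the GFX940 lines), and 19 become `S_NOP 7`, `S_NOP 7`, `S_NOP 2` (the GFX950 lines).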
[llvm-branch-commits] [clang] [llvm] AMDGPU: Add v_smfmac_f32_32x32x32x64_fp8_bf8 for gfx950 (PR #117258)
https://github.com/arsenm updated https://github.com/llvm/llvm-project/pull/117258 >From 24576df683abfa29c9d7f4406a318b6b67701732 Mon Sep 17 00:00:00 2001 From: Matt Arsenault Date: Sat, 3 Feb 2024 21:25:33 +0530 Subject: [PATCH] AMDGPU: Add v_smfmac_f32_32x32x32x64_fp8_bf8 for gfx950 --- clang/include/clang/Basic/BuiltinsAMDGPU.def | 1 + .../CodeGenOpenCL/builtins-amdgcn-mfma.cl | 7 + .../builtins-amdgcn-error-gfx950-param.cl | 6 + .../builtins-amdgcn-error-gfx950.cl | 1 + llvm/include/llvm/IR/IntrinsicsAMDGPU.td | 1 + .../AMDGPU/AMDGPUInstructionSelector.cpp | 4 + .../Target/AMDGPU/AMDGPURegisterBankInfo.cpp | 3 +- llvm/lib/Target/AMDGPU/VOP3PInstructions.td | 2 + .../UniformityAnalysis/AMDGPU/intrinsics.ll | 9 + .../AMDGPU/llvm.amdgcn.smfmac.gfx950.ll | 414 ++ llvm/test/MC/AMDGPU/mai-gfx950.s | 36 ++ .../MC/Disassembler/AMDGPU/gfx950_mai.txt | 22 + 12 files changed, 505 insertions(+), 1 deletion(-) diff --git a/clang/include/clang/Basic/BuiltinsAMDGPU.def b/clang/include/clang/Basic/BuiltinsAMDGPU.def index d6123fa41ca8b8..f90af7000e3196 100644 --- a/clang/include/clang/Basic/BuiltinsAMDGPU.def +++ b/clang/include/clang/Basic/BuiltinsAMDGPU.def @@ -456,6 +456,7 @@ TARGET_BUILTIN(__builtin_amdgcn_smfmac_f32_16x16x128_fp8_bf8, "V4fV4iV8iV4fiIiIi TARGET_BUILTIN(__builtin_amdgcn_smfmac_f32_16x16x128_fp8_fp8, "V4fV4iV8iV4fiIiIi", "nc", "gfx950-insts") TARGET_BUILTIN(__builtin_amdgcn_smfmac_f32_32x32x64_bf8_bf8, "V16fV4iV8iV16fiIiIi", "nc", "gfx950-insts") TARGET_BUILTIN(__builtin_amdgcn_smfmac_f32_32x32x64_bf8_fp8, "V16fV4iV8iV16fiIiIi", "nc", "gfx950-insts") +TARGET_BUILTIN(__builtin_amdgcn_smfmac_f32_32x32x64_fp8_bf8, "V16fV4iV8iV16fiIiIi", "nc", "gfx950-insts") //===--===// // GFX12+ only builtins. 
diff --git a/clang/test/CodeGenOpenCL/builtins-amdgcn-mfma.cl b/clang/test/CodeGenOpenCL/builtins-amdgcn-mfma.cl index d79ca36f003c5e..33b60d53f11cc8 100644 --- a/clang/test/CodeGenOpenCL/builtins-amdgcn-mfma.cl +++ b/clang/test/CodeGenOpenCL/builtins-amdgcn-mfma.cl @@ -552,4 +552,11 @@ void test_smfmac_f32_32x32x64_bf8_fp8(global v16f* out, v4i a, v8i b, v16f c, in *out = __builtin_amdgcn_smfmac_f32_32x32x64_bf8_fp8(a, b, c, idx, 0, 0); } +// CHECK-GFX950-LABEL: @test_smfmac_f32_32x32x64_fp8_bf8 +// CHECK-GFX950: call <16 x float> @llvm.amdgcn.smfmac.f32.32x32x64.fp8.bf8(<4 x i32> %a, <8 x i32> %b, <16 x float> %c, i32 %idx, i32 0, i32 0) +void test_smfmac_f32_32x32x64_fp8_bf8(global v16f* out, v4i a, v8i b, v16f c, int idx) +{ + *out = __builtin_amdgcn_smfmac_f32_32x32x64_fp8_bf8(a, b, c, idx, 0, 0); +} + #endif diff --git a/clang/test/SemaOpenCL/builtins-amdgcn-error-gfx950-param.cl b/clang/test/SemaOpenCL/builtins-amdgcn-error-gfx950-param.cl index d1751a6af15463..c53ca8a7c3513f 100644 --- a/clang/test/SemaOpenCL/builtins-amdgcn-error-gfx950-param.cl +++ b/clang/test/SemaOpenCL/builtins-amdgcn-error-gfx950-param.cl @@ -136,3 +136,9 @@ void test_smfmac_f32_32x32x64_bf8_fp8(global float16* out, int4 a, int8 b, float *out = __builtin_amdgcn_smfmac_f32_32x32x64_bf8_fp8(a, b, c, idx, d, 0); // expected-error{{argument to '__builtin_amdgcn_smfmac_f32_32x32x64_bf8_fp8' must be a constant integer}} *out = __builtin_amdgcn_smfmac_f32_32x32x64_bf8_fp8(a, b, c, idx, 0, d); // expected-error{{argument to '__builtin_amdgcn_smfmac_f32_32x32x64_bf8_fp8' must be a constant integer}} } + +void test_smfmac_f32_32x32x64_fp8_bf8(global float16* out, int4 a, int8 b, float16 c, int idx, int d) +{ + *out = __builtin_amdgcn_smfmac_f32_32x32x64_fp8_bf8(a, b, c, idx, d, 0); // expected-error{{argument to '__builtin_amdgcn_smfmac_f32_32x32x64_fp8_bf8' must be a constant integer}} + *out = __builtin_amdgcn_smfmac_f32_32x32x64_fp8_bf8(a, b, c, idx, 0, d); // expected-error{{argument to 
'__builtin_amdgcn_smfmac_f32_32x32x64_fp8_bf8' must be a constant integer}} +} diff --git a/clang/test/SemaOpenCL/builtins-amdgcn-error-gfx950.cl b/clang/test/SemaOpenCL/builtins-amdgcn-error-gfx950.cl index f8ac3399d2b64b..9e563a7b0bd64c 100644 --- a/clang/test/SemaOpenCL/builtins-amdgcn-error-gfx950.cl +++ b/clang/test/SemaOpenCL/builtins-amdgcn-error-gfx950.cl @@ -46,6 +46,7 @@ void test(__global float4* out0, half8 a0, half8 b0, float4 c0, *out12 = __builtin_amdgcn_smfmac_f32_16x16x128_fp8_fp8(a12, b12, c12, 0, 0, 0); // expected-error{{'__builtin_amdgcn_smfmac_f32_16x16x128_fp8_fp8' needs target feature gfx950-insts}} *out13 = __builtin_amdgcn_smfmac_f32_32x32x64_bf8_bf8(a13, b13, c13, 0, 0, 0); // expected-error{{'__builtin_amdgcn_smfmac_f32_32x32x64_bf8_bf8' needs target feature gfx950-insts}} *out13 = __builtin_amdgcn_smfmac_f32_32x32x64_bf8_fp8(a13, b13, c13, 0, 0, 0); // expected-error{{'__builtin_amdgcn_smfmac_f32_32x32x64_bf8_fp8' needs target feature gfx950-insts}} + *out13 = __builtin_amdgcn_smfmac_f3
[llvm-branch-commits] [clang] [llvm] AMDGPU: Add v_permlane16_swap_b32 and v_permlane32_swap_b32 for gfx950 (PR #117260)
https://github.com/arsenm updated https://github.com/llvm/llvm-project/pull/117260 >From 549b571ea25a06301f719778786a288d85604464 Mon Sep 17 00:00:00 2001 From: Matt Arsenault Date: Mon, 22 Jan 2024 12:40:54 +0700 Subject: [PATCH] AMDGPU: Add v_permlane16_swap_b32 and v_permlane32_swap_b32 for gfx950 This was a bit annoying because these introduce a new special case encoding usage. op_sel is repurposed as a subset of dpp controls, and is eligible for VOP3->VOP1 shrinking. For some reason fi also uses an enum value, so we need to convert the raw boolean to 1 instead of -1. The 2 registers are swapped, so this has 2 defs. Ideally the builtin would return a pair, but that's difficult so return a vector instead. This would make a hypothetical builtin that supports v2f16 directly uglier. --- clang/include/clang/Basic/BuiltinsAMDGPU.def | 3 + clang/lib/CodeGen/CGBuiltin.cpp | 26 clang/test/CodeGenOpenCL/amdgpu-features.cl | 2 +- .../builtins-amdgcn-gfx950-err.cl | 6 +- .../CodeGenOpenCL/builtins-amdgcn-gfx950.cl | 87 + .../builtins-amdgcn-error-gfx950-param.cl | 10 ++ .../builtins-amdgcn-error-gfx950.cl | 5 +- llvm/docs/AMDGPUUsage.rst | 13 ++ llvm/include/llvm/IR/IntrinsicsAMDGPU.td | 14 ++ llvm/lib/Target/AMDGPU/AMDGPU.td | 23 +++- llvm/lib/Target/AMDGPU/AMDGPUGISel.td | 3 + llvm/lib/Target/AMDGPU/AMDGPUISelDAGToDAG.cpp | 25 .../AMDGPU/AMDGPUInstructionSelector.cpp | 32 + .../Target/AMDGPU/AMDGPUInstructionSelector.h | 3 + .../Target/AMDGPU/AMDGPURegisterBankInfo.cpp | 9 ++ .../Target/AMDGPU/AMDGPUSearchableTables.td | 2 + llvm/lib/Target/AMDGPU/GCNSubtarget.h | 8 +- llvm/lib/Target/AMDGPU/SIInstrInfo.cpp| 6 +- llvm/lib/Target/AMDGPU/SIInstrInfo.td | 4 + llvm/lib/Target/AMDGPU/VOP1Instructions.td| 46 +++ llvm/lib/Target/AMDGPU/VOPInstructions.td | 12 ++ llvm/lib/TargetParser/TargetParser.cpp| 2 + .../UniformityAnalysis/AMDGPU/intrinsics.ll | 16 +++ .../AMDGPU/llvm.amdgcn.permlane16.swap.ll | 121 ++ .../AMDGPU/llvm.amdgcn.permlane32.swap.ll | 121 ++ 
llvm/test/MC/AMDGPU/gfx950_asm_features.s | 82 llvm/test/MC/AMDGPU/gfx950_err.s | 31 + llvm/test/MC/Disassembler/AMDGPU/gfx950.txt | 32 + 28 files changed, 737 insertions(+), 7 deletions(-) create mode 100644 llvm/test/CodeGen/AMDGPU/llvm.amdgcn.permlane16.swap.ll create mode 100644 llvm/test/CodeGen/AMDGPU/llvm.amdgcn.permlane32.swap.ll create mode 100644 llvm/test/MC/AMDGPU/gfx950_err.s diff --git a/clang/include/clang/Basic/BuiltinsAMDGPU.def b/clang/include/clang/Basic/BuiltinsAMDGPU.def index 51a5b1dbad495c..548bcc8ad55f48 100644 --- a/clang/include/clang/Basic/BuiltinsAMDGPU.def +++ b/clang/include/clang/Basic/BuiltinsAMDGPU.def @@ -459,6 +459,9 @@ TARGET_BUILTIN(__builtin_amdgcn_smfmac_f32_32x32x64_bf8_fp8, "V16fV4iV8iV16fiIiI TARGET_BUILTIN(__builtin_amdgcn_smfmac_f32_32x32x64_fp8_bf8, "V16fV4iV8iV16fiIiIi", "nc", "gfx950-insts") TARGET_BUILTIN(__builtin_amdgcn_smfmac_f32_32x32x64_fp8_fp8, "V16fV4iV8iV16fiIiIi", "nc", "gfx950-insts") +TARGET_BUILTIN(__builtin_amdgcn_permlane16_swap, "V2UiUiUiIbIb", "nc", "permlane16-swap") +TARGET_BUILTIN(__builtin_amdgcn_permlane32_swap, "V2UiUiUiIbIb", "nc", "permlane32-swap") + //===--===// // GFX12+ only builtins. //===--===// diff --git a/clang/lib/CodeGen/CGBuiltin.cpp b/clang/lib/CodeGen/CGBuiltin.cpp index ff7132fd8bc1e7..3b3c46b56868cf 100644 --- a/clang/lib/CodeGen/CGBuiltin.cpp +++ b/clang/lib/CodeGen/CGBuiltin.cpp @@ -20162,6 +20162,32 @@ Value *CodeGenFunction::EmitAMDGPUBuiltinExpr(unsigned BuiltinID, CGM.getIntrinsic(Intrinsic::amdgcn_s_sendmsg_rtn, {ResultType}); return Builder.CreateCall(F, {Arg}); } + case AMDGPU::BI__builtin_amdgcn_permlane16_swap: + case AMDGPU::BI__builtin_amdgcn_permlane32_swap: { +// Because builtin types are limited, and the intrinsic uses a struct/pair +// output, marshal the pair-of-i32 to <2 x i32>. 
+Value *VDstOld = EmitScalarExpr(E->getArg(0)); +Value *VSrcOld = EmitScalarExpr(E->getArg(1)); +Value *FI = EmitScalarExpr(E->getArg(2)); +Value *BoundCtrl = EmitScalarExpr(E->getArg(3)); +Function *F = +CGM.getIntrinsic(BuiltinID == AMDGPU::BI__builtin_amdgcn_permlane16_swap + ? Intrinsic::amdgcn_permlane16_swap + : Intrinsic::amdgcn_permlane32_swap); +llvm::CallInst *Call = +Builder.CreateCall(F, {VDstOld, VSrcOld, FI, BoundCtrl}); + +llvm::Value *Elt0 = Builder.CreateExtractValue(Call, 0); +llvm::Value *Elt1 = Builder.CreateExtractValue(Call, 1); + +llvm::Type *ResultType = Con
[llvm-branch-commits] [llvm] [LoongArch] Fix GOT usage for `non-dso_local` function calls in large code model (PR #117134)
https://github.com/SixWeining approved this pull request. https://github.com/llvm/llvm-project/pull/117134
[llvm-branch-commits] [clang] [llvm] [RISCV] Support __builtin_cpu_is (PR #116231)
https://github.com/wangpc-pp updated https://github.com/llvm/llvm-project/pull/116231 >From 9686a2c5c5276289e72d9098f497a9f246a1c457 Mon Sep 17 00:00:00 2001 From: Wang Pengcheng Date: Thu, 14 Nov 2024 22:06:45 +0800 Subject: [PATCH 1/4] Remove stale CHECKs Created using spr 1.3.6-beta.1 --- clang/test/CodeGen/builtin-cpu-is.c | 20 1 file changed, 20 deletions(-) diff --git a/clang/test/CodeGen/builtin-cpu-is.c b/clang/test/CodeGen/builtin-cpu-is.c index e4a2071cf46795..b8dd97eeacebcf 100644 --- a/clang/test/CodeGen/builtin-cpu-is.c +++ b/clang/test/CodeGen/builtin-cpu-is.c @@ -7,8 +7,6 @@ // global, the bit grab, and the icmp correct. extern void a(const char *); -// CHECK: @__cpu_model = external dso_local global { i32, i32, i32, [1 x i32] } - // CHECK-X86-LABEL: define dso_local void @intel( // CHECK-X86-SAME: ) #[[ATTR0:[0-9]+]] { // CHECK-X86-NEXT: [[ENTRY:.*:]] @@ -24,9 +22,6 @@ extern void a(const char *); void intel(void) { if (__builtin_cpu_is("intel")) a("intel"); - - // CHECK: [[LOAD:%[^ ]+]] = load i32, ptr @__cpu_model - // CHECK: = icmp eq i32 [[LOAD]], 1 } // CHECK-X86-LABEL: define dso_local void @amd( @@ -44,9 +39,6 @@ void intel(void) { void amd(void) { if (__builtin_cpu_is("amd")) a("amd"); - - // CHECK: [[LOAD:%[^ ]+]] = load i32, ptr @__cpu_model - // CHECK: = icmp eq i32 [[LOAD]], 2 } // CHECK-X86-LABEL: define dso_local void @atom( @@ -64,9 +56,6 @@ void amd(void) { void atom(void) { if (__builtin_cpu_is("atom")) a("atom"); - - // CHECK: [[LOAD:%[^ ]+]] = load i32, ptr getelementptr inbounds ({ i32, i32, i32, [1 x i32] }, ptr @__cpu_model, i32 0, i32 1) - // CHECK: = icmp eq i32 [[LOAD]], 1 } // CHECK-X86-LABEL: define dso_local void @amdfam10h( @@ -84,9 +73,6 @@ void atom(void) { void amdfam10h(void) { if (__builtin_cpu_is("amdfam10h")) a("amdfam10h"); - - // CHECK: [[LOAD:%[^ ]+]] = load i32, ptr getelementptr inbounds ({ i32, i32, i32, [1 x i32] }, ptr @__cpu_model, i32 0, i32 1) - // CHECK: = icmp eq i32 [[LOAD]], 4 } // CHECK-X86-LABEL: 
define dso_local void @barcelona( @@ -104,9 +90,6 @@ void amdfam10h(void) { void barcelona(void) { if (__builtin_cpu_is("barcelona")) a("barcelona"); - - // CHECK: [[LOAD:%[^ ]+]] = load i32, ptr getelementptr inbounds ({ i32, i32, i32, [1 x i32] }, ptr @__cpu_model, i32 0, i32 2) - // CHECK: = icmp eq i32 [[LOAD]], 4 } // CHECK-X86-LABEL: define dso_local void @nehalem( @@ -124,9 +107,6 @@ void barcelona(void) { void nehalem(void) { if (__builtin_cpu_is("nehalem")) a("nehalem"); - - // CHECK: [[LOAD:%[^ ]+]] = load i32, ptr getelementptr inbounds ({ i32, i32, i32, [1 x i32] }, ptr @__cpu_model, i32 0, i32 2) - // CHECK: = icmp eq i32 [[LOAD]], 1 } #endif >From 2bb2d5079b5bf98ba9f87e082ca3e67ab70068aa Mon Sep 17 00:00:00 2001 From: Wang Pengcheng Date: Thu, 14 Nov 2024 22:12:36 +0800 Subject: [PATCH 2/4] Simplify test Created using spr 1.3.6-beta.1 --- clang/test/CodeGen/builtin-cpu-is.c | 25 ++--- 1 file changed, 6 insertions(+), 19 deletions(-) diff --git a/clang/test/CodeGen/builtin-cpu-is.c b/clang/test/CodeGen/builtin-cpu-is.c index b8dd97eeacebcf..8e78213a7cfcfb 100644 --- a/clang/test/CodeGen/builtin-cpu-is.c +++ b/clang/test/CodeGen/builtin-cpu-is.c @@ -111,12 +111,9 @@ void nehalem(void) { #endif #ifdef __riscv -// CHECK-RV64-LABEL: define dso_local signext i32 @test_riscv( -// CHECK-RV64-SAME: i32 noundef signext [[A:%.*]]) #[[ATTR0:[0-9]+]] { +// CHECK-RV64-LABEL: define dso_local signext i32 @test_cpu_is_veyron_v1( +// CHECK-RV64-SAME: ) #[[ATTR0:[0-9]+]] { // CHECK-RV64-NEXT: [[ENTRY:.*:]] -// CHECK-RV64-NEXT:[[RETVAL:%.*]] = alloca i32, align 4 -// CHECK-RV64-NEXT:[[A_ADDR:%.*]] = alloca i32, align 4 -// CHECK-RV64-NEXT:store i32 [[A]], ptr [[A_ADDR]], align 4 // CHECK-RV64-NEXT:[[TMP0:%.*]] = load i32, ptr @__riscv_cpu_model, align 4 // CHECK-RV64-NEXT:[[TMP1:%.*]] = icmp eq i32 [[TMP0]], 1567 // CHECK-RV64-NEXT:[[TMP2:%.*]] = load i64, ptr getelementptr inbounds ({ i32, i64, i64 }, ptr @__riscv_cpu_model, i32 0, i32 1), align 8 @@ -125,20 +122,10 @@ 
void nehalem(void) { // CHECK-RV64-NEXT:[[TMP5:%.*]] = load i64, ptr getelementptr inbounds ({ i32, i64, i64 }, ptr @__riscv_cpu_model, i32 0, i32 2), align 8 // CHECK-RV64-NEXT:[[TMP6:%.*]] = icmp eq i64 [[TMP5]], 273 // CHECK-RV64-NEXT:[[TMP7:%.*]] = and i1 [[TMP4]], [[TMP6]] -// CHECK-RV64-NEXT:br i1 [[TMP7]], label %[[IF_THEN:.*]], label %[[IF_END:.*]] -// CHECK-RV64: [[IF_THEN]]: -// CHECK-RV64-NEXT:store i32 3, ptr [[RETVAL]], align 4 -// CHECK-RV64-NEXT:br label %[[RETURN:.*]] -// CHECK-RV64: [[IF_END]]: -// CHECK-RV64-NEXT:store i32 0, ptr [[RETVAL]], align 4 -// CHECK-RV64-NEXT:br label %[[RETURN]] -// CHECK-RV64: [[RETURN]]: -// CHECK-RV64-NEXT:[[TMP8:%.*]] = load i32, ptr [[RETVAL]], align 4 -// CHECK-RV64-NEXT:ret i32 [[TM
[llvm-branch-commits] [clang] [llvm] [RISCV] Support __builtin_cpu_is (PR #116231)
@@ -58,6 +58,19 @@ bool hasFastVectorUnalignedAccess(StringRef CPU) { return Info && Info->FastVectorUnalignedAccess; } +bool hasValidCPUModel(StringRef CPU) { + const CPUModel CPUModel = getCPUModel(CPU); + return CPUModel.MVendorID != 0 && CPUModel.MArchID != 0 && wangpc-pp wrote: Done! https://github.com/llvm/llvm-project/pull/116231
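To make the shape of the check concrete, here is a standalone sketch of the comparison that the RV64 CHECK lines in this PR describe: three fields of a `{ i32, i64, i64 }` model are compared and the results ANDed together. The struct below mirrors that layout only; `isValidModel` and `cpuIs` are hypothetical names, and the third term of `isValidModel` is an assumption, since the quoted hunk is cut off after the `MArchID` test.

```cpp
#include <cstdint>

// Mirrors the { i32, i64, i64 } layout of @__riscv_cpu_model in the test.
struct CPUModel {
  uint32_t MVendorID;
  uint64_t MArchID;
  uint64_t MImpID;
};

// Assumption: a model is usable only when every ID is non-zero. The MImpID
// term is a guess; the quoted hunk ends before the third operand.
inline bool isValidModel(const CPUModel &M) {
  return M.MVendorID != 0 && M.MArchID != 0 && M.MImpID != 0;
}

// Hypothetical equivalent of the emitted IR: three equality tests ANDed.
inline bool cpuIs(const CPUModel &Running, const CPUModel &Wanted) {
  return Running.MVendorID == Wanted.MVendorID &&
         Running.MArchID == Wanted.MArchID &&
         Running.MImpID == Wanted.MImpID;
}
```

In the test, the vendor and implementation IDs compared against are 1567 and 273; the architecture ID constant falls in a part of the diff that is not shown here, so any value used with this sketch is a placeholder.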
[llvm-branch-commits] [clang] [llvm] [RISCV] Support __builtin_cpu_is (PR #116231)
https://github.com/wangpc-pp updated https://github.com/llvm/llvm-project/pull/116231 >From 9686a2c5c5276289e72d9098f497a9f246a1c457 Mon Sep 17 00:00:00 2001 From: Wang Pengcheng Date: Thu, 14 Nov 2024 22:06:45 +0800 Subject: [PATCH 1/4] Remove stale CHECKs Created using spr 1.3.6-beta.1 --- clang/test/CodeGen/builtin-cpu-is.c | 20 1 file changed, 20 deletions(-) diff --git a/clang/test/CodeGen/builtin-cpu-is.c b/clang/test/CodeGen/builtin-cpu-is.c index e4a2071cf46795..b8dd97eeacebcf 100644 --- a/clang/test/CodeGen/builtin-cpu-is.c +++ b/clang/test/CodeGen/builtin-cpu-is.c @@ -7,8 +7,6 @@ // global, the bit grab, and the icmp correct. extern void a(const char *); -// CHECK: @__cpu_model = external dso_local global { i32, i32, i32, [1 x i32] } - // CHECK-X86-LABEL: define dso_local void @intel( // CHECK-X86-SAME: ) #[[ATTR0:[0-9]+]] { // CHECK-X86-NEXT: [[ENTRY:.*:]] @@ -24,9 +22,6 @@ extern void a(const char *); void intel(void) { if (__builtin_cpu_is("intel")) a("intel"); - - // CHECK: [[LOAD:%[^ ]+]] = load i32, ptr @__cpu_model - // CHECK: = icmp eq i32 [[LOAD]], 1 } // CHECK-X86-LABEL: define dso_local void @amd( @@ -44,9 +39,6 @@ void intel(void) { void amd(void) { if (__builtin_cpu_is("amd")) a("amd"); - - // CHECK: [[LOAD:%[^ ]+]] = load i32, ptr @__cpu_model - // CHECK: = icmp eq i32 [[LOAD]], 2 } // CHECK-X86-LABEL: define dso_local void @atom( @@ -64,9 +56,6 @@ void amd(void) { void atom(void) { if (__builtin_cpu_is("atom")) a("atom"); - - // CHECK: [[LOAD:%[^ ]+]] = load i32, ptr getelementptr inbounds ({ i32, i32, i32, [1 x i32] }, ptr @__cpu_model, i32 0, i32 1) - // CHECK: = icmp eq i32 [[LOAD]], 1 } // CHECK-X86-LABEL: define dso_local void @amdfam10h( @@ -84,9 +73,6 @@ void atom(void) { void amdfam10h(void) { if (__builtin_cpu_is("amdfam10h")) a("amdfam10h"); - - // CHECK: [[LOAD:%[^ ]+]] = load i32, ptr getelementptr inbounds ({ i32, i32, i32, [1 x i32] }, ptr @__cpu_model, i32 0, i32 1) - // CHECK: = icmp eq i32 [[LOAD]], 4 } // CHECK-X86-LABEL: 
define dso_local void @barcelona( @@ -104,9 +90,6 @@ void amdfam10h(void) { void barcelona(void) { if (__builtin_cpu_is("barcelona")) a("barcelona"); - - // CHECK: [[LOAD:%[^ ]+]] = load i32, ptr getelementptr inbounds ({ i32, i32, i32, [1 x i32] }, ptr @__cpu_model, i32 0, i32 2) - // CHECK: = icmp eq i32 [[LOAD]], 4 } // CHECK-X86-LABEL: define dso_local void @nehalem( @@ -124,9 +107,6 @@ void barcelona(void) { void nehalem(void) { if (__builtin_cpu_is("nehalem")) a("nehalem"); - - // CHECK: [[LOAD:%[^ ]+]] = load i32, ptr getelementptr inbounds ({ i32, i32, i32, [1 x i32] }, ptr @__cpu_model, i32 0, i32 2) - // CHECK: = icmp eq i32 [[LOAD]], 1 } #endif >From 2bb2d5079b5bf98ba9f87e082ca3e67ab70068aa Mon Sep 17 00:00:00 2001 From: Wang Pengcheng Date: Thu, 14 Nov 2024 22:12:36 +0800 Subject: [PATCH 2/4] Simplify test Created using spr 1.3.6-beta.1 --- clang/test/CodeGen/builtin-cpu-is.c | 25 ++--- 1 file changed, 6 insertions(+), 19 deletions(-) diff --git a/clang/test/CodeGen/builtin-cpu-is.c b/clang/test/CodeGen/builtin-cpu-is.c index b8dd97eeacebcf..8e78213a7cfcfb 100644 --- a/clang/test/CodeGen/builtin-cpu-is.c +++ b/clang/test/CodeGen/builtin-cpu-is.c @@ -111,12 +111,9 @@ void nehalem(void) { #endif #ifdef __riscv -// CHECK-RV64-LABEL: define dso_local signext i32 @test_riscv( -// CHECK-RV64-SAME: i32 noundef signext [[A:%.*]]) #[[ATTR0:[0-9]+]] { +// CHECK-RV64-LABEL: define dso_local signext i32 @test_cpu_is_veyron_v1( +// CHECK-RV64-SAME: ) #[[ATTR0:[0-9]+]] { // CHECK-RV64-NEXT: [[ENTRY:.*:]] -// CHECK-RV64-NEXT:[[RETVAL:%.*]] = alloca i32, align 4 -// CHECK-RV64-NEXT:[[A_ADDR:%.*]] = alloca i32, align 4 -// CHECK-RV64-NEXT:store i32 [[A]], ptr [[A_ADDR]], align 4 // CHECK-RV64-NEXT:[[TMP0:%.*]] = load i32, ptr @__riscv_cpu_model, align 4 // CHECK-RV64-NEXT:[[TMP1:%.*]] = icmp eq i32 [[TMP0]], 1567 // CHECK-RV64-NEXT:[[TMP2:%.*]] = load i64, ptr getelementptr inbounds ({ i32, i64, i64 }, ptr @__riscv_cpu_model, i32 0, i32 1), align 8 @@ -125,20 +122,10 @@ 
void nehalem(void) { // CHECK-RV64-NEXT:[[TMP5:%.*]] = load i64, ptr getelementptr inbounds ({ i32, i64, i64 }, ptr @__riscv_cpu_model, i32 0, i32 2), align 8 // CHECK-RV64-NEXT:[[TMP6:%.*]] = icmp eq i64 [[TMP5]], 273 // CHECK-RV64-NEXT:[[TMP7:%.*]] = and i1 [[TMP4]], [[TMP6]] -// CHECK-RV64-NEXT:br i1 [[TMP7]], label %[[IF_THEN:.*]], label %[[IF_END:.*]] -// CHECK-RV64: [[IF_THEN]]: -// CHECK-RV64-NEXT:store i32 3, ptr [[RETVAL]], align 4 -// CHECK-RV64-NEXT:br label %[[RETURN:.*]] -// CHECK-RV64: [[IF_END]]: -// CHECK-RV64-NEXT:store i32 0, ptr [[RETVAL]], align 4 -// CHECK-RV64-NEXT:br label %[[RETURN]] -// CHECK-RV64: [[RETURN]]: -// CHECK-RV64-NEXT:[[TMP8:%.*]] = load i32, ptr [[RETVAL]], align 4 -// CHECK-RV64-NEXT:ret i32 [[TM
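For readers skimming the CHECK-RV64 lines above: the IR loads three fields from `@__riscv_cpu_model` (laid out as `{ i32, i64, i64 }`), compares each with a constant, and ANDs the results. Below is a minimal C++ sketch of that comparison, not the Clang implementation. The names `CpuModel` and `cpuIs` are hypothetical, as are the field names, which assume the usual mvendorid/marchid/mimpid reading of the struct. Only the constants 1567 (field 0) and 273 (field 2) are visible in the hunk; the field 1 constant falls in an elided part of the diff, so it is left as a parameter here.

```cpp
#include <cstdint>

// Hypothetical mirror of the { i32, i64, i64 } layout seen in the CHECK lines.
struct CpuModel {
  std::uint32_t mvendorid; // field 0, compared with an i32 constant
  std::uint64_t marchid;   // field 1, compared with an i64 constant
  std::uint64_t mimpid;    // field 2, compared with an i64 constant
};

// Returns true only when all three identifiers match, which is what the
// chain of `icmp eq` and `and i1` instructions in the test expresses.
inline bool cpuIs(const CpuModel &model, const CpuModel &expected) {
  const bool vendorMatch = model.mvendorid == expected.mvendorid;
  const bool archMatch = model.marchid == expected.marchid;
  const bool impMatch = model.mimpid == expected.mimpid;
  return (vendorMatch && archMatch) && impMatch;
}
```

A caller would fill `expected` with the per-CPU constants and pass the runtime-populated model as the first argument.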
[llvm-branch-commits] [clang] [llvm] AMDGPU: Add v_permlane16_swap_b32 and v_permlane32_swap_b32 for gfx950 (PR #117260)
https://github.com/arsenm created https://github.com/llvm/llvm-project/pull/117260 This was a bit annoying because these introduce a new special case encoding usage. op_sel is repurposed as a subset of dpp controls, and is eligible for VOP3->VOP1 shrinking. For some reason fi also uses an enum value, so we need to convert the raw boolean to 1 instead of -1. The 2 registers are swapped, so this has 2 defs. Ideally the builtin would return a pair, but that's difficult so return a vector instead. This would make a hypothetical builtin that supports v2f16 directly uglier. >From 66e98ff5b008512e73f63e037f3f76defa6c0a19 Mon Sep 17 00:00:00 2001 From: Matt Arsenault Date: Mon, 22 Jan 2024 12:40:54 +0700 Subject: [PATCH] AMDGPU: Add v_permlane16_swap_b32 and v_permlane32_swap_b32 for gfx950 This was a bit annoying because these introduce a new special case encoding usage. op_sel is repurposed as a subset of dpp controls, and is eligible for VOP3->VOP1 shrinking. For some reason fi also uses an enum value, so we need to convert the raw boolean to 1 instead of -1. The 2 registers are swapped, so this has 2 defs. Ideally the builtin would return a pair, but that's difficult so return a vector instead. This would make a hypothetical builtin that supports v2f16 directly uglier. 
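Two points in the description above can be illustrated outside LLVM. First, a true boolean widened by sign extension reads as -1, while zero extension gives 1, which is presumably why the raw boolean has to be converted. Second, an operation with two results can be repacked as a 2-element vector when returning a pair is inconvenient. The following is only a plain C++ sketch of those two ideas; `signExtendFlag`, `zeroExtendFlag`, and `packPair` are hypothetical names and do not appear in the patch.

```cpp
#include <array>
#include <cstdint>
#include <utility>

// A 1-bit "true" interpreted as a signed value: all bits set, i.e. -1.
inline std::int32_t signExtendFlag(bool flag) { return flag ? -1 : 0; }

// The same flag interpreted as an unsigned 1-bit value: 1.
inline std::int32_t zeroExtendFlag(bool flag) { return flag ? 1 : 0; }

// Repack a two-result value as a 2-element vector, keeping the order:
// element 0 is the first result, element 1 is the second.
inline std::array<std::uint32_t, 2>
packPair(const std::pair<std::uint32_t, std::uint32_t> &results) {
  return {{results.first, results.second}};
}
```

The vector form keeps both results addressable by index, at the cost of losing the per-element naming a pair would give.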
--- clang/include/clang/Basic/BuiltinsAMDGPU.def | 3 + clang/lib/CodeGen/CGBuiltin.cpp | 26 clang/test/CodeGenOpenCL/amdgpu-features.cl | 2 +- .../builtins-amdgcn-gfx950-err.cl | 6 +- .../CodeGenOpenCL/builtins-amdgcn-gfx950.cl | 87 + .../builtins-amdgcn-error-gfx950-param.cl | 10 ++ .../builtins-amdgcn-error-gfx950.cl | 5 +- llvm/docs/AMDGPUUsage.rst | 13 ++ llvm/include/llvm/IR/IntrinsicsAMDGPU.td | 14 ++ llvm/lib/Target/AMDGPU/AMDGPU.td | 23 +++- llvm/lib/Target/AMDGPU/AMDGPUGISel.td | 3 + llvm/lib/Target/AMDGPU/AMDGPUISelDAGToDAG.cpp | 25 .../AMDGPU/AMDGPUInstructionSelector.cpp | 32 + .../Target/AMDGPU/AMDGPUInstructionSelector.h | 3 + .../Target/AMDGPU/AMDGPURegisterBankInfo.cpp | 9 ++ .../Target/AMDGPU/AMDGPUSearchableTables.td | 2 + llvm/lib/Target/AMDGPU/GCNSubtarget.h | 8 +- llvm/lib/Target/AMDGPU/SIInstrInfo.cpp| 6 +- llvm/lib/Target/AMDGPU/SIInstrInfo.td | 4 + llvm/lib/Target/AMDGPU/VOP1Instructions.td| 46 +++ llvm/lib/Target/AMDGPU/VOPInstructions.td | 12 ++ llvm/lib/TargetParser/TargetParser.cpp| 2 + .../UniformityAnalysis/AMDGPU/intrinsics.ll | 16 +++ .../AMDGPU/llvm.amdgcn.permlane16.swap.ll | 121 ++ .../AMDGPU/llvm.amdgcn.permlane32.swap.ll | 121 ++ llvm/test/MC/AMDGPU/gfx950_asm_features.s | 82 llvm/test/MC/AMDGPU/gfx950_err.s | 31 + llvm/test/MC/Disassembler/AMDGPU/gfx950.txt | 32 + 28 files changed, 737 insertions(+), 7 deletions(-) create mode 100644 llvm/test/CodeGen/AMDGPU/llvm.amdgcn.permlane16.swap.ll create mode 100644 llvm/test/CodeGen/AMDGPU/llvm.amdgcn.permlane32.swap.ll create mode 100644 llvm/test/MC/AMDGPU/gfx950_err.s diff --git a/clang/include/clang/Basic/BuiltinsAMDGPU.def b/clang/include/clang/Basic/BuiltinsAMDGPU.def index 51a5b1dbad495c..548bcc8ad55f48 100644 --- a/clang/include/clang/Basic/BuiltinsAMDGPU.def +++ b/clang/include/clang/Basic/BuiltinsAMDGPU.def @@ -459,6 +459,9 @@ TARGET_BUILTIN(__builtin_amdgcn_smfmac_f32_32x32x64_bf8_fp8, "V16fV4iV8iV16fiIiI TARGET_BUILTIN(__builtin_amdgcn_smfmac_f32_32x32x64_fp8_bf8, 
"V16fV4iV8iV16fiIiIi", "nc", "gfx950-insts") TARGET_BUILTIN(__builtin_amdgcn_smfmac_f32_32x32x64_fp8_fp8, "V16fV4iV8iV16fiIiIi", "nc", "gfx950-insts") +TARGET_BUILTIN(__builtin_amdgcn_permlane16_swap, "V2UiUiUiIbIb", "nc", "permlane16-swap") +TARGET_BUILTIN(__builtin_amdgcn_permlane32_swap, "V2UiUiUiIbIb", "nc", "permlane32-swap") + //===--===// // GFX12+ only builtins. //===--===// diff --git a/clang/lib/CodeGen/CGBuiltin.cpp b/clang/lib/CodeGen/CGBuiltin.cpp index ff7132fd8bc1e7..3b3c46b56868cf 100644 --- a/clang/lib/CodeGen/CGBuiltin.cpp +++ b/clang/lib/CodeGen/CGBuiltin.cpp @@ -20162,6 +20162,32 @@ Value *CodeGenFunction::EmitAMDGPUBuiltinExpr(unsigned BuiltinID, CGM.getIntrinsic(Intrinsic::amdgcn_s_sendmsg_rtn, {ResultType}); return Builder.CreateCall(F, {Arg}); } + case AMDGPU::BI__builtin_amdgcn_permlane16_swap: + case AMDGPU::BI__builtin_amdgcn_permlane32_swap: { +// Because builtin types are limited, and the intrinsic uses a struct/pair +// output, marshal the pair-of-i32 to <2 x i32>. +Value *VDstOld = EmitScalarExpr(E->getArg(0)); +Value *VSrcOld = EmitScalarExpr(E->getArg(1)); +Value *FI = EmitScalarExpr(E->getArg(2)); +Value *BoundCtrl = EmitScalarExpr(E->getArg(3))
[llvm-branch-commits] [clang] [llvm] AMDGPU: Add v_smfmac_f32_16x16x64_f16 for gfx950 (PR #117202)
llvmbot wrote: @llvm/pr-subscribers-mc Author: Matt Arsenault (arsenm) Changes --- Patch is 27.26 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/117202.diff 13 Files Affected: - (modified) clang/include/clang/Basic/BuiltinsAMDGPU.def (+1) - (modified) clang/test/CodeGenOpenCL/builtins-amdgcn-mfma.cl (+7) - (modified) clang/test/SemaOpenCL/builtins-amdgcn-error-gfx950-param.cl (+7) - (modified) clang/test/SemaOpenCL/builtins-amdgcn-error-gfx950.cl (+1) - (modified) llvm/include/llvm/IR/IntrinsicsAMDGPU.td (+1) - (modified) llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp (+6) - (modified) llvm/lib/Target/AMDGPU/AMDGPURegisterBankInfo.cpp (+2-1) - (modified) llvm/lib/Target/AMDGPU/SIInstrInfo.td (+1) - (modified) llvm/lib/Target/AMDGPU/VOP3PInstructions.td (+7) - (modified) llvm/test/Analysis/UniformityAnalysis/AMDGPU/intrinsics.ll (+9) - (added) llvm/test/CodeGen/AMDGPU/llvm.amdgcn.smfmac.gfx950.ll (+218) - (modified) llvm/test/MC/AMDGPU/mai-gfx950.s (+42) - (modified) llvm/test/MC/Disassembler/AMDGPU/gfx950_mai.txt (+22) ``diff diff --git a/clang/include/clang/Basic/BuiltinsAMDGPU.def b/clang/include/clang/Basic/BuiltinsAMDGPU.def index 3b7cc559e88b29..f013714798cc54 100644 --- a/clang/include/clang/Basic/BuiltinsAMDGPU.def +++ b/clang/include/clang/Basic/BuiltinsAMDGPU.def @@ -444,6 +444,7 @@ TARGET_BUILTIN(__builtin_amdgcn_mfma_f32_32x32x16_bf16, "V16fV8yV8yV16fIiIiIi", TARGET_BUILTIN(__builtin_amdgcn_mfma_i32_16x16x64_i8, "V4iV4iV4iV4iIiIiIi", "nc", "gfx950-insts") TARGET_BUILTIN(__builtin_amdgcn_mfma_i32_32x32x32_i8, "V16iV4iV4iV16iIiIiIi", "nc", "gfx950-insts") +TARGET_BUILTIN(__builtin_amdgcn_smfmac_f32_16x16x64_f16, "V4fV8hV16hV4fiIiIi", "nc", "gfx950-insts") //===--===// // GFX12+ only builtins. 
//===--===// diff --git a/clang/test/CodeGenOpenCL/builtins-amdgcn-mfma.cl b/clang/test/CodeGenOpenCL/builtins-amdgcn-mfma.cl index 345f05f463bf44..e63d89a28de44d 100644 --- a/clang/test/CodeGenOpenCL/builtins-amdgcn-mfma.cl +++ b/clang/test/CodeGenOpenCL/builtins-amdgcn-mfma.cl @@ -467,4 +467,11 @@ v4f test_mfma_f32_16x16x32_bf16(v8bf16 a, v8bf16 b, v4f c) return __builtin_amdgcn_mfma_f32_16x16x32_bf16(a, b, c, 1, 2, 3); } +// CHECK-GFX950-LABEL: @test_smfmac_f32_16x16x64_f16 +// CHECK-GFX950: call <4 x float> @llvm.amdgcn.smfmac.f32.16x16x64.f16(<8 x half> %a, <16 x half> %b, <4 x float> %c, i32 %idx, i32 0, i32 0) +void test_smfmac_f32_16x16x64_f16(global v4f* out, v8h a, v16h b, v4f c, int idx) +{ + *out = __builtin_amdgcn_smfmac_f32_16x16x64_f16(a, b, c, idx, 0, 0); +} + #endif diff --git a/clang/test/SemaOpenCL/builtins-amdgcn-error-gfx950-param.cl b/clang/test/SemaOpenCL/builtins-amdgcn-error-gfx950-param.cl index acaa20090dfcba..6366997465aeff 100644 --- a/clang/test/SemaOpenCL/builtins-amdgcn-error-gfx950-param.cl +++ b/clang/test/SemaOpenCL/builtins-amdgcn-error-gfx950-param.cl @@ -4,6 +4,7 @@ typedef float float4 __attribute__((ext_vector_type(4))); typedef float float16 __attribute__((ext_vector_type(16))); typedef half half8 __attribute__((ext_vector_type(8))); +typedef half half16 __attribute__((ext_vector_type(16))); typedef __bf16 bfloat8 __attribute__((ext_vector_type(8))); typedef int int4 __attribute__((ext_vector_type(4))); typedef int int8 __attribute__((ext_vector_type(8))); @@ -62,3 +63,9 @@ void test_mfma_f32_16x16x32_bf16(__global float4* out, bfloat8 a, bfloat8 b, flo *out = __builtin_amdgcn_mfma_f32_16x16x32_bf16(a, b, c, 0, X, 0); // expected-error{{argument to '__builtin_amdgcn_mfma_f32_16x16x32_bf16' must be a constant integer}} *out = __builtin_amdgcn_mfma_f32_16x16x32_bf16(a, b, c, 0, 0, X); // expected-error{{argument to '__builtin_amdgcn_mfma_f32_16x16x32_bf16' must be a constant integer}} } + +void 
test_smfmac_f32_16x16x64_f16(global float4* out, half8 a, half16 b, float4 c, int idx, int d) +{ + *out = __builtin_amdgcn_smfmac_f32_16x16x64_f16(a, b, c, idx, d, 0); // expected-error{{argument to '__builtin_amdgcn_smfmac_f32_16x16x64_f16' must be a constant integer}} + *out = __builtin_amdgcn_smfmac_f32_16x16x64_f16(a, b, c, idx, 0, d); // expected-error{{argument to '__builtin_amdgcn_smfmac_f32_16x16x64_f16' must be a constant integer}} +} diff --git a/clang/test/SemaOpenCL/builtins-amdgcn-error-gfx950.cl b/clang/test/SemaOpenCL/builtins-amdgcn-error-gfx950.cl index 6bf76b3cba0f59..1e924e86f3b897 100644 --- a/clang/test/SemaOpenCL/builtins-amdgcn-error-gfx950.cl +++ b/clang/test/SemaOpenCL/builtins-amdgcn-error-gfx950.cl @@ -34,6 +34,7 @@ void test(__global float4* out0, half8 a0, half8 b0, float4 c0, *out3 = __builtin_amdgcn_mfma_i32_16x16x64_i8(a3, b3, c3, 0, 0, 0); // expected-error{{'__builtin_amdgcn_mfma_i32_16x16x64_i8' needs target feature gfx950-insts}} *out4 = __builtin_amdgcn_mfma_i32_3
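The `TARGET_BUILTIN` entry in the diff above carries the type string `"V4fV8hV16hV4fiIiIi"`. As I understand Clang's builtin type encoding, `V<n>` introduces a vector of n elements, `I` marks an argument that must be an integer constant expression, `U` means unsigned, and `f`, `h`, `y`, `i`, `b` stand for float, half, `__bf16`, int, and bool. The decoder below is a hedged sketch covering only that subset; `decodeBuiltinTypes` is a hypothetical helper, not part of Clang. The first entry it returns is the return type, the rest are the parameters.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Decode a small subset of the builtin type-string encoding into readable
// descriptions. Unrecognised base letters are reported as "unknown".
inline std::vector<std::string> decodeBuiltinTypes(const std::string &sig) {
  std::vector<std::string> types;
  std::size_t pos = 0;
  while (pos < sig.size()) {
    bool isConstant = false;
    bool isUnsigned = false;
    std::string lanes; // digits following 'V', empty for scalars

    // Consume the modifiers that precede the base type letter.
    while (pos < sig.size()) {
      const char c = sig[pos];
      if (c == 'I') {
        isConstant = true;
        ++pos;
      } else if (c == 'U') {
        isUnsigned = true;
        ++pos;
      } else if (c == 'V') {
        ++pos;
        while (pos < sig.size() &&
               std::isdigit(static_cast<unsigned char>(sig[pos])))
          lanes += sig[pos++];
      } else {
        break;
      }
    }
    if (pos >= sig.size())
      break; // modifiers with no base type: stop decoding

    std::string base;
    switch (sig[pos++]) {
    case 'f': base = "float"; break;
    case 'h': base = "half"; break;
    case 'y': base = "__bf16"; break;
    case 'i': base = "int"; break;
    case 'b': base = "bool"; break;
    default:  base = "unknown"; break;
    }

    std::string desc;
    if (isConstant) desc += "constant ";
    if (!lanes.empty()) desc += "vector of " + lanes + " ";
    if (isUnsigned) desc += "unsigned ";
    desc += base;
    types.push_back(desc);
  }
  return types;
}
```

Read this way, the smfmac builtin returns a 4-float vector and takes an 8-half vector, a 16-half vector, a 4-float vector, a runtime `int` index, and two constant integers, which lines up with the Sema test expecting a "must be a constant integer" error only for the last two arguments.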
[llvm-branch-commits] [clang] [llvm] AMDGPU: Add v_smfmac_f32_16x16x64_f16 for gfx950 (PR #117202)
arsenm wrote:

> [!WARNING]
> This pull request is not mergeable via GitHub because a downstack PR is open. Once all requirements are satisfied, merge this PR as a stack on Graphite (https://app.graphite.dev/github/pr/llvm/llvm-project/117202?utm_source=stack-comment-downstack-mergeability-warning).
> Learn more: https://graphite.dev/docs/merge-pull-requests

* **#117202** 👈 (View in Graphite: https://app.graphite.dev/github/pr/llvm/llvm-project/117202?utm_source=stack-comment-view-in-graphite)
* **#117055**
* **#117053**
* **#117052**
* **#116728**
* **#116724**: 1 other dependent PR ([#117047](https://github.com/llvm/llvm-project/pull/117047))
* **#116723**
* **#116722**
* **#116681**
* **#116680**
* **#116679**
* **#116678**
* **#116312**
* **#116311**
* **#116310**
* **#116309**
* **#116308**
* **#116307**
* `main`

This stack of pull requests is managed by Graphite (https://graphite.dev?utm-source=stack-comment). Learn more about https://stacking.dev/?utm_source=stack-co
[llvm-branch-commits] [clang] [llvm] AMDGPU: Add v_smfmac_f32_16x16x64_f16 for gfx950 (PR #117202)
https://github.com/arsenm ready_for_review https://github.com/llvm/llvm-project/pull/117202
[llvm-branch-commits] [clang] [llvm] AMDGPU: Add v_smfmac_f32_32x32x32_f16 for gfx950 (PR #117205)
https://github.com/arsenm ready_for_review https://github.com/llvm/llvm-project/pull/117205
[llvm-branch-commits] [clang] [HLSL] Add RWBuffer::Load(Index) (PR #117018)
https://github.com/hekota edited https://github.com/llvm/llvm-project/pull/117018
[llvm-branch-commits] [clang] [llvm] AMDGPU: Add v_smfmac_f32_32x32x32_f16 for gfx950 (PR #117205)
https://github.com/shiltian edited https://github.com/llvm/llvm-project/pull/117205
[llvm-branch-commits] [clang] [llvm] AMDGPU: Add v_smfmac_f32_16x16x64_bf16 for gfx950 (PR #117211)
https://github.com/shiltian edited https://github.com/llvm/llvm-project/pull/117211
[llvm-branch-commits] [clang] [llvm] AMDGPU: Add v_smfmac_f32_16x16x64_bf16 for gfx950 (PR #117211)
https://github.com/shiltian approved this pull request. https://github.com/llvm/llvm-project/pull/117211
[llvm-branch-commits] [clang] [llvm] AMDGPU: Add v_smfmac_f32_32x32x32_bf16 for gfx950 (PR #117212)
https://github.com/shiltian approved this pull request. https://github.com/llvm/llvm-project/pull/117212
[llvm-branch-commits] [clang] [llvm] AMDGPU: Add v_smfmac_i32_16x16x128_i8 for gfx950 (PR #117213)
https://github.com/shiltian approved this pull request. https://github.com/llvm/llvm-project/pull/117213
[llvm-branch-commits] [clang] [llvm] AMDGPU: Add v_smfmac_i32_32x32x64_i8 for gfx950 (PR #117214)
https://github.com/shiltian edited https://github.com/llvm/llvm-project/pull/117214
[llvm-branch-commits] [clang] [llvm] AMDGPU: Add v_smfmac_f32_16x16x128_bf8_fp8 for gfx950 (PR #117233)
https://github.com/shiltian edited https://github.com/llvm/llvm-project/pull/117233
[llvm-branch-commits] [clang] [llvm] AMDGPU: Add v_smfmac_f32_16x16x128_bf8_bf8 for gfx950 (PR #117232)
https://github.com/shiltian edited https://github.com/llvm/llvm-project/pull/117232
[llvm-branch-commits] [clang] [llvm] AMDGPU: Add v_smfmac_f32_16x16x128_bf8_fp8 for gfx950 (PR #117233)
https://github.com/shiltian approved this pull request. https://github.com/llvm/llvm-project/pull/117233
[llvm-branch-commits] [clang] [llvm] AMDGPU: Add v_smfmac_f32_16x16x128_fp8_fp8 for gfx950 (PR #117235)
https://github.com/shiltian approved this pull request. https://github.com/llvm/llvm-project/pull/117235
[llvm-branch-commits] [clang] [llvm] AMDGPU: Add v_smfmac_f32_16x16x128_fp8_fp8 for gfx950 (PR #117235)
https://github.com/shiltian edited https://github.com/llvm/llvm-project/pull/117235