[llvm-branch-commits] [llvm] release/19.x: [LICM] allow MemoryAccess creation failure (#116813) (PR #117082)

2024-11-21 Thread Nikita Popov via llvm-branch-commits

https://github.com/nikic approved this pull request.

LGTM

https://github.com/llvm/llvm-project/pull/117082
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] release/19.x: [LICM] allow MemoryAccess creation failure (#116813) (PR #117082)

2024-11-21 Thread via llvm-branch-commits

https://github.com/DianQK updated https://github.com/llvm/llvm-project/pull/117082

From d7c9977e092ee48d8bee2a2787af0d23b75cfee5 Mon Sep 17 00:00:00 2001
From: DianQK 
Date: Wed, 20 Nov 2024 19:52:51 +0800
Subject: [PATCH] [LICM] allow MemoryAccess creation failure (#116813)

Fixes #116809.

After running some passes (SimpleLoopUnswitch, LoopInstSimplify, etc.),
MemorySSA might be outdated, and the instruction `I` may have become a
non-memory touching instruction.

LICM has already handled this, but it does not pass
`CreationMustSucceed=false` to `createDefinedAccess`.

(cherry picked from commit 18b02bbf441660683df7f3925946984203d49bab)
---
 llvm/include/llvm/Analysis/MemorySSAUpdater.h |  5 ++
 llvm/lib/Analysis/MemorySSAUpdater.cpp| 13 -
 llvm/lib/Transforms/Scalar/LICM.cpp   |  5 +-
 .../LICM/PR116813-memoryssa-outdated.ll   | 50 +++
 4 files changed, 70 insertions(+), 3 deletions(-)
 create mode 100644 llvm/test/Transforms/LICM/PR116813-memoryssa-outdated.ll

diff --git a/llvm/include/llvm/Analysis/MemorySSAUpdater.h b/llvm/include/llvm/Analysis/MemorySSAUpdater.h
index d4da3ef1146db7..f598dedea75fd6 100644
--- a/llvm/include/llvm/Analysis/MemorySSAUpdater.h
+++ b/llvm/include/llvm/Analysis/MemorySSAUpdater.h
@@ -192,6 +192,11 @@ class MemorySSAUpdater {
                                        const BasicBlock *BB,
                                        MemorySSA::InsertionPlace Point);
 
+  MemoryAccess *createMemoryAccessInBB(Instruction *I, MemoryAccess *Definition,
+                                       const BasicBlock *BB,
+                                       MemorySSA::InsertionPlace Point,
+                                       bool CreationMustSucceed);
+
   /// Create a MemoryAccess in MemorySSA before an existing MemoryAccess.
   ///
   /// See createMemoryAccessInBB() for usage details.
diff --git a/llvm/lib/Analysis/MemorySSAUpdater.cpp b/llvm/lib/Analysis/MemorySSAUpdater.cpp
index aa550f0b6a7bfd..94061c949b7f85 100644
--- a/llvm/lib/Analysis/MemorySSAUpdater.cpp
+++ b/llvm/lib/Analysis/MemorySSAUpdater.cpp
@@ -1404,8 +1404,17 @@ void MemorySSAUpdater::changeToUnreachable(const Instruction *I) {
 MemoryAccess *MemorySSAUpdater::createMemoryAccessInBB(
     Instruction *I, MemoryAccess *Definition, const BasicBlock *BB,
     MemorySSA::InsertionPlace Point) {
-  MemoryUseOrDef *NewAccess = MSSA->createDefinedAccess(I, Definition);
-  MSSA->insertIntoListsForBlock(NewAccess, BB, Point);
+  return createMemoryAccessInBB(I, Definition, BB, Point,
+                                /*CreationMustSucceed=*/true);
+}
+
+MemoryAccess *MemorySSAUpdater::createMemoryAccessInBB(
+    Instruction *I, MemoryAccess *Definition, const BasicBlock *BB,
+    MemorySSA::InsertionPlace Point, bool CreationMustSucceed) {
+  MemoryUseOrDef *NewAccess = MSSA->createDefinedAccess(
+      I, Definition, /*Template=*/nullptr, CreationMustSucceed);
+  if (NewAccess)
+    MSSA->insertIntoListsForBlock(NewAccess, BB, Point);
   return NewAccess;
 }
 
diff --git a/llvm/lib/Transforms/Scalar/LICM.cpp b/llvm/lib/Transforms/Scalar/LICM.cpp
index 91ef2b4b7c1839..ca03eff7a4e25f 100644
--- a/llvm/lib/Transforms/Scalar/LICM.cpp
+++ b/llvm/lib/Transforms/Scalar/LICM.cpp
@@ -1464,8 +1464,11 @@ static Instruction *cloneInstructionInExitBlock(
 
   if (MSSAU.getMemorySSA()->getMemoryAccess(&I)) {
     // Create a new MemoryAccess and let MemorySSA set its defining access.
+    // After running some passes, MemorySSA might be outdated, and the
+    // instruction `I` may have become a non-memory touching instruction.
     MemoryAccess *NewMemAcc = MSSAU.createMemoryAccessInBB(
-        New, nullptr, New->getParent(), MemorySSA::Beginning);
+        New, nullptr, New->getParent(), MemorySSA::Beginning,
+        /*CreationMustSucceed=*/false);
     if (NewMemAcc) {
       if (auto *MemDef = dyn_cast<MemoryDef>(NewMemAcc))
         MSSAU.insertDef(MemDef, /*RenameUses=*/true);
diff --git a/llvm/test/Transforms/LICM/PR116813-memoryssa-outdated.ll b/llvm/test/Transforms/LICM/PR116813-memoryssa-outdated.ll
new file mode 100644
index 00..a040c3cc6947c6
--- /dev/null
+++ b/llvm/test/Transforms/LICM/PR116813-memoryssa-outdated.ll
@@ -0,0 +1,50 @@
+; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --version 5
+; RUN: opt -passes='loop-mssa(simple-loop-unswitch,licm)' -verify-memoryssa -S < %s | FileCheck %s
+
+; Check that running LICM after SimpleLoopUnswitch does not result in a crash.
+
+define i32 @foo(i1 %arg, ptr %arg1) {
+; CHECK-LABEL: define i32 @foo(
+; CHECK-SAME: i1 [[ARG:%.*]], ptr [[ARG1:%.*]]) {
+; CHECK-NEXT:  [[START:.*:]]
+; CHECK-NEXT:[[ARG_FR:%.*]] = freeze i1 [[ARG]]
+; CHECK-NEXT:br i1 [[ARG_FR]], label %[[START_SPLIT_US:.*]], label %[[START_SPLIT:.*]]
+; CHECK:   [[START_SPLIT_US]]:
+; CHECK-NEXT:br label %[[LOOP_US:.*]]
+; CHECK:   [[LOOP_US]]:
+; CHECK-NEXT:br label %[[BB0:.*]]
+; CHECK:   [[BB0]

[llvm-branch-commits] [llvm] release/19.x: [LICM] allow MemoryAccess creation failure (#116813) (PR #117082)

2024-11-21 Thread via llvm-branch-commits


@@ -192,6 +192,12 @@ class MemorySSAUpdater {
                                        const BasicBlock *BB,
                                        MemorySSA::InsertionPlace Point);
 
+  MemoryAccess *createMemoryAccessInBB2(Instruction *I,
+                                        MemoryAccess *Definition,
+                                        const BasicBlock *BB,
+                                        MemorySSA::InsertionPlace Point,
+                                        bool CreationMustSucceed = true);

DianQK wrote:

Ah, yes! :3

https://github.com/llvm/llvm-project/pull/117082


[llvm-branch-commits] [llvm] release/19.x: [LICM] allow MemoryAccess creation failure (#116813) (PR #117082)

2024-11-21 Thread Nikita Popov via llvm-branch-commits


@@ -192,6 +192,12 @@ class MemorySSAUpdater {
                                        const BasicBlock *BB,
                                        MemorySSA::InsertionPlace Point);
 
+  MemoryAccess *createMemoryAccessInBB2(Instruction *I,
+                                        MemoryAccess *Definition,
+                                        const BasicBlock *BB,
+                                        MemorySSA::InsertionPlace Point,
+                                        bool CreationMustSucceed = true);

nikic wrote:

```suggestion
  MemoryAccess *createMemoryAccessInBB(Instruction *I,
   MemoryAccess *Definition,
   const BasicBlock *BB,
   MemorySSA::InsertionPlace Point,
   bool CreationMustSucceed);
```
This can be an overload with the extra parameter; there is no need to use a
different name.
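
To illustrate the overload approach suggested above, here is a minimal, self-contained sketch. `Access`, `Updater`, and `createAccess` are hypothetical stand-ins invented for this example, not the LLVM classes; the sketch only assumes that creation may legitimately yield no access when the instruction no longer touches memory.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical stand-in for a memory access; NOT llvm::MemoryAccess.
struct Access {
  int Id;
};

// Hypothetical stand-in for an updater; NOT llvm::MemorySSAUpdater.
class Updater {
  std::vector<std::unique_ptr<Access>> Accesses;

  // Returns nullptr when the instruction does not touch memory. In strict
  // mode that situation is treated as a programming error.
  Access *createDefinedAccess(bool TouchesMemory, bool CreationMustSucceed) {
    if (!TouchesMemory) {
      assert(!CreationMustSucceed && "expected a memory-touching instruction");
      return nullptr;
    }
    int Id = static_cast<int>(Accesses.size());
    Accesses.push_back(std::make_unique<Access>(Access{Id}));
    return Accesses.back().get();
  }

public:
  // Existing entry point: current callers keep the strict behaviour.
  Access *createAccess(bool TouchesMemory) {
    return createAccess(TouchesMemory, /*CreationMustSucceed=*/true);
  }

  // Overload with the extra parameter: same name, no default argument.
  Access *createAccess(bool TouchesMemory, bool CreationMustSucceed) {
    return createDefinedAccess(TouchesMemory, CreationMustSucceed);
  }

  std::size_t size() const { return Accesses.size(); }
};
```

A caller that can tolerate failure passes `/*CreationMustSucceed=*/false` and checks the result for `nullptr` before using it, which mirrors the `if (NewAccess)` guard in the patch.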

https://github.com/llvm/llvm-project/pull/117082


[llvm-branch-commits] [flang] [MLIR][OpenMP] Add Lowering support for OpenMP Declare Mapper directive (PR #117046)

2024-11-21 Thread Tom Eccles via llvm-branch-commits


@@ -21,7 +21,7 @@ subroutine declare_mapper_1
   type (my_type2):: t
   real   :: x, y(nvals)
   !$omp declare mapper (my_type :: var) map (var, var%values (1:var%num_vals))
-!CHECK: not yet implemented: OpenMPDeclareMapperConstruct
+!CHECK: not yet implemented: lowering symbol to HLFIR

tblah wrote:

I'm surprised to see this TODO come up. Please could you fix this before 
merging so that we can maintain a helpful error message for the user.

https://github.com/llvm/llvm-project/pull/117046


[llvm-branch-commits] [flang] [MLIR][OpenMP] Add Lowering support for OpenMP Declare Mapper directive (PR #117046)

2024-11-21 Thread Tom Eccles via llvm-branch-commits


@@ -2701,7 +2702,39 @@ static void
 genOMP(lower::AbstractConverter &converter, lower::SymMap &symTable,
semantics::SemanticsContext &semaCtx, lower::pft::Evaluation &eval,
const parser::OpenMPDeclareMapperConstruct &declareMapperConstruct) {
-  TODO(converter.getCurrentLocation(), "OpenMPDeclareMapperConstruct");
+  fir::FirOpBuilder &firOpBuilder = converter.getFirOpBuilder();
+  lower::StatementContext stmtCtx;
+  const auto &spec =
+  std::get(declareMapperConstruct.t);
+  const auto &mapperName{std::get>(spec.t)};
+  const auto &varType{std::get(spec.t)};
+  const auto &varName{std::get(spec.t)};
+  std::stringstream mapperNameStr;
+  if (mapperName.has_value()) {
+mapperNameStr << mapperName->ToString();
+  } else {
+mapperNameStr << "default_"
+  << varType.declTypeSpec->derivedTypeSpec().name().ToString();
+  }

tblah wrote:

Two nits. Feel free to ignore number 2.
1. Flang **lowering** follows the MLIR style guide, which in this case matches
LLVM:
https://llvm.org/docs/CodingStandards.html#don-t-use-braces-on-simple-single-statement-bodies-of-if-else-loop-statements
2. To me, a `std::stringstream` feels like overkill here. You could use a
`std::string`, with the concatenation in the else branch handled by an implicit
`Twine` (https://llvm.org/docs/ProgrammersManual.html#llvm-adt-twine-h).
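
As a rough sketch of both nits, assuming plain `std::string` concatenation in place of `Twine` so that the example builds without LLVM headers: `mapperNameFor` and its parameters are hypothetical names chosen for illustration and do not appear in the patch.

```cpp
#include <cassert>
#include <optional>
#include <string>

// Hypothetical helper: returns the explicit mapper name when one was given,
// otherwise derives "default_<type name>".
std::string mapperNameFor(const std::optional<std::string> &mapperName,
                          const std::string &typeName) {
  // Single-statement bodies, so no braces (LLVM coding standard).
  if (mapperName)
    return *mapperName;
  return "default_" + typeName;
}
```

With this shape the `std::stringstream` disappears entirely, and the result can be passed on directly as a `std::string`.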

https://github.com/llvm/llvm-project/pull/117046


[llvm-branch-commits] [llvm] release/19.x: [SCEV] Fix sext handling for `getConstantMultiple` (#117093) (PR #117136)

2024-11-21 Thread via llvm-branch-commits

llvmbot wrote:




@llvm/pr-subscribers-llvm-transforms

Author: None (llvmbot)


Changes

Backport 458dfbd855806461b4508bf8845cafe0411dbfd4

Requested by: @dtcxzyw

---
Full diff: https://github.com/llvm/llvm-project/pull/117136.diff


3 Files Affected:

- (modified) llvm/lib/Analysis/ScalarEvolution.cpp (+3-1) 
- (added) llvm/test/Analysis/ScalarEvolution/pr116483.ll (+26) 
- (added) llvm/test/Transforms/IndVarSimplify/pr116483.ll (+36) 


``diff
diff --git a/llvm/lib/Analysis/ScalarEvolution.cpp b/llvm/lib/Analysis/ScalarEvolution.cpp
index 51cffac8087689..412cfe73d3e559 100644
--- a/llvm/lib/Analysis/ScalarEvolution.cpp
+++ b/llvm/lib/Analysis/ScalarEvolution.cpp
@@ -6313,8 +6313,10 @@ APInt ScalarEvolution::getConstantMultipleImpl(const SCEV *S) {
     return getConstantMultiple(Z->getOperand()).zext(BitWidth);
   }
   case scSignExtend: {
+    // Only multiples that are a power of 2 will hold after sext.
     const SCEVSignExtendExpr *E = cast<SCEVSignExtendExpr>(S);
-    return getConstantMultiple(E->getOperand()).sext(BitWidth);
+    uint32_t TZ = getMinTrailingZeros(E->getOperand());
+    return GetShiftedByZeros(TZ);
   }
   case scMulExpr: {
     const SCEVMulExpr *M = cast<SCEVMulExpr>(S);
diff --git a/llvm/test/Analysis/ScalarEvolution/pr116483.ll b/llvm/test/Analysis/ScalarEvolution/pr116483.ll
new file mode 100644
index 00..cc2334e9c64f92
--- /dev/null
+++ b/llvm/test/Analysis/ScalarEvolution/pr116483.ll
@@ -0,0 +1,26 @@
+; NOTE: Assertions have been autogenerated by utils/update_analyze_test_checks.py UTC_ARGS: --version 5
+; RUN: opt -S -disable-output "-passes=print<scalar-evolution>" < %s 2>&1 | FileCheck %s
+
+define i16 @test() {
+; CHECK-LABEL: 'test'
+; CHECK-NEXT:  Classifying expressions for: @test
+; CHECK-NEXT:%xor = xor i32 0, 3
+; CHECK-NEXT:--> %xor U: [3,4) S: [3,4)
+; CHECK-NEXT:%mul = mul i32 %xor, 329
+; CHECK-NEXT:--> (329 * %xor) U: [987,988) S: [987,988)
+; CHECK-NEXT:%conv = trunc i32 %mul to i16
+; CHECK-NEXT:--> (329 * (trunc i32 %xor to i16)) U: [987,988) S: [987,988)
+; CHECK-NEXT:%sext = shl i16 %conv, 8
+; CHECK-NEXT:--> (18688 * (trunc i32 %xor to i16)) U: [-9472,-9471) S: [-9472,-9471)
+; CHECK-NEXT:%conv1 = ashr i16 %sext, 8
+; CHECK-NEXT:--> (sext i8 (73 * (trunc i32 %xor to i8)) to i16) U: [-37,-36) S: [-37,-36)
+; CHECK-NEXT:  Determining loop execution counts for: @test
+;
+entry:
+  %xor = xor i32 0, 3
+  %mul = mul i32 %xor, 329
+  %conv = trunc i32 %mul to i16
+  %sext = shl i16 %conv, 8
+  %conv1 = ashr i16 %sext, 8
+  ret i16 %conv1
+}
diff --git a/llvm/test/Transforms/IndVarSimplify/pr116483.ll b/llvm/test/Transforms/IndVarSimplify/pr116483.ll
new file mode 100644
index 00..ae108a525223e0
--- /dev/null
+++ b/llvm/test/Transforms/IndVarSimplify/pr116483.ll
@@ -0,0 +1,36 @@
+; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --version 5
+; RUN: opt -S -passes=indvars < %s | FileCheck %s
+
+define i32 @test() {
+; CHECK-LABEL: define i32 @test() {
+; CHECK-NEXT:  [[ENTRY:.*:]]
+; CHECK-NEXT:[[XOR:%.*]] = xor i32 0, 3
+; CHECK-NEXT:[[MUL:%.*]] = mul i32 [[XOR]], 329
+; CHECK-NEXT:[[CONV:%.*]] = trunc i32 [[MUL]] to i16
+; CHECK-NEXT:[[SEXT:%.*]] = shl i16 [[CONV]], 8
+; CHECK-NEXT:[[CONV1:%.*]] = ashr i16 [[SEXT]], 8
+; CHECK-NEXT:br label %[[LOOP_BODY:.*]]
+; CHECK:   [[LOOP_BODY]]:
+; CHECK-NEXT:br i1 true, label %[[EXIT:.*]], label %[[LOOP_BODY]]
+; CHECK:   [[EXIT]]:
+; CHECK-NEXT:[[CONV3:%.*]] = zext i16 [[CONV1]] to i32
+; CHECK-NEXT:ret i32 [[CONV3]]
+;
+entry:
+  %xor = xor i32 0, 3
+  %mul = mul i32 %xor, 329
+  %conv = trunc i32 %mul to i16
+  %sext = shl i16 %conv, 8
+  %conv1 = ashr i16 %sext, 8
+  %conv3 = zext i16 %conv1 to i32
+  br label %loop.body
+
+loop.body:
+  %indvar = phi i32 [ %indvar.inc, %loop.body ], [ 1, %entry ]
+  %indvar.inc = add nuw i32 %indvar, 1
+  %exitcond = icmp eq i32 %indvar, %conv3
+  br i1 %exitcond, label %exit, label %loop.body
+
+exit:
+  ret i32 %conv3
+}

``




https://github.com/llvm/llvm-project/pull/117136


[llvm-branch-commits] [llvm] release/19.x: [ConstraintElim] Bail out on non-dedicated exits when adding exiting conditions (#116627) (PR #117137)

2024-11-21 Thread via llvm-branch-commits

llvmbot wrote:




@llvm/pr-subscribers-llvm-transforms

Author: None (llvmbot)


Changes

Backport 52361d0368b79841be12156bf03cf8c1851e5df7

Requested by: @antoniofrighetto

---
Full diff: https://github.com/llvm/llvm-project/pull/117137.diff


2 Files Affected:

- (modified) llvm/lib/Transforms/Scalar/ConstraintElimination.cpp (+8-5) 
- (modified) llvm/test/Transforms/ConstraintElimination/induction-condition-in-loop-exit.ll (+44) 


``diff
diff --git a/llvm/lib/Transforms/Scalar/ConstraintElimination.cpp b/llvm/lib/Transforms/Scalar/ConstraintElimination.cpp
index 37022104d0a9bd..d1c80aa6712433 100644
--- a/llvm/lib/Transforms/Scalar/ConstraintElimination.cpp
+++ b/llvm/lib/Transforms/Scalar/ConstraintElimination.cpp
@@ -1033,9 +1033,9 @@ void State::addInfoForInductions(BasicBlock &BB) {
       DTN, CmpInst::ICMP_SLT, PN, B,
       ConditionTy(CmpInst::ICMP_SLE, StartValue, B)));
 
-  // Try to add condition from header to the exit blocks. When exiting either
-  // with EQ or NE in the header, we know that the induction value must be u<=
-  // B, as other exits may only exit earlier.
+  // Try to add condition from header to the dedicated exit blocks. When exiting
+  // either with EQ or NE in the header, we know that the induction value must
+  // be u<= B, as other exits may only exit earlier.
   assert(!StepOffset.isNegative() && "induction must be increasing");
   assert((Pred == CmpInst::ICMP_EQ || Pred == CmpInst::ICMP_NE) &&
          "unsupported predicate");
@@ -1043,8 +1043,11 @@ void State::addInfoForInductions(BasicBlock &BB) {
   SmallVector<BasicBlock *> ExitBBs;
   L->getExitBlocks(ExitBBs);
   for (BasicBlock *EB : ExitBBs) {
-    WorkList.emplace_back(FactOrCheck::getConditionFact(
-        DT.getNode(EB), CmpInst::ICMP_ULE, A, B, Precond));
+    // Bail out on non-dedicated exits.
+    if (DT.dominates(&BB, EB)) {
+      WorkList.emplace_back(FactOrCheck::getConditionFact(
+          DT.getNode(EB), CmpInst::ICMP_ULE, A, B, Precond));
+    }
   }
 }
 
diff --git a/llvm/test/Transforms/ConstraintElimination/induction-condition-in-loop-exit.ll b/llvm/test/Transforms/ConstraintElimination/induction-condition-in-loop-exit.ll
index 15e1d843726278..a04b06e1bf0a52 100644
--- a/llvm/test/Transforms/ConstraintElimination/induction-condition-in-loop-exit.ll
+++ b/llvm/test/Transforms/ConstraintElimination/induction-condition-in-loop-exit.ll
@@ -763,3 +763,47 @@ exit.2:
   %t.2 = icmp ult i32 %iv, %N
   ret i1 %t.2
 }
+
+define i1 @test_non_dedicated_exit(i16 %n) {
+; CHECK-LABEL: define i1 @test_non_dedicated_exit(
+; CHECK-SAME: i16 [[N:%.*]]) {
+; CHECK-NEXT:  [[ENTRY:.*:]]
+; CHECK-NEXT:[[COND:%.*]] = icmp slt i16 [[N]], 1
+; CHECK-NEXT:br i1 [[COND]], label %[[EXIT:.*]], label %[[LOOP_PREHEADER:.*]]
+; CHECK:   [[LOOP_PREHEADER]]:
+; CHECK-NEXT:[[SUB:%.*]] = add nsw i16 [[N]], -1
+; CHECK-NEXT:[[EXT:%.*]] = zext nneg i16 [[SUB]] to i32
+; CHECK-NEXT:br label %[[LOOP:.*]]
+; CHECK:   [[LOOP]]:
+; CHECK-NEXT:[[INDVAR:%.*]] = phi i32 [ [[INDVAR_INC:%.*]], %[[LOOP_LATCH:.*]] ], [ 0, %[[LOOP_PREHEADER]] ]
+; CHECK-NEXT:[[EXITCOND:%.*]] = icmp eq i32 [[INDVAR]], [[EXT]]
+; CHECK-NEXT:br i1 [[EXITCOND]], label %[[EXIT]], label %[[LOOP_LATCH]]
+; CHECK:   [[LOOP_LATCH]]:
+; CHECK-NEXT:[[INDVAR_INC]] = add nuw nsw i32 [[INDVAR]], 1
+; CHECK-NEXT:br label %[[LOOP]]
+; CHECK:   [[EXIT]]:
+; CHECK-NEXT:[[CMP:%.*]] = icmp sgt i16 [[N]], 0
+; CHECK-NEXT:ret i1 [[CMP]]
+;
+entry:
+  %cond = icmp slt i16 %n, 1
+  br i1 %cond, label %exit, label %loop.preheader
+
+loop.preheader:
+  %sub = add nsw i16 %n, -1
+  %ext = zext nneg i16 %sub to i32
+  br label %loop
+
+loop:
+  %indvar = phi i32 [ %indvar.inc, %loop.latch ], [ 0, %loop.preheader ]
+  %exitcond = icmp eq i32 %indvar, %ext
+  br i1 %exitcond, label %exit, label %loop.latch
+
+loop.latch:
+  %indvar.inc = add nuw nsw i32 %indvar, 1
+  br label %loop
+
+exit:
+  %cmp = icmp sgt i16 %n, 0
+  ret i1 %cmp
+}

``




https://github.com/llvm/llvm-project/pull/117137


[llvm-branch-commits] [llvm] release/19.x: [ConstraintElim] Bail out on non-dedicated exits when adding exiting conditions (#116627) (PR #117137)

2024-11-21 Thread via llvm-branch-commits

https://github.com/llvmbot milestoned https://github.com/llvm/llvm-project/pull/117137


[llvm-branch-commits] [llvm] release/19.x: [ConstraintElim] Bail out on non-dedicated exits when adding exiting conditions (#116627) (PR #117137)

2024-11-21 Thread via llvm-branch-commits

https://github.com/llvmbot created https://github.com/llvm/llvm-project/pull/117137

Backport 52361d0368b79841be12156bf03cf8c1851e5df7

Requested by: @antoniofrighetto

From 4e3f5191928641fdf7298ee21fdf09ab0f17a53e Mon Sep 17 00:00:00 2001
From: Yingwei Zheng 
Date: Mon, 18 Nov 2024 23:41:04 +0800
Subject: [PATCH] [ConstraintElim] Bail out on non-dedicated exits when adding
 exiting conditions (#116627)

This patch bails out non-dedicated exits to avoid adding exiting
conditions to invalid context.
Closes https://github.com/llvm/llvm-project/issues/116553.

(cherry picked from commit 52361d0368b79841be12156bf03cf8c1851e5df7)
---
 .../Scalar/ConstraintElimination.cpp  | 13 +++---
 .../induction-condition-in-loop-exit.ll   | 44 +++
 2 files changed, 52 insertions(+), 5 deletions(-)

diff --git a/llvm/lib/Transforms/Scalar/ConstraintElimination.cpp b/llvm/lib/Transforms/Scalar/ConstraintElimination.cpp
index 37022104d0a9bd..d1c80aa6712433 100644
--- a/llvm/lib/Transforms/Scalar/ConstraintElimination.cpp
+++ b/llvm/lib/Transforms/Scalar/ConstraintElimination.cpp
@@ -1033,9 +1033,9 @@ void State::addInfoForInductions(BasicBlock &BB) {
       DTN, CmpInst::ICMP_SLT, PN, B,
       ConditionTy(CmpInst::ICMP_SLE, StartValue, B)));
 
-  // Try to add condition from header to the exit blocks. When exiting either
-  // with EQ or NE in the header, we know that the induction value must be u<=
-  // B, as other exits may only exit earlier.
+  // Try to add condition from header to the dedicated exit blocks. When exiting
+  // either with EQ or NE in the header, we know that the induction value must
+  // be u<= B, as other exits may only exit earlier.
   assert(!StepOffset.isNegative() && "induction must be increasing");
   assert((Pred == CmpInst::ICMP_EQ || Pred == CmpInst::ICMP_NE) &&
          "unsupported predicate");
@@ -1043,8 +1043,11 @@ void State::addInfoForInductions(BasicBlock &BB) {
   SmallVector<BasicBlock *> ExitBBs;
   L->getExitBlocks(ExitBBs);
   for (BasicBlock *EB : ExitBBs) {
-    WorkList.emplace_back(FactOrCheck::getConditionFact(
-        DT.getNode(EB), CmpInst::ICMP_ULE, A, B, Precond));
+    // Bail out on non-dedicated exits.
+    if (DT.dominates(&BB, EB)) {
+      WorkList.emplace_back(FactOrCheck::getConditionFact(
+          DT.getNode(EB), CmpInst::ICMP_ULE, A, B, Precond));
+    }
   }
 }
 
diff --git a/llvm/test/Transforms/ConstraintElimination/induction-condition-in-loop-exit.ll b/llvm/test/Transforms/ConstraintElimination/induction-condition-in-loop-exit.ll
index 15e1d843726278..a04b06e1bf0a52 100644
--- a/llvm/test/Transforms/ConstraintElimination/induction-condition-in-loop-exit.ll
+++ b/llvm/test/Transforms/ConstraintElimination/induction-condition-in-loop-exit.ll
@@ -763,3 +763,47 @@ exit.2:
   %t.2 = icmp ult i32 %iv, %N
   ret i1 %t.2
 }
+
+define i1 @test_non_dedicated_exit(i16 %n) {
+; CHECK-LABEL: define i1 @test_non_dedicated_exit(
+; CHECK-SAME: i16 [[N:%.*]]) {
+; CHECK-NEXT:  [[ENTRY:.*:]]
+; CHECK-NEXT:[[COND:%.*]] = icmp slt i16 [[N]], 1
+; CHECK-NEXT:br i1 [[COND]], label %[[EXIT:.*]], label %[[LOOP_PREHEADER:.*]]
+; CHECK:   [[LOOP_PREHEADER]]:
+; CHECK-NEXT:[[SUB:%.*]] = add nsw i16 [[N]], -1
+; CHECK-NEXT:[[EXT:%.*]] = zext nneg i16 [[SUB]] to i32
+; CHECK-NEXT:br label %[[LOOP:.*]]
+; CHECK:   [[LOOP]]:
+; CHECK-NEXT:[[INDVAR:%.*]] = phi i32 [ [[INDVAR_INC:%.*]], %[[LOOP_LATCH:.*]] ], [ 0, %[[LOOP_PREHEADER]] ]
+; CHECK-NEXT:[[EXITCOND:%.*]] = icmp eq i32 [[INDVAR]], [[EXT]]
+; CHECK-NEXT:br i1 [[EXITCOND]], label %[[EXIT]], label %[[LOOP_LATCH]]
+; CHECK:   [[LOOP_LATCH]]:
+; CHECK-NEXT:[[INDVAR_INC]] = add nuw nsw i32 [[INDVAR]], 1
+; CHECK-NEXT:br label %[[LOOP]]
+; CHECK:   [[EXIT]]:
+; CHECK-NEXT:[[CMP:%.*]] = icmp sgt i16 [[N]], 0
+; CHECK-NEXT:ret i1 [[CMP]]
+;
+entry:
+  %cond = icmp slt i16 %n, 1
+  br i1 %cond, label %exit, label %loop.preheader
+
+loop.preheader:
+  %sub = add nsw i16 %n, -1
+  %ext = zext nneg i16 %sub to i32
+  br label %loop
+
+loop:
+  %indvar = phi i32 [ %indvar.inc, %loop.latch ], [ 0, %loop.preheader ]
+  %exitcond = icmp eq i32 %indvar, %ext
+  br i1 %exitcond, label %exit, label %loop.latch
+
+loop.latch:
+  %indvar.inc = add nuw nsw i32 %indvar, 1
+  br label %loop
+
+exit:
+  %cmp = icmp sgt i16 %n, 0
+  ret i1 %cmp
+}
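
To make the dominance check in this patch concrete, the following sketch computes dominators for the control-flow graph of `@test_non_dedicated_exit` with a simple iterative algorithm. It is an illustration only: the block numbering and the helpers `computeDominators`, `dominates`, and `nonDedicatedExitCFG` are assumptions made for this example and are unrelated to LLVM's `DominatorTree` implementation. It also assumes every block is reachable from the entry.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Toy CFG: Preds[B] lists the predecessors of block B; block 0 is the entry.
// Returns Dom, where bit A of Dom[B] is set iff block A dominates block B.
std::vector<uint32_t>
computeDominators(const std::vector<std::vector<int>> &Preds) {
  const int N = static_cast<int>(Preds.size());
  assert(N > 0 && N <= 32 && "sketch supports between 1 and 32 blocks");
  const uint32_t All = N == 32 ? ~0u : ((1u << N) - 1);
  std::vector<uint32_t> Dom(N, All);
  Dom[0] = 1u; // The entry block is dominated only by itself.
  bool Changed = true;
  while (Changed) {
    Changed = false;
    for (int B = 1; B < N; ++B) {
      // A block's dominators are itself plus whatever dominates all of its
      // predecessors.
      uint32_t Meet = All;
      for (int P : Preds[B])
        Meet &= Dom[P];
      uint32_t New = Meet | (1u << B);
      if (New != Dom[B]) {
        Dom[B] = New;
        Changed = true;
      }
    }
  }
  return Dom;
}

bool dominates(const std::vector<uint32_t> &Dom, int A, int B) {
  return ((Dom[B] >> A) & 1u) != 0;
}

// CFG of @test_non_dedicated_exit, numbered for this sketch:
// 0 = entry, 1 = loop.preheader, 2 = loop, 3 = loop.latch, 4 = exit.
std::vector<std::vector<int>> nonDedicatedExitCFG() {
  return {
      {},     // entry
      {0},    // loop.preheader <- entry
      {1, 3}, // loop <- loop.preheader, loop.latch
      {2},    // loop.latch <- loop
      {0, 2}, // exit <- entry, loop
  };
}
```

Here `exit` has a predecessor outside the loop (`entry`), so the loop header does not dominate it and `dominates(Dom, 2, 4)` is false. That is the situation in which the patched code skips adding the exit condition, since control can reach `exit` without ever passing through the header.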



[llvm-branch-commits] [llvm] release/19.x: [ConstraintElim] Bail out on non-dedicated exits when adding exiting conditions (#116627) (PR #117137)

2024-11-21 Thread via llvm-branch-commits

llvmbot wrote:

@fhahn What do you think about merging this PR to the release branch?

https://github.com/llvm/llvm-project/pull/117137


[llvm-branch-commits] [llvm] [LoongArch] Fix GOT usage for `non-dso_local` function calls in large code model (PR #117134)

2024-11-21 Thread via llvm-branch-commits

https://github.com/wangleiat milestoned https://github.com/llvm/llvm-project/pull/117134


[llvm-branch-commits] [llvm] release/19.x: [SCEV] Fix sext handling for `getConstantMultiple` (#117093) (PR #117136)

2024-11-21 Thread via llvm-branch-commits

https://github.com/llvmbot created https://github.com/llvm/llvm-project/pull/117136

Backport 458dfbd855806461b4508bf8845cafe0411dbfd4

Requested by: @dtcxzyw

From f6c67ad7a20fe7bb535242c78b8f06cacc48d521 Mon Sep 17 00:00:00 2001
From: Yingwei Zheng 
Date: Thu, 21 Nov 2024 17:23:04 +0800
Subject: [PATCH] [SCEV] Fix sext handling for `getConstantMultiple` (#117093)

Counterexample: 219 is a multiple of 73. But `sext i8 219 to i16 =
65499` is not.
Fixes https://github.com/llvm/llvm-project/issues/116483.

(cherry picked from commit 458dfbd855806461b4508bf8845cafe0411dbfd4)
---
 llvm/lib/Analysis/ScalarEvolution.cpp |  4 ++-
 .../test/Analysis/ScalarEvolution/pr116483.ll | 26 ++
 .../Transforms/IndVarSimplify/pr116483.ll | 36 +++
 3 files changed, 65 insertions(+), 1 deletion(-)
 create mode 100644 llvm/test/Analysis/ScalarEvolution/pr116483.ll
 create mode 100644 llvm/test/Transforms/IndVarSimplify/pr116483.ll

diff --git a/llvm/lib/Analysis/ScalarEvolution.cpp b/llvm/lib/Analysis/ScalarEvolution.cpp
index 51cffac8087689..412cfe73d3e559 100644
--- a/llvm/lib/Analysis/ScalarEvolution.cpp
+++ b/llvm/lib/Analysis/ScalarEvolution.cpp
@@ -6313,8 +6313,10 @@ APInt ScalarEvolution::getConstantMultipleImpl(const SCEV *S) {
     return getConstantMultiple(Z->getOperand()).zext(BitWidth);
   }
   case scSignExtend: {
+    // Only multiples that are a power of 2 will hold after sext.
     const SCEVSignExtendExpr *E = cast<SCEVSignExtendExpr>(S);
-    return getConstantMultiple(E->getOperand()).sext(BitWidth);
+    uint32_t TZ = getMinTrailingZeros(E->getOperand());
+    return GetShiftedByZeros(TZ);
   }
   case scMulExpr: {
     const SCEVMulExpr *M = cast<SCEVMulExpr>(S);
diff --git a/llvm/test/Analysis/ScalarEvolution/pr116483.ll b/llvm/test/Analysis/ScalarEvolution/pr116483.ll
new file mode 100644
index 00..cc2334e9c64f92
--- /dev/null
+++ b/llvm/test/Analysis/ScalarEvolution/pr116483.ll
@@ -0,0 +1,26 @@
+; NOTE: Assertions have been autogenerated by utils/update_analyze_test_checks.py UTC_ARGS: --version 5
+; RUN: opt -S -disable-output "-passes=print<scalar-evolution>" < %s 2>&1 | FileCheck %s
+
+define i16 @test() {
+; CHECK-LABEL: 'test'
+; CHECK-NEXT:  Classifying expressions for: @test
+; CHECK-NEXT:%xor = xor i32 0, 3
+; CHECK-NEXT:--> %xor U: [3,4) S: [3,4)
+; CHECK-NEXT:%mul = mul i32 %xor, 329
+; CHECK-NEXT:--> (329 * %xor) U: [987,988) S: [987,988)
+; CHECK-NEXT:%conv = trunc i32 %mul to i16
+; CHECK-NEXT:--> (329 * (trunc i32 %xor to i16)) U: [987,988) S: [987,988)
+; CHECK-NEXT:%sext = shl i16 %conv, 8
+; CHECK-NEXT:--> (18688 * (trunc i32 %xor to i16)) U: [-9472,-9471) S: [-9472,-9471)
+; CHECK-NEXT:%conv1 = ashr i16 %sext, 8
+; CHECK-NEXT:--> (sext i8 (73 * (trunc i32 %xor to i8)) to i16) U: [-37,-36) S: [-37,-36)
+; CHECK-NEXT:  Determining loop execution counts for: @test
+;
+entry:
+  %xor = xor i32 0, 3
+  %mul = mul i32 %xor, 329
+  %conv = trunc i32 %mul to i16
+  %sext = shl i16 %conv, 8
+  %conv1 = ashr i16 %sext, 8
+  ret i16 %conv1
+}
diff --git a/llvm/test/Transforms/IndVarSimplify/pr116483.ll b/llvm/test/Transforms/IndVarSimplify/pr116483.ll
new file mode 100644
index 00..ae108a525223e0
--- /dev/null
+++ b/llvm/test/Transforms/IndVarSimplify/pr116483.ll
@@ -0,0 +1,36 @@
+; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --version 5
+; RUN: opt -S -passes=indvars < %s | FileCheck %s
+
+define i32 @test() {
+; CHECK-LABEL: define i32 @test() {
+; CHECK-NEXT:  [[ENTRY:.*:]]
+; CHECK-NEXT:[[XOR:%.*]] = xor i32 0, 3
+; CHECK-NEXT:[[MUL:%.*]] = mul i32 [[XOR]], 329
+; CHECK-NEXT:[[CONV:%.*]] = trunc i32 [[MUL]] to i16
+; CHECK-NEXT:[[SEXT:%.*]] = shl i16 [[CONV]], 8
+; CHECK-NEXT:[[CONV1:%.*]] = ashr i16 [[SEXT]], 8
+; CHECK-NEXT:br label %[[LOOP_BODY:.*]]
+; CHECK:   [[LOOP_BODY]]:
+; CHECK-NEXT:br i1 true, label %[[EXIT:.*]], label %[[LOOP_BODY]]
+; CHECK:   [[EXIT]]:
+; CHECK-NEXT:[[CONV3:%.*]] = zext i16 [[CONV1]] to i32
+; CHECK-NEXT:ret i32 [[CONV3]]
+;
+entry:
+  %xor = xor i32 0, 3
+  %mul = mul i32 %xor, 329
+  %conv = trunc i32 %mul to i16
+  %sext = shl i16 %conv, 8
+  %conv1 = ashr i16 %sext, 8
+  %conv3 = zext i16 %conv1 to i32
+  br label %loop.body
+
+loop.body:
+  %indvar = phi i32 [ %indvar.inc, %loop.body ], [ 1, %entry ]
+  %indvar.inc = add nuw i32 %indvar, 1
+  %exitcond = icmp eq i32 %indvar, %conv3
+  br i1 %exitcond, label %exit, label %loop.body
+
+exit:
+  ret i32 %conv3
+}
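
The counterexample from the commit message can be checked with a few lines of plain integer arithmetic. This is only a sketch: `sext8To16`, `trailingZeros`, and `counterexampleHolds` are helper names invented here and are not part of ScalarEvolution. The narrowing cast to `int8_t` is assumed to wrap as two's complement, which C++20 guarantees and which earlier standards leave implementation-defined.

```cpp
#include <cassert>
#include <cstdint>

// Sign-extends an 8-bit pattern to 16 bits and returns the 16-bit pattern.
// Assumes two's-complement wrapping for the narrowing cast to int8_t.
constexpr uint16_t sext8To16(uint8_t V) {
  return static_cast<uint16_t>(static_cast<int16_t>(static_cast<int8_t>(V)));
}

// Counts trailing zero bits of a 16-bit value; returns 16 for zero.
constexpr unsigned trailingZeros(uint16_t V) {
  unsigned N = 0;
  while (N < 16 && ((V >> N) & 1u) == 0)
    ++N;
  return N;
}

// 219 is a multiple of 73, but its sign extension (65499) is not.
constexpr bool counterexampleHolds() {
  return 219 % 73 == 0 && sext8To16(219) == 65499 && 65499 % 73 != 0;
}
```

Sign extension copies the top bit upward and leaves the low bits untouched, so the number of trailing zeros survives it. For instance, 216 (`0xD8`) and its extension `0xFFD8` both have three trailing zeros, which is why only the power-of-two part of the multiple remains valid after `sext`.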



[llvm-branch-commits] [llvm] release/19.x: [SCEV] Fix sext handling for `getConstantMultiple` (#117093) (PR #117136)

2024-11-21 Thread via llvm-branch-commits

llvmbot wrote:

@antoniofrighetto What do you think about merging this PR to the release branch?

https://github.com/llvm/llvm-project/pull/117136


[llvm-branch-commits] [llvm] release/19.x: [SCEV] Fix sext handling for `getConstantMultiple` (#117093) (PR #117136)

2024-11-21 Thread via llvm-branch-commits

https://github.com/llvmbot milestoned https://github.com/llvm/llvm-project/pull/117136


[llvm-branch-commits] [llvm] release/19.x: [SCEV] Fix sext handling for `getConstantMultiple` (#117093) (PR #117136)

2024-11-21 Thread via llvm-branch-commits

llvmbot wrote:




@llvm/pr-subscribers-llvm-analysis

Author: None (llvmbot)


Changes

Backport 458dfbd855806461b4508bf8845cafe0411dbfd4

Requested by: @dtcxzyw

---
Full diff: https://github.com/llvm/llvm-project/pull/117136.diff


3 Files Affected:

- (modified) llvm/lib/Analysis/ScalarEvolution.cpp (+3-1) 
- (added) llvm/test/Analysis/ScalarEvolution/pr116483.ll (+26) 
- (added) llvm/test/Transforms/IndVarSimplify/pr116483.ll (+36) 


``diff
diff --git a/llvm/lib/Analysis/ScalarEvolution.cpp 
b/llvm/lib/Analysis/ScalarEvolution.cpp
index 51cffac8087689..412cfe73d3e559 100644
--- a/llvm/lib/Analysis/ScalarEvolution.cpp
+++ b/llvm/lib/Analysis/ScalarEvolution.cpp
@@ -6313,8 +6313,10 @@ APInt ScalarEvolution::getConstantMultipleImpl(const 
SCEV *S) {
 return getConstantMultiple(Z->getOperand()).zext(BitWidth);
   }
   case scSignExtend: {
+// Only multiples that are a power of 2 will hold after sext.
 const SCEVSignExtendExpr *E = cast(S);
-return getConstantMultiple(E->getOperand()).sext(BitWidth);
+uint32_t TZ = getMinTrailingZeros(E->getOperand());
+return GetShiftedByZeros(TZ);
   }
   case scMulExpr: {
 const SCEVMulExpr *M = cast(S);
diff --git a/llvm/test/Analysis/ScalarEvolution/pr116483.ll 
b/llvm/test/Analysis/ScalarEvolution/pr116483.ll
new file mode 100644
index 00..cc2334e9c64f92
--- /dev/null
+++ b/llvm/test/Analysis/ScalarEvolution/pr116483.ll
@@ -0,0 +1,26 @@
+; NOTE: Assertions have been autogenerated by 
utils/update_analyze_test_checks.py UTC_ARGS: --version 5
+; RUN: opt -S -disable-output "-passes=print" < %s 2>&1 | 
FileCheck %s
+
+define i16 @test() {
+; CHECK-LABEL: 'test'
+; CHECK-NEXT:  Classifying expressions for: @test
+; CHECK-NEXT:%xor = xor i32 0, 3
+; CHECK-NEXT:--> %xor U: [3,4) S: [3,4)
+; CHECK-NEXT:%mul = mul i32 %xor, 329
+; CHECK-NEXT:--> (329 * %xor) U: [987,988) S: [987,988)
+; CHECK-NEXT:%conv = trunc i32 %mul to i16
+; CHECK-NEXT:--> (329 * (trunc i32 %xor to i16)) U: [987,988) S: 
[987,988)
+; CHECK-NEXT:%sext = shl i16 %conv, 8
+; CHECK-NEXT:--> (18688 * (trunc i32 %xor to i16)) U: [-9472,-9471) 
S: [-9472,-9471)
+; CHECK-NEXT:%conv1 = ashr i16 %sext, 8
+; CHECK-NEXT:--> (sext i8 (73 * (trunc i32 %xor to i8)) to i16) U: 
[-37,-36) S: [-37,-36)
+; CHECK-NEXT:  Determining loop execution counts for: @test
+;
+entry:
+  %xor = xor i32 0, 3
+  %mul = mul i32 %xor, 329
+  %conv = trunc i32 %mul to i16
+  %sext = shl i16 %conv, 8
+  %conv1 = ashr i16 %sext, 8
+  ret i16 %conv1
+}
diff --git a/llvm/test/Transforms/IndVarSimplify/pr116483.ll 
b/llvm/test/Transforms/IndVarSimplify/pr116483.ll
new file mode 100644
index 00..ae108a525223e0
--- /dev/null
+++ b/llvm/test/Transforms/IndVarSimplify/pr116483.ll
@@ -0,0 +1,36 @@
+; NOTE: Assertions have been autogenerated by utils/update_test_checks.py 
UTC_ARGS: --version 5
+; RUN: opt -S -passes=indvars < %s | FileCheck %s
+
+define i32 @test() {
+; CHECK-LABEL: define i32 @test() {
+; CHECK-NEXT:  [[ENTRY:.*:]]
+; CHECK-NEXT:[[XOR:%.*]] = xor i32 0, 3
+; CHECK-NEXT:[[MUL:%.*]] = mul i32 [[XOR]], 329
+; CHECK-NEXT:[[CONV:%.*]] = trunc i32 [[MUL]] to i16
+; CHECK-NEXT:[[SEXT:%.*]] = shl i16 [[CONV]], 8
+; CHECK-NEXT:[[CONV1:%.*]] = ashr i16 [[SEXT]], 8
+; CHECK-NEXT:br label %[[LOOP_BODY:.*]]
+; CHECK:   [[LOOP_BODY]]:
+; CHECK-NEXT:br i1 true, label %[[EXIT:.*]], label %[[LOOP_BODY]]
+; CHECK:   [[EXIT]]:
+; CHECK-NEXT:[[CONV3:%.*]] = zext i16 [[CONV1]] to i32
+; CHECK-NEXT:ret i32 [[CONV3]]
+;
+entry:
+  %xor = xor i32 0, 3
+  %mul = mul i32 %xor, 329
+  %conv = trunc i32 %mul to i16
+  %sext = shl i16 %conv, 8
+  %conv1 = ashr i16 %sext, 8
+  %conv3 = zext i16 %conv1 to i32
+  br label %loop.body
+
+loop.body:
+  %indvar = phi i32 [ %indvar.inc, %loop.body ], [ 1, %entry ]
+  %indvar.inc = add nuw i32 %indvar, 1
+  %exitcond = icmp eq i32 %indvar, %conv3
+  br i1 %exitcond, label %exit, label %loop.body
+
+exit:
+  ret i32 %conv3
+}

``
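To see why sign-extending the constant multiple itself was unsound, here is a small standalone sketch. It uses plain C++ integers instead of LLVM's `APInt`, and the helper names (`sextI8ToI16`, `zextI8ToI16`, `minTrailingZeros`) are made up for illustration. It replays the i8 to i16 case from the test above: the 8-bit value 219 (73 * 3, or -37 when read as signed) is a multiple of 73, but its sign-extended 16-bit pattern is not, while a power-of-two factor survives the extension.

```cpp
#include <cassert>
#include <cstdint>

// Sign-extend an 8-bit pattern to 16 bits and return the resulting unsigned
// bit pattern. Illustrative helper, not an LLVM API.
inline std::uint16_t sextI8ToI16(std::uint8_t V) {
  return (V & 0x80u) ? static_cast<std::uint16_t>(0xFF00u | V)
                     : static_cast<std::uint16_t>(V);
}

// Zero-extend an 8-bit pattern to 16 bits.
inline std::uint16_t zextI8ToI16(std::uint8_t V) { return V; }

// Number of trailing zero bits in an 8-bit pattern (8 for the value 0).
// 2^TZ is the power-of-two factor that remains valid after sign extension.
inline unsigned minTrailingZeros(std::uint8_t V) {
  unsigned TZ = 0;
  while (TZ < 8 && ((V >> TZ) & 1u) == 0)
    ++TZ;
  return TZ;
}
```

With these helpers, `zextI8ToI16(219) % 73` is 0, but `sextI8ToI16(219)` is 65499 and `65499 % 73` is 18. For a value such as `0xF0`, `minTrailingZeros` gives 4 and the sign-extended pattern `0xFFF0` is still a multiple of 16.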




https://github.com/llvm/llvm-project/pull/117136


[llvm-branch-commits] [llvm] release/19.x: [SCEV] Fix sext handling for `getConstantMultiple` (#117093) (PR #117136)

2024-11-21 Thread Nikita Popov via llvm-branch-commits

https://github.com/nikic approved this pull request.


https://github.com/llvm/llvm-project/pull/117136


[llvm-branch-commits] [llvm] release/19.x: [LICM] allow MemoryAccess creation failure (#116813) (PR #117082)

2024-11-21 Thread Nikita Popov via llvm-branch-commits


@@ -190,7 +190,8 @@ class MemorySSAUpdater {
   /// inaccessible and it *must* have removeMemoryAccess called on it.
   MemoryAccess *createMemoryAccessInBB(Instruction *I, MemoryAccess 
*Definition,
const BasicBlock *BB,
-   MemorySSA::InsertionPlace Point);
+   MemorySSA::InsertionPlace Point,
+   bool CreationMustSucceed = true);

nikic wrote:

This is an ABI-breaking change. Instead of an optional argument, you need to 
add two functions and forward one to the other.
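One way to read this suggestion, as a minimal sketch with made-up names (`Registry`, `lookup` and `lookup2` are hypothetical and are not the `MemorySSAUpdater` API): adding a defaulted parameter changes the function's signature and therefore its mangled symbol, so binaries built against the old header would no longer resolve it. Keeping the old function untouched and forwarding it to a new one only adds a symbol.

```cpp
#include <cassert>

// Hypothetical stand-in for a class exported from a shared library.
class Registry {
public:
  // Existing entry point: the signature, and so the mangled symbol, stays
  // exactly as already-built clients expect.
  const int *lookup(int Key);

  // New entry point carrying the extra flag. It adds a symbol and leaves
  // the old one in place.
  const int *lookup2(int Key, bool MustSucceed);

private:
  int Slots[4] = {10, 20, 30, 40};
};

const int *Registry::lookup2(int Key, bool MustSucceed) {
  const bool Found = Key >= 0 && Key < 4;
  // Callers that require success still get the assertion.
  assert((Found || !MustSucceed) && "lookup was required to succeed");
  return Found ? &Slots[Key] : nullptr;
}

// The old function forwards to the new one and keeps the old behaviour.
const int *Registry::lookup(int Key) {
  return lookup2(Key, /*MustSucceed=*/true);
}
```

A caller that can tolerate failure uses `lookup2(Key, /*MustSucceed=*/false)` and checks the result for `nullptr`; every existing caller of `lookup` is unaffected.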

https://github.com/llvm/llvm-project/pull/117082


[llvm-branch-commits] [lld] [llvm] release/19.x: [MC][LoongArch] Change default cpu in `MCSubtargetInfo`. (#114922) (PR #117105)

2024-11-21 Thread via llvm-branch-commits

heiher wrote:

> Some tests need to be fixed.
> 
> ```
> Failed Tests (3):
>   LLVM :: CodeGen/LoongArch/e_flags.ll
>   lld :: ELF/emulation-loongarch.s
>   lld :: ELF/loongarch-interlink.test
> ```

Fixed.

https://github.com/llvm/llvm-project/pull/117105


[llvm-branch-commits] [llvm] [LoongArch] Fix GOT usage for `non-dso_local` function calls in large code model (PR #117134)

2024-11-21 Thread via llvm-branch-commits

https://github.com/wangleiat created 
https://github.com/llvm/llvm-project/pull/117134

This commit fixes an issue in the large code model where non-dso_local function 
calls did not use the GOT as expected in PIC mode. Instead, direct PC-relative 
access was incorrectly applied, leading to linker errors when building shared 
libraries.

For `ExternalSymbol`, it is not possible to determine whether it is dso_local 
during pseudo-instruction expansion. We use target flags to differentiate 
whether GOT should be used.

Cherry-picked from #117099 to fix linker errors when building shared 
libraries with the large code model.
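The difference between the old and new predicate can be sketched with a toy model. The struct, its fields and the enumerators below are assumptions for illustration, not the LLVM `MachineOperand` API, and the sketch assumes that call lowering has already tagged non-dso_local call targets with the PLT flag.

```cpp
#include <cassert>

// Simplified stand-ins for the target flags on a call operand.
enum TargetFlag : unsigned { MO_None = 0, MO_Call = 1, MO_CallPLT = 2 };

struct CallTarget {
  bool IsGlobal;    // true for a global value, false for an external symbol
  bool IsDSOLocal;  // only meaningful when IsGlobal is true
  TargetFlag Flags; // assumed to be set earlier, when the call was lowered
};

// Old predicate: an external symbol is not a global operand, so it can
// never select the GOT and always falls back to PC-relative addressing.
inline bool useGOTOld(const CallTarget &T) {
  return T.IsGlobal && !T.IsDSOLocal;
}

// New predicate: trust the flag recorded at lowering time, which works for
// global values and external symbols alike.
inline bool useGOTNew(const CallTarget &T) { return T.Flags == MO_CallPLT; }
```

For an external symbol such as `memset` (`{false, false, MO_CallPLT}` in this model) the old predicate returns false and the new one returns true, which matches the `%pc_hi20` to `%got_pc_hi20` change in the updated tests.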

>From 9616c8b70c9c272af93191624129dbf1f8992e41 Mon Sep 17 00:00:00 2001
From: wanglei 
Date: Thu, 21 Nov 2024 09:31:12 +0800
Subject: [PATCH] [LoongArch] Fix GOT usage for `non-dso_local` function calls
 in large code model

This commit fixes an issue in the large code model where non-dso_local
function calls did not use the GOT as expected in PIC mode. Instead,
direct PC-relative access was incorrectly applied, leading to linker
errors when building shared libraries.

For `ExternalSymbol`, it is not possible to determine whether it is
dso_local during pseudo-instruction expansion. We use target flags to
differentiate whether GOT should be used.

Cherry-picked from #117099 to fix linker errors when building
shared libraries with the large code model.
---
 .../LoongArch/LoongArchExpandPseudoInsts.cpp  |  2 +-
 llvm/test/CodeGen/LoongArch/code-models.ll| 10 ++---
 .../LoongArch/machinelicm-address-pseudos.ll  | 20 +-
 .../LoongArch/psabi-restricted-scheduling.ll  | 40 +--
 llvm/test/CodeGen/LoongArch/tls-models.ll | 20 +-
 5 files changed, 46 insertions(+), 46 deletions(-)

diff --git a/llvm/lib/Target/LoongArch/LoongArchExpandPseudoInsts.cpp 
b/llvm/lib/Target/LoongArch/LoongArchExpandPseudoInsts.cpp
index c136f5b3e515d7..e680dda7374d07 100644
--- a/llvm/lib/Target/LoongArch/LoongArchExpandPseudoInsts.cpp
+++ b/llvm/lib/Target/LoongArch/LoongArchExpandPseudoInsts.cpp
@@ -721,7 +721,7 @@ bool LoongArchExpandPseudo::expandFunctionCALL(
 IsTailCall ? LoongArch::PseudoJIRL_TAIL : LoongArch::PseudoJIRL_CALL;
 Register AddrReg = IsTailCall ? LoongArch::R19 : LoongArch::R1;
 
-bool UseGOT = Func.isGlobal() && !Func.getGlobal()->isDSOLocal();
+bool UseGOT = Func.getTargetFlags() == LoongArchII::MO_CALL_PLT;
 unsigned MO = UseGOT ? LoongArchII::MO_GOT_PC_HI : 
LoongArchII::MO_PCREL_LO;
 unsigned LAOpcode = UseGOT ? LoongArch::LDX_D : LoongArch::ADD_D;
 expandLargeAddressLoad(MBB, MBBI, NextMBBI, LAOpcode, MO, Func, AddrReg,
diff --git a/llvm/test/CodeGen/LoongArch/code-models.ll 
b/llvm/test/CodeGen/LoongArch/code-models.ll
index 4b2b72afaee171..4eb1e5e596fd3f 100644
--- a/llvm/test/CodeGen/LoongArch/code-models.ll
+++ b/llvm/test/CodeGen/LoongArch/code-models.ll
@@ -82,11 +82,11 @@ define void @call_external_sym(ptr %dst) {
 ; LARGE-NEXT:.cfi_offset 1, -8
 ; LARGE-NEXT:ori $a2, $zero, 1000
 ; LARGE-NEXT:move $a1, $zero
-; LARGE-NEXT:pcalau12i $ra, %pc_hi20(memset)
-; LARGE-NEXT:addi.d $t8, $zero, %pc_lo12(memset)
-; LARGE-NEXT:lu32i.d $t8, %pc64_lo20(memset)
-; LARGE-NEXT:lu52i.d $t8, $t8, %pc64_hi12(memset)
-; LARGE-NEXT:add.d $ra, $t8, $ra
+; LARGE-NEXT:pcalau12i $ra, %got_pc_hi20(memset)
+; LARGE-NEXT:addi.d $t8, $zero, %got_pc_lo12(memset)
+; LARGE-NEXT:lu32i.d $t8, %got64_pc_lo20(memset)
+; LARGE-NEXT:lu52i.d $t8, $t8, %got64_pc_hi12(memset)
+; LARGE-NEXT:ldx.d $ra, $t8, $ra
 ; LARGE-NEXT:jirl $ra, $ra, 0
 ; LARGE-NEXT:ld.d $ra, $sp, 8 # 8-byte Folded Reload
 ; LARGE-NEXT:addi.d $sp, $sp, 16
diff --git a/llvm/test/CodeGen/LoongArch/machinelicm-address-pseudos.ll 
b/llvm/test/CodeGen/LoongArch/machinelicm-address-pseudos.ll
index ed1a24e82b4e46..29348fe0d641ed 100644
--- a/llvm/test/CodeGen/LoongArch/machinelicm-address-pseudos.ll
+++ b/llvm/test/CodeGen/LoongArch/machinelicm-address-pseudos.ll
@@ -282,11 +282,11 @@ define void @test_la_tls_ld(i32 signext %n) {
 ; LA64LARGE-NEXT:  .LBB3_1: # %loop
 ; LA64LARGE-NEXT:# =>This Inner Loop Header: Depth=1
 ; LA64LARGE-NEXT:move $a0, $s0
-; LA64LARGE-NEXT:pcalau12i $ra, %pc_hi20(__tls_get_addr)
-; LA64LARGE-NEXT:addi.d $t8, $zero, %pc_lo12(__tls_get_addr)
-; LA64LARGE-NEXT:lu32i.d $t8, %pc64_lo20(__tls_get_addr)
-; LA64LARGE-NEXT:lu52i.d $t8, $t8, %pc64_hi12(__tls_get_addr)
-; LA64LARGE-NEXT:add.d $ra, $t8, $ra
+; LA64LARGE-NEXT:pcalau12i $ra, %got_pc_hi20(__tls_get_addr)
+; LA64LARGE-NEXT:addi.d $t8, $zero, %got_pc_lo12(__tls_get_addr)
+; LA64LARGE-NEXT:lu32i.d $t8, %got64_pc_lo20(__tls_get_addr)
+; LA64LARGE-NEXT:lu52i.d $t8, $t8, %got64_pc_hi12(__tls_get_addr)
+; LA64LARGE-NEXT:ldx.d $ra, $t8, $ra
 ; LA64LARGE-NEXT:jirl $ra, $ra, 0
 ; LA64LARGE-NEXT:ld.w $zero, $a0, 0
 ; LA64LARGE-NEXT:addi.w $s1, $s1, 1
@@ -448,11 +448,11 @@ define void @test_la_t

[llvm-branch-commits] [llvm] [LoongArch] Fix GOT usage for `non-dso_local` function calls in large code model (PR #117134)

2024-11-21 Thread via llvm-branch-commits

llvmbot wrote:




@llvm/pr-subscribers-backend-loongarch

Author: wanglei (wangleiat)


Changes

This commit fixes an issue in the large code model where non-dso_local function 
calls did not use the GOT as expected in PIC mode. Instead, direct PC-relative 
access was incorrectly applied, leading to linker errors when building shared 
libraries.

For `ExternalSymbol`, it is not possible to determine whether it is dso_local 
during pseudo-instruction expansion. We use target flags to differentiate 
whether GOT should be used.

Cherry-picked from #117099 to fix linker errors when building 
shared libraries with the large code model.

---
Full diff: https://github.com/llvm/llvm-project/pull/117134.diff


5 Files Affected:

- (modified) llvm/lib/Target/LoongArch/LoongArchExpandPseudoInsts.cpp (+1-1) 
- (modified) llvm/test/CodeGen/LoongArch/code-models.ll (+5-5) 
- (modified) llvm/test/CodeGen/LoongArch/machinelicm-address-pseudos.ll 
(+10-10) 
- (modified) llvm/test/CodeGen/LoongArch/psabi-restricted-scheduling.ll 
(+20-20) 
- (modified) llvm/test/CodeGen/LoongArch/tls-models.ll (+10-10) 


``diff
diff --git a/llvm/lib/Target/LoongArch/LoongArchExpandPseudoInsts.cpp 
b/llvm/lib/Target/LoongArch/LoongArchExpandPseudoInsts.cpp
index c136f5b3e515d7..e680dda7374d07 100644
--- a/llvm/lib/Target/LoongArch/LoongArchExpandPseudoInsts.cpp
+++ b/llvm/lib/Target/LoongArch/LoongArchExpandPseudoInsts.cpp
@@ -721,7 +721,7 @@ bool LoongArchExpandPseudo::expandFunctionCALL(
 IsTailCall ? LoongArch::PseudoJIRL_TAIL : LoongArch::PseudoJIRL_CALL;
 Register AddrReg = IsTailCall ? LoongArch::R19 : LoongArch::R1;
 
-bool UseGOT = Func.isGlobal() && !Func.getGlobal()->isDSOLocal();
+bool UseGOT = Func.getTargetFlags() == LoongArchII::MO_CALL_PLT;
 unsigned MO = UseGOT ? LoongArchII::MO_GOT_PC_HI : 
LoongArchII::MO_PCREL_LO;
 unsigned LAOpcode = UseGOT ? LoongArch::LDX_D : LoongArch::ADD_D;
 expandLargeAddressLoad(MBB, MBBI, NextMBBI, LAOpcode, MO, Func, AddrReg,
diff --git a/llvm/test/CodeGen/LoongArch/code-models.ll 
b/llvm/test/CodeGen/LoongArch/code-models.ll
index 4b2b72afaee171..4eb1e5e596fd3f 100644
--- a/llvm/test/CodeGen/LoongArch/code-models.ll
+++ b/llvm/test/CodeGen/LoongArch/code-models.ll
@@ -82,11 +82,11 @@ define void @call_external_sym(ptr %dst) {
 ; LARGE-NEXT:.cfi_offset 1, -8
 ; LARGE-NEXT:ori $a2, $zero, 1000
 ; LARGE-NEXT:move $a1, $zero
-; LARGE-NEXT:pcalau12i $ra, %pc_hi20(memset)
-; LARGE-NEXT:addi.d $t8, $zero, %pc_lo12(memset)
-; LARGE-NEXT:lu32i.d $t8, %pc64_lo20(memset)
-; LARGE-NEXT:lu52i.d $t8, $t8, %pc64_hi12(memset)
-; LARGE-NEXT:add.d $ra, $t8, $ra
+; LARGE-NEXT:pcalau12i $ra, %got_pc_hi20(memset)
+; LARGE-NEXT:addi.d $t8, $zero, %got_pc_lo12(memset)
+; LARGE-NEXT:lu32i.d $t8, %got64_pc_lo20(memset)
+; LARGE-NEXT:lu52i.d $t8, $t8, %got64_pc_hi12(memset)
+; LARGE-NEXT:ldx.d $ra, $t8, $ra
 ; LARGE-NEXT:jirl $ra, $ra, 0
 ; LARGE-NEXT:ld.d $ra, $sp, 8 # 8-byte Folded Reload
 ; LARGE-NEXT:addi.d $sp, $sp, 16
diff --git a/llvm/test/CodeGen/LoongArch/machinelicm-address-pseudos.ll 
b/llvm/test/CodeGen/LoongArch/machinelicm-address-pseudos.ll
index ed1a24e82b4e46..29348fe0d641ed 100644
--- a/llvm/test/CodeGen/LoongArch/machinelicm-address-pseudos.ll
+++ b/llvm/test/CodeGen/LoongArch/machinelicm-address-pseudos.ll
@@ -282,11 +282,11 @@ define void @test_la_tls_ld(i32 signext %n) {
 ; LA64LARGE-NEXT:  .LBB3_1: # %loop
 ; LA64LARGE-NEXT:# =>This Inner Loop Header: Depth=1
 ; LA64LARGE-NEXT:move $a0, $s0
-; LA64LARGE-NEXT:pcalau12i $ra, %pc_hi20(__tls_get_addr)
-; LA64LARGE-NEXT:addi.d $t8, $zero, %pc_lo12(__tls_get_addr)
-; LA64LARGE-NEXT:lu32i.d $t8, %pc64_lo20(__tls_get_addr)
-; LA64LARGE-NEXT:lu52i.d $t8, $t8, %pc64_hi12(__tls_get_addr)
-; LA64LARGE-NEXT:add.d $ra, $t8, $ra
+; LA64LARGE-NEXT:pcalau12i $ra, %got_pc_hi20(__tls_get_addr)
+; LA64LARGE-NEXT:addi.d $t8, $zero, %got_pc_lo12(__tls_get_addr)
+; LA64LARGE-NEXT:lu32i.d $t8, %got64_pc_lo20(__tls_get_addr)
+; LA64LARGE-NEXT:lu52i.d $t8, $t8, %got64_pc_hi12(__tls_get_addr)
+; LA64LARGE-NEXT:ldx.d $ra, $t8, $ra
 ; LA64LARGE-NEXT:jirl $ra, $ra, 0
 ; LA64LARGE-NEXT:ld.w $zero, $a0, 0
 ; LA64LARGE-NEXT:addi.w $s1, $s1, 1
@@ -448,11 +448,11 @@ define void @test_la_tls_gd(i32 signext %n) nounwind {
 ; LA64LARGE-NEXT:  .LBB5_1: # %loop
 ; LA64LARGE-NEXT:# =>This Inner Loop Header: Depth=1
 ; LA64LARGE-NEXT:move $a0, $s0
-; LA64LARGE-NEXT:pcalau12i $ra, %pc_hi20(__tls_get_addr)
-; LA64LARGE-NEXT:addi.d $t8, $zero, %pc_lo12(__tls_get_addr)
-; LA64LARGE-NEXT:lu32i.d $t8, %pc64_lo20(__tls_get_addr)
-; LA64LARGE-NEXT:lu52i.d $t8, $t8, %pc64_hi12(__tls_get_addr)
-; LA64LARGE-NEXT:add.d $ra, $t8, $ra
+; LA64LARGE-NEXT:pcalau12i $ra, %got_pc_hi20(__tls_get_addr)
+; LA64LARGE-NEXT:addi.d $t8, $zero, %got_pc_lo12(__tls_get_addr)
+; LA64LARGE-NEXT:lu32i.d $t8, %got64_

[llvm-branch-commits] [llvm] [LoongArch] Fix GOT usage for `non-dso_local` function calls in large code model (PR #117134)

2024-11-21 Thread via llvm-branch-commits

https://github.com/wangleiat edited 
https://github.com/llvm/llvm-project/pull/117134


[llvm-branch-commits] [llvm] release/19.x: [LICM] allow MemoryAccess creation failure (#116813) (PR #117082)

2024-11-21 Thread via llvm-branch-commits

https://github.com/DianQK updated 
https://github.com/llvm/llvm-project/pull/117082

>From e3364b6e56999488106d990b5f0f907823afa42c Mon Sep 17 00:00:00 2001
From: DianQK 
Date: Wed, 20 Nov 2024 19:52:51 +0800
Subject: [PATCH] [LICM] allow MemoryAccess creation failure (#116813)

Fixes #116809.

After running some passes (SimpleLoopUnswitch, LoopInstSimplify, etc.),
MemorySSA might be outdated, and the instruction `I` may have become a
non-memory touching instruction.

LICM has already handled this, but it does not pass
`CreationMustSucceed=false` to `createDefinedAccess`.

(cherry picked from commit 18b02bbf441660683df7f3925946984203d49bab)
---
 llvm/include/llvm/Analysis/MemorySSAUpdater.h |  6 +++
 llvm/lib/Analysis/MemorySSAUpdater.cpp| 12 -
 llvm/lib/Transforms/Scalar/LICM.cpp   |  7 ++-
 .../LICM/PR116813-memoryssa-outdated.ll   | 50 +++
 4 files changed, 71 insertions(+), 4 deletions(-)
 create mode 100644 llvm/test/Transforms/LICM/PR116813-memoryssa-outdated.ll

diff --git a/llvm/include/llvm/Analysis/MemorySSAUpdater.h 
b/llvm/include/llvm/Analysis/MemorySSAUpdater.h
index d4da3ef1146db7..015a652f309c56 100644
--- a/llvm/include/llvm/Analysis/MemorySSAUpdater.h
+++ b/llvm/include/llvm/Analysis/MemorySSAUpdater.h
@@ -192,6 +192,12 @@ class MemorySSAUpdater {
const BasicBlock *BB,
MemorySSA::InsertionPlace Point);
 
+  MemoryAccess *createMemoryAccessInBB2(Instruction *I,
+MemoryAccess *Definition,
+const BasicBlock *BB,
+MemorySSA::InsertionPlace Point,
+bool CreationMustSucceed = true);
+
   /// Create a MemoryAccess in MemorySSA before an existing MemoryAccess.
   ///
   /// See createMemoryAccessInBB() for usage details.
diff --git a/llvm/lib/Analysis/MemorySSAUpdater.cpp 
b/llvm/lib/Analysis/MemorySSAUpdater.cpp
index aa550f0b6a7bfd..c84b31a3a9374d 100644
--- a/llvm/lib/Analysis/MemorySSAUpdater.cpp
+++ b/llvm/lib/Analysis/MemorySSAUpdater.cpp
@@ -1404,8 +1404,16 @@ void MemorySSAUpdater::changeToUnreachable(const 
Instruction *I) {
 MemoryAccess *MemorySSAUpdater::createMemoryAccessInBB(
 Instruction *I, MemoryAccess *Definition, const BasicBlock *BB,
 MemorySSA::InsertionPlace Point) {
-  MemoryUseOrDef *NewAccess = MSSA->createDefinedAccess(I, Definition);
-  MSSA->insertIntoListsForBlock(NewAccess, BB, Point);
+  return createMemoryAccessInBB2(I, Definition, BB, Point);
+}
+
+MemoryAccess *MemorySSAUpdater::createMemoryAccessInBB2(
+Instruction *I, MemoryAccess *Definition, const BasicBlock *BB,
+MemorySSA::InsertionPlace Point, bool CreationMustSucceed) {
+  MemoryUseOrDef *NewAccess = MSSA->createDefinedAccess(
+  I, Definition, /*Template=*/nullptr, CreationMustSucceed);
+  if (NewAccess)
+MSSA->insertIntoListsForBlock(NewAccess, BB, Point);
   return NewAccess;
 }
 
diff --git a/llvm/lib/Transforms/Scalar/LICM.cpp 
b/llvm/lib/Transforms/Scalar/LICM.cpp
index 91ef2b4b7c1839..102a5bd5bbb88b 100644
--- a/llvm/lib/Transforms/Scalar/LICM.cpp
+++ b/llvm/lib/Transforms/Scalar/LICM.cpp
@@ -1464,8 +1464,11 @@ static Instruction *cloneInstructionInExitBlock(
 
   if (MSSAU.getMemorySSA()->getMemoryAccess(&I)) {
 // Create a new MemoryAccess and let MemorySSA set its defining access.
-MemoryAccess *NewMemAcc = MSSAU.createMemoryAccessInBB(
-New, nullptr, New->getParent(), MemorySSA::Beginning);
+// After running some passes, MemorySSA might be outdated, and the
+// instruction `I` may have become a non-memory touching instruction.
+MemoryAccess *NewMemAcc = MSSAU.createMemoryAccessInBB2(
+New, nullptr, New->getParent(), MemorySSA::Beginning,
+/*CreationMustSucceed=*/false);
 if (NewMemAcc) {
   if (auto *MemDef = dyn_cast<MemoryDef>(NewMemAcc))
 MSSAU.insertDef(MemDef, /*RenameUses=*/true);
diff --git a/llvm/test/Transforms/LICM/PR116813-memoryssa-outdated.ll 
b/llvm/test/Transforms/LICM/PR116813-memoryssa-outdated.ll
new file mode 100644
index 00000000000000..a040c3cc6947c6
--- /dev/null
+++ b/llvm/test/Transforms/LICM/PR116813-memoryssa-outdated.ll
@@ -0,0 +1,50 @@
+; NOTE: Assertions have been autogenerated by utils/update_test_checks.py 
UTC_ARGS: --version 5
+; RUN: opt -passes='loop-mssa(simple-loop-unswitch,licm)' 
-verify-memoryssa -S < %s | FileCheck %s
+
+; Check that running LICM after SimpleLoopUnswitch does not result in a crash.
+
+define i32 @foo(i1 %arg, ptr %arg1) {
+; CHECK-LABEL: define i32 @foo(
+; CHECK-SAME: i1 [[ARG:%.*]], ptr [[ARG1:%.*]]) {
+; CHECK-NEXT:  [[START:.*:]]
+; CHECK-NEXT:[[ARG_FR:%.*]] = freeze i1 [[ARG]]
+; CHECK-NEXT:br i1 [[ARG_FR]], label %[[START_SPLIT_US:.*]], label 
%[[START_SPLIT:.*]]
+; CHECK:   [[START_SPLIT_US]]:
+; CHECK-NEXT:br label %[[LOOP_US:.*]]
+; CHECK:   [[LOOP_US]]:
+; CHEC

[llvm-branch-commits] [llvm] release/19.x: [ConstraintElim] Bail out on non-dedicated exits when adding exiting conditions (#116627) (PR #117137)

2024-11-21 Thread Florian Hahn via llvm-branch-commits

https://github.com/fhahn approved this pull request.

LGTM to cherry pick, thanks!

https://github.com/llvm/llvm-project/pull/117137


[llvm-branch-commits] [clang-tools-extra] 2a4a50d - Revert "[NFC] Explicitly pass a VFS when creating DiagnosticsEngine (#115852)"

2024-11-21 Thread via llvm-branch-commits

Author: Sylvestre Ledru
Date: 2024-11-21T07:04:23-05:00
New Revision: 2a4a50d85689bb2ac51258c485fceb64dfb6cd73

URL: 
https://github.com/llvm/llvm-project/commit/2a4a50d85689bb2ac51258c485fceb64dfb6cd73
DIFF: 
https://github.com/llvm/llvm-project/commit/2a4a50d85689bb2ac51258c485fceb64dfb6cd73.diff

LOG: Revert "[NFC] Explicitly pass a VFS when creating DiagnosticsEngine 
(#115852)"

This reverts commit bdd10d9d249bd1c2a45e3de56a5accd97e953458.

Added: 


Modified: 
clang-tools-extra/clang-include-fixer/IncludeFixer.cpp
clang-tools-extra/clangd/Compiler.cpp
clang-tools-extra/clangd/ModulesBuilder.cpp
clang-tools-extra/clangd/Preamble.cpp
clang-tools-extra/include-cleaner/unittests/RecordTest.cpp
clang/include/clang/Frontend/CompilerInstance.h
clang/lib/Frontend/CompilerInstance.cpp
clang/lib/Frontend/CreateInvocationFromCommandLine.cpp
clang/lib/Frontend/Rewrite/FrontendActions.cpp
clang/lib/Interpreter/Interpreter.cpp
clang/lib/StaticAnalyzer/Frontend/ModelInjector.cpp
clang/lib/Testing/TestAST.cpp
clang/lib/Tooling/DependencyScanning/DependencyScanningWorker.cpp
clang/lib/Tooling/Tooling.cpp
clang/tools/c-index-test/core_main.cpp
clang/tools/clang-import-test/clang-import-test.cpp
clang/tools/clang-installapi/ClangInstallAPI.cpp
clang/tools/clang-scan-deps/ClangScanDeps.cpp
clang/tools/diagtool/ShowEnabledWarnings.cpp
clang/tools/driver/cc1_main.cpp
clang/tools/libclang/CIndex.cpp
clang/tools/libclang/Indexing.cpp
clang/unittests/AST/ExternalASTSourceTest.cpp
clang/unittests/CodeGen/TestCompiler.h
clang/unittests/Driver/DXCModeTest.cpp
clang/unittests/Driver/ToolChainTest.cpp
clang/unittests/Frontend/ASTUnitTest.cpp
clang/unittests/Frontend/CodeGenActionTest.cpp
clang/unittests/Frontend/CompilerInstanceTest.cpp
clang/unittests/Frontend/CompilerInvocationTest.cpp
clang/unittests/Frontend/FrontendActionTest.cpp
clang/unittests/Frontend/OutputStreamTest.cpp
clang/unittests/Frontend/PCHPreambleTest.cpp
clang/unittests/Frontend/ReparseWorkingDirTest.cpp
clang/unittests/Frontend/UtilsTest.cpp
clang/unittests/Sema/SemaNoloadLookupTest.cpp
clang/unittests/Serialization/ForceCheckFileInputTest.cpp
clang/unittests/Serialization/ModuleCacheTest.cpp
clang/unittests/Serialization/NoCommentsTest.cpp
clang/unittests/Serialization/PreambleInNamedModulesTest.cpp
clang/unittests/Serialization/VarDeclConstantInitTest.cpp
clang/unittests/Support/TimeProfilerTest.cpp
clang/unittests/Tooling/DependencyScanning/DependencyScannerTest.cpp
clang/unittests/Tooling/ToolingTest.cpp

Removed: 




diff  --git a/clang-tools-extra/clang-include-fixer/IncludeFixer.cpp 
b/clang-tools-extra/clang-include-fixer/IncludeFixer.cpp
index bba8f8acc77da9..354f35cbadbeb9 100644
--- a/clang-tools-extra/clang-include-fixer/IncludeFixer.cpp
+++ b/clang-tools-extra/clang-include-fixer/IncludeFixer.cpp
@@ -95,8 +95,7 @@ bool IncludeFixerActionFactory::runInvocation(
 
   // Create the compiler's actual diagnostics engine. We want to drop all
   // diagnostics here.
-  Compiler.createDiagnostics(Files->getVirtualFileSystem(),
- new clang::IgnoringDiagConsumer,
+  Compiler.createDiagnostics(new clang::IgnoringDiagConsumer,
  /*ShouldOwnClient=*/true);
   Compiler.createSourceManager(*Files);
 

diff  --git a/clang-tools-extra/clangd/Compiler.cpp 
b/clang-tools-extra/clangd/Compiler.cpp
index 161cc9ae0ca365..c60ab8e1b8062a 100644
--- a/clang-tools-extra/clangd/Compiler.cpp
+++ b/clang-tools-extra/clangd/Compiler.cpp
@@ -110,8 +110,8 @@ buildCompilerInvocation(const ParseInputs &Inputs, 
clang::DiagnosticConsumer &D,
   CIOpts.VFS = Inputs.TFS->view(Inputs.CompileCommand.Directory);
   CIOpts.CC1Args = CC1Args;
   CIOpts.RecoverOnError = true;
-  CIOpts.Diags = CompilerInstance::createDiagnostics(
-  *CIOpts.VFS, new DiagnosticOptions, &D, false);
+  CIOpts.Diags =
+  CompilerInstance::createDiagnostics(new DiagnosticOptions, &D, false);
   CIOpts.ProbePrecompiled = false;
   std::unique_ptr<CompilerInvocation> CI = createInvocation(ArgStrs, CIOpts);
   if (!CI)
@@ -148,7 +148,7 @@ 
prepareCompilerInstance(std::unique_ptr<clang::CompilerInvocation> CI,
   auto Clang = std::make_unique<CompilerInstance>(
   std::make_shared<PCHContainerOperations>());
   Clang->setInvocation(std::move(CI));
-  Clang->createDiagnostics(*VFS, &DiagsClient, false);
+  Clang->createDiagnostics(&DiagsClient, false);
 
   if (auto VFSWithRemapping = createVFSFromCompilerInvocation(
   Clang->getInvocation(), Clang->getDiagnostics(), VFS))

diff  --git a/clang-tools-extra/clangd/ModulesBuilder.cpp 
b/clang-tools-extra/clangd/ModulesBuilder.cpp
index 29508901f85bba..2bce3a20825616 100644
--- a/clang-tools-extra/clangd/ModulesBuilder.cpp
+++ b/clang-tools-extra/clangd/ModulesBuilder.cpp
@@ -188,8 +188,7 @@ bool IsModuleFileUpToDate(Pa

[llvm-branch-commits] [flang] [flang] handle fir.call in AliasAnalysis::getModRef (PR #117164)

2024-11-21 Thread via llvm-branch-commits

github-actions[bot] wrote:




:warning: Python code formatter, darker found issues in your code. :warning:



You can test this locally with the following command:


``bash
darker --check --diff -r c12869e010d892caf93d153c187db846ba995a9e...84c95d6c816004abe6c01eb754688fb35a666ffc flang/test/Analysis/AliasAnalysis/gen_mod_ref_test.py
``





View the diff from darker here.


``diff
--- gen_mod_ref_test.py 2024-11-21 13:14:25.000000 +0000
+++ gen_mod_ref_test.py 2024-11-21 14:15:26.444588 +0000
@@ -11,8 +11,16 @@
 
 import sys
 import re
 
 for line in sys.stdin:
-  line = re.sub(r'(fir.call @_\w*P)(test_effect_\w*)(\(.*) : ', r'\1\2\3 {test.ptr ="\2"} : ', line)
-  line = re.sub(r'(hlfir.declare .*uniq_name =.*E)(test_var_\w*)"', r'\1\2", test.ptr ="\2"', line)
-  sys.stdout.write(line)
+    line = re.sub(
+        r"(fir.call @_\w*P)(test_effect_\w*)(\(.*) : ",
+        r'\1\2\3 {test.ptr ="\2"} : ',
+        line,
+    )
+    line = re.sub(
+        r'(hlfir.declare .*uniq_name =.*E)(test_var_\w*)"',
+        r'\1\2", test.ptr ="\2"',
+        line,
+    )
+    sys.stdout.write(line)

``




https://github.com/llvm/llvm-project/pull/117164


[llvm-branch-commits] [llvm] release/19.x: [MachineLICM] Don't allow hoisting invariant loads across mem barrier. (#116987) (PR #117154)

2024-11-21 Thread via llvm-branch-commits

https://github.com/llvmbot created 
https://github.com/llvm/llvm-project/pull/117154

Backport a9b3ec154d7ab2d0896ac5c9f1e9a1266a37be80 
ef102b4a6333a304e36dc623d5381257a7ef1ed6

Requested by: @fhahn

>From fccca51f3cdf8f918643b2afa0d410590e3acf95 Mon Sep 17 00:00:00 2001
From: Florian Hahn 
Date: Wed, 20 Nov 2024 15:10:19 +
Subject: [PATCH 1/2] [MachineLICM] Add test case showing load hoisted across
 memory barrier.

(cherry picked from commit a9b3ec154d7ab2d0896ac5c9f1e9a1266a37be80)
---
 .../AArch64/machine-licm-hoist-load.ll| 29 +++
 1 file changed, 29 insertions(+)

diff --git a/llvm/test/CodeGen/AArch64/machine-licm-hoist-load.ll 
b/llvm/test/CodeGen/AArch64/machine-licm-hoist-load.ll
index e8dafd5e8fbabe..932a5af264a000 100644
--- a/llvm/test/CodeGen/AArch64/machine-licm-hoist-load.ll
+++ b/llvm/test/CodeGen/AArch64/machine-licm-hoist-load.ll
@@ -497,6 +497,35 @@ for.exit: ; preds = 
%for.body
   ret i64 %spec.select
 }
 
+@a = external local_unnamed_addr global i32, align 4
+
+; FIXME: Load hoisted out of the loop across memory barriers.
+define i32 @load_between_memory_barriers() {
+; CHECK-LABEL: load_between_memory_barriers:
+; CHECK:   // %bb.0:
+; CHECK-NEXT:adrp x8, :got:a
+; CHECK-NEXT:ldr x8, [x8, :got_lo12:a]
+; CHECK-NEXT:ldr w0, [x8]
+; CHECK-NEXT:  .LBB8_1: // %loop
+; CHECK-NEXT:// =>This Inner Loop Header: Depth=1
+; CHECK-NEXT://MEMBARRIER
+; CHECK-NEXT://MEMBARRIER
+; CHECK-NEXT:cbz w0, .LBB8_1
+; CHECK-NEXT:  // %bb.2: // %exit
+; CHECK-NEXT:ret
+  br label %loop
+
+loop:
+  fence syncscope("singlethread") acq_rel
+  %l = load i32, ptr @a, align 4
+  fence syncscope("singlethread") acq_rel
+  %c = icmp eq i32 %l, 0
+  br i1 %c, label %loop, label %exit
+
+exit:
+  ret i32 %l
+}
+
 declare i32 @bcmp(ptr, ptr, i64)
 declare i32 @memcmp(ptr, ptr, i64)
 declare void @func()

>From 4ed7f75167fc2979e1e63f33389bc6fdb617ea71 Mon Sep 17 00:00:00 2001
From: Florian Hahn 
Date: Thu, 21 Nov 2024 10:25:04 +
Subject: [PATCH 2/2] [MachineLICM] Don't allow hoisting invariant loads across
 mem barrier. (#116987)

The improvements in 63917e1 / #70796 do not check for memory
barriers/unmodelled side effects, which means we may incorrectly hoist
loads across memory barriers.

Fix this by checking whether any machine instruction in the loop is a
load-fold barrier.

PR: https://github.com/llvm/llvm-project/pull/116987
(cherry picked from commit ef102b4a6333a304e36dc623d5381257a7ef1ed6)
---
 llvm/lib/CodeGen/MachineLICM.cpp | 2 +-
 llvm/test/CodeGen/AArch64/machine-licm-hoist-load.ll | 4 ++--
 llvm/test/CodeGen/Mips/lcb5.ll   | 4 ++--
 3 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/llvm/lib/CodeGen/MachineLICM.cpp b/llvm/lib/CodeGen/MachineLICM.cpp
index f24ab187ef4005..21a02a6f094784 100644
--- a/llvm/lib/CodeGen/MachineLICM.cpp
+++ b/llvm/lib/CodeGen/MachineLICM.cpp
@@ -1474,7 +1474,7 @@ void MachineLICMBase::InitializeLoadsHoistableLoops() {
   if (!AllowedToHoistLoads[Loop])
 continue;
   for (auto &MI : *MBB) {
-if (!MI.mayStore() && !MI.isCall() &&
+if (!MI.isLoadFoldBarrier() && !MI.mayStore() && !MI.isCall() &&
 !(MI.mayLoad() && MI.hasOrderedMemoryRef()))
   continue;
 for (MachineLoop *L = Loop; L != nullptr; L = L->getParentLoop())
diff --git a/llvm/test/CodeGen/AArch64/machine-licm-hoist-load.ll 
b/llvm/test/CodeGen/AArch64/machine-licm-hoist-load.ll
index 932a5af264a000..17f8263560430d 100644
--- a/llvm/test/CodeGen/AArch64/machine-licm-hoist-load.ll
+++ b/llvm/test/CodeGen/AArch64/machine-licm-hoist-load.ll
@@ -499,16 +499,16 @@ for.exit: ; preds = 
%for.body
 
 @a = external local_unnamed_addr global i32, align 4
 
-; FIXME: Load hoisted out of the loop across memory barriers.
+; Make sure the load is not hoisted out of the loop across memory barriers.
 define i32 @load_between_memory_barriers() {
 ; CHECK-LABEL: load_between_memory_barriers:
 ; CHECK:   // %bb.0:
 ; CHECK-NEXT:adrp x8, :got:a
 ; CHECK-NEXT:ldr x8, [x8, :got_lo12:a]
-; CHECK-NEXT:ldr w0, [x8]
 ; CHECK-NEXT:  .LBB8_1: // %loop
 ; CHECK-NEXT:// =>This Inner Loop Header: Depth=1
 ; CHECK-NEXT://MEMBARRIER
+; CHECK-NEXT:ldr w0, [x8]
 ; CHECK-NEXT://MEMBARRIER
 ; CHECK-NEXT:cbz w0, .LBB8_1
 ; CHECK-NEXT:  // %bb.2: // %exit
diff --git a/llvm/test/CodeGen/Mips/lcb5.ll b/llvm/test/CodeGen/Mips/lcb5.ll
index f320f6fc5660ce..bb059f1ee8453e 100644
--- a/llvm/test/CodeGen/Mips/lcb5.ll
+++ b/llvm/test/CodeGen/Mips/lcb5.ll
@@ -186,7 +186,7 @@ if.end:   ; preds = 
%if.then, %entry
 }
 
 ; ci:  .entz3
-; ci:  bteqz   $BB6_3
+; ci:  bteqz   $BB6_2
 ; ci:  .endz3
 
 ; Function Attrs: nounwind optsize
@@ -210,7 +210,7 @@ if.end:   ; preds = 
%if.then, %entry
 
 ; ci:  .entz4
 ; c

[llvm-branch-commits] [llvm] release/19.x: [MachineLICM] Don't allow hoisting invariant loads across mem barrier. (#116987) (PR #117154)

2024-11-21 Thread via llvm-branch-commits

llvmbot wrote:

@david-arm What do you think about merging this PR to the release branch?

https://github.com/llvm/llvm-project/pull/117154


[llvm-branch-commits] [llvm] release/19.x: [MachineLICM] Don't allow hoisting invariant loads across mem barrier. (#116987) (PR #117154)

2024-11-21 Thread via llvm-branch-commits

llvmbot wrote:




@llvm/pr-subscribers-backend-aarch64

Author: None (llvmbot)


Changes

Backport a9b3ec154d7ab2d0896ac5c9f1e9a1266a37be80 
ef102b4a6333a304e36dc623d5381257a7ef1ed6

Requested by: @fhahn

---
Full diff: https://github.com/llvm/llvm-project/pull/117154.diff


3 Files Affected:

- (modified) llvm/lib/CodeGen/MachineLICM.cpp (+1-1) 
- (modified) llvm/test/CodeGen/AArch64/machine-licm-hoist-load.ll (+29) 
- (modified) llvm/test/CodeGen/Mips/lcb5.ll (+2-2) 


```diff
diff --git a/llvm/lib/CodeGen/MachineLICM.cpp b/llvm/lib/CodeGen/MachineLICM.cpp
index f24ab187ef4005..21a02a6f094784 100644
--- a/llvm/lib/CodeGen/MachineLICM.cpp
+++ b/llvm/lib/CodeGen/MachineLICM.cpp
@@ -1474,7 +1474,7 @@ void MachineLICMBase::InitializeLoadsHoistableLoops() {
       if (!AllowedToHoistLoads[Loop])
         continue;
       for (auto &MI : *MBB) {
-        if (!MI.mayStore() && !MI.isCall() &&
+        if (!MI.isLoadFoldBarrier() && !MI.mayStore() && !MI.isCall() &&
             !(MI.mayLoad() && MI.hasOrderedMemoryRef()))
           continue;
         for (MachineLoop *L = Loop; L != nullptr; L = L->getParentLoop())
diff --git a/llvm/test/CodeGen/AArch64/machine-licm-hoist-load.ll b/llvm/test/CodeGen/AArch64/machine-licm-hoist-load.ll
index e8dafd5e8fbabe..17f8263560430d 100644
--- a/llvm/test/CodeGen/AArch64/machine-licm-hoist-load.ll
+++ b/llvm/test/CodeGen/AArch64/machine-licm-hoist-load.ll
@@ -497,6 +497,35 @@ for.exit: ; preds = %for.body
   ret i64 %spec.select
 }
 
+@a = external local_unnamed_addr global i32, align 4
+
+; Make sure the load is not hoisted out of the loop across memory barriers.
+define i32 @load_between_memory_barriers() {
+; CHECK-LABEL: load_between_memory_barriers:
+; CHECK:       // %bb.0:
+; CHECK-NEXT:    adrp x8, :got:a
+; CHECK-NEXT:    ldr x8, [x8, :got_lo12:a]
+; CHECK-NEXT:  .LBB8_1: // %loop
+; CHECK-NEXT:    // =>This Inner Loop Header: Depth=1
+; CHECK-NEXT:    //MEMBARRIER
+; CHECK-NEXT:    ldr w0, [x8]
+; CHECK-NEXT:    //MEMBARRIER
+; CHECK-NEXT:    cbz w0, .LBB8_1
+; CHECK-NEXT:  // %bb.2: // %exit
+; CHECK-NEXT:    ret
+  br label %loop
+
+loop:
+  fence syncscope("singlethread") acq_rel
+  %l = load i32, ptr @a, align 4
+  fence syncscope("singlethread") acq_rel
+  %c = icmp eq i32 %l, 0
+  br i1 %c, label %loop, label %exit
+
+exit:
+  ret i32 %l
+}
+
 declare i32 @bcmp(ptr, ptr, i64)
 declare i32 @memcmp(ptr, ptr, i64)
 declare void @func()
diff --git a/llvm/test/CodeGen/Mips/lcb5.ll b/llvm/test/CodeGen/Mips/lcb5.ll
index f320f6fc5660ce..bb059f1ee8453e 100644
--- a/llvm/test/CodeGen/Mips/lcb5.ll
+++ b/llvm/test/CodeGen/Mips/lcb5.ll
@@ -186,7 +186,7 @@ if.end: ; preds = %if.then, %entry
 }
 
 ; ci:  .ent z3
-; ci:  bteqz   $BB6_3
+; ci:  bteqz   $BB6_2
 ; ci:  .end z3
 
 ; Function Attrs: nounwind optsize
@@ -210,7 +210,7 @@ if.end: ; preds = %if.then, %entry
 
 ; ci:  .ent z4
 ; ci:  btnez   $BB7_1  # 16 bit inst
-; ci:  jal $BB7_3  # branch
+; ci:  jal $BB7_2  # branch
 ; ci:  nop
 ; ci: $BB7_1:
 ; ci:  .p2align 2

```
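
For readers skimming the one-line MachineLICM change, the condition can be read as a predicate over each instruction in the loop body. The following is only a sketch under stated assumptions: `Instr`, `blocksLoadHoisting` and `loopAllowsLoadHoisting` are hypothetical names invented here, and the boolean fields merely stand in for the `MachineInstr` queries that appear in the hunk; this is not the LLVM implementation.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for the MachineInstr queries used in the hunk above.
struct Instr {
  bool LoadFoldBarrier = false;  // stands in for MI.isLoadFoldBarrier()
  bool MayStore = false;         // stands in for MI.mayStore()
  bool IsCall = false;           // stands in for MI.isCall()
  bool MayLoad = false;          // stands in for MI.mayLoad()
  bool OrderedMemoryRef = false; // stands in for MI.hasOrderedMemoryRef()
};

// True if this instruction forbids hoisting invariant loads out of its loop.
// This is the negation of the "continue" condition in the patched code.
bool blocksLoadHoisting(const Instr &I) {
  return I.LoadFoldBarrier || I.MayStore || I.IsCall ||
         (I.MayLoad && I.OrderedMemoryRef);
}

// A loop body allows load hoisting only if no instruction blocks it.
bool loopAllowsLoadHoisting(const std::vector<Instr> &Body) {
  for (const Instr &I : Body)
    if (blocksLoadHoisting(I))
      return false;
  return true;
}
```

With this reading, the new test case corresponds to a body of the shape barrier, plain load, barrier: before the patch only the plain load would have been inspected and found harmless, whereas the added `LoadFoldBarrier` term makes the whole loop ineligible.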




https://github.com/llvm/llvm-project/pull/117154


[llvm-branch-commits] [llvm] release/19.x: [MachineLICM] Don't allow hoisting invariant loads across mem barrier. (#116987) (PR #117154)

2024-11-21 Thread via llvm-branch-commits

https://github.com/llvmbot milestoned 
https://github.com/llvm/llvm-project/pull/117154


[llvm-branch-commits] [llvm] [CodeGen][NewPM] Port SpillPlacement analysis to NPM (PR #116618)

2024-11-21 Thread Akshat Oke via llvm-branch-commits

https://github.com/optimisan updated 
https://github.com/llvm/llvm-project/pull/116618

>From 6408bcec55deafbf767a417684c2bfe3dd251068 Mon Sep 17 00:00:00 2001
From: Akshat Oke 
Date: Mon, 18 Nov 2024 12:42:00 +
Subject: [PATCH 1/3] [CodeGen][NewPM] Port SpillPlacement analysis to NPM

---
 llvm/include/llvm/InitializePasses.h |  2 +-
 llvm/lib/CodeGen/RegAllocGreedy.cpp  |  6 +-
 llvm/lib/CodeGen/SpillPlacement.cpp  | 91 ++--
 llvm/lib/CodeGen/SpillPlacement.h| 52 +---
 4 files changed, 104 insertions(+), 47 deletions(-)

diff --git a/llvm/include/llvm/InitializePasses.h b/llvm/include/llvm/InitializePasses.h
index e883aae2758688..88bca2c75c9498 100644
--- a/llvm/include/llvm/InitializePasses.h
+++ b/llvm/include/llvm/InitializePasses.h
@@ -289,7 +289,7 @@ void initializeSinkingLegacyPassPass(PassRegistry &);
 void initializeSjLjEHPreparePass(PassRegistry &);
 void initializeSlotIndexesWrapperPassPass(PassRegistry &);
 void initializeSpeculativeExecutionLegacyPassPass(PassRegistry &);
-void initializeSpillPlacementPass(PassRegistry &);
+void initializeSpillPlacementWrapperLegacyPass(PassRegistry &);
 void initializeStackColoringLegacyPass(PassRegistry &);
 void initializeStackFrameLayoutAnalysisPassPass(PassRegistry &);
 void initializeStackMapLivenessPass(PassRegistry &);
diff --git a/llvm/lib/CodeGen/RegAllocGreedy.cpp b/llvm/lib/CodeGen/RegAllocGreedy.cpp
index 3542bfe18af46f..3fdf2d6e07a75f 100644
--- a/llvm/lib/CodeGen/RegAllocGreedy.cpp
+++ b/llvm/lib/CodeGen/RegAllocGreedy.cpp
@@ -162,7 +162,7 @@ INITIALIZE_PASS_DEPENDENCY(MachineLoopInfoWrapperPass)
 INITIALIZE_PASS_DEPENDENCY(VirtRegMapWrapperLegacy)
 INITIALIZE_PASS_DEPENDENCY(LiveRegMatrixWrapperLegacy)
 INITIALIZE_PASS_DEPENDENCY(EdgeBundlesWrapperLegacy)
-INITIALIZE_PASS_DEPENDENCY(SpillPlacement)
+INITIALIZE_PASS_DEPENDENCY(SpillPlacementWrapperLegacy)
 INITIALIZE_PASS_DEPENDENCY(MachineOptimizationRemarkEmitterPass)
 INITIALIZE_PASS_DEPENDENCY(RegAllocEvictionAdvisorAnalysis)
 INITIALIZE_PASS_DEPENDENCY(RegAllocPriorityAdvisorAnalysis)
@@ -217,7 +217,7 @@ void RAGreedy::getAnalysisUsage(AnalysisUsage &AU) const {
   AU.addRequired();
   AU.addPreserved();
   AU.addRequired();
-  AU.addRequired<SpillPlacement>();
+  AU.addRequired<SpillPlacementWrapperLegacy>();
   AU.addRequired();
   AU.addRequired();
   AU.addRequired();
@@ -2731,7 +2731,7 @@ bool RAGreedy::runOnMachineFunction(MachineFunction &mf) {
   ORE = &getAnalysis().getORE();
   Loops = &getAnalysis().getLI();
   Bundles = &getAnalysis().getEdgeBundles();
-  SpillPlacer = &getAnalysis<SpillPlacement>();
+  SpillPlacer = &getAnalysis<SpillPlacementWrapperLegacy>().getResult();
   DebugVars = &getAnalysis();
 
   initializeCSRCost();
diff --git a/llvm/lib/CodeGen/SpillPlacement.cpp b/llvm/lib/CodeGen/SpillPlacement.cpp
index 318e2b19322bb4..c9baabf6161d3a 100644
--- a/llvm/lib/CodeGen/SpillPlacement.cpp
+++ b/llvm/lib/CodeGen/SpillPlacement.cpp
@@ -44,17 +44,17 @@ using namespace llvm;
 
 #define DEBUG_TYPE "spill-code-placement"
 
-char SpillPlacement::ID = 0;
+char SpillPlacementWrapperLegacy::ID = 0;
 
-char &llvm::SpillPlacementID = SpillPlacement::ID;
+char &llvm::SpillPlacementID = SpillPlacementWrapperLegacy::ID;
 
-INITIALIZE_PASS_BEGIN(SpillPlacement, DEBUG_TYPE,
+INITIALIZE_PASS_BEGIN(SpillPlacementWrapperLegacy, DEBUG_TYPE,
   "Spill Code Placement Analysis", true, true)
 INITIALIZE_PASS_DEPENDENCY(EdgeBundlesWrapperLegacy)
-INITIALIZE_PASS_END(SpillPlacement, DEBUG_TYPE,
+INITIALIZE_PASS_END(SpillPlacementWrapperLegacy, DEBUG_TYPE,
 "Spill Code Placement Analysis", true, true)
 
-void SpillPlacement::getAnalysisUsage(AnalysisUsage &AU) const {
+void SpillPlacementWrapperLegacy::getAnalysisUsage(AnalysisUsage &AU) const {
   AU.setPreservesAll();
   AU.addRequired();
   AU.addRequiredTransitive();
@@ -189,32 +189,57 @@ struct SpillPlacement::Node {
   }
 };
 
-bool SpillPlacement::runOnMachineFunction(MachineFunction &mf) {
+bool SpillPlacementWrapperLegacy::runOnMachineFunction(MachineFunction &MF) {
+  auto *Bundles = &getAnalysis<EdgeBundlesWrapperLegacy>().getEdgeBundles();
+  auto *MBFI = &getAnalysis<MachineBlockFrequencyInfoWrapperPass>().getMBFI();
+
+  Impl.reset(new SpillPlacement(Bundles, MBFI));
+  Impl->run(MF);
+  return false;
+}
+
+AnalysisKey SpillPlacementAnalysis::Key;
+
+SpillPlacement
+SpillPlacementAnalysis::run(MachineFunction &MF,
+MachineFunctionAnalysisManager &MFAM) {
+  auto *Bundles = &MFAM.getResult(MF);
+  auto *MBFI = &MFAM.getResult(MF);
+  SpillPlacement Impl(Bundles, MBFI);
+  Impl.run(MF);
+  return Impl;
+}
+
+bool SpillPlacementAnalysis::Result::invalidate(
+MachineFunction &MF, const PreservedAnalyses &PA,
+MachineFunctionAnalysisManager::Invalidator &Inv) {
+  auto PAC = PA.getChecker();
+  return !(PAC.preserved() ||
+   PAC.preservedSet>()) ||
+ Inv.invalidate(MF, PA) ||
+ Inv.invalidate(MF, PA);
+}
+
+void SpillPlacement::arrayDeleter(Node *N) {
+  if (N)
+delete[] N;
+}
+
+void SpillPlacement::run(MachineFunction &mf) {
   MF = &m

[llvm-branch-commits] [flang] [flang] handle fir.call in AliasAnalysis::getModRef (PR #117164)

2024-11-21 Thread Tom Eccles via llvm-branch-commits


@@ -329,14 +341,92 @@ AliasResult AliasAnalysis::alias(Source lhsSrc, Source 
rhsSrc, mlir::Value lhs,
 // AliasAnalysis: getModRef
 
//===--===//
 
+static bool isSavedLocal(const fir::AliasAnalysis::Source &src) {
+  if (auto symRef = llvm::dyn_cast(src.origin.u)) {
+auto [nameKind, deconstruct] =
+fir::NameUniquer::deconstruct(symRef.getLeafReference().getValue());
+return nameKind == fir::NameUniquer::NameKind::VARIABLE &&
+   !deconstruct.procs.empty();
+  }
+  return false;
+}
+
+static bool isCallToFortranUserProcedure(fir::CallOp call) {
+  // TODO: indirect calls are excluded by these checks. Maybe some attribute is
+  // needed to flag user calls in this case.
+  if (fir::hasBindcAttr(call))
+return true;
+  if (std::optional callee = call.getCallee())
+return fir::NameUniquer::deconstruct(callee->getLeafReference().getValue())
+   .first == fir::NameUniquer::NameKind::PROCEDURE;
+  return false;
+}
+
+static ModRefResult getCallModRef(fir::CallOp call, mlir::Value var) {
+  // TODO: limit to Fortran functions??
+  // 1. Detect variables that can be accessed indirectly.
+  fir::AliasAnalysis aliasAnalysis;
+  fir::AliasAnalysis::Source varSrc = aliasAnalysis.getSource(var);
+  // If the variable is not a user variable, we cannot safely assume that
+  // Fortran semantics apply (e.g., a bare alloca/allocmem result may very well
+  // be placed in an allocatable/pointer descriptor and escape).
+
+  // All the logic bellows are based on Fortran semantics and only holds if 
this
+  // is a call to a procedure form the Fortran source and this is a variable
+  // from the Fortran source. Compiler generated temporaries or functions may
+  // not adhere to this semantic.
+  // TODO: add some opt-in or op-out mechanism for compiler generated temps.
+  // An example of something currently problematic is the allocmem generated 
for
+  // ALLOCATE of allocatable target. It currently does not have the target
+  // attribute, which would lead this analysis to believe it cannot escape.
+  if (!varSrc.isFortranUserVariable() || !isCallToFortranUserProcedure(call))
+return ModRefResult::getModAndRef();
+  // Pointer and target may have been captured.
+  if (varSrc.isTargetOrPointer())
+return ModRefResult::getModAndRef();
+  // Host associated variables may be addressed indirectly via an internal
+  // function call, whether the call is in the parent or an internal procedure.
+  // Note that the host associated/internal procedure may be referenced
+  // indirectly inside calls to non internal procedure. This is because 
internal
+  // procedures may be captured or passed. As this is tricky to analyze, always
+  // consider such variables may be accessed in any calls.
+  if (varSrc.kind == fir::AliasAnalysis::SourceKind::HostAssoc ||
+  varSrc.isCapturedInInternalProcedure)
+return ModRefResult::getModAndRef();
+  // At that stage, it has been ruled out that local (including the saved ones)
+  // and dummy cannot be indirectly accessed in the call.
+  if (varSrc.kind != fir::AliasAnalysis::SourceKind::Allocate &&
+  !varSrc.isDummyArgument()) {
+if (varSrc.kind != fir::AliasAnalysis::SourceKind::Global ||
+!isSavedLocal(varSrc))
+  return ModRefResult::getModAndRef();
+  }
+  // 2. Check if the variable is passed via the arguments.
+  for (auto arg : call.getArgs()) {
+if (fir::conformsWithPassByRef(arg.getType()) &&
+!aliasAnalysis.alias(arg, var).isNo()) {
+  // TODO: intent(in) would allow returning Ref here. This can be obtained
+  // in the func.func attributes for direct calls, but the module lookup is
+  // linear with the number of MLIR symbols, which would introduce a pseudo
+  // quadratic behavior num_calls * num_func.

tblah wrote:

I believe lookups in an `mlir::SymbolTable` are constant time. Constructing a 
SymbolTable is linear, but perhaps one could be re-used from a calling context. 
Or `fir::AliasAnalysis` could have a `LazySymbolTable` (`AbstractResult.cpp`). 

It is fine by me to leave this as a TODO in this PR and only attempt this if 
the optimization turns out to be useful on some real code.
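
To make the lookup-cost point concrete, here is a minimal sketch of the "build once, look up many times" idea. It deliberately does not use any MLIR API: `FuncInfo`, `LazyFuncTable` and `firstArgIsIntentIn` are hypothetical names introduced only for illustration, and a hash map stands in for whatever symbol table would actually be reused.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for a function symbol and one attribute of interest.
struct FuncInfo {
  std::string name;
  bool firstArgIsIntentIn;
};

// Builds its index on first use (linear, paid once), then answers each lookup
// in average constant time. The caller must keep `allFuncs` alive.
class LazyFuncTable {
public:
  explicit LazyFuncTable(const std::vector<FuncInfo> &allFuncs)
      : funcs(allFuncs) {}

  const FuncInfo *lookup(const std::string &name) {
    if (!index) {
      index.emplace();
      for (const FuncInfo &f : funcs)
        index->emplace(f.name, &f);
      ++buildCount; // counts how often the linear construction ran
    }
    auto it = index->find(name);
    return it == index->end() ? nullptr : it->second;
  }

  int buildCount = 0;

private:
  const std::vector<FuncInfo> &funcs;
  std::optional<std::unordered_map<std::string, const FuncInfo *>> index;
};
```

Under this scheme the total cost is roughly num_func + num_calls rather than num_func * num_calls, provided the table outlives the individual queries; whether that is practical depends on where the analysis object is created.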

https://github.com/llvm/llvm-project/pull/117164


[llvm-branch-commits] [flang] [flang] handle fir.call in AliasAnalysis::getModRef (PR #117164)

2024-11-21 Thread Tom Eccles via llvm-branch-commits

https://github.com/tblah edited https://github.com/llvm/llvm-project/pull/117164


[llvm-branch-commits] [flang] [flang] handle fir.call in AliasAnalysis::getModRef (PR #117164)

2024-11-21 Thread Tom Eccles via llvm-branch-commits


@@ -329,14 +341,92 @@ AliasResult AliasAnalysis::alias(Source lhsSrc, Source 
rhsSrc, mlir::Value lhs,
 // AliasAnalysis: getModRef
 
//===--===//
 
+static bool isSavedLocal(const fir::AliasAnalysis::Source &src) {
+  if (auto symRef = llvm::dyn_cast(src.origin.u)) {
+auto [nameKind, deconstruct] =
+fir::NameUniquer::deconstruct(symRef.getLeafReference().getValue());
+return nameKind == fir::NameUniquer::NameKind::VARIABLE &&
+   !deconstruct.procs.empty();
+  }
+  return false;
+}
+
+static bool isCallToFortranUserProcedure(fir::CallOp call) {
+  // TODO: indirect calls are excluded by these checks. Maybe some attribute is
+  // needed to flag user calls in this case.
+  if (fir::hasBindcAttr(call))
+return true;
+  if (std::optional callee = call.getCallee())
+return fir::NameUniquer::deconstruct(callee->getLeafReference().getValue())
+   .first == fir::NameUniquer::NameKind::PROCEDURE;
+  return false;
+}
+
+static ModRefResult getCallModRef(fir::CallOp call, mlir::Value var) {
+  // TODO: limit to Fortran functions??
+  // 1. Detect variables that can be accessed indirectly.
+  fir::AliasAnalysis aliasAnalysis;
+  fir::AliasAnalysis::Source varSrc = aliasAnalysis.getSource(var);
+  // If the variable is not a user variable, we cannot safely assume that
+  // Fortran semantics apply (e.g., a bare alloca/allocmem result may very well
+  // be placed in an allocatable/pointer descriptor and escape).
+
+  // All the logic bellows are based on Fortran semantics and only holds if 
this

tblah wrote:

```suggestion
  // All the logic bellow is based on Fortran semantics and only holds if this
```
nit

https://github.com/llvm/llvm-project/pull/117164


[llvm-branch-commits] [flang] [flang] handle fir.call in AliasAnalysis::getModRef (PR #117164)

2024-11-21 Thread Tom Eccles via llvm-branch-commits


@@ -329,14 +341,92 @@ AliasResult AliasAnalysis::alias(Source lhsSrc, Source 
rhsSrc, mlir::Value lhs,
 // AliasAnalysis: getModRef
 
//===--===//
 
+static bool isSavedLocal(const fir::AliasAnalysis::Source &src) {
+  if (auto symRef = llvm::dyn_cast(src.origin.u)) {
+auto [nameKind, deconstruct] =
+fir::NameUniquer::deconstruct(symRef.getLeafReference().getValue());
+return nameKind == fir::NameUniquer::NameKind::VARIABLE &&
+   !deconstruct.procs.empty();
+  }
+  return false;
+}
+
+static bool isCallToFortranUserProcedure(fir::CallOp call) {
+  // TODO: indirect calls are excluded by these checks. Maybe some attribute is
+  // needed to flag user calls in this case.
+  if (fir::hasBindcAttr(call))
+return true;
+  if (std::optional callee = call.getCallee())
+return fir::NameUniquer::deconstruct(callee->getLeafReference().getValue())
+   .first == fir::NameUniquer::NameKind::PROCEDURE;
+  return false;
+}
+
+static ModRefResult getCallModRef(fir::CallOp call, mlir::Value var) {
+  // TODO: limit to Fortran functions??
+  // 1. Detect variables that can be accessed indirectly.
+  fir::AliasAnalysis aliasAnalysis;
+  fir::AliasAnalysis::Source varSrc = aliasAnalysis.getSource(var);
+  // If the variable is not a user variable, we cannot safely assume that
+  // Fortran semantics apply (e.g., a bare alloca/allocmem result may very well
+  // be placed in an allocatable/pointer descriptor and escape).
+
+  // All the logic bellows are based on Fortran semantics and only holds if 
this
+  // is a call to a procedure form the Fortran source and this is a variable

tblah wrote:

```suggestion
  // is a call to a procedure from the Fortran source and this is a variable
```
nit

https://github.com/llvm/llvm-project/pull/117164


[llvm-branch-commits] [flang] [flang] handle fir.call in AliasAnalysis::getModRef (PR #117164)

2024-11-21 Thread Tom Eccles via llvm-branch-commits

https://github.com/tblah approved this pull request.

Looks great to me. I have checked that this implements the language rules you 
mentioned in the description (which match my understanding). Please wait for 
Peter to check those before merging.

https://github.com/llvm/llvm-project/pull/117164


[llvm-branch-commits] [flang] [flang][OpenMP] Use new modifier code in ORDER and SCHEDULE clauses (PR #117081)

2024-11-21 Thread Krzysztof Parzyszek via llvm-branch-commits

https://github.com/kparzysz updated 
https://github.com/llvm/llvm-project/pull/117081

>From 43bdfcdb48328fcdfe762734bd5a4c1df3987c4b Mon Sep 17 00:00:00 2001
From: Krzysztof Parzyszek 
Date: Mon, 18 Nov 2024 13:01:30 -0600
Subject: [PATCH 1/2] [flang][OpenMP] Use new modifier code in ORDER and
 SCHEDULE clauses

This actually simplifies the AST node for the schedule clause: the two
allowed modifiers can be easily classified as the ordering-modifier and
the chunk-modifier during parsing without the need to create additional
classes.
---
 flang/examples/FeatureList/FeatureList.cpp| 13 ++-
 .../FlangOmpReport/FlangOmpReportVisitor.cpp  | 10 ++-
 .../FlangOmpReport/FlangOmpReportVisitor.h|  3 +-
 flang/include/flang/Parser/dump-parse-tree.h  | 17 ++--
 flang/include/flang/Parser/parse-tree.h   | 81 ---
 .../flang/Semantics/openmp-modifiers.h|  6 ++
 flang/lib/Lower/OpenMP/Clauses.cpp| 75 ++---
 flang/lib/Lower/OpenMP/Clauses.h  | 10 +++
 flang/lib/Parser/openmp-parsers.cpp   | 71 
 flang/lib/Parser/unparse.cpp  | 23 +++---
 flang/lib/Semantics/check-omp-structure.cpp   | 58 ++---
 flang/lib/Semantics/check-omp-structure.h |  2 -
 flang/lib/Semantics/openmp-modifiers.cpp  | 48 +++
 flang/test/Parser/OpenMP/order-clause01.f90   | 50 ++--
 14 files changed, 263 insertions(+), 204 deletions(-)

diff --git a/flang/examples/FeatureList/FeatureList.cpp b/flang/examples/FeatureList/FeatureList.cpp
index 753ecb918a9ccb..e1c42586c62c94 100644
--- a/flang/examples/FeatureList/FeatureList.cpp
+++ b/flang/examples/FeatureList/FeatureList.cpp
@@ -505,9 +505,9 @@ struct NodeVisitor {
   READ_FEATURE(OmpObject)
   READ_FEATURE(OmpObjectList)
   READ_FEATURE(OmpOrderClause)
-  READ_FEATURE(OmpOrderClause::Type)
+  READ_FEATURE(OmpOrderClause::Ordering)
   READ_FEATURE(OmpOrderModifier)
-  READ_FEATURE(OmpOrderModifier::Kind)
+  READ_FEATURE(OmpOrderModifier::Value)
   READ_FEATURE(OmpProcBindClause)
   READ_FEATURE(OmpProcBindClause::Type)
   READ_FEATURE(OmpReductionClause)
@@ -527,11 +527,10 @@ struct NodeVisitor {
   READ_FEATURE(OmpDeviceClause::DeviceModifier)
   READ_FEATURE(OmpDeviceTypeClause)
   READ_FEATURE(OmpDeviceTypeClause::Type)
-  READ_FEATURE(OmpScheduleModifier)
-  READ_FEATURE(OmpScheduleModifier::Modifier1)
-  READ_FEATURE(OmpScheduleModifier::Modifier2)
-  READ_FEATURE(OmpScheduleModifierType)
-  READ_FEATURE(OmpScheduleModifierType::ModType)
+  READ_FEATURE(OmpChunkModifier)
+  READ_FEATURE(OmpChunkModifier::Value)
+  READ_FEATURE(OmpOrderingModifier)
+  READ_FEATURE(OmpOrderingModifier::Value)
   READ_FEATURE(OmpSectionBlocks)
   READ_FEATURE(OmpSectionsDirective)
   READ_FEATURE(OmpSimpleStandaloneDirective)
diff --git a/flang/examples/FlangOmpReport/FlangOmpReportVisitor.cpp b/flang/examples/FlangOmpReport/FlangOmpReportVisitor.cpp
index a9ff163f8243ce..a3d9b0cfdc79b8 100644
--- a/flang/examples/FlangOmpReport/FlangOmpReportVisitor.cpp
+++ b/flang/examples/FlangOmpReport/FlangOmpReportVisitor.cpp
@@ -213,14 +213,18 @@ void OpenMPCounterVisitor::Post(const OmpVariableCategory::Value &c) {
   "variable_category=" + std::string{OmpVariableCategory::EnumToString(c)} +
   ";";
 }
-void OpenMPCounterVisitor::Post(const OmpScheduleModifierType::ModType &c) {
+void OpenMPCounterVisitor::Post(const OmpChunkModifier::Value &c) {
   clauseDetails +=
-  "modifier=" + std::string{OmpScheduleModifierType::EnumToString(c)} + ";";
+  "modifier=" + std::string{OmpChunkModifier::EnumToString(c)} + ";";
 }
 void OpenMPCounterVisitor::Post(const OmpLinearModifier::Value &c) {
   clauseDetails +=
   "modifier=" + std::string{OmpLinearModifier::EnumToString(c)} + ";";
 }
+void OpenMPCounterVisitor::Post(const OmpOrderingModifier::Value &c) {
+  clauseDetails +=
+  "modifier=" + std::string{OmpOrderingModifier::EnumToString(c)} + ";";
+}
 void OpenMPCounterVisitor::Post(const OmpTaskDependenceType::Value &c) {
   clauseDetails +=
   "type=" + std::string{OmpTaskDependenceType::EnumToString(c)} + ";";
@@ -228,7 +232,7 @@ void OpenMPCounterVisitor::Post(const OmpTaskDependenceType::Value &c) {
 void OpenMPCounterVisitor::Post(const OmpMapClause::Type &c) {
   clauseDetails += "type=" + std::string{OmpMapClause::EnumToString(c)} + ";";
 }
-void OpenMPCounterVisitor::Post(const OmpScheduleClause::ScheduleType &c) {
+void OpenMPCounterVisitor::Post(const OmpScheduleClause::Kind &c) {
   clauseDetails +=
   "type=" + std::string{OmpScheduleClause::EnumToString(c)} + ";";
 }
diff --git a/flang/examples/FlangOmpReport/FlangOmpReportVisitor.h b/flang/examples/FlangOmpReport/FlangOmpReportVisitor.h
index 83bd3644577e1c..608cb5a2241b83 100644
--- a/flang/examples/FlangOmpReport/FlangOmpReportVisitor.h
+++ b/flang/examples/FlangOmpReport/FlangOmpReportVisitor.h
@@ -71,8 +71,9 @@ struct OpenMPCounterVisitor {
   void Post(const OmpDefaultmapClause::Implici

[llvm-branch-commits] [flang] [flang] handle fir.call in AliasAnalysis::getModRef (PR #117164)

2024-11-21 Thread Razvan Lupusoru via llvm-branch-commits

https://github.com/razvanlupusoru approved this pull request.

Looks amazing! I agree with the various limitations and as far as I can tell - 
the non-implemented TODOs are not a correctness problem - just a limitation.

Do you have plans to add support for Fortran runtime calls also? I think a 
similar approach as your check for escaping args would work conservatively for 
them as well.

https://github.com/llvm/llvm-project/pull/117164


[llvm-branch-commits] [flang] [flang] handle fir.call in AliasAnalysis::getModRef (PR #117164)

2024-11-21 Thread via llvm-branch-commits

https://github.com/jeanPerier created 
https://github.com/llvm/llvm-project/pull/117164

fir.call side effects are hard to describe in a useful way using 
`MemoryEffectOpInterface` because it is impossible to list which memory 
locations a user procedure reads or writes without doing a data flow analysis 
of its body (even PURE procedures may read from any module variable; Fortran 
SIMPLE procedures from F2023 will make that possible, but they are far from 
common at this point).

While doing a data flow analysis is likely unavoidable at some point, it will 
not address cases where the procedure body is not available in the current 
compilation unit, and it will be rather expensive to do.

Luckily, the Fortran language specification allows the compiler to deduce that 
a procedure call cannot access a variable in many cases (this mainly stems from 
the 15.5.2.14 restrictions about dummy arguments, and the inability to capture 
variables that do not have the TARGET attribute).

MLIR provides the perfect interface to leverage that: 
`AliasAnalysis::getModRef(mlir::Operation *op, mlir::Value location)`. This 
interface allows telling whether `op` may reference or modify the memory 
"location".

This patch extends `fir::AliasAnalysis::getModRef` to deal with fir.call. The 
cost is reasonable: "number of arguments" * "average(memory SSA defining-op 
chain depth)".
It is currently very conservative and will only apply Fortran rules if:
1. It was able to find a [hl]fir.declare for a Fortran variable from the source 
in the SSA defining-op chain starting from "location".
2. The fir.call is a direct call to a procedure from the Fortran source (not a 
runtime or compiler generated function).

It then:
1. Tries to rule out any indirect access to "location" inside the procedure 
(the location must not have the POINTER/TARGET attributes, be a host procedure 
variable used in an internal procedure, be a module variable, or be in a 
common block).
2. Tries to rule out any access via the arguments (the location must not alias 
with any of the arguments; the case where the access would be made via some 
pointer inside the data passed by argument is covered by the fact that the 
location must not be a POINTER/TARGET).

Currently, it always replies "ModRef" (may be referenced or modified) or 
"NoModRef" (may neither be referenced nor modified). This could be refined in 
the future to reply "Ref" for the cases where the only access is made via an 
"INTENT(IN)" argument.

It also inherits a lot of "false positive cases" coming from the current 
limitations of alias analysis (e.g., any copy-in/out on an argument will make 
it return "ModRef" because alias analysis currently does not handle 
hlfir.copy_in in the SSA chain). These will be improved with time.
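
As a rough illustration of the decision procedure described above, the following sketch encodes the preconditions and the two ruling-out steps as plain booleans. It is not the code from the patch: `VarFacts`, `CallFacts`, `CallEffect` and `classifyCall` are hypothetical names, and each field is assumed to have been computed elsewhere (for example from the [hl]fir.declare found in the defining-op chain).

```cpp
#include <cassert>

// Hypothetical two-valued answer, mirroring the "ModRef"/"NoModRef" replies.
enum class CallEffect { NoModRef, ModRef };

// Hypothetical facts about the memory "location".
struct VarFacts {
  bool isFortranUserVariable = false;      // a declare op was found for it
  bool isTargetOrPointer = false;          // has POINTER or TARGET
  bool isHostAssociatedOrCaptured = false; // used in an internal procedure
  bool isModuleOrCommonVariable = false;   // module variable or common block
};

// Hypothetical facts about the fir.call.
struct CallFacts {
  bool isDirectFortranUserCall = false; // not runtime/compiler generated
  bool someArgumentMayAlias = false;    // alias query did not answer "no"
};

CallEffect classifyCall(const CallFacts &call, const VarFacts &var) {
  // Fortran rules only apply to user variables and direct user calls.
  if (!var.isFortranUserVariable || !call.isDirectFortranUserCall)
    return CallEffect::ModRef;
  // Step 1: rule out indirect access inside the callee.
  if (var.isTargetOrPointer || var.isHostAssociatedOrCaptured ||
      var.isModuleOrCommonVariable)
    return CallEffect::ModRef;
  // Step 2: rule out access through the arguments.
  if (call.someArgumentMayAlias)
    return CallEffect::ModRef;
  return CallEffect::NoModRef;
}
```

Every early exit returns the conservative answer, so a missing or imprecise fact can only cost an optimization, never correctness.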

@klausler, I am adding you as a reviewer for the Fortran test (not the 
implementation) because it is very important that I am getting the language 
specifications correct here.

Any `! CHECK: function_name -> variable_name#0 : ModRef` lines in the test are 
verifying that the optimizer considers that, in the FIR representation of the 
Fortran code right above, `call function_name()` may access/modify the variable 
`variable_name` (from the scope of the call). If `NoModRef` is used instead of 
`ModRef`, the optimizer considers the variable cannot be accessed/modified. 
Please flag any expectations where you disagree (especially bad `NoModRef`, 
which would be bugs, while bad "ModRef" will only cause missing optimization 
opportunities).
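
The asymmetry between a bad `NoModRef` and a bad `ModRef` can be stated as a small check. The sketch below assumes a two-bit encoding (one bit for "may read", one for "may write"); the enum values and the helper `isSafeAnswer` are illustrative choices made here and are not claimed to match MLIR's own representation.

```cpp
#include <cassert>
#include <cstdint>

// Assumed encoding: bit 0 = may reference (read), bit 1 = may modify (write).
enum class ModRefKind : std::uint8_t {
  NoModRef = 0,
  Ref = 1,
  Mod = 2,
  ModRef = 3
};

constexpr std::uint8_t bits(ModRefKind k) {
  return static_cast<std::uint8_t>(k);
}

// An answer is safe if it claims at least every effect that really happens.
constexpr bool isSafeAnswer(ModRefKind claimed, ModRefKind actual) {
  return (bits(claimed) & bits(actual)) == bits(actual);
}
```

Answering `ModRef` is safe whatever the call really does, which is why an over-cautious expectation only loses optimizations, while answering `NoModRef` is safe only when the call truly has no effect on the variable.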

This will allow implementing "array = array_function()" optimization in a 
future patch.

>From 84c95d6c816004abe6c01eb754688fb35a666ffc Mon Sep 17 00:00:00 2001
From: Jean Perier 
Date: Wed, 20 Nov 2024 05:44:28 -0800
Subject: [PATCH] [flang] handle fir.call in getModRef

---
 .../flang/Optimizer/Analysis/AliasAnalysis.h  |  11 +-
 .../Dialect/FortranVariableInterface.td   |   7 +
 .../lib/Optimizer/Analysis/AliasAnalysis.cpp  | 111 +-
 flang/lib/Optimizer/Analysis/CMakeLists.txt   |   1 +
 .../lib/Optimizer/Transforms/AddAliasTags.cpp |   5 +-
 .../AliasAnalysis/gen_mod_ref_test.py |  18 +++
 .../modref-call-after-inlining.fir|  45 ++
 .../AliasAnalysis/modref-call-args.f90|  62 
 .../AliasAnalysis/modref-call-dummies.f90 |  53 +++
 .../AliasAnalysis/modref-call-equivalence.f90 |  34 +
 .../AliasAnalysis/modref-call-globals.f90 |  68 +
 .../modref-call-internal-proc.f90 | 135 ++
 .../AliasAnalysis/modref-call-locals.f90  |  52 +++
 .../AliasAnalysis/modref-call-not-fortran.fir |  25 
 14 files changed, 614 insertions(+), 13 deletions(-)
 create mode 100755 flang/test/Analysis/AliasAnalysis/gen_mod_ref_test.py
 create mode 100644 
flang/test/Analysis/AliasAnalysis/modref-call-after-inlining.fir
 create mode 100644 flang/test/Analysis/AliasAnalysis/modref-call-args.f90
 create mode 100644 flang/test/Analysis/AliasAnalysis/modref-call-dummies.f90
 create mode 1

[llvm-branch-commits] [flang] [flang] handle fir.call in AliasAnalysis::getModRef (PR #117164)

2024-11-21 Thread via llvm-branch-commits

llvmbot wrote:




@llvm/pr-subscribers-flang-fir-hlfir

Author: None (jeanPerier)


Changes

fir.call side effects are hard to describe in a useful way using 
`MemoryEffectOpInterface` because it is impossible to list which memory 
locations a user procedure reads or writes without doing a data flow analysis 
of its body (even PURE procedures may read from any module variable; Fortran 
SIMPLE procedures from F2023 will make that possible, but they are far from 
common at this point).

While doing a data flow analysis is likely unavoidable at some point, it will 
not address cases where the procedure body is not available in the current 
compilation unit, and it will be rather expensive to do.

Luckily, the Fortran language specification allows the compiler to deduce that 
a procedure call cannot access a variable in many cases (this mainly stems from 
the 15.5.2.14 restrictions about dummy arguments, and the inability to capture 
variables that do not have the TARGET attribute).

MLIR provides the perfect interface to leverage that: 
`AliasAnalysis::getModRef(mlir::Operation *op, mlir::Value location)`. This 
interface allows telling whether `op` may reference or modify the memory 
"location".

This patch extends `fir::AliasAnalysis::getModRef` to deal with fir.call. The 
cost is reasonable: "number of arguments" * "average(memory SSA defining-op 
chain depth)".
It is currently very conservative and will only apply Fortran rules if:
1. It was able to find a [hl]fir.declare for a Fortran variable from the source 
in the SSA defining-op chain starting from "location".
2. The fir.call is a direct call to a procedure from the Fortran source (not a 
runtime or compiler generated function).

It then:
1. Tries to rule out any indirect access to "location" inside the procedure 
(the location must not have the POINTER/TARGET attributes, be a host procedure 
variable used in an internal procedure, be a module variable, or be in a 
common block).
2. Tries to rule out any access via the arguments (the location must not alias 
with any of the arguments; the case where the access would be made via some 
pointer inside the data passed by argument is covered by the fact that the 
location must not be a POINTER/TARGET).

Currently, it always replies "ModRef" (may be referenced or modified) or 
"NoModRef" (may neither be referenced nor modified). This could be refined in 
the future to reply "Ref" for the cases where the only access is made via an 
"INTENT(IN)" argument.

It also inherits a lot of "false positive cases" coming from the current 
limitations of alias analysis (e.g., any copy-in/out on an argument will make 
it return "ModRef" because alias analysis currently does not handle 
hlfir.copy_in in the SSA chain). These will be improved with time.

@klausler, I am adding you as a reviewer for the Fortran test (not the 
implementation) because it is very important that I am getting the language 
specifications correct here.

Any `! CHECK: function_name -> variable_name#0 : ModRef` lines in 
the test are verifying that the optimizer considers that, in the FIR 
representation of the Fortran code right above, `call function_name()` may 
access/modify the variable `variable_name` (from the scope of the call). If 
`NoModRef` is used instead of `ModRef`, the optimizer considers the variable 
cannot be accessed/modified. Please flag any expectations where you disagree 
(especially bad `NoModRef`, which would be bugs, while bad "ModRef" will only 
cause missing optimization opportunities).

This will allow implementing "array = array_function()" optimization in a 
future patch.

---

Patch is 33.65 KiB, truncated to 20.00 KiB below, full version: 
https://github.com/llvm/llvm-project/pull/117164.diff


14 Files Affected:

- (modified) flang/include/flang/Optimizer/Analysis/AliasAnalysis.h (+8-3) 
- (modified) flang/include/flang/Optimizer/Dialect/FortranVariableInterface.td 
(+7) 
- (modified) flang/lib/Optimizer/Analysis/AliasAnalysis.cpp (+104-7) 
- (modified) flang/lib/Optimizer/Analysis/CMakeLists.txt (+1) 
- (modified) flang/lib/Optimizer/Transforms/AddAliasTags.cpp (+2-3) 
- (added) flang/test/Analysis/AliasAnalysis/gen_mod_ref_test.py (+18) 
- (added) flang/test/Analysis/AliasAnalysis/modref-call-after-inlining.fir 
(+45) 
- (added) flang/test/Analysis/AliasAnalysis/modref-call-args.f90 (+62) 
- (added) flang/test/Analysis/AliasAnalysis/modref-call-dummies.f90 (+53) 
- (added) flang/test/Analysis/AliasAnalysis/modref-call-equivalence.f90 (+34) 
- (added) flang/test/Analysis/AliasAnalysis/modref-call-globals.f90 (+68) 
- (added) flang/test/Analysis/AliasAnalysis/modref-call-internal-proc.f90 
(+135) 
- (added) flang/test/Analysis/AliasAnalysis/modref-call-locals.f90 (+52) 
- (added) flang/test/Analysis/AliasAnalysis/modref-call-not-fortran.fir (+25) 


``diff
diff --git a/flang/include/flang/Optimizer/Analysis/AliasAnalysis.h 
b/flang/include/flang/Optimizer/Analysis/AliasAnalysis.h
index d9953f580f401d..e410831c0fc3eb 100644
--- a/flang/include

[llvm-branch-commits] [clang] [llvm] AMDGPU: Shrink used number of registers for mfma scale based on format (PR #117047)

2024-11-21 Thread Matt Arsenault via llvm-branch-commits

arsenm wrote:

### Merge activity

* **Nov 21, 11:47 AM EST**: A user started a stack merge that includes this 
pull request via 
[Graphite](https://app.graphite.dev/github/pr/llvm/llvm-project/117047).


https://github.com/llvm/llvm-project/pull/117047


[llvm-branch-commits] [llvm] AMDGPU: Optimize mfma_scale intrinsics with 0 inputs (PR #116724)

2024-11-21 Thread Matt Arsenault via llvm-branch-commits

arsenm wrote:

### Merge activity

* **Nov 21, 11:47 AM EST**: A user started a stack merge that includes this 
pull request via 
[Graphite](https://app.graphite.dev/github/pr/llvm/llvm-project/116724).


https://github.com/llvm/llvm-project/pull/116724


[llvm-branch-commits] [flang] [flang] handle fir.call in AliasAnalysis::getModRef (PR #117164)

2024-11-21 Thread Peter Klausler via llvm-branch-commits

https://github.com/klausler approved this pull request.


https://github.com/llvm/llvm-project/pull/117164


[llvm-branch-commits] [flang] [flang] handle fir.call in AliasAnalysis::getModRef (PR #117164)

2024-11-21 Thread Peter Klausler via llvm-branch-commits


@@ -0,0 +1,135 @@
+! RUN: bbc -emit-hlfir %s -o - | %python %S/gen_mod_ref_test.py | \
+! RUN:  fir-opt 
-pass-pipeline='builtin.module(func.func(test-fir-alias-analysis-modref))' \
+! RUN:  --mlir-disable-threading -o /dev/null 2>&1 | FileCheck %s
+
+! Test fir.call modref with internal procedures
+
+subroutine simple_modref_test(test_var_x)
+  implicit none
+  real :: test_var_x
+  call test_effect_internal()
+contains
+  subroutine test_effect_internal()
+test_var_x = 0.
+  end subroutine
+end subroutine
+! CHECK-LABEL: Testing : "_QPsimple_modref_test"
+! CHECK: test_effect_internal -> test_var_x#0: ModRef
+
+subroutine simple_nomodref_test(test_var_x)
+  implicit none
+  real :: test_var_x
+  call test_effect_internal()
+contains
+  subroutine test_effect_internal()
+call some_external()
+  end subroutine
+end subroutine
+! CHECK-LABEL: Testing : "_QPsimple_nomodref_test"
+! CHECK: test_effect_internal -> test_var_x#0: NoModRef
+
+! Test that effects on captured variable are propagated to associated variables
+! in associate construct.
+
+subroutine test_associate()
+  implicit none
+  real :: test_var_x(10)
+  associate (test_var_y=>test_var_x)
+test_var_y = test_effect_internal()

klausler wrote:

Is it necessary for this test that `test_var_y` be the LHS of the assignment 
statement?  You would expect ModRef even if it were not modified by/after the 
call.  It might be clearer if the result were stored elsewhere.

https://github.com/llvm/llvm-project/pull/117164


[llvm-branch-commits] [flang] [MLIR][OpenMP] Add Lowering support for OpenMP Declare Mapper directive (PR #117046)

2024-11-21 Thread Akash Banerjee via llvm-branch-commits


@@ -21,7 +21,7 @@ subroutine declare_mapper_1
   type (my_type2):: t
   real   :: x, y(nvals)
   !$omp declare mapper (my_type :: var) map (var, var%values (1:var%num_vals))
-!CHECK: not yet implemented: OpenMPDeclareMapperConstruct
+!CHECK: not yet implemented: lowering symbol to HLFIR

TIFitis wrote:

This error now comes from an unhandled form of the map clause rather than from 
declare mapper. As such, I believe it is out of scope for this PR.

I will, however, look into fixing it in a separate PR afterwards. I hope that 
doesn't hold up this PR.

https://github.com/llvm/llvm-project/pull/117046


[llvm-branch-commits] [llvm] AMDGPU: Add a baseline, non-comprehensive test for scaled mfma hazards (PR #117055)

2024-11-21 Thread Matt Arsenault via llvm-branch-commits

https://github.com/arsenm updated 
https://github.com/llvm/llvm-project/pull/117055

>From a0485e65e1c41a3113b68b7c4c3456f7d9337f97 Mon Sep 17 00:00:00 2001
From: Matt Arsenault 
Date: Mon, 4 Mar 2024 17:36:33 +0530
Subject: [PATCH] AMDGPU: Add a baseline, non-comprehensive test for scaled
 mfma hazards

Add some tests which will demonstrate that we treat the number of cycles
differently depending on whether the first matrix uses an f8 format.
---
 .../CodeGen/AMDGPU/mai-hazards-gfx940.mir |   2 +-
 .../AMDGPU/mai-hazards-mfma-scale.gfx950.mir  | 274 ++
 2 files changed, 275 insertions(+), 1 deletion(-)
 create mode 100644 llvm/test/CodeGen/AMDGPU/mai-hazards-mfma-scale.gfx950.mir

diff --git a/llvm/test/CodeGen/AMDGPU/mai-hazards-gfx940.mir 
b/llvm/test/CodeGen/AMDGPU/mai-hazards-gfx940.mir
index a98b02d792d984..9681b01f334f9a 100644
--- a/llvm/test/CodeGen/AMDGPU/mai-hazards-gfx940.mir
+++ b/llvm/test/CodeGen/AMDGPU/mai-hazards-gfx940.mir
@@ -2199,7 +2199,7 @@ name:
xdl_mfma_4pass_write_vgpr_sgemm_mfma_read_overlap_srcb
 body: |
   bb.0:
 $vgpr0_vgpr1_vgpr2_vgpr3 = V_MFMA_F32_16X16X16F16_vgprcd_e64 $vgpr4_vgpr5, 
$vgpr6_vgpr7, $vgpr0_vgpr1_vgpr2_vgpr3, 1, 2, 3, implicit $mode, implicit $exec
-$vgpr0_vgpr1_vgpr2_vgpr3 = V_MFMA_F32_4X4X1F32_vgprcd_e64 $vgpr8, $vgpr1, 
$vgpr6_vgpr7_vgpr8_vgpr9, 0, 0, 0, implicit $mode, implicit $exec
+$vgpr0_vgpr1_vgpr2_vgpr3 = V_MFMA_F32_4X4X1F32_vgprcd_e64 $vgpr8, $vgpr1, 
$vgpr2_vgpr3_vgpr4_vgpr5, 0, 0, 0, implicit $mode, implicit $exec
 
 ...
 
diff --git a/llvm/test/CodeGen/AMDGPU/mai-hazards-mfma-scale.gfx950.mir 
b/llvm/test/CodeGen/AMDGPU/mai-hazards-mfma-scale.gfx950.mir
new file mode 100644
index 00..c0f0482debbcb3
--- /dev/null
+++ b/llvm/test/CodeGen/AMDGPU/mai-hazards-mfma-scale.gfx950.mir
@@ -0,0 +1,274 @@
+# NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py 
UTC_ARGS: --version 4
+# RUN: llc -march=amdgcn -mcpu=gfx950 -verify-machineinstrs -run-pass 
post-RA-hazard-rec %s -o - | FileCheck -check-prefix=GCN %s
+
+# Immediate operand order = cbsz, abid, blgp
+
+# First MFMA uses f8 format, so should be treated as 32 cycles
+---
+name:
V_MFMA_F32_16X16X128_F8F6F4_vgprcd_e64___xdl_write_vgpr__cbsz0_blgp0xdl_read_overlap_vgpr_srcC
+tracksRegLiveness: true
+body: |
+  bb.0:
+liveins: $vgpr0, $vgpr1, $vgpr2, $vgpr3, $vgpr4, $vgpr5, $vgpr6, $vgpr7, 
$vgpr8, $vgpr9, $vgpr10, $vgpr11, $vgpr12, $vgpr13, $vgpr14, $vgpr15, $vgpr16, 
$vgpr17, $vgpr18, $vgpr19, $vgpr20, $vgpr21
+
+; GCN-LABEL: name: 
V_MFMA_F32_16X16X128_F8F6F4_vgprcd_e64___xdl_write_vgpr__cbsz0_blgp0xdl_read_overlap_vgpr_srcC
+; GCN: liveins: $vgpr0, $vgpr1, $vgpr2, $vgpr3, $vgpr4, $vgpr5, $vgpr6, 
$vgpr7, $vgpr8, $vgpr9, $vgpr10, $vgpr11, $vgpr12, $vgpr13, $vgpr14, $vgpr15, 
$vgpr16, $vgpr17, $vgpr18, $vgpr19, $vgpr20, $vgpr21
+; GCN-NEXT: {{  $}}
+; GCN-NEXT: renamable $vgpr0_vgpr1_vgpr2_vgpr3 = nofpexcept 
V_MFMA_F32_16X16X128_F8F6F4_vgprcd_e64 
$vgpr4_vgpr5_vgpr6_vgpr7_vgpr8_vgpr9_vgpr10_vgpr11, 
$vgpr12_vgpr13_vgpr14_vgpr15_vgpr16_vgpr17_vgpr18_vgpr19, killed 
$vgpr0_vgpr1_vgpr2_vgpr3, 0, 0, 0, implicit $mode, implicit $exec
+; GCN-NEXT: S_NOP 1
+; GCN-NEXT: renamable $vgpr0_vgpr1_vgpr2_vgpr3 = nofpexcept 
V_MFMA_F32_16X16X128_F8F6F4_vgprcd_e64 killed 
$vgpr4_vgpr5_vgpr6_vgpr7_vgpr8_vgpr9_vgpr10_vgpr11, killed 
$vgpr12_vgpr13_vgpr14_vgpr15_vgpr16_vgpr17_vgpr18_vgpr19, killed 
$vgpr2_vgpr3_vgpr4_vgpr5, 0, 0, 0, implicit $mode, implicit $exec
+; GCN-NEXT: S_SETPC_B64_return undef $sgpr30_sgpr31, implicit $vgpr0, 
implicit $vgpr1, implicit $vgpr2, implicit $vgpr3
+renamable $vgpr0_vgpr1_vgpr2_vgpr3 = nofpexcept 
V_MFMA_F32_16X16X128_F8F6F4_vgprcd_e64 
$vgpr4_vgpr5_vgpr6_vgpr7_vgpr8_vgpr9_vgpr10_vgpr11, 
$vgpr12_vgpr13_vgpr14_vgpr15_vgpr16_vgpr17_vgpr18_vgpr19, killed 
$vgpr0_vgpr1_vgpr2_vgpr3, 0, 0, 0, implicit $mode, implicit $exec
+renamable $vgpr0_vgpr1_vgpr2_vgpr3 = nofpexcept 
V_MFMA_F32_16X16X128_F8F6F4_vgprcd_e64 killed 
$vgpr4_vgpr5_vgpr6_vgpr7_vgpr8_vgpr9_vgpr10_vgpr11, killed 
$vgpr12_vgpr13_vgpr14_vgpr15_vgpr16_vgpr17_vgpr18_vgpr19, killed 
$vgpr2_vgpr3_vgpr4_vgpr5, 0, 0, 0, implicit $mode, implicit $exec
+S_SETPC_B64_return undef $sgpr30_sgpr31, implicit $vgpr0, implicit $vgpr1, 
implicit $vgpr2, implicit $vgpr3
+
+...
+
+# First MFMA uses f8 format, so should be treated as 32 cycles
+---
+name:
V_MFMA_F32_16X16X128_F8F6F4_vgprcd_e64___xdl_write_vgpr__cbsz1_blgp1xdl_read_overlap_vgpr_srcC
+tracksRegLiveness: true
+body: |
+  bb.0:
+liveins: $vgpr0, $vgpr1, $vgpr2, $vgpr3, $vgpr4, $vgpr5, $vgpr6, $vgpr7, 
$vgpr8, $vgpr9, $vgpr10, $vgpr11, $vgpr12, $vgpr13, $vgpr14, $vgpr15, $vgpr16, 
$vgpr17, $vgpr18, $vgpr19, $vgpr20, $vgpr21
+
+; GCN-LABEL: name: 
V_MFMA_F32_16X16X128_F8F6F4_vgprcd_e64___xdl_write_vgpr__cbsz1_blgp1xdl_read_overlap_vgpr_srcC
+; GCN: liveins: $vgpr0, $vgpr1, $vgpr2,

[llvm-branch-commits] [flang] [flang] handle fir.call in AliasAnalysis::getModRef (PR #117164)

2024-11-21 Thread Peter Klausler via llvm-branch-commits


@@ -0,0 +1,68 @@
+! RUN: bbc -emit-hlfir %s -o - | %python %S/gen_mod_ref_test.py | \
+! RUN:  fir-opt 
-pass-pipeline='builtin.module(func.func(test-fir-alias-analysis-modref))' \
+! RUN:  --mlir-disable-threading -o /dev/null 2>&1 | FileCheck %s
+
+! Test fir.call modref for global variables (module, saved, common).
+
+
+module somemod
+  implicit none
+  real :: test_var_xmod
+  interface
+subroutine may_capture(x)
+  real, target :: x
+end subroutine
+  end interface
+end module
+
+subroutine test_module
+  use somemod, only : test_var_xmod
+  implicit none
+  call test_effect_external()
+end subroutine
+! CHECK-LABEL: Testing : "_QPtest_module"
+! CHECK: test_effect_external -> test_var_xmod#0: ModRef
+
+subroutine test_saved_local
+  use somemod, only : may_capture
+  implicit none
+  real, save :: test_var_xsaved
+  ! Capture is invalid after the call because test_var_xsaved does not have the
+  ! target attribute.
+  call may_capture(test_var_xsaved)
+  call test_effect_external()
+end subroutine
+! CHECK-LABEL: Testing : "_QPtest_saved_local"
+! CHECK: test_effect_external -> test_var_xsaved#0: NoModRef
+
+subroutine test_saved_target
+  use somemod, only : may_capture
+  implicit none
+  real, save, target :: test_var_target_xsaved

klausler wrote:

The 'save' attribute shouldn't matter; the result would be ModRef with and 
without `save`, yes?

https://github.com/llvm/llvm-project/pull/117164


[llvm-branch-commits] [clang] [llvm] AMDGPU: Add v_mfma_f32_16x16x32_bf16 for gfx950 (PR #117053)

2024-11-21 Thread Matt Arsenault via llvm-branch-commits

https://github.com/arsenm updated 
https://github.com/llvm/llvm-project/pull/117053

>From 84c3383558d5962f78086b64244997ca7a2b8c01 Mon Sep 17 00:00:00 2001
From: Matt Arsenault 
Date: Mon, 29 Jan 2024 18:16:52 +0530
Subject: [PATCH] AMDGPU: Add v_mfma_f32_16x16x32_bf16 for gfx950

---
 .../CodeGenOpenCL/builtins-amdgcn-mfma.cl |   7 +
 .../builtins-amdgcn-error-gfx950-param.cl |   7 +
 .../builtins-amdgcn-error-gfx950.cl   |   1 +
 llvm/include/llvm/IR/IntrinsicsAMDGPU.td  |   2 +-
 .../Target/AMDGPU/AMDGPURegisterBankInfo.cpp  |   3 +-
 llvm/lib/Target/AMDGPU/SIInstrInfo.td |   1 +
 llvm/lib/Target/AMDGPU/VOP3PInstructions.td   |   5 +
 .../UniformityAnalysis/AMDGPU/intrinsics.ll   |   9 +
 .../CodeGen/AMDGPU/llvm.amdgcn.mfma.gfx950.ll | 198 ++
 llvm/test/MC/AMDGPU/mai-gfx950.s  |  56 +
 .../MC/Disassembler/AMDGPU/gfx950_mai.txt |  34 +++
 llvm/test/tools/llvm-mca/AMDGPU/gfx950.s  |  10 +-
 12 files changed, 328 insertions(+), 5 deletions(-)

diff --git a/clang/test/CodeGenOpenCL/builtins-amdgcn-mfma.cl 
b/clang/test/CodeGenOpenCL/builtins-amdgcn-mfma.cl
index b21394b6982631..bfe2901ee962a3 100644
--- a/clang/test/CodeGenOpenCL/builtins-amdgcn-mfma.cl
+++ b/clang/test/CodeGenOpenCL/builtins-amdgcn-mfma.cl
@@ -458,4 +458,11 @@ v16i test_mfma_i32_32x32x32_i8(v4i a, v4i b, v16i c) {
   return __builtin_amdgcn_mfma_i32_32x32x32_i8(a, b, c, 1, 2, 3);
 }
 
+// CHECK-GFX950-LABEL: @test_mfma_f32_16x16x32_bf16(
+// CHECK-GFX950: tail call <4 x float> @llvm.amdgcn.mfma.f32.16x16x32.bf16(<8 
x bfloat> %a, <8 x bfloat> %b, <4 x float> %c, i32 1, i32 2, i32 3)
+v4f test_mfma_f32_16x16x32_bf16(v8bf16 a, v8bf16 b, v4f c)
+{
+  return __builtin_amdgcn_mfma_f32_16x16x32_bf16(a, b, c, 1, 2, 3);
+}
+
 #endif
diff --git a/clang/test/SemaOpenCL/builtins-amdgcn-error-gfx950-param.cl 
b/clang/test/SemaOpenCL/builtins-amdgcn-error-gfx950-param.cl
index 9c14c0541ff3b8..acaa20090dfcba 100644
--- a/clang/test/SemaOpenCL/builtins-amdgcn-error-gfx950-param.cl
+++ b/clang/test/SemaOpenCL/builtins-amdgcn-error-gfx950-param.cl
@@ -55,3 +55,10 @@ void test_mfma_i32_32x32x32_i8(__global int16* out, int4 a, 
int4 b, int16 c, int
   *out = __builtin_amdgcn_mfma_i32_32x32x32_i8(a, b, c, 0, X, 0);  // 
expected-error{{argument to '__builtin_amdgcn_mfma_i32_32x32x32_i8' must be a 
constant integer}}
   *out = __builtin_amdgcn_mfma_i32_32x32x32_i8(a, b, c, 0, 0, X);  // 
expected-error{{argument to '__builtin_amdgcn_mfma_i32_32x32x32_i8' must be a 
constant integer}}
 }
+
+void test_mfma_f32_16x16x32_bf16(__global float4* out, bfloat8 a, bfloat8 b, 
float4 c, int X) {
+
+  *out = __builtin_amdgcn_mfma_f32_16x16x32_bf16(a, b, c, X, 0, 0); // 
expected-error{{argument to '__builtin_amdgcn_mfma_f32_16x16x32_bf16' must be a 
constant integer}}
+  *out = __builtin_amdgcn_mfma_f32_16x16x32_bf16(a, b, c, 0, X, 0); // 
expected-error{{argument to '__builtin_amdgcn_mfma_f32_16x16x32_bf16' must be a 
constant integer}}
+  *out = __builtin_amdgcn_mfma_f32_16x16x32_bf16(a, b, c, 0, 0, X); // 
expected-error{{argument to '__builtin_amdgcn_mfma_f32_16x16x32_bf16' must be a 
constant integer}}
+}
diff --git a/clang/test/SemaOpenCL/builtins-amdgcn-error-gfx950.cl 
b/clang/test/SemaOpenCL/builtins-amdgcn-error-gfx950.cl
index 71a110066342cb..6bf76b3cba0f59 100644
--- a/clang/test/SemaOpenCL/builtins-amdgcn-error-gfx950.cl
+++ b/clang/test/SemaOpenCL/builtins-amdgcn-error-gfx950.cl
@@ -33,6 +33,7 @@ void test(__global float4* out0, half8 a0, half8 b0, float4 
c0,
   *out2 = __builtin_amdgcn_mfma_f32_32x32x16_bf16(a2, b2, c2, 0, 0, 0); // 
expected-error{{'__builtin_amdgcn_mfma_f32_32x32x16_bf16' needs target feature 
gfx950-insts}}
   *out3 = __builtin_amdgcn_mfma_i32_16x16x64_i8(a3, b3, c3, 0, 0, 0); // 
expected-error{{'__builtin_amdgcn_mfma_i32_16x16x64_i8' needs target feature 
gfx950-insts}}
   *out4 = __builtin_amdgcn_mfma_i32_32x32x32_i8(a4, b4, c4, 0, 0, 0); // 
expected-error{{'__builtin_amdgcn_mfma_i32_32x32x32_i8' needs target feature 
gfx950-insts}}
+  *out5 = __builtin_amdgcn_mfma_f32_16x16x32_bf16(a5, b5, c5, 0, 0, 0); // 
expected-error{{'__builtin_amdgcn_mfma_f32_16x16x32_bf16' needs target feature 
gfx950-insts}}
   *out14 = __builtin_amdgcn_mfma_scale_f32_16x16x128_f8f6f4(a14, b14, c14, 0, 
0, 0, d14, 0, e14); // 
expected-error{{'__builtin_amdgcn_mfma_scale_f32_16x16x128_f8f6f4' needs target 
feature gfx950-insts}}
   *out15 = __builtin_amdgcn_mfma_scale_f32_32x32x64_f8f6f4(a15, b15, c15, 0, 
0, 0, d15, 0, e15); // 
expected-error{{'__builtin_amdgcn_mfma_scale_f32_32x32x64_f8f6f4' needs target 
feature gfx950-insts}}
 }
diff --git a/llvm/include/llvm/IR/IntrinsicsAMDGPU.td 
b/llvm/include/llvm/IR/IntrinsicsAMDGPU.td
index b5d5eae0c7cd7e..479120f9c202bf 100644
--- a/llvm/include/llvm/IR/IntrinsicsAMDGPU.td
+++ b/llvm/include/llvm/IR/IntrinsicsAMDGPU.td
@@ -3148,7 +3148,7 @@ def int_amdgcn_mfma_f32_16x16x32_f16 : 
AMDGPUMfmaIntrinsic;
 def int_amdgcn_mfma_i32_16x16x64_i8 : AM

[llvm-branch-commits] [llvm] AMDGPU: Add a baseline, non-comprehensive test for scaled mfma hazards (PR #117055)

2024-11-21 Thread Matt Arsenault via llvm-branch-commits

https://github.com/arsenm updated 
https://github.com/llvm/llvm-project/pull/117055

>From a5ed11b07ab7ac28d304db851abf01c6b1230c24 Mon Sep 17 00:00:00 2001
From: Matt Arsenault 
Date: Mon, 4 Mar 2024 17:36:33 +0530
Subject: [PATCH] AMDGPU: Add a baseline, non-comprehensive test for scaled
 mfma hazards

Add some tests which will demonstrate that we treat the number of cycles
differently depending on whether the first matrix uses an f8 format.
---
 .../CodeGen/AMDGPU/mai-hazards-gfx940.mir |   2 +-
 .../AMDGPU/mai-hazards-mfma-scale.gfx950.mir  | 274 ++
 2 files changed, 275 insertions(+), 1 deletion(-)
 create mode 100644 llvm/test/CodeGen/AMDGPU/mai-hazards-mfma-scale.gfx950.mir

diff --git a/llvm/test/CodeGen/AMDGPU/mai-hazards-gfx940.mir 
b/llvm/test/CodeGen/AMDGPU/mai-hazards-gfx940.mir
index a98b02d792d984..9681b01f334f9a 100644
--- a/llvm/test/CodeGen/AMDGPU/mai-hazards-gfx940.mir
+++ b/llvm/test/CodeGen/AMDGPU/mai-hazards-gfx940.mir
@@ -2199,7 +2199,7 @@ name:
xdl_mfma_4pass_write_vgpr_sgemm_mfma_read_overlap_srcb
 body: |
   bb.0:
 $vgpr0_vgpr1_vgpr2_vgpr3 = V_MFMA_F32_16X16X16F16_vgprcd_e64 $vgpr4_vgpr5, 
$vgpr6_vgpr7, $vgpr0_vgpr1_vgpr2_vgpr3, 1, 2, 3, implicit $mode, implicit $exec
-$vgpr0_vgpr1_vgpr2_vgpr3 = V_MFMA_F32_4X4X1F32_vgprcd_e64 $vgpr8, $vgpr1, 
$vgpr6_vgpr7_vgpr8_vgpr9, 0, 0, 0, implicit $mode, implicit $exec
+$vgpr0_vgpr1_vgpr2_vgpr3 = V_MFMA_F32_4X4X1F32_vgprcd_e64 $vgpr8, $vgpr1, 
$vgpr2_vgpr3_vgpr4_vgpr5, 0, 0, 0, implicit $mode, implicit $exec
 
 ...
 
diff --git a/llvm/test/CodeGen/AMDGPU/mai-hazards-mfma-scale.gfx950.mir 
b/llvm/test/CodeGen/AMDGPU/mai-hazards-mfma-scale.gfx950.mir
new file mode 100644
index 00..c0f0482debbcb3
--- /dev/null
+++ b/llvm/test/CodeGen/AMDGPU/mai-hazards-mfma-scale.gfx950.mir
@@ -0,0 +1,274 @@
+# NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py 
UTC_ARGS: --version 4
+# RUN: llc -march=amdgcn -mcpu=gfx950 -verify-machineinstrs -run-pass 
post-RA-hazard-rec %s -o - | FileCheck -check-prefix=GCN %s
+
+# Immediate operand order = cbsz, abid, blgp
+
+# First MFMA uses f8 format, so should be treated as 32 cycles
+---
+name:
V_MFMA_F32_16X16X128_F8F6F4_vgprcd_e64___xdl_write_vgpr__cbsz0_blgp0xdl_read_overlap_vgpr_srcC
+tracksRegLiveness: true
+body: |
+  bb.0:
+liveins: $vgpr0, $vgpr1, $vgpr2, $vgpr3, $vgpr4, $vgpr5, $vgpr6, $vgpr7, 
$vgpr8, $vgpr9, $vgpr10, $vgpr11, $vgpr12, $vgpr13, $vgpr14, $vgpr15, $vgpr16, 
$vgpr17, $vgpr18, $vgpr19, $vgpr20, $vgpr21
+
+; GCN-LABEL: name: 
V_MFMA_F32_16X16X128_F8F6F4_vgprcd_e64___xdl_write_vgpr__cbsz0_blgp0xdl_read_overlap_vgpr_srcC
+; GCN: liveins: $vgpr0, $vgpr1, $vgpr2, $vgpr3, $vgpr4, $vgpr5, $vgpr6, 
$vgpr7, $vgpr8, $vgpr9, $vgpr10, $vgpr11, $vgpr12, $vgpr13, $vgpr14, $vgpr15, 
$vgpr16, $vgpr17, $vgpr18, $vgpr19, $vgpr20, $vgpr21
+; GCN-NEXT: {{  $}}
+; GCN-NEXT: renamable $vgpr0_vgpr1_vgpr2_vgpr3 = nofpexcept 
V_MFMA_F32_16X16X128_F8F6F4_vgprcd_e64 
$vgpr4_vgpr5_vgpr6_vgpr7_vgpr8_vgpr9_vgpr10_vgpr11, 
$vgpr12_vgpr13_vgpr14_vgpr15_vgpr16_vgpr17_vgpr18_vgpr19, killed 
$vgpr0_vgpr1_vgpr2_vgpr3, 0, 0, 0, implicit $mode, implicit $exec
+; GCN-NEXT: S_NOP 1
+; GCN-NEXT: renamable $vgpr0_vgpr1_vgpr2_vgpr3 = nofpexcept 
V_MFMA_F32_16X16X128_F8F6F4_vgprcd_e64 killed 
$vgpr4_vgpr5_vgpr6_vgpr7_vgpr8_vgpr9_vgpr10_vgpr11, killed 
$vgpr12_vgpr13_vgpr14_vgpr15_vgpr16_vgpr17_vgpr18_vgpr19, killed 
$vgpr2_vgpr3_vgpr4_vgpr5, 0, 0, 0, implicit $mode, implicit $exec
+; GCN-NEXT: S_SETPC_B64_return undef $sgpr30_sgpr31, implicit $vgpr0, 
implicit $vgpr1, implicit $vgpr2, implicit $vgpr3
+renamable $vgpr0_vgpr1_vgpr2_vgpr3 = nofpexcept 
V_MFMA_F32_16X16X128_F8F6F4_vgprcd_e64 
$vgpr4_vgpr5_vgpr6_vgpr7_vgpr8_vgpr9_vgpr10_vgpr11, 
$vgpr12_vgpr13_vgpr14_vgpr15_vgpr16_vgpr17_vgpr18_vgpr19, killed 
$vgpr0_vgpr1_vgpr2_vgpr3, 0, 0, 0, implicit $mode, implicit $exec
+renamable $vgpr0_vgpr1_vgpr2_vgpr3 = nofpexcept 
V_MFMA_F32_16X16X128_F8F6F4_vgprcd_e64 killed 
$vgpr4_vgpr5_vgpr6_vgpr7_vgpr8_vgpr9_vgpr10_vgpr11, killed 
$vgpr12_vgpr13_vgpr14_vgpr15_vgpr16_vgpr17_vgpr18_vgpr19, killed 
$vgpr2_vgpr3_vgpr4_vgpr5, 0, 0, 0, implicit $mode, implicit $exec
+S_SETPC_B64_return undef $sgpr30_sgpr31, implicit $vgpr0, implicit $vgpr1, 
implicit $vgpr2, implicit $vgpr3
+
+...
+
+# First MFMA uses f8 format, so should be treated as 32 cycles
+---
+name:
V_MFMA_F32_16X16X128_F8F6F4_vgprcd_e64___xdl_write_vgpr__cbsz1_blgp1xdl_read_overlap_vgpr_srcC
+tracksRegLiveness: true
+body: |
+  bb.0:
+liveins: $vgpr0, $vgpr1, $vgpr2, $vgpr3, $vgpr4, $vgpr5, $vgpr6, $vgpr7, 
$vgpr8, $vgpr9, $vgpr10, $vgpr11, $vgpr12, $vgpr13, $vgpr14, $vgpr15, $vgpr16, 
$vgpr17, $vgpr18, $vgpr19, $vgpr20, $vgpr21
+
+; GCN-LABEL: name: 
V_MFMA_F32_16X16X128_F8F6F4_vgprcd_e64___xdl_write_vgpr__cbsz1_blgp1xdl_read_overlap_vgpr_srcC
+; GCN: liveins: $vgpr0, $vgpr1, $vgpr2,

[llvm-branch-commits] [clang] [llvm] AMDGPU: Add v_smfmac_f32_32x32x64_fp8_fp8 for gfx950 (PR #117259)

2024-11-21 Thread Matt Arsenault via llvm-branch-commits

https://github.com/arsenm updated 
https://github.com/llvm/llvm-project/pull/117259

>From d36a1301eb84377617c35c125e136230327eb3e9 Mon Sep 17 00:00:00 2001
From: Matt Arsenault 
Date: Sat, 3 Feb 2024 21:43:00 +0530
Subject: [PATCH] AMDGPU: Add v_smfmac_f32_32x32x64_fp8_fp8 for gfx950

---
 clang/include/clang/Basic/BuiltinsAMDGPU.def  |   1 +
 .../CodeGenOpenCL/builtins-amdgcn-mfma.cl |   7 +
 .../builtins-amdgcn-error-gfx950-param.cl |   6 +
 .../builtins-amdgcn-error-gfx950.cl   |   1 +
 llvm/include/llvm/IR/IntrinsicsAMDGPU.td  |   1 +
 .../AMDGPU/AMDGPUInstructionSelector.cpp  |   4 +
 .../Target/AMDGPU/AMDGPURegisterBankInfo.cpp  |   3 +-
 llvm/lib/Target/AMDGPU/VOP3PInstructions.td   |   2 +
 .../UniformityAnalysis/AMDGPU/intrinsics.ll   |   9 +
 .../AMDGPU/llvm.amdgcn.smfmac.gfx950.ll   | 414 ++
 llvm/test/MC/AMDGPU/mai-gfx950.s  |  36 ++
 .../MC/Disassembler/AMDGPU/gfx950_mai.txt |  22 +
 12 files changed, 505 insertions(+), 1 deletion(-)

diff --git a/clang/include/clang/Basic/BuiltinsAMDGPU.def 
b/clang/include/clang/Basic/BuiltinsAMDGPU.def
index f90af7000e3196..51a5b1dbad495c 100644
--- a/clang/include/clang/Basic/BuiltinsAMDGPU.def
+++ b/clang/include/clang/Basic/BuiltinsAMDGPU.def
@@ -457,6 +457,7 @@ 
TARGET_BUILTIN(__builtin_amdgcn_smfmac_f32_16x16x128_fp8_fp8, "V4fV4iV8iV4fiIiIi
 TARGET_BUILTIN(__builtin_amdgcn_smfmac_f32_32x32x64_bf8_bf8, 
"V16fV4iV8iV16fiIiIi", "nc", "gfx950-insts")
 TARGET_BUILTIN(__builtin_amdgcn_smfmac_f32_32x32x64_bf8_fp8, 
"V16fV4iV8iV16fiIiIi", "nc", "gfx950-insts")
 TARGET_BUILTIN(__builtin_amdgcn_smfmac_f32_32x32x64_fp8_bf8, 
"V16fV4iV8iV16fiIiIi", "nc", "gfx950-insts")
+TARGET_BUILTIN(__builtin_amdgcn_smfmac_f32_32x32x64_fp8_fp8, 
"V16fV4iV8iV16fiIiIi", "nc", "gfx950-insts")
 
 
//===--===//
 // GFX12+ only builtins.
diff --git a/clang/test/CodeGenOpenCL/builtins-amdgcn-mfma.cl 
b/clang/test/CodeGenOpenCL/builtins-amdgcn-mfma.cl
index 33b60d53f11cc8..00346baa6ff84d 100644
--- a/clang/test/CodeGenOpenCL/builtins-amdgcn-mfma.cl
+++ b/clang/test/CodeGenOpenCL/builtins-amdgcn-mfma.cl
@@ -559,4 +559,11 @@ void test_smfmac_f32_32x32x64_fp8_bf8(global v16f* out, 
v4i a, v8i b, v16f c, in
   *out = __builtin_amdgcn_smfmac_f32_32x32x64_fp8_bf8(a, b, c, idx, 0, 0);
 }
 
+// CHECK-GFX950-LABEL: @test_smfmac_f32_32x32x64_fp8_fp8
+// CHECK-GFX950: call <16 x float> @llvm.amdgcn.smfmac.f32.32x32x64.fp8.fp8(<4 
x i32> %a, <8 x i32> %b, <16 x float> %c, i32 %idx, i32 0, i32 0)
+void test_smfmac_f32_32x32x64_fp8_fp8(global v16f* out, v4i a, v8i b, v16f c, 
int idx)
+{
+  *out = __builtin_amdgcn_smfmac_f32_32x32x64_fp8_fp8(a, b, c, idx, 0, 0);
+}
+
 #endif
diff --git a/clang/test/SemaOpenCL/builtins-amdgcn-error-gfx950-param.cl 
b/clang/test/SemaOpenCL/builtins-amdgcn-error-gfx950-param.cl
index c53ca8a7c3513f..b3b359a1e0c65b 100644
--- a/clang/test/SemaOpenCL/builtins-amdgcn-error-gfx950-param.cl
+++ b/clang/test/SemaOpenCL/builtins-amdgcn-error-gfx950-param.cl
@@ -142,3 +142,9 @@ void test_smfmac_f32_32x32x64_fp8_bf8(global float16* out, 
int4 a, int8 b, float
   *out = __builtin_amdgcn_smfmac_f32_32x32x64_fp8_bf8(a, b, c, idx, d, 0); // 
expected-error{{argument to '__builtin_amdgcn_smfmac_f32_32x32x64_fp8_bf8' must 
be a constant integer}}
   *out = __builtin_amdgcn_smfmac_f32_32x32x64_fp8_bf8(a, b, c, idx, 0, d); // 
expected-error{{argument to '__builtin_amdgcn_smfmac_f32_32x32x64_fp8_bf8' must 
be a constant integer}}
 }
+
+void test_smfmac_f32_32x32x64_fp8_fp8(global float16* out, int4 a, int8 b, 
float16 c, int idx, int d)
+{
+  *out = __builtin_amdgcn_smfmac_f32_32x32x64_fp8_fp8(a, b, c, idx, d, 0); // 
expected-error{{argument to '__builtin_amdgcn_smfmac_f32_32x32x64_fp8_fp8' must 
be a constant integer}}
+  *out = __builtin_amdgcn_smfmac_f32_32x32x64_fp8_fp8(a, b, c, idx, 0, d); // 
expected-error{{argument to '__builtin_amdgcn_smfmac_f32_32x32x64_fp8_fp8' must 
be a constant integer}}
+}
diff --git a/clang/test/SemaOpenCL/builtins-amdgcn-error-gfx950.cl 
b/clang/test/SemaOpenCL/builtins-amdgcn-error-gfx950.cl
index 9e563a7b0bd64c..57523cf0af1b18 100644
--- a/clang/test/SemaOpenCL/builtins-amdgcn-error-gfx950.cl
+++ b/clang/test/SemaOpenCL/builtins-amdgcn-error-gfx950.cl
@@ -47,6 +47,7 @@ void test(__global float4* out0, half8 a0, half8 b0, float4 
c0,
   *out13 = __builtin_amdgcn_smfmac_f32_32x32x64_bf8_bf8(a13, b13, c13, 0, 0, 
0); // expected-error{{'__builtin_amdgcn_smfmac_f32_32x32x64_bf8_bf8' needs 
target feature gfx950-insts}}
   *out13 = __builtin_amdgcn_smfmac_f32_32x32x64_bf8_fp8(a13, b13, c13, 0, 0, 
0); // expected-error{{'__builtin_amdgcn_smfmac_f32_32x32x64_bf8_fp8' needs 
target feature gfx950-insts}}
   *out13 = __builtin_amdgcn_smfmac_f32_32x32x64_fp8_bf8(a13, b13, c13, 0, 0, 
0); // expected-error{{'__builtin_amdgcn_smfmac_f32_32x32x64_fp8_bf8' needs 
target feature gfx950-insts}}
+  *out13 = __builtin_amdgcn_smfmac_f32_32

[llvm-branch-commits] [clang] [llvm] AMDGPU: Add v_smfmac_f32_32x32x64_bf8_fp8 for gfx950 (PR #117257)

2024-11-21 Thread Matt Arsenault via llvm-branch-commits

https://github.com/arsenm updated 
https://github.com/llvm/llvm-project/pull/117257

>From 73f8fed93b6fd985cf79d384fee64fc506ceb062 Mon Sep 17 00:00:00 2001
From: Matt Arsenault 
Date: Sat, 3 Feb 2024 21:09:21 +0530
Subject: [PATCH] AMDGPU: Add v_smfmac_f32_32x32x64_bf8_fp8 for gfx950

---
 clang/include/clang/Basic/BuiltinsAMDGPU.def  |   1 +
 .../CodeGenOpenCL/builtins-amdgcn-mfma.cl |   7 +
 .../builtins-amdgcn-error-gfx950-param.cl |   6 +
 .../builtins-amdgcn-error-gfx950.cl   |   1 +
 llvm/include/llvm/IR/IntrinsicsAMDGPU.td  |   1 +
 .../AMDGPU/AMDGPUInstructionSelector.cpp  |   4 +
 .../Target/AMDGPU/AMDGPURegisterBankInfo.cpp  |   3 +-
 llvm/lib/Target/AMDGPU/VOP3PInstructions.td   |   2 +
 .../UniformityAnalysis/AMDGPU/intrinsics.ll   |   9 +
 .../AMDGPU/llvm.amdgcn.smfmac.gfx950.ll   | 414 ++
 llvm/test/MC/AMDGPU/mai-gfx950.s  |  36 ++
 .../MC/Disassembler/AMDGPU/gfx950_mai.txt |  22 +
 12 files changed, 505 insertions(+), 1 deletion(-)

diff --git a/clang/include/clang/Basic/BuiltinsAMDGPU.def 
b/clang/include/clang/Basic/BuiltinsAMDGPU.def
index 8abfcf496b7d73..d6123fa41ca8b8 100644
--- a/clang/include/clang/Basic/BuiltinsAMDGPU.def
+++ b/clang/include/clang/Basic/BuiltinsAMDGPU.def
@@ -455,6 +455,7 @@ 
TARGET_BUILTIN(__builtin_amdgcn_smfmac_f32_16x16x128_bf8_fp8, "V4fV4iV8iV4fiIiIi
 TARGET_BUILTIN(__builtin_amdgcn_smfmac_f32_16x16x128_fp8_bf8, 
"V4fV4iV8iV4fiIiIi", "nc", "gfx950-insts")
 TARGET_BUILTIN(__builtin_amdgcn_smfmac_f32_16x16x128_fp8_fp8, 
"V4fV4iV8iV4fiIiIi", "nc", "gfx950-insts")
 TARGET_BUILTIN(__builtin_amdgcn_smfmac_f32_32x32x64_bf8_bf8, 
"V16fV4iV8iV16fiIiIi", "nc", "gfx950-insts")
+TARGET_BUILTIN(__builtin_amdgcn_smfmac_f32_32x32x64_bf8_fp8, 
"V16fV4iV8iV16fiIiIi", "nc", "gfx950-insts")
 
 
//===--===//
 // GFX12+ only builtins.
diff --git a/clang/test/CodeGenOpenCL/builtins-amdgcn-mfma.cl 
b/clang/test/CodeGenOpenCL/builtins-amdgcn-mfma.cl
index fdaedc1f92bede..d79ca36f003c5e 100644
--- a/clang/test/CodeGenOpenCL/builtins-amdgcn-mfma.cl
+++ b/clang/test/CodeGenOpenCL/builtins-amdgcn-mfma.cl
@@ -545,4 +545,11 @@ void test_smfmac_f32_32x32x64_bf8_bf8(global v16f* out, 
v4i a, v8i b, v16f c, in
   *out = __builtin_amdgcn_smfmac_f32_32x32x64_bf8_bf8(a, b, c, idx, 0, 0);
 }
 
+// CHECK-GFX950-LABEL: @test_smfmac_f32_32x32x64_bf8_fp8
+// CHECK-GFX950: call <16 x float> @llvm.amdgcn.smfmac.f32.32x32x64.bf8.fp8(<4 
x i32> %a, <8 x i32> %b, <16 x float> %c, i32 %idx, i32 0, i32 0)
+void test_smfmac_f32_32x32x64_bf8_fp8(global v16f* out, v4i a, v8i b, v16f c, 
int idx)
+{
+  *out = __builtin_amdgcn_smfmac_f32_32x32x64_bf8_fp8(a, b, c, idx, 0, 0);
+}
+
 #endif
diff --git a/clang/test/SemaOpenCL/builtins-amdgcn-error-gfx950-param.cl 
b/clang/test/SemaOpenCL/builtins-amdgcn-error-gfx950-param.cl
index 9e0c46b8777533..d1751a6af15463 100644
--- a/clang/test/SemaOpenCL/builtins-amdgcn-error-gfx950-param.cl
+++ b/clang/test/SemaOpenCL/builtins-amdgcn-error-gfx950-param.cl
@@ -130,3 +130,9 @@ void test_smfmac_f32_32x32x64_bf8_bf8(global float16* out, 
int4 a, int8 b, float
   *out = __builtin_amdgcn_smfmac_f32_32x32x64_bf8_bf8(a, b, c, idx, d, 0); // 
expected-error{{argument to '__builtin_amdgcn_smfmac_f32_32x32x64_bf8_bf8' must 
be a constant integer}}
   *out = __builtin_amdgcn_smfmac_f32_32x32x64_bf8_bf8(a, b, c, idx, 0, d); // 
expected-error{{argument to '__builtin_amdgcn_smfmac_f32_32x32x64_bf8_bf8' must 
be a constant integer}}
 }
+
+void test_smfmac_f32_32x32x64_bf8_fp8(global float16* out, int4 a, int8 b, 
float16 c, int idx, int d)
+{
+  *out = __builtin_amdgcn_smfmac_f32_32x32x64_bf8_fp8(a, b, c, idx, d, 0); // 
expected-error{{argument to '__builtin_amdgcn_smfmac_f32_32x32x64_bf8_fp8' must 
be a constant integer}}
+  *out = __builtin_amdgcn_smfmac_f32_32x32x64_bf8_fp8(a, b, c, idx, 0, d); // 
expected-error{{argument to '__builtin_amdgcn_smfmac_f32_32x32x64_bf8_fp8' must 
be a constant integer}}
+}
diff --git a/clang/test/SemaOpenCL/builtins-amdgcn-error-gfx950.cl 
b/clang/test/SemaOpenCL/builtins-amdgcn-error-gfx950.cl
index a0955b290c9830..f8ac3399d2b64b 100644
--- a/clang/test/SemaOpenCL/builtins-amdgcn-error-gfx950.cl
+++ b/clang/test/SemaOpenCL/builtins-amdgcn-error-gfx950.cl
@@ -45,6 +45,7 @@ void test(__global float4* out0, half8 a0, half8 b0, float4 
c0,
   *out12 = __builtin_amdgcn_smfmac_f32_16x16x128_fp8_bf8(a12, b12, c12, 0, 0, 
0); // expected-error{{'__builtin_amdgcn_smfmac_f32_16x16x128_fp8_bf8' needs 
target feature gfx950-insts}}
   *out12 = __builtin_amdgcn_smfmac_f32_16x16x128_fp8_fp8(a12, b12, c12, 0, 0, 
0); // expected-error{{'__builtin_amdgcn_smfmac_f32_16x16x128_fp8_fp8' needs 
target feature gfx950-insts}}
   *out13 = __builtin_amdgcn_smfmac_f32_32x32x64_bf8_bf8(a13, b13, c13, 0, 0, 
0); // expected-error{{'__builtin_amdgcn_smfmac_f32_32x32x64_bf8_bf8' needs 
target feature gfx950-insts}}
+  *out13 = __builtin_amdgcn_smfmac_f32_

[llvm-branch-commits] [flang] [flang][OpenMP] Use new modifier code in ORDER and SCHEDULE clauses (PR #117081)

2024-11-21 Thread Tom Eccles via llvm-branch-commits


@@ -153,6 +153,16 @@ std::optional maybeApply(FuncTy &&func,
   return std::move(func(*arg));
 }
 
+template <
+typename FuncTy, //
+typename ArgTy,  //
+typename ResultTy = std::invoke_result_t>
+std::optional maybeApplyToV(FuncTy &&func, const ArgTy *arg) {
+  if (!arg)
+return std::nullopt;
+  return std::move(func(arg->v));

tblah wrote:

nit: I don't think this `std::move` is necessary. In the uses I can see here 
`ResultTy` is not a reference. Therefore, the function result is a prvalue and 
so will be moved automatically.
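
A minimal sketch of this point, under assumptions: `Tracked`, `Node`, 
`applyWithMove`, `applyPlain` and `movesFor` are hypothetical names invented 
for illustration and are not part of the patch. It counts move constructions 
when a by-value function result is returned into a `std::optional`, with and 
without the explicit `std::move`.

```cpp
#include <cassert>
#include <optional>
#include <utility>

// Hypothetical type that counts how often it is copied or moved.
struct Tracked {
  static inline int copies = 0;
  static inline int moves = 0;
  int value = 0;
  explicit Tracked(int v) : value(v) {}
  Tracked(const Tracked &o) : value(o.value) { ++copies; }
  Tracked(Tracked &&o) noexcept : value(o.value) { ++moves; }
};

// Hypothetical stand-in for a node that exposes its payload as `v`.
struct Node {
  int v;
};

// Variant with the explicit std::move, mirroring the shape under review.
template <typename FuncTy>
std::optional<Tracked> applyWithMove(FuncTy &&func, const Node *arg) {
  if (!arg)
    return std::nullopt;
  return std::move(func(arg->v));
}

// Variant without it: func(arg->v) is already a prvalue.
template <typename FuncTy>
std::optional<Tracked> applyPlain(FuncTy &&func, const Node *arg) {
  if (!arg)
    return std::nullopt;
  return func(arg->v);
}

// Returns the number of move constructions for one call, or -1 if any copy
// was made.
inline int movesFor(bool useMove) {
  Tracked::copies = 0;
  Tracked::moves = 0;
  Node n{7};
  auto make = [](int v) { return Tracked(v); };
  if (useMove) {
    auto r = applyWithMove(make, &n);
    (void)r;
  } else {
    auto r = applyPlain(make, &n);
    (void)r;
  }
  return Tracked::copies == 0 ? Tracked::moves : -1;
}
```

Calling `movesFor(true)` and `movesFor(false)` should report the same count 
with a C++17 compiler, which is the sense in which the explicit `std::move` is 
redundant when the callable returns by value.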

https://github.com/llvm/llvm-project/pull/117081


[llvm-branch-commits] [flang] [flang][OpenMP] Use new modifier code in ORDER and SCHEDULE clauses (PR #117081)

2024-11-21 Thread Tom Eccles via llvm-branch-commits

https://github.com/tblah edited https://github.com/llvm/llvm-project/pull/117081


[llvm-branch-commits] [clang] [llvm] AMDGPU: Add v_smfmac_i32_32x32x64_i8 for gfx950 (PR #117214)

2024-11-21 Thread Sirish Pande via llvm-branch-commits

https://github.com/srpande approved this pull request.

lgtm

https://github.com/llvm/llvm-project/pull/117214


[llvm-branch-commits] [clang] [llvm] AMDGPU: Add v_smfmac_f32_16x16x64_bf16 for gfx950 (PR #117211)

2024-11-21 Thread Matt Arsenault via llvm-branch-commits

https://github.com/arsenm ready_for_review 
https://github.com/llvm/llvm-project/pull/117211


[llvm-branch-commits] [clang] [llvm] AMDGPU: Add v_smfmac_f32_16x16x128_fp8_fp8 for gfx950 (PR #117235)

2024-11-21 Thread Matt Arsenault via llvm-branch-commits

arsenm wrote:

### Merge activity

* **Nov 21, 7:53 PM EST**: A user started a stack merge that includes this pull 
request via 
[Graphite](https://app.graphite.dev/github/pr/llvm/llvm-project/117235).


https://github.com/llvm/llvm-project/pull/117235


[llvm-branch-commits] [llvm] 14b474b - Revert "[LV][VPlan] Remove any-of reduction from precomputeCost. NFC (#117109)"

2024-11-21 Thread via llvm-branch-commits

Author: Elvis Wang
Date: 2024-11-22T11:32:12+08:00
New Revision: 14b474be36144527a55b5d49954379a3484c5f84

URL: 
https://github.com/llvm/llvm-project/commit/14b474be36144527a55b5d49954379a3484c5f84
DIFF: 
https://github.com/llvm/llvm-project/commit/14b474be36144527a55b5d49954379a3484c5f84.diff

LOG: Revert "[LV][VPlan] Remove any-of reduction from precomputeCost. NFC 
(#117109)"

This reverts commit ce66b56865426fc1760b5a090ca2748c046094f5.

Added: 


Modified: 
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp

Removed: 




diff  --git a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp 
b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
index 5b556058cc762c..d13770a35c108f 100644
--- a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
+++ b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
@@ -7303,14 +7303,34 @@ LoopVectorizationPlanner::precomputeCosts(VPlan &Plan, 
ElementCount VF,
 
   // The legacy cost model has special logic to compute the cost of in-loop
   // reductions, which may be smaller than the sum of all instructions involved
-  // in the reduction.
+  // in the reduction. For AnyOf reductions, VPlan codegen may remove the 
select
+  // which the legacy cost model uses to assign cost. Pre-compute their costs
+  // for now.
   // TODO: Switch to costing based on VPlan once the logic has been ported.
   for (const auto &[RedPhi, RdxDesc] : Legal->getReductionVars()) {
 if (ForceTargetInstructionCost.getNumOccurrences())
   continue;
 
-if (!CM.isInLoopReduction(RedPhi))
+if (!CM.isInLoopReduction(RedPhi) &&
+!RecurrenceDescriptor::isAnyOfRecurrenceKind(
+RdxDesc.getRecurrenceKind()))
+  continue;
+
+// AnyOf reduction codegen may remove the select. To match the legacy cost
+// model, pre-compute the cost for AnyOf reductions here.
+if (RecurrenceDescriptor::isAnyOfRecurrenceKind(
+RdxDesc.getRecurrenceKind())) {
+  auto *Select = cast<SelectInst>(*find_if(
+  RedPhi->users(), [](User *U) { return isa<SelectInst>(U); }));
+  assert(!CostCtx.SkipCostComputation.contains(Select) &&
+ "reduction op visited multiple times");
+  CostCtx.SkipCostComputation.insert(Select);
+  auto ReductionCost = CostCtx.getLegacyCost(Select, VF);
+  LLVM_DEBUG(dbgs() << "Cost of " << ReductionCost << " for VF " << VF
+<< ":\n any-of reduction " << *Select << "\n");
+  Cost += ReductionCost;
   continue;
+}
 
 const auto &ChainOps = RdxDesc.getReductionOpChain(RedPhi, OrigLoop);
 SetVector ChainOpsAndOperands(ChainOps.begin(),
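
For readers unfamiliar with the term used in the restored comment, the loop below is a small sketch of the kind of source pattern that typically produces an any-of reduction: a compare feeding a select that chooses between a loop-invariant value and the reduction variable itself. `anyAbove` is a made-up name, and whether a particular compiler build actually classifies and vectorizes this loop as AnyOf depends on the optimization pipeline; this is an illustration, not a test case from the patch.

```cpp
#include <cstddef>

// Hypothetical example: returns 1 if any element exceeds `threshold`, else 0.
int anyAbove(const int *data, std::size_t n, int threshold) {
  int found = 0; // start value of the reduction
  for (std::size_t i = 0; i < n; ++i) {
    // select(cmp, 1, found): once the compare is true, `found` stays 1.
    found = (data[i] > threshold) ? 1 : found;
  }
  return found;
}
```

The select in the loop body is the instruction the restored code looks up among the users of the reduction phi and costs with the legacy model.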





[llvm-branch-commits] [llvm] AMDGPU: Handle v_mfma_f64_16x16x4_f64 srcc write VGPR hazard change for gfx950 (PR #117283)

2024-11-21 Thread via llvm-branch-commits

llvmbot wrote:




@llvm/pr-subscribers-backend-amdgpu

Author: Matt Arsenault (arsenm)


Changes

Read by sgemm/dgemm in srcc after v_mfma_f64_16x16x4_f64 increases from 9 to 17
wait states.

---
Full diff: https://github.com/llvm/llvm-project/pull/117283.diff


2 Files Affected:

- (modified) llvm/lib/Target/AMDGPU/GCNHazardRecognizer.cpp (+5-1) 
- (modified) llvm/test/CodeGen/AMDGPU/mai-hazards-gfx940.mir (+33-12) 


``diff
diff --git a/llvm/lib/Target/AMDGPU/GCNHazardRecognizer.cpp 
b/llvm/lib/Target/AMDGPU/GCNHazardRecognizer.cpp
index be0936ce74835f..4a4c9788b3d881 100644
--- a/llvm/lib/Target/AMDGPU/GCNHazardRecognizer.cpp
+++ b/llvm/lib/Target/AMDGPU/GCNHazardRecognizer.cpp
@@ -2302,6 +2302,7 @@ int GCNHazardRecognizer::checkMAIHazards90A(MachineInstr 
*MI) {
 const int SMFMA16x16WritesVGPROverlappedDMFMASrcCWaitStates = 9;
 const int SMFMA32x32WritesVGPROverlappedDMFMASrcCWaitStates = 17;
 const int DMFMA16x16WritesVGPROverlappedSrcCWaitStates = 9;
+const int GFX950_DMFMA16x16WritesVGPROverlappedSrcCWaitStates = 17;
 const int DMFMA4x4WritesVGPROverlappedSrcCWaitStates = 4;
 const int SMFMA4x4WritesVGPROverlappedSrcABWaitStates = 5;
 const int SMFMA16x16WritesVGPROverlappedSrcABWaitStates = 11;
@@ -2359,7 +2360,10 @@ int GCNHazardRecognizer::checkMAIHazards90A(MachineInstr 
*MI) {
 case AMDGPU::V_MFMA_F64_16X16X4F64_mac_e64:
 case AMDGPU::V_MFMA_F64_16X16X4F64_mac_vgprcd_e64:
   if (!isXDL(ST, *MI))
-NeedWaitStates = DMFMA16x16WritesVGPROverlappedSrcCWaitStates;
+NeedWaitStates =
+ST.hasGFX950Insts()
+? GFX950_DMFMA16x16WritesVGPROverlappedSrcCWaitStates
+: DMFMA16x16WritesVGPROverlappedSrcCWaitStates;
   break;
 case AMDGPU::V_MFMA_F64_4X4X4F64_e64:
 case AMDGPU::V_MFMA_F64_4X4X4F64_vgprcd_e64:
diff --git a/llvm/test/CodeGen/AMDGPU/mai-hazards-gfx940.mir 
b/llvm/test/CodeGen/AMDGPU/mai-hazards-gfx940.mir
index b9135dbd46fc1f..1499fd4907a181 100644
--- a/llvm/test/CodeGen/AMDGPU/mai-hazards-gfx940.mir
+++ b/llvm/test/CodeGen/AMDGPU/mai-hazards-gfx940.mir
@@ -298,8 +298,12 @@ body: |
 ...
 # GCN-LABEL: name: dgemm16x16_mfma_write_vgpr_mfma_read_overlap
 # GCN:  V_MFMA
-# GCN-NEXT: S_NOP 7
-# GCN-NEXT: S_NOP 0
+# GFX940-NEXT: S_NOP 7
+# GFX940-NEXT: S_NOP 0
+
+# GFX950-NEXT: S_NOP 7
+# GFX950-NEXT: S_NOP 7
+# GFX950-NEXT: S_NOP 0
 # GCN-NEXT: V_MFMA
 name:dgemm16x16_mfma_write_vgpr_mfma_read_overlap
 body: |
@@ -319,8 +323,12 @@ body: |
 ...
 # GCN-LABEL: name: dgemm16x16_mfma_write_vgpr_sgemm_mfma_read_overlap
 # GCN:  V_MFMA
-# GCN-NEXT: S_NOP 7
-# GCN-NEXT: S_NOP 0
+# GFX940-NEXT: S_NOP 7
+# GFX940-NEXT: S_NOP 0
+
+# GFX950-NEXT: S_NOP 7
+# GFX950-NEXT: S_NOP 7
+# GFX950-NEXT: S_NOP 0
 # GCN-NEXT: V_MFMA
 name:dgemm16x16_mfma_write_vgpr_sgemm_mfma_read_overlap
 body: |
@@ -549,8 +557,12 @@ body: |
 ...
 # GCN-LABEL: name: dgemm16x16_mfma_write_vgpr_sgemm_mfma_srca_read_overlap
 # GCN:  V_MFMA
-# GCN-NEXT: S_NOP 7
-# GCN-NEXT: S_NOP 2
+# GFX940-NEXT: S_NOP 7
+# GFX940-NEXT: S_NOP 2
+
+# GFX950-NEXT: S_NOP 7
+# GFX950-NEXT: S_NOP 7
+# GFX950-NEXT: S_NOP 0
 # GCN-NEXT: V_MFMA
 name:dgemm16x16_mfma_write_vgpr_sgemm_mfma_srca_read_overlap
 body: |
@@ -1333,8 +1345,12 @@ body: |
 ...
 # GCN-LABEL: name: dgemm16x16_mfma_write_agpr_mfma_read_overlap
 # GCN:  V_MFMA
-# GCN-NEXT: S_NOP 7
-# GCN-NEXT: S_NOP 0
+# GFX940-NEXT: S_NOP 7
+# GFX940-NEXT: S_NOP 0
+
+# GFX950-NEXT: S_NOP 7
+# GFX950-NEXT: S_NOP 7
+# GFX950-NEXT: S_NOP 0
 # GCN-NEXT: V_MFMA
 name:dgemm16x16_mfma_write_agpr_mfma_read_overlap
 body: |
@@ -1354,8 +1370,13 @@ body: |
 ...
 # GCN-LABEL: name: dgemm16x16_mfma_write_agpr_sgemm_mfma_read_overlap
 # GCN:  V_MFMA
-# GCN-NEXT: S_NOP 7
-# GCN-NEXT: S_NOP 0
+# GFX940-NEXT: S_NOP 7
+# GFX940-NEXT: S_NOP 0
+
+# GFX950-NEXT: S_NOP 7
+# GFX950-NEXT: S_NOP 7
+# GFX950-NEXT: S_NOP 0
+
 # GCN-NEXT: V_MFMA
 name:dgemm16x16_mfma_write_agpr_sgemm_mfma_read_overlap
 body: |
@@ -2502,8 +2523,8 @@ body: |
 ...
 # GCN-LABEL: name: xdl_4pass_mfma_write_agpr_smfmac_read_overlap_srcc
 # GCN:  V_MFMA
-# GFX940: S_NOP 4
-# GFX950: S_NOP 5
+# GFX940-NEXT: S_NOP 4
+# GFX950-NEXT: S_NOP 5
 # GCN-NEXT: V_SMFMAC_
 name:xdl_4pass_mfma_write_agpr_smfmac_read_overlap_srcc
 body: |

``
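
The updated checks above can be read as a split of the required wait states into `S_NOP` instructions. The sketch below is a standalone illustration, not LLVM code: `splitIntoSNopImmediates` is a hypothetical helper, and it assumes that `S_NOP N` covers `N + 1` wait states with at most 8 per instruction, which is what the `S_NOP 7` / `S_NOP 0` sequences in the test suggest.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical helper: returns the immediates of the S_NOP sequence that
// covers `waitStates`, assuming each S_NOP covers at most `maxPerNop` states
// and that `S_NOP N` stands for N + 1 wait states.
std::vector<unsigned> splitIntoSNopImmediates(unsigned waitStates,
                                              unsigned maxPerNop = 8) {
  std::vector<unsigned> immediates;
  while (waitStates > 0) {
    unsigned chunk = std::min(waitStates, maxPerNop);
    immediates.push_back(chunk - 1); // immediate is "wait states minus one"
    waitStates -= chunk;
  }
  return immediates;
}
```

Under these assumptions, 9 wait states give `S_NOP 7` followed by `S_NOP 0` (the GFX940 checks), and 17 wait states give `S_NOP 7`, `S_NOP 7`, `S_NOP 0` (the GFX950 checks).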




https://github.com/llvm/llvm-project/pull/117283


[llvm-branch-commits] [llvm] AMDGPU: Handle gfx950 change in mfma_f64_16x16x4 + valu hazard (PR #117262)

2024-11-21 Thread Matt Arsenault via llvm-branch-commits

https://github.com/arsenm updated 
https://github.com/llvm/llvm-project/pull/117262

>From 06412577e65e05abf3edc1a884edc8640b924933 Mon Sep 17 00:00:00 2001
From: Matt Arsenault 
Date: Thu, 7 Mar 2024 15:01:08 +0530
Subject: [PATCH] AMDGPU: Handle gfx950 change in mfma_f64_16x16x4 + valu
 hazard

Increase from 11 wait states to 19
---
 .../lib/Target/AMDGPU/GCNHazardRecognizer.cpp | 10 +--
 .../CodeGen/AMDGPU/mai-hazards-gfx940.mir | 28 ++-
 2 files changed, 28 insertions(+), 10 deletions(-)

diff --git a/llvm/lib/Target/AMDGPU/GCNHazardRecognizer.cpp 
b/llvm/lib/Target/AMDGPU/GCNHazardRecognizer.cpp
index 44afccb0690d0d..99a176731599cc 100644
--- a/llvm/lib/Target/AMDGPU/GCNHazardRecognizer.cpp
+++ b/llvm/lib/Target/AMDGPU/GCNHazardRecognizer.cpp
@@ -2603,6 +2603,7 @@ int GCNHazardRecognizer::checkMAIVALUHazards(MachineInstr 
*MI) {
 const int DMFMA16x16WriteVgprMemExpReadWaitStates = 18;
 const int DMFMA4x4WriteVgprVALUReadWaitStates = 6;
 const int DMFMA16x16WriteVgprVALUReadWaitStates = 11;
+const int GFX950_DMFMA16x16WriteVgprVALUReadWaitStates = 19;
 const int DotWriteSameDotReadSrcAB = 3;
 const int DotWriteDifferentVALURead = 3;
 const int DMFMABetweenVALUWriteVMEMRead = 2;
@@ -2663,9 +2664,12 @@ int 
GCNHazardRecognizer::checkMAIVALUHazards(MachineInstr *MI) {
   break;
 case 8:
 case 16:
-  NeedWaitStates = IsMemOrExport
-   ? DMFMA16x16WriteVgprMemExpReadWaitStates
-   : DMFMA16x16WriteVgprVALUReadWaitStates;
+  NeedWaitStates =
+  IsMemOrExport
+  ? DMFMA16x16WriteVgprMemExpReadWaitStates
+  : (ST.hasGFX950Insts()
+ ? GFX950_DMFMA16x16WriteVgprVALUReadWaitStates
+ : DMFMA16x16WriteVgprVALUReadWaitStates);
   break;
 default:
   llvm_unreachable("unexpected dgemm");
diff --git a/llvm/test/CodeGen/AMDGPU/mai-hazards-gfx940.mir 
b/llvm/test/CodeGen/AMDGPU/mai-hazards-gfx940.mir
index 9681b01f334f9a..d2b2f226404da8 100644
--- a/llvm/test/CodeGen/AMDGPU/mai-hazards-gfx940.mir
+++ b/llvm/test/CodeGen/AMDGPU/mai-hazards-gfx940.mir
@@ -1,4 +1,5 @@
-# RUN: llc -mtriple=amdgcn -mcpu=gfx940 -verify-machineinstrs -run-pass 
post-RA-hazard-rec %s -o - | FileCheck -check-prefix=GCN %s
+# RUN: llc -mtriple=amdgcn -mcpu=gfx940 -verify-machineinstrs -run-pass 
post-RA-hazard-rec %s -o - | FileCheck -check-prefixes=GCN,GFX940 %s
+# RUN: llc -mtriple=amdgcn -mcpu=gfx950 -verify-machineinstrs -run-pass 
post-RA-hazard-rec %s -o - | FileCheck -check-prefixes=GCN,GFX950 %s
 
 # GCN-LABEL: name: valu_write_vgpr_sgemm_mfma_read
 # GCN:  V_MOV_B32
@@ -803,8 +804,12 @@ body: |
 ...
 # GCN-LABEL: name: dmfma16x16_write_vgpr_valu_read
 # GCN:  V_MFMA
-# GCN-NEXT: S_NOP 7
-# GCN-NEXT: S_NOP 2
+# GFX940-NEXT: S_NOP 7
+# GFX940-NEXT: S_NOP 2
+
+# GFX950-NEXT: S_NOP 7
+# GFX950-NEXT: S_NOP 7
+# GFX950-NEXT: S_NOP 2
 # GCN-NEXT: V_MOV_B32
 name:dmfma16x16_write_vgpr_valu_read
 body: |
@@ -867,8 +872,13 @@ body: |
 ...
 # GCN-LABEL: name: dmfma16x16_write_vgpr_dot_read
 # GCN:  V_MFMA
-# GCN-NEXT: S_NOP 7
-# GCN-NEXT: S_NOP 2
+# GFX940-NEXT: S_NOP 7
+# GFX940-NEXT: S_NOP 2
+
+# GFX950-NEXT: S_NOP 7
+# GFX950-NEXT: S_NOP 7
+# GFX950-NEXT: S_NOP 2
+
 # GCN-NEXT: V_DOT
 name:dmfma16x16_write_vgpr_dot_read
 body: |
@@ -1505,8 +1515,12 @@ body: |
 ...
 # GCN-LABEL: name: dmfma16x16_write_agpr_valu_read
 # GCN:  V_MFMA
-# GCN-NEXT: S_NOP 7
-# GCN-NEXT: S_NOP 2
+# GFX940-NEXT: S_NOP 7
+# GFX940-NEXT: S_NOP 2
+
+# GFX950-NEXT: S_NOP 7
+# GFX950-NEXT: S_NOP 7
+# GFX950-NEXT: S_NOP 2
 # GCN-NEXT: V_ACCVGPR_READ_B32_e64
 name:dmfma16x16_write_agpr_valu_read
 body: |



[llvm-branch-commits] [llvm] AMDGPU: Handle vcmpx+permalane gfx950 hazard (PR #117286)

2024-11-21 Thread via llvm-branch-commits

github-actions[bot] wrote:




:warning: C/C++ code formatter, clang-format found issues in your code. 
:warning:



You can test this locally with the following command:


```bash
git-clang-format --diff 52f540df160ad84aef090acb35c9372c270d758b 
0cbee40e03bff1514abbf1e879522a4808175c1a --extensions cpp,h -- 
llvm/lib/Target/AMDGPU/GCNHazardRecognizer.cpp 
llvm/lib/Target/AMDGPU/GCNHazardRecognizer.h
```





View the diff from clang-format here.


```diff
diff --git a/llvm/lib/Target/AMDGPU/GCNHazardRecognizer.cpp 
b/llvm/lib/Target/AMDGPU/GCNHazardRecognizer.cpp
index 45ff1f4a63..9799556084 100644
--- a/llvm/lib/Target/AMDGPU/GCNHazardRecognizer.cpp
+++ b/llvm/lib/Target/AMDGPU/GCNHazardRecognizer.cpp
@@ -1207,12 +1207,11 @@ void GCNHazardRecognizer::fixHazards(MachineInstr *MI) {
   fixRequiredExportPriority(MI);
 }
 
-static bool isVCmpXWritesExec(const SIInstrInfo &TII,
-  const SIRegisterInfo &TRI,
+static bool isVCmpXWritesExec(const SIInstrInfo &TII, const SIRegisterInfo 
&TRI,
   const MachineInstr &MI) {
   return (TII.isVOPC(MI) ||
   (MI.isCompare() && (TII.isVOP3(MI) || TII.isSDWA(MI)))) &&
-MI.modifiesRegister(AMDGPU::EXEC, &TRI);
+ MI.modifiesRegister(AMDGPU::EXEC, &TRI);
 }
 
 bool GCNHazardRecognizer::fixVcmpxPermlaneHazards(MachineInstr *MI) {

```




https://github.com/llvm/llvm-project/pull/117286


[llvm-branch-commits] [llvm] AMDGPU: Refine gfx950 xdl-write-vgpr hazard cases (PR #117285)

2024-11-21 Thread via llvm-branch-commits

llvmbot wrote:




@llvm/pr-subscribers-backend-amdgpu

Author: Matt Arsenault (arsenm)


Changes

The 2-pass XDL write VGPR, read by non-XDL SGEMM/DGEMM case
was 1 wait state overly conservative. Previously, for gfx940,
the XDL/non-XDL cases happened to have the same number of cycles
in all cases. Now the XDL consumer case has an additional state for
2 pass sources.

---
Full diff: https://github.com/llvm/llvm-project/pull/117285.diff


2 Files Affected:

- (modified) llvm/lib/Target/AMDGPU/GCNHazardRecognizer.cpp (+18-4) 
- (modified) llvm/test/CodeGen/AMDGPU/mai-hazards-gfx940.mir (+5-10) 


``diff
diff --git a/llvm/lib/Target/AMDGPU/GCNHazardRecognizer.cpp 
b/llvm/lib/Target/AMDGPU/GCNHazardRecognizer.cpp
index 218f487f7e12ce..8008b5f7bcc991 100644
--- a/llvm/lib/Target/AMDGPU/GCNHazardRecognizer.cpp
+++ b/llvm/lib/Target/AMDGPU/GCNHazardRecognizer.cpp
@@ -2232,8 +2232,8 @@ int GCNHazardRecognizer::checkMAIHazards908(MachineInstr 
*MI) {
 }
 
 static int
-GFX940_XDL_N_PassWritesVGPROverlappedSMFMASrcCWaitStates(int NumPasses,
- bool IsGFX950) {
+GFX940_XDL_N_PassWritesVGPROverlappedXDLOrSMFMASrcCWaitStates(int NumPasses,
+  bool IsGFX950) {
   // xdl def cycles | gfx940 | gfx950
   // 2 pass         |  3        4
   // 4 pass         |  5        6
@@ -2242,6 +2242,17 @@ 
GFX940_XDL_N_PassWritesVGPROverlappedSMFMASrcCWaitStates(int NumPasses,
   return NumPasses + 1 + IsGFX950;
 }
 
+static int
+GFX940_XDL_N_PassWritesVGPROverlappedSGEMMDGEMMSrcCWaitStates(int NumPasses,
+  bool IsGFX950) {
+  // xdl def cycles | gfx940 | gfx950
+  // 2 pass         |  3        3
+  // 4 pass         |  5        6
+  // 8 pass         |  9        10
+  // 16 pass        |  17       18
+  return NumPasses + 1 + (NumPasses != 2 && IsGFX950);
+}
+
 static int
 GFX940_SMFMA_N_PassWritesVGPROverlappedSMFMASrcCWaitStates(int NumPasses) {
   // 2 pass -> 2
@@ -2379,8 +2390,11 @@ int GCNHazardRecognizer::checkMAIHazards90A(MachineInstr 
*MI) {
 
 NeedWaitStates =
 isXDL(ST, *MI1)
-? GFX940_XDL_N_PassWritesVGPROverlappedSMFMASrcCWaitStates(
-  NumPasses, ST.hasGFX950Insts())
+? (isXDL(ST, *MI)
+   ? 
GFX940_XDL_N_PassWritesVGPROverlappedXDLOrSMFMASrcCWaitStates(
+ NumPasses, ST.hasGFX950Insts())
+   : 
GFX940_XDL_N_PassWritesVGPROverlappedSGEMMDGEMMSrcCWaitStates(
+ NumPasses, ST.hasGFX950Insts()))
 : 
GFX940_SMFMA_N_PassWritesVGPROverlappedSMFMASrcCWaitStates(
   NumPasses);
 break;
diff --git a/llvm/test/CodeGen/AMDGPU/mai-hazards-gfx940.mir 
b/llvm/test/CodeGen/AMDGPU/mai-hazards-gfx940.mir
index 2ba873f55a1eb0..d59bcfb16eece2 100644
--- a/llvm/test/CodeGen/AMDGPU/mai-hazards-gfx940.mir
+++ b/llvm/test/CodeGen/AMDGPU/mai-hazards-gfx940.mir
@@ -156,8 +156,7 @@ body: |
 ...
 # GCN-LABEL: name: sgemm4x4_mfma_write_vgpr_mfma_read_overlap
 # GCN:  V_MFMA
-# GFX940-NEXT: S_NOP 2
-# GFX950-NEXT: S_NOP 3
+# GCN-NEXT: S_NOP 2
 # GCN-NEXT: V_MFMA
 name:sgemm4x4_mfma_write_vgpr_mfma_read_overlap
 body: |
@@ -348,8 +347,7 @@ body: |
 ...
 # GCN-LABEL: name: sgemm4x4_mfma_write_vgpr_dgemm_mfma_read_overlap
 # GCN:  V_MFMA
-# GFX940-NEXT: S_NOP 2
-# GFX950-NEXT: S_NOP 3
+# GCN-NEXT: S_NOP 2
 # GCN-NEXT: V_MFMA
 name:sgemm4x4_mfma_write_vgpr_dgemm_mfma_read_overlap
 body: |
@@ -1403,8 +1401,7 @@ body: |
 ...
 # GCN-LABEL: name: sgemm4x4_mfma_write_agpr_dgemm_mfma_read_overlap
 # GCN:  V_MFMA
-# GFX940-NEXT: S_NOP 2
-# GFX950-NEXT: S_NOP 3
+# GCN-NEXT: S_NOP 2
 # GCN-NEXT: V_MFMA
 name:sgemm4x4_mfma_write_agpr_dgemm_mfma_read_overlap
 body: |
@@ -1885,8 +1882,7 @@ body: |
 ...
 # GCN-LABEL: name: xdl_sgemm4x4_mfma_write_agpr_mfma_read_overlap
 # GCN:  V_MFMA
-# GFX940-NEXT: S_NOP 2
-# GFX950-NEXT: S_NOP 3
+# GCN-NEXT: S_NOP 2
 # GCN-NEXT: V_MFMA
 name:xdl_sgemm4x4_mfma_write_agpr_mfma_read_overlap
 body: |
@@ -2220,8 +2216,7 @@ body: |
 # 2 pass source
 # GCN-LABEL: name: xdl_mfma_2pass_write_vgpr_sgemm_mfma_read_overlap_srcc
 # GCN:  V_MFMA
-# GFX940-NEXT: S_NOP 2
-# GFX950-NEXT: S_NOP 3
+# GCN-NEXT: S_NOP 2
 # GCN-NEXT: V_MFMA
 name:xdl_mfma_2pass_write_vgpr_sgemm_mfma_read_overlap_srcc
 body: |

``
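
The two formulas in the diff above can be checked against their comment tables in isolation. The sketch below restates them as standalone `constexpr` functions; the names `xdlOrSmfmaConsumerWaitStates` and `sgemmDgemmConsumerWaitStates` are shortened stand-ins chosen for this illustration, not identifiers from the patch.

```cpp
// Standalone restatement of the two wait-state formulas from the diff.
// XDL write VGPR, read as SrcC by an XDL or SMFMA consumer.
constexpr int xdlOrSmfmaConsumerWaitStates(int numPasses, bool isGFX950) {
  return numPasses + 1 + (isGFX950 ? 1 : 0);
}

// XDL write VGPR, read as SrcC by a non-XDL SGEMM/DGEMM consumer:
// gfx950 adds the extra state for every pass count except 2.
constexpr int sgemmDgemmConsumerWaitStates(int numPasses, bool isGFX950) {
  return numPasses + 1 + ((numPasses != 2 && isGFX950) ? 1 : 0);
}

// Rows of the comment tables: {passes, gfx940, gfx950}.
static_assert(xdlOrSmfmaConsumerWaitStates(2, false) == 3);
static_assert(xdlOrSmfmaConsumerWaitStates(2, true) == 4);
static_assert(sgemmDgemmConsumerWaitStates(2, false) == 3);
static_assert(sgemmDgemmConsumerWaitStates(2, true) == 3);
static_assert(sgemmDgemmConsumerWaitStates(4, true) == 6);
static_assert(sgemmDgemmConsumerWaitStates(8, true) == 10);
static_assert(sgemmDgemmConsumerWaitStates(16, true) == 18);
```

The 2-pass row is the only place the two functions disagree on gfx950, which matches the test updates: assuming `S_NOP N` covers `N + 1` wait states, the 3 wait states for the non-XDL consumer correspond to the single `S_NOP 2` now expected for both targets.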




https://github.com/llvm/llvm-project/pull/117285


[llvm-branch-commits] [clang] [llvm] AMDGPU: Add v_smfmac_f32_32x32x64_bf8_fp8 for gfx950 (PR #117257)

2024-11-21 Thread Shilei Tian via llvm-branch-commits

https://github.com/shiltian approved this pull request.


https://github.com/llvm/llvm-project/pull/117257


[llvm-branch-commits] [clang] [llvm] AMDGPU: Add v_smfmac_f32_32x32x64_bf8_fp8 for gfx950 (PR #117257)

2024-11-21 Thread Shilei Tian via llvm-branch-commits

https://github.com/shiltian edited 
https://github.com/llvm/llvm-project/pull/117257


[llvm-branch-commits] [clang] [llvm] AMDGPU: Add v_permlane16_swap_b32 and v_permlane32_swap_b32 for gfx950 (PR #117260)

2024-11-21 Thread Matt Arsenault via llvm-branch-commits

https://github.com/arsenm updated 
https://github.com/llvm/llvm-project/pull/117260

>From 426d5baaf7d373a6d35ead2af4515e108a6eb8b8 Mon Sep 17 00:00:00 2001
From: Matt Arsenault 
Date: Mon, 22 Jan 2024 12:40:54 +0700
Subject: [PATCH] AMDGPU: Add v_permlane16_swap_b32 and v_permlane32_swap_b32
 for gfx950

This was a bit annoying because these introduce a new special case
encoding usage. op_sel is repurposed as a subset of dpp controls,
and is eligible for VOP3->VOP1 shrinking. For some reason fi also
uses an enum value, so we need to convert the raw boolean to 1 instead
of -1.

The 2 registers are swapped, so this has 2 defs. Ideally the builtin
would return a pair, but that's difficult so return a vector instead.
This would make a hypothetical builtin that supports v2f16 directly
uglier.
---
 clang/include/clang/Basic/BuiltinsAMDGPU.def  |   3 +
 clang/lib/CodeGen/CGBuiltin.cpp   |  26 
 clang/test/CodeGenOpenCL/amdgpu-features.cl   |   2 +-
 .../builtins-amdgcn-gfx950-err.cl |   6 +-
 .../CodeGenOpenCL/builtins-amdgcn-gfx950.cl   |  87 +
 .../builtins-amdgcn-error-gfx950-param.cl |  10 ++
 .../builtins-amdgcn-error-gfx950.cl   |   5 +-
 llvm/docs/AMDGPUUsage.rst |  13 ++
 llvm/include/llvm/IR/IntrinsicsAMDGPU.td  |  14 ++
 llvm/lib/Target/AMDGPU/AMDGPU.td  |  23 +++-
 llvm/lib/Target/AMDGPU/AMDGPUGISel.td |   3 +
 llvm/lib/Target/AMDGPU/AMDGPUISelDAGToDAG.cpp |  25 
 .../AMDGPU/AMDGPUInstructionSelector.cpp  |  32 +
 .../Target/AMDGPU/AMDGPUInstructionSelector.h |   3 +
 .../Target/AMDGPU/AMDGPURegisterBankInfo.cpp  |   9 ++
 .../Target/AMDGPU/AMDGPUSearchableTables.td   |   2 +
 llvm/lib/Target/AMDGPU/GCNSubtarget.h |   8 +-
 llvm/lib/Target/AMDGPU/SIInstrInfo.cpp|   6 +-
 llvm/lib/Target/AMDGPU/SIInstrInfo.td |   4 +
 llvm/lib/Target/AMDGPU/VOP1Instructions.td|  46 +++
 llvm/lib/Target/AMDGPU/VOPInstructions.td |  12 ++
 llvm/lib/TargetParser/TargetParser.cpp|   2 +
 .../UniformityAnalysis/AMDGPU/intrinsics.ll   |  16 +++
 .../AMDGPU/llvm.amdgcn.permlane16.swap.ll | 121 ++
 .../AMDGPU/llvm.amdgcn.permlane32.swap.ll | 121 ++
 llvm/test/MC/AMDGPU/gfx950_asm_features.s |  82 
 llvm/test/MC/AMDGPU/gfx950_err.s  |  31 +
 llvm/test/MC/Disassembler/AMDGPU/gfx950.txt   |  32 +
 28 files changed, 737 insertions(+), 7 deletions(-)
 create mode 100644 llvm/test/CodeGen/AMDGPU/llvm.amdgcn.permlane16.swap.ll
 create mode 100644 llvm/test/CodeGen/AMDGPU/llvm.amdgcn.permlane32.swap.ll
 create mode 100644 llvm/test/MC/AMDGPU/gfx950_err.s

diff --git a/clang/include/clang/Basic/BuiltinsAMDGPU.def 
b/clang/include/clang/Basic/BuiltinsAMDGPU.def
index 51a5b1dbad495c..548bcc8ad55f48 100644
--- a/clang/include/clang/Basic/BuiltinsAMDGPU.def
+++ b/clang/include/clang/Basic/BuiltinsAMDGPU.def
@@ -459,6 +459,9 @@ 
TARGET_BUILTIN(__builtin_amdgcn_smfmac_f32_32x32x64_bf8_fp8, "V16fV4iV8iV16fiIiI
 TARGET_BUILTIN(__builtin_amdgcn_smfmac_f32_32x32x64_fp8_bf8, 
"V16fV4iV8iV16fiIiIi", "nc", "gfx950-insts")
 TARGET_BUILTIN(__builtin_amdgcn_smfmac_f32_32x32x64_fp8_fp8, 
"V16fV4iV8iV16fiIiIi", "nc", "gfx950-insts")
 
+TARGET_BUILTIN(__builtin_amdgcn_permlane16_swap, "V2UiUiUiIbIb", "nc", 
"permlane16-swap")
+TARGET_BUILTIN(__builtin_amdgcn_permlane32_swap, "V2UiUiUiIbIb", "nc", 
"permlane32-swap")
+
 
//===--===//
 // GFX12+ only builtins.
 
//===--===//
diff --git a/clang/lib/CodeGen/CGBuiltin.cpp b/clang/lib/CodeGen/CGBuiltin.cpp
index ff7132fd8bc1e7..3b3c46b56868cf 100644
--- a/clang/lib/CodeGen/CGBuiltin.cpp
+++ b/clang/lib/CodeGen/CGBuiltin.cpp
@@ -20162,6 +20162,32 @@ Value *CodeGenFunction::EmitAMDGPUBuiltinExpr(unsigned 
BuiltinID,
 CGM.getIntrinsic(Intrinsic::amdgcn_s_sendmsg_rtn, {ResultType});
 return Builder.CreateCall(F, {Arg});
   }
+  case AMDGPU::BI__builtin_amdgcn_permlane16_swap:
+  case AMDGPU::BI__builtin_amdgcn_permlane32_swap: {
+// Because builtin types are limited, and the intrinsic uses a struct/pair
+// output, marshal the pair-of-i32 to <2 x i32>.
+Value *VDstOld = EmitScalarExpr(E->getArg(0));
+Value *VSrcOld = EmitScalarExpr(E->getArg(1));
+Value *FI = EmitScalarExpr(E->getArg(2));
+Value *BoundCtrl = EmitScalarExpr(E->getArg(3));
+Function *F =
+CGM.getIntrinsic(BuiltinID == 
AMDGPU::BI__builtin_amdgcn_permlane16_swap
+ ? Intrinsic::amdgcn_permlane16_swap
+ : Intrinsic::amdgcn_permlane32_swap);
+llvm::CallInst *Call =
+Builder.CreateCall(F, {VDstOld, VSrcOld, FI, BoundCtrl});
+
+llvm::Value *Elt0 = Builder.CreateExtractValue(Call, 0);
+llvm::Value *Elt1 = Builder.CreateExtractValue(Call, 1);
+
+llvm::Type *ResultType = Con

[llvm-branch-commits] [clang] [llvm] AMDGPU: Add v_smfmac_f32_32x32x64_bf8_fp8 for gfx950 (PR #117257)

2024-11-21 Thread Matt Arsenault via llvm-branch-commits

https://github.com/arsenm updated 
https://github.com/llvm/llvm-project/pull/117257

>From 698095bb278b20ff853018b997a563a2387eeca6 Mon Sep 17 00:00:00 2001
From: Matt Arsenault 
Date: Sat, 3 Feb 2024 21:09:21 +0530
Subject: [PATCH] AMDGPU: Add v_smfmac_f32_32x32x64_bf8_fp8 for gfx950

---
 clang/include/clang/Basic/BuiltinsAMDGPU.def  |   1 +
 .../CodeGenOpenCL/builtins-amdgcn-mfma.cl |   7 +
 .../builtins-amdgcn-error-gfx950-param.cl |   6 +
 .../builtins-amdgcn-error-gfx950.cl   |   1 +
 llvm/include/llvm/IR/IntrinsicsAMDGPU.td  |   1 +
 .../AMDGPU/AMDGPUInstructionSelector.cpp  |   4 +
 .../Target/AMDGPU/AMDGPURegisterBankInfo.cpp  |   3 +-
 llvm/lib/Target/AMDGPU/VOP3PInstructions.td   |   2 +
 .../UniformityAnalysis/AMDGPU/intrinsics.ll   |   9 +
 .../AMDGPU/llvm.amdgcn.smfmac.gfx950.ll   | 414 ++
 llvm/test/MC/AMDGPU/mai-gfx950.s  |  36 ++
 .../MC/Disassembler/AMDGPU/gfx950_mai.txt |  22 +
 12 files changed, 505 insertions(+), 1 deletion(-)

diff --git a/clang/include/clang/Basic/BuiltinsAMDGPU.def 
b/clang/include/clang/Basic/BuiltinsAMDGPU.def
index 8abfcf496b7d73..d6123fa41ca8b8 100644
--- a/clang/include/clang/Basic/BuiltinsAMDGPU.def
+++ b/clang/include/clang/Basic/BuiltinsAMDGPU.def
@@ -455,6 +455,7 @@ 
TARGET_BUILTIN(__builtin_amdgcn_smfmac_f32_16x16x128_bf8_fp8, "V4fV4iV8iV4fiIiIi
 TARGET_BUILTIN(__builtin_amdgcn_smfmac_f32_16x16x128_fp8_bf8, 
"V4fV4iV8iV4fiIiIi", "nc", "gfx950-insts")
 TARGET_BUILTIN(__builtin_amdgcn_smfmac_f32_16x16x128_fp8_fp8, 
"V4fV4iV8iV4fiIiIi", "nc", "gfx950-insts")
 TARGET_BUILTIN(__builtin_amdgcn_smfmac_f32_32x32x64_bf8_bf8, 
"V16fV4iV8iV16fiIiIi", "nc", "gfx950-insts")
+TARGET_BUILTIN(__builtin_amdgcn_smfmac_f32_32x32x64_bf8_fp8, 
"V16fV4iV8iV16fiIiIi", "nc", "gfx950-insts")
 
 
//===--===//
 // GFX12+ only builtins.
diff --git a/clang/test/CodeGenOpenCL/builtins-amdgcn-mfma.cl 
b/clang/test/CodeGenOpenCL/builtins-amdgcn-mfma.cl
index fdaedc1f92bede..d79ca36f003c5e 100644
--- a/clang/test/CodeGenOpenCL/builtins-amdgcn-mfma.cl
+++ b/clang/test/CodeGenOpenCL/builtins-amdgcn-mfma.cl
@@ -545,4 +545,11 @@ void test_smfmac_f32_32x32x64_bf8_bf8(global v16f* out, 
v4i a, v8i b, v16f c, in
   *out = __builtin_amdgcn_smfmac_f32_32x32x64_bf8_bf8(a, b, c, idx, 0, 0);
 }
 
+// CHECK-GFX950-LABEL: @test_smfmac_f32_32x32x64_bf8_fp8
+// CHECK-GFX950: call <16 x float> @llvm.amdgcn.smfmac.f32.32x32x64.bf8.fp8(<4 
x i32> %a, <8 x i32> %b, <16 x float> %c, i32 %idx, i32 0, i32 0)
+void test_smfmac_f32_32x32x64_bf8_fp8(global v16f* out, v4i a, v8i b, v16f c, 
int idx)
+{
+  *out = __builtin_amdgcn_smfmac_f32_32x32x64_bf8_fp8(a, b, c, idx, 0, 0);
+}
+
 #endif
diff --git a/clang/test/SemaOpenCL/builtins-amdgcn-error-gfx950-param.cl 
b/clang/test/SemaOpenCL/builtins-amdgcn-error-gfx950-param.cl
index 9e0c46b8777533..d1751a6af15463 100644
--- a/clang/test/SemaOpenCL/builtins-amdgcn-error-gfx950-param.cl
+++ b/clang/test/SemaOpenCL/builtins-amdgcn-error-gfx950-param.cl
@@ -130,3 +130,9 @@ void test_smfmac_f32_32x32x64_bf8_bf8(global float16* out, 
int4 a, int8 b, float
   *out = __builtin_amdgcn_smfmac_f32_32x32x64_bf8_bf8(a, b, c, idx, d, 0); // 
expected-error{{argument to '__builtin_amdgcn_smfmac_f32_32x32x64_bf8_bf8' must 
be a constant integer}}
   *out = __builtin_amdgcn_smfmac_f32_32x32x64_bf8_bf8(a, b, c, idx, 0, d); // 
expected-error{{argument to '__builtin_amdgcn_smfmac_f32_32x32x64_bf8_bf8' must 
be a constant integer}}
 }
+
+void test_smfmac_f32_32x32x64_bf8_fp8(global float16* out, int4 a, int8 b, 
float16 c, int idx, int d)
+{
+  *out = __builtin_amdgcn_smfmac_f32_32x32x64_bf8_fp8(a, b, c, idx, d, 0); // 
expected-error{{argument to '__builtin_amdgcn_smfmac_f32_32x32x64_bf8_fp8' must 
be a constant integer}}
+  *out = __builtin_amdgcn_smfmac_f32_32x32x64_bf8_fp8(a, b, c, idx, 0, d); // 
expected-error{{argument to '__builtin_amdgcn_smfmac_f32_32x32x64_bf8_fp8' must 
be a constant integer}}
+}
diff --git a/clang/test/SemaOpenCL/builtins-amdgcn-error-gfx950.cl 
b/clang/test/SemaOpenCL/builtins-amdgcn-error-gfx950.cl
index a0955b290c9830..f8ac3399d2b64b 100644
--- a/clang/test/SemaOpenCL/builtins-amdgcn-error-gfx950.cl
+++ b/clang/test/SemaOpenCL/builtins-amdgcn-error-gfx950.cl
@@ -45,6 +45,7 @@ void test(__global float4* out0, half8 a0, half8 b0, float4 
c0,
   *out12 = __builtin_amdgcn_smfmac_f32_16x16x128_fp8_bf8(a12, b12, c12, 0, 0, 
0); // expected-error{{'__builtin_amdgcn_smfmac_f32_16x16x128_fp8_bf8' needs 
target feature gfx950-insts}}
   *out12 = __builtin_amdgcn_smfmac_f32_16x16x128_fp8_fp8(a12, b12, c12, 0, 0, 
0); // expected-error{{'__builtin_amdgcn_smfmac_f32_16x16x128_fp8_fp8' needs 
target feature gfx950-insts}}
   *out13 = __builtin_amdgcn_smfmac_f32_32x32x64_bf8_bf8(a13, b13, c13, 0, 0, 
0); // expected-error{{'__builtin_amdgcn_smfmac_f32_32x32x64_bf8_bf8' needs 
target feature gfx950-insts}}
+  *out13 = __builtin_amdgcn_smfmac_f32_

[llvm-branch-commits] [clang] [llvm] AMDGPU: Add v_smfmac_f32_32x32x64_fp8_fp8 for gfx950 (PR #117259)

2024-11-21 Thread Matt Arsenault via llvm-branch-commits

https://github.com/arsenm updated 
https://github.com/llvm/llvm-project/pull/117259

>From d5b3bb6210d19c81a935790c5267c3d97125a00d Mon Sep 17 00:00:00 2001
From: Matt Arsenault 
Date: Sat, 3 Feb 2024 21:43:00 +0530
Subject: [PATCH] AMDGPU: Add v_smfmac_f32_32x32x64_fp8_fp8 for gfx950

---
 clang/include/clang/Basic/BuiltinsAMDGPU.def  |   1 +
 .../CodeGenOpenCL/builtins-amdgcn-mfma.cl |   7 +
 .../builtins-amdgcn-error-gfx950-param.cl |   6 +
 .../builtins-amdgcn-error-gfx950.cl   |   1 +
 llvm/include/llvm/IR/IntrinsicsAMDGPU.td  |   1 +
 .../AMDGPU/AMDGPUInstructionSelector.cpp  |   4 +
 .../Target/AMDGPU/AMDGPURegisterBankInfo.cpp  |   3 +-
 llvm/lib/Target/AMDGPU/VOP3PInstructions.td   |   2 +
 .../UniformityAnalysis/AMDGPU/intrinsics.ll   |   9 +
 .../AMDGPU/llvm.amdgcn.smfmac.gfx950.ll   | 414 ++
 llvm/test/MC/AMDGPU/mai-gfx950.s  |  36 ++
 .../MC/Disassembler/AMDGPU/gfx950_mai.txt |  22 +
 12 files changed, 505 insertions(+), 1 deletion(-)

diff --git a/clang/include/clang/Basic/BuiltinsAMDGPU.def 
b/clang/include/clang/Basic/BuiltinsAMDGPU.def
index f90af7000e3196..51a5b1dbad495c 100644
--- a/clang/include/clang/Basic/BuiltinsAMDGPU.def
+++ b/clang/include/clang/Basic/BuiltinsAMDGPU.def
@@ -457,6 +457,7 @@ 
TARGET_BUILTIN(__builtin_amdgcn_smfmac_f32_16x16x128_fp8_fp8, "V4fV4iV8iV4fiIiIi
 TARGET_BUILTIN(__builtin_amdgcn_smfmac_f32_32x32x64_bf8_bf8, 
"V16fV4iV8iV16fiIiIi", "nc", "gfx950-insts")
 TARGET_BUILTIN(__builtin_amdgcn_smfmac_f32_32x32x64_bf8_fp8, 
"V16fV4iV8iV16fiIiIi", "nc", "gfx950-insts")
 TARGET_BUILTIN(__builtin_amdgcn_smfmac_f32_32x32x64_fp8_bf8, 
"V16fV4iV8iV16fiIiIi", "nc", "gfx950-insts")
+TARGET_BUILTIN(__builtin_amdgcn_smfmac_f32_32x32x64_fp8_fp8, 
"V16fV4iV8iV16fiIiIi", "nc", "gfx950-insts")
 
 
//===--===//
 // GFX12+ only builtins.
diff --git a/clang/test/CodeGenOpenCL/builtins-amdgcn-mfma.cl 
b/clang/test/CodeGenOpenCL/builtins-amdgcn-mfma.cl
index 33b60d53f11cc8..00346baa6ff84d 100644
--- a/clang/test/CodeGenOpenCL/builtins-amdgcn-mfma.cl
+++ b/clang/test/CodeGenOpenCL/builtins-amdgcn-mfma.cl
@@ -559,4 +559,11 @@ void test_smfmac_f32_32x32x64_fp8_bf8(global v16f* out, 
v4i a, v8i b, v16f c, in
   *out = __builtin_amdgcn_smfmac_f32_32x32x64_fp8_bf8(a, b, c, idx, 0, 0);
 }
 
+// CHECK-GFX950-LABEL: @test_smfmac_f32_32x32x64_fp8_fp8
+// CHECK-GFX950: call <16 x float> @llvm.amdgcn.smfmac.f32.32x32x64.fp8.fp8(<4 
x i32> %a, <8 x i32> %b, <16 x float> %c, i32 %idx, i32 0, i32 0)
+void test_smfmac_f32_32x32x64_fp8_fp8(global v16f* out, v4i a, v8i b, v16f c, 
int idx)
+{
+  *out = __builtin_amdgcn_smfmac_f32_32x32x64_fp8_fp8(a, b, c, idx, 0, 0);
+}
+
 #endif
diff --git a/clang/test/SemaOpenCL/builtins-amdgcn-error-gfx950-param.cl 
b/clang/test/SemaOpenCL/builtins-amdgcn-error-gfx950-param.cl
index c53ca8a7c3513f..b3b359a1e0c65b 100644
--- a/clang/test/SemaOpenCL/builtins-amdgcn-error-gfx950-param.cl
+++ b/clang/test/SemaOpenCL/builtins-amdgcn-error-gfx950-param.cl
@@ -142,3 +142,9 @@ void test_smfmac_f32_32x32x64_fp8_bf8(global float16* out, 
int4 a, int8 b, float
   *out = __builtin_amdgcn_smfmac_f32_32x32x64_fp8_bf8(a, b, c, idx, d, 0); // 
expected-error{{argument to '__builtin_amdgcn_smfmac_f32_32x32x64_fp8_bf8' must 
be a constant integer}}
   *out = __builtin_amdgcn_smfmac_f32_32x32x64_fp8_bf8(a, b, c, idx, 0, d); // 
expected-error{{argument to '__builtin_amdgcn_smfmac_f32_32x32x64_fp8_bf8' must 
be a constant integer}}
 }
+
+void test_smfmac_f32_32x32x64_fp8_fp8(global float16* out, int4 a, int8 b, 
float16 c, int idx, int d)
+{
+  *out = __builtin_amdgcn_smfmac_f32_32x32x64_fp8_fp8(a, b, c, idx, d, 0); // 
expected-error{{argument to '__builtin_amdgcn_smfmac_f32_32x32x64_fp8_fp8' must 
be a constant integer}}
+  *out = __builtin_amdgcn_smfmac_f32_32x32x64_fp8_fp8(a, b, c, idx, 0, d); // 
expected-error{{argument to '__builtin_amdgcn_smfmac_f32_32x32x64_fp8_fp8' must 
be a constant integer}}
+}
diff --git a/clang/test/SemaOpenCL/builtins-amdgcn-error-gfx950.cl 
b/clang/test/SemaOpenCL/builtins-amdgcn-error-gfx950.cl
index 9e563a7b0bd64c..57523cf0af1b18 100644
--- a/clang/test/SemaOpenCL/builtins-amdgcn-error-gfx950.cl
+++ b/clang/test/SemaOpenCL/builtins-amdgcn-error-gfx950.cl
@@ -47,6 +47,7 @@ void test(__global float4* out0, half8 a0, half8 b0, float4 
c0,
   *out13 = __builtin_amdgcn_smfmac_f32_32x32x64_bf8_bf8(a13, b13, c13, 0, 0, 
0); // expected-error{{'__builtin_amdgcn_smfmac_f32_32x32x64_bf8_bf8' needs 
target feature gfx950-insts}}
   *out13 = __builtin_amdgcn_smfmac_f32_32x32x64_bf8_fp8(a13, b13, c13, 0, 0, 
0); // expected-error{{'__builtin_amdgcn_smfmac_f32_32x32x64_bf8_fp8' needs 
target feature gfx950-insts}}
   *out13 = __builtin_amdgcn_smfmac_f32_32x32x64_fp8_bf8(a13, b13, c13, 0, 0, 
0); // expected-error{{'__builtin_amdgcn_smfmac_f32_32x32x64_fp8_bf8' needs 
target feature gfx950-insts}}
+  *out13 = __builtin_amdgcn_smfmac_f32_32

[llvm-branch-commits] [clang] [llvm] AMDGPU: Add v_smfmac_f32_32x32x32x64_fp8_bf8 for gfx950 (PR #117258)

2024-11-21 Thread Matt Arsenault via llvm-branch-commits

https://github.com/arsenm updated 
https://github.com/llvm/llvm-project/pull/117258

>From 32ccf3950258693e8ca7be1c7ecc6670debc2bf7 Mon Sep 17 00:00:00 2001
From: Matt Arsenault 
Date: Sat, 3 Feb 2024 21:25:33 +0530
Subject: [PATCH] AMDGPU: Add v_smfmac_f32_32x32x32x64_fp8_bf8 for gfx950

---
 clang/include/clang/Basic/BuiltinsAMDGPU.def  |   1 +
 .../CodeGenOpenCL/builtins-amdgcn-mfma.cl |   7 +
 .../builtins-amdgcn-error-gfx950-param.cl |   6 +
 .../builtins-amdgcn-error-gfx950.cl   |   1 +
 llvm/include/llvm/IR/IntrinsicsAMDGPU.td  |   1 +
 .../AMDGPU/AMDGPUInstructionSelector.cpp  |   4 +
 .../Target/AMDGPU/AMDGPURegisterBankInfo.cpp  |   3 +-
 llvm/lib/Target/AMDGPU/VOP3PInstructions.td   |   2 +
 .../UniformityAnalysis/AMDGPU/intrinsics.ll   |   9 +
 .../AMDGPU/llvm.amdgcn.smfmac.gfx950.ll   | 414 ++
 llvm/test/MC/AMDGPU/mai-gfx950.s  |  36 ++
 .../MC/Disassembler/AMDGPU/gfx950_mai.txt |  22 +
 12 files changed, 505 insertions(+), 1 deletion(-)

diff --git a/clang/include/clang/Basic/BuiltinsAMDGPU.def 
b/clang/include/clang/Basic/BuiltinsAMDGPU.def
index d6123fa41ca8b8..f90af7000e3196 100644
--- a/clang/include/clang/Basic/BuiltinsAMDGPU.def
+++ b/clang/include/clang/Basic/BuiltinsAMDGPU.def
@@ -456,6 +456,7 @@ 
TARGET_BUILTIN(__builtin_amdgcn_smfmac_f32_16x16x128_fp8_bf8, "V4fV4iV8iV4fiIiIi
 TARGET_BUILTIN(__builtin_amdgcn_smfmac_f32_16x16x128_fp8_fp8, 
"V4fV4iV8iV4fiIiIi", "nc", "gfx950-insts")
 TARGET_BUILTIN(__builtin_amdgcn_smfmac_f32_32x32x64_bf8_bf8, 
"V16fV4iV8iV16fiIiIi", "nc", "gfx950-insts")
 TARGET_BUILTIN(__builtin_amdgcn_smfmac_f32_32x32x64_bf8_fp8, 
"V16fV4iV8iV16fiIiIi", "nc", "gfx950-insts")
+TARGET_BUILTIN(__builtin_amdgcn_smfmac_f32_32x32x64_fp8_bf8, 
"V16fV4iV8iV16fiIiIi", "nc", "gfx950-insts")
 
 
//===--===//
 // GFX12+ only builtins.
diff --git a/clang/test/CodeGenOpenCL/builtins-amdgcn-mfma.cl 
b/clang/test/CodeGenOpenCL/builtins-amdgcn-mfma.cl
index d79ca36f003c5e..33b60d53f11cc8 100644
--- a/clang/test/CodeGenOpenCL/builtins-amdgcn-mfma.cl
+++ b/clang/test/CodeGenOpenCL/builtins-amdgcn-mfma.cl
@@ -552,4 +552,11 @@ void test_smfmac_f32_32x32x64_bf8_fp8(global v16f* out, 
v4i a, v8i b, v16f c, in
   *out = __builtin_amdgcn_smfmac_f32_32x32x64_bf8_fp8(a, b, c, idx, 0, 0);
 }
 
+// CHECK-GFX950-LABEL: @test_smfmac_f32_32x32x64_fp8_bf8
+// CHECK-GFX950: call <16 x float> @llvm.amdgcn.smfmac.f32.32x32x64.fp8.bf8(<4 
x i32> %a, <8 x i32> %b, <16 x float> %c, i32 %idx, i32 0, i32 0)
+void test_smfmac_f32_32x32x64_fp8_bf8(global v16f* out, v4i a, v8i b, v16f c, 
int idx)
+{
+  *out = __builtin_amdgcn_smfmac_f32_32x32x64_fp8_bf8(a, b, c, idx, 0, 0);
+}
+
 #endif
diff --git a/clang/test/SemaOpenCL/builtins-amdgcn-error-gfx950-param.cl 
b/clang/test/SemaOpenCL/builtins-amdgcn-error-gfx950-param.cl
index d1751a6af15463..c53ca8a7c3513f 100644
--- a/clang/test/SemaOpenCL/builtins-amdgcn-error-gfx950-param.cl
+++ b/clang/test/SemaOpenCL/builtins-amdgcn-error-gfx950-param.cl
@@ -136,3 +136,9 @@ void test_smfmac_f32_32x32x64_bf8_fp8(global float16* out, 
int4 a, int8 b, float
   *out = __builtin_amdgcn_smfmac_f32_32x32x64_bf8_fp8(a, b, c, idx, d, 0); // 
expected-error{{argument to '__builtin_amdgcn_smfmac_f32_32x32x64_bf8_fp8' must 
be a constant integer}}
   *out = __builtin_amdgcn_smfmac_f32_32x32x64_bf8_fp8(a, b, c, idx, 0, d); // 
expected-error{{argument to '__builtin_amdgcn_smfmac_f32_32x32x64_bf8_fp8' must 
be a constant integer}}
 }
+
+void test_smfmac_f32_32x32x64_fp8_bf8(global float16* out, int4 a, int8 b, 
float16 c, int idx, int d)
+{
+  *out = __builtin_amdgcn_smfmac_f32_32x32x64_fp8_bf8(a, b, c, idx, d, 0); // 
expected-error{{argument to '__builtin_amdgcn_smfmac_f32_32x32x64_fp8_bf8' must 
be a constant integer}}
+  *out = __builtin_amdgcn_smfmac_f32_32x32x64_fp8_bf8(a, b, c, idx, 0, d); // 
expected-error{{argument to '__builtin_amdgcn_smfmac_f32_32x32x64_fp8_bf8' must 
be a constant integer}}
+}
diff --git a/clang/test/SemaOpenCL/builtins-amdgcn-error-gfx950.cl 
b/clang/test/SemaOpenCL/builtins-amdgcn-error-gfx950.cl
index f8ac3399d2b64b..9e563a7b0bd64c 100644
--- a/clang/test/SemaOpenCL/builtins-amdgcn-error-gfx950.cl
+++ b/clang/test/SemaOpenCL/builtins-amdgcn-error-gfx950.cl
@@ -46,6 +46,7 @@ void test(__global float4* out0, half8 a0, half8 b0, float4 
c0,
   *out12 = __builtin_amdgcn_smfmac_f32_16x16x128_fp8_fp8(a12, b12, c12, 0, 0, 
0); // expected-error{{'__builtin_amdgcn_smfmac_f32_16x16x128_fp8_fp8' needs 
target feature gfx950-insts}}
   *out13 = __builtin_amdgcn_smfmac_f32_32x32x64_bf8_bf8(a13, b13, c13, 0, 0, 
0); // expected-error{{'__builtin_amdgcn_smfmac_f32_32x32x64_bf8_bf8' needs 
target feature gfx950-insts}}
   *out13 = __builtin_amdgcn_smfmac_f32_32x32x64_bf8_fp8(a13, b13, c13, 0, 0, 
0); // expected-error{{'__builtin_amdgcn_smfmac_f32_32x32x64_bf8_fp8' needs 
target feature gfx950-insts}}
+  *out13 = __builtin_amdgcn_smfmac_f3

[llvm-branch-commits] [llvm] AMDGPU: Handle gfx950 XDL-write-overlapped-smfma-src-c wait state change (PR #117263)

2024-11-21 Thread Matt Arsenault via llvm-branch-commits

https://github.com/arsenm updated 
https://github.com/llvm/llvm-project/pull/117263

>From 087117bc3dc327237d52746813e932d4c8f0b8bc Mon Sep 17 00:00:00 2001
From: Matt Arsenault 
Date: Wed, 6 Mar 2024 19:51:00 +0530
Subject: [PATCH] AMDGPU: Handle gfx950 XDL-write-overlapped-smfma-src-c wait
 state change

These have an additional wait state compared to gfx940.
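
A rough standalone sketch of the adjusted calculation, for readers who want to check the numbers. This is not the code from GCNHazardRecognizer.cpp: the function name `xdlOverlappedSrcCWaitStates` is made up for illustration, and only the formula `NumPasses + 1 + IsGFX950` is taken from the diff.

```cpp
// Hypothetical stand-in for the helper changed in this patch; only the
// formula NumPasses + 1 + IsGFX950 comes from the diff.
constexpr int xdlOverlappedSrcCWaitStates(int NumPasses, bool IsGFX950) {
  return NumPasses + 1 + (IsGFX950 ? 1 : 0);
}

// gfx940 column of the table in the patch comment.
static_assert(xdlOverlappedSrcCWaitStates(2, false) == 3, "2 pass, gfx940");
static_assert(xdlOverlappedSrcCWaitStates(16, false) == 17, "16 pass, gfx940");
// gfx950 needs one more wait state for every pass count.
static_assert(xdlOverlappedSrcCWaitStates(2, true) == 4, "2 pass, gfx950");
static_assert(xdlOverlappedSrcCWaitStates(16, true) == 18, "16 pass, gfx950");
```
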
---
 .../lib/Target/AMDGPU/GCNHazardRecognizer.cpp |  16 ++-
 .../CodeGen/AMDGPU/mai-hazards-gfx940.mir | 129 --
 .../AMDGPU/mai-hazards-mfma-scale.gfx950.mir  |  22 +--
 3 files changed, 107 insertions(+), 60 deletions(-)

diff --git a/llvm/lib/Target/AMDGPU/GCNHazardRecognizer.cpp 
b/llvm/lib/Target/AMDGPU/GCNHazardRecognizer.cpp
index 99a176731599cc..be0936ce74835f 100644
--- a/llvm/lib/Target/AMDGPU/GCNHazardRecognizer.cpp
+++ b/llvm/lib/Target/AMDGPU/GCNHazardRecognizer.cpp
@@ -2232,12 +2232,14 @@ int 
GCNHazardRecognizer::checkMAIHazards908(MachineInstr *MI) {
 }
 
 static int
-GFX940_XDL_N_PassWritesVGPROverlappedSMFMASrcCWaitStates(int NumPasses) {
-  // 2 pass -> 3
-  // 4 pass -> 5
-  // 8 pass -> 9
-  // 16 pass -> 17
-  return NumPasses + 1;
+GFX940_XDL_N_PassWritesVGPROverlappedSMFMASrcCWaitStates(int NumPasses,
+ bool IsGFX950) {
+  // xdl def cycles | gfx940 | gfx950
+  // 2 pass         |  3        4
+  // 4 pass         |  5        6
+  // 8 pass         |  9        10
+  // 16 pass        |  17       18
+  return NumPasses + 1 + IsGFX950;
 }
 
 static int
@@ -2373,7 +2375,7 @@ int GCNHazardRecognizer::checkMAIHazards90A(MachineInstr 
*MI) {
 NeedWaitStates =
 isXDL(ST, *MI1)
 ? GFX940_XDL_N_PassWritesVGPROverlappedSMFMASrcCWaitStates(
-  NumPasses)
+  NumPasses, ST.hasGFX950Insts())
 : 
GFX940_SMFMA_N_PassWritesVGPROverlappedSMFMASrcCWaitStates(
   NumPasses);
 break;
diff --git a/llvm/test/CodeGen/AMDGPU/mai-hazards-gfx940.mir 
b/llvm/test/CodeGen/AMDGPU/mai-hazards-gfx940.mir
index d2b2f226404da8..b9135dbd46fc1f 100644
--- a/llvm/test/CodeGen/AMDGPU/mai-hazards-gfx940.mir
+++ b/llvm/test/CodeGen/AMDGPU/mai-hazards-gfx940.mir
@@ -145,7 +145,8 @@ body: |
 ...
 # GCN-LABEL: name: sgemm4x4_mfma_write_agpr_mfma_read_overlap
 # GCN:  V_MFMA
-# GCN-NEXT: S_NOP 2
+# GFX940-NEXT: S_NOP 2
+# GFX950-NEXT: S_NOP 3
 # GCN-NEXT: V_MFMA
 name:sgemm4x4_mfma_write_agpr_mfma_read_overlap
 body: |
@@ -155,7 +156,8 @@ body: |
 ...
 # GCN-LABEL: name: sgemm4x4_mfma_write_vgpr_mfma_read_overlap
 # GCN:  V_MFMA
-# GCN-NEXT: S_NOP 2
+# GFX940-NEXT: S_NOP 2
+# GFX950-NEXT: S_NOP 3
 # GCN-NEXT: V_MFMA
 name:sgemm4x4_mfma_write_vgpr_mfma_read_overlap
 body: |
@@ -165,7 +167,8 @@ body: |
 ...
 # GCN-LABEL: name: sgemm4x4_mfma_write_agpr_smfmac_read_overlap
 # GCN:  V_MFMA
-# GCN-NEXT: S_NOP 2
+# GFX940-NEXT: S_NOP 2
+# GFX950-NEXT: S_NOP 3
 # GCN-NEXT: V_SMFMAC
 name:sgemm4x4_mfma_write_agpr_smfmac_read_overlap
 body: |
@@ -175,8 +178,11 @@ body: |
 ...
 # GCN-LABEL: name: xdl_sgemm16x16_mfma_write_agpr_mfma_read_overlap
 # GCN:  V_MFMA
-# GCN-NEXT: S_NOP 7
-# GCN-NEXT: S_NOP 0
+# GFX940-NEXT: S_NOP 7
+# GFX940-NEXT: S_NOP 0
+
+# GFX950-NEXT: S_NOP 7
+# GFX950-NEXT: S_NOP 1
 # GCN-NEXT: V_MFMA
 name:xdl_sgemm16x16_mfma_write_agpr_mfma_read_overlap
 body: |
@@ -186,8 +192,11 @@ body: |
 ...
 # GCN-LABEL: name: xdl_sgemm16x16_mfma_write_vgpr_mfma_read_overlap
 # GCN:  V_MFMA
-# GCN-NEXT: S_NOP 7
-# GCN-NEXT: S_NOP 0
+# GFX940-NEXT: S_NOP 7
+# GFX940-NEXT: S_NOP 0
+
+# GFX950-NEXT: S_NOP 7
+# GFX950-NEXT: S_NOP 1
 # GCN-NEXT: V_MFMA
 name:xdl_sgemm16x16_mfma_write_vgpr_mfma_read_overlap
 body: |
@@ -216,8 +225,11 @@ body: |
 ...
 # GCN-LABEL: name: xdl_sgemm16x16_mfma_write_agpr_smfmac_read_overlap
 # GCN:  V_MFMA
-# GCN-NEXT: S_NOP 7
-# GCN-NEXT: S_NOP 0
+# GFX940-NEXT: S_NOP 7
+# GFX940-NEXT: S_NOP 0
+
+# GFX950-NEXT: S_NOP 7
+# GFX950-NEXT: S_NOP 1
 # GCN-NEXT: V_SMFMAC
 name:xdl_sgemm16x16_mfma_write_agpr_smfmac_read_overlap
 body: |
@@ -229,7 +241,8 @@ body: |
 # GCN:  V_MFMA
 # GCN-NEXT: S_NOP 7
 # GCN-NEXT: S_NOP 7
-# GCN-NEXT: S_NOP 0
+# GFX940-NEXT: S_NOP 0
+# GFX950-NEXT: S_NOP 1
 # GCN-NEXT: V_MFMA
 name:xdl_sgemm32x32_mfma_write_agpr_mfma_read_overlap
 body: |
@@ -241,7 +254,8 @@ body: |
 # GCN:  V_MFMA
 # GCN-NEXT: S_NOP 7
 # GCN-NEXT: S_NOP 7
-# GCN-NEXT: S_NOP 0
+# GFX940-NEXT: S_NOP 0
+# GFX950-NEXT: S_NOP 1
 # GCN-NEXT: V_MFMA
 name:xdl_sgemm32x32_mfma_write_vgpr_mfma_read_overlap
 body: |
@@ -273,7 +287,8 @@ body: |
 # GCN:  V_MFMA
 # GCN-NEXT: S_NOP 7
 # GCN-NEXT: S_NOP 7
-# GCN-NEXT: S_

[llvm-branch-commits] [clang] [llvm] AMDGPU: Add v_smfmac_f32_16x16x128_bf8_fp8 for gfx950 (PR #117233)

2024-11-21 Thread Matt Arsenault via llvm-branch-commits

arsenm wrote:

### Merge activity

* **Nov 21, 7:53 PM EST**: A user started a stack merge that includes this pull 
request via 
[Graphite](https://app.graphite.dev/github/pr/llvm/llvm-project/117233).


https://github.com/llvm/llvm-project/pull/117233


[llvm-branch-commits] [clang] [llvm] AMDGPU: Add v_smfmac_f32_16x16x128_fp8_bf8 for gfx950 (PR #117234)

2024-11-21 Thread Matt Arsenault via llvm-branch-commits

arsenm wrote:

### Merge activity

* **Nov 21, 7:53 PM EST**: A user started a stack merge that includes this pull 
request via 
[Graphite](https://app.graphite.dev/github/pr/llvm/llvm-project/117234).


https://github.com/llvm/llvm-project/pull/117234


[llvm-branch-commits] [llvm] AMDGPU: Handle vcmpx+permalane gfx950 hazard (PR #117286)

2024-11-21 Thread via llvm-branch-commits

llvmbot wrote:




@llvm/pr-subscribers-backend-amdgpu

Author: Matt Arsenault (arsenm)


Changes

Confusingly, this is a different hazard from the gfx10 vcmpx+permlane
hazard, which is controlled by a subtarget feature.
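
The wait-state arithmetic in `checkPermlaneHazards` can be sketched as a toy model. Everything below is an illustration under stated assumptions, not LLVM code: the `Kind` enum and `permlaneNoopsNeeded` are hypothetical, each instruction is assumed to take exactly one wait state, and only the constant 4 and the "required minus elapsed" shape come from the diff.

```cpp
#include <cstddef>
#include <vector>

// Toy instruction kinds; these are not LLVM types.
enum class Kind { VCmpXWritesExec, Permlane, Other };

// Number of no-op wait states to place before the instruction at index Pos.
// Assumes every earlier instruction occupies exactly one wait state.
int permlaneNoopsNeeded(const std::vector<Kind> &Insts, std::size_t Pos) {
  const int NumWaitStates = 4; // value used in the patch
  if (Insts[Pos] != Kind::Permlane)
    return 0;
  // Walk backwards over at most NumWaitStates earlier instructions.
  for (int Since = 0; Since < NumWaitStates; ++Since) {
    if (static_cast<std::size_t>(Since) >= Pos)
      break; // ran out of earlier instructions
    if (Insts[Pos - 1 - Since] == Kind::VCmpXWritesExec)
      return NumWaitStates - Since; // required minus already elapsed
  }
  return 0; // no v_cmpx writing EXEC within the window
}
```

In this model a permlane directly after a v_cmpx needs 4 wait states, which lines up with the `S_NOP 3` expected in the new test if `S_NOP <imm>` is taken to provide `imm + 1` wait states.
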

---
Full diff: https://github.com/llvm/llvm-project/pull/117286.diff


3 Files Affected:

- (modified) llvm/lib/Target/AMDGPU/GCNHazardRecognizer.cpp (+31-4) 
- (modified) llvm/lib/Target/AMDGPU/GCNHazardRecognizer.h (+1) 
- (added) llvm/test/CodeGen/AMDGPU/hazards-gfx950.mir (+144) 


``diff
diff --git a/llvm/lib/Target/AMDGPU/GCNHazardRecognizer.cpp 
b/llvm/lib/Target/AMDGPU/GCNHazardRecognizer.cpp
index 8008b5f7bcc991..45ff1f4a63cf03 100644
--- a/llvm/lib/Target/AMDGPU/GCNHazardRecognizer.cpp
+++ b/llvm/lib/Target/AMDGPU/GCNHazardRecognizer.cpp
@@ -168,7 +168,11 @@ static bool isPermlane(const MachineInstr &MI) {
  Opcode == AMDGPU::V_PERMLANE64_B32 ||
  Opcode == AMDGPU::V_PERMLANEX16_B32_e64 ||
  Opcode == AMDGPU::V_PERMLANE16_VAR_B32_e64 ||
- Opcode == AMDGPU::V_PERMLANEX16_VAR_B32_e64;
+ Opcode == AMDGPU::V_PERMLANEX16_VAR_B32_e64 ||
+ Opcode == AMDGPU::V_PERMLANE16_SWAP_B32_e32 ||
+ Opcode == AMDGPU::V_PERMLANE16_SWAP_B32_e64 ||
+ Opcode == AMDGPU::V_PERMLANE32_SWAP_B32_e32 ||
+ Opcode == AMDGPU::V_PERMLANE32_SWAP_B32_e64;
 }
 
 static bool isLdsDma(const MachineInstr &MI) {
@@ -395,6 +399,9 @@ unsigned 
GCNHazardRecognizer::PreEmitNoopsCommon(MachineInstr *MI) {
   SIInstrInfo::isDS(*MI))
 return std::max(WaitStates, checkMAILdStHazards(MI));
 
+  if (ST.hasGFX950Insts() && isPermlane(*MI))
+return std::max(WaitStates, checkPermlaneHazards(MI));
+
   return WaitStates;
 }
 
@@ -1200,6 +1207,14 @@ void GCNHazardRecognizer::fixHazards(MachineInstr *MI) {
   fixRequiredExportPriority(MI);
 }
 
+static bool isVCmpXWritesExec(const SIInstrInfo &TII,
+  const SIRegisterInfo &TRI,
+  const MachineInstr &MI) {
+  return (TII.isVOPC(MI) ||
+  (MI.isCompare() && (TII.isVOP3(MI) || TII.isSDWA(MI)))) &&
+MI.modifiesRegister(AMDGPU::EXEC, &TRI);
+}
+
 bool GCNHazardRecognizer::fixVcmpxPermlaneHazards(MachineInstr *MI) {
   if (!ST.hasVcmpxPermlaneHazard() || !isPermlane(*MI))
 return false;
@@ -1207,9 +1222,7 @@ bool 
GCNHazardRecognizer::fixVcmpxPermlaneHazards(MachineInstr *MI) {
   const SIInstrInfo *TII = ST.getInstrInfo();
   const SIRegisterInfo *TRI = ST.getRegisterInfo();
   auto IsHazardFn = [TII, TRI](const MachineInstr &MI) {
-return (TII->isVOPC(MI) ||
-((TII->isVOP3(MI) || TII->isSDWA(MI)) && MI.isCompare())) &&
-   MI.modifiesRegister(AMDGPU::EXEC, TRI);
+return isVCmpXWritesExec(*TII, *TRI, MI);
   };
 
   auto IsExpiredFn = [](const MachineInstr &MI, int) {
@@ -2529,6 +2542,20 @@ int 
GCNHazardRecognizer::checkMAILdStHazards(MachineInstr *MI) {
   return WaitStatesNeeded;
 }
 
+int GCNHazardRecognizer::checkPermlaneHazards(MachineInstr *MI) {
+  assert(!ST.hasVcmpxPermlaneHazard() &&
+ "this is a different vcmpx+permlane hazard");
+  const SIRegisterInfo *TRI = ST.getRegisterInfo();
+  const SIInstrInfo *TII = ST.getInstrInfo();
+
+  auto IsVCmpXWritesExecFn = [TII, TRI](const MachineInstr &MI) {
+return isVCmpXWritesExec(*TII, *TRI, MI);
+  };
+
+  const int NumWaitStates = 4;
+  return NumWaitStates - getWaitStatesSince(IsVCmpXWritesExecFn, 
NumWaitStates);
+}
+
 static int GFX940_SMFMA_N_PassWriteVgprVALUWawWaitStates(int NumPasses) {
   // 2 pass -> 4
   // 4 pass -> 6
diff --git a/llvm/lib/Target/AMDGPU/GCNHazardRecognizer.h 
b/llvm/lib/Target/AMDGPU/GCNHazardRecognizer.h
index adb2278c48eebe..83ce100c58f0a6 100644
--- a/llvm/lib/Target/AMDGPU/GCNHazardRecognizer.h
+++ b/llvm/lib/Target/AMDGPU/GCNHazardRecognizer.h
@@ -134,6 +134,7 @@ class GCNHazardRecognizer final : public 
ScheduleHazardRecognizer {
   int checkMFMAPadding(MachineInstr *MI);
   int checkMAIVALUHazards(MachineInstr *MI);
   int checkMAILdStHazards(MachineInstr *MI);
+  int checkPermlaneHazards(MachineInstr *MI);
 
 public:
   GCNHazardRecognizer(const MachineFunction &MF);
diff --git a/llvm/test/CodeGen/AMDGPU/hazards-gfx950.mir 
b/llvm/test/CodeGen/AMDGPU/hazards-gfx950.mir
new file mode 100644
index 00000000000000..97bef7be711ff2
--- /dev/null
+++ b/llvm/test/CodeGen/AMDGPU/hazards-gfx950.mir
@@ -0,0 +1,144 @@
+# RUN: llc -mtriple=amdgcn -mcpu=gfx950 -verify-machineinstrs 
-run-pass=post-RA-hazard-rec %s -o - | FileCheck -check-prefix=GCN %s
+
+---
+# GCN-LABEL: name: vcmpx_vopc_write_exec_permlane16_swap_vop1
+# GCN:  V_CMPX_EQ_I32_e32
+# GCN-NEXT: S_NOP 3
+# GCN-NEXT: V_PERMLANE
+name:vcmpx_vopc_write_exec_permlane16_swap_vop1
+body: |
+  bb.0:
+liveins: $vgpr0, $vgpr1
+V_CMPX_EQ_I32_e32 $vgpr0, $vgpr1, implicit-def $exec, implicit-def $vcc, 
implicit $exec
+renamable $vgpr0, renamable $vgpr1 = V_PERMLANE16_SWAP_B32_e32 killed 
$vgpr0, killed $vgpr1, implicit $exec
+...
+
+---
+# GCN-LABEL: nam

[llvm-branch-commits] [llvm] AMDGPU: Handle gfx950 XDL-write-overlapped-smfma-src-c wait state change (PR #117263)

2024-11-21 Thread Matt Arsenault via llvm-branch-commits

https://github.com/arsenm updated 
https://github.com/llvm/llvm-project/pull/117263

>From 736d914241979efb46b506fb45cee79e73bbd20e Mon Sep 17 00:00:00 2001
From: Matt Arsenault 
Date: Wed, 6 Mar 2024 19:51:00 +0530
Subject: [PATCH] AMDGPU: Handle gfx950 XDL-write-overlapped-smfma-src-c wait
 state change

These have an additional wait state compared to gfx940.
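
To relate the wait-state counts to the `S_NOP` lines in the updated test expectations, here is a small sketch of how a count might be split into `S_NOP` immediates. It rests on an assumption not stated in this patch: that `S_NOP <imm>` provides `imm + 1` wait states and that a single `S_NOP` covers at most 8. The helper name `splitIntoSNopImms` is hypothetical.

```cpp
#include <algorithm>
#include <vector>

// Split a wait-state count into S_NOP immediates.
// Assumption: S_NOP <imm> provides imm + 1 wait states, at most 8 per S_NOP.
std::vector<int> splitIntoSNopImms(int WaitStates) {
  std::vector<int> Imms;
  while (WaitStates > 0) {
    int Chunk = std::min(WaitStates, 8); // wait states covered by this S_NOP
    Imms.push_back(Chunk - 1);           // immediate encodes Chunk - 1
    WaitStates -= Chunk;
  }
  return Imms;
}
```

Under that assumption, the 8-pass case (9 wait states on gfx940, 10 on gfx950) yields the immediates 7, 0 and 7, 1, which matches the `S_NOP 7` / `S_NOP 0` and `S_NOP 7` / `S_NOP 1` check lines.
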
---
 .../lib/Target/AMDGPU/GCNHazardRecognizer.cpp |  16 ++-
 .../CodeGen/AMDGPU/mai-hazards-gfx940.mir | 129 --
 .../AMDGPU/mai-hazards-mfma-scale.gfx950.mir  |  22 +--
 3 files changed, 107 insertions(+), 60 deletions(-)

diff --git a/llvm/lib/Target/AMDGPU/GCNHazardRecognizer.cpp 
b/llvm/lib/Target/AMDGPU/GCNHazardRecognizer.cpp
index 99a176731599cc..be0936ce74835f 100644
--- a/llvm/lib/Target/AMDGPU/GCNHazardRecognizer.cpp
+++ b/llvm/lib/Target/AMDGPU/GCNHazardRecognizer.cpp
@@ -2232,12 +2232,14 @@ int 
GCNHazardRecognizer::checkMAIHazards908(MachineInstr *MI) {
 }
 
 static int
-GFX940_XDL_N_PassWritesVGPROverlappedSMFMASrcCWaitStates(int NumPasses) {
-  // 2 pass -> 3
-  // 4 pass -> 5
-  // 8 pass -> 9
-  // 16 pass -> 17
-  return NumPasses + 1;
+GFX940_XDL_N_PassWritesVGPROverlappedSMFMASrcCWaitStates(int NumPasses,
+ bool IsGFX950) {
+  // xdl def cycles | gfx940 | gfx950
+  // 2 pass         |  3        4
+  // 4 pass         |  5        6
+  // 8 pass         |  9        10
+  // 16 pass        |  17       18
+  return NumPasses + 1 + IsGFX950;
 }
 
 static int
@@ -2373,7 +2375,7 @@ int GCNHazardRecognizer::checkMAIHazards90A(MachineInstr 
*MI) {
 NeedWaitStates =
 isXDL(ST, *MI1)
 ? GFX940_XDL_N_PassWritesVGPROverlappedSMFMASrcCWaitStates(
-  NumPasses)
+  NumPasses, ST.hasGFX950Insts())
 : 
GFX940_SMFMA_N_PassWritesVGPROverlappedSMFMASrcCWaitStates(
   NumPasses);
 break;
diff --git a/llvm/test/CodeGen/AMDGPU/mai-hazards-gfx940.mir 
b/llvm/test/CodeGen/AMDGPU/mai-hazards-gfx940.mir
index d2b2f226404da8..b9135dbd46fc1f 100644
--- a/llvm/test/CodeGen/AMDGPU/mai-hazards-gfx940.mir
+++ b/llvm/test/CodeGen/AMDGPU/mai-hazards-gfx940.mir
@@ -145,7 +145,8 @@ body: |
 ...
 # GCN-LABEL: name: sgemm4x4_mfma_write_agpr_mfma_read_overlap
 # GCN:  V_MFMA
-# GCN-NEXT: S_NOP 2
+# GFX940-NEXT: S_NOP 2
+# GFX950-NEXT: S_NOP 3
 # GCN-NEXT: V_MFMA
 name:sgemm4x4_mfma_write_agpr_mfma_read_overlap
 body: |
@@ -155,7 +156,8 @@ body: |
 ...
 # GCN-LABEL: name: sgemm4x4_mfma_write_vgpr_mfma_read_overlap
 # GCN:  V_MFMA
-# GCN-NEXT: S_NOP 2
+# GFX940-NEXT: S_NOP 2
+# GFX950-NEXT: S_NOP 3
 # GCN-NEXT: V_MFMA
 name:sgemm4x4_mfma_write_vgpr_mfma_read_overlap
 body: |
@@ -165,7 +167,8 @@ body: |
 ...
 # GCN-LABEL: name: sgemm4x4_mfma_write_agpr_smfmac_read_overlap
 # GCN:  V_MFMA
-# GCN-NEXT: S_NOP 2
+# GFX940-NEXT: S_NOP 2
+# GFX950-NEXT: S_NOP 3
 # GCN-NEXT: V_SMFMAC
 name:sgemm4x4_mfma_write_agpr_smfmac_read_overlap
 body: |
@@ -175,8 +178,11 @@ body: |
 ...
 # GCN-LABEL: name: xdl_sgemm16x16_mfma_write_agpr_mfma_read_overlap
 # GCN:  V_MFMA
-# GCN-NEXT: S_NOP 7
-# GCN-NEXT: S_NOP 0
+# GFX940-NEXT: S_NOP 7
+# GFX940-NEXT: S_NOP 0
+
+# GFX950-NEXT: S_NOP 7
+# GFX950-NEXT: S_NOP 1
 # GCN-NEXT: V_MFMA
 name:xdl_sgemm16x16_mfma_write_agpr_mfma_read_overlap
 body: |
@@ -186,8 +192,11 @@ body: |
 ...
 # GCN-LABEL: name: xdl_sgemm16x16_mfma_write_vgpr_mfma_read_overlap
 # GCN:  V_MFMA
-# GCN-NEXT: S_NOP 7
-# GCN-NEXT: S_NOP 0
+# GFX940-NEXT: S_NOP 7
+# GFX940-NEXT: S_NOP 0
+
+# GFX950-NEXT: S_NOP 7
+# GFX950-NEXT: S_NOP 1
 # GCN-NEXT: V_MFMA
 name:xdl_sgemm16x16_mfma_write_vgpr_mfma_read_overlap
 body: |
@@ -216,8 +225,11 @@ body: |
 ...
 # GCN-LABEL: name: xdl_sgemm16x16_mfma_write_agpr_smfmac_read_overlap
 # GCN:  V_MFMA
-# GCN-NEXT: S_NOP 7
-# GCN-NEXT: S_NOP 0
+# GFX940-NEXT: S_NOP 7
+# GFX940-NEXT: S_NOP 0
+
+# GFX950-NEXT: S_NOP 7
+# GFX950-NEXT: S_NOP 1
 # GCN-NEXT: V_SMFMAC
 name:xdl_sgemm16x16_mfma_write_agpr_smfmac_read_overlap
 body: |
@@ -229,7 +241,8 @@ body: |
 # GCN:  V_MFMA
 # GCN-NEXT: S_NOP 7
 # GCN-NEXT: S_NOP 7
-# GCN-NEXT: S_NOP 0
+# GFX940-NEXT: S_NOP 0
+# GFX950-NEXT: S_NOP 1
 # GCN-NEXT: V_MFMA
 name:xdl_sgemm32x32_mfma_write_agpr_mfma_read_overlap
 body: |
@@ -241,7 +254,8 @@ body: |
 # GCN:  V_MFMA
 # GCN-NEXT: S_NOP 7
 # GCN-NEXT: S_NOP 7
-# GCN-NEXT: S_NOP 0
+# GFX940-NEXT: S_NOP 0
+# GFX950-NEXT: S_NOP 1
 # GCN-NEXT: V_MFMA
 name:xdl_sgemm32x32_mfma_write_vgpr_mfma_read_overlap
 body: |
@@ -273,7 +287,8 @@ body: |
 # GCN:  V_MFMA
 # GCN-NEXT: S_NOP 7
 # GCN-NEXT: S_NOP 7
-# GCN-NEXT: S_

[llvm-branch-commits] [llvm] AMDGPU: Handle gfx950 change in mfma_f64_16x16x4 + valu hazard (PR #117262)

2024-11-21 Thread Matt Arsenault via llvm-branch-commits

https://github.com/arsenm updated 
https://github.com/llvm/llvm-project/pull/117262

>From fc9424bd9d0d54a931f4059ff9a6f657f1c5a2dd Mon Sep 17 00:00:00 2001
From: Matt Arsenault 
Date: Thu, 7 Mar 2024 15:01:08 +0530
Subject: [PATCH] AMDGPU: Handle gfx950 change in mfma_f64_16x16x4 + valu
 hazard

Increase from 11 wait states to 19
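
A condensed sketch of the selection this patch introduces. The three constants are copied from the diff; the function `dmfma16x16WaitStates` and its boolean parameters are hypothetical and only stand in for the `case 8: case 16:` branch of `checkMAIVALUHazards`.

```cpp
// Constants as they appear in the diff.
constexpr int DMFMA16x16WriteVgprMemExpReadWaitStates = 18;
constexpr int DMFMA16x16WriteVgprVALUReadWaitStates = 11;
constexpr int GFX950_DMFMA16x16WriteVgprVALUReadWaitStates = 19;

// Hypothetical condensation of the 16x16 DGEMM branch.
constexpr int dmfma16x16WaitStates(bool IsMemOrExport, bool HasGFX950Insts) {
  return IsMemOrExport
             ? DMFMA16x16WriteVgprMemExpReadWaitStates // unchanged by this patch
             : (HasGFX950Insts ? GFX950_DMFMA16x16WriteVgprVALUReadWaitStates
                               : DMFMA16x16WriteVgprVALUReadWaitStates);
}

// Only the VALU-read case differs between the two subtargets.
static_assert(dmfma16x16WaitStates(false, false) == 11, "gfx940 VALU read");
static_assert(dmfma16x16WaitStates(false, true) == 19, "gfx950 VALU read");
static_assert(dmfma16x16WaitStates(true, true) == 18, "mem/export read");
```

If `S_NOP <imm>` is taken to provide `imm + 1` wait states, the updated checks agree with these values: `S_NOP 7` plus `S_NOP 2` is 11, and `S_NOP 7`, `S_NOP 7`, `S_NOP 2` is 19.
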
---
 .../lib/Target/AMDGPU/GCNHazardRecognizer.cpp | 10 +--
 .../CodeGen/AMDGPU/mai-hazards-gfx940.mir | 28 ++-
 2 files changed, 28 insertions(+), 10 deletions(-)

diff --git a/llvm/lib/Target/AMDGPU/GCNHazardRecognizer.cpp 
b/llvm/lib/Target/AMDGPU/GCNHazardRecognizer.cpp
index 44afccb0690d0d..99a176731599cc 100644
--- a/llvm/lib/Target/AMDGPU/GCNHazardRecognizer.cpp
+++ b/llvm/lib/Target/AMDGPU/GCNHazardRecognizer.cpp
@@ -2603,6 +2603,7 @@ int GCNHazardRecognizer::checkMAIVALUHazards(MachineInstr 
*MI) {
 const int DMFMA16x16WriteVgprMemExpReadWaitStates = 18;
 const int DMFMA4x4WriteVgprVALUReadWaitStates = 6;
 const int DMFMA16x16WriteVgprVALUReadWaitStates = 11;
+const int GFX950_DMFMA16x16WriteVgprVALUReadWaitStates = 19;
 const int DotWriteSameDotReadSrcAB = 3;
 const int DotWriteDifferentVALURead = 3;
 const int DMFMABetweenVALUWriteVMEMRead = 2;
@@ -2663,9 +2664,12 @@ int 
GCNHazardRecognizer::checkMAIVALUHazards(MachineInstr *MI) {
   break;
 case 8:
 case 16:
-  NeedWaitStates = IsMemOrExport
-   ? DMFMA16x16WriteVgprMemExpReadWaitStates
-   : DMFMA16x16WriteVgprVALUReadWaitStates;
+  NeedWaitStates =
+  IsMemOrExport
+  ? DMFMA16x16WriteVgprMemExpReadWaitStates
+  : (ST.hasGFX950Insts()
+ ? GFX950_DMFMA16x16WriteVgprVALUReadWaitStates
+ : DMFMA16x16WriteVgprVALUReadWaitStates);
   break;
 default:
   llvm_unreachable("unexpected dgemm");
diff --git a/llvm/test/CodeGen/AMDGPU/mai-hazards-gfx940.mir 
b/llvm/test/CodeGen/AMDGPU/mai-hazards-gfx940.mir
index 9681b01f334f9a..d2b2f226404da8 100644
--- a/llvm/test/CodeGen/AMDGPU/mai-hazards-gfx940.mir
+++ b/llvm/test/CodeGen/AMDGPU/mai-hazards-gfx940.mir
@@ -1,4 +1,5 @@
-# RUN: llc -mtriple=amdgcn -mcpu=gfx940 -verify-machineinstrs -run-pass 
post-RA-hazard-rec %s -o - | FileCheck -check-prefix=GCN %s
+# RUN: llc -mtriple=amdgcn -mcpu=gfx940 -verify-machineinstrs -run-pass 
post-RA-hazard-rec %s -o - | FileCheck -check-prefixes=GCN,GFX940 %s
+# RUN: llc -mtriple=amdgcn -mcpu=gfx950 -verify-machineinstrs -run-pass 
post-RA-hazard-rec %s -o - | FileCheck -check-prefixes=GCN,GFX950 %s
 
 # GCN-LABEL: name: valu_write_vgpr_sgemm_mfma_read
 # GCN:  V_MOV_B32
@@ -803,8 +804,12 @@ body: |
 ...
 # GCN-LABEL: name: dmfma16x16_write_vgpr_valu_read
 # GCN:  V_MFMA
-# GCN-NEXT: S_NOP 7
-# GCN-NEXT: S_NOP 2
+# GFX940-NEXT: S_NOP 7
+# GFX940-NEXT: S_NOP 2
+
+# GFX950-NEXT: S_NOP 7
+# GFX950-NEXT: S_NOP 7
+# GFX950-NEXT: S_NOP 2
 # GCN-NEXT: V_MOV_B32
 name:dmfma16x16_write_vgpr_valu_read
 body: |
@@ -867,8 +872,13 @@ body: |
 ...
 # GCN-LABEL: name: dmfma16x16_write_vgpr_dot_read
 # GCN:  V_MFMA
-# GCN-NEXT: S_NOP 7
-# GCN-NEXT: S_NOP 2
+# GFX940-NEXT: S_NOP 7
+# GFX940-NEXT: S_NOP 2
+
+# GFX950-NEXT: S_NOP 7
+# GFX950-NEXT: S_NOP 7
+# GFX950-NEXT: S_NOP 2
+
 # GCN-NEXT: V_DOT
 name:dmfma16x16_write_vgpr_dot_read
 body: |
@@ -1505,8 +1515,12 @@ body: |
 ...
 # GCN-LABEL: name: dmfma16x16_write_agpr_valu_read
 # GCN:  V_MFMA
-# GCN-NEXT: S_NOP 7
-# GCN-NEXT: S_NOP 2
+# GFX940-NEXT: S_NOP 7
+# GFX940-NEXT: S_NOP 2
+
+# GFX950-NEXT: S_NOP 7
+# GFX950-NEXT: S_NOP 7
+# GFX950-NEXT: S_NOP 2
 # GCN-NEXT: V_ACCVGPR_READ_B32_e64
 name:dmfma16x16_write_agpr_valu_read
 body: |



[llvm-branch-commits] [clang] [llvm] AMDGPU: Add v_smfmac_f32_32x32x32x64_fp8_bf8 for gfx950 (PR #117258)

2024-11-21 Thread Matt Arsenault via llvm-branch-commits

https://github.com/arsenm updated 
https://github.com/llvm/llvm-project/pull/117258

>From 24576df683abfa29c9d7f4406a318b6b67701732 Mon Sep 17 00:00:00 2001
From: Matt Arsenault 
Date: Sat, 3 Feb 2024 21:25:33 +0530
Subject: [PATCH] AMDGPU: Add v_smfmac_f32_32x32x32x64_fp8_bf8 for gfx950

---
 clang/include/clang/Basic/BuiltinsAMDGPU.def  |   1 +
 .../CodeGenOpenCL/builtins-amdgcn-mfma.cl |   7 +
 .../builtins-amdgcn-error-gfx950-param.cl |   6 +
 .../builtins-amdgcn-error-gfx950.cl   |   1 +
 llvm/include/llvm/IR/IntrinsicsAMDGPU.td  |   1 +
 .../AMDGPU/AMDGPUInstructionSelector.cpp  |   4 +
 .../Target/AMDGPU/AMDGPURegisterBankInfo.cpp  |   3 +-
 llvm/lib/Target/AMDGPU/VOP3PInstructions.td   |   2 +
 .../UniformityAnalysis/AMDGPU/intrinsics.ll   |   9 +
 .../AMDGPU/llvm.amdgcn.smfmac.gfx950.ll   | 414 ++
 llvm/test/MC/AMDGPU/mai-gfx950.s  |  36 ++
 .../MC/Disassembler/AMDGPU/gfx950_mai.txt |  22 +
 12 files changed, 505 insertions(+), 1 deletion(-)

diff --git a/clang/include/clang/Basic/BuiltinsAMDGPU.def 
b/clang/include/clang/Basic/BuiltinsAMDGPU.def
index d6123fa41ca8b8..f90af7000e3196 100644
--- a/clang/include/clang/Basic/BuiltinsAMDGPU.def
+++ b/clang/include/clang/Basic/BuiltinsAMDGPU.def
@@ -456,6 +456,7 @@ 
TARGET_BUILTIN(__builtin_amdgcn_smfmac_f32_16x16x128_fp8_bf8, "V4fV4iV8iV4fiIiIi
 TARGET_BUILTIN(__builtin_amdgcn_smfmac_f32_16x16x128_fp8_fp8, 
"V4fV4iV8iV4fiIiIi", "nc", "gfx950-insts")
 TARGET_BUILTIN(__builtin_amdgcn_smfmac_f32_32x32x64_bf8_bf8, 
"V16fV4iV8iV16fiIiIi", "nc", "gfx950-insts")
 TARGET_BUILTIN(__builtin_amdgcn_smfmac_f32_32x32x64_bf8_fp8, 
"V16fV4iV8iV16fiIiIi", "nc", "gfx950-insts")
+TARGET_BUILTIN(__builtin_amdgcn_smfmac_f32_32x32x64_fp8_bf8, 
"V16fV4iV8iV16fiIiIi", "nc", "gfx950-insts")
 
 
//===--===//
 // GFX12+ only builtins.
diff --git a/clang/test/CodeGenOpenCL/builtins-amdgcn-mfma.cl 
b/clang/test/CodeGenOpenCL/builtins-amdgcn-mfma.cl
index d79ca36f003c5e..33b60d53f11cc8 100644
--- a/clang/test/CodeGenOpenCL/builtins-amdgcn-mfma.cl
+++ b/clang/test/CodeGenOpenCL/builtins-amdgcn-mfma.cl
@@ -552,4 +552,11 @@ void test_smfmac_f32_32x32x64_bf8_fp8(global v16f* out, 
v4i a, v8i b, v16f c, in
   *out = __builtin_amdgcn_smfmac_f32_32x32x64_bf8_fp8(a, b, c, idx, 0, 0);
 }
 
+// CHECK-GFX950-LABEL: @test_smfmac_f32_32x32x64_fp8_bf8
+// CHECK-GFX950: call <16 x float> @llvm.amdgcn.smfmac.f32.32x32x64.fp8.bf8(<4 
x i32> %a, <8 x i32> %b, <16 x float> %c, i32 %idx, i32 0, i32 0)
+void test_smfmac_f32_32x32x64_fp8_bf8(global v16f* out, v4i a, v8i b, v16f c, 
int idx)
+{
+  *out = __builtin_amdgcn_smfmac_f32_32x32x64_fp8_bf8(a, b, c, idx, 0, 0);
+}
+
 #endif
diff --git a/clang/test/SemaOpenCL/builtins-amdgcn-error-gfx950-param.cl 
b/clang/test/SemaOpenCL/builtins-amdgcn-error-gfx950-param.cl
index d1751a6af15463..c53ca8a7c3513f 100644
--- a/clang/test/SemaOpenCL/builtins-amdgcn-error-gfx950-param.cl
+++ b/clang/test/SemaOpenCL/builtins-amdgcn-error-gfx950-param.cl
@@ -136,3 +136,9 @@ void test_smfmac_f32_32x32x64_bf8_fp8(global float16* out, 
int4 a, int8 b, float
   *out = __builtin_amdgcn_smfmac_f32_32x32x64_bf8_fp8(a, b, c, idx, d, 0); // 
expected-error{{argument to '__builtin_amdgcn_smfmac_f32_32x32x64_bf8_fp8' must 
be a constant integer}}
   *out = __builtin_amdgcn_smfmac_f32_32x32x64_bf8_fp8(a, b, c, idx, 0, d); // 
expected-error{{argument to '__builtin_amdgcn_smfmac_f32_32x32x64_bf8_fp8' must 
be a constant integer}}
 }
+
+void test_smfmac_f32_32x32x64_fp8_bf8(global float16* out, int4 a, int8 b, 
float16 c, int idx, int d)
+{
+  *out = __builtin_amdgcn_smfmac_f32_32x32x64_fp8_bf8(a, b, c, idx, d, 0); // 
expected-error{{argument to '__builtin_amdgcn_smfmac_f32_32x32x64_fp8_bf8' must 
be a constant integer}}
+  *out = __builtin_amdgcn_smfmac_f32_32x32x64_fp8_bf8(a, b, c, idx, 0, d); // 
expected-error{{argument to '__builtin_amdgcn_smfmac_f32_32x32x64_fp8_bf8' must 
be a constant integer}}
+}
diff --git a/clang/test/SemaOpenCL/builtins-amdgcn-error-gfx950.cl 
b/clang/test/SemaOpenCL/builtins-amdgcn-error-gfx950.cl
index f8ac3399d2b64b..9e563a7b0bd64c 100644
--- a/clang/test/SemaOpenCL/builtins-amdgcn-error-gfx950.cl
+++ b/clang/test/SemaOpenCL/builtins-amdgcn-error-gfx950.cl
@@ -46,6 +46,7 @@ void test(__global float4* out0, half8 a0, half8 b0, float4 
c0,
   *out12 = __builtin_amdgcn_smfmac_f32_16x16x128_fp8_fp8(a12, b12, c12, 0, 0, 
0); // expected-error{{'__builtin_amdgcn_smfmac_f32_16x16x128_fp8_fp8' needs 
target feature gfx950-insts}}
   *out13 = __builtin_amdgcn_smfmac_f32_32x32x64_bf8_bf8(a13, b13, c13, 0, 0, 
0); // expected-error{{'__builtin_amdgcn_smfmac_f32_32x32x64_bf8_bf8' needs 
target feature gfx950-insts}}
   *out13 = __builtin_amdgcn_smfmac_f32_32x32x64_bf8_fp8(a13, b13, c13, 0, 0, 
0); // expected-error{{'__builtin_amdgcn_smfmac_f32_32x32x64_bf8_fp8' needs 
target feature gfx950-insts}}
+  *out13 = __builtin_amdgcn_smfmac_f3

[llvm-branch-commits] [clang] [llvm] AMDGPU: Add v_permlane16_swap_b32 and v_permlane32_swap_b32 for gfx950 (PR #117260)

2024-11-21 Thread Matt Arsenault via llvm-branch-commits

https://github.com/arsenm updated 
https://github.com/llvm/llvm-project/pull/117260

>From 549b571ea25a06301f719778786a288d85604464 Mon Sep 17 00:00:00 2001
From: Matt Arsenault 
Date: Mon, 22 Jan 2024 12:40:54 +0700
Subject: [PATCH] AMDGPU: Add v_permlane16_swap_b32 and v_permlane32_swap_b32
 for gfx950

This was a bit annoying because these instructions introduce a new
special-case encoding usage. op_sel is repurposed as a subset of the dpp
controls, and is eligible for VOP3->VOP1 shrinking. For some reason fi
also uses an enum value, so we need to convert the raw boolean to 1
instead of -1.

The two registers are swapped, so this has two defs. Ideally the builtin
would return a pair, but that is difficult, so it returns a vector
instead. This would make a hypothetical builtin that supports v2f16
directly uglier.
---
 clang/include/clang/Basic/BuiltinsAMDGPU.def  |   3 +
 clang/lib/CodeGen/CGBuiltin.cpp   |  26 
 clang/test/CodeGenOpenCL/amdgpu-features.cl   |   2 +-
 .../builtins-amdgcn-gfx950-err.cl |   6 +-
 .../CodeGenOpenCL/builtins-amdgcn-gfx950.cl   |  87 +
 .../builtins-amdgcn-error-gfx950-param.cl |  10 ++
 .../builtins-amdgcn-error-gfx950.cl   |   5 +-
 llvm/docs/AMDGPUUsage.rst |  13 ++
 llvm/include/llvm/IR/IntrinsicsAMDGPU.td  |  14 ++
 llvm/lib/Target/AMDGPU/AMDGPU.td  |  23 +++-
 llvm/lib/Target/AMDGPU/AMDGPUGISel.td |   3 +
 llvm/lib/Target/AMDGPU/AMDGPUISelDAGToDAG.cpp |  25 
 .../AMDGPU/AMDGPUInstructionSelector.cpp  |  32 +
 .../Target/AMDGPU/AMDGPUInstructionSelector.h |   3 +
 .../Target/AMDGPU/AMDGPURegisterBankInfo.cpp  |   9 ++
 .../Target/AMDGPU/AMDGPUSearchableTables.td   |   2 +
 llvm/lib/Target/AMDGPU/GCNSubtarget.h |   8 +-
 llvm/lib/Target/AMDGPU/SIInstrInfo.cpp|   6 +-
 llvm/lib/Target/AMDGPU/SIInstrInfo.td |   4 +
 llvm/lib/Target/AMDGPU/VOP1Instructions.td|  46 +++
 llvm/lib/Target/AMDGPU/VOPInstructions.td |  12 ++
 llvm/lib/TargetParser/TargetParser.cpp|   2 +
 .../UniformityAnalysis/AMDGPU/intrinsics.ll   |  16 +++
 .../AMDGPU/llvm.amdgcn.permlane16.swap.ll | 121 ++
 .../AMDGPU/llvm.amdgcn.permlane32.swap.ll | 121 ++
 llvm/test/MC/AMDGPU/gfx950_asm_features.s |  82 
 llvm/test/MC/AMDGPU/gfx950_err.s  |  31 +
 llvm/test/MC/Disassembler/AMDGPU/gfx950.txt   |  32 +
 28 files changed, 737 insertions(+), 7 deletions(-)
 create mode 100644 llvm/test/CodeGen/AMDGPU/llvm.amdgcn.permlane16.swap.ll
 create mode 100644 llvm/test/CodeGen/AMDGPU/llvm.amdgcn.permlane32.swap.ll
 create mode 100644 llvm/test/MC/AMDGPU/gfx950_err.s

diff --git a/clang/include/clang/Basic/BuiltinsAMDGPU.def 
b/clang/include/clang/Basic/BuiltinsAMDGPU.def
index 51a5b1dbad495c..548bcc8ad55f48 100644
--- a/clang/include/clang/Basic/BuiltinsAMDGPU.def
+++ b/clang/include/clang/Basic/BuiltinsAMDGPU.def
@@ -459,6 +459,9 @@ 
TARGET_BUILTIN(__builtin_amdgcn_smfmac_f32_32x32x64_bf8_fp8, "V16fV4iV8iV16fiIiI
 TARGET_BUILTIN(__builtin_amdgcn_smfmac_f32_32x32x64_fp8_bf8, 
"V16fV4iV8iV16fiIiIi", "nc", "gfx950-insts")
 TARGET_BUILTIN(__builtin_amdgcn_smfmac_f32_32x32x64_fp8_fp8, 
"V16fV4iV8iV16fiIiIi", "nc", "gfx950-insts")
 
+TARGET_BUILTIN(__builtin_amdgcn_permlane16_swap, "V2UiUiUiIbIb", "nc", 
"permlane16-swap")
+TARGET_BUILTIN(__builtin_amdgcn_permlane32_swap, "V2UiUiUiIbIb", "nc", 
"permlane32-swap")
+
 
//===--===//
 // GFX12+ only builtins.
 
//===--===//
diff --git a/clang/lib/CodeGen/CGBuiltin.cpp b/clang/lib/CodeGen/CGBuiltin.cpp
index ff7132fd8bc1e7..3b3c46b56868cf 100644
--- a/clang/lib/CodeGen/CGBuiltin.cpp
+++ b/clang/lib/CodeGen/CGBuiltin.cpp
@@ -20162,6 +20162,32 @@ Value *CodeGenFunction::EmitAMDGPUBuiltinExpr(unsigned 
BuiltinID,
 CGM.getIntrinsic(Intrinsic::amdgcn_s_sendmsg_rtn, {ResultType});
 return Builder.CreateCall(F, {Arg});
   }
+  case AMDGPU::BI__builtin_amdgcn_permlane16_swap:
+  case AMDGPU::BI__builtin_amdgcn_permlane32_swap: {
+// Because builtin types are limited, and the intrinsic uses a struct/pair
+// output, marshal the pair-of-i32 to <2 x i32>.
+Value *VDstOld = EmitScalarExpr(E->getArg(0));
+Value *VSrcOld = EmitScalarExpr(E->getArg(1));
+Value *FI = EmitScalarExpr(E->getArg(2));
+Value *BoundCtrl = EmitScalarExpr(E->getArg(3));
+Function *F =
+CGM.getIntrinsic(BuiltinID == 
AMDGPU::BI__builtin_amdgcn_permlane16_swap
+ ? Intrinsic::amdgcn_permlane16_swap
+ : Intrinsic::amdgcn_permlane32_swap);
+llvm::CallInst *Call =
+Builder.CreateCall(F, {VDstOld, VSrcOld, FI, BoundCtrl});
+
+llvm::Value *Elt0 = Builder.CreateExtractValue(Call, 0);
+llvm::Value *Elt1 = Builder.CreateExtractValue(Call, 1);
+
+llvm::Type *ResultType = Con

[llvm-branch-commits] [llvm] [LoongArch] Fix GOT usage for `non-dso_local` function calls in large code model (PR #117134)

2024-11-21 Thread Lu Weining via llvm-branch-commits

https://github.com/SixWeining approved this pull request.


https://github.com/llvm/llvm-project/pull/117134


[llvm-branch-commits] [clang] [llvm] [RISCV] Support __builtin_cpu_is (PR #116231)

2024-11-21 Thread Pengcheng Wang via llvm-branch-commits

https://github.com/wangpc-pp updated 
https://github.com/llvm/llvm-project/pull/116231

>From 9686a2c5c5276289e72d9098f497a9f246a1c457 Mon Sep 17 00:00:00 2001
From: Wang Pengcheng 
Date: Thu, 14 Nov 2024 22:06:45 +0800
Subject: [PATCH 1/4] Remove stale CHECKs

Created using spr 1.3.6-beta.1
---
 clang/test/CodeGen/builtin-cpu-is.c | 20 
 1 file changed, 20 deletions(-)

diff --git a/clang/test/CodeGen/builtin-cpu-is.c 
b/clang/test/CodeGen/builtin-cpu-is.c
index e4a2071cf46795..b8dd97eeacebcf 100644
--- a/clang/test/CodeGen/builtin-cpu-is.c
+++ b/clang/test/CodeGen/builtin-cpu-is.c
@@ -7,8 +7,6 @@
 // global, the bit grab, and the icmp correct.
 extern void a(const char *);
 
-// CHECK: @__cpu_model = external dso_local global { i32, i32, i32, [1 x i32] }
-
 // CHECK-X86-LABEL: define dso_local void @intel(
 // CHECK-X86-SAME: ) #[[ATTR0:[0-9]+]] {
 // CHECK-X86-NEXT:  [[ENTRY:.*:]]
@@ -24,9 +22,6 @@ extern void a(const char *);
 void intel(void) {
   if (__builtin_cpu_is("intel"))
 a("intel");
-
-  // CHECK: [[LOAD:%[^ ]+]] = load i32, ptr @__cpu_model
-  // CHECK: = icmp eq i32 [[LOAD]], 1
 }
 
 // CHECK-X86-LABEL: define dso_local void @amd(
@@ -44,9 +39,6 @@ void intel(void) {
 void amd(void) {
   if (__builtin_cpu_is("amd"))
 a("amd");
-
-  // CHECK: [[LOAD:%[^ ]+]] = load i32, ptr @__cpu_model
-  // CHECK: = icmp eq i32 [[LOAD]], 2
 }
 
 // CHECK-X86-LABEL: define dso_local void @atom(
@@ -64,9 +56,6 @@ void amd(void) {
 void atom(void) {
   if (__builtin_cpu_is("atom"))
 a("atom");
-
-  // CHECK: [[LOAD:%[^ ]+]] = load i32, ptr getelementptr inbounds ({ i32, 
i32, i32, [1 x i32] }, ptr @__cpu_model, i32 0, i32 1)
-  // CHECK: = icmp eq i32 [[LOAD]], 1
 }
 
 // CHECK-X86-LABEL: define dso_local void @amdfam10h(
@@ -84,9 +73,6 @@ void atom(void) {
 void amdfam10h(void) {
   if (__builtin_cpu_is("amdfam10h"))
 a("amdfam10h");
-
-  // CHECK: [[LOAD:%[^ ]+]] = load i32, ptr getelementptr inbounds ({ i32, 
i32, i32, [1 x i32] }, ptr @__cpu_model, i32 0, i32 1)
-  // CHECK: = icmp eq i32 [[LOAD]], 4
 }
 
 // CHECK-X86-LABEL: define dso_local void @barcelona(
@@ -104,9 +90,6 @@ void amdfam10h(void) {
 void barcelona(void) {
   if (__builtin_cpu_is("barcelona"))
 a("barcelona");
-
-  // CHECK: [[LOAD:%[^ ]+]] = load i32, ptr getelementptr inbounds ({ i32, 
i32, i32, [1 x i32] }, ptr @__cpu_model, i32 0, i32 2)
-  // CHECK: = icmp eq i32 [[LOAD]], 4
 }
 
 // CHECK-X86-LABEL: define dso_local void @nehalem(
@@ -124,9 +107,6 @@ void barcelona(void) {
 void nehalem(void) {
   if (__builtin_cpu_is("nehalem"))
 a("nehalem");
-
-  // CHECK: [[LOAD:%[^ ]+]] = load i32, ptr getelementptr inbounds ({ i32, 
i32, i32, [1 x i32] }, ptr @__cpu_model, i32 0, i32 2)
-  // CHECK: = icmp eq i32 [[LOAD]], 1
 }
 #endif
 

>From 2bb2d5079b5bf98ba9f87e082ca3e67ab70068aa Mon Sep 17 00:00:00 2001
From: Wang Pengcheng 
Date: Thu, 14 Nov 2024 22:12:36 +0800
Subject: [PATCH 2/4] Simplify test

Created using spr 1.3.6-beta.1
---
 clang/test/CodeGen/builtin-cpu-is.c | 25 ++---
 1 file changed, 6 insertions(+), 19 deletions(-)

diff --git a/clang/test/CodeGen/builtin-cpu-is.c 
b/clang/test/CodeGen/builtin-cpu-is.c
index b8dd97eeacebcf..8e78213a7cfcfb 100644
--- a/clang/test/CodeGen/builtin-cpu-is.c
+++ b/clang/test/CodeGen/builtin-cpu-is.c
@@ -111,12 +111,9 @@ void nehalem(void) {
 #endif
 
 #ifdef __riscv
-// CHECK-RV64-LABEL: define dso_local signext i32 @test_riscv(
-// CHECK-RV64-SAME: i32 noundef signext [[A:%.*]]) #[[ATTR0:[0-9]+]] {
+// CHECK-RV64-LABEL: define dso_local signext i32 @test_cpu_is_veyron_v1(
+// CHECK-RV64-SAME: ) #[[ATTR0:[0-9]+]] {
 // CHECK-RV64-NEXT:  [[ENTRY:.*:]]
-// CHECK-RV64-NEXT:[[RETVAL:%.*]] = alloca i32, align 4
-// CHECK-RV64-NEXT:[[A_ADDR:%.*]] = alloca i32, align 4
-// CHECK-RV64-NEXT:store i32 [[A]], ptr [[A_ADDR]], align 4
 // CHECK-RV64-NEXT:[[TMP0:%.*]] = load i32, ptr @__riscv_cpu_model, align 4
 // CHECK-RV64-NEXT:[[TMP1:%.*]] = icmp eq i32 [[TMP0]], 1567
 // CHECK-RV64-NEXT:[[TMP2:%.*]] = load i64, ptr getelementptr inbounds ({ 
i32, i64, i64 }, ptr @__riscv_cpu_model, i32 0, i32 1), align 8
@@ -125,20 +122,10 @@ void nehalem(void) {
 // CHECK-RV64-NEXT:[[TMP5:%.*]] = load i64, ptr getelementptr inbounds ({ 
i32, i64, i64 }, ptr @__riscv_cpu_model, i32 0, i32 2), align 8
 // CHECK-RV64-NEXT:[[TMP6:%.*]] = icmp eq i64 [[TMP5]], 273
 // CHECK-RV64-NEXT:[[TMP7:%.*]] = and i1 [[TMP4]], [[TMP6]]
-// CHECK-RV64-NEXT:br i1 [[TMP7]], label %[[IF_THEN:.*]], label 
%[[IF_END:.*]]
-// CHECK-RV64:   [[IF_THEN]]:
-// CHECK-RV64-NEXT:store i32 3, ptr [[RETVAL]], align 4
-// CHECK-RV64-NEXT:br label %[[RETURN:.*]]
-// CHECK-RV64:   [[IF_END]]:
-// CHECK-RV64-NEXT:store i32 0, ptr [[RETVAL]], align 4
-// CHECK-RV64-NEXT:br label %[[RETURN]]
-// CHECK-RV64:   [[RETURN]]:
-// CHECK-RV64-NEXT:[[TMP8:%.*]] = load i32, ptr [[RETVAL]], align 4
-// CHECK-RV64-NEXT:ret i32 [[TM

[llvm-branch-commits] [clang] [llvm] [RISCV] Support __builtin_cpu_is (PR #116231)

2024-11-21 Thread Pengcheng Wang via llvm-branch-commits


@@ -58,6 +58,19 @@ bool hasFastVectorUnalignedAccess(StringRef CPU) {
   return Info && Info->FastVectorUnalignedAccess;
 }
 
+bool hasValidCPUModel(StringRef CPU) {
+  const CPUModel CPUModel = getCPUModel(CPU);
+  return CPUModel.MVendorID != 0 && CPUModel.MArchID != 0 &&

wangpc-pp wrote:

Done!
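
For readers following the thread, here is a minimal sketch of the kind of check the quoted hunk adds. It models only the two conditions visible above; the real function continues past the trailing `&&`, and that remainder is cut off in the quote, so it is not reproduced here. `CPUModelSketch` and `hasValidCPUModelSketch` are hypothetical names for illustration, not the LLVM declarations.

```cpp
#include <cstdint>

// Hypothetical stand-in for the CPUModel record. Only the two fields that
// appear in the quoted hunk are modeled; the field widths are an assumption.
struct CPUModelSketch {
  uint32_t MVendorID;
  uint64_t MArchID;
};

// Mirrors the visible part of the check: a model counts as valid only when
// each identifying field is nonzero. The real function tests at least one
// further condition that is truncated in the quote above.
bool hasValidCPUModelSketch(const CPUModelSketch &Model) {
  return Model.MVendorID != 0 && Model.MArchID != 0;
}
```

Under this sketch a zero-initialized record is rejected, and a record is accepted only once both fields are populated. The nonzero test works as a "known CPU" marker on the assumption that zero means "not provided".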

https://github.com/llvm/llvm-project/pull/116231


[llvm-branch-commits] [clang] [llvm] AMDGPU: Add v_permlane16_swap_b32 and v_permlane32_swap_b32 for gfx950 (PR #117260)

2024-11-21 Thread Matt Arsenault via llvm-branch-commits

https://github.com/arsenm created 
https://github.com/llvm/llvm-project/pull/117260

This was a bit annoying because these introduce a new special case
encoding usage. op_sel is repurposed as a subset of dpp controls,
and is eligible for VOP3->VOP1 shrinking. For some reason fi also
uses an enum value, so we need to convert the raw boolean to 1 instead
of -1.

The 2 registers are swapped, so this has 2 defs. Ideally the builtin
would return a pair, but that's difficult so return a vector instead.
This would make a hypothetical builtin that supports v2f16 directly
uglier.
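
The "return a vector instead of a pair" choice described above can be pictured with a small host-side sketch. This is plain C++ that models only the marshaling step, not the lane swap itself. `SwapPair`, `SwapVec`, `marshalPairToVector`, and `unpackVector` are hypothetical names, and the element order (first result in element 0) is an assumption.

```cpp
#include <array>
#include <cstdint>
#include <utility>

// Hypothetical stand-in for the two results ("2 defs") the swap produces.
using SwapPair = std::pair<uint32_t, uint32_t>;

// Hypothetical stand-in for the builtin's two-element unsigned vector result.
using SwapVec = std::array<uint32_t, 2>;

// Pack the pair into a vector. Assumed order: element 0 carries the first
// result and element 1 carries the second.
SwapVec marshalPairToVector(const SwapPair &P) {
  return {P.first, P.second};
}

// A caller recovers the pair by reading the two elements back out.
SwapPair unpackVector(const SwapVec &V) {
  return {V[0], V[1]};
}
```

The cost of this design is that the caller has to index the result to get the two registers back. That is also why a variant taking v2f16 directly would be awkward: the return type would need to carry two such vectors.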

>From 66e98ff5b008512e73f63e037f3f76defa6c0a19 Mon Sep 17 00:00:00 2001
From: Matt Arsenault 
Date: Mon, 22 Jan 2024 12:40:54 +0700
Subject: [PATCH] AMDGPU: Add v_permlane16_swap_b32 and v_permlane32_swap_b32
 for gfx950

This was a bit annoying because these introduce a new special case
encoding usage. op_sel is repurposed as a subset of dpp controls,
and is eligible for VOP3->VOP1 shrinking. For some reason fi also
uses an enum value, so we need to convert the raw boolean to 1 instead
of -1.

The 2 registers are swapped, so this has 2 defs. Ideally the builtin
would return a pair, but that's difficult so return a vector instead.
This would make a hypothetical builtin that supports v2f16 directly
uglier.
---
 clang/include/clang/Basic/BuiltinsAMDGPU.def  |   3 +
 clang/lib/CodeGen/CGBuiltin.cpp   |  26 
 clang/test/CodeGenOpenCL/amdgpu-features.cl   |   2 +-
 .../builtins-amdgcn-gfx950-err.cl |   6 +-
 .../CodeGenOpenCL/builtins-amdgcn-gfx950.cl   |  87 +
 .../builtins-amdgcn-error-gfx950-param.cl |  10 ++
 .../builtins-amdgcn-error-gfx950.cl   |   5 +-
 llvm/docs/AMDGPUUsage.rst |  13 ++
 llvm/include/llvm/IR/IntrinsicsAMDGPU.td  |  14 ++
 llvm/lib/Target/AMDGPU/AMDGPU.td  |  23 +++-
 llvm/lib/Target/AMDGPU/AMDGPUGISel.td |   3 +
 llvm/lib/Target/AMDGPU/AMDGPUISelDAGToDAG.cpp |  25 
 .../AMDGPU/AMDGPUInstructionSelector.cpp  |  32 +
 .../Target/AMDGPU/AMDGPUInstructionSelector.h |   3 +
 .../Target/AMDGPU/AMDGPURegisterBankInfo.cpp  |   9 ++
 .../Target/AMDGPU/AMDGPUSearchableTables.td   |   2 +
 llvm/lib/Target/AMDGPU/GCNSubtarget.h |   8 +-
 llvm/lib/Target/AMDGPU/SIInstrInfo.cpp|   6 +-
 llvm/lib/Target/AMDGPU/SIInstrInfo.td |   4 +
 llvm/lib/Target/AMDGPU/VOP1Instructions.td|  46 +++
 llvm/lib/Target/AMDGPU/VOPInstructions.td |  12 ++
 llvm/lib/TargetParser/TargetParser.cpp|   2 +
 .../UniformityAnalysis/AMDGPU/intrinsics.ll   |  16 +++
 .../AMDGPU/llvm.amdgcn.permlane16.swap.ll | 121 ++
 .../AMDGPU/llvm.amdgcn.permlane32.swap.ll | 121 ++
 llvm/test/MC/AMDGPU/gfx950_asm_features.s |  82 
 llvm/test/MC/AMDGPU/gfx950_err.s  |  31 +
 llvm/test/MC/Disassembler/AMDGPU/gfx950.txt   |  32 +
 28 files changed, 737 insertions(+), 7 deletions(-)
 create mode 100644 llvm/test/CodeGen/AMDGPU/llvm.amdgcn.permlane16.swap.ll
 create mode 100644 llvm/test/CodeGen/AMDGPU/llvm.amdgcn.permlane32.swap.ll
 create mode 100644 llvm/test/MC/AMDGPU/gfx950_err.s

diff --git a/clang/include/clang/Basic/BuiltinsAMDGPU.def 
b/clang/include/clang/Basic/BuiltinsAMDGPU.def
index 51a5b1dbad495c..548bcc8ad55f48 100644
--- a/clang/include/clang/Basic/BuiltinsAMDGPU.def
+++ b/clang/include/clang/Basic/BuiltinsAMDGPU.def
@@ -459,6 +459,9 @@ 
TARGET_BUILTIN(__builtin_amdgcn_smfmac_f32_32x32x64_bf8_fp8, "V16fV4iV8iV16fiIiI
 TARGET_BUILTIN(__builtin_amdgcn_smfmac_f32_32x32x64_fp8_bf8, 
"V16fV4iV8iV16fiIiIi", "nc", "gfx950-insts")
 TARGET_BUILTIN(__builtin_amdgcn_smfmac_f32_32x32x64_fp8_fp8, 
"V16fV4iV8iV16fiIiIi", "nc", "gfx950-insts")
 
+TARGET_BUILTIN(__builtin_amdgcn_permlane16_swap, "V2UiUiUiIbIb", "nc", 
"permlane16-swap")
+TARGET_BUILTIN(__builtin_amdgcn_permlane32_swap, "V2UiUiUiIbIb", "nc", 
"permlane32-swap")
+
 
//===--===//
 // GFX12+ only builtins.
 
//===--===//
diff --git a/clang/lib/CodeGen/CGBuiltin.cpp b/clang/lib/CodeGen/CGBuiltin.cpp
index ff7132fd8bc1e7..3b3c46b56868cf 100644
--- a/clang/lib/CodeGen/CGBuiltin.cpp
+++ b/clang/lib/CodeGen/CGBuiltin.cpp
@@ -20162,6 +20162,32 @@ Value *CodeGenFunction::EmitAMDGPUBuiltinExpr(unsigned 
BuiltinID,
 CGM.getIntrinsic(Intrinsic::amdgcn_s_sendmsg_rtn, {ResultType});
 return Builder.CreateCall(F, {Arg});
   }
+  case AMDGPU::BI__builtin_amdgcn_permlane16_swap:
+  case AMDGPU::BI__builtin_amdgcn_permlane32_swap: {
+// Because builtin types are limited, and the intrinsic uses a struct/pair
+// output, marshal the pair-of-i32 to <2 x i32>.
+Value *VDstOld = EmitScalarExpr(E->getArg(0));
+Value *VSrcOld = EmitScalarExpr(E->getArg(1));
+Value *FI = EmitScalarExpr(E->getArg(2));
+Value *BoundCtrl = EmitScalarExpr(E->getArg(3))

[llvm-branch-commits] [clang] [llvm] AMDGPU: Add v_smfmac_f32_16x16x64_f16 for gfx950 (PR #117202)

2024-11-21 Thread via llvm-branch-commits

llvmbot wrote:

@llvm/pr-subscribers-mc

Author: Matt Arsenault (arsenm)

Changes

---

Patch is 27.26 KiB, truncated to 20.00 KiB below, full version: 
https://github.com/llvm/llvm-project/pull/117202.diff


13 Files Affected:

- (modified) clang/include/clang/Basic/BuiltinsAMDGPU.def (+1) 
- (modified) clang/test/CodeGenOpenCL/builtins-amdgcn-mfma.cl (+7) 
- (modified) clang/test/SemaOpenCL/builtins-amdgcn-error-gfx950-param.cl (+7) 
- (modified) clang/test/SemaOpenCL/builtins-amdgcn-error-gfx950.cl (+1) 
- (modified) llvm/include/llvm/IR/IntrinsicsAMDGPU.td (+1) 
- (modified) llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp (+6) 
- (modified) llvm/lib/Target/AMDGPU/AMDGPURegisterBankInfo.cpp (+2-1) 
- (modified) llvm/lib/Target/AMDGPU/SIInstrInfo.td (+1) 
- (modified) llvm/lib/Target/AMDGPU/VOP3PInstructions.td (+7) 
- (modified) llvm/test/Analysis/UniformityAnalysis/AMDGPU/intrinsics.ll (+9) 
- (added) llvm/test/CodeGen/AMDGPU/llvm.amdgcn.smfmac.gfx950.ll (+218) 
- (modified) llvm/test/MC/AMDGPU/mai-gfx950.s (+42) 
- (modified) llvm/test/MC/Disassembler/AMDGPU/gfx950_mai.txt (+22) 


``diff
diff --git a/clang/include/clang/Basic/BuiltinsAMDGPU.def b/clang/include/clang/Basic/BuiltinsAMDGPU.def
index 3b7cc559e88b29..f013714798cc54 100644
--- a/clang/include/clang/Basic/BuiltinsAMDGPU.def
+++ b/clang/include/clang/Basic/BuiltinsAMDGPU.def
@@ -444,6 +444,7 @@ TARGET_BUILTIN(__builtin_amdgcn_mfma_f32_32x32x16_bf16, "V16fV8yV8yV16fIiIiIi",
 TARGET_BUILTIN(__builtin_amdgcn_mfma_i32_16x16x64_i8, "V4iV4iV4iV4iIiIiIi", "nc", "gfx950-insts")
 TARGET_BUILTIN(__builtin_amdgcn_mfma_i32_32x32x32_i8, "V16iV4iV4iV16iIiIiIi", "nc", "gfx950-insts")
 
+TARGET_BUILTIN(__builtin_amdgcn_smfmac_f32_16x16x64_f16, "V4fV8hV16hV4fiIiIi", "nc", "gfx950-insts")
 //===----------------------------------------------------------------------===//
 // GFX12+ only builtins.
 //===----------------------------------------------------------------------===//
diff --git a/clang/test/CodeGenOpenCL/builtins-amdgcn-mfma.cl b/clang/test/CodeGenOpenCL/builtins-amdgcn-mfma.cl
index 345f05f463bf44..e63d89a28de44d 100644
--- a/clang/test/CodeGenOpenCL/builtins-amdgcn-mfma.cl
+++ b/clang/test/CodeGenOpenCL/builtins-amdgcn-mfma.cl
@@ -467,4 +467,11 @@ v4f test_mfma_f32_16x16x32_bf16(v8bf16 a, v8bf16 b, v4f c)
   return __builtin_amdgcn_mfma_f32_16x16x32_bf16(a, b, c, 1, 2, 3);
 }
 
+// CHECK-GFX950-LABEL: @test_smfmac_f32_16x16x64_f16
+// CHECK-GFX950: call <4 x float> @llvm.amdgcn.smfmac.f32.16x16x64.f16(<8 x half> %a, <16 x half> %b, <4 x float> %c, i32 %idx, i32 0, i32 0)
+void test_smfmac_f32_16x16x64_f16(global v4f* out, v8h a, v16h b, v4f c, int idx)
+{
+  *out = __builtin_amdgcn_smfmac_f32_16x16x64_f16(a, b, c, idx, 0, 0);
+}
+
 #endif
diff --git a/clang/test/SemaOpenCL/builtins-amdgcn-error-gfx950-param.cl b/clang/test/SemaOpenCL/builtins-amdgcn-error-gfx950-param.cl
index acaa20090dfcba..6366997465aeff 100644
--- a/clang/test/SemaOpenCL/builtins-amdgcn-error-gfx950-param.cl
+++ b/clang/test/SemaOpenCL/builtins-amdgcn-error-gfx950-param.cl
@@ -4,6 +4,7 @@
 typedef float float4 __attribute__((ext_vector_type(4)));
 typedef float float16 __attribute__((ext_vector_type(16)));
 typedef half half8 __attribute__((ext_vector_type(8)));
+typedef half half16 __attribute__((ext_vector_type(16)));
 typedef __bf16 bfloat8 __attribute__((ext_vector_type(8)));
 typedef int int4 __attribute__((ext_vector_type(4)));
 typedef int int8 __attribute__((ext_vector_type(8)));
@@ -62,3 +63,9 @@ void test_mfma_f32_16x16x32_bf16(__global float4* out, bfloat8 a, bfloat8 b, flo
   *out = __builtin_amdgcn_mfma_f32_16x16x32_bf16(a, b, c, 0, X, 0); // expected-error{{argument to '__builtin_amdgcn_mfma_f32_16x16x32_bf16' must be a constant integer}}
   *out = __builtin_amdgcn_mfma_f32_16x16x32_bf16(a, b, c, 0, 0, X); // expected-error{{argument to '__builtin_amdgcn_mfma_f32_16x16x32_bf16' must be a constant integer}}
 }
+
+void test_smfmac_f32_16x16x64_f16(global float4* out, half8 a, half16 b, float4 c, int idx, int d)
+{
+  *out = __builtin_amdgcn_smfmac_f32_16x16x64_f16(a, b, c, idx, d, 0); // expected-error{{argument to '__builtin_amdgcn_smfmac_f32_16x16x64_f16' must be a constant integer}}
+  *out = __builtin_amdgcn_smfmac_f32_16x16x64_f16(a, b, c, idx, 0, d); // expected-error{{argument to '__builtin_amdgcn_smfmac_f32_16x16x64_f16' must be a constant integer}}
+}
diff --git a/clang/test/SemaOpenCL/builtins-amdgcn-error-gfx950.cl b/clang/test/SemaOpenCL/builtins-amdgcn-error-gfx950.cl
index 6bf76b3cba0f59..1e924e86f3b897 100644
--- a/clang/test/SemaOpenCL/builtins-amdgcn-error-gfx950.cl
+++ b/clang/test/SemaOpenCL/builtins-amdgcn-error-gfx950.cl
@@ -34,6 +34,7 @@ void test(__global float4* out0, half8 a0, half8 b0, float4 c0,
   *out3 = __builtin_amdgcn_mfma_i32_16x16x64_i8(a3, b3, c3, 0, 0, 0); // expected-error{{'__builtin_amdgcn_mfma_i32_16x16x64_i8' needs target feature gfx950-insts}}
   *out4 = __builtin_amdgcn_mfma_i32_3

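The prototype string passed to `TARGET_BUILTIN` in the patch above ("V4fV8hV16hV4fiIiIi") can be read by hand, but a small decoder may make the mapping to the Sema test clearer. The following is only a rough sketch, not Clang's own parser: it handles just the letters that occur in these hunks (`V<N>` vector prefix, `I` constant-integer prefix, and base types `f`, `h`, `y`, `i`, as described in the comments of Clang's `Builtins.def`), and the names `BuiltinParam` and `decodeBuiltinSignature` are hypothetical.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// One decoded entry of a builtin prototype string (hypothetical helper type).
struct BuiltinParam {
  std::string type;      // human-readable type, e.g. "<8 x half>"
  bool requiresConstant; // true when the entry carried the 'I' prefix
};

// Decodes the subset of the builtin type-string encoding used in the patch.
// The first entry is the return type; the remaining entries are parameters.
// Returns an empty vector for anything outside the supported subset.
std::vector<BuiltinParam> decodeBuiltinSignature(const std::string &sig) {
  std::vector<BuiltinParam> out;
  std::size_t i = 0;
  while (i < sig.size()) {
    // Optional 'I' prefix: the argument must be an integer constant.
    bool requiresConstant = false;
    if (sig[i] == 'I') {
      requiresConstant = true;
      ++i;
    }
    // Optional 'V<N>' prefix: vector of N elements of the base type.
    unsigned lanes = 0;
    if (i < sig.size() && sig[i] == 'V') {
      ++i;
      while (i < sig.size() &&
             std::isdigit(static_cast<unsigned char>(sig[i])))
        lanes = lanes * 10 + static_cast<unsigned>(sig[i++] - '0');
      if (lanes == 0)
        return {}; // 'V' without an element count
    }
    if (i >= sig.size())
      return {}; // prefix without a base type
    std::string base;
    switch (sig[i++]) {
    case 'f': base = "float"; break;
    case 'h': base = "half"; break;
    case 'y': base = "__bf16"; break;
    case 'i': base = "int"; break;
    default: return {}; // letter not covered by this sketch
    }
    std::string type =
        lanes ? "<" + std::to_string(lanes) + " x " + base + ">" : base;
    out.push_back(BuiltinParam{type, requiresConstant});
  }
  return out;
}
```

Decoding "V4fV8hV16hV4fiIiIi" this way gives a `<4 x float>` return type followed by `<8 x half>`, `<16 x half>`, `<4 x float>`, a plain `int`, and two constant `int` parameters. That lines up with the tests in the patch: `idx` may be a runtime value, while the last two arguments trigger the "must be a constant integer" diagnostic when they are not constants.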
[llvm-branch-commits] [clang] [llvm] AMDGPU: Add v_smfmac_f32_16x16x64_f16 for gfx950 (PR #117202)

2024-11-21 Thread via llvm-branch-commits

llvmbot wrote:




@llvm/pr-subscribers-clang

Author: Matt Arsenault (arsenm)


Changes



---


[llvm-branch-commits] [clang] [llvm] AMDGPU: Add v_smfmac_f32_16x16x64_f16 for gfx950 (PR #117202)

2024-11-21 Thread Matt Arsenault via llvm-branch-commits

arsenm wrote:

> [!WARNING]
> This pull request is not mergeable via GitHub because a downstack PR is 
> open. Once all requirements are satisfied, merge this PR as a stack on Graphite
> (https://app.graphite.dev/github/pr/llvm/llvm-project/117202?utm_source=stack-comment-downstack-mergeability-warning).
> Learn more: https://graphite.dev/docs/merge-pull-requests

* **#117202** 👈 (View in Graphite: https://app.graphite.dev/github/pr/llvm/llvm-project/117202?utm_source=stack-comment-view-in-graphite)
* **#117055**
* **#117053**
* **#117052**
* **#116728**
* **#116724**: 1 other dependent PR ([#117047](https://github.com/llvm/llvm-project/pull/117047))
* **#116723**
* **#116722**
* **#116681**
* **#116680**
* **#116679**
* **#116678**
* **#116312**
* **#116311**
* **#116310**
* **#116309**
* **#116308**
* **#116307**
* `main`



This stack of pull requests is managed by Graphite (https://graphite.dev?utm-source=stack-comment). Learn 
more about https://stacking.dev/?utm_source=stack-co

[llvm-branch-commits] [clang] [llvm] AMDGPU: Add v_smfmac_f32_16x16x64_f16 for gfx950 (PR #117202)

2024-11-21 Thread via llvm-branch-commits

llvmbot wrote:



@llvm/pr-subscribers-backend-amdgpu

@llvm/pr-subscribers-llvm-analysis

Author: Matt Arsenault (arsenm)


Changes



---


[llvm-branch-commits] [clang] [llvm] AMDGPU: Add v_smfmac_f32_16x16x64_f16 for gfx950 (PR #117202)

2024-11-21 Thread Matt Arsenault via llvm-branch-commits

https://github.com/arsenm ready_for_review 
https://github.com/llvm/llvm-project/pull/117202


[llvm-branch-commits] [clang] [llvm] AMDGPU: Add v_smfmac_f32_32x32x32_f16 for gfx950 (PR #117205)

2024-11-21 Thread Matt Arsenault via llvm-branch-commits

https://github.com/arsenm ready_for_review 
https://github.com/llvm/llvm-project/pull/117205


[llvm-branch-commits] [clang] [HLSL] Add RWBuffer::Load(Index) (PR #117018)

2024-11-21 Thread Helena Kotas via llvm-branch-commits

https://github.com/hekota edited 
https://github.com/llvm/llvm-project/pull/117018


[llvm-branch-commits] [clang] [llvm] AMDGPU: Add v_smfmac_f32_32x32x32_f16 for gfx950 (PR #117205)

2024-11-21 Thread Shilei Tian via llvm-branch-commits

https://github.com/shiltian edited 
https://github.com/llvm/llvm-project/pull/117205


[llvm-branch-commits] [clang] [llvm] AMDGPU: Add v_smfmac_f32_16x16x64_bf16 for gfx950 (PR #117211)

2024-11-21 Thread Shilei Tian via llvm-branch-commits

https://github.com/shiltian edited 
https://github.com/llvm/llvm-project/pull/117211


[llvm-branch-commits] [clang] [llvm] AMDGPU: Add v_smfmac_f32_16x16x64_bf16 for gfx950 (PR #117211)

2024-11-21 Thread Shilei Tian via llvm-branch-commits

https://github.com/shiltian approved this pull request.


https://github.com/llvm/llvm-project/pull/117211


[llvm-branch-commits] [clang] [llvm] AMDGPU: Add v_smfmac_f32_32x32x32_bf16 for gfx950 (PR #117212)

2024-11-21 Thread Shilei Tian via llvm-branch-commits

https://github.com/shiltian approved this pull request.


https://github.com/llvm/llvm-project/pull/117212


[llvm-branch-commits] [clang] [llvm] AMDGPU: Add v_smfmac_i32_16x16x128_i8 for gfx950 (PR #117213)

2024-11-21 Thread Shilei Tian via llvm-branch-commits

https://github.com/shiltian approved this pull request.


https://github.com/llvm/llvm-project/pull/117213


[llvm-branch-commits] [clang] [llvm] AMDGPU: Add v_smfmac_i32_32x32x64_i8 for gfx950 (PR #117214)

2024-11-21 Thread Shilei Tian via llvm-branch-commits

https://github.com/shiltian edited 
https://github.com/llvm/llvm-project/pull/117214


[llvm-branch-commits] [clang] [llvm] AMDGPU: Add v_smfmac_f32_16x16x128_bf8_fp8 for gfx950 (PR #117233)

2024-11-21 Thread Shilei Tian via llvm-branch-commits

https://github.com/shiltian edited 
https://github.com/llvm/llvm-project/pull/117233


[llvm-branch-commits] [clang] [llvm] AMDGPU: Add v_smfmac_f32_16x16x128_bf8_bf8 for gfx950 (PR #117232)

2024-11-21 Thread Shilei Tian via llvm-branch-commits

https://github.com/shiltian edited 
https://github.com/llvm/llvm-project/pull/117232


[llvm-branch-commits] [clang] [llvm] AMDGPU: Add v_smfmac_f32_16x16x128_bf8_fp8 for gfx950 (PR #117233)

2024-11-21 Thread Shilei Tian via llvm-branch-commits

https://github.com/shiltian approved this pull request.


https://github.com/llvm/llvm-project/pull/117233


[llvm-branch-commits] [clang] [llvm] AMDGPU: Add v_smfmac_f32_16x16x128_fp8_fp8 for gfx950 (PR #117235)

2024-11-21 Thread Shilei Tian via llvm-branch-commits

https://github.com/shiltian approved this pull request.


https://github.com/llvm/llvm-project/pull/117235


[llvm-branch-commits] [clang] [llvm] AMDGPU: Add v_smfmac_f32_16x16x128_fp8_fp8 for gfx950 (PR #117235)

2024-11-21 Thread Shilei Tian via llvm-branch-commits

https://github.com/shiltian edited 
https://github.com/llvm/llvm-project/pull/117235

