[llvm-branch-commits] [llvm] release/19.x: [MachineLICM] Don't allow hoisting invariant loads across mem barrier. (#116987) (PR #117154)

2024-11-25 Thread Tobias Hieta via llvm-branch-commits

tru wrote:

@david-arm Should this be merged?

https://github.com/llvm/llvm-project/pull/117154
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [compiler-rt] release/19.x: [compiler-rt] [test] Remove an unintended grep parameter (PR #116774)

2024-11-25 Thread Tobias Hieta via llvm-branch-commits

https://github.com/tru updated https://github.com/llvm/llvm-project/pull/116774

From fb6b195cae03ba6e5b50870031d710ca6886c5bb Mon Sep 17 00:00:00 2001
From: Martin Storsjö
Date: Sun, 20 Oct 2024 13:51:50 +0300
Subject: [PATCH] [compiler-rt] [test] Remove an unintended grep parameter

This parameter seems unintentional here; we're trying to grep
the input on stdin, from the earlier stage in the pipeline.

Since a recent update on Github Actions runners, the previous
form (grepping a file, while piping in data on stdin) would fail
running the test, with the test runner Python script throwing
an exception when evaluating it:

  File "D:\a\llvm-mingw\llvm-mingw\llvm-project\llvm\utils\lit\lit\TestRunner.py", line 935, in _executeShCmd
    out = procs[i].stdout.read()
  ^^
  File "C:\hostedtoolcache\windows\Python\3.12.7\x64\Lib\encodings\cp1252.py", line 23, in decode
    return codecs.charmap_decode(input,self.errors,decoding_table)[0]
   ^^^
TypeError: a bytes-like object is required, not 'NoneType'

(cherry picked from commit c2717a89b8437d041d532c7b2c535ca4f4b35872)
---
 compiler-rt/test/asan/TestCases/Windows/delay_dbghelp.cpp | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/compiler-rt/test/asan/TestCases/Windows/delay_dbghelp.cpp 
b/compiler-rt/test/asan/TestCases/Windows/delay_dbghelp.cpp
index 9277fe0b235160..38e99cf6859451 100644
--- a/compiler-rt/test/asan/TestCases/Windows/delay_dbghelp.cpp
+++ b/compiler-rt/test/asan/TestCases/Windows/delay_dbghelp.cpp
@@ -9,7 +9,7 @@
 // static build, there won't be any clang_rt DLLs.
 // RUN: not grep cl""ang_rt %t || \
 // RUN:   grep cl""ang_rt %t | xargs which | \
-// RUN:   xargs llvm-readobj --coff-imports | not grep dbghelp.dll %t
+// RUN:   xargs llvm-readobj --coff-imports | not grep dbghelp.dll
 
 extern "C" int puts(const char *);
 



[llvm-branch-commits] [llvm] release/19.x: [MachineLICM] Don't allow hoisting invariant loads across mem barrier. (#116987) (PR #117154)

2024-11-25 Thread David Sherwood via llvm-branch-commits

david-arm wrote:

> @david-arm Should this be merged?

Hi, yes, I think it should be merged. It fixes a fairly serious bug.

https://github.com/llvm/llvm-project/pull/117154


[llvm-branch-commits] [lld] [llvm] release/19.x: [MC][LoongArch] Change default cpu in `MCSubtargetInfo`. (#114922) (PR #117105)

2024-11-25 Thread via llvm-branch-commits

heiher wrote:

> Can you squash this PR so it's just one commit?

Sure, it's done now.

https://github.com/llvm/llvm-project/pull/117105


[llvm-branch-commits] [llvm] 3d12f45 - [SDAG][ISel][TableGen][LoongArch] Report error for trivial bitcasts when there are predicate calls (#116075)

2024-11-25 Thread Tobias Hieta via llvm-branch-commits

Author: Yingwei Zheng
Date: 2024-11-25T09:36:43+01:00
New Revision: 3d12f45e50b68ac908ef05571e5cc52f4b966d94

URL: 
https://github.com/llvm/llvm-project/commit/3d12f45e50b68ac908ef05571e5cc52f4b966d94
DIFF: 
https://github.com/llvm/llvm-project/commit/3d12f45e50b68ac908ef05571e5cc52f4b966d94.diff

LOG: [SDAG][ISel][TableGen][LoongArch] Report error for trivial bitcasts when there are predicate calls (#116075)

On loongarch64 with the LSX extension, we select `VBITREV_W` for
`v4i32 (xor X, (shl splat(1), Y))`:

https://github.com/llvm/llvm-project/blob/8e6630391699116641cf390a10476295b7d4b95c/llvm/lib/Target/LoongArch/LoongArchLSXInstrInfo.td#L1583-L1584

And `vsplat_imm_eq_1` is defined as:

https://github.com/llvm/llvm-project/blob/8e6630391699116641cf390a10476295b7d4b95c/llvm/lib/Target/LoongArch/LoongArchLSXInstrInfo.td#L77-L87

For the `(bitconvert (v4i32 (build_vector)))` case, the pattern is
expected to be:
```
PATTERN: (xor:{ *:[v4i32] } v4i32:{ *:[v4i32] }:$vj, (shl:{ *:[v4i32] } (bitconvert:{ *:[v4i32] } (build_vector:{ *:[v4i32] }))<>, v4i32:{ *:[v4i32] }:$vk))
RESULT:  (VBITREV_W:{ *:[v4i32] } v4i32:{ *:[v4i32] }:$vj, v4i32:{ *:[v4i32] }:$vk)
```

However, `simplifyTree` drops the `bitconvert` node and its predicates:

https://github.com/llvm/llvm-project/blob/8e6630391699116641cf390a10476295b7d4b95c/llvm/utils/TableGen/Common/CodeGenDAGPatterns.cpp#L3036-L3062

As a result, LLVM matches `vsplat_imm_eq_1` for any v4i32 splat, which
causes a miscompilation:
```
PATTERN: (xor:{ *:[v4i32] } v4i32:{ *:[v4i32] }:$vj, (shl:{ *:[v4i32] } (build_vector:{ *:[v4i32] }), v4i32:{ *:[v4i32] }:$vk))
RESULT:  (VBITREV_W:{ *:[v4i32] } v4i32:{ *:[v4i32] }:$vj, v4i32:{ *:[v4i32] }:$vk)
```

This patch adds additional checks for predicates associated with the
trivial bitconvert node. Unused patterns in the LoongArch target are
also removed.

Fixes https://github.com/llvm/llvm-project/issues/116008.

(cherry picked from commit c727b48287cc96888f9e262f23d53cf635cf3b3d)
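
The check described in the commit message can be illustrated with a small standalone model. This is only a sketch: `ToyNode`, `makeNode`, and `simplifyTrivialBitconvert` are hypothetical names invented for illustration and are not part of TableGen, and the model throws an exception where the real code calls `PrintFatalError`.

```cpp
#include <memory>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Toy stand-in for a TableGen pattern node (hypothetical, not TreePatternNode).
struct ToyNode {
  std::string op;                       // e.g. "bitconvert", "build_vector"
  std::string type;                     // e.g. "v4i32"
  std::vector<std::string> predicates;  // predicate calls attached to this node
  std::vector<std::shared_ptr<ToyNode>> children;
};

// Convenience builder for small trees.
std::shared_ptr<ToyNode> makeNode(
    std::string op, std::string type, std::vector<std::string> preds = {},
    std::vector<std::shared_ptr<ToyNode>> kids = {}) {
  return std::make_shared<ToyNode>(ToyNode{
      std::move(op), std::move(type), std::move(preds), std::move(kids)});
}

// Replace a type-preserving ("trivial") bitconvert with its operand.
// Returns true if the node was replaced. Instead of silently discarding
// predicates attached to the bitconvert, report an error.
bool simplifyTrivialBitconvert(std::shared_ptr<ToyNode> &n) {
  if (n->op != "bitconvert" || n->children.size() != 1 ||
      n->type != n->children[0]->type)
    return false;  // not a trivial bitconvert; leave the node alone
  if (!n->predicates.empty())
    throw std::runtime_error(
        "trivial bitconvert node should not have predicate calls");
  std::shared_ptr<ToyNode> child = n->children[0];
  n = child;  // drop the bitconvert, keep its operand
  return true;
}
```

In this model, a `bitconvert` from `v4i32` to `v4i32` without predicates is folded away, while the same node carrying a predicate such as `vsplat_imm_eq_1` is rejected rather than folded, so the predicate can no longer be lost without notice.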

Added: 
llvm/test/CodeGen/LoongArch/lsx/pr116008.ll

Modified: 
llvm/lib/Target/LoongArch/LoongArchLSXInstrInfo.td
llvm/utils/TableGen/Common/CodeGenDAGPatterns.cpp

Removed: 




diff  --git a/llvm/lib/Target/LoongArch/LoongArchLSXInstrInfo.td 
b/llvm/lib/Target/LoongArch/LoongArchLSXInstrInfo.td
index 0580683c3ce303..0233baecf6dd9c 100644
--- a/llvm/lib/Target/LoongArch/LoongArchLSXInstrInfo.td
+++ b/llvm/lib/Target/LoongArch/LoongArchLSXInstrInfo.td
@@ -67,8 +67,7 @@ class VecCondgetValueType(0).getVectorElementType();
 
@@ -109,8 +108,7 @@ def vsplati32_imm_eq_31 : PatFrags<(ops), [(build_vector)], 
[{
   return selectVSplat(N, Imm, EltTy.getSizeInBits()) &&
  Imm.getBitWidth() == EltTy.getSizeInBits() && Imm == 31;
 }]>;
-def vsplati64_imm_eq_63 : PatFrags<(ops), [(build_vector),
-   (bitconvert (v4i32 
(build_vector)))], [{
+def vsplati64_imm_eq_63 : PatFrags<(ops), [(build_vector)], [{
   APInt Imm;
   EVT EltTy = N->getValueType(0).getVectorElementType();
 

diff  --git a/llvm/test/CodeGen/LoongArch/lsx/pr116008.ll 
b/llvm/test/CodeGen/LoongArch/lsx/pr116008.ll
new file mode 100644
index 00000000000000..ba8ffc34931893
--- /dev/null
+++ b/llvm/test/CodeGen/LoongArch/lsx/pr116008.ll
@@ -0,0 +1,17 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py 
UTC_ARGS: --version 5
+; RUN: llc --mtriple=loongarch64 --mattr=+lsx < %s | FileCheck %s
+
+define <4 x i32> @xor_shl_splat_vec_one(i32 %x, <4 x i32> %y) nounwind {
+; CHECK-LABEL: xor_shl_splat_vec_one:
+; CHECK:   # %bb.0: # %entry
+; CHECK-NEXT:vreplgr2vr.w $vr1, $a0
+; CHECK-NEXT:vsll.w $vr0, $vr1, $vr0
+; CHECK-NEXT:vbitrevi.w $vr0, $vr0, 0
+; CHECK-NEXT:ret
+entry:
+  %ins = insertelement <4 x i32> poison, i32 %x, i64 0
+  %splat = shufflevector <4 x i32> %ins, <4 x i32> poison, <4 x i32> 
zeroinitializer
+  %shl = shl <4 x i32> %splat, %y
+  %xor = xor <4 x i32> %shl, splat (i32 1)
+  ret <4 x i32> %xor
+}

diff  --git a/llvm/utils/TableGen/Common/CodeGenDAGPatterns.cpp 
b/llvm/utils/TableGen/Common/CodeGenDAGPatterns.cpp
index a8cecca0d4a54f..ca71569008d5ec 100644
--- a/llvm/utils/TableGen/Common/CodeGenDAGPatterns.cpp
+++ b/llvm/utils/TableGen/Common/CodeGenDAGPatterns.cpp
@@ -3042,6 +3042,14 @@ static bool SimplifyTree(TreePatternNodePtr &N) {
   !N->getExtType(0).empty() &&
   N->getExtType(0) == N->getChild(0).getExtType(0) &&
   N->getName().empty()) {
+if (!N->getPredicateCalls().empty()) {
+  std::string Str;
+  raw_string_ostream OS(Str);
+  OS << *N
+ << "\n trivial bitconvert node should not have predicate calls\n";
+  PrintFatalError(Str);
+  return false;
+}
 N = N->getChildShared(0);
 SimplifyTree(N);
 return true;




[llvm-branch-commits] [llvm] release/19.x: [SDAG][ISel][TableGen][LoongArch] Report error for trivial bitcasts when there are predicate calls (#116075) (PR #116797)

2024-11-25 Thread via llvm-branch-commits

github-actions[bot] wrote:



@llvmbot Congratulations on having your first Pull Request (PR) merged into the 
LLVM Project!

Your changes will be combined with recent changes from other authors, then 
tested by our [build bots](https://lab.llvm.org/buildbot/). If there is a 
problem with a build, you may receive a report in an email or a comment on this 
PR.

Please check whether problems have been caused by your change specifically, as 
the builds can include changes from many authors. It is not uncommon for your 
change to be included in a build that fails due to someone else's changes, or 
infrastructure issues.

How to do this, and the rest of the post-merge process, is covered in detail 
[here](https://llvm.org/docs/MyFirstTypoFix.html#myfirsttypofix-issues-after-landing-your-pr).

If your change does cause a problem, it may be reverted, or you can revert it 
yourself. This is a normal part of [LLVM 
development](https://llvm.org/docs/DeveloperPolicy.html#patch-reversion-policy).
 You can fix your changes and open a new PR to merge them again.

If you don't get any reports, no action is required from you. Your changes are 
working as expected, well done!


https://github.com/llvm/llvm-project/pull/116797


[llvm-branch-commits] [llvm] f9ae37c - [InstCombine] Handle constant GEP expr in `SimplifyDemandedUseBits` (#116794)

2024-11-25 Thread Tobias Hieta via llvm-branch-commits

Author: Yingwei Zheng
Date: 2024-11-25T09:37:30+01:00
New Revision: f9ae37c670d4bcf4713278ac94d2c8991a326f9e

URL: 
https://github.com/llvm/llvm-project/commit/f9ae37c670d4bcf4713278ac94d2c8991a326f9e
DIFF: 
https://github.com/llvm/llvm-project/commit/f9ae37c670d4bcf4713278ac94d2c8991a326f9e.diff

LOG: [InstCombine] Handle constant GEP expr in `SimplifyDemandedUseBits` 
(#116794)

Closes https://github.com/llvm/llvm-project/issues/116775.

(cherry picked from commit 03d8831fa8ef5b7e32172c718b550a454645faea)
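
The test added by this commit passes a constant `getelementptr` expression, rather than an instruction, as the pointer operand of `llvm.ptrmask`. The following is a minimal sketch of why an instruction-only checked cast rejects such an operand while an operator-style view accepts it. All types and functions here (`ToyValue`, `ToyKind`, `castToGEPInstruction`, `castToGEPOperator`) are hypothetical stand-ins for illustration, not the LLVM API.

```cpp
#include <cstdint>
#include <stdexcept>

// Toy value kinds: a GEP can appear as an instruction or as a constant
// expression (hypothetical model, not LLVM's class hierarchy).
enum class ToyKind { GEPInstruction, GEPConstantExpr, Other };

struct ToyValue {
  ToyKind kind;
  std::int64_t byteOffset;  // offset applied by the GEP, if it is one
};

// Instruction-only view: true for GEP instructions, false for constants.
bool isGEPInstruction(const ToyValue &v) {
  return v.kind == ToyKind::GEPInstruction;
}

// Operator-style view: true for both the instruction and the constant form.
bool isGEPOperator(const ToyValue &v) {
  return v.kind == ToyKind::GEPInstruction ||
         v.kind == ToyKind::GEPConstantExpr;
}

// Checked casts: fail loudly when the kind test does not hold.
const ToyValue &castToGEPInstruction(const ToyValue &v) {
  if (!isGEPInstruction(v))
    throw std::invalid_argument("value is not a GEP instruction");
  return v;
}

const ToyValue &castToGEPOperator(const ToyValue &v) {
  if (!isGEPOperator(v))
    throw std::invalid_argument("value is not a GEP operator");
  return v;
}
```

With a value of kind `GEPConstantExpr`, `castToGEPInstruction` throws, whereas `castToGEPOperator` succeeds and still exposes the offset. That is the situation the new `ptrmask_demandedbits_constantexpr` test exercises.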

Added: 


Modified: 
llvm/lib/Transforms/InstCombine/InstCombineSimplifyDemanded.cpp
llvm/test/Transforms/InstCombine/ptrmask.ll

Removed: 




diff  --git a/llvm/lib/Transforms/InstCombine/InstCombineSimplifyDemanded.cpp 
b/llvm/lib/Transforms/InstCombine/InstCombineSimplifyDemanded.cpp
index 8a6ec3076ac621..b9d06b59368508 100644
--- a/llvm/lib/Transforms/InstCombine/InstCombineSimplifyDemanded.cpp
+++ b/llvm/lib/Transforms/InstCombine/InstCombineSimplifyDemanded.cpp
@@ -1004,7 +1004,7 @@ Value 
*InstCombinerImpl::SimplifyDemandedUseBits(Instruction *I,
 uint64_t MaskedGEPIndex = HighBitsGEPIndex | MaskedLowBitsGEPIndex;
 
 if (MaskedGEPIndex != GEPIndex) {
-  auto *GEP = cast<GetElementPtrInst>(II->getArgOperand(0));
+  auto *GEP = cast<GEPOperator>(II->getArgOperand(0));
   Builder.SetInsertPoint(I);
   Type *GEPIndexType =
   DL.getIndexType(GEP->getPointerOperand()->getType());

diff  --git a/llvm/test/Transforms/InstCombine/ptrmask.ll 
b/llvm/test/Transforms/InstCombine/ptrmask.ll
index 4631b81cd1ce1f..cd998bac3f9f0d 100644
--- a/llvm/test/Transforms/InstCombine/ptrmask.ll
+++ b/llvm/test/Transforms/InstCombine/ptrmask.ll
@@ -578,3 +578,16 @@ define ptr @ptrmask_is_useless_fail1(i64 %i, i64 %m) {
   %r = call ptr @llvm.ptrmask.p0.i64(ptr %p0, i64 %m0)
   ret ptr %r
 }
+
+@GC_arrays = external global { i8, i8, i64 }
+
+define ptr @ptrmask_demandedbits_constantexpr() {
+; CHECK-LABEL: define ptr @ptrmask_demandedbits_constantexpr() {
+; CHECK-NEXT:  entry:
+; CHECK-NEXT:[[ALIGNED_RESULT:%.*]] = call align 8 ptr 
@llvm.ptrmask.p0.i64(ptr nonnull @GC_arrays, i64 -8)
+; CHECK-NEXT:ret ptr [[ALIGNED_RESULT]]
+;
+entry:
+  %aligned_result = call ptr @llvm.ptrmask.p0.i64(ptr getelementptr inbounds 
(i8, ptr @GC_arrays, i64 1), i64 -8)
+  ret ptr %aligned_result
+}





[llvm-branch-commits] [llvm] release/19.x: [InstCombine] Handle constant GEP expr in `SimplifyDemandedUseBits` (#116794) (PR #116814)

2024-11-25 Thread Tobias Hieta via llvm-branch-commits

https://github.com/tru updated https://github.com/llvm/llvm-project/pull/116814

From f9ae37c670d4bcf4713278ac94d2c8991a326f9e Mon Sep 17 00:00:00 2001
From: Yingwei Zheng 
Date: Tue, 19 Nov 2024 22:17:24 +0800
Subject: [PATCH] [InstCombine] Handle constant GEP expr in
 `SimplifyDemandedUseBits` (#116794)

Closes https://github.com/llvm/llvm-project/issues/116775.

(cherry picked from commit 03d8831fa8ef5b7e32172c718b550a454645faea)
---
 .../InstCombine/InstCombineSimplifyDemanded.cpp |  2 +-
 llvm/test/Transforms/InstCombine/ptrmask.ll | 13 +
 2 files changed, 14 insertions(+), 1 deletion(-)

diff --git a/llvm/lib/Transforms/InstCombine/InstCombineSimplifyDemanded.cpp 
b/llvm/lib/Transforms/InstCombine/InstCombineSimplifyDemanded.cpp
index 8a6ec3076ac621..b9d06b59368508 100644
--- a/llvm/lib/Transforms/InstCombine/InstCombineSimplifyDemanded.cpp
+++ b/llvm/lib/Transforms/InstCombine/InstCombineSimplifyDemanded.cpp
@@ -1004,7 +1004,7 @@ Value 
*InstCombinerImpl::SimplifyDemandedUseBits(Instruction *I,
 uint64_t MaskedGEPIndex = HighBitsGEPIndex | MaskedLowBitsGEPIndex;
 
 if (MaskedGEPIndex != GEPIndex) {
-  auto *GEP = cast<GetElementPtrInst>(II->getArgOperand(0));
+  auto *GEP = cast<GEPOperator>(II->getArgOperand(0));
   Builder.SetInsertPoint(I);
   Type *GEPIndexType =
   DL.getIndexType(GEP->getPointerOperand()->getType());
diff --git a/llvm/test/Transforms/InstCombine/ptrmask.ll 
b/llvm/test/Transforms/InstCombine/ptrmask.ll
index 4631b81cd1ce1f..cd998bac3f9f0d 100644
--- a/llvm/test/Transforms/InstCombine/ptrmask.ll
+++ b/llvm/test/Transforms/InstCombine/ptrmask.ll
@@ -578,3 +578,16 @@ define ptr @ptrmask_is_useless_fail1(i64 %i, i64 %m) {
   %r = call ptr @llvm.ptrmask.p0.i64(ptr %p0, i64 %m0)
   ret ptr %r
 }
+
+@GC_arrays = external global { i8, i8, i64 }
+
+define ptr @ptrmask_demandedbits_constantexpr() {
+; CHECK-LABEL: define ptr @ptrmask_demandedbits_constantexpr() {
+; CHECK-NEXT:  entry:
+; CHECK-NEXT:[[ALIGNED_RESULT:%.*]] = call align 8 ptr 
@llvm.ptrmask.p0.i64(ptr nonnull @GC_arrays, i64 -8)
+; CHECK-NEXT:ret ptr [[ALIGNED_RESULT]]
+;
+entry:
+  %aligned_result = call ptr @llvm.ptrmask.p0.i64(ptr getelementptr inbounds 
(i8, ptr @GC_arrays, i64 1), i64 -8)
+  ret ptr %aligned_result
+}



[llvm-branch-commits] [llvm] release/19.x: [LICM] allow MemoryAccess creation failure (#116813) (PR #117082)

2024-11-25 Thread via llvm-branch-commits

github-actions[bot] wrote:

@DianQK (or anyone else): if you would like to add a note about this fix in the 
release notes (completely optional), please reply to this comment with a 
one- or two-sentence description of the fix. When you are done, please add the 
release:note label to this PR. 

https://github.com/llvm/llvm-project/pull/117082


[llvm-branch-commits] [llvm] [LoongArch] Fix GOT usage for `non-dso_local` function calls in large code model (PR #117134)

2024-11-25 Thread Tobias Hieta via llvm-branch-commits

https://github.com/tru closed https://github.com/llvm/llvm-project/pull/117134


[llvm-branch-commits] [llvm] 336f877 - [LoongArch] Fix GOT usage for `non-dso_local` function calls in large code model

2024-11-25 Thread Tobias Hieta via llvm-branch-commits

Author: wanglei
Date: 2024-11-25T09:45:06+01:00
New Revision: 336f87753b510aed840daf87f8d3a4996e6c8f15

URL: 
https://github.com/llvm/llvm-project/commit/336f87753b510aed840daf87f8d3a4996e6c8f15
DIFF: 
https://github.com/llvm/llvm-project/commit/336f87753b510aed840daf87f8d3a4996e6c8f15.diff

LOG: [LoongArch] Fix GOT usage for `non-dso_local` function calls in large code 
model

This commit fixes an issue in the large code model where non-dso_local
function calls did not use the GOT as expected in PIC mode. Instead,
direct PC-relative access was incorrectly applied, leading to linker
errors when building shared libraries.

For `ExternalSymbol`, it is not possible to determine whether it is
dso_local during pseudo-instruction expansion. We use target flags to
differentiate whether GOT should be used.

Cherry-picked from #117099 to fix linker errors when building
shared libraries with the large code model.
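
The change in how GOT usage is decided can be sketched with a small standalone model. The names below (`ToyCallee`, `ToyOperandKind`, `ToyTargetFlag`, `useGOTFromGlobal`, `useGOTFromFlag`) are hypothetical and only mirror the shape of the one-line change in `expandFunctionCALL`. In particular, `ToyTargetFlag::CallPLT` stands in for `LoongArchII::MO_CALL_PLT`, and `CallDirect` is an assumed name for "any other call flag".

```cpp
// Toy model of a call operand (hypothetical, not MachineOperand).
enum class ToyOperandKind { GlobalAddress, ExternalSymbol };
enum class ToyTargetFlag { None, CallDirect, CallPLT };

struct ToyCallee {
  ToyOperandKind kind;
  bool dsoLocal;       // only meaningful when kind == GlobalAddress
  ToyTargetFlag flag;  // set earlier, when the call was lowered
};

// Previous rule: only a global that is not dso_local selects the GOT.
// An external symbol such as memset never does, even in PIC mode.
bool useGOTFromGlobal(const ToyCallee &c) {
  return c.kind == ToyOperandKind::GlobalAddress && !c.dsoLocal;
}

// New rule: trust the target flag attached to the operand, which is
// available for both global addresses and external symbols.
bool useGOTFromFlag(const ToyCallee &c) {
  return c.flag == ToyTargetFlag::CallPLT;
}
```

For an external-symbol callee flagged `CallPLT`, the previous rule returns `false` (PC-relative addressing) and the new rule returns `true` (GOT load). This matches the switch from `%pc_hi20(memset)` / `add.d` to `%got_pc_hi20(memset)` / `ldx.d` in the updated test expectations.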

Added: 


Modified: 
llvm/lib/Target/LoongArch/LoongArchExpandPseudoInsts.cpp
llvm/test/CodeGen/LoongArch/code-models.ll
llvm/test/CodeGen/LoongArch/machinelicm-address-pseudos.ll
llvm/test/CodeGen/LoongArch/psabi-restricted-scheduling.ll
llvm/test/CodeGen/LoongArch/tls-models.ll

Removed: 




diff  --git a/llvm/lib/Target/LoongArch/LoongArchExpandPseudoInsts.cpp 
b/llvm/lib/Target/LoongArch/LoongArchExpandPseudoInsts.cpp
index c136f5b3e515d7..e680dda7374d07 100644
--- a/llvm/lib/Target/LoongArch/LoongArchExpandPseudoInsts.cpp
+++ b/llvm/lib/Target/LoongArch/LoongArchExpandPseudoInsts.cpp
@@ -721,7 +721,7 @@ bool LoongArchExpandPseudo::expandFunctionCALL(
 IsTailCall ? LoongArch::PseudoJIRL_TAIL : LoongArch::PseudoJIRL_CALL;
 Register AddrReg = IsTailCall ? LoongArch::R19 : LoongArch::R1;
 
-bool UseGOT = Func.isGlobal() && !Func.getGlobal()->isDSOLocal();
+bool UseGOT = Func.getTargetFlags() == LoongArchII::MO_CALL_PLT;
 unsigned MO = UseGOT ? LoongArchII::MO_GOT_PC_HI : 
LoongArchII::MO_PCREL_LO;
 unsigned LAOpcode = UseGOT ? LoongArch::LDX_D : LoongArch::ADD_D;
 expandLargeAddressLoad(MBB, MBBI, NextMBBI, LAOpcode, MO, Func, AddrReg,

diff  --git a/llvm/test/CodeGen/LoongArch/code-models.ll 
b/llvm/test/CodeGen/LoongArch/code-models.ll
index 4b2b72afaee171..4eb1e5e596fd3f 100644
--- a/llvm/test/CodeGen/LoongArch/code-models.ll
+++ b/llvm/test/CodeGen/LoongArch/code-models.ll
@@ -82,11 +82,11 @@ define void @call_external_sym(ptr %dst) {
 ; LARGE-NEXT:.cfi_offset 1, -8
 ; LARGE-NEXT:ori $a2, $zero, 1000
 ; LARGE-NEXT:move $a1, $zero
-; LARGE-NEXT:pcalau12i $ra, %pc_hi20(memset)
-; LARGE-NEXT:addi.d $t8, $zero, %pc_lo12(memset)
-; LARGE-NEXT:lu32i.d $t8, %pc64_lo20(memset)
-; LARGE-NEXT:lu52i.d $t8, $t8, %pc64_hi12(memset)
-; LARGE-NEXT:add.d $ra, $t8, $ra
+; LARGE-NEXT:pcalau12i $ra, %got_pc_hi20(memset)
+; LARGE-NEXT:addi.d $t8, $zero, %got_pc_lo12(memset)
+; LARGE-NEXT:lu32i.d $t8, %got64_pc_lo20(memset)
+; LARGE-NEXT:lu52i.d $t8, $t8, %got64_pc_hi12(memset)
+; LARGE-NEXT:ldx.d $ra, $t8, $ra
 ; LARGE-NEXT:jirl $ra, $ra, 0
 ; LARGE-NEXT:ld.d $ra, $sp, 8 # 8-byte Folded Reload
 ; LARGE-NEXT:addi.d $sp, $sp, 16

diff  --git a/llvm/test/CodeGen/LoongArch/machinelicm-address-pseudos.ll 
b/llvm/test/CodeGen/LoongArch/machinelicm-address-pseudos.ll
index ed1a24e82b4e46..29348fe0d641ed 100644
--- a/llvm/test/CodeGen/LoongArch/machinelicm-address-pseudos.ll
+++ b/llvm/test/CodeGen/LoongArch/machinelicm-address-pseudos.ll
@@ -282,11 +282,11 @@ define void @test_la_tls_ld(i32 signext %n) {
 ; LA64LARGE-NEXT:  .LBB3_1: # %loop
 ; LA64LARGE-NEXT:# =>This Inner Loop Header: Depth=1
 ; LA64LARGE-NEXT:move $a0, $s0
-; LA64LARGE-NEXT:pcalau12i $ra, %pc_hi20(__tls_get_addr)
-; LA64LARGE-NEXT:addi.d $t8, $zero, %pc_lo12(__tls_get_addr)
-; LA64LARGE-NEXT:lu32i.d $t8, %pc64_lo20(__tls_get_addr)
-; LA64LARGE-NEXT:lu52i.d $t8, $t8, %pc64_hi12(__tls_get_addr)
-; LA64LARGE-NEXT:add.d $ra, $t8, $ra
+; LA64LARGE-NEXT:pcalau12i $ra, %got_pc_hi20(__tls_get_addr)
+; LA64LARGE-NEXT:addi.d $t8, $zero, %got_pc_lo12(__tls_get_addr)
+; LA64LARGE-NEXT:lu32i.d $t8, %got64_pc_lo20(__tls_get_addr)
+; LA64LARGE-NEXT:lu52i.d $t8, $t8, %got64_pc_hi12(__tls_get_addr)
+; LA64LARGE-NEXT:ldx.d $ra, $t8, $ra
 ; LA64LARGE-NEXT:jirl $ra, $ra, 0
 ; LA64LARGE-NEXT:ld.w $zero, $a0, 0
 ; LA64LARGE-NEXT:addi.w $s1, $s1, 1
@@ -448,11 +448,11 @@ define void @test_la_tls_gd(i32 signext %n) nounwind {
 ; LA64LARGE-NEXT:  .LBB5_1: # %loop
 ; LA64LARGE-NEXT:# =>This Inner Loop Header: Depth=1
 ; LA64LARGE-NEXT:move $a0, $s0
-; LA64LARGE-NEXT:pcalau12i $ra, %pc_hi20(__tls_get_addr)
-; LA64LARGE-NEXT:addi.d $t8, $zero, %pc_lo12(__tls_get_addr)
-; LA64LARGE-NEXT:lu32i.d $t8, %pc64_lo20(__tls_get_addr)
-; LA64LARGE-NEXT:lu52i.d $t8, $t8, %pc64_hi12(__tls_get_addr)
-; LA64LARGE-N

[llvm-branch-commits] [llvm] [LoongArch] Fix GOT usage for `non-dso_local` function calls in large code model (PR #117134)

2024-11-25 Thread Tobias Hieta via llvm-branch-commits

https://github.com/tru updated https://github.com/llvm/llvm-project/pull/117134

From 336f87753b510aed840daf87f8d3a4996e6c8f15 Mon Sep 17 00:00:00 2001
From: wanglei 
Date: Thu, 21 Nov 2024 09:31:12 +0800
Subject: [PATCH] [LoongArch] Fix GOT usage for `non-dso_local` function calls
 in large code model

This commit fixes an issue in the large code model where non-dso_local
function calls did not use the GOT as expected in PIC mode. Instead,
direct PC-relative access was incorrectly applied, leading to linker
errors when building shared libraries.

For `ExternalSymbol`, it is not possible to determine whether it is
dso_local during pseudo-instruction expansion. We use target flags to
differentiate whether GOT should be used.

Cherry-picked from #117099 to fix linker errors when building
shared libraries with the large code model.
---
 .../LoongArch/LoongArchExpandPseudoInsts.cpp  |  2 +-
 llvm/test/CodeGen/LoongArch/code-models.ll| 10 ++---
 .../LoongArch/machinelicm-address-pseudos.ll  | 20 +-
 .../LoongArch/psabi-restricted-scheduling.ll  | 40 +--
 llvm/test/CodeGen/LoongArch/tls-models.ll | 20 +-
 5 files changed, 46 insertions(+), 46 deletions(-)

diff --git a/llvm/lib/Target/LoongArch/LoongArchExpandPseudoInsts.cpp 
b/llvm/lib/Target/LoongArch/LoongArchExpandPseudoInsts.cpp
index c136f5b3e515d7..e680dda7374d07 100644
--- a/llvm/lib/Target/LoongArch/LoongArchExpandPseudoInsts.cpp
+++ b/llvm/lib/Target/LoongArch/LoongArchExpandPseudoInsts.cpp
@@ -721,7 +721,7 @@ bool LoongArchExpandPseudo::expandFunctionCALL(
 IsTailCall ? LoongArch::PseudoJIRL_TAIL : LoongArch::PseudoJIRL_CALL;
 Register AddrReg = IsTailCall ? LoongArch::R19 : LoongArch::R1;
 
-bool UseGOT = Func.isGlobal() && !Func.getGlobal()->isDSOLocal();
+bool UseGOT = Func.getTargetFlags() == LoongArchII::MO_CALL_PLT;
 unsigned MO = UseGOT ? LoongArchII::MO_GOT_PC_HI : 
LoongArchII::MO_PCREL_LO;
 unsigned LAOpcode = UseGOT ? LoongArch::LDX_D : LoongArch::ADD_D;
 expandLargeAddressLoad(MBB, MBBI, NextMBBI, LAOpcode, MO, Func, AddrReg,
diff --git a/llvm/test/CodeGen/LoongArch/code-models.ll 
b/llvm/test/CodeGen/LoongArch/code-models.ll
index 4b2b72afaee171..4eb1e5e596fd3f 100644
--- a/llvm/test/CodeGen/LoongArch/code-models.ll
+++ b/llvm/test/CodeGen/LoongArch/code-models.ll
@@ -82,11 +82,11 @@ define void @call_external_sym(ptr %dst) {
 ; LARGE-NEXT:.cfi_offset 1, -8
 ; LARGE-NEXT:ori $a2, $zero, 1000
 ; LARGE-NEXT:move $a1, $zero
-; LARGE-NEXT:pcalau12i $ra, %pc_hi20(memset)
-; LARGE-NEXT:addi.d $t8, $zero, %pc_lo12(memset)
-; LARGE-NEXT:lu32i.d $t8, %pc64_lo20(memset)
-; LARGE-NEXT:lu52i.d $t8, $t8, %pc64_hi12(memset)
-; LARGE-NEXT:add.d $ra, $t8, $ra
+; LARGE-NEXT:pcalau12i $ra, %got_pc_hi20(memset)
+; LARGE-NEXT:addi.d $t8, $zero, %got_pc_lo12(memset)
+; LARGE-NEXT:lu32i.d $t8, %got64_pc_lo20(memset)
+; LARGE-NEXT:lu52i.d $t8, $t8, %got64_pc_hi12(memset)
+; LARGE-NEXT:ldx.d $ra, $t8, $ra
 ; LARGE-NEXT:jirl $ra, $ra, 0
 ; LARGE-NEXT:ld.d $ra, $sp, 8 # 8-byte Folded Reload
 ; LARGE-NEXT:addi.d $sp, $sp, 16
diff --git a/llvm/test/CodeGen/LoongArch/machinelicm-address-pseudos.ll 
b/llvm/test/CodeGen/LoongArch/machinelicm-address-pseudos.ll
index ed1a24e82b4e46..29348fe0d641ed 100644
--- a/llvm/test/CodeGen/LoongArch/machinelicm-address-pseudos.ll
+++ b/llvm/test/CodeGen/LoongArch/machinelicm-address-pseudos.ll
@@ -282,11 +282,11 @@ define void @test_la_tls_ld(i32 signext %n) {
 ; LA64LARGE-NEXT:  .LBB3_1: # %loop
 ; LA64LARGE-NEXT:# =>This Inner Loop Header: Depth=1
 ; LA64LARGE-NEXT:move $a0, $s0
-; LA64LARGE-NEXT:pcalau12i $ra, %pc_hi20(__tls_get_addr)
-; LA64LARGE-NEXT:addi.d $t8, $zero, %pc_lo12(__tls_get_addr)
-; LA64LARGE-NEXT:lu32i.d $t8, %pc64_lo20(__tls_get_addr)
-; LA64LARGE-NEXT:lu52i.d $t8, $t8, %pc64_hi12(__tls_get_addr)
-; LA64LARGE-NEXT:add.d $ra, $t8, $ra
+; LA64LARGE-NEXT:pcalau12i $ra, %got_pc_hi20(__tls_get_addr)
+; LA64LARGE-NEXT:addi.d $t8, $zero, %got_pc_lo12(__tls_get_addr)
+; LA64LARGE-NEXT:lu32i.d $t8, %got64_pc_lo20(__tls_get_addr)
+; LA64LARGE-NEXT:lu52i.d $t8, $t8, %got64_pc_hi12(__tls_get_addr)
+; LA64LARGE-NEXT:ldx.d $ra, $t8, $ra
 ; LA64LARGE-NEXT:jirl $ra, $ra, 0
 ; LA64LARGE-NEXT:ld.w $zero, $a0, 0
 ; LA64LARGE-NEXT:addi.w $s1, $s1, 1
@@ -448,11 +448,11 @@ define void @test_la_tls_gd(i32 signext %n) nounwind {
 ; LA64LARGE-NEXT:  .LBB5_1: # %loop
 ; LA64LARGE-NEXT:# =>This Inner Loop Header: Depth=1
 ; LA64LARGE-NEXT:move $a0, $s0
-; LA64LARGE-NEXT:pcalau12i $ra, %pc_hi20(__tls_get_addr)
-; LA64LARGE-NEXT:addi.d $t8, $zero, %pc_lo12(__tls_get_addr)
-; LA64LARGE-NEXT:lu32i.d $t8, %pc64_lo20(__tls_get_addr)
-; LA64LARGE-NEXT:lu52i.d $t8, $t8, %pc64_hi12(__tls_get_addr)
-; LA64LARGE-NEXT:add.d $ra, $t8, $ra
+; LA64LARGE-NEXT:pcalau12i $ra, %got_pc_hi20(__tls_get_addr)
+; LA64LARGE-NEXT:a

[llvm-branch-commits] [llvm] [LoongArch] Fix GOT usage for `non-dso_local` function calls in large code model (PR #117134)

2024-11-25 Thread via llvm-branch-commits

github-actions[bot] wrote:

@wangleiat (or anyone else): if you would like to add a note about this fix in 
the release notes (completely optional), please reply to this comment with a 
one- or two-sentence description of the fix. When you are done, please add the 
release:note label to this PR. 

https://github.com/llvm/llvm-project/pull/117134


[llvm-branch-commits] [llvm] release/19.x: [SCEV] Fix sext handling for `getConstantMultiple` (#117093) (PR #117136)

2024-11-25 Thread Tobias Hieta via llvm-branch-commits

https://github.com/tru closed https://github.com/llvm/llvm-project/pull/117136


[llvm-branch-commits] [compiler-rt] release/19.x: [compiler-rt] [test] Remove an unintended grep parameter (PR #116774)

2024-11-25 Thread Tobias Hieta via llvm-branch-commits

https://github.com/tru closed https://github.com/llvm/llvm-project/pull/116774


[llvm-branch-commits] [flang] [flang] handle fir.call in AliasAnalysis::getModRef (PR #117164)

2024-11-25 Thread Tom Eccles via llvm-branch-commits


@@ -329,14 +341,92 @@ AliasResult AliasAnalysis::alias(Source lhsSrc, Source 
rhsSrc, mlir::Value lhs,
 // AliasAnalysis: getModRef
 
//===--===//
 
+static bool isSavedLocal(const fir::AliasAnalysis::Source &src) {
+  if (auto symRef = llvm::dyn_cast(src.origin.u)) {
+auto [nameKind, deconstruct] =
+fir::NameUniquer::deconstruct(symRef.getLeafReference().getValue());
+return nameKind == fir::NameUniquer::NameKind::VARIABLE &&
+   !deconstruct.procs.empty();
+  }
+  return false;
+}
+
+static bool isCallToFortranUserProcedure(fir::CallOp call) {
+  // TODO: indirect calls are excluded by these checks. Maybe some attribute is
+  // needed to flag user calls in this case.
+  if (fir::hasBindcAttr(call))
+return true;
+  if (std::optional callee = call.getCallee())
+return fir::NameUniquer::deconstruct(callee->getLeafReference().getValue())
+   .first == fir::NameUniquer::NameKind::PROCEDURE;
+  return false;
+}
+
+static ModRefResult getCallModRef(fir::CallOp call, mlir::Value var) {
+  // TODO: limit to Fortran functions??
+  // 1. Detect variables that can be accessed indirectly.
+  fir::AliasAnalysis aliasAnalysis;
+  fir::AliasAnalysis::Source varSrc = aliasAnalysis.getSource(var);
+  // If the variable is not a user variable, we cannot safely assume that
+  // Fortran semantics apply (e.g., a bare alloca/allocmem result may very well
+  // be placed in an allocatable/pointer descriptor and escape).
+
+  // All the logic below is based on Fortran semantics and only holds if this
+  // is a call to a procedure from the Fortran source and this is a variable
+  // from the Fortran source. Compiler generated temporaries or functions may
+  // not adhere to this semantic.
+  // TODO: add some opt-in or opt-out mechanism for compiler generated temps.
+  // An example of something currently problematic is the allocmem generated for
+  // ALLOCATE of allocatable target. It currently does not have the target
+  // attribute, which would lead this analysis to believe it cannot escape.
+  if (!varSrc.isFortranUserVariable() || !isCallToFortranUserProcedure(call))
+return ModRefResult::getModAndRef();
+  // Pointer and target may have been captured.
+  if (varSrc.isTargetOrPointer())
+return ModRefResult::getModAndRef();
+  // Host associated variables may be addressed indirectly via an internal
+  // function call, whether the call is in the parent or an internal procedure.
+  // Note that the host associated/internal procedure may be referenced
+  // indirectly inside calls to non internal procedure. This is because internal
+  // procedures may be captured or passed. As this is tricky to analyze, always
+  // consider such variables may be accessed in any calls.
+  if (varSrc.kind == fir::AliasAnalysis::SourceKind::HostAssoc ||
+  varSrc.isCapturedInInternalProcedure)
+return ModRefResult::getModAndRef();
+  // At that stage, it has been ruled out that local (including the saved ones)
+  // and dummy cannot be indirectly accessed in the call.
+  if (varSrc.kind != fir::AliasAnalysis::SourceKind::Allocate &&
+  !varSrc.isDummyArgument()) {
+if (varSrc.kind != fir::AliasAnalysis::SourceKind::Global ||
+!isSavedLocal(varSrc))
+  return ModRefResult::getModAndRef();
+  }
+  // 2. Check if the variable is passed via the arguments.
+  for (auto arg : call.getArgs()) {
+if (fir::conformsWithPassByRef(arg.getType()) &&
+!aliasAnalysis.alias(arg, var).isNo()) {
+  // TODO: intent(in) would allow returning Ref here. This can be obtained
+  // in the func.func attributes for direct calls, but the module lookup is
+  // linear with the number of MLIR symbols, which would introduce a pseudo
+  // quadratic behavior num_calls * num_func.

tblah wrote:

That sounds great!

https://github.com/llvm/llvm-project/pull/117164


[llvm-branch-commits] [flang] [MLIR][OpenMP] Add Lowering support for OpenMP Declare Mapper directive (PR #117046)

2024-11-25 Thread Tom Eccles via llvm-branch-commits


@@ -2701,7 +2701,42 @@ static void
 genOMP(lower::AbstractConverter &converter, lower::SymMap &symTable,
semantics::SemanticsContext &semaCtx, lower::pft::Evaluation &eval,
const parser::OpenMPDeclareMapperConstruct &declareMapperConstruct) {
-  TODO(converter.getCurrentLocation(), "OpenMPDeclareMapperConstruct");
+  fir::FirOpBuilder &firOpBuilder = converter.getFirOpBuilder();
+  lower::StatementContext stmtCtx;
+  const auto &spec =
+  std::get(declareMapperConstruct.t);
+  const auto &mapperName{std::get>(spec.t)};
+  const auto &varType{std::get(spec.t)};
+  const auto &varName{std::get(spec.t)};
+  assert(varType.declTypeSpec->category() ==
+ semantics::DeclTypeSpec::Category::TypeDerived &&
+ "Expected derived type");
+
+  std::string mapperNameStr;
+  if (mapperName.has_value())
+mapperNameStr = mapperName->ToString();
+  else
+mapperNameStr =
+"default_" + varType.declTypeSpec->derivedTypeSpec().name().ToString();
+
+  mlir::OpBuilder::InsertPoint insPt = firOpBuilder.saveInsertionPoint();
+  firOpBuilder.setInsertionPointToStart(converter.getModuleOp().getBody());
+  auto mlirType = converter.genType(varType.declTypeSpec->derivedTypeSpec());
+  auto varVal = firOpBuilder.createTemporaryAlloc(
+  converter.getCurrentLocation(), mlirType, varName.ToString());

tblah wrote:

Sorry I didn't notice this before.

So far as I understand, this will create the `fir.alloca` and `hlfir.declare` 
at the beginning of the MLIR module, not nested in any intermediate operation.

How do you intend to lower this to LLVM IR? We would normally nest these in some 
kind of "function-like" wrapper operation, e.g. `func.func`, `fir.global`, 
`omp.private`, etc. I wonder if the declare mapper operation needs a nested 
region for this allocation (like we do for `omp.private` and 
`omp.declare_reduction`).

https://github.com/llvm/llvm-project/pull/117046


[llvm-branch-commits] [llvm] AMDGPU: MC support for v_cvt_scalef32_pk_f32_[fp|bf]8 of gfx950. (PR #117383)

2024-11-25 Thread Matt Arsenault via llvm-branch-commits

arsenm wrote:

### Merge activity

* **Nov 25, 12:19 PM EST**: A user started a stack merge that includes this pull request via [Graphite](https://app.graphite.dev/github/pr/llvm/llvm-project/117383).


https://github.com/llvm/llvm-project/pull/117383


[llvm-branch-commits] [llvm] AMDGPU: Handle gfx950 valu write vdst + permlane read hazard (PR #117287)

2024-11-25 Thread Shilei Tian via llvm-branch-commits

https://github.com/shiltian approved this pull request.


https://github.com/llvm/llvm-project/pull/117287


[llvm-branch-commits] [llvm] AMDGPU: Handle gfx950 valu write vdst + permlane read hazard (PR #117287)

2024-11-25 Thread Shilei Tian via llvm-branch-commits

https://github.com/shiltian edited 
https://github.com/llvm/llvm-project/pull/117287


[llvm-branch-commits] [clang] [llvm] AMDGPU: Add support for load transpose instructions for gfx950 (PR #117378)

2024-11-25 Thread Shilei Tian via llvm-branch-commits

https://github.com/shiltian edited 
https://github.com/llvm/llvm-project/pull/117378


[llvm-branch-commits] [clang] [llvm] AMDGPU: Add support for load transpose instructions for gfx950 (PR #117378)

2024-11-25 Thread Shilei Tian via llvm-branch-commits

https://github.com/shiltian approved this pull request.


https://github.com/llvm/llvm-project/pull/117378


[llvm-branch-commits] [llvm] AMDGPU: Handle gfx950 valu write vdst + permlane read hazard (PR #117287)

2024-11-25 Thread Matt Arsenault via llvm-branch-commits


@@ -2551,8 +2551,34 @@ int GCNHazardRecognizer::checkPermlaneHazards(MachineInstr *MI) {
 return isVCmpXWritesExec(*TII, *TRI, MI);
   };
 
-  const int NumWaitStates = 4;
-  return NumWaitStates - getWaitStatesSince(IsVCmpXWritesExecFn, NumWaitStates);
+  auto IsVALUFn = [](const MachineInstr &MI) {
+return SIInstrInfo::isVALU(MI);
+  };
+
+  const int VCmpXWritesExecWaitStates = 4;
+  const int VALUWritesVDstWaitStates = 2;
+  int WaitStatesNeeded = 0;
+
+  for (const MachineOperand &Op : MI->explicit_uses()) {
+if (!Op.isReg() || !TRI->isVGPR(MF.getRegInfo(), Op.getReg()))
+  continue;
+Register Reg = Op.getReg();
+
+int WaitStatesSinceDef =
+VALUWritesVDstWaitStates -
+getWaitStatesSinceDef(Reg, IsVALUFn,
+  /*MaxWaitStates=*/VALUWritesVDstWaitStates);

arsenm wrote:

The usage doesn't exactly map to the definition name though 

https://github.com/llvm/llvm-project/pull/117287


[llvm-branch-commits] [llvm] AMDGPU: MC support for v_cvt_scale_[f16|f32]_fp8 of gfx950. (PR #117380)

2024-11-25 Thread Shilei Tian via llvm-branch-commits

https://github.com/shiltian approved this pull request.


https://github.com/llvm/llvm-project/pull/117380


[llvm-branch-commits] [llvm] AMDGPU: Add MC support for gfx950 V_BITOP3_B32/B16 (PR #117379)

2024-11-25 Thread Shilei Tian via llvm-branch-commits

https://github.com/shiltian approved this pull request.


https://github.com/llvm/llvm-project/pull/117379


[llvm-branch-commits] [llvm] AMDGPU: MC support for v_cvt_scale_[f16|f32]_fp8 of gfx950. (PR #117380)

2024-11-25 Thread Shilei Tian via llvm-branch-commits

https://github.com/shiltian edited 
https://github.com/llvm/llvm-project/pull/117380


[llvm-branch-commits] [llvm] AMDGPU: Add MC support for gfx950 V_BITOP3_B32/B16 (PR #117379)

2024-11-25 Thread Shilei Tian via llvm-branch-commits

https://github.com/shiltian edited 
https://github.com/llvm/llvm-project/pull/117379


[llvm-branch-commits] [llvm] AMDGPU: MC support for v_cvt_scalef32_pk_{fp8|bf8}_f32 of gfx950. (PR #117382)

2024-11-25 Thread Shilei Tian via llvm-branch-commits

https://github.com/shiltian approved this pull request.


https://github.com/llvm/llvm-project/pull/117382


[llvm-branch-commits] [clang] [llvm] AMDGPU: Add support for load transpose instructions for gfx950 (PR #117378)

2024-11-25 Thread Matt Arsenault via llvm-branch-commits

arsenm wrote:

### Merge activity

* **Nov 25, 12:19 PM EST**: A user started a stack merge that includes this pull request via [Graphite](https://app.graphite.dev/github/pr/llvm/llvm-project/117378).


https://github.com/llvm/llvm-project/pull/117378


[llvm-branch-commits] [clang] [llvm] AMDGPU: Support buffer_atomic_pk_add_bf16 for gfx950 (PR #117599)

2024-11-25 Thread via llvm-branch-commits

llvmbot wrote:




@llvm/pr-subscribers-backend-amdgpu

Author: Matt Arsenault (arsenm)


Changes

Co-authored-by: Sirish Pande 

---

Patch is 23.74 KiB, truncated to 20.00 KiB below, full version: 
https://github.com/llvm/llvm-project/pull/117599.diff


10 Files Affected:

- (modified) clang/test/CodeGenOpenCL/amdgpu-features.cl (+3-3) 
- (modified) llvm/include/llvm/IR/IntrinsicsAMDGPU.td (+2-2) 
- (modified) llvm/lib/Target/AMDGPU/AMDGPU.td (+1) 
- (modified) llvm/lib/Target/AMDGPU/BUFInstructions.td (+2) 
- (modified) llvm/lib/TargetParser/TargetParser.cpp (+2) 
- (added) llvm/test/CodeGen/AMDGPU/fp-atomics-gfx950.ll (+92) 
- (modified) llvm/test/MC/AMDGPU/gfx12_asm_vbuffer_mubuf.s (+4-4) 
- (modified) llvm/test/MC/AMDGPU/gfx950_asm_features.s (+60) 
- (modified) llvm/test/MC/AMDGPU/gfx950_err.s (+12) 
- (modified) llvm/test/MC/Disassembler/AMDGPU/gfx950.txt (+45) 


``diff
diff --git a/clang/test/CodeGenOpenCL/amdgpu-features.cl b/clang/test/CodeGenOpenCL/amdgpu-features.cl
index 1e2921160d28f2..f739872685e780 100644
--- a/clang/test/CodeGenOpenCL/amdgpu-features.cl
+++ b/clang/test/CodeGenOpenCL/amdgpu-features.cl
@@ -89,7 +89,7 @@
 // GFX941: 
"target-features"="+16-bit-insts,+atomic-buffer-global-pk-add-f16-insts,+atomic-ds-pk-add-16-insts,+atomic-fadd-rtn-insts,+atomic-flat-pk-add-16-insts,+atomic-global-pk-add-bf16-inst,+ci-insts,+dl-insts,+dot1-insts,+dot10-insts,+dot2-insts,+dot3-insts,+dot4-insts,+dot5-insts,+dot6-insts,+dot7-insts,+dpp,+fp8-conversion-insts,+fp8-insts,+gfx8-insts,+gfx9-insts,+gfx90a-insts,+gfx940-insts,+mai-insts,+s-memrealtime,+s-memtime-inst,+wavefrontsize64,+xf32-insts"
 // GFX942: 
"target-features"="+16-bit-insts,+atomic-buffer-global-pk-add-f16-insts,+atomic-ds-pk-add-16-insts,+atomic-fadd-rtn-insts,+atomic-flat-pk-add-16-insts,+atomic-global-pk-add-bf16-inst,+ci-insts,+dl-insts,+dot1-insts,+dot10-insts,+dot2-insts,+dot3-insts,+dot4-insts,+dot5-insts,+dot6-insts,+dot7-insts,+dpp,+fp8-conversion-insts,+fp8-insts,+gfx8-insts,+gfx9-insts,+gfx90a-insts,+gfx940-insts,+mai-insts,+s-memrealtime,+s-memtime-inst,+wavefrontsize64,+xf32-insts"
 // GFX9_4_Generic: 
"target-features"="+16-bit-insts,+atomic-buffer-global-pk-add-f16-insts,+atomic-ds-pk-add-16-insts,+atomic-fadd-rtn-insts,+atomic-flat-pk-add-16-insts,+atomic-global-pk-add-bf16-inst,+ci-insts,+dl-insts,+dot1-insts,+dot10-insts,+dot2-insts,+dot3-insts,+dot4-insts,+dot5-insts,+dot6-insts,+dot7-insts,+dpp,+gfx8-insts,+gfx9-insts,+gfx90a-insts,+gfx940-insts,+mai-insts,+s-memrealtime,+s-memtime-inst,+wavefrontsize64"
-// GFX950: 
"target-features"="+16-bit-insts,+ashr-pk-insts,+atomic-buffer-global-pk-add-f16-insts,+atomic-ds-pk-add-16-insts,+atomic-fadd-rtn-insts,+atomic-flat-pk-add-16-insts,+atomic-global-pk-add-bf16-inst,+ci-insts,+dl-insts,+dot1-insts,+dot10-insts,+dot12-insts,+dot13-insts,+dot2-insts,+dot3-insts,+dot4-insts,+dot5-insts,+dot6-insts,+dot7-insts,+dpp,+f16bf16-to-fp6bf6-cvt-scale-insts,+fp8-conversion-insts,+fp8-insts,+gfx8-insts,+gfx9-insts,+gfx90a-insts,+gfx940-insts,+gfx950-insts,+mai-insts,+permlane16-swap,+permlane32-swap,+prng-inst,+s-memrealtime,+s-memtime-inst,+wavefrontsize64"
+// GFX950: 
"target-features"="+16-bit-insts,+ashr-pk-insts,+atomic-buffer-global-pk-add-f16-insts,+atomic-buffer-pk-add-bf16-inst,+atomic-ds-pk-add-16-insts,+atomic-fadd-rtn-insts,+atomic-flat-pk-add-16-insts,+atomic-global-pk-add-bf16-inst,+ci-insts,+dl-insts,+dot1-insts,+dot10-insts,+dot12-insts,+dot13-insts,+dot2-insts,+dot3-insts,+dot4-insts,+dot5-insts,+dot6-insts,+dot7-insts,+dpp,+f16bf16-to-fp6bf6-cvt-scale-insts,+fp8-conversion-insts,+fp8-insts,+gfx8-insts,+gfx9-insts,+gfx90a-insts,+gfx940-insts,+gfx950-insts,+mai-insts,+permlane16-swap,+permlane32-swap,+prng-inst,+s-memrealtime,+s-memtime-inst,+wavefrontsize64"
 // GFX1010: 
"target-features"="+16-bit-insts,+ci-insts,+dl-insts,+dpp,+gfx10-insts,+gfx8-insts,+gfx9-insts,+s-memrealtime,+s-memtime-inst,+wavefrontsize32"
 // GFX1011: 
"target-features"="+16-bit-insts,+ci-insts,+dl-insts,+dot1-insts,+dot10-insts,+dot2-insts,+dot5-insts,+dot6-insts,+dot7-insts,+dpp,+gfx10-insts,+gfx8-insts,+gfx9-insts,+s-memrealtime,+s-memtime-inst,+wavefrontsize32"
 // GFX1012: 
"target-features"="+16-bit-insts,+ci-insts,+dl-insts,+dot1-insts,+dot10-insts,+dot2-insts,+dot5-insts,+dot6-insts,+dot7-insts,+dpp,+gfx10-insts,+gfx8-insts,+gfx9-insts,+s-memrealtime,+s-memtime-inst,+wavefrontsize32"
@@ -109,8 +109,8 @@
 // GFX1151: 
"target-features"="+16-bit-insts,+atomic-fadd-rtn-insts,+ci-insts,+dl-insts,+dot10-insts,+dot12-insts,+dot5-insts,+dot7-insts,+dot8-insts,+dot9-insts,+dpp,+gfx10-3-insts,+gfx10-insts,+gfx11-insts,+gfx8-insts,+gfx9-insts,+wavefrontsize32"
 // GFX1152: 
"target-features"="+16-bit-insts,+atomic-fadd-rtn-insts,+ci-insts,+dl-insts,+dot10-insts,+dot12-insts,+dot5-insts,+dot7-insts,+dot8-insts,+dot9-insts,+dpp,+gfx10-3-insts,+gfx10-insts,+gfx11-insts,+gfx8-insts,+gfx9-insts,+wavefrontsize32"
 // GFX1153: 
"target-features"="+16-bit-insts,+at

[llvm-branch-commits] [llvm] AMDGPU: MC support for v_cvt_scalef32_pk32_{bf|f}16_{bf|fp}6 of gfx950. (PR #117591)

2024-11-25 Thread Shilei Tian via llvm-branch-commits

https://github.com/shiltian edited 
https://github.com/llvm/llvm-project/pull/117591


[llvm-branch-commits] [llvm] AMDGPU: MC support for v_cvt_scalef32_pk32_f32_[fp|bf]6 of gfx950 (PR #117590)

2024-11-25 Thread Shilei Tian via llvm-branch-commits


@@ -1552,7 +1558,9 @@ def FeatureISAVersion9_5_Common : FeatureSet<
FeatureBitOp3Insts,
FeatureFP8ConversionScaleInsts,
FeatureBF8ConversionScaleInsts,
-   FeatureFP4ConversionScaleInsts
+   FeatureFP4ConversionScaleInsts,
+   FeatureFP6BF6ConversionScaleInsts,
+   FeatureFP8Insts

shiltian wrote:

Why is `FeatureFP8Insts` added here?

https://github.com/llvm/llvm-project/pull/117590


[llvm-branch-commits] [llvm] AMDGPU: MC support for v_cvt_scalef32_pk32_{bf|f}16_{bf|fp}6 of gfx950. (PR #117591)

2024-11-25 Thread Shilei Tian via llvm-branch-commits

https://github.com/shiltian approved this pull request.


https://github.com/llvm/llvm-project/pull/117591


[llvm-branch-commits] [clang] [llvm] AMDGPU: Support v_cvt_scalef32_pk32_{bf|f}6_{bf|fp}16 for gfx950 (PR #117592)

2024-11-25 Thread Shilei Tian via llvm-branch-commits


@@ -408,11 +408,23 @@ def FeatureFP6BF6ConversionScaleInsts : SubtargetFeature<"fp6bf6-cvt-scale-insts
   "Has fp6 and bf6 conversion scale instructions"
 >;
 
+def FeatureF16BF16ToFP6BF6ConversionScaleInsts : SubtargetFeature<"f16bf16-to-fp6bf6-cvt-scale-insts",
+  "HasF16BF16ToFP6BF6ConversionScaleInsts",
+  "true",
+  "Has f16bf16 to fp6bf6 conversion scale instructions"
+>;
+
 def FeatureGFX950Insts : SubtargetFeature<"gfx950-insts",
   "GFX950Insts",
   "true",
   "Additional instructions for GFX950+",
-  [FeaturePermlane16Swap, FeaturePermlane32Swap, FeatureFP8ConversionScaleInsts, FeatureBF8ConversionScaleInsts, FeatureFP4ConversionScaleInsts, FeatureFP6BF6ConversionScaleInsts]
+  [FeaturePermlane16Swap,
+  FeaturePermlane32Swap,

shiltian wrote:

The alignment is off here, but that can be fixed later.

https://github.com/llvm/llvm-project/pull/117592


[llvm-branch-commits] [clang] [llvm] AMDGPU: Support v_cvt_scalef32_pk32_{bf|f}6_{bf|fp}16 for gfx950 (PR #117592)

2024-11-25 Thread Shilei Tian via llvm-branch-commits

https://github.com/shiltian edited 
https://github.com/llvm/llvm-project/pull/117592


[llvm-branch-commits] [clang] [llvm] AMDGPU: Support v_cvt_scalef32_pk32_{bf|f}6_{bf|fp}16 for gfx950 (PR #117592)

2024-11-25 Thread Shilei Tian via llvm-branch-commits

https://github.com/shiltian approved this pull request.


https://github.com/llvm/llvm-project/pull/117592


[llvm-branch-commits] [llvm] AMDGPU: MC support for v_cvt_scalef32_pk_fp4_{f|bf}16 on gfx950. (PR #117594)

2024-11-25 Thread Shilei Tian via llvm-branch-commits

https://github.com/shiltian edited 
https://github.com/llvm/llvm-project/pull/117594


[llvm-branch-commits] [llvm] AMDGPU: MC support for v_cvt_scalef32_pk_{bf|f}16_{bf|fp}8 of gfx950. (PR #117593)

2024-11-25 Thread Shilei Tian via llvm-branch-commits

https://github.com/shiltian approved this pull request.


https://github.com/llvm/llvm-project/pull/117593


[llvm-branch-commits] [llvm] AMDGPU: MC support for v_cvt_scalef32_pk_fp4_{f|bf}16 on gfx950. (PR #117594)

2024-11-25 Thread Shilei Tian via llvm-branch-commits

https://github.com/shiltian approved this pull request.


https://github.com/llvm/llvm-project/pull/117594


[llvm-branch-commits] [llvm] AMDGPU: MC support for v_cvt_scalef32_pk_{bf|f}16_{bf|fp}8 of gfx950. (PR #117593)

2024-11-25 Thread Shilei Tian via llvm-branch-commits

https://github.com/shiltian edited 
https://github.com/llvm/llvm-project/pull/117593


[llvm-branch-commits] [clang] [llvm] AMDGPU: Add support for v_ashr_pk_i8/u8_i32 instructions for gfx950 (PR #117596)

2024-11-25 Thread Shilei Tian via llvm-branch-commits

https://github.com/shiltian approved this pull request.


https://github.com/llvm/llvm-project/pull/117596


[llvm-branch-commits] [clang] [llvm] AMDGPU: Add support for v_ashr_pk_i8/u8_i32 instructions for gfx950 (PR #117596)

2024-11-25 Thread Shilei Tian via llvm-branch-commits

https://github.com/shiltian edited 
https://github.com/llvm/llvm-project/pull/117596


[llvm-branch-commits] [clang] [llvm] AMDGPU: Add support for v_dot2_f32_bf16 instruction for gfx950 (PR #117597)

2024-11-25 Thread Shilei Tian via llvm-branch-commits

https://github.com/shiltian approved this pull request.


https://github.com/llvm/llvm-project/pull/117597


[llvm-branch-commits] [clang] [llvm] AMDGPU: Add support for v_dot2_f32_bf16 instruction for gfx950 (PR #117597)

2024-11-25 Thread Shilei Tian via llvm-branch-commits

https://github.com/shiltian edited 
https://github.com/llvm/llvm-project/pull/117597


[llvm-branch-commits] [clang] [llvm] AMDGPU: Add support for v_dot2c_f32_bf16 instruction for gfx950 (PR #117598)

2024-11-25 Thread Shilei Tian via llvm-branch-commits

https://github.com/shiltian approved this pull request.


https://github.com/llvm/llvm-project/pull/117598


[llvm-branch-commits] [clang] [llvm] AMDGPU: Support v_cvt_scalef32_2xpk16_{bf|fp}6_f32 for gfx950. (PR #117595)

2024-11-25 Thread Shilei Tian via llvm-branch-commits

https://github.com/shiltian approved this pull request.


https://github.com/llvm/llvm-project/pull/117595


[llvm-branch-commits] [clang] [llvm] AMDGPU: Support v_cvt_scalef32_2xpk16_{bf|fp}6_f32 for gfx950. (PR #117595)

2024-11-25 Thread Shilei Tian via llvm-branch-commits

https://github.com/shiltian edited 
https://github.com/llvm/llvm-project/pull/117595


[llvm-branch-commits] [clang] [llvm] AMDGPU: Support buffer_atomic_pk_add_bf16 for gfx950 (PR #117599)

2024-11-25 Thread Shilei Tian via llvm-branch-commits

https://github.com/shiltian edited 
https://github.com/llvm/llvm-project/pull/117599


[llvm-branch-commits] [llvm] AMDGPU: Add minimum3/maximum3 pkf16 for gfx950 encodings (PR #117601)

2024-11-25 Thread Shilei Tian via llvm-branch-commits

https://github.com/shiltian edited 
https://github.com/llvm/llvm-project/pull/117601


[llvm-branch-commits] [llvm] AMDGPU: Add encodings for minimum3/maximum3 f32 for gfx950 (PR #117600)

2024-11-25 Thread Shilei Tian via llvm-branch-commits

https://github.com/shiltian edited 
https://github.com/llvm/llvm-project/pull/117600


[llvm-branch-commits] [llvm] AMDGPU: Add encodings for minimum3/maximum3 f32 for gfx950 (PR #117600)

2024-11-25 Thread Shilei Tian via llvm-branch-commits

https://github.com/shiltian approved this pull request.


https://github.com/llvm/llvm-project/pull/117600


[llvm-branch-commits] [clang] [llvm] AMDGPU: Support buffer_atomic_pk_add_bf16 for gfx950 (PR #117599)

2024-11-25 Thread Shilei Tian via llvm-branch-commits

https://github.com/shiltian approved this pull request.


https://github.com/llvm/llvm-project/pull/117599


[llvm-branch-commits] [llvm] AMDGPU: Add minimum3/maximum3 pkf16 for gfx950 encodings (PR #117601)

2024-11-25 Thread Shilei Tian via llvm-branch-commits

https://github.com/shiltian approved this pull request.


https://github.com/llvm/llvm-project/pull/117601


[llvm-branch-commits] [clang] [TySan] A Type Sanitizer (Clang) (PR #76260)

2024-11-25 Thread Erich Keane via llvm-branch-commits

https://github.com/erichkeane commented:

I've requested a pair of minor changes; otherwise this looks about right. I'm 
not sure who the right person to approve this is, though.


https://github.com/llvm/llvm-project/pull/76260


[llvm-branch-commits] [clang] [TySan] A Type Sanitizer (Clang) (PR #76260)

2024-11-25 Thread Erich Keane via llvm-branch-commits


@@ -5740,7 +5740,8 @@ void CodeGenModule::EmitGlobalVarDefinition(const VarDecl *D,
   if (NeedsGlobalCtor || NeedsGlobalDtor)
 EmitCXXGlobalVarDeclInitFunc(D, GV, NeedsGlobalCtor);
 
-  SanitizerMD->reportGlobal(GV, *D, NeedsGlobalCtor);
+  SanitizerMD->reportGlobalToASan(GV, *D, NeedsGlobalCtor);

erichkeane wrote:

This has happened a few times. I would suggest keeping `reportGlobal`, 
documenting that it does BOTH of these things, and using `reportGlobalToASan` 
in the few cases where you only need ASan.
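
A minimal sketch of that shape, using a hypothetical stand-in class rather than the real `SanitizerMetadata` interface (the `reportGlobalToTySan` name and the string log are assumptions made purely for illustration):

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for the sanitizer metadata helper; not the real Clang
// class. Each report is recorded as a string so the behavior is observable.
class SanitizerMetadataSketch {
public:
  // ASan-only entry point, for the few call sites that need just that.
  void reportGlobalToASan(const std::string &Name) {
    Log.push_back("asan:" + Name);
  }

  // Assumed name for the TySan-specific half of the work.
  void reportGlobalToTySan(const std::string &Name) {
    Log.push_back("tysan:" + Name);
  }

  // Default entry point, documented as doing BOTH reports.
  void reportGlobal(const std::string &Name) {
    reportGlobalToASan(Name);
    reportGlobalToTySan(Name);
  }

  const std::vector<std::string> &log() const { return Log; }

private:
  std::vector<std::string> Log;
};
```

With this layout, existing callers of `reportGlobal` would stay unchanged, and only the ASan-specific call sites would need to name `reportGlobalToASan` explicitly.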

https://github.com/llvm/llvm-project/pull/76260


[llvm-branch-commits] [clang] [TySan] A Type Sanitizer (Clang) (PR #76260)

2024-11-25 Thread Erich Keane via llvm-branch-commits

https://github.com/erichkeane edited 
https://github.com/llvm/llvm-project/pull/76260


[llvm-branch-commits] [compiler-rt] 1a6525e - Revert "[profile] Use base+vaddr for `__llvm_write_binary_ids` note pointers …"

2024-11-25 Thread via llvm-branch-commits

Author: Petr Hosek
Date: 2024-11-25T11:53:16-08:00
New Revision: 1a6525e438abfe54708f14b9ceec27c0e337f336

URL: 
https://github.com/llvm/llvm-project/commit/1a6525e438abfe54708f14b9ceec27c0e337f336
DIFF: 
https://github.com/llvm/llvm-project/commit/1a6525e438abfe54708f14b9ceec27c0e337f336.diff

LOG: Revert "[profile] Use base+vaddr for `__llvm_write_binary_ids` note pointers …"

This reverts commit 667e1fadcf4376ce41f5cae7cabab9f5ccc77b15.

Added: 


Modified: 
compiler-rt/lib/profile/InstrProfilingPlatformLinux.c

Removed: 
compiler-rt/test/profile/Linux/binary-id-offset.c



diff  --git a/compiler-rt/lib/profile/InstrProfilingPlatformLinux.c b/compiler-rt/lib/profile/InstrProfilingPlatformLinux.c
index 5b230c1b200623..613cfb60857cf3 100644
--- a/compiler-rt/lib/profile/InstrProfilingPlatformLinux.c
+++ b/compiler-rt/lib/profile/InstrProfilingPlatformLinux.c
@@ -194,33 +194,41 @@ static int WriteBinaryIds(ProfDataWriter *Writer, const ElfW(Nhdr) * Note,
  */
 COMPILER_RT_VISIBILITY int __llvm_write_binary_ids(ProfDataWriter *Writer) {
   extern const ElfW(Ehdr) __ehdr_start __attribute__((visibility("hidden")));
-  extern ElfW(Dyn) _DYNAMIC[] __attribute__((weak, visibility("hidden")));
-
   const ElfW(Ehdr) *ElfHeader = &__ehdr_start;
   const ElfW(Phdr) *ProgramHeader =
   (const ElfW(Phdr) *)((uintptr_t)ElfHeader + ElfHeader->e_phoff);
 
-  /* Compute the added base address in case of position-independent code. */
-  uintptr_t Base = 0;
-  for (uint32_t I = 0; I < ElfHeader->e_phnum; I++) {
-if (ProgramHeader[I].p_type == PT_PHDR)
-  Base = (uintptr_t)ProgramHeader - ProgramHeader[I].p_vaddr;
-if (ProgramHeader[I].p_type == PT_DYNAMIC && _DYNAMIC)
-  Base = (uintptr_t)_DYNAMIC - ProgramHeader[I].p_vaddr;
-  }
-
   int TotalBinaryIdsSize = 0;
+  uint32_t I;
   /* Iterate through entries in the program header. */
-  for (uint32_t I = 0; I < ElfHeader->e_phnum; I++) {
+  for (I = 0; I < ElfHeader->e_phnum; I++) {
 /* Look for the notes segment in program header entries. */
 if (ProgramHeader[I].p_type != PT_NOTE)
   continue;
 
 /* There can be multiple notes segment, and examine each of them. */
-const ElfW(Nhdr) *Note =
-(const ElfW(Nhdr) *)(Base + ProgramHeader[I].p_vaddr);
-const ElfW(Nhdr) *NotesEnd =
-(const ElfW(Nhdr) *)((const char *)(Note) + ProgramHeader[I].p_memsz);
+const ElfW(Nhdr) * Note;
+const ElfW(Nhdr) * NotesEnd;
+/*
+ * When examining notes in file, use p_offset, which is the offset within
+ * the elf file, to find the start of notes.
+ */
+if (ProgramHeader[I].p_memsz == 0 ||
+ProgramHeader[I].p_memsz == ProgramHeader[I].p_filesz) {
+  Note = (const ElfW(Nhdr) *)((uintptr_t)ElfHeader +
+  ProgramHeader[I].p_offset);
+  NotesEnd = (const ElfW(Nhdr) *)((const char *)(Note) +
+  ProgramHeader[I].p_filesz);
+} else {
+  /*
+   * When examining notes in memory, use p_vaddr, which is the address of
+   * section after loaded to memory, to find the start of notes.
+   */
+  Note =
+  (const ElfW(Nhdr) *)((uintptr_t)ElfHeader + ProgramHeader[I].p_vaddr);
+  NotesEnd =
+  (const ElfW(Nhdr) *)((const char *)(Note) + ProgramHeader[I].p_memsz);
+}
 
 int BinaryIdsSize = WriteBinaryIds(Writer, Note, NotesEnd);
 if (TotalBinaryIdsSize == -1)

diff  --git a/compiler-rt/test/profile/Linux/binary-id-offset.c b/compiler-rt/test/profile/Linux/binary-id-offset.c
deleted file mode 100644
index c66fe82d714ce9..00
--- a/compiler-rt/test/profile/Linux/binary-id-offset.c
+++ /dev/null
@@ -1,33 +0,0 @@
-// REQUIRES: linux
-//
-// Make sure the build-id can be found in both EXEC and DYN (PIE) files,
-// even when the note's section-start is forced to a weird address.
-// (The DYN case would also apply to libraries, not explicitly tested here.)
-
-// DEFINE: %{cflags} =
-// DEFINE: %{check} = ( \
-// DEFINE: %clang_profgen -Wl,--build-id -o %t %s %{cflags}  && \
-// DEFINE: env LLVM_PROFILE_FILE=%t.profraw %run %t  && \
-// DEFINE: llvm-readelf --notes %t   && \
-// DEFINE: llvm-profdata show --binary-ids %t.profraw   \
-// DEFINE:   ) | FileCheck %s
-
-// REDEFINE: %{cflags} = -no-pie
-// RUN: %{check}
-
-// REDEFINE: %{cflags} = -pie -fPIE
-// RUN: %{check}
-
-// REDEFINE: %{cflags} = -no-pie -Wl,--section-start=.note.gnu.build-id=0x100
-// RUN: %{check}
-
-// REDEFINE: %{cflags} = -pie -fPIE -Wl,--section-start=.note.gnu.build-id=0x100
-// RUN: %{check}
-
-// CHECK-LABEL{LITERAL}: .note.gnu.build-id
-// CHECK: Build ID: [[ID:[0-9a-f]+]]
-
-// CHECK-LABEL{LITERAL}: Binary IDs:
-// CHECK-NEXT: [[ID]]
-
-int main() { return 0; }
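
The branch restored by this revert can be sketched in isolation. This is only an illustration: `PhdrSketch` and `noteRange` are hypothetical names that are not part of compiler-rt, and the code simply mirrors the `p_offset` versus `p_vaddr` selection shown in the hunk above.

```cpp
#include <cstdint>
#include <utility>

// Hypothetical, trimmed-down program header: only the fields the hunk reads.
struct PhdrSketch {
  std::uint64_t p_offset; // offset of the segment within the ELF file
  std::uint64_t p_vaddr;  // address of the segment once loaded into memory
  std::uint64_t p_filesz; // size of the segment in the file
  std::uint64_t p_memsz;  // size of the segment in memory
};

// Returns the [start, end) address range of the notes segment, relative to
// the address of the ELF header, following the same two cases as the hunk.
std::pair<std::uintptr_t, std::uintptr_t>
noteRange(std::uintptr_t ElfHeaderAddr, const PhdrSketch &P) {
  if (P.p_memsz == 0 || P.p_memsz == P.p_filesz) {
    // "In file" case: locate the notes through p_offset and size by p_filesz.
    std::uintptr_t Start =
        ElfHeaderAddr + static_cast<std::uintptr_t>(P.p_offset);
    return {Start, Start + static_cast<std::uintptr_t>(P.p_filesz)};
  }
  // "In memory" case: locate the notes through p_vaddr and size by p_memsz.
  std::uintptr_t Start = ElfHeaderAddr + static_cast<std::uintptr_t>(P.p_vaddr);
  return {Start, Start + static_cast<std::uintptr_t>(P.p_memsz)};
}
```

Note that both cases add the value to the ELF header address, whereas the reverted commit added `p_vaddr` to a separately computed load base.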




[llvm-branch-commits] [llvm] release/19.x: [InstCombine] Drop noundef attributes in `foldCttzCtlz` (#116718) (PR #116865)

2024-11-25 Thread Nikita Popov via llvm-branch-commits

https://github.com/nikic milestoned 
https://github.com/llvm/llvm-project/pull/116865


[llvm-branch-commits] [llvm] AMDGPU: Handle vcmpx+permalane gfx950 hazard (PR #117286)

2024-11-25 Thread Shilei Tian via llvm-branch-commits

https://github.com/shiltian edited 
https://github.com/llvm/llvm-project/pull/117286


[llvm-branch-commits] [llvm] AMDGPU: Handle vcmpx+permalane gfx950 hazard (PR #117286)

2024-11-25 Thread Shilei Tian via llvm-branch-commits

https://github.com/shiltian approved this pull request.


https://github.com/llvm/llvm-project/pull/117286


[llvm-branch-commits] [llvm] AMDGPU: Handle gfx950 valu write vdst + permlane read hazard (PR #117287)

2024-11-25 Thread Shilei Tian via llvm-branch-commits


@@ -2551,8 +2551,34 @@ int GCNHazardRecognizer::checkPermlaneHazards(MachineInstr *MI) {
 return isVCmpXWritesExec(*TII, *TRI, MI);
   };
 
-  const int NumWaitStates = 4;
-  return NumWaitStates - getWaitStatesSince(IsVCmpXWritesExecFn, NumWaitStates);
+  auto IsVALUFn = [](const MachineInstr &MI) {
+return SIInstrInfo::isVALU(MI);
+  };
+
+  const int VCmpXWritesExecWaitStates = 4;
+  const int VALUWritesVDstWaitStates = 2;
+  int WaitStatesNeeded = 0;
+
+  for (const MachineOperand &Op : MI->explicit_uses()) {
+if (!Op.isReg() || !TRI->isVGPR(MF.getRegInfo(), Op.getReg()))
+  continue;
+Register Reg = Op.getReg();
+
+int WaitStatesSinceDef =
+VALUWritesVDstWaitStates -
+getWaitStatesSinceDef(Reg, IsVALUFn,
+  /*MaxWaitStates=*/VALUWritesVDstWaitStates);

shiltian wrote:

`/*MaxWaitStates=*/` is not needed here, as `VALUWritesVDstWaitStates` is a 
variable.
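
For reference, the "required minus elapsed" arithmetic in the quoted hunk can be sketched on its own. This is only an illustration: `waitStatesStillNeeded` is a hypothetical helper, and the clamp to zero is an assumption of the sketch rather than a copy of what `GCNHazardRecognizer` does.

```cpp
#include <algorithm>

// Hypothetical helper: given how many wait states a hazard requires and how
// many have already elapsed since the hazardous def, return how many are
// still needed. The result is clamped so it never goes negative.
int waitStatesStillNeeded(int Required, int Elapsed) {
  return std::max(0, Required - Elapsed);
}
```

With a requirement of 2 wait states, a def in the immediately preceding slot (0 elapsed) would still need 2, while a def 2 or more slots back would need none.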

https://github.com/llvm/llvm-project/pull/117287


[llvm-branch-commits] [clang] Add documentation for Multilib custom flags (PR #114998)

2024-11-25 Thread Victor Campos via llvm-branch-commits

https://github.com/vhscampos updated 
https://github.com/llvm/llvm-project/pull/114998

>From be0d5d6ee15e22b78a6fe671dc4f665680fd2aa5 Mon Sep 17 00:00:00 2001
From: Victor Campos 
Date: Tue, 5 Nov 2024 14:22:06 +
Subject: [PATCH 1/2] Add documentation for Multilib custom flags

---
 clang/docs/Multilib.rst | 90 +
 1 file changed, 90 insertions(+)

diff --git a/clang/docs/Multilib.rst b/clang/docs/Multilib.rst
index 7637d0db9565b8..85cb789b9847ac 100644
--- a/clang/docs/Multilib.rst
+++ b/clang/docs/Multilib.rst
@@ -122,6 +122,78 @@ subclass and a suitable base multilib variant is present then the
 It is the responsibility of layered multilib authors to ensure that headers and
 libraries in each layer are complete enough to mask any incompatibilities.
 
+Multilib custom flags
+=====================
+
+Introduction
+------------
+
+The multilib mechanism supports library variants that correspond to target,
+code generation or language command-line flags. Examples include ``--target``,
+``-mcpu``, ``-mfpu``, ``-mbranch-protection``, ``-fno-rtti``. However, some library
+variants are particular to features that do not correspond to any command-line
+option. Multithreading and semihosting, for instance, have no associated
+compiler option.
+
+In order to support the selection of variants for which no compiler option
+exists, the multilib specification includes the concept of *custom flags*.
+These flags have no impact on code generation and are only used in the multilib
+processing.
+
+Multilib custom flags follow this format in the driver invocation:
+
+::
+
+  -fmultilib-flag=
+
+They are fed into the multilib system alongside the remaining flags.
+
+Custom flag declarations
+------------------------
+
+Custom flags can be declared in the YAML file under the *Flags* section.
+
+.. code-block:: yaml
+
+  Flags:
+  - Name: multithreaded
+    Values:
+    - Name: no-multithreaded
+      DriverArgs: [-D__SINGLE_THREAD__]
+    - Name: multithreaded
+    Default: no-multithreaded
+
+* Name: the name to categorize a flag.
+* Values: a list of flag *Value*s (defined below).
+* Default: it specifies the name of the value this flag should take if not
+  specified in the command-line invocation. It must be one value from the Values
+  field.
+
+A Default value is useful to save users from specifying custom flags that have a
+most commonly used value.
+
+Each flag *Value* is defined as:
+
+* Name: name of the value. This is the string to be used in
+  ``-fmultilib-flag=``.
+* DriverArgs: a list of strings corresponding to the extra driver arguments
+  used to build a library variant that's in accordance to this specific custom
+  flag value. These arguments are fed back into the driver if this flag *Value*
+  is enabled.
+
+The namespace of flag values is common across all flags. This means that flag
+value names must be unique.
+
+Usage of custom flags in the *Variants* specifications
+------------------------------------------------------
+
+Library variants should list their requirement on one or more custom flags like
+they do for any other flag. Each requirement must be listed as
+``-fmultilib-flag=``.
+
+A variant that does not specify a requirement on one particular flag can be
+matched against any value of that flag.
+
 Stability
 =========
 
@@ -222,6 +294,24 @@ For a more comprehensive example see
     # Flags is a list of one or more strings.
     Flags: [--target=thumbv7m-none-eabi]
 
+  # Custom flag declarations. Each item is a different declaration.
+  Flags:
+    # Name of the flag
+  - Name: multithreaded
+    # List of custom flag values
+    Values:
+      # Name of the custom flag value. To be used in -fmultilib-flag=.
+    - Name: no-multithreaded
+      # Extra driver arguments to be printed with -print-multi-lib. Useful for
+      # specifying extra arguments for building the associated library
+      # variant(s).
+      DriverArgs: [-D__SINGLE_THREAD__]
+    - Name: multithreaded
+    # Default flag value. If no value for this flag declaration is used in the
+    # command-line, the multilib system will use this one. Must be equal to one
+    # of the flag value names from this flag declaration.
+    Default: no-multithreaded
+
 Design principles
 =================
 

>From a940ccd9eec0f683df9f41f2a9e218df76357364 Mon Sep 17 00:00:00 2001
From: Victor Campos 
Date: Mon, 25 Nov 2024 15:07:57 +
Subject: [PATCH 2/2] Fix doc build warning

---
 clang/docs/Multilib.rst | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/clang/docs/Multilib.rst b/clang/docs/Multilib.rst
index 85cb789b9847ac..48d84087dda01c 100644
--- a/clang/docs/Multilib.rst
+++ b/clang/docs/Multilib.rst
@@ -164,7 +164,7 @@ Custom flags can be declared in the YAML file under the *Flags* section.
     Default: no-multithreaded
 
 * Name: the name to categorize a flag.
-* Values: a list of flag *Value*s (defined below).
+* Values: a list of flag Values (defined b

[llvm-branch-commits] [llvm] AMDGPU: MC support for v_cvt_scalef32_pk_f32_[fp|bf]8 of gfx950. (PR #117383)

2024-11-25 Thread Shilei Tian via llvm-branch-commits

https://github.com/shiltian approved this pull request.


https://github.com/llvm/llvm-project/pull/117383


[llvm-branch-commits] [clang] [llvm] AMDGPU: Support v_cvt_scalef32_pk32_{bf|f}6_{bf|fp}16 for gfx950 (PR #117592)

2024-11-25 Thread Matt Arsenault via llvm-branch-commits

https://github.com/arsenm created 
https://github.com/llvm/llvm-project/pull/117592

Co-authored-by: Pravin Jagtap 

>From 3ba5c37284ce7df02470662c790cc5280e0a62a2 Mon Sep 17 00:00:00 2001
From: Pravin Jagtap 
Date: Mon, 8 Apr 2024 04:56:56 -0400
Subject: [PATCH] AMDGPU: Support v_cvt_scalef32_pk32_{bf|f}6_{bf|fp}16 for
 gfx950

Co-authored-by: Pravin Jagtap 
---
 clang/include/clang/Basic/BuiltinsAMDGPU.def  |   4 +
 clang/test/CodeGenOpenCL/amdgpu-features.cl   |   2 +-
 .../CodeGenOpenCL/builtins-amdgcn-gfx950.cl   |  43 ++
 llvm/include/llvm/IR/IntrinsicsAMDGPU.td  |   9 +
 llvm/lib/Target/AMDGPU/AMDGPU.td  |  17 +-
 .../Target/AMDGPU/AMDGPURegisterBankInfo.cpp  |   4 +
 llvm/lib/Target/AMDGPU/AMDGPUSubtarget.h  |   3 +
 .../Disassembler/AMDGPUDisassembler.cpp   |   1 +
 llvm/lib/Target/AMDGPU/SIInstrInfo.td |   7 +-
 llvm/lib/Target/AMDGPU/SIRegisterInfo.td  |   1 +
 llvm/lib/Target/AMDGPU/VOP3Instructions.td|  14 +
 llvm/lib/TargetParser/TargetParser.cpp|   1 +
 .../AMDGPU/llvm.amdgcn.cvt.scalef32.pk.ll | 474 ++
 llvm/test/MC/AMDGPU/gfx950_asm_features.s |  16 +
 llvm/test/MC/AMDGPU/gfx950_err.s  |  48 ++
 .../Disassembler/AMDGPU/gfx950_dasm_vop3.txt  |  12 +
 16 files changed, 653 insertions(+), 3 deletions(-)
 create mode 100644 llvm/test/CodeGen/AMDGPU/llvm.amdgcn.cvt.scalef32.pk.ll

diff --git a/clang/include/clang/Basic/BuiltinsAMDGPU.def b/clang/include/clang/Basic/BuiltinsAMDGPU.def
index a42ad56ce4f998..e09dc0e1107a82 100644
--- a/clang/include/clang/Basic/BuiltinsAMDGPU.def
+++ b/clang/include/clang/Basic/BuiltinsAMDGPU.def
@@ -559,6 +559,10 @@ TARGET_BUILTIN(__builtin_amdgcn_swmmac_f32_16x16x32_bf8_fp8_w64, "V4fiV2iV4fs",
 TARGET_BUILTIN(__builtin_amdgcn_swmmac_f32_16x16x32_bf8_bf8_w64, "V4fiV2iV4fs", "nc", "gfx12-insts,wavefrontsize64")
 
 TARGET_BUILTIN(__builtin_amdgcn_prng_b32, "UiUi", "nc", "prng-inst")
+TARGET_BUILTIN(__builtin_amdgcn_cvt_scalef32_pk32_fp6_f16, "V6UiV32hf", "nc", "f16bf16-to-fp6bf6-cvt-scale-insts")
+TARGET_BUILTIN(__builtin_amdgcn_cvt_scalef32_pk32_bf6_f16, "V6UiV32hf", "nc", "f16bf16-to-fp6bf6-cvt-scale-insts")
+TARGET_BUILTIN(__builtin_amdgcn_cvt_scalef32_pk32_fp6_bf16, "V6UiV32yf", "nc", "f16bf16-to-fp6bf6-cvt-scale-insts")
+TARGET_BUILTIN(__builtin_amdgcn_cvt_scalef32_pk32_bf6_bf16, "V6UiV32yf", "nc", "f16bf16-to-fp6bf6-cvt-scale-insts")
 
 #undef BUILTIN
 #undef TARGET_BUILTIN
diff --git a/clang/test/CodeGenOpenCL/amdgpu-features.cl 
b/clang/test/CodeGenOpenCL/amdgpu-features.cl
index f9e07fbc6b0480..56013dad9b6651 100644
--- a/clang/test/CodeGenOpenCL/amdgpu-features.cl
+++ b/clang/test/CodeGenOpenCL/amdgpu-features.cl
@@ -89,7 +89,7 @@
 // GFX941: 
"target-features"="+16-bit-insts,+atomic-buffer-global-pk-add-f16-insts,+atomic-ds-pk-add-16-insts,+atomic-fadd-rtn-insts,+atomic-flat-pk-add-16-insts,+atomic-global-pk-add-bf16-inst,+ci-insts,+dl-insts,+dot1-insts,+dot10-insts,+dot2-insts,+dot3-insts,+dot4-insts,+dot5-insts,+dot6-insts,+dot7-insts,+dpp,+fp8-conversion-insts,+fp8-insts,+gfx8-insts,+gfx9-insts,+gfx90a-insts,+gfx940-insts,+mai-insts,+s-memrealtime,+s-memtime-inst,+wavefrontsize64,+xf32-insts"
 // GFX942: 
"target-features"="+16-bit-insts,+atomic-buffer-global-pk-add-f16-insts,+atomic-ds-pk-add-16-insts,+atomic-fadd-rtn-insts,+atomic-flat-pk-add-16-insts,+atomic-global-pk-add-bf16-inst,+ci-insts,+dl-insts,+dot1-insts,+dot10-insts,+dot2-insts,+dot3-insts,+dot4-insts,+dot5-insts,+dot6-insts,+dot7-insts,+dpp,+fp8-conversion-insts,+fp8-insts,+gfx8-insts,+gfx9-insts,+gfx90a-insts,+gfx940-insts,+mai-insts,+s-memrealtime,+s-memtime-inst,+wavefrontsize64,+xf32-insts"
 // GFX9_4_Generic: 
"target-features"="+16-bit-insts,+atomic-buffer-global-pk-add-f16-insts,+atomic-ds-pk-add-16-insts,+atomic-fadd-rtn-insts,+atomic-flat-pk-add-16-insts,+atomic-global-pk-add-bf16-inst,+ci-insts,+dl-insts,+dot1-insts,+dot10-insts,+dot2-insts,+dot3-insts,+dot4-insts,+dot5-insts,+dot6-insts,+dot7-insts,+dpp,+gfx8-insts,+gfx9-insts,+gfx90a-insts,+gfx940-insts,+mai-insts,+s-memrealtime,+s-memtime-inst,+wavefrontsize64"
-// GFX950: 
"target-features"="+16-bit-insts,+atomic-buffer-global-pk-add-f16-insts,+atomic-ds-pk-add-16-insts,+atomic-fadd-rtn-insts,+atomic-flat-pk-add-16-insts,+atomic-global-pk-add-bf16-inst,+ci-insts,+dl-insts,+dot1-insts,+dot10-insts,+dot2-insts,+dot3-insts,+dot4-insts,+dot5-insts,+dot6-insts,+dot7-insts,+dpp,+fp8-conversion-insts,+fp8-insts,+gfx8-insts,+gfx9-insts,+gfx90a-insts,+gfx940-insts,+gfx950-insts,+mai-insts,+permlane16-swap,+permlane32-swap,+prng-inst,+s-memrealtime,+s-memtime-inst,+wavefrontsize64"
+// GFX950: 
"target-features"="+16-bit-insts,+atomic-buffer-global-pk-add-f16-insts,+atomic-ds-pk-add-16-insts,+atomic-fadd-rtn-insts,+atomic-flat-pk-add-16-insts,+atomic-global-pk-add-bf16-inst,+ci-insts,+dl-insts,+dot1-insts,+dot10-insts,+dot2-insts,+dot3-insts,+dot4-insts,+dot5-insts,+dot6-insts,+dot7-insts,+dpp,+f16bf16-to-fp6bf6-cvt-scale-insts,+fp8-conversion

[llvm-branch-commits] [llvm] AMDGPU: MC support for v_cvt_scalef32_pk_{bf|f}16_{bf|fp}8 of gfx950. (PR #117593)

2024-11-25 Thread Matt Arsenault via llvm-branch-commits

https://github.com/arsenm created 
https://github.com/llvm/llvm-project/pull/117593

OPSEL[0] selects src_word to read.

Co-authored-by: Pravin Jagtap 

>From b4657178189eac34b30147a2e9343616ee5ea8b7 Mon Sep 17 00:00:00 2001
From: Pravin Jagtap 
Date: Mon, 8 Apr 2024 07:44:32 -0400
Subject: [PATCH] AMDGPU: MC support for v_cvt_scalef32_pk_{bf|f}16_{bf|fp}8 of
 gfx950.

OPSEL[0] selects src_word to read.

Co-authored-by: Pravin Jagtap 
---
 llvm/lib/Target/AMDGPU/VOP3Instructions.td|  8 ++
 llvm/test/MC/AMDGPU/gfx950_asm_features.s | 96 +++
 llvm/test/MC/AMDGPU/gfx950_err.s  | 50 +-
 .../Disassembler/AMDGPU/gfx950_dasm_vop3.txt  | 72 ++
 4 files changed, 225 insertions(+), 1 deletion(-)

diff --git a/llvm/lib/Target/AMDGPU/VOP3Instructions.td b/llvm/lib/Target/AMDGPU/VOP3Instructions.td
index 764a2275205665..fdffb2c36dcccf 100644
--- a/llvm/lib/Target/AMDGPU/VOP3Instructions.td
+++ b/llvm/lib/Target/AMDGPU/VOP3Instructions.td
@@ -945,6 +945,8 @@ let SubtargetPredicate = HasFP8ConversionScaleInsts, mayRaiseFPException = 0 in
   defm V_CVT_SCALEF32_PK_F32_FP8 : VOP3Inst<"v_cvt_scalef32_pk_f32_fp8", VOP3_CVT_SCALE_PK_F16BF16F32_FP4FP8BF8_Profile>;
   defm V_CVT_SCALEF32_PK_FP8_F16 : VOP3Inst<"v_cvt_scalef32_pk_fp8_f16", VOP3_CVT_SCALE_PK_FP8BF8_F16BF16_Profile>;
   defm V_CVT_SCALEF32_PK_FP8_BF16 : VOP3Inst<"v_cvt_scalef32_pk_fp8_bf16", VOP3_CVT_SCALE_PK_FP8BF8_F16BF16_Profile>;
+  defm V_CVT_SCALEF32_PK_F16_FP8: VOP3Inst<"v_cvt_scalef32_pk_f16_fp8",  VOP3_CVT_SCALE_PK_F16BF16F32_FP4FP8BF8_Profile>;
+  defm V_CVT_SCALEF32_PK_BF16_FP8   : VOP3Inst<"v_cvt_scalef32_pk_bf16_fp8", VOP3_CVT_SCALE_PK_F16BF16F32_FP4FP8BF8_Profile>;
 }
 
 let SubtargetPredicate = HasBF8ConversionScaleInsts, mayRaiseFPException = 0 in {
@@ -954,6 +956,8 @@ let SubtargetPredicate = HasBF8ConversionScaleInsts, mayRaiseFPException = 0 in
   defm V_CVT_SCALEF32_PK_F32_BF8 : VOP3Inst<"v_cvt_scalef32_pk_f32_bf8", VOP3_CVT_SCALE_PK_F16BF16F32_FP4FP8BF8_Profile>;
   defm V_CVT_SCALEF32_PK_BF8_F16 : VOP3Inst<"v_cvt_scalef32_pk_bf8_f16", VOP3_CVT_SCALE_PK_FP8BF8_F16BF16_Profile>;
   defm V_CVT_SCALEF32_PK_BF8_BF16 : VOP3Inst<"v_cvt_scalef32_pk_bf8_bf16", VOP3_CVT_SCALE_PK_FP8BF8_F16BF16_Profile>;
+  defm V_CVT_SCALEF32_PK_F16_BF8: VOP3Inst<"v_cvt_scalef32_pk_f16_bf8",  VOP3_CVT_SCALE_PK_F16BF16F32_FP4FP8BF8_Profile>;
+  defm V_CVT_SCALEF32_PK_BF16_BF8   : VOP3Inst<"v_cvt_scalef32_pk_bf16_bf8", VOP3_CVT_SCALE_PK_F16BF16F32_FP4FP8BF8_Profile>;
 }
 
 let SubtargetPredicate = HasFP4ConversionScaleInsts, mayRaiseFPException = 0 in {
@@ -1908,6 +1912,8 @@ defm V_CVT_SCALEF32_PK_FP8_F32 : VOP3OpSel_Real_gfx9 <0x235>;
 defm V_CVT_SCALEF32_PK_F32_FP8 : VOP3OpSel_Real_gfx9 <0x239>;
 defm V_CVT_SCALEF32_PK_FP8_F16 : VOP3OpSel_Real_gfx9 <0x240>;
 defm V_CVT_SCALEF32_PK_FP8_BF16: VOP3OpSel_Real_gfx9 <0x244>;
+defm V_CVT_SCALEF32_PK_F16_FP8  : VOP3OpSel_Real_gfx9<0x248>;
+defm V_CVT_SCALEF32_PK_BF16_FP8 : VOP3OpSel_Real_gfx9<0x269>;
 }
 let OtherPredicates = [HasBF8ConversionScaleInsts] in {
 defm V_CVT_SCALEF32_F16_BF8 : VOP3OpSel_Real_gfx9 <0x24b>;
@@ -1916,6 +1922,8 @@ defm V_CVT_SCALEF32_PK_BF8_F32 : VOP3OpSel_Real_gfx9 <0x236>;
 defm V_CVT_SCALEF32_PK_F32_BF8 : VOP3OpSel_Real_gfx9 <0x23a>;
 defm V_CVT_SCALEF32_PK_BF8_F16 : VOP3OpSel_Real_gfx9 <0x241>;
 defm V_CVT_SCALEF32_PK_BF8_BF16: VOP3OpSel_Real_gfx9 <0x245>;
+defm V_CVT_SCALEF32_PK_F16_BF8  : VOP3OpSel_Real_gfx9<0x249>;
+defm V_CVT_SCALEF32_PK_BF16_BF8 : VOP3OpSel_Real_gfx9<0x26a>;
 }
 let OtherPredicates = [HasFP4ConversionScaleInsts] in {
 defm V_CVT_SCALEF32_PK_F32_FP4 : VOP3OpSel_Real_gfx9 <0x23f>;
diff --git a/llvm/test/MC/AMDGPU/gfx950_asm_features.s b/llvm/test/MC/AMDGPU/gfx950_asm_features.s
index 1aef267537aa55..e505b6ff4ad58b 100644
--- a/llvm/test/MC/AMDGPU/gfx950_asm_features.s
+++ b/llvm/test/MC/AMDGPU/gfx950_asm_features.s
@@ -929,3 +929,99 @@ v_cvt_scalef32_pk32_fp6_bf16 v[20:25], v[10:25], v8
 // NOT-GFX950: :[[@LINE+2]]:{{[0-9]+}}: error:
 // GFX950: v_cvt_scalef32_pk32_fp6_f16 v[20:25], v[10:25], v8 ; encoding: [0x14,0x00,0x58,0xd2,0x0a,0x11,0x02,0x00]
 v_cvt_scalef32_pk32_fp6_f16 v[20:25], v[10:25], v8
+
+// NOT-GFX950: :[[@LINE+2]]:{{[0-9]+}}: error:
+// GFX950: v_cvt_scalef32_pk_f16_fp8 v1, v2, v3; encoding: [0x01,0x00,0x48,0xd2,0x02,0x07,0x02,0x00]
+v_cvt_scalef32_pk_f16_fp8 v1, v2, v3
+
+// NOT-GFX950: :[[@LINE+2]]:{{[0-9]+}}: error:
+// GFX950: v_cvt_scalef32_pk_f16_fp8 v1, v2, s3; encoding: [0x01,0x00,0x48,0xd2,0x02,0x07,0x00,0x00]
+v_cvt_scalef32_pk_f16_fp8 v1, v2, s3
+
+// NOT-GFX950: :[[@LINE+2]]:{{[0-9]+}}: error:
+// GFX950: v_cvt_scalef32_pk_f16_fp8 v1, s2, 3 ; encoding: [0x01,0x00,0x48,0xd2,0x02,0x06,0x01,0x00]
+v_cvt_scalef32_pk_f16_fp8 v1, s2, 3
+
+// NOT-GFX950: :[[@LINE+2]]:{{[0-9]+}}: error:
+// GFX950: v_cvt_scalef32_pk_f16_fp8 v1, v2, v3 op_sel:[1,0,0] ; encoding: [0x01,0x08,0x48,0xd2,0x02,0x07,0x02,0x00]
+v_cvt_scalef32_pk_f16_fp8 v1, v2, v3 op_sel:[1,0,0]
+
+// 

[llvm-branch-commits] [llvm] AMDGPU: MC support for v_cvt_scalef32_pk_{bf|f}16_{bf|fp}8 of gfx950. (PR #117593)

2024-11-25 Thread Matt Arsenault via llvm-branch-commits

arsenm wrote:

> [!WARNING]
> This pull request is not mergeable via GitHub because a downstack PR is
> open. Once all requirements are satisfied, merge this PR as a stack on
> Graphite: https://app.graphite.dev/github/pr/llvm/llvm-project/117593
> Learn more: https://graphite.dev/docs/merge-pull-requests

* **#117594**
* **#117593** 👈 (View in Graphite)
* **#117592**
* **#117591**
* **#117590**
* **#117418**
* **#117417**
* **#117384**
* **#117383**
* **#117382**
* **#117381**
* **#117380**
* **#117379**
* **#117378**
* **#117287**
* **#117286**
* **#117285**
* **#117284**
* **#117283**
* **#117263**
* **#117262**

[llvm-branch-commits] [llvm] AMDGPU: MC support for v_cvt_scalef32_pk32_{bf|f}16_{bf|fp}6 of gfx950. (PR #117591)

2024-11-25 Thread Matt Arsenault via llvm-branch-commits

arsenm wrote:

> [!WARNING]
> This pull request is not mergeable via GitHub because a downstack PR is
> open. Once all requirements are satisfied, merge this PR as a stack on
> Graphite: https://app.graphite.dev/github/pr/llvm/llvm-project/117591
> Learn more: https://graphite.dev/docs/merge-pull-requests

* **#117594**
* **#117593**
* **#117592**
* **#117591** 👈 (View in Graphite)
* **#117590**
* **#117418**
* **#117417**
* **#117384**
* **#117383**
* **#117382**
* **#117381**
* **#117380**
* **#117379**
* **#117378**
* **#117287**
* **#117286**
* **#117285**
* **#117284**
* **#117283**
* **#117263**
* **#117262**

[llvm-branch-commits] [clang] [llvm] AMDGPU: Add support for v_dot2_f32_bf16 instruction for gfx950 (PR #117597)

2024-11-25 Thread Matt Arsenault via llvm-branch-commits

https://github.com/arsenm created 
https://github.com/llvm/llvm-project/pull/117597

v_dot2_f32_bf16 was added in gfx11 along with v_dot2_f16_f16 and 
v_dot2_bf16_bf16.
All three instructions were part of Dot9 instructions in the compiler.

This patch will split existing dot9 (v_dot2_f16_f16, v_dot2_bf16_bf16, 
v_dot2_f32_bf16)
into new dot9 (v_dot2_f16_f16 and v_dot2_bf16_bf16), and dot12 
(v_dot2_f32_bf16).

All necessary changes to gfx11 and gfx12 are updated to reflect this change.
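
As a rough illustration of the split described above, the sketch below records which feature string each of the three builtins requires after this patch, taken from the `BuiltinsAMDGPU.def` hunk in this message. The lookup table and the `requiredFeature` helper are hypothetical and exist only for this example; they are not part of clang or LLVM.

```cpp
#include <map>
#include <string>

// Feature required by each builtin after the dot9/dot12 split. The pairs come
// from the BuiltinsAMDGPU.def changes in this patch; the table itself is an
// illustrative construct, not compiler code.
static const std::map<std::string, std::string> &dotBuiltinFeatures() {
  static const std::map<std::string, std::string> Table = {
      {"__builtin_amdgcn_fdot2_f16_f16", "dot9-insts"},
      {"__builtin_amdgcn_fdot2_bf16_bf16", "dot9-insts"},
      {"__builtin_amdgcn_fdot2_f32_bf16", "dot12-insts"}, // was dot9-insts
  };
  return Table;
}

// Hypothetical helper: returns the required feature, or an empty string if
// the builtin is not one of the three covered by this sketch.
std::string requiredFeature(const std::string &Builtin) {
  const auto &Table = dotBuiltinFeatures();
  auto It = Table.find(Builtin);
  return It == Table.end() ? std::string() : It->second;
}
```

With the instruction moved to its own feature, a target can enable `dot12-insts` independently of `dot9-insts`.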

Co-authored-by: Sirish Pande 

>From f221f63e40154aaf7f97acc3e48a8b7ba5659f8d Mon Sep 17 00:00:00 2001
From: Sirish Pande 
Date: Fri, 10 May 2024 17:33:59 -0500
Subject: [PATCH] AMDGPU: Add support for v_dot2_f32_bf16 instruction for
 gfx950

v_dot2_f32_bf16 was added in gfx11 along with v_dot2_f16_f16 and 
v_dot2_bf16_bf16.
All three instructions were part of Dot9 instructions in the compiler.

This patch will split existing dot9 (v_dot2_f16_f16, v_dot2_bf16_bf16, 
v_dot2_f32_bf16)
into new dot9 (v_dot2_f16_f16 and v_dot2_bf16_bf16), and dot12 
(v_dot2_f32_bf16).

All necessary changes to gfx11 and gfx12 are updated to reflect this change.

Co-authored-by: Sirish Pande 
---
 clang/include/clang/Basic/BuiltinsAMDGPU.def  |  2 +-
 clang/test/CodeGenOpenCL/amdgpu-features.cl   | 24 +++
 .../builtins-amdgcn-dl-insts-err.cl   |  4 +-
 .../CodeGenOpenCL/builtins-amdgcn-gfx950.cl   | 25 +++
 llvm/lib/Target/AMDGPU/AMDGPU.td  | 14 +++-
 llvm/lib/Target/AMDGPU/GCNSubtarget.h |  5 ++
 llvm/lib/Target/AMDGPU/VOP3PInstructions.td   |  5 +-
 llvm/lib/TargetParser/TargetParser.cpp|  3 +
 .../AMDGPU/llvm.amdgcn.fdot2.f32.bf16.ll  | 68 +++
 llvm/test/MC/AMDGPU/gfx950_dlops.s| 61 +
 .../Disassembler/AMDGPU/gfx950_dasm_vop3.txt  | 60 
 11 files changed, 253 insertions(+), 18 deletions(-)
 create mode 100644 llvm/test/MC/AMDGPU/gfx950_dlops.s

diff --git a/clang/include/clang/Basic/BuiltinsAMDGPU.def b/clang/include/clang/Basic/BuiltinsAMDGPU.def
index fd449697e91216..7d0019eead96b6 100644
--- a/clang/include/clang/Basic/BuiltinsAMDGPU.def
+++ b/clang/include/clang/Basic/BuiltinsAMDGPU.def
@@ -263,7 +263,7 @@ TARGET_BUILTIN(__builtin_amdgcn_global_load_lds, "vv*1v*3IUiIiIUi", "t", "gfx940
 TARGET_BUILTIN(__builtin_amdgcn_fdot2, "fV2hV2hfIb", "nc", "dot10-insts")
 TARGET_BUILTIN(__builtin_amdgcn_fdot2_f16_f16, "hV2hV2hh", "nc", "dot9-insts")
 TARGET_BUILTIN(__builtin_amdgcn_fdot2_bf16_bf16, "sV2sV2ss", "nc", "dot9-insts")
-TARGET_BUILTIN(__builtin_amdgcn_fdot2_f32_bf16, "fV2sV2sfIb", "nc", "dot9-insts")
+TARGET_BUILTIN(__builtin_amdgcn_fdot2_f32_bf16, "fV2sV2sfIb", "nc", "dot12-insts")
 TARGET_BUILTIN(__builtin_amdgcn_sdot2, "SiV2SsV2SsSiIb", "nc", "dot2-insts")
 TARGET_BUILTIN(__builtin_amdgcn_udot2, "UiV2UsV2UsUiIb", "nc", "dot2-insts")
 TARGET_BUILTIN(__builtin_amdgcn_sdot4, "SiSiSiSiIb", "nc", "dot1-insts")
diff --git a/clang/test/CodeGenOpenCL/amdgpu-features.cl 
b/clang/test/CodeGenOpenCL/amdgpu-features.cl
index db7fd76ec91189..0b698035ee54c7 100644
--- a/clang/test/CodeGenOpenCL/amdgpu-features.cl
+++ b/clang/test/CodeGenOpenCL/amdgpu-features.cl
@@ -89,7 +89,7 @@
 // GFX941: 
"target-features"="+16-bit-insts,+atomic-buffer-global-pk-add-f16-insts,+atomic-ds-pk-add-16-insts,+atomic-fadd-rtn-insts,+atomic-flat-pk-add-16-insts,+atomic-global-pk-add-bf16-inst,+ci-insts,+dl-insts,+dot1-insts,+dot10-insts,+dot2-insts,+dot3-insts,+dot4-insts,+dot5-insts,+dot6-insts,+dot7-insts,+dpp,+fp8-conversion-insts,+fp8-insts,+gfx8-insts,+gfx9-insts,+gfx90a-insts,+gfx940-insts,+mai-insts,+s-memrealtime,+s-memtime-inst,+wavefrontsize64,+xf32-insts"
 // GFX942: 
"target-features"="+16-bit-insts,+atomic-buffer-global-pk-add-f16-insts,+atomic-ds-pk-add-16-insts,+atomic-fadd-rtn-insts,+atomic-flat-pk-add-16-insts,+atomic-global-pk-add-bf16-inst,+ci-insts,+dl-insts,+dot1-insts,+dot10-insts,+dot2-insts,+dot3-insts,+dot4-insts,+dot5-insts,+dot6-insts,+dot7-insts,+dpp,+fp8-conversion-insts,+fp8-insts,+gfx8-insts,+gfx9-insts,+gfx90a-insts,+gfx940-insts,+mai-insts,+s-memrealtime,+s-memtime-inst,+wavefrontsize64,+xf32-insts"
 // GFX9_4_Generic: 
"target-features"="+16-bit-insts,+atomic-buffer-global-pk-add-f16-insts,+atomic-ds-pk-add-16-insts,+atomic-fadd-rtn-insts,+atomic-flat-pk-add-16-insts,+atomic-global-pk-add-bf16-inst,+ci-insts,+dl-insts,+dot1-insts,+dot10-insts,+dot2-insts,+dot3-insts,+dot4-insts,+dot5-insts,+dot6-insts,+dot7-insts,+dpp,+gfx8-insts,+gfx9-insts,+gfx90a-insts,+gfx940-insts,+mai-insts,+s-memrealtime,+s-memtime-inst,+wavefrontsize64"
-// GFX950: 
"target-features"="+16-bit-insts,+ashr-pk-insts,+atomic-buffer-global-pk-add-f16-insts,+atomic-ds-pk-add-16-insts,+atomic-fadd-rtn-insts,+atomic-flat-pk-add-16-insts,+atomic-global-pk-add-bf16-inst,+ci-insts,+dl-insts,+dot1-insts,+dot10-insts,+dot2-insts,+dot3-insts,+dot4-insts,+dot5-insts,+dot6-insts,+dot7-insts,+dpp,+f16bf16-to-fp6bf6-cvt-scale-insts,+fp8-conversion-insts,+fp8-insts,+gfx

[llvm-branch-commits] [compiler-rt] [libcxx] [libcxxabi] [llvm] Reapply "[runtimes] Allow building against an installed LLVM tree" (PR #114307)

2024-11-25 Thread Alexander Richardson via llvm-branch-commits

https://github.com/arichardson updated 
https://github.com/llvm/llvm-project/pull/114307

>From 6a6483cfe53ad33d3a5cd4432c33a5af93694668 Mon Sep 17 00:00:00 2001
From: Alexander Richardson 
Date: Wed, 30 Oct 2024 14:33:11 -0700
Subject: [PATCH 1/2] [𝘀𝗽𝗿] initial version
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Created using spr 1.3.6-beta.1
---
 compiler-rt/cmake/Modules/AddCompilerRT.cmake |  1 +
 compiler-rt/test/hwasan/lit.cfg.py|  9 +
 compiler-rt/test/lit.common.configured.in |  1 +
 libcxx/CMakeLists.txt | 12 +++---
 libcxxabi/CMakeLists.txt  |  6 +--
 runtimes/CMakeLists.txt   | 40 +--
 6 files changed, 53 insertions(+), 16 deletions(-)

diff --git a/compiler-rt/cmake/Modules/AddCompilerRT.cmake 
b/compiler-rt/cmake/Modules/AddCompilerRT.cmake
index e3d81d241b1054..b2f33d1a961c74 100644
--- a/compiler-rt/cmake/Modules/AddCompilerRT.cmake
+++ b/compiler-rt/cmake/Modules/AddCompilerRT.cmake
@@ -773,6 +773,7 @@ function(configure_compiler_rt_lit_site_cfg input output)
 
   string(REPLACE ${CMAKE_CFG_INTDIR} ${LLVM_BUILD_MODE} COMPILER_RT_RESOLVED_TEST_COMPILER ${COMPILER_RT_TEST_COMPILER})
   string(REPLACE ${CMAKE_CFG_INTDIR} ${LLVM_BUILD_MODE} COMPILER_RT_RESOLVED_OUTPUT_DIR ${COMPILER_RT_OUTPUT_DIR})
+  string(REPLACE ${CMAKE_CFG_INTDIR} ${LLVM_BUILD_MODE} COMPILER_RT_RESOLVED_EXEC_OUTPUT_DIR ${COMPILER_RT_EXEC_OUTPUT_DIR})
   string(REPLACE ${CMAKE_CFG_INTDIR} ${LLVM_BUILD_MODE} COMPILER_RT_RESOLVED_LIBRARY_OUTPUT_DIR ${output_dir})
 
   configure_lit_site_cfg(${input} ${output})
diff --git a/compiler-rt/test/hwasan/lit.cfg.py b/compiler-rt/test/hwasan/lit.cfg.py
index 594f3294a84ac1..bbf23e683240ac 100644
--- a/compiler-rt/test/hwasan/lit.cfg.py
+++ b/compiler-rt/test/hwasan/lit.cfg.py
@@ -2,6 +2,9 @@
 
 import os
 
+from lit.llvm import llvm_config
+from lit.llvm.subst import ToolSubst, FindTool
+
 # Setup config name.
 config.name = "HWAddressSanitizer" + getattr(config, "name_suffix", "default")
 
@@ -74,6 +77,12 @@ def build_invocation(compile_flags):
     ("%env_hwasan_opts=", "env HWASAN_OPTIONS=" + default_hwasan_opts_str)
 )
 
+# Ensure that we can use hwasan_symbolize from the expected location
+llvm_config.add_tool_substitutions(
+    [ToolSubst("hwasan_symbolize", unresolved="fatal")],
+    search_dirs=[config.compiler_rt_bindir],
+)
+
 # Default test suffixes.
 config.suffixes = [".c", ".cpp"]
 
diff --git a/compiler-rt/test/lit.common.configured.in b/compiler-rt/test/lit.common.configured.in
index 66935c358afedd..050792b6b26217 100644
--- a/compiler-rt/test/lit.common.configured.in
+++ b/compiler-rt/test/lit.common.configured.in
@@ -28,6 +28,7 @@ set_default("python_executable", "@Python3_EXECUTABLE@")
 set_default("compiler_rt_debug", @COMPILER_RT_DEBUG_PYBOOL@)
 set_default("compiler_rt_intercept_libdispatch", @COMPILER_RT_INTERCEPT_LIBDISPATCH_PYBOOL@)
 set_default("compiler_rt_output_dir", "@COMPILER_RT_RESOLVED_OUTPUT_DIR@")
+set_default("compiler_rt_bindir", "@COMPILER_RT_RESOLVED_EXEC_OUTPUT_DIR@")
 set_default("compiler_rt_libdir", "@COMPILER_RT_RESOLVED_LIBRARY_OUTPUT_DIR@")
 set_default("emulator", "@COMPILER_RT_EMULATOR@")
 set_default("asan_shadow_scale", "@COMPILER_RT_ASAN_SHADOW_SCALE@")
diff --git a/libcxx/CMakeLists.txt b/libcxx/CMakeLists.txt
index 95a7d10f055ea7..7b3f032fd82126 100644
--- a/libcxx/CMakeLists.txt
+++ b/libcxx/CMakeLists.txt
@@ -413,9 +413,9 @@ if(LLVM_ENABLE_PER_TARGET_RUNTIME_DIR AND NOT APPLE)
     string(APPEND LIBCXX_TARGET_SUBDIR /${LIBCXX_LIBDIR_SUBDIR})
   endif()
   set(LIBCXX_LIBRARY_DIR ${LLVM_LIBRARY_OUTPUT_INTDIR}/${LIBCXX_TARGET_SUBDIR})
-  set(LIBCXX_GENERATED_INCLUDE_DIR "${LLVM_BINARY_DIR}/include/c++/v1")
-  set(LIBCXX_GENERATED_MODULE_DIR "${LLVM_BINARY_DIR}/modules/c++/v1")
-  set(LIBCXX_GENERATED_INCLUDE_TARGET_DIR "${LLVM_BINARY_DIR}/include/${LIBCXX_TARGET_SUBDIR}/c++/v1")
+  set(LIBCXX_GENERATED_INCLUDE_DIR "${LIBCXX_BINARY_DIR}/include/c++/v1")
+  set(LIBCXX_GENERATED_MODULE_DIR "${LIBCXX_BINARY_DIR}/modules/c++/v1")
+  set(LIBCXX_GENERATED_INCLUDE_TARGET_DIR "${LIBCXX_BINARY_DIR}/include/${LIBCXX_TARGET_SUBDIR}/c++/v1")
   set(LIBCXX_INSTALL_LIBRARY_DIR lib${LLVM_LIBDIR_SUFFIX}/${LIBCXX_TARGET_SUBDIR} CACHE STRING
   "Path where built libc++ libraries should be installed.")
   set(LIBCXX_INSTALL_INCLUDE_TARGET_DIR "${CMAKE_INSTALL_INCLUDEDIR}/${LIBCXX_TARGET_SUBDIR}/c++/v1" CACHE STRING
@@ -424,13 +424,11 @@ if(LLVM_ENABLE_PER_TARGET_RUNTIME_DIR AND NOT APPLE)
 else()
   if(LLVM_LIBRARY_OUTPUT_INTDIR)
     set(LIBCXX_LIBRARY_DIR ${LLVM_LIBRARY_OUTPUT_INTDIR})
-    set(LIBCXX_GENERATED_INCLUDE_DIR "${LLVM_BINARY_DIR}/include/c++/v1")
-    set(LIBCXX_GENERATED_MODULE_DIR "${LLVM_BINARY_DIR}/modules/c++/v1")
   else()
     set(LIBCXX_LIBRARY_DIR ${CMAKE_BINARY_DIR}/lib${LIBCXX_LIBDIR_SUFFIX})
-set(LIB
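
Note on the compiler-rt/test/hwasan/lit.cfg.py hunk in the patch above: it registers hwasan_symbolize as a tool substitution that is searched for only in config.compiler_rt_bindir, with unresolved="fatal" so that a missing tool is treated as an error. The Python sketch below mimics that kind of lookup outside of lit, purely as an illustration. `resolve_tool` and its parameters are hypothetical names, not lit APIs, and lit's actual resolution logic may differ.

```python
import os
import shutil


def resolve_tool(name, search_dirs, unresolved="warn"):
    """Hypothetical stand-in for a tool lookup restricted to search_dirs.

    Returns the full path of the executable `name` if it is found in one of
    `search_dirs`; otherwise returns None, or raises when unresolved="fatal".
    """
    # Restrict the search to the given directories instead of $PATH.
    found = shutil.which(name, path=os.pathsep.join(search_dirs))
    if found is None and unresolved == "fatal":
        raise FileNotFoundError(f"{name} not found in {search_dirs}")
    return found
```

With a fatal policy, a misconfigured binary directory surfaces as an immediate failure instead of a test silently using some other copy of the tool.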

[llvm-branch-commits] [llvm] AMDGPU: MC support for v_cvt_scalef32_pk32_{bf|f}16_{bf|fp}6 of gfx950. (PR #117591)

2024-11-25 Thread Matt Arsenault via llvm-branch-commits

https://github.com/arsenm ready_for_review https://github.com/llvm/llvm-project/pull/117591
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [clang] [llvm] AMDGPU: Support v_cvt_scalef32_pk32_{bf|f}6_{bf|fp}16 for gfx950 (PR #117592)

2024-11-25 Thread Matt Arsenault via llvm-branch-commits

arsenm wrote:

> [!WARNING]
> This pull request is not mergeable via GitHub because a downstack PR is open. Once all requirements are satisfied, merge this PR as a stack [on Graphite](https://app.graphite.dev/github/pr/llvm/llvm-project/117592?utm_source=stack-comment-downstack-mergeability-warning).
> [Learn more](https://graphite.dev/docs/merge-pull-requests)

* **#117594** [Graphite](https://app.graphite.dev/github/pr/llvm/llvm-project/117594?utm_source=stack-comment-icon)
* **#117593** [Graphite](https://app.graphite.dev/github/pr/llvm/llvm-project/117593?utm_source=stack-comment-icon)
* **#117592** [Graphite](https://app.graphite.dev/github/pr/llvm/llvm-project/117592?utm_source=stack-comment-icon) 👈 [(View in Graphite)](https://app.graphite.dev/github/pr/llvm/llvm-project/117592?utm_source=stack-comment-view-in-graphite)
* **#117591** [Graphite](https://app.graphite.dev/github/pr/llvm/llvm-project/117591?utm_source=stack-comment-icon)
* **#117590** [Graphite](https://app.graphite.dev/github/pr/llvm/llvm-project/117590?utm_source=stack-comment-icon)
* **#117418** [Graphite](https://app.graphite.dev/github/pr/llvm/llvm-project/117418?utm_source=stack-comment-icon)
* **#117417** [Graphite](https://app.graphite.dev/github/pr/llvm/llvm-project/117417?utm_source=stack-comment-icon)
* **#117384** [Graphite](https://app.graphite.dev/github/pr/llvm/llvm-project/117384?utm_source=stack-comment-icon)
* **#117383** [Graphite](https://app.graphite.dev/github/pr/llvm/llvm-project/117383?utm_source=stack-comment-icon)
* **#117382** [Graphite](https://app.graphite.dev/github/pr/llvm/llvm-project/117382?utm_source=stack-comment-icon)
* **#117381** [Graphite](https://app.graphite.dev/github/pr/llvm/llvm-project/117381?utm_source=stack-comment-icon)
* **#117380** [Graphite](https://app.graphite.dev/github/pr/llvm/llvm-project/117380?utm_source=stack-comment-icon)
* **#117379** [Graphite](https://app.graphite.dev/github/pr/llvm/llvm-project/117379?utm_source=stack-comment-icon)
* **#117378** [Graphite](https://app.graphite.dev/github/pr/llvm/llvm-project/117378?utm_source=stack-comment-icon)
* **#117287** [Graphite](https://app.graphite.dev/github/pr/llvm/llvm-project/117287?utm_source=stack-comment-icon)
* **#117286** [Graphite](https://app.graphite.dev/github/pr/llvm/llvm-project/117286?utm_source=stack-comment-icon)
* **#117285** [Graphite](https://app.graphite.dev/github/pr/llvm/llvm-project/117285?utm_source=stack-comment-icon)
* **#117284** [Graphite](https://app.graphite.dev/github/pr/llvm/llvm-project/117284?utm_source=stack-comment-icon)
* **#117283** [Graphite](https://app.graphite.dev/github/pr/llvm/llvm-project/117283?utm_source=stack-comment-icon)
* **#117263** [Graphite](https://app.graphite.dev/github/pr/llvm/llvm-project/117263?utm_source=stack-comment-icon)
* **#117262** https

[llvm-branch-commits] [llvm] AMDGPU: MC support for v_cvt_scalef32_pk_fp4_{f|bf}16 on gfx950. (PR #117594)

2024-11-25 Thread Matt Arsenault via llvm-branch-commits

arsenm wrote:

> [!WARNING]
> This pull request is not mergeable via GitHub because a downstack PR is open. Once all requirements are satisfied, merge this PR as a stack [on Graphite](https://app.graphite.dev/github/pr/llvm/llvm-project/117594?utm_source=stack-comment-downstack-mergeability-warning).
> [Learn more](https://graphite.dev/docs/merge-pull-requests)

* **#117594** [Graphite](https://app.graphite.dev/github/pr/llvm/llvm-project/117594?utm_source=stack-comment-icon) 👈 [(View in Graphite)](https://app.graphite.dev/github/pr/llvm/llvm-project/117594?utm_source=stack-comment-view-in-graphite)
* **#117593** [Graphite](https://app.graphite.dev/github/pr/llvm/llvm-project/117593?utm_source=stack-comment-icon)
* **#117592** [Graphite](https://app.graphite.dev/github/pr/llvm/llvm-project/117592?utm_source=stack-comment-icon)
* **#117591** [Graphite](https://app.graphite.dev/github/pr/llvm/llvm-project/117591?utm_source=stack-comment-icon)
* **#117590** [Graphite](https://app.graphite.dev/github/pr/llvm/llvm-project/117590?utm_source=stack-comment-icon)
* **#117418** [Graphite](https://app.graphite.dev/github/pr/llvm/llvm-project/117418?utm_source=stack-comment-icon)
* **#117417** [Graphite](https://app.graphite.dev/github/pr/llvm/llvm-project/117417?utm_source=stack-comment-icon)
* **#117384** [Graphite](https://app.graphite.dev/github/pr/llvm/llvm-project/117384?utm_source=stack-comment-icon)
* **#117383** [Graphite](https://app.graphite.dev/github/pr/llvm/llvm-project/117383?utm_source=stack-comment-icon)
* **#117382** [Graphite](https://app.graphite.dev/github/pr/llvm/llvm-project/117382?utm_source=stack-comment-icon)
* **#117381** [Graphite](https://app.graphite.dev/github/pr/llvm/llvm-project/117381?utm_source=stack-comment-icon)
* **#117380** [Graphite](https://app.graphite.dev/github/pr/llvm/llvm-project/117380?utm_source=stack-comment-icon)
* **#117379** [Graphite](https://app.graphite.dev/github/pr/llvm/llvm-project/117379?utm_source=stack-comment-icon)
* **#117378** [Graphite](https://app.graphite.dev/github/pr/llvm/llvm-project/117378?utm_source=stack-comment-icon)
* **#117287** [Graphite](https://app.graphite.dev/github/pr/llvm/llvm-project/117287?utm_source=stack-comment-icon)
* **#117286** [Graphite](https://app.graphite.dev/github/pr/llvm/llvm-project/117286?utm_source=stack-comment-icon)
* **#117285** [Graphite](https://app.graphite.dev/github/pr/llvm/llvm-project/117285?utm_source=stack-comment-icon)
* **#117284** [Graphite](https://app.graphite.dev/github/pr/llvm/llvm-project/117284?utm_source=stack-comment-icon)
* **#117283** [Graphite](https://app.graphite.dev/github/pr/llvm/llvm-project/117283?utm_source=stack-comment-icon)
* **#117263** [Graphite](https://app.graphite.dev/github/pr/llvm/llvm-project/117263?utm_source=stack-comment-icon)
* **#117262** https

[llvm-branch-commits] [clang] [llvm] AMDGPU: Add support for v_ashr_pk_i8/u8_i32 instructions for gfx950 (PR #117596)

2024-11-25 Thread Matt Arsenault via llvm-branch-commits

arsenm wrote:

> [!WARNING]
> This pull request is not mergeable via GitHub because a downstack PR is open. Once all requirements are satisfied, merge this PR as a stack [on Graphite](https://app.graphite.dev/github/pr/llvm/llvm-project/117596?utm_source=stack-comment-downstack-mergeability-warning).
> [Learn more](https://graphite.dev/docs/merge-pull-requests)

* **#117597** [Graphite](https://app.graphite.dev/github/pr/llvm/llvm-project/117597?utm_source=stack-comment-icon)
* **#117596** [Graphite](https://app.graphite.dev/github/pr/llvm/llvm-project/117596?utm_source=stack-comment-icon) 👈 [(View in Graphite)](https://app.graphite.dev/github/pr/llvm/llvm-project/117596?utm_source=stack-comment-view-in-graphite)
* **#117595** [Graphite](https://app.graphite.dev/github/pr/llvm/llvm-project/117595?utm_source=stack-comment-icon)
* **#117594** [Graphite](https://app.graphite.dev/github/pr/llvm/llvm-project/117594?utm_source=stack-comment-icon)
* **#117593** [Graphite](https://app.graphite.dev/github/pr/llvm/llvm-project/117593?utm_source=stack-comment-icon)
* **#117592** [Graphite](https://app.graphite.dev/github/pr/llvm/llvm-project/117592?utm_source=stack-comment-icon)
* **#117591** [Graphite](https://app.graphite.dev/github/pr/llvm/llvm-project/117591?utm_source=stack-comment-icon)
* **#117590** [Graphite](https://app.graphite.dev/github/pr/llvm/llvm-project/117590?utm_source=stack-comment-icon)
* **#117418** [Graphite](https://app.graphite.dev/github/pr/llvm/llvm-project/117418?utm_source=stack-comment-icon)
* **#117417** [Graphite](https://app.graphite.dev/github/pr/llvm/llvm-project/117417?utm_source=stack-comment-icon)
* **#117384** [Graphite](https://app.graphite.dev/github/pr/llvm/llvm-project/117384?utm_source=stack-comment-icon)
* **#117383** [Graphite](https://app.graphite.dev/github/pr/llvm/llvm-project/117383?utm_source=stack-comment-icon)
* **#117382** [Graphite](https://app.graphite.dev/github/pr/llvm/llvm-project/117382?utm_source=stack-comment-icon)
* **#117381** [Graphite](https://app.graphite.dev/github/pr/llvm/llvm-project/117381?utm_source=stack-comment-icon)
* **#117380** [Graphite](https://app.graphite.dev/github/pr/llvm/llvm-project/117380?utm_source=stack-comment-icon)
* **#117379** [Graphite](https://app.graphite.dev/github/pr/llvm/llvm-project/117379?utm_source=stack-comment-icon)
* **#117378** [Graphite](https://app.graphite.dev/github/pr/llvm/llvm-project/117378?utm_source=stack-comment-icon)
* **#117287** [Graphite](https://app.graphite.dev/github/pr/llvm/llvm-project/117287?utm_source=stack-comment-icon)
* **#117286** [Graphite](https://app.graphite.dev/github/pr/llvm/llvm-project/117286?utm_source=stack-comment-icon)
* **#117285** [Graphite](https://app.graphite.dev/github/pr/llvm/llvm-project/117285?utm_source=stack-comment-icon)
* **#117284** https

[llvm-branch-commits] [clang] [llvm] AMDGPU: Support v_cvt_scalef32_2xpk16_{bf|fp}6_f32 for gfx950. (PR #117595)

2024-11-25 Thread Matt Arsenault via llvm-branch-commits

arsenm wrote:

> [!WARNING]
> This pull request is not mergeable via GitHub because a downstack PR is open. Once all requirements are satisfied, merge this PR as a stack [on Graphite](https://app.graphite.dev/github/pr/llvm/llvm-project/117595?utm_source=stack-comment-downstack-mergeability-warning).
> [Learn more](https://graphite.dev/docs/merge-pull-requests)

* **#117596** [Graphite](https://app.graphite.dev/github/pr/llvm/llvm-project/117596?utm_source=stack-comment-icon)
* **#117595** [Graphite](https://app.graphite.dev/github/pr/llvm/llvm-project/117595?utm_source=stack-comment-icon) 👈 [(View in Graphite)](https://app.graphite.dev/github/pr/llvm/llvm-project/117595?utm_source=stack-comment-view-in-graphite)
* **#117594** [Graphite](https://app.graphite.dev/github/pr/llvm/llvm-project/117594?utm_source=stack-comment-icon)
* **#117593** [Graphite](https://app.graphite.dev/github/pr/llvm/llvm-project/117593?utm_source=stack-comment-icon)
* **#117592** [Graphite](https://app.graphite.dev/github/pr/llvm/llvm-project/117592?utm_source=stack-comment-icon)
* **#117591** [Graphite](https://app.graphite.dev/github/pr/llvm/llvm-project/117591?utm_source=stack-comment-icon)
* **#117590** [Graphite](https://app.graphite.dev/github/pr/llvm/llvm-project/117590?utm_source=stack-comment-icon)
* **#117418** [Graphite](https://app.graphite.dev/github/pr/llvm/llvm-project/117418?utm_source=stack-comment-icon)
* **#117417** [Graphite](https://app.graphite.dev/github/pr/llvm/llvm-project/117417?utm_source=stack-comment-icon)
* **#117384** [Graphite](https://app.graphite.dev/github/pr/llvm/llvm-project/117384?utm_source=stack-comment-icon)
* **#117383** [Graphite](https://app.graphite.dev/github/pr/llvm/llvm-project/117383?utm_source=stack-comment-icon)
* **#117382** [Graphite](https://app.graphite.dev/github/pr/llvm/llvm-project/117382?utm_source=stack-comment-icon)
* **#117381** [Graphite](https://app.graphite.dev/github/pr/llvm/llvm-project/117381?utm_source=stack-comment-icon)
* **#117380** [Graphite](https://app.graphite.dev/github/pr/llvm/llvm-project/117380?utm_source=stack-comment-icon)
* **#117379** [Graphite](https://app.graphite.dev/github/pr/llvm/llvm-project/117379?utm_source=stack-comment-icon)
* **#117378** [Graphite](https://app.graphite.dev/github/pr/llvm/llvm-project/117378?utm_source=stack-comment-icon)
* **#117287** [Graphite](https://app.graphite.dev/github/pr/llvm/llvm-project/117287?utm_source=stack-comment-icon)
* **#117286** [Graphite](https://app.graphite.dev/github/pr/llvm/llvm-project/117286?utm_source=stack-comment-icon)
* **#117285** [Graphite](https://app.graphite.dev/github/pr/llvm/llvm-project/117285?utm_source=stack-comment-icon)
* **#117284** [Graphite](https://app.graphite.dev/github/pr/llvm/llvm-project/117284?utm_source=stack-comment-icon)
* **#117283** https

[llvm-branch-commits] [clang] [llvm] AMDGPU: Add support for v_dot2_f32_bf16 instruction for gfx950 (PR #117597)

2024-11-25 Thread Matt Arsenault via llvm-branch-commits

arsenm wrote:

> [!WARNING]
> This pull request is not mergeable via GitHub because a downstack PR is open. Once all requirements are satisfied, merge this PR as a stack [on Graphite](https://app.graphite.dev/github/pr/llvm/llvm-project/117597?utm_source=stack-comment-downstack-mergeability-warning).
> [Learn more](https://graphite.dev/docs/merge-pull-requests)

* **#117597** [Graphite](https://app.graphite.dev/github/pr/llvm/llvm-project/117597?utm_source=stack-comment-icon) 👈 [(View in Graphite)](https://app.graphite.dev/github/pr/llvm/llvm-project/117597?utm_source=stack-comment-view-in-graphite)
* **#117596** [Graphite](https://app.graphite.dev/github/pr/llvm/llvm-project/117596?utm_source=stack-comment-icon)
* **#117595** [Graphite](https://app.graphite.dev/github/pr/llvm/llvm-project/117595?utm_source=stack-comment-icon)
* **#117594** [Graphite](https://app.graphite.dev/github/pr/llvm/llvm-project/117594?utm_source=stack-comment-icon)
* **#117593** [Graphite](https://app.graphite.dev/github/pr/llvm/llvm-project/117593?utm_source=stack-comment-icon)
* **#117592** [Graphite](https://app.graphite.dev/github/pr/llvm/llvm-project/117592?utm_source=stack-comment-icon)
* **#117591** [Graphite](https://app.graphite.dev/github/pr/llvm/llvm-project/117591?utm_source=stack-comment-icon)
* **#117590** [Graphite](https://app.graphite.dev/github/pr/llvm/llvm-project/117590?utm_source=stack-comment-icon)
* **#117418** [Graphite](https://app.graphite.dev/github/pr/llvm/llvm-project/117418?utm_source=stack-comment-icon)
* **#117417** [Graphite](https://app.graphite.dev/github/pr/llvm/llvm-project/117417?utm_source=stack-comment-icon)
* **#117384** [Graphite](https://app.graphite.dev/github/pr/llvm/llvm-project/117384?utm_source=stack-comment-icon)
* **#117383** [Graphite](https://app.graphite.dev/github/pr/llvm/llvm-project/117383?utm_source=stack-comment-icon)
* **#117382** [Graphite](https://app.graphite.dev/github/pr/llvm/llvm-project/117382?utm_source=stack-comment-icon)
* **#117381** [Graphite](https://app.graphite.dev/github/pr/llvm/llvm-project/117381?utm_source=stack-comment-icon)
* **#117380** https://app.graphite.dev/github/pr/llvm/llvm-project/117380?utm_source=stack-comment-icon";
 target="_blank">https://static.graphite.dev/graphite-32x32-black.png"; alt="Graphite" 
width="10px" height="10px"/>
* **#117379** https://app.graphite.dev/github/pr/llvm/llvm-project/117379?utm_source=stack-comment-icon";
 target="_blank">https://static.graphite.dev/graphite-32x32-black.png"; alt="Graphite" 
width="10px" height="10px"/>
* **#117378** https://app.graphite.dev/github/pr/llvm/llvm-project/117378?utm_source=stack-comment-icon";
 target="_blank">https://static.graphite.dev/graphite-32x32-black.png"; alt="Graphite" 
width="10px" height="10px"/>
* **#117287** https://app.graphite.dev/github/pr/llvm/llvm-project/117287?utm_source=stack-comment-icon";
 target="_blank">https://static.graphite.dev/graphite-32x32-black.png"; alt="Graphite" 
width="10px" height="10px"/>
* **#117286** https://app.graphite.dev/github/pr/llvm/llvm-project/117286?utm_source=stack-comment-icon";
 target="_blank">https://static.graphite.dev/graphite-32x32-black.png"; alt="Graphite" 
width="10px" height="10px"/>
* **#117285** https://app.graphite.dev/github/pr/llvm/llvm-project/117285?utm_source=stack-comment-icon";
 target="_blank">https://static.graphite.dev/graphite-32x32-black.png"; alt="Graphite" 
width="10px" height="10px"/>
* **#117284** https

[llvm-branch-commits] [llvm] AMDGPU: MC support for v_cvt_scalef32_pk32_f32_[fp|bf]6 of gfx950 (PR #117590)

2024-11-25 Thread Matt Arsenault via llvm-branch-commits

arsenm wrote:

> [!WARNING]
> This pull request is not mergeable via GitHub because a downstack PR is open. Once all requirements are satisfied, merge this PR as a stack [on Graphite](https://app.graphite.dev/github/pr/llvm/llvm-project/117590?utm_source=stack-comment-downstack-mergeability-warning).
> [Learn more](https://graphite.dev/docs/merge-pull-requests)

* **#117594**
* **#117593**
* **#117592**
* **#117591**
* **#117590** 👈 [(View in Graphite)](https://app.graphite.dev/github/pr/llvm/llvm-project/117590?utm_source=stack-comment-view-in-graphite)
* **#117418**
* **#117417**
* **#117384**
* **#117383**
* **#117382**
* **#117381**
* **#117380**
* **#117379**
* **#117378**
* **#117287**
* **#117286**
* **#117285**
* **#117284**
* **#117283**
* **#117263**
* **#117262**

[llvm-branch-commits] [llvm] AMDGPU: MC support for v_cvt_scalef32_pk32_f32_[fp|bf]6 of gfx950 (PR #117590)

2024-11-25 Thread Matt Arsenault via llvm-branch-commits

https://github.com/arsenm ready_for_review 
https://github.com/llvm/llvm-project/pull/117590


[llvm-branch-commits] [clang] [llvm] AMDGPU: Add support for v_dot2_f32_bf16 instruction for gfx950 (PR #117597)

2024-11-25 Thread via llvm-branch-commits

llvmbot wrote:




@llvm/pr-subscribers-backend-amdgpu

Author: Matt Arsenault (arsenm)


Changes

v_dot2_f32_bf16 was added in gfx11 along with v_dot2_f16_f16 and
v_dot2_bf16_bf16. All three instructions were part of the Dot9 instructions
in the compiler.

This patch splits the existing dot9 (v_dot2_f16_f16, v_dot2_bf16_bf16,
v_dot2_f32_bf16) into a new dot9 (v_dot2_f16_f16 and v_dot2_bf16_bf16) and
dot12 (v_dot2_f32_bf16).

gfx11 and gfx12 are updated to reflect this change.

Co-authored-by: Sirish Pande 
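
To illustrate the split described above, here is a minimal C++ sketch of how a builtin's required feature decides its availability on a target. The builtin and feature names come from the `BuiltinsAMDGPU.def` hunk in this patch; `requiredFeature`, `isAvailable`, `gfx950Like`, and `gfx11Like` are hypothetical helpers invented for this example, not clang or LLVM APIs, and the two feature sets are reduced, assumed stand-ins rather than the real target feature lists.

```cpp
#include <cassert>
#include <set>
#include <string>

// Hypothetical model of a target: just the set of enabled feature names.
using FeatureSet = std::set<std::string>;

// Feature required by each builtin after the split. The names are taken
// from the BuiltinsAMDGPU.def hunk; the lookup itself is illustrative only.
std::string requiredFeature(const std::string &Builtin) {
  if (Builtin == "__builtin_amdgcn_fdot2_f16_f16")
    return "dot9-insts";
  if (Builtin == "__builtin_amdgcn_fdot2_bf16_bf16")
    return "dot9-insts";
  if (Builtin == "__builtin_amdgcn_fdot2_f32_bf16")
    return "dot12-insts";
  assert(false && "builtin not covered by this sketch");
  return "";
}

// A builtin is usable when the target enables the feature it requires.
bool isAvailable(const std::string &Builtin, const FeatureSet &Features) {
  return Features.count(requiredFeature(Builtin)) != 0;
}

// Assumed, reduced feature sets: a gfx950-like target with dot12 but
// without dot9, and a gfx11-like target with both.
FeatureSet gfx950Like() {
  return {"dot1-insts", "dot10-insts", "dot12-insts", "dot2-insts"};
}

FeatureSet gfx11Like() { return {"dot9-insts", "dot12-insts"}; }
```

With these assumed sets, `isAvailable("__builtin_amdgcn_fdot2_f32_bf16", gfx950Like())` is true while the two dot9 builtins are rejected for the same set, a distinction the old combined dot9 feature could not express.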

---

Patch is 30.80 KiB, truncated to 20.00 KiB below, full version: 
https://github.com/llvm/llvm-project/pull/117597.diff


11 Files Affected:

- (modified) clang/include/clang/Basic/BuiltinsAMDGPU.def (+1-1) 
- (modified) clang/test/CodeGenOpenCL/amdgpu-features.cl (+12-12) 
- (modified) clang/test/CodeGenOpenCL/builtins-amdgcn-dl-insts-err.cl (+2-2) 
- (modified) clang/test/CodeGenOpenCL/builtins-amdgcn-gfx950.cl (+25) 
- (modified) llvm/lib/Target/AMDGPU/AMDGPU.td (+13-1) 
- (modified) llvm/lib/Target/AMDGPU/GCNSubtarget.h (+5) 
- (modified) llvm/lib/Target/AMDGPU/VOP3PInstructions.td (+3-2) 
- (modified) llvm/lib/TargetParser/TargetParser.cpp (+3) 
- (modified) llvm/test/CodeGen/AMDGPU/llvm.amdgcn.fdot2.f32.bf16.ll (+68) 
- (added) llvm/test/MC/AMDGPU/gfx950_dlops.s (+61) 
- (modified) llvm/test/MC/Disassembler/AMDGPU/gfx950_dasm_vop3.txt (+60) 


``diff
diff --git a/clang/include/clang/Basic/BuiltinsAMDGPU.def 
b/clang/include/clang/Basic/BuiltinsAMDGPU.def
index fd449697e91216..7d0019eead96b6 100644
--- a/clang/include/clang/Basic/BuiltinsAMDGPU.def
+++ b/clang/include/clang/Basic/BuiltinsAMDGPU.def
@@ -263,7 +263,7 @@ TARGET_BUILTIN(__builtin_amdgcn_global_load_lds, 
"vv*1v*3IUiIiIUi", "t", "gfx940
 TARGET_BUILTIN(__builtin_amdgcn_fdot2, "fV2hV2hfIb", "nc", "dot10-insts")
 TARGET_BUILTIN(__builtin_amdgcn_fdot2_f16_f16, "hV2hV2hh", "nc", "dot9-insts")
 TARGET_BUILTIN(__builtin_amdgcn_fdot2_bf16_bf16, "sV2sV2ss", "nc", 
"dot9-insts")
-TARGET_BUILTIN(__builtin_amdgcn_fdot2_f32_bf16, "fV2sV2sfIb", "nc", 
"dot9-insts")
+TARGET_BUILTIN(__builtin_amdgcn_fdot2_f32_bf16, "fV2sV2sfIb", "nc", 
"dot12-insts")
 TARGET_BUILTIN(__builtin_amdgcn_sdot2, "SiV2SsV2SsSiIb", "nc", "dot2-insts")
 TARGET_BUILTIN(__builtin_amdgcn_udot2, "UiV2UsV2UsUiIb", "nc", "dot2-insts")
 TARGET_BUILTIN(__builtin_amdgcn_sdot4, "SiSiSiSiIb", "nc", "dot1-insts")
diff --git a/clang/test/CodeGenOpenCL/amdgpu-features.cl 
b/clang/test/CodeGenOpenCL/amdgpu-features.cl
index db7fd76ec91189..0b698035ee54c7 100644
--- a/clang/test/CodeGenOpenCL/amdgpu-features.cl
+++ b/clang/test/CodeGenOpenCL/amdgpu-features.cl
@@ -89,7 +89,7 @@
 // GFX941: 
"target-features"="+16-bit-insts,+atomic-buffer-global-pk-add-f16-insts,+atomic-ds-pk-add-16-insts,+atomic-fadd-rtn-insts,+atomic-flat-pk-add-16-insts,+atomic-global-pk-add-bf16-inst,+ci-insts,+dl-insts,+dot1-insts,+dot10-insts,+dot2-insts,+dot3-insts,+dot4-insts,+dot5-insts,+dot6-insts,+dot7-insts,+dpp,+fp8-conversion-insts,+fp8-insts,+gfx8-insts,+gfx9-insts,+gfx90a-insts,+gfx940-insts,+mai-insts,+s-memrealtime,+s-memtime-inst,+wavefrontsize64,+xf32-insts"
 // GFX942: 
"target-features"="+16-bit-insts,+atomic-buffer-global-pk-add-f16-insts,+atomic-ds-pk-add-16-insts,+atomic-fadd-rtn-insts,+atomic-flat-pk-add-16-insts,+atomic-global-pk-add-bf16-inst,+ci-insts,+dl-insts,+dot1-insts,+dot10-insts,+dot2-insts,+dot3-insts,+dot4-insts,+dot5-insts,+dot6-insts,+dot7-insts,+dpp,+fp8-conversion-insts,+fp8-insts,+gfx8-insts,+gfx9-insts,+gfx90a-insts,+gfx940-insts,+mai-insts,+s-memrealtime,+s-memtime-inst,+wavefrontsize64,+xf32-insts"
 // GFX9_4_Generic: 
"target-features"="+16-bit-insts,+atomic-buffer-global-pk-add-f16-insts,+atomic-ds-pk-add-16-insts,+atomic-fadd-rtn-insts,+atomic-flat-pk-add-16-insts,+atomic-global-pk-add-bf16-inst,+ci-insts,+dl-insts,+dot1-insts,+dot10-insts,+dot2-insts,+dot3-insts,+dot4-insts,+dot5-insts,+dot6-insts,+dot7-insts,+dpp,+gfx8-insts,+gfx9-insts,+gfx90a-insts,+gfx940-insts,+mai-insts,+s-memrealtime,+s-memtime-inst,+wavefrontsize64"
-// GFX950: 
"target-features"="+16-bit-insts,+ashr-pk-insts,+atomic-buffer-global-pk-add-f16-insts,+atomic-ds-pk-add-16-insts,+atomic-fadd-rtn-insts,+atomic-flat-pk-add-16-insts,+atomic-global-pk-add-bf16-inst,+ci-insts,+dl-insts,+dot1-insts,+dot10-insts,+dot2-insts,+dot3-insts,+dot4-insts,+dot5-insts,+dot6-insts,+dot7-insts,+dpp,+f16bf16-to-fp6bf6-cvt-scale-insts,+fp8-conversion-insts,+fp8-insts,+gfx8-insts,+gfx9-insts,+gfx90a-insts,+gfx940-insts,+gfx950-insts,+mai-insts,+permlane16-swap,+permlane32-swap,+prng-inst,+s-memrealtime,+s-memtime-inst,+wavefrontsize64"
+// GFX950: 
"target-features"="+16-bit-insts,+ashr-pk-insts,+atomic-buffer-global-pk-add-f16-insts,+atomic-ds-pk-add-16-insts,+atomic-fadd-rtn-insts,+atomic-flat-pk-add-16-insts,+atomic-global-pk-add-bf16-inst,+ci-insts,+dl-insts,+dot1-insts,+dot10-insts,+dot12-insts,+dot2-insts,+dot3-insts,+dot4-insts,+dot5-insts,+dot6-insts,+dot7-insts,

[llvm-branch-commits] [clang] [llvm] AMDGPU: Add support for v_ashr_pk_i8/u8_i32 instructions for gfx950 (PR #117596)

2024-11-25 Thread via llvm-branch-commits

llvmbot wrote:




@llvm/pr-subscribers-backend-amdgpu

Author: Matt Arsenault (arsenm)


Changes

This patch adds assembly and builtin support for v_ashr_pk_i8/u8_i32
instructions.

Co-authored-by: Sirish Pande 

---

Patch is 22.00 KiB, truncated to 20.00 KiB below, full version: 
https://github.com/llvm/llvm-project/pull/117596.diff


13 Files Affected:

- (modified) clang/include/clang/Basic/BuiltinsAMDGPU.def (+3) 
- (modified) clang/test/CodeGenOpenCL/amdgpu-features.cl (+1-1) 
- (modified) clang/test/CodeGenOpenCL/builtins-amdgcn-gfx950.cl (+46) 
- (modified) llvm/include/llvm/IR/IntrinsicsAMDGPU.td (+10) 
- (modified) llvm/lib/Target/AMDGPU/AMDGPU.td (+16-6) 
- (modified) llvm/lib/Target/AMDGPU/AMDGPURegisterBankInfo.cpp (+2) 
- (modified) llvm/lib/Target/AMDGPU/GCNSubtarget.h (+3) 
- (modified) llvm/lib/Target/AMDGPU/SIInstrInfo.td (+1) 
- (modified) llvm/lib/Target/AMDGPU/VOP3Instructions.td (+8) 
- (modified) llvm/lib/Target/AMDGPU/VOPInstructions.td (+1) 
- (modified) llvm/lib/TargetParser/TargetParser.cpp (+1) 
- (modified) llvm/test/MC/AMDGPU/gfx950_asm_vop3.s (+72) 
- (modified) llvm/test/MC/Disassembler/AMDGPU/gfx950_dasm_vop3.txt (+36) 


``diff
diff --git a/clang/include/clang/Basic/BuiltinsAMDGPU.def 
b/clang/include/clang/Basic/BuiltinsAMDGPU.def
index dacbf5aa902f60..fd449697e91216 100644
--- a/clang/include/clang/Basic/BuiltinsAMDGPU.def
+++ b/clang/include/clang/Basic/BuiltinsAMDGPU.def
@@ -467,6 +467,9 @@ TARGET_BUILTIN(__builtin_amdgcn_ds_read_tr6_b96_v3i32, 
"V3iV3i*3", "nc", "gfx950
 TARGET_BUILTIN(__builtin_amdgcn_ds_read_tr8_b64_v2i32, "V2iV2i*3", "nc", 
"gfx950-insts")
 TARGET_BUILTIN(__builtin_amdgcn_ds_read_tr16_b64_v4i16, "V4sV4s*3", "nc", 
"gfx950-insts")
 
+TARGET_BUILTIN(__builtin_amdgcn_ashr_pk_i8_i32, "UsUiUiUi", "nc", 
"ashr-pk-insts")
+TARGET_BUILTIN(__builtin_amdgcn_ashr_pk_u8_i32, "UsUiUiUi", "nc", 
"ashr-pk-insts")
+
 TARGET_BUILTIN(__builtin_amdgcn_cvt_scalef32_2xpk16_fp6_f32, "V6UiV16fV16ff", 
"nc", "gfx950-insts")
 TARGET_BUILTIN(__builtin_amdgcn_cvt_scalef32_2xpk16_bf6_f32, "V6UiV16fV16ff", 
"nc", "gfx950-insts")
 
diff --git a/clang/test/CodeGenOpenCL/amdgpu-features.cl 
b/clang/test/CodeGenOpenCL/amdgpu-features.cl
index 56013dad9b6651..db7fd76ec91189 100644
--- a/clang/test/CodeGenOpenCL/amdgpu-features.cl
+++ b/clang/test/CodeGenOpenCL/amdgpu-features.cl
@@ -89,7 +89,7 @@
 // GFX941: 
"target-features"="+16-bit-insts,+atomic-buffer-global-pk-add-f16-insts,+atomic-ds-pk-add-16-insts,+atomic-fadd-rtn-insts,+atomic-flat-pk-add-16-insts,+atomic-global-pk-add-bf16-inst,+ci-insts,+dl-insts,+dot1-insts,+dot10-insts,+dot2-insts,+dot3-insts,+dot4-insts,+dot5-insts,+dot6-insts,+dot7-insts,+dpp,+fp8-conversion-insts,+fp8-insts,+gfx8-insts,+gfx9-insts,+gfx90a-insts,+gfx940-insts,+mai-insts,+s-memrealtime,+s-memtime-inst,+wavefrontsize64,+xf32-insts"
 // GFX942: 
"target-features"="+16-bit-insts,+atomic-buffer-global-pk-add-f16-insts,+atomic-ds-pk-add-16-insts,+atomic-fadd-rtn-insts,+atomic-flat-pk-add-16-insts,+atomic-global-pk-add-bf16-inst,+ci-insts,+dl-insts,+dot1-insts,+dot10-insts,+dot2-insts,+dot3-insts,+dot4-insts,+dot5-insts,+dot6-insts,+dot7-insts,+dpp,+fp8-conversion-insts,+fp8-insts,+gfx8-insts,+gfx9-insts,+gfx90a-insts,+gfx940-insts,+mai-insts,+s-memrealtime,+s-memtime-inst,+wavefrontsize64,+xf32-insts"
 // GFX9_4_Generic: 
"target-features"="+16-bit-insts,+atomic-buffer-global-pk-add-f16-insts,+atomic-ds-pk-add-16-insts,+atomic-fadd-rtn-insts,+atomic-flat-pk-add-16-insts,+atomic-global-pk-add-bf16-inst,+ci-insts,+dl-insts,+dot1-insts,+dot10-insts,+dot2-insts,+dot3-insts,+dot4-insts,+dot5-insts,+dot6-insts,+dot7-insts,+dpp,+gfx8-insts,+gfx9-insts,+gfx90a-insts,+gfx940-insts,+mai-insts,+s-memrealtime,+s-memtime-inst,+wavefrontsize64"
-// GFX950: 
"target-features"="+16-bit-insts,+atomic-buffer-global-pk-add-f16-insts,+atomic-ds-pk-add-16-insts,+atomic-fadd-rtn-insts,+atomic-flat-pk-add-16-insts,+atomic-global-pk-add-bf16-inst,+ci-insts,+dl-insts,+dot1-insts,+dot10-insts,+dot2-insts,+dot3-insts,+dot4-insts,+dot5-insts,+dot6-insts,+dot7-insts,+dpp,+f16bf16-to-fp6bf6-cvt-scale-insts,+fp8-conversion-insts,+fp8-insts,+gfx8-insts,+gfx9-insts,+gfx90a-insts,+gfx940-insts,+gfx950-insts,+mai-insts,+permlane16-swap,+permlane32-swap,+prng-inst,+s-memrealtime,+s-memtime-inst,+wavefrontsize64"
+// GFX950: 
"target-features"="+16-bit-insts,+ashr-pk-insts,+atomic-buffer-global-pk-add-f16-insts,+atomic-ds-pk-add-16-insts,+atomic-fadd-rtn-insts,+atomic-flat-pk-add-16-insts,+atomic-global-pk-add-bf16-inst,+ci-insts,+dl-insts,+dot1-insts,+dot10-insts,+dot2-insts,+dot3-insts,+dot4-insts,+dot5-insts,+dot6-insts,+dot7-insts,+dpp,+f16bf16-to-fp6bf6-cvt-scale-insts,+fp8-conversion-insts,+fp8-insts,+gfx8-insts,+gfx9-insts,+gfx90a-insts,+gfx940-insts,+gfx950-insts,+mai-insts,+permlane16-swap,+permlane32-swap,+prng-inst,+s-memrealtime,+s-memtime-inst,+wavefrontsize64"
 // GFX1010: 
"target-features"="+16-bit-insts,+ci-insts,+dl-insts,+dpp,+gfx10-insts,+g

[llvm-branch-commits] [llvm] AMDGPU: MC support for v_cvt_scalef32_pk32_f32_[fp|bf]6 of gfx950 (PR #117590)

2024-11-25 Thread via llvm-branch-commits

github-actions[bot] wrote:




:warning: C/C++ code formatter, clang-format found issues in your code. :warning:



You can test this locally with the following command:


```bash
git-clang-format --diff 8f7e780a4014c19daa5e980d943a381a48e6152f \
  5801905fe13b783780dc09cb3ac4c177c92b10d5 --extensions h,cpp -- \
  llvm/lib/Target/AMDGPU/AMDGPUSubtarget.h \
  llvm/lib/Target/AMDGPU/Disassembler/AMDGPUDisassembler.cpp \
  llvm/lib/Target/AMDGPU/Disassembler/AMDGPUDisassembler.h
```





View the diff from clang-format here.


```diff
diff --git a/llvm/lib/Target/AMDGPU/AMDGPUSubtarget.h b/llvm/lib/Target/AMDGPU/AMDGPUSubtarget.h
index 1a09f55dfd..ea77cfe720 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPUSubtarget.h
+++ b/llvm/lib/Target/AMDGPU/AMDGPUSubtarget.h
@@ -185,7 +185,9 @@ public:
 
   bool hasFP4ConversionScaleInsts() const { return HasFP4ConversionScaleInsts; }
 
-  bool hasFP6BF6ConversionScaleInsts() const { return HasFP6BF6ConversionScaleInsts; }
+  bool hasFP6BF6ConversionScaleInsts() const {
+    return HasFP6BF6ConversionScaleInsts;
+  }
 
   bool hasMadMacF32Insts() const {
     return HasMadMacF32Insts || !isGCN();
diff --git a/llvm/lib/Target/AMDGPU/Disassembler/AMDGPUDisassembler.cpp b/llvm/lib/Target/AMDGPU/Disassembler/AMDGPUDisassembler.cpp
index fa5f86b078..cb2c71bb0a 100644
--- a/llvm/lib/Target/AMDGPU/Disassembler/AMDGPUDisassembler.cpp
+++ b/llvm/lib/Target/AMDGPU/Disassembler/AMDGPUDisassembler.cpp
@@ -1530,7 +1530,8 @@ unsigned AMDGPUDisassembler::getVgprClassId(const OpWidthTy Width) const {
   case OPWV232: return VReg_64RegClassID;
   case OPW96: return VReg_96RegClassID;
   case OPW128: return VReg_128RegClassID;
-  case OPW192: return VReg_192RegClassID;
+  case OPW192:
+    return VReg_192RegClassID;
   case OPW160: return VReg_160RegClassID;
   case OPW256: return VReg_256RegClassID;
   case OPW288: return VReg_288RegClassID;
```




https://github.com/llvm/llvm-project/pull/117590


[llvm-branch-commits] [clang] [llvm] AMDGPU: Support v_cvt_scalef32_pk32_{bf|f}6_{bf|fp}16 for gfx950 (PR #117592)

2024-11-25 Thread via llvm-branch-commits

github-actions[bot] wrote:




:warning: C/C++ code formatter, clang-format found issues in your code. :warning:



You can test this locally with the following command:


```bash
git-clang-format --diff 145c4c8611307f4039f390a1a69fad4fe4c14ee3 \
  3ba5c37284ce7df02470662c790cc5280e0a62a2 --extensions h,cpp -- \
  llvm/lib/Target/AMDGPU/AMDGPURegisterBankInfo.cpp \
  llvm/lib/Target/AMDGPU/AMDGPUSubtarget.h \
  llvm/lib/Target/AMDGPU/Disassembler/AMDGPUDisassembler.cpp \
  llvm/lib/TargetParser/TargetParser.cpp
```





View the diff from clang-format here.


```diff
diff --git a/llvm/lib/Target/AMDGPU/AMDGPUSubtarget.h b/llvm/lib/Target/AMDGPU/AMDGPUSubtarget.h
index 742f4e6e80..79e8bb9146 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPUSubtarget.h
+++ b/llvm/lib/Target/AMDGPU/AMDGPUSubtarget.h
@@ -188,7 +188,9 @@ public:
 
   bool hasFP6BF6ConversionScaleInsts() const { return HasFP6BF6ConversionScaleInsts; }
 
-  bool hasF16BF16ToFP6BF6ConversionScaleInsts() const { return HasF16BF16ToFP6BF6ConversionScaleInsts; }
+  bool hasF16BF16ToFP6BF6ConversionScaleInsts() const {
+    return HasF16BF16ToFP6BF6ConversionScaleInsts;
+  }
 
   bool hasMadMacF32Insts() const {
     return HasMadMacF32Insts || !isGCN();
```




https://github.com/llvm/llvm-project/pull/117592


[llvm-branch-commits] [clang] [llvm] AMDGPU: Add support for v_dot2_f32_bf16 instruction for gfx950 (PR #117597)

2024-11-25 Thread Matt Arsenault via llvm-branch-commits

https://github.com/arsenm ready_for_review 
https://github.com/llvm/llvm-project/pull/117597
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [clang] [llvm] AMDGPU: Add support for v_dot2c_f32_bf16 instruction for gfx950 (PR #117598)

2024-11-25 Thread Matt Arsenault via llvm-branch-commits

https://github.com/arsenm ready_for_review 
https://github.com/llvm/llvm-project/pull/117598


[llvm-branch-commits] [llvm] AMDGPU: Add minimum3/maximum3 pkf16 for gfx950 encodings (PR #117601)

2024-11-25 Thread Matt Arsenault via llvm-branch-commits

arsenm wrote:

> [!WARNING]
> This pull request is not mergeable via GitHub because a downstack PR is open. Once all requirements are satisfied, merge this PR as a stack [on Graphite](https://app.graphite.dev/github/pr/llvm/llvm-project/117601?utm_source=stack-comment-downstack-mergeability-warning).
> [Learn more](https://graphite.dev/docs/merge-pull-requests)

* **#117601** 👈 [(View in Graphite)](https://app.graphite.dev/github/pr/llvm/llvm-project/117601?utm_source=stack-comment-view-in-graphite)
* **#117600**
* **#117599**
* **#117598**
* **#117597**
* **#117596**
* **#117595**
* **#117594**
* **#117593**
* **#117592**
* **#117591**
* **#117590**
* **#117418**
* **#117417**
* **#117384**
* **#117383**
* **#117382**
* **#117381**
* **#117380**
* **#117379**
* **#117378**

[llvm-branch-commits] [clang] [llvm] AMDGPU: Support buffer_atomic_pk_add_bf16 for gfx950 (PR #117599)

2024-11-25 Thread Matt Arsenault via llvm-branch-commits

arsenm wrote:

> [!WARNING]
> This pull request is not mergeable via GitHub because a downstack PR is open. Once all requirements are satisfied, merge this PR as a stack [on Graphite](https://app.graphite.dev/github/pr/llvm/llvm-project/117599?utm_source=stack-comment-downstack-mergeability-warning).
> [Learn more](https://graphite.dev/docs/merge-pull-requests)

* **#117600**
* **#117599** 👈 [(View in Graphite)](https://app.graphite.dev/github/pr/llvm/llvm-project/117599?utm_source=stack-comment-view-in-graphite)
* **#117598**
* **#117597**
* **#117596**
* **#117595**
* **#117594**
* **#117593**
* **#117592**
* **#117591**
* **#117590**
* **#117418**
* **#117417**
* **#117384**
* **#117383**
* **#117382**
* **#117381**
* **#117380**
* **#117379**
* **#117378**
* **#117287**

[llvm-branch-commits] [clang] [llvm] AMDGPU: Support buffer_atomic_pk_add_bf16 for gfx950 (PR #117599)

2024-11-25 Thread Matt Arsenault via llvm-branch-commits

https://github.com/arsenm ready_for_review 
https://github.com/llvm/llvm-project/pull/117599


[llvm-branch-commits] [llvm] AMDGPU: Add encodings for minimum3/maximum3 f32 for gfx950 (PR #117600)

2024-11-25 Thread Matt Arsenault via llvm-branch-commits

arsenm wrote:

> [!WARNING]
> This pull request is not mergeable via GitHub because a downstack PR is
> open. Once all requirements are satisfied, merge this PR as a stack on
> Graphite:
> https://app.graphite.dev/github/pr/llvm/llvm-project/117600?utm_source=stack-comment-downstack-mergeability-warning
> Learn more: https://graphite.dev/docs/merge-pull-requests

* **#117600** 👈 (View in Graphite: https://app.graphite.dev/github/pr/llvm/llvm-project/117600?utm_source=stack-comment-view-in-graphite)
* **#117599**
* **#117598**
* **#117597**
* **#117596**
* **#117595**
* **#117594**
* **#117593**
* **#117592**
* **#117591**
* **#117590**
* **#117418**
* **#117417**
* **#117384**
* **#117383**
* **#117382**
* **#117381**
* **#117380**
* **#117379**
* **#117378**
* **#117287**

[llvm-branch-commits] [clang] [llvm] AMDGPU: Add support for v_dot2c_f32_bf16 instruction for gfx950 (PR #117598)

2024-11-25 Thread Matt Arsenault via llvm-branch-commits

arsenm wrote:

> [!WARNING]
> This pull request is not mergeable via GitHub because a downstack PR is
> open. Once all requirements are satisfied, merge this PR as a stack on
> Graphite:
> https://app.graphite.dev/github/pr/llvm/llvm-project/117598?utm_source=stack-comment-downstack-mergeability-warning
> Learn more: https://graphite.dev/docs/merge-pull-requests

* **#117600**
* **#117599**
* **#117598** 👈 (View in Graphite: https://app.graphite.dev/github/pr/llvm/llvm-project/117598?utm_source=stack-comment-view-in-graphite)
* **#117597**
* **#117596**
* **#117595**
* **#117594**
* **#117593**
* **#117592**
* **#117591**
* **#117590**
* **#117418**
* **#117417**
* **#117384**
* **#117383**
* **#117382**
* **#117381**
* **#117380**
* **#117379**
* **#117378**
* **#117287**

[llvm-branch-commits] [llvm] AMDGPU: Add encodings for minimum3/maximum3 f32 for gfx950 (PR #117600)

2024-11-25 Thread Matt Arsenault via llvm-branch-commits

https://github.com/arsenm ready_for_review 
https://github.com/llvm/llvm-project/pull/117600


[llvm-branch-commits] [clang] [llvm] AMDGPU: Add support for v_ashr_pk_i8/u8_i32 instructions for gfx950 (PR #117596)

2024-11-25 Thread Matt Arsenault via llvm-branch-commits

https://github.com/arsenm created 
https://github.com/llvm/llvm-project/pull/117596

This patch adds assembly and builtin support for v_ashr_pk_i8/u8_i32
instructions.

Co-authored-by: Sirish Pande 
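
The two builtins in this patch are registered with the Clang prototype string "UsUiUiUi" (return type first, then the arguments). As a reading aid, here is a minimal Python sketch that decodes the small subset of prototype strings that appear in this thread. The name `decode_prototype` and the output spelling are illustrative assumptions, not part of Clang; only the base letters `i`, `s`, `f`, the `U` and `V<N>` prefixes, and the `*<addrspace>` suffix are handled.

```python
import re

# Subset of Clang builtin prototype letters (assumption: only what this thread uses).
_BASE = {"i": "int", "s": "short", "f": "float"}

# One type: optional V<N> vector prefix, optional U (unsigned), a base letter,
# then an optional pointer suffix '*' with an optional address-space number.
_TOKEN = re.compile(r"(?:V(\d+))?(U?)([isf])(?:\*(\d*))?")


def decode_prototype(proto):
    """Return (return_type, [argument_types]) for a prototype string."""
    types = []
    pos = 0
    while pos < len(proto):
        match = _TOKEN.match(proto, pos)
        if match is None:
            raise ValueError(f"unsupported prototype at offset {pos}: {proto!r}")
        lanes, unsigned, base, addrspace = match.groups()
        name = ("unsigned " if unsigned else "") + _BASE[base]
        if lanes:
            name = f"vector<{lanes} x {name}>"
        if addrspace is not None:  # a '*' was present
            name += "*"
            if addrspace:
                name += f" addrspace({addrspace})"
        types.append(name)
        pos = match.end()
    if not types:
        raise ValueError("empty prototype string")
    return types[0], types[1:]


if __name__ == "__main__":
    print(decode_prototype("UsUiUiUi"))
```

Read this way, both `__builtin_amdgcn_ashr_pk_i8_i32` and `__builtin_amdgcn_ashr_pk_u8_i32` take three unsigned int arguments and return an unsigned short.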

>From 75056a46ee4d7eb6543c2ce99a157a1627a54158 Mon Sep 17 00:00:00 2001
From: Sirish Pande 
Date: Tue, 13 Feb 2024 10:54:51 -0600
Subject: [PATCH] AMDGPU: Add support for v_ashr_pk_i8/u8_i32 instructions for
 gfx950

This patch adds assembly and builtin support for v_ashr_pk_i8/u8_i32
instructions.

Co-authored-by: Sirish Pande 
---
 clang/include/clang/Basic/BuiltinsAMDGPU.def  |  3 +
 clang/test/CodeGenOpenCL/amdgpu-features.cl   |  2 +-
 .../CodeGenOpenCL/builtins-amdgcn-gfx950.cl   | 46 
 llvm/include/llvm/IR/IntrinsicsAMDGPU.td  | 10 +++
 llvm/lib/Target/AMDGPU/AMDGPU.td  | 22 --
 .../Target/AMDGPU/AMDGPURegisterBankInfo.cpp  |  2 +
 llvm/lib/Target/AMDGPU/GCNSubtarget.h |  3 +
 llvm/lib/Target/AMDGPU/SIInstrInfo.td |  1 +
 llvm/lib/Target/AMDGPU/VOP3Instructions.td|  8 +++
 llvm/lib/Target/AMDGPU/VOPInstructions.td |  1 +
 llvm/lib/TargetParser/TargetParser.cpp|  1 +
 llvm/test/MC/AMDGPU/gfx950_asm_vop3.s | 72 +++
 .../Disassembler/AMDGPU/gfx950_dasm_vop3.txt  | 36 ++
 13 files changed, 200 insertions(+), 7 deletions(-)

diff --git a/clang/include/clang/Basic/BuiltinsAMDGPU.def b/clang/include/clang/Basic/BuiltinsAMDGPU.def
index dacbf5aa902f60..fd449697e91216 100644
--- a/clang/include/clang/Basic/BuiltinsAMDGPU.def
+++ b/clang/include/clang/Basic/BuiltinsAMDGPU.def
@@ -467,6 +467,9 @@ TARGET_BUILTIN(__builtin_amdgcn_ds_read_tr6_b96_v3i32, "V3iV3i*3", "nc", "gfx950
 TARGET_BUILTIN(__builtin_amdgcn_ds_read_tr8_b64_v2i32, "V2iV2i*3", "nc", "gfx950-insts")
 TARGET_BUILTIN(__builtin_amdgcn_ds_read_tr16_b64_v4i16, "V4sV4s*3", "nc", "gfx950-insts")
 
+TARGET_BUILTIN(__builtin_amdgcn_ashr_pk_i8_i32, "UsUiUiUi", "nc", "ashr-pk-insts")
+TARGET_BUILTIN(__builtin_amdgcn_ashr_pk_u8_i32, "UsUiUiUi", "nc", "ashr-pk-insts")
+
 TARGET_BUILTIN(__builtin_amdgcn_cvt_scalef32_2xpk16_fp6_f32, "V6UiV16fV16ff", "nc", "gfx950-insts")
 TARGET_BUILTIN(__builtin_amdgcn_cvt_scalef32_2xpk16_bf6_f32, "V6UiV16fV16ff", "nc", "gfx950-insts")
 
diff --git a/clang/test/CodeGenOpenCL/amdgpu-features.cl 
b/clang/test/CodeGenOpenCL/amdgpu-features.cl
index 56013dad9b6651..db7fd76ec91189 100644
--- a/clang/test/CodeGenOpenCL/amdgpu-features.cl
+++ b/clang/test/CodeGenOpenCL/amdgpu-features.cl
@@ -89,7 +89,7 @@
 // GFX941: 
"target-features"="+16-bit-insts,+atomic-buffer-global-pk-add-f16-insts,+atomic-ds-pk-add-16-insts,+atomic-fadd-rtn-insts,+atomic-flat-pk-add-16-insts,+atomic-global-pk-add-bf16-inst,+ci-insts,+dl-insts,+dot1-insts,+dot10-insts,+dot2-insts,+dot3-insts,+dot4-insts,+dot5-insts,+dot6-insts,+dot7-insts,+dpp,+fp8-conversion-insts,+fp8-insts,+gfx8-insts,+gfx9-insts,+gfx90a-insts,+gfx940-insts,+mai-insts,+s-memrealtime,+s-memtime-inst,+wavefrontsize64,+xf32-insts"
 // GFX942: 
"target-features"="+16-bit-insts,+atomic-buffer-global-pk-add-f16-insts,+atomic-ds-pk-add-16-insts,+atomic-fadd-rtn-insts,+atomic-flat-pk-add-16-insts,+atomic-global-pk-add-bf16-inst,+ci-insts,+dl-insts,+dot1-insts,+dot10-insts,+dot2-insts,+dot3-insts,+dot4-insts,+dot5-insts,+dot6-insts,+dot7-insts,+dpp,+fp8-conversion-insts,+fp8-insts,+gfx8-insts,+gfx9-insts,+gfx90a-insts,+gfx940-insts,+mai-insts,+s-memrealtime,+s-memtime-inst,+wavefrontsize64,+xf32-insts"
 // GFX9_4_Generic: 
"target-features"="+16-bit-insts,+atomic-buffer-global-pk-add-f16-insts,+atomic-ds-pk-add-16-insts,+atomic-fadd-rtn-insts,+atomic-flat-pk-add-16-insts,+atomic-global-pk-add-bf16-inst,+ci-insts,+dl-insts,+dot1-insts,+dot10-insts,+dot2-insts,+dot3-insts,+dot4-insts,+dot5-insts,+dot6-insts,+dot7-insts,+dpp,+gfx8-insts,+gfx9-insts,+gfx90a-insts,+gfx940-insts,+mai-insts,+s-memrealtime,+s-memtime-inst,+wavefrontsize64"
-// GFX950: 
"target-features"="+16-bit-insts,+atomic-buffer-global-pk-add-f16-insts,+atomic-ds-pk-add-16-insts,+atomic-fadd-rtn-insts,+atomic-flat-pk-add-16-insts,+atomic-global-pk-add-bf16-inst,+ci-insts,+dl-insts,+dot1-insts,+dot10-insts,+dot2-insts,+dot3-insts,+dot4-insts,+dot5-insts,+dot6-insts,+dot7-insts,+dpp,+f16bf16-to-fp6bf6-cvt-scale-insts,+fp8-conversion-insts,+fp8-insts,+gfx8-insts,+gfx9-insts,+gfx90a-insts,+gfx940-insts,+gfx950-insts,+mai-insts,+permlane16-swap,+permlane32-swap,+prng-inst,+s-memrealtime,+s-memtime-inst,+wavefrontsize64"
+// GFX950: 
"target-features"="+16-bit-insts,+ashr-pk-insts,+atomic-buffer-global-pk-add-f16-insts,+atomic-ds-pk-add-16-insts,+atomic-fadd-rtn-insts,+atomic-flat-pk-add-16-insts,+atomic-global-pk-add-bf16-inst,+ci-insts,+dl-insts,+dot1-insts,+dot10-insts,+dot2-insts,+dot3-insts,+dot4-insts,+dot5-insts,+dot6-insts,+dot7-insts,+dpp,+f16bf16-to-fp6bf6-cvt-scale-insts,+fp8-conversion-insts,+fp8-insts,+gfx8-insts,+gfx9-insts,+gfx90a-insts,+gfx940-insts,+gfx950-insts,+mai-insts,+permlane16-swap,+permlane32-swap,+prng-inst,+s-memre

[llvm-branch-commits] [clang] [llvm] AMDGPU: Support v_cvt_scalef32_2xpk16_{bf|fp}6_f32 for gfx950. (PR #117595)

2024-11-25 Thread Matt Arsenault via llvm-branch-commits

https://github.com/arsenm created 
https://github.com/llvm/llvm-project/pull/117595

Scale packed 16-component single-precision float vectors from
two source inputs using the exponent provided by the third
single-precision float input, then convert the values to a packed
32-component FP6 float value.

Co-authored-by: Pravin Jagtap 
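
The arithmetic behind the result type may help here: 32 six-bit values occupy 32 x 6 = 192 bits, which is exactly six 32-bit words, matching the `V6Ui` return type the builtins are declared with in this patch. A minimal Python sketch of that packing follows. The helper name `pack_6bit_codes` and the least-significant-bit-first ordering are assumptions made for illustration, and the sketch does not model the scaling step or the FP6/BF6 rounding.

```python
def pack_6bit_codes(codes):
    """Pack 32 six-bit codes into six 32-bit words.

    Assumption: element 0 lands in the lowest bits of word 0. The real
    hardware ordering is not stated in this thread.
    """
    if len(codes) != 32:
        raise ValueError("expected exactly 32 codes")
    acc = 0
    for index, code in enumerate(codes):
        if not 0 <= code < 64:
            raise ValueError(f"code {code} does not fit in 6 bits")
        acc |= code << (6 * index)
    # 32 * 6 = 192 bits, sliced into six 32-bit words.
    return [(acc >> (32 * word)) & 0xFFFFFFFF for word in range(6)]


if __name__ == "__main__":
    words = pack_6bit_codes(list(range(32)))
    print([hex(w) for w in words])
```

The two 16-component float sources supply the 32 inputs, which is why the builtin takes two `V16f` operands plus one scalar float scale.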

>From a559035a27de3a7cde8e07f6438814b1cce79a14 Mon Sep 17 00:00:00 2001
From: Pravin Jagtap 
Date: Mon, 8 Apr 2024 08:56:14 -0400
Subject: [PATCH] AMDGPU: Support v_cvt_scalef32_2xpk16_{bf|fp}6_f32 for
 gfx950.

Scale packed 16-component single-precision float vectors from
two  source inputs using the exponent provided by the third
single-precision float input, then convert the values to a packed
32-component FP6 float value.

Co-authored-by: Pravin Jagtap 
---
 clang/include/clang/Basic/BuiltinsAMDGPU.def  |   3 +
 .../CodeGenOpenCL/builtins-amdgcn-gfx950.cl   |  22 ++-
 llvm/include/llvm/IR/IntrinsicsAMDGPU.td  |   6 +
 .../Target/AMDGPU/AMDGPURegisterBankInfo.cpp  |   2 +
 llvm/lib/Target/AMDGPU/SIInstrInfo.td |   1 +
 llvm/lib/Target/AMDGPU/VOP3Instructions.td|   8 ++
 .../llvm.amdgcn.cvt.scalef32.pk.gfx950.ll | 128 ++
 llvm/test/MC/AMDGPU/gfx950_asm_features.s |  24 
 llvm/test/MC/AMDGPU/gfx950_err.s  |  24 
 .../Disassembler/AMDGPU/gfx950_dasm_vop3.txt  |  18 +++
 10 files changed, 235 insertions(+), 1 deletion(-)
 create mode 100644 
llvm/test/CodeGen/AMDGPU/llvm.amdgcn.cvt.scalef32.pk.gfx950.ll

diff --git a/clang/include/clang/Basic/BuiltinsAMDGPU.def b/clang/include/clang/Basic/BuiltinsAMDGPU.def
index e09dc0e1107a82..dacbf5aa902f60 100644
--- a/clang/include/clang/Basic/BuiltinsAMDGPU.def
+++ b/clang/include/clang/Basic/BuiltinsAMDGPU.def
@@ -467,6 +467,9 @@ TARGET_BUILTIN(__builtin_amdgcn_ds_read_tr6_b96_v3i32, "V3iV3i*3", "nc", "gfx950
 TARGET_BUILTIN(__builtin_amdgcn_ds_read_tr8_b64_v2i32, "V2iV2i*3", "nc", "gfx950-insts")
 TARGET_BUILTIN(__builtin_amdgcn_ds_read_tr16_b64_v4i16, "V4sV4s*3", "nc", "gfx950-insts")
 
+TARGET_BUILTIN(__builtin_amdgcn_cvt_scalef32_2xpk16_fp6_f32, "V6UiV16fV16ff", "nc", "gfx950-insts")
+TARGET_BUILTIN(__builtin_amdgcn_cvt_scalef32_2xpk16_bf6_f32, "V6UiV16fV16ff", "nc", "gfx950-insts")
+
 //===--===//
 // GFX12+ only builtins.
 //===--===//
diff --git a/clang/test/CodeGenOpenCL/builtins-amdgcn-gfx950.cl b/clang/test/CodeGenOpenCL/builtins-amdgcn-gfx950.cl
index 779aadd96f3f41..6f3c81b26be0b8 100644
--- a/clang/test/CodeGenOpenCL/builtins-amdgcn-gfx950.cl
+++ b/clang/test/CodeGenOpenCL/builtins-amdgcn-gfx950.cl
@@ -7,6 +7,8 @@ typedef unsigned int __attribute__((ext_vector_type(2))) uint2;
 typedef unsigned int __attribute__((ext_vector_type(6))) uint6;
 typedef __bf16 __attribute__((ext_vector_type(32))) bfloat32;
 typedef half __attribute__((ext_vector_type(32))) half32;
+typedef short __attribute__((ext_vector_type(2))) short2;
+typedef float __attribute__((ext_vector_type(16))) float16;
 
 // CHECK-LABEL: @test_prng_b32(
 // CHECK-NEXT:  entry:
@@ -115,10 +117,14 @@ void test_permlane32_swap(global uint2* out, uint old, uint src) {
 // CHECK-NEXT:[[OUT6_ADDR:%.*]] = alloca ptr addrspace(1), align 8, addrspace(5)
 // CHECK-NEXT:[[SRCBF32_ADDR:%.*]] = alloca <32 x bfloat>, align 64, addrspace(5)
 // CHECK-NEXT:[[SRCH32_ADDR:%.*]] = alloca <32 x half>, align 64, addrspace(5)
+// CHECK-NEXT:[[SRC0F32_ADDR:%.*]] = alloca <16 x float>, align 64, addrspace(5)
+// CHECK-NEXT:[[SRC1F32_ADDR:%.*]] = alloca <16 x float>, align 64, addrspace(5)
 // CHECK-NEXT:[[SCALE_ADDR:%.*]] = alloca float, align 4, addrspace(5)
 // CHECK-NEXT:store ptr addrspace(1) [[OUT6:%.*]], ptr addrspace(5) [[OUT6_ADDR]], align 8
 // CHECK-NEXT:store <32 x bfloat> [[SRCBF32:%.*]], ptr addrspace(5) [[SRCBF32_ADDR]], align 64
 // CHECK-NEXT:store <32 x half> [[SRCH32:%.*]], ptr addrspace(5) [[SRCH32_ADDR]], align 64
+// CHECK-NEXT:store <16 x float> [[SRC0F32:%.*]], ptr addrspace(5) [[SRC0F32_ADDR]], align 64
+// CHECK-NEXT:store <16 x float> [[SRC1F32:%.*]], ptr addrspace(5) [[SRC1F32_ADDR]], align 64
 // CHECK-NEXT:store float [[SCALE:%.*]], ptr addrspace(5) [[SCALE_ADDR]], align 4
 // CHECK-NEXT:[[TMP0:%.*]] = load <32 x bfloat>, ptr addrspace(5) [[SRCBF32_ADDR]], align 64
 // CHECK-NEXT:[[TMP1:%.*]] = load float, ptr addrspace(5) [[SCALE_ADDR]], align 4
@@ -140,12 +146,26 @@ void test_permlane32_swap(global uint2* out, uint old, uint src) {
 // CHECK-NEXT:[[TMP14:%.*]] = call <6 x i32> @llvm.amdgcn.cvt.scalef32.pk32.fp6.f16(<32 x half> [[TMP12]], float [[TMP13]])
 // CHECK-NEXT:[[TMP15:%.*]] = load ptr addrspace(1), ptr addrspace(5) [[OUT6_ADDR]], align 8
 // CHECK-NEXT:store <6 x i32> [[TMP14]], ptr addrspace(1) [[TMP15]], align 32
+// CHECK-NEXT:[[TMP16:%.*]] = load <16 x flo

[llvm-branch-commits] [llvm] AMDGPU: MC support for v_cvt_scalef32_pk32_{bf|f}16_{bf|fp}6 of gfx950. (PR #117591)

2024-11-25 Thread Matt Arsenault via llvm-branch-commits

https://github.com/arsenm created 
https://github.com/llvm/llvm-project/pull/117591

Co-authored-by: Pravin Jagtap 
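
The MC tests in this patch pair each new opcode (0x260 to 0x263) with an expected 8-byte encoding. The Python sketch below pulls the fields back out of one of those encodings, which is a quick way to sanity-check a test line against the opcode table. It assumes the usual GFX9 VOP3 field layout (vdst in bits 7:0, op_sel in bits 14:11, a 10-bit opcode in bits 25:16, and 9-bit source fields starting at bit 32); `decode_vop3` is a hypothetical helper, not an LLVM API.

```python
def decode_vop3(encoding):
    """Split an 8-byte VOP3 encoding into its main fields.

    Assumption: GFX9-style VOP3 layout, bytes listed in little-endian
    order as they appear in the '; encoding: [...]' test comments.
    """
    if len(encoding) != 8:
        raise ValueError("a VOP3 encoding is 8 bytes long")
    word0 = int.from_bytes(bytes(encoding[:4]), "little")
    word1 = int.from_bytes(bytes(encoding[4:]), "little")
    return {
        "vdst": word0 & 0xFF,
        "op_sel": (word0 >> 11) & 0xF,
        "opcode": (word0 >> 16) & 0x3FF,
        "src0": word1 & 0x1FF,
        "src1": (word1 >> 9) & 0x1FF,
        "src2": (word1 >> 18) & 0x1FF,
    }


if __name__ == "__main__":
    # Expected encoding of: v_cvt_scalef32_pk32_bf16_bf6 v[10:25], v[20:25], v8
    fields = decode_vop3([0x0A, 0x00, 0x63, 0xD2, 0x14, 0x11, 0x02, 0x00])
    print({name: hex(value) for name, value in fields.items()})
```

For that test line the sketch yields opcode 0x263, vdst 10, and source fields 0x114 and 0x108, which correspond to VGPR 20 and VGPR 8 if VGPR operands are encoded as 256 + n.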

>From 145c4c8611307f4039f390a1a69fad4fe4c14ee3 Mon Sep 17 00:00:00 2001
From: Pravin Jagtap 
Date: Mon, 8 Apr 2024 01:53:50 -0400
Subject: [PATCH] AMDGPU: MC support for v_cvt_scalef32_pk32_{bf|f}16_{bf|fp}6
 of gfx950.

Co-authored-by: Pravin Jagtap 
---
 llvm/lib/Target/AMDGPU/SIInstrInfo.td |  1 +
 llvm/lib/Target/AMDGPU/VOP3Instructions.td|  8 
 llvm/test/MC/AMDGPU/gfx950_asm_features.s | 22 -
 llvm/test/MC/AMDGPU/gfx950_err.s  | 48 +++
 .../Disassembler/AMDGPU/gfx950_dasm_vop3.txt  | 12 +
 5 files changed, 90 insertions(+), 1 deletion(-)

diff --git a/llvm/lib/Target/AMDGPU/SIInstrInfo.td b/llvm/lib/Target/AMDGPU/SIInstrInfo.td
index f20d6526e20b2c..ea36347423c57c 100644
--- a/llvm/lib/Target/AMDGPU/SIInstrInfo.td
+++ b/llvm/lib/Target/AMDGPU/SIInstrInfo.td
@@ -1697,6 +1697,7 @@ class getVALUDstForVT {
VOPDstOperand_t16Lo128),
 VOPDstOperand);
   RegisterOperand ret = !cond(!eq(VT.Size, 1024) : VOPDstOperand,
+  !eq(VT.Size, 512) : VOPDstOperand,
   !eq(VT.Size, 256) : VOPDstOperand,
   !eq(VT.Size, 128) : VOPDstOperand,
   !eq(VT.Size, 64)  : VOPDstOperand,
diff --git a/llvm/lib/Target/AMDGPU/VOP3Instructions.td b/llvm/lib/Target/AMDGPU/VOP3Instructions.td
index 1009f2d9593609..554aff7082010a 100644
--- a/llvm/lib/Target/AMDGPU/VOP3Instructions.td
+++ b/llvm/lib/Target/AMDGPU/VOP3Instructions.td
@@ -966,6 +966,10 @@ let SubtargetPredicate = HasFP4ConversionScaleInsts, mayRaiseFPException = 0 in
 let SubtargetPredicate = HasFP6BF6ConversionScaleInsts, mayRaiseFPException = 0 in {
   defm V_CVT_SCALEF32_PK32_F32_FP6  : VOP3Inst<"v_cvt_scalef32_pk32_f32_fp6", VOP3_CVT_SCALEF32_PK_F864_Profile>;
   defm V_CVT_SCALEF32_PK32_F32_BF6  : VOP3Inst<"v_cvt_scalef32_pk32_f32_bf6", VOP3_CVT_SCALEF32_PK_F864_Profile>;
+  defm V_CVT_SCALEF32_PK32_F16_FP6  : VOP3Inst<"v_cvt_scalef32_pk32_f16_fp6",  VOP3_CVT_SCALEF32_PK_F864_Profile>;
+  defm V_CVT_SCALEF32_PK32_BF16_FP6 : VOP3Inst<"v_cvt_scalef32_pk32_bf16_fp6", VOP3_CVT_SCALEF32_PK_F864_Profile>;
+  defm V_CVT_SCALEF32_PK32_F16_BF6  : VOP3Inst<"v_cvt_scalef32_pk32_f16_bf6",  VOP3_CVT_SCALEF32_PK_F864_Profile>;
+  defm V_CVT_SCALEF32_PK32_BF16_BF6 : VOP3Inst<"v_cvt_scalef32_pk32_bf16_bf6", VOP3_CVT_SCALEF32_PK_F864_Profile>;
 }
 
 let SubtargetPredicate = isGFX10Plus in {
@@ -1915,4 +1919,8 @@ defm V_CVT_SCALEF32_PK_BF16_FP4 : VOP3OpSel_Real_gfx9 <0x251>;
 let OtherPredicates = [HasFP6BF6ConversionScaleInsts] in {
 defm V_CVT_SCALEF32_PK32_F32_FP6 : VOP3_Real_gfx9<0x256, "v_cvt_scalef32_pk32_f32_fp6">;
 defm V_CVT_SCALEF32_PK32_F32_BF6 : VOP3_Real_gfx9<0x257, "v_cvt_scalef32_pk32_f32_bf6">;
+defm V_CVT_SCALEF32_PK32_F16_FP6  : VOP3_Real_gfx9<0x260, "v_cvt_scalef32_pk32_f16_fp6">;
+defm V_CVT_SCALEF32_PK32_BF16_FP6 : VOP3_Real_gfx9<0x261, "v_cvt_scalef32_pk32_bf16_fp6">;
+defm V_CVT_SCALEF32_PK32_F16_BF6  : VOP3_Real_gfx9<0x262, "v_cvt_scalef32_pk32_f16_bf6">;
+defm V_CVT_SCALEF32_PK32_BF16_BF6 : VOP3_Real_gfx9<0x263, "v_cvt_scalef32_pk32_bf16_bf6">;
 }
diff --git a/llvm/test/MC/AMDGPU/gfx950_asm_features.s b/llvm/test/MC/AMDGPU/gfx950_asm_features.s
index 95d31d2293075f..271ad4d62c3a43 100644
--- a/llvm/test/MC/AMDGPU/gfx950_asm_features.s
+++ b/llvm/test/MC/AMDGPU/gfx950_asm_features.s
@@ -892,4 +892,24 @@ v_cvt_scalef32_pk32_f32_fp6 v[2:33], v[2:7], v6
 
 // NOT-GFX950: :[[@LINE+2]]:{{[0-9]+}}: error:
 // GFX950: v_cvt_scalef32_pk32_f32_bf6 v[2:33], v[2:7], v6 ; encoding: [0x02,0x00,0x57,0xd2,0x02,0x0d,0x02,0x00]
-v_cvt_scalef32_pk32_f32_bf6 v[2:33], v[2:7], v6
\ No newline at end of file
+v_cvt_scalef32_pk32_f32_bf6 v[2:33], v[2:7], v6
+
+// NOT-GFX950: :[[@LINE+2]]:{{[0-9]+}}: error:
+// GFX950: v_cvt_scalef32_pk32_bf16_bf6 v[10:25], v[20:25], v8 ; encoding: [0x0a,0x00,0x63,0xd2,0x14,0x11,0x02,0x00]
+v_cvt_scalef32_pk32_bf16_bf6 v[10:25], v[20:25], v8
+
+// NOT-GFX950: :[[@LINE+2]]:{{[0-9]+}}: error:
+// GFX950: v_cvt_scalef32_pk32_bf16_bf6 v[10:25], v[20:25], v8 ; encoding: [0x0a,0x00,0x63,0xd2,0x14,0x11,0x02,0x00]
+v_cvt_scalef32_pk32_bf16_bf6 v[10:25], v[20:25], v8
+
+// NOT-GFX950: :[[@LINE+2]]:{{[0-9]+}}: error:
+// GFX950: v_cvt_scalef32_pk32_f16_bf6 v[10:25], v[20:25], v8 ; encoding: [0x0a,0x00,0x62,0xd2,0x14,0x11,0x02,0x00]
+v_cvt_scalef32_pk32_f16_bf6 v[10:25], v[20:25], v8
+
+// NOT-GFX950: :[[@LINE+2]]:{{[0-9]+}}: error:
+// GFX950: v_cvt_scalef32_pk32_bf16_fp6 v[10:25], v[20:25], v8 ; encoding: [0x0a,0x00,0x61,0xd2,0x14,0x11,0x02,0x00]
+v_cvt_scalef32_pk32_bf16_fp6 v[10:25], v[20:25], v8
+
+// NOT-GFX950: :[[@LINE+2]]:{{[0-9]+}}: error:
+// GFX950: v_cvt_scalef32_pk32_f16_fp6 v[10:25], v[20:25], v8 ; encoding: [0x0a,0x00,0x60,0xd2,0x14,0x11,0x02,0x00]
+v_cvt_scalef32_pk32_f16_fp6 v[10:25

[llvm-branch-commits] [llvm] AMDGPU: MC support for v_cvt_scalef32_pk_fp4_{f|bf}16 on gfx950. (PR #117594)

2024-11-25 Thread Matt Arsenault via llvm-branch-commits

https://github.com/arsenm created 
https://github.com/llvm/llvm-project/pull/117594

These instructions make non-standard use of the OPSEL bits to select
the destination byte to write. The src2_modifiers operand is used without
a corresponding real src2 operand, by introducing a dummy src2.

Co-authored-by: Pravin Jagtap 
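
To make the OPSEL remark concrete: the patch comment says these instructions use op_sel bits 2 and 3 even though they have only two real sources. A minimal Python sketch of how an assembler-style `op_sel:[a,b,c,d]` list maps to a bit mask, and how the two upper bits could be read as a destination byte index, is given below. The helper names are hypothetical, and treating bit 2 as the low bit of the byte index is an assumption; the thread only states that the bits select the destination write byte.

```python
def op_sel_mask(op_sel):
    """Turn an op_sel list such as [0, 0, 1, 0] into a bit mask (entry i -> bit i)."""
    mask = 0
    for bit, value in enumerate(op_sel):
        if value not in (0, 1):
            raise ValueError("op_sel entries must be 0 or 1")
        mask |= value << bit
    return mask


def dest_byte_index(op_sel):
    """Read op_sel bits 2 and 3 as a 2-bit destination byte index.

    Assumption: bit 2 is the low bit and bit 3 the high bit of the index.
    """
    return (op_sel_mask(op_sel) >> 2) & 0x3


if __name__ == "__main__":
    for sel in ([0, 0, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1], [0, 0, 1, 1]):
        print(sel, "-> byte", dest_byte_index(sel))
```

Because bit 2 would normally belong to src2, the patch keeps a src2_modifiers operand alive through a dummy src2 that is tied to the destination and excluded from the encoding.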

>From a87b139e074e856cd0c61ef61e8f092feff6bff6 Mon Sep 17 00:00:00 2001
From: Pravin Jagtap 
Date: Wed, 10 Apr 2024 05:47:54 -0400
Subject: [PATCH] AMDGPU: MC support for v_cvt_scalef32_pk_fp4_{f|bf}16 on
 gfx950.

These instructions have non-standard use of OPSEL bits to select
dest write byte. The src2_modifiers operand is used without having
its corresponding src2 operand by introducing dummy src2.

Co-authored-by: Pravin Jagtap 
---
 .../AMDGPU/AsmParser/AMDGPUAsmParser.cpp  |  4 +-
 llvm/lib/Target/AMDGPU/VOP3Instructions.td| 26 
 llvm/test/MC/AMDGPU/gfx950_asm_features.s | 40 +++
 llvm/test/MC/AMDGPU/gfx950_err.s  | 24 +++
 .../Disassembler/AMDGPU/gfx950_dasm_vop3.txt  | 30 ++
 5 files changed, 123 insertions(+), 1 deletion(-)

diff --git a/llvm/lib/Target/AMDGPU/AsmParser/AMDGPUAsmParser.cpp b/llvm/lib/Target/AMDGPU/AsmParser/AMDGPUAsmParser.cpp
index a1d45822837c5f..afd35842ba87f4 100644
--- a/llvm/lib/Target/AMDGPU/AsmParser/AMDGPUAsmParser.cpp
+++ b/llvm/lib/Target/AMDGPU/AsmParser/AMDGPUAsmParser.cpp
@@ -8824,7 +8824,9 @@ void AMDGPUAsmParser::cvtVOP3P(MCInst &Inst, const OperandVector &Operands,
 
   const bool IsPacked = (Desc.TSFlags & SIInstrFlags::IsPacked) != 0;
 
-  if (Opc == AMDGPU::V_CVT_SR_BF8_F32_vi ||
+  if (Opc == AMDGPU::V_CVT_SCALEF32_PK_FP4_F16_vi ||
+      Opc == AMDGPU::V_CVT_SCALEF32_PK_FP4_BF16_vi ||
+      Opc == AMDGPU::V_CVT_SR_BF8_F32_vi ||
       Opc == AMDGPU::V_CVT_SR_FP8_F32_vi ||
       Opc == AMDGPU::V_CVT_SR_BF8_F32_gfx12_e64_gfx12 ||
       Opc == AMDGPU::V_CVT_SR_FP8_F32_gfx12_e64_gfx12) {
diff --git a/llvm/lib/Target/AMDGPU/VOP3Instructions.td 
b/llvm/lib/Target/AMDGPU/VOP3Instructions.td
index fdffb2c36dcccf..7776688156419a 100644
--- a/llvm/lib/Target/AMDGPU/VOP3Instructions.td
+++ b/llvm/lib/Target/AMDGPU/VOP3Instructions.td
@@ -899,6 +899,23 @@ def VOP3_CVT_SCALE_FP4FP8BF8_F32_Profile : 
VOP3_Profile,
+  VOP3_OPSEL> {
+  let InsVOP3OpSel = (ins FP32InputMods:$src0_modifiers, Src0RC64:$src0,
+  FP32InputMods:$src1_modifiers, Src1RC64:$src1,
+  FP32InputMods:$src2_modifiers, VGPR_32:$src2,
+  op_sel0:$op_sel);
+  let HasClamp = 0;
+  let HasSrc2 = 0;
+  let HasSrc2Mods = 1;
+  let HasOpSel = 1;
+  let AsmVOP3OpSel = !subst(", $src2_modifiers", "",
+getAsmVOP3OpSel<3, HasClamp, HasOMod,
+HasSrc0FloatMods, HasSrc1FloatMods,
+HasSrc2FloatMods>.ret);
+  let HasExtVOP3DPP = 0;
+}
+
 class VOP3_CVT_SCALE_PK_F16BF16F32_FP4FP8BF8_Profile : 
VOP3_Profile,
   VOP3_OPSEL> {
   let InsVOP3OpSel = (ins FP32InputMods:$src0_modifiers, Src0RC64:$src0,
@@ -965,6 +982,13 @@ let SubtargetPredicate = HasFP4ConversionScaleInsts, mayRaiseFPException = 0 in
   defm V_CVT_SCALEF32_PK_FP4_F32 : VOP3Inst<"v_cvt_scalef32_pk_fp4_f32", VOP3_CVT_SCALE_FP4FP8BF8_F32_Profile>;
   defm V_CVT_SCALEF32_PK_F16_FP4 : VOP3Inst<"v_cvt_scalef32_pk_f16_fp4", VOP3_CVT_SCALE_PK_F16BF16F32_FP4FP8BF8_Profile>;
   defm V_CVT_SCALEF32_PK_BF16_FP4 : VOP3Inst<"v_cvt_scalef32_pk_bf16_fp4", VOP3_CVT_SCALE_PK_F16BF16F32_FP4FP8BF8_Profile>;
+
+  // These instructions have non-standard use of op_sel. In particular they are
+  // using op_sel bits 2 and 3 while only having two sources.
+  let Constraints = "$vdst = $src2", DisableEncoding = "$src2" in {
+    defm V_CVT_SCALEF32_PK_FP4_F16 : VOP3Inst<"v_cvt_scalef32_pk_fp4_f16", VOP3_CVT_SCALE_FP4_F16BF16_Profile>;
+    defm V_CVT_SCALEF32_PK_FP4_BF16 : VOP3Inst<"v_cvt_scalef32_pk_fp4_bf16", VOP3_CVT_SCALE_FP4_F16BF16_Profile>;
+  }
 }
 
 let SubtargetPredicate = HasFP6BF6ConversionScaleInsts, mayRaiseFPException = 0 in {
@@ -1930,6 +1954,8 @@ defm V_CVT_SCALEF32_PK_F32_FP4 : VOP3OpSel_Real_gfx9 <0x23f>;
 defm V_CVT_SCALEF32_PK_FP4_F32 : VOP3OpSel_Real_gfx9 <0x23d>;
 defm V_CVT_SCALEF32_PK_F16_FP4 : VOP3OpSel_Real_gfx9 <0x250>;
 defm V_CVT_SCALEF32_PK_BF16_FP4 : VOP3OpSel_Real_gfx9 <0x251>;
+defm V_CVT_SCALEF32_PK_FP4_F16 : VOP3OpSel_Real_gfx9_forced_opsel2 <0x24c>;
+defm V_CVT_SCALEF32_PK_FP4_BF16: VOP3OpSel_Real_gfx9_forced_opsel2 <0x24d>;
 }
 let OtherPredicates = [HasFP6BF6ConversionScaleInsts] in {
 defm V_CVT_SCALEF32_PK32_F32_FP6 : VOP3_Real_gfx9<0x256, "v_cvt_scalef32_pk32_f32_fp6">;
diff --git a/llvm/test/MC/AMDGPU/gfx950_asm_features.s b/llvm/test/MC/AMDGPU/gfx950_asm_features.s
index e505b6ff4ad58b..12340dfaa78e91 100644
--- a/llvm/test/MC/AMDGPU/gfx950_asm_features.s
+++ b/llvm/test/MC/AMDGPU/gfx

[llvm-branch-commits] [llvm] AMDGPU: MC support for v_cvt_scalef32_pk_fp4_{f|bf}16 on gfx950. (PR #117594)

2024-11-25 Thread via llvm-branch-commits

llvmbot wrote:




@llvm/pr-subscribers-backend-amdgpu

Author: Matt Arsenault (arsenm)


Changes

These instructions make non-standard use of the OPSEL bits to select
which destination byte is written. The src2_modifiers operand is used
without a corresponding real src2 operand by introducing a dummy src2.
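
As a rough illustration of the destination byte select described above, the sketch below treats op_sel bits 2 and 3 as a 2-bit byte index and merges one packed result byte into the previous destination value. The helper names, the bit order of the index, and the read-modify-write behaviour are assumptions made for illustration (the last is suggested by the `$vdst = $src2` constraint in the patch); none of this is code from the patch.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: interpret op_sel bits 2 and 3 as a destination byte
// index (bit 2 = low bit, bit 3 = high bit). The bit order is an assumption.
inline unsigned destByteIndex(unsigned OpSel) {
  assert(OpSel < 16 && "op_sel is a 4-bit field");
  return (OpSel >> 2) & 0x3;
}

// Hypothetical model of the write: replace only the selected byte of the old
// destination value and keep the other three bytes unchanged.
inline std::uint32_t writeDestByte(std::uint32_t OldDst, std::uint8_t Packed,
                                   unsigned OpSel) {
  const unsigned Shift = 8 * destByteIndex(OpSel);
  const std::uint32_t Mask = std::uint32_t(0xFF) << Shift;
  return (OldDst & ~Mask) | (std::uint32_t(Packed) << Shift);
}
```

Under these assumptions, `writeDestByte(0xAABBCCDD, 0x12, 0x4)` replaces byte 1 and leaves the rest of the register intact, which is one reason the old destination value has to be an input to the instruction.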

Co-authored-by: Pravin Jagtap 

---
Full diff: https://github.com/llvm/llvm-project/pull/117594.diff


5 Files Affected:

- (modified) llvm/lib/Target/AMDGPU/AsmParser/AMDGPUAsmParser.cpp (+3-1) 
- (modified) llvm/lib/Target/AMDGPU/VOP3Instructions.td (+26) 
- (modified) llvm/test/MC/AMDGPU/gfx950_asm_features.s (+40) 
- (modified) llvm/test/MC/AMDGPU/gfx950_err.s (+24) 
- (modified) llvm/test/MC/Disassembler/AMDGPU/gfx950_dasm_vop3.txt (+30) 


``diff
diff --git a/llvm/lib/Target/AMDGPU/AsmParser/AMDGPUAsmParser.cpp 
b/llvm/lib/Target/AMDGPU/AsmParser/AMDGPUAsmParser.cpp
index a1d45822837c5f..afd35842ba87f4 100644
--- a/llvm/lib/Target/AMDGPU/AsmParser/AMDGPUAsmParser.cpp
+++ b/llvm/lib/Target/AMDGPU/AsmParser/AMDGPUAsmParser.cpp
@@ -8824,7 +8824,9 @@ void AMDGPUAsmParser::cvtVOP3P(MCInst &Inst, const 
OperandVector &Operands,
 
   const bool IsPacked = (Desc.TSFlags & SIInstrFlags::IsPacked) != 0;
 
-  if (Opc == AMDGPU::V_CVT_SR_BF8_F32_vi ||
+  if (Opc == AMDGPU::V_CVT_SCALEF32_PK_FP4_F16_vi ||
+  Opc == AMDGPU::V_CVT_SCALEF32_PK_FP4_BF16_vi ||
+  Opc == AMDGPU::V_CVT_SR_BF8_F32_vi ||
   Opc == AMDGPU::V_CVT_SR_FP8_F32_vi ||
   Opc == AMDGPU::V_CVT_SR_BF8_F32_gfx12_e64_gfx12 ||
   Opc == AMDGPU::V_CVT_SR_FP8_F32_gfx12_e64_gfx12) {
diff --git a/llvm/lib/Target/AMDGPU/VOP3Instructions.td 
b/llvm/lib/Target/AMDGPU/VOP3Instructions.td
index fdffb2c36dcccf..7776688156419a 100644
--- a/llvm/lib/Target/AMDGPU/VOP3Instructions.td
+++ b/llvm/lib/Target/AMDGPU/VOP3Instructions.td
@@ -899,6 +899,23 @@ def VOP3_CVT_SCALE_FP4FP8BF8_F32_Profile : 
VOP3_Profile,
+  VOP3_OPSEL> {
+  let InsVOP3OpSel = (ins FP32InputMods:$src0_modifiers, Src0RC64:$src0,
+  FP32InputMods:$src1_modifiers, Src1RC64:$src1,
+  FP32InputMods:$src2_modifiers, VGPR_32:$src2,
+  op_sel0:$op_sel);
+  let HasClamp = 0;
+  let HasSrc2 = 0;
+  let HasSrc2Mods = 1;
+  let HasOpSel = 1;
+  let AsmVOP3OpSel = !subst(", $src2_modifiers", "",
+getAsmVOP3OpSel<3, HasClamp, HasOMod,
+HasSrc0FloatMods, HasSrc1FloatMods,
+HasSrc2FloatMods>.ret);
+  let HasExtVOP3DPP = 0;
+}
+
 class VOP3_CVT_SCALE_PK_F16BF16F32_FP4FP8BF8_Profile : 
VOP3_Profile,
   VOP3_OPSEL> {
   let InsVOP3OpSel = (ins FP32InputMods:$src0_modifiers, Src0RC64:$src0,
@@ -965,6 +982,13 @@ let SubtargetPredicate = HasFP4ConversionScaleInsts, 
mayRaiseFPException = 0 in
   defm V_CVT_SCALEF32_PK_FP4_F32 : VOP3Inst<"v_cvt_scalef32_pk_fp4_f32", 
VOP3_CVT_SCALE_FP4FP8BF8_F32_Profile>;
   defm V_CVT_SCALEF32_PK_F16_FP4 : VOP3Inst<"v_cvt_scalef32_pk_f16_fp4", 
VOP3_CVT_SCALE_PK_F16BF16F32_FP4FP8BF8_Profile>;
   defm V_CVT_SCALEF32_PK_BF16_FP4 : VOP3Inst<"v_cvt_scalef32_pk_bf16_fp4", 
VOP3_CVT_SCALE_PK_F16BF16F32_FP4FP8BF8_Profile>;
+
+  // These instructions have non-standard use of op_sel. In particular they are
+  // using op_sel bits 2 and 3 while only having two sources.
+  let Constraints = "$vdst = $src2", DisableEncoding = "$src2" in {
+defm V_CVT_SCALEF32_PK_FP4_F16 : VOP3Inst<"v_cvt_scalef32_pk_fp4_f16", 
VOP3_CVT_SCALE_FP4_F16BF16_Profile>;
+defm V_CVT_SCALEF32_PK_FP4_BF16 : VOP3Inst<"v_cvt_scalef32_pk_fp4_bf16", 
VOP3_CVT_SCALE_FP4_F16BF16_Profile>;
+  }
 }
 
 let SubtargetPredicate = HasFP6BF6ConversionScaleInsts, mayRaiseFPException = 
0 in {
@@ -1930,6 +1954,8 @@ defm V_CVT_SCALEF32_PK_F32_FP4 : VOP3OpSel_Real_gfx9 
<0x23f>;
 defm V_CVT_SCALEF32_PK_FP4_F32 : VOP3OpSel_Real_gfx9 <0x23d>;
 defm V_CVT_SCALEF32_PK_F16_FP4 : VOP3OpSel_Real_gfx9 <0x250>;
 defm V_CVT_SCALEF32_PK_BF16_FP4 : VOP3OpSel_Real_gfx9 <0x251>;
+defm V_CVT_SCALEF32_PK_FP4_F16 : VOP3OpSel_Real_gfx9_forced_opsel2 <0x24c>;
+defm V_CVT_SCALEF32_PK_FP4_BF16: VOP3OpSel_Real_gfx9_forced_opsel2 <0x24d>;
 }
 let OtherPredicates = [HasFP6BF6ConversionScaleInsts] in {
 defm V_CVT_SCALEF32_PK32_F32_FP6 : VOP3_Real_gfx9<0x256, 
"v_cvt_scalef32_pk32_f32_fp6">;
diff --git a/llvm/test/MC/AMDGPU/gfx950_asm_features.s 
b/llvm/test/MC/AMDGPU/gfx950_asm_features.s
index e505b6ff4ad58b..12340dfaa78e91 100644
--- a/llvm/test/MC/AMDGPU/gfx950_asm_features.s
+++ b/llvm/test/MC/AMDGPU/gfx950_asm_features.s
@@ -1025,3 +1025,43 @@ v_cvt_scalef32_pk_bf16_bf8 v1, v2, s3 op_sel:[1,0,0]
 // NOT-GFX950: :[[@LINE+2]]:{{[0-9]+}}: error:
 // GFX950: v_cvt_scalef32_pk_bf16_bf8 v1, s2, 3 op_sel:[1,0,0] ; encoding: 
[0x01,0x08,0x6a,0xd2,0x02,0x06,0x01,0x00]
 v_cvt_scalef32_pk_bf16_bf8 v1, s2, 3 op_sel:[1,0,0]
+
+// NOT-GFX950: error: instru

[llvm-branch-commits] [llvm] AMDGPU: MC support for v_cvt_scalef32_pk_fp4_{f|bf}16 on gfx950. (PR #117594)

2024-11-25 Thread Matt Arsenault via llvm-branch-commits

https://github.com/arsenm ready_for_review 
https://github.com/llvm/llvm-project/pull/117594


[llvm-branch-commits] [clang] [llvm] AMDGPU: Support v_cvt_scalef32_pk32_{bf|f}6_{bf|fp}16 for gfx950 (PR #117592)

2024-11-25 Thread via llvm-branch-commits

llvmbot wrote:




@llvm/pr-subscribers-clang

Author: Matt Arsenault (arsenm)


Changes

Co-authored-by: Pravin Jagtap 

---

Patch is 49.45 KiB, truncated to 20.00 KiB below, full version: 
https://github.com/llvm/llvm-project/pull/117592.diff


16 Files Affected:

- (modified) clang/include/clang/Basic/BuiltinsAMDGPU.def (+4) 
- (modified) clang/test/CodeGenOpenCL/amdgpu-features.cl (+1-1) 
- (modified) clang/test/CodeGenOpenCL/builtins-amdgcn-gfx950.cl (+43) 
- (modified) llvm/include/llvm/IR/IntrinsicsAMDGPU.td (+9) 
- (modified) llvm/lib/Target/AMDGPU/AMDGPU.td (+16-1) 
- (modified) llvm/lib/Target/AMDGPU/AMDGPURegisterBankInfo.cpp (+4) 
- (modified) llvm/lib/Target/AMDGPU/AMDGPUSubtarget.h (+3) 
- (modified) llvm/lib/Target/AMDGPU/Disassembler/AMDGPUDisassembler.cpp (+1) 
- (modified) llvm/lib/Target/AMDGPU/SIInstrInfo.td (+6-1) 
- (modified) llvm/lib/Target/AMDGPU/SIRegisterInfo.td (+1) 
- (modified) llvm/lib/Target/AMDGPU/VOP3Instructions.td (+14) 
- (modified) llvm/lib/TargetParser/TargetParser.cpp (+1) 
- (added) llvm/test/CodeGen/AMDGPU/llvm.amdgcn.cvt.scalef32.pk.ll (+474) 
- (modified) llvm/test/MC/AMDGPU/gfx950_asm_features.s (+16) 
- (modified) llvm/test/MC/AMDGPU/gfx950_err.s (+48) 
- (modified) llvm/test/MC/Disassembler/AMDGPU/gfx950_dasm_vop3.txt (+12) 


``diff
diff --git a/clang/include/clang/Basic/BuiltinsAMDGPU.def 
b/clang/include/clang/Basic/BuiltinsAMDGPU.def
index a42ad56ce4f998..e09dc0e1107a82 100644
--- a/clang/include/clang/Basic/BuiltinsAMDGPU.def
+++ b/clang/include/clang/Basic/BuiltinsAMDGPU.def
@@ -559,6 +559,10 @@ 
TARGET_BUILTIN(__builtin_amdgcn_swmmac_f32_16x16x32_bf8_fp8_w64, "V4fiV2iV4fs",
 TARGET_BUILTIN(__builtin_amdgcn_swmmac_f32_16x16x32_bf8_bf8_w64, 
"V4fiV2iV4fs", "nc", "gfx12-insts,wavefrontsize64")
 
 TARGET_BUILTIN(__builtin_amdgcn_prng_b32, "UiUi", "nc", "prng-inst")
+TARGET_BUILTIN(__builtin_amdgcn_cvt_scalef32_pk32_fp6_f16, "V6UiV32hf", "nc", 
"f16bf16-to-fp6bf6-cvt-scale-insts")
+TARGET_BUILTIN(__builtin_amdgcn_cvt_scalef32_pk32_bf6_f16, "V6UiV32hf", "nc", 
"f16bf16-to-fp6bf6-cvt-scale-insts")
+TARGET_BUILTIN(__builtin_amdgcn_cvt_scalef32_pk32_fp6_bf16, "V6UiV32yf", "nc", 
"f16bf16-to-fp6bf6-cvt-scale-insts")
+TARGET_BUILTIN(__builtin_amdgcn_cvt_scalef32_pk32_bf6_bf16, "V6UiV32yf", "nc", 
"f16bf16-to-fp6bf6-cvt-scale-insts")
 
 #undef BUILTIN
 #undef TARGET_BUILTIN
diff --git a/clang/test/CodeGenOpenCL/amdgpu-features.cl 
b/clang/test/CodeGenOpenCL/amdgpu-features.cl
index f9e07fbc6b0480..56013dad9b6651 100644
--- a/clang/test/CodeGenOpenCL/amdgpu-features.cl
+++ b/clang/test/CodeGenOpenCL/amdgpu-features.cl
@@ -89,7 +89,7 @@
 // GFX941: 
"target-features"="+16-bit-insts,+atomic-buffer-global-pk-add-f16-insts,+atomic-ds-pk-add-16-insts,+atomic-fadd-rtn-insts,+atomic-flat-pk-add-16-insts,+atomic-global-pk-add-bf16-inst,+ci-insts,+dl-insts,+dot1-insts,+dot10-insts,+dot2-insts,+dot3-insts,+dot4-insts,+dot5-insts,+dot6-insts,+dot7-insts,+dpp,+fp8-conversion-insts,+fp8-insts,+gfx8-insts,+gfx9-insts,+gfx90a-insts,+gfx940-insts,+mai-insts,+s-memrealtime,+s-memtime-inst,+wavefrontsize64,+xf32-insts"
 // GFX942: 
"target-features"="+16-bit-insts,+atomic-buffer-global-pk-add-f16-insts,+atomic-ds-pk-add-16-insts,+atomic-fadd-rtn-insts,+atomic-flat-pk-add-16-insts,+atomic-global-pk-add-bf16-inst,+ci-insts,+dl-insts,+dot1-insts,+dot10-insts,+dot2-insts,+dot3-insts,+dot4-insts,+dot5-insts,+dot6-insts,+dot7-insts,+dpp,+fp8-conversion-insts,+fp8-insts,+gfx8-insts,+gfx9-insts,+gfx90a-insts,+gfx940-insts,+mai-insts,+s-memrealtime,+s-memtime-inst,+wavefrontsize64,+xf32-insts"
 // GFX9_4_Generic: 
"target-features"="+16-bit-insts,+atomic-buffer-global-pk-add-f16-insts,+atomic-ds-pk-add-16-insts,+atomic-fadd-rtn-insts,+atomic-flat-pk-add-16-insts,+atomic-global-pk-add-bf16-inst,+ci-insts,+dl-insts,+dot1-insts,+dot10-insts,+dot2-insts,+dot3-insts,+dot4-insts,+dot5-insts,+dot6-insts,+dot7-insts,+dpp,+gfx8-insts,+gfx9-insts,+gfx90a-insts,+gfx940-insts,+mai-insts,+s-memrealtime,+s-memtime-inst,+wavefrontsize64"
-// GFX950: 
"target-features"="+16-bit-insts,+atomic-buffer-global-pk-add-f16-insts,+atomic-ds-pk-add-16-insts,+atomic-fadd-rtn-insts,+atomic-flat-pk-add-16-insts,+atomic-global-pk-add-bf16-inst,+ci-insts,+dl-insts,+dot1-insts,+dot10-insts,+dot2-insts,+dot3-insts,+dot4-insts,+dot5-insts,+dot6-insts,+dot7-insts,+dpp,+fp8-conversion-insts,+fp8-insts,+gfx8-insts,+gfx9-insts,+gfx90a-insts,+gfx940-insts,+gfx950-insts,+mai-insts,+permlane16-swap,+permlane32-swap,+prng-inst,+s-memrealtime,+s-memtime-inst,+wavefrontsize64"
+// GFX950: 
"target-features"="+16-bit-insts,+atomic-buffer-global-pk-add-f16-insts,+atomic-ds-pk-add-16-insts,+atomic-fadd-rtn-insts,+atomic-flat-pk-add-16-insts,+atomic-global-pk-add-bf16-inst,+ci-insts,+dl-insts,+dot1-insts,+dot10-insts,+dot2-insts,+dot3-insts,+dot4-insts,+dot5-insts,+dot6-insts,+dot7-insts,+dpp,+f16bf16-to-fp6bf6-cvt-scale-insts,+fp8-conversion-insts,+fp8-insts,+gfx8-insts,+gfx9-insts,+gfx90a-insts,+gfx940-i

[llvm-branch-commits] [clang] [llvm] AMDGPU: Support v_cvt_scalef32_2xpk16_{bf|fp}6_f32 for gfx950. (PR #117595)

2024-11-25 Thread via llvm-branch-commits

llvmbot wrote:




@llvm/pr-subscribers-backend-amdgpu

Author: Matt Arsenault (arsenm)


Changes

Scale packed 16-component single-precision float vectors from
two source inputs using the exponent provided by the third
single-precision float input, then convert the values to a packed
32-component FP6 float value.
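
The result type in the builtin signature (`V6Ui`, six 32-bit words) follows from the arithmetic: 32 elements of 6 bits each are 192 bits, which is 6 x 32. The sketch below packs 32 six-bit codes into six dwords. The function name and the little-endian element order (element 0 in the lowest bits) are assumptions for illustration; the scaling and the float-to-FP6 rounding themselves are not modelled.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

constexpr unsigned NumElts = 32;   // packed FP6 components in the result
constexpr unsigned BitsPerElt = 6; // width of one FP6/BF6 code
static_assert(NumElts * BitsPerElt == 6 * 32,
              "32 six-bit codes fill exactly six dwords");

// Hypothetical packing helper: element I occupies bits [6*I, 6*I + 5] of the
// 192-bit result. This element order is an assumption.
inline std::array<std::uint32_t, 6>
packSixBitCodes(const std::array<std::uint8_t, NumElts> &Codes) {
  std::array<std::uint32_t, 6> Out{};
  for (unsigned I = 0; I < NumElts; ++I) {
    assert(Codes[I] < 64 && "each code must fit in 6 bits");
    const std::uint32_t V = Codes[I];
    const unsigned Bit = I * BitsPerElt;
    const unsigned Word = Bit / 32;
    const unsigned Shift = Bit % 32;
    Out[Word] |= V << Shift;
    // A code that starts in the top 5 bits of a dword spills into the next.
    if (Shift > 32 - BitsPerElt)
      Out[Word + 1] |= V >> (32 - Shift);
  }
  return Out;
}
```

With the two 16-component sources concatenated into one 32-element array (how the hardware orders them is not stated in the patch description), the helper yields the six dwords that a `<6 x i32>` result would hold.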

Co-authored-by: Pravin Jagtap 

---

Patch is 22.01 KiB, truncated to 20.00 KiB below, full version: 
https://github.com/llvm/llvm-project/pull/117595.diff


10 Files Affected:

- (modified) clang/include/clang/Basic/BuiltinsAMDGPU.def (+3) 
- (modified) clang/test/CodeGenOpenCL/builtins-amdgcn-gfx950.cl (+21-1) 
- (modified) llvm/include/llvm/IR/IntrinsicsAMDGPU.td (+6) 
- (modified) llvm/lib/Target/AMDGPU/AMDGPURegisterBankInfo.cpp (+2) 
- (modified) llvm/lib/Target/AMDGPU/SIInstrInfo.td (+1) 
- (modified) llvm/lib/Target/AMDGPU/VOP3Instructions.td (+8) 
- (added) llvm/test/CodeGen/AMDGPU/llvm.amdgcn.cvt.scalef32.pk.gfx950.ll (+128) 
- (modified) llvm/test/MC/AMDGPU/gfx950_asm_features.s (+24) 
- (modified) llvm/test/MC/AMDGPU/gfx950_err.s (+24) 
- (modified) llvm/test/MC/Disassembler/AMDGPU/gfx950_dasm_vop3.txt (+18) 


``diff
diff --git a/clang/include/clang/Basic/BuiltinsAMDGPU.def 
b/clang/include/clang/Basic/BuiltinsAMDGPU.def
index e09dc0e1107a82..dacbf5aa902f60 100644
--- a/clang/include/clang/Basic/BuiltinsAMDGPU.def
+++ b/clang/include/clang/Basic/BuiltinsAMDGPU.def
@@ -467,6 +467,9 @@ TARGET_BUILTIN(__builtin_amdgcn_ds_read_tr6_b96_v3i32, 
"V3iV3i*3", "nc", "gfx950
 TARGET_BUILTIN(__builtin_amdgcn_ds_read_tr8_b64_v2i32, "V2iV2i*3", "nc", 
"gfx950-insts")
 TARGET_BUILTIN(__builtin_amdgcn_ds_read_tr16_b64_v4i16, "V4sV4s*3", "nc", 
"gfx950-insts")
 
+TARGET_BUILTIN(__builtin_amdgcn_cvt_scalef32_2xpk16_fp6_f32, "V6UiV16fV16ff", 
"nc", "gfx950-insts")
+TARGET_BUILTIN(__builtin_amdgcn_cvt_scalef32_2xpk16_bf6_f32, "V6UiV16fV16ff", 
"nc", "gfx950-insts")
+
 
//===----------------------------------------------------------------------===//
 // GFX12+ only builtins.
 
//===----------------------------------------------------------------------===//
diff --git a/clang/test/CodeGenOpenCL/builtins-amdgcn-gfx950.cl 
b/clang/test/CodeGenOpenCL/builtins-amdgcn-gfx950.cl
index 779aadd96f3f41..6f3c81b26be0b8 100644
--- a/clang/test/CodeGenOpenCL/builtins-amdgcn-gfx950.cl
+++ b/clang/test/CodeGenOpenCL/builtins-amdgcn-gfx950.cl
@@ -7,6 +7,8 @@ typedef unsigned int __attribute__((ext_vector_type(2))) uint2;
 typedef unsigned int __attribute__((ext_vector_type(6))) uint6;
 typedef __bf16 __attribute__((ext_vector_type(32))) bfloat32;
 typedef half __attribute__((ext_vector_type(32))) half32;
+typedef short __attribute__((ext_vector_type(2))) short2;
+typedef float __attribute__((ext_vector_type(16))) float16;
 
 // CHECK-LABEL: @test_prng_b32(
 // CHECK-NEXT:  entry:
@@ -115,10 +117,14 @@ void test_permlane32_swap(global uint2* out, uint old, 
uint src) {
 // CHECK-NEXT:[[OUT6_ADDR:%.*]] = alloca ptr addrspace(1), align 8, 
addrspace(5)
 // CHECK-NEXT:[[SRCBF32_ADDR:%.*]] = alloca <32 x bfloat>, align 64, 
addrspace(5)
 // CHECK-NEXT:[[SRCH32_ADDR:%.*]] = alloca <32 x half>, align 64, 
addrspace(5)
+// CHECK-NEXT:[[SRC0F32_ADDR:%.*]] = alloca <16 x float>, align 64, 
addrspace(5)
+// CHECK-NEXT:[[SRC1F32_ADDR:%.*]] = alloca <16 x float>, align 64, 
addrspace(5)
 // CHECK-NEXT:[[SCALE_ADDR:%.*]] = alloca float, align 4, addrspace(5)
 // CHECK-NEXT:store ptr addrspace(1) [[OUT6:%.*]], ptr addrspace(5) 
[[OUT6_ADDR]], align 8
 // CHECK-NEXT:store <32 x bfloat> [[SRCBF32:%.*]], ptr addrspace(5) 
[[SRCBF32_ADDR]], align 64
 // CHECK-NEXT:store <32 x half> [[SRCH32:%.*]], ptr addrspace(5) 
[[SRCH32_ADDR]], align 64
+// CHECK-NEXT:store <16 x float> [[SRC0F32:%.*]], ptr addrspace(5) 
[[SRC0F32_ADDR]], align 64
+// CHECK-NEXT:store <16 x float> [[SRC1F32:%.*]], ptr addrspace(5) 
[[SRC1F32_ADDR]], align 64
 // CHECK-NEXT:store float [[SCALE:%.*]], ptr addrspace(5) [[SCALE_ADDR]], 
align 4
 // CHECK-NEXT:[[TMP0:%.*]] = load <32 x bfloat>, ptr addrspace(5) 
[[SRCBF32_ADDR]], align 64
 // CHECK-NEXT:[[TMP1:%.*]] = load float, ptr addrspace(5) [[SCALE_ADDR]], 
align 4
@@ -140,12 +146,26 @@ void test_permlane32_swap(global uint2* out, uint old, 
uint src) {
 // CHECK-NEXT:[[TMP14:%.*]] = call <6 x i32> 
@llvm.amdgcn.cvt.scalef32.pk32.fp6.f16(<32 x half> [[TMP12]], float [[TMP13]])
 // CHECK-NEXT:[[TMP15:%.*]] = load ptr addrspace(1), ptr addrspace(5) 
[[OUT6_ADDR]], align 8
 // CHECK-NEXT:store <6 x i32> [[TMP14]], ptr addrspace(1) [[TMP15]], align 
32
+// CHECK-NEXT:[[TMP16:%.*]] = load <16 x float>, ptr addrspace(5) 
[[SRC0F32_ADDR]], align 64
+// CHECK-NEXT:[[TMP17:%.*]] = load <16 x float>, ptr addrspace(5) 
[[SRC1F32_ADDR]], align 64
+// CHECK-NEXT:[[TMP18:%.*]] = load float, ptr addrspace(5) [[SCALE_ADDR]], 
align 4
+// CHECK-NEXT:[[TMP19:%.*]] = call <6 x i32> 
@llvm.amdgcn.cvt.scalef32.2xpk16.bf6.

[llvm-branch-commits] [llvm] AMDGPU: MC support for v_cvt_scalef32_pk_{bf|f}16_{bf|fp}8 of gfx950. (PR #117593)

2024-11-25 Thread via llvm-branch-commits

llvmbot wrote:




@llvm/pr-subscribers-backend-amdgpu

Author: Matt Arsenault (arsenm)


Changes

OPSEL[0] selects src_word to read.
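
The effect of OPSEL[0] on the encoding can be checked against the test lines in this patch: with `op_sel:[1,0,0]` the second encoding byte changes from 0x00 to 0x08. The sketch below decodes the op_sel field and the opcode from such an encoding. The helper names are made up, and the field positions (op_sel in bits [14:11], opcode in bits [25:16] of the first dword) are an assumption about the VOP3 layout that happens to agree with the bytes shown in the tests.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

using Encoding = std::array<std::uint8_t, 8>;

// Assemble the first dword little-endian from the first four encoding bytes.
constexpr std::uint32_t firstDword(const Encoding &Bytes) {
  return std::uint32_t(Bytes[0]) | (std::uint32_t(Bytes[1]) << 8) |
         (std::uint32_t(Bytes[2]) << 16) | (std::uint32_t(Bytes[3]) << 24);
}

// Assumed VOP3 field positions: op_sel in bits [14:11], opcode in bits [25:16].
constexpr unsigned opSel(const Encoding &Bytes) {
  return (firstDword(Bytes) >> 11) & 0xF;
}
constexpr unsigned opcode(const Encoding &Bytes) {
  return (firstDword(Bytes) >> 16) & 0x3FF;
}

// Single op_sel bit; Idx 0 is the bit this patch uses to pick the source word.
inline bool opSelBit(const Encoding &Bytes, unsigned Idx) {
  assert(Idx < 4 && "op_sel has four bits");
  return ((opSel(Bytes) >> Idx) & 1) != 0;
}

// Encodings copied from the gfx950_asm_features.s checks in this patch:
// v_cvt_scalef32_pk_f16_fp8 v1, v2, v3, without and with op_sel:[1,0,0].
constexpr Encoding NoOpSel = {0x01, 0x00, 0x48, 0xd2, 0x02, 0x07, 0x02, 0x00};
constexpr Encoding WithOpSel0 = {0x01, 0x08, 0x48, 0xd2, 0x02, 0x07, 0x02, 0x00};

static_assert(opcode(NoOpSel) == 0x248, "matches VOP3OpSel_Real_gfx9<0x248>");
static_assert(opSel(NoOpSel) == 0, "no op_sel bits set");
static_assert(opSel(WithOpSel0) == 1, "op_sel:[1,0,0] sets bit 0 only");
```

The decoded opcode 0x248 agrees with the `VOP3OpSel_Real_gfx9<0x248>` definition added for `V_CVT_SCALEF32_PK_F16_FP8`, which gives some confidence in the assumed field positions.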

Co-authored-by: Pravin Jagtap 

---
Full diff: https://github.com/llvm/llvm-project/pull/117593.diff


4 Files Affected:

- (modified) llvm/lib/Target/AMDGPU/VOP3Instructions.td (+8) 
- (modified) llvm/test/MC/AMDGPU/gfx950_asm_features.s (+96) 
- (modified) llvm/test/MC/AMDGPU/gfx950_err.s (+49-1) 
- (modified) llvm/test/MC/Disassembler/AMDGPU/gfx950_dasm_vop3.txt (+72) 


``diff
diff --git a/llvm/lib/Target/AMDGPU/VOP3Instructions.td 
b/llvm/lib/Target/AMDGPU/VOP3Instructions.td
index 764a2275205665..fdffb2c36dcccf 100644
--- a/llvm/lib/Target/AMDGPU/VOP3Instructions.td
+++ b/llvm/lib/Target/AMDGPU/VOP3Instructions.td
@@ -945,6 +945,8 @@ let SubtargetPredicate = HasFP8ConversionScaleInsts, 
mayRaiseFPException = 0 in
   defm V_CVT_SCALEF32_PK_F32_FP8 : VOP3Inst<"v_cvt_scalef32_pk_f32_fp8", 
VOP3_CVT_SCALE_PK_F16BF16F32_FP4FP8BF8_Profile>;
   defm V_CVT_SCALEF32_PK_FP8_F16 : VOP3Inst<"v_cvt_scalef32_pk_fp8_f16", 
VOP3_CVT_SCALE_PK_FP8BF8_F16BF16_Profile>;
   defm V_CVT_SCALEF32_PK_FP8_BF16 : VOP3Inst<"v_cvt_scalef32_pk_fp8_bf16", 
VOP3_CVT_SCALE_PK_FP8BF8_F16BF16_Profile>;
+  defm V_CVT_SCALEF32_PK_F16_FP8: VOP3Inst<"v_cvt_scalef32_pk_f16_fp8",  
VOP3_CVT_SCALE_PK_F16BF16F32_FP4FP8BF8_Profile>;
+  defm V_CVT_SCALEF32_PK_BF16_FP8   : VOP3Inst<"v_cvt_scalef32_pk_bf16_fp8", 
VOP3_CVT_SCALE_PK_F16BF16F32_FP4FP8BF8_Profile>;
 }
 
 let SubtargetPredicate = HasBF8ConversionScaleInsts, mayRaiseFPException = 0 
in {
@@ -954,6 +956,8 @@ let SubtargetPredicate = HasBF8ConversionScaleInsts, 
mayRaiseFPException = 0 in
   defm V_CVT_SCALEF32_PK_F32_BF8 : VOP3Inst<"v_cvt_scalef32_pk_f32_bf8", 
VOP3_CVT_SCALE_PK_F16BF16F32_FP4FP8BF8_Profile>;
   defm V_CVT_SCALEF32_PK_BF8_F16 : VOP3Inst<"v_cvt_scalef32_pk_bf8_f16", 
VOP3_CVT_SCALE_PK_FP8BF8_F16BF16_Profile>;
   defm V_CVT_SCALEF32_PK_BF8_BF16 : VOP3Inst<"v_cvt_scalef32_pk_bf8_bf16", 
VOP3_CVT_SCALE_PK_FP8BF8_F16BF16_Profile>;
+  defm V_CVT_SCALEF32_PK_F16_BF8: VOP3Inst<"v_cvt_scalef32_pk_f16_bf8",  
VOP3_CVT_SCALE_PK_F16BF16F32_FP4FP8BF8_Profile>;
+  defm V_CVT_SCALEF32_PK_BF16_BF8   : VOP3Inst<"v_cvt_scalef32_pk_bf16_bf8", 
VOP3_CVT_SCALE_PK_F16BF16F32_FP4FP8BF8_Profile>;
 }
 
 let SubtargetPredicate = HasFP4ConversionScaleInsts, mayRaiseFPException = 0 
in {
@@ -1908,6 +1912,8 @@ defm V_CVT_SCALEF32_PK_FP8_F32 : VOP3OpSel_Real_gfx9 
<0x235>;
 defm V_CVT_SCALEF32_PK_F32_FP8 : VOP3OpSel_Real_gfx9 <0x239>;
 defm V_CVT_SCALEF32_PK_FP8_F16 : VOP3OpSel_Real_gfx9 <0x240>;
 defm V_CVT_SCALEF32_PK_FP8_BF16: VOP3OpSel_Real_gfx9 <0x244>;
+defm V_CVT_SCALEF32_PK_F16_FP8  : VOP3OpSel_Real_gfx9<0x248>;
+defm V_CVT_SCALEF32_PK_BF16_FP8 : VOP3OpSel_Real_gfx9<0x269>;
 }
 let OtherPredicates = [HasBF8ConversionScaleInsts] in {
 defm V_CVT_SCALEF32_F16_BF8 : VOP3OpSel_Real_gfx9 <0x24b>;
@@ -1916,6 +1922,8 @@ defm V_CVT_SCALEF32_PK_BF8_F32 : VOP3OpSel_Real_gfx9 
<0x236>;
 defm V_CVT_SCALEF32_PK_F32_BF8 : VOP3OpSel_Real_gfx9 <0x23a>;
 defm V_CVT_SCALEF32_PK_BF8_F16 : VOP3OpSel_Real_gfx9 <0x241>;
 defm V_CVT_SCALEF32_PK_BF8_BF16: VOP3OpSel_Real_gfx9 <0x245>;
+defm V_CVT_SCALEF32_PK_F16_BF8  : VOP3OpSel_Real_gfx9<0x249>;
+defm V_CVT_SCALEF32_PK_BF16_BF8 : VOP3OpSel_Real_gfx9<0x26a>;
 }
 let OtherPredicates = [HasFP4ConversionScaleInsts] in {
 defm V_CVT_SCALEF32_PK_F32_FP4 : VOP3OpSel_Real_gfx9 <0x23f>;
diff --git a/llvm/test/MC/AMDGPU/gfx950_asm_features.s 
b/llvm/test/MC/AMDGPU/gfx950_asm_features.s
index 1aef267537aa55..e505b6ff4ad58b 100644
--- a/llvm/test/MC/AMDGPU/gfx950_asm_features.s
+++ b/llvm/test/MC/AMDGPU/gfx950_asm_features.s
@@ -929,3 +929,99 @@ v_cvt_scalef32_pk32_fp6_bf16 v[20:25], v[10:25], v8
 // NOT-GFX950: :[[@LINE+2]]:{{[0-9]+}}: error:
 // GFX950: v_cvt_scalef32_pk32_fp6_f16 v[20:25], v[10:25], v8 ; encoding: 
[0x14,0x00,0x58,0xd2,0x0a,0x11,0x02,0x00]
 v_cvt_scalef32_pk32_fp6_f16 v[20:25], v[10:25], v8
+
+// NOT-GFX950: :[[@LINE+2]]:{{[0-9]+}}: error:
+// GFX950: v_cvt_scalef32_pk_f16_fp8 v1, v2, v3; encoding: 
[0x01,0x00,0x48,0xd2,0x02,0x07,0x02,0x00]
+v_cvt_scalef32_pk_f16_fp8 v1, v2, v3
+
+// NOT-GFX950: :[[@LINE+2]]:{{[0-9]+}}: error:
+// GFX950: v_cvt_scalef32_pk_f16_fp8 v1, v2, s3; encoding: 
[0x01,0x00,0x48,0xd2,0x02,0x07,0x00,0x00]
+v_cvt_scalef32_pk_f16_fp8 v1, v2, s3
+
+// NOT-GFX950: :[[@LINE+2]]:{{[0-9]+}}: error:
+// GFX950: v_cvt_scalef32_pk_f16_fp8 v1, s2, 3 ; encoding: 
[0x01,0x00,0x48,0xd2,0x02,0x06,0x01,0x00]
+v_cvt_scalef32_pk_f16_fp8 v1, s2, 3
+
+// NOT-GFX950: :[[@LINE+2]]:{{[0-9]+}}: error:
+// GFX950: v_cvt_scalef32_pk_f16_fp8 v1, v2, v3 op_sel:[1,0,0] ; encoding: 
[0x01,0x08,0x48,0xd2,0x02,0x07,0x02,0x00]
+v_cvt_scalef32_pk_f16_fp8 v1, v2, v3 op_sel:[1,0,0]
+
+// NOT-GFX950: :[[@LINE+2]]:{{[0-9]+}}: error:
+// GFX950: v_cvt_scalef32_pk_f16_fp8 v1, v2, s3 op_sel:[1,0,0] ; encoding: 
[0x01,0x08,0x48,0xd2,0x02,0x07,0x00,0x00]
+v_cvt_scalef32_pk_f16_fp8 v1, v2, s3 o

[llvm-branch-commits] [clang] [llvm] AMDGPU: Support v_cvt_scalef32_2xpk16_{bf|fp}6_f32 for gfx950. (PR #117595)

2024-11-25 Thread Matt Arsenault via llvm-branch-commits

https://github.com/arsenm ready_for_review 
https://github.com/llvm/llvm-project/pull/117595

